Content analysis is a systematic approach to partitioning word usage in
large data sources, used in the field of communication (George, 1954;
Hodson, 1999).
The main element of this approach is to quantify the occurrences of a
particular event or comment statistically to establish its frequency of use.
For example, suppose the question were raised, “Do women appear in historical
documents in Western Europe between 400 BC and AD 800?”
Reading every piece of literature from this period and quantifying every
occurrence is not feasible.
The methodology of a content analysis allows a review of a representative
percentage of the literature to find the frequency of female references, which
is then assumed to be representative of the whole.
Ideally, a content analysis would include ALL documents, but this is not
always possible.
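The sampling idea above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the corpus, the function name `estimate_reference_rate`, and the keyword test `mentions_woman` are hypothetical and are not taken from the study.

```python
import random


def estimate_reference_rate(documents, is_reference, sample_fraction, seed=0):
    """Estimate the share of documents containing a reference of interest
    by reviewing only a random sample of the corpus."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample_size = max(1, round(len(documents) * sample_fraction))
    sample = rng.sample(documents, sample_size)
    hits = sum(1 for doc in sample if is_reference(doc))
    return hits / sample_size


# Hypothetical corpus: 25 of 100 documents mention a woman.
corpus = ["the queen granted land"] * 25 + ["the king rode north"] * 75


def mentions_woman(doc):
    # Hypothetical, deliberately crude test for a female reference.
    return "queen" in doc.split()


# Reviewing every document gives the true rate; a 20% sample gives an estimate.
full_rate = estimate_reference_rate(corpus, mentions_woman, 1.0)
sample_rate = estimate_reference_rate(corpus, mentions_woman, 0.2)
print(full_rate, sample_rate)
```

The estimate from the sample is only as good as the sample is representative, which is why fuller coverage is preferred when it is feasible.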
The benefit of this technique is that it attempts to remove the
researcher from the research and gives a relatively unbiased evaluation of the
topic under study.
This is especially important when the question is an abstract concept
such as a “violent act” or an “insulting comment” or, in Agricultural
Records, “cold” or “pleasant” (Reinard, 1998).
The way the researcher is “removed” is through a series of steps that
are necessary to accomplish a proper content analysis.
First, the area of communication is to be defined, such as the Agricultural
Records of England.
Second, the semantics must be categorized in a logical manner to allow
for a coding system of the word usage to be developed.
The coding system must be selected so that all possible units are covered,
with a minimum of units falling into an “other” or “miscellaneous” category.
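As a minimal sketch of such a coding system, the Python below assigns each unit of text to a category and reports how many units fall into “other”.
The keyword lists, the sample records, and the names `CODING_SCHEME`, `code_unit`, and `code_corpus` are assumptions made for illustration; they are not the coding system used in this study.

```python
from collections import Counter

# Hypothetical coding scheme: each category maps to the words that signal it.
CODING_SCHEME = {
    "cold": {"frost", "snow", "ice", "freezing"},
    "pleasant": {"mild", "fair", "warm"},
}


def code_unit(text, scheme=CODING_SCHEME):
    """Return the first category whose keywords appear in the text, else 'other'."""
    words = set(text.lower().split())
    for category, keywords in scheme.items():
        if words & keywords:
            return category
    return "other"


def code_corpus(units, scheme=CODING_SCHEME):
    """Count units per category and report the share that falls into 'other'."""
    counts = Counter(code_unit(unit, scheme) for unit in units)
    other_share = counts["other"] / len(units) if units else 0.0
    return counts, other_share


# Hypothetical records, invented for this example.
records = [
    "severe frost destroyed the crop",
    "a mild and fair spring",
    "snow lay until april",
    "the harvest was gathered late",
]
counts, other_share = code_corpus(records)
print(dict(counts), other_share)
```

A large share of units in “other” suggests that the categories need to be revised before coding proceeds.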
Next, the sample size must be defined and be large enough to cover the
entire topic.
The definition can be arbitrary, but it should attempt to cover the
majority of the documents.
Logically, the more complete the coverage, the more accurate the content
analysis.
The bulk of the analysis falls in the next step: coding the
message content.
This must be reproducible by an independent researcher with a margin of
error under 5% using only the coding system.
The intercodal dependency for the Agricultural Records was under a 2% margin
of error, meaning that an independent researcher who performed the same analysis
on the 1500-year period, using the same text and coding system, obtained results
that differed from those of this study by less than 2%.
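One simple way to express this check is as the share of units that two coders assigned to different categories.
The sketch below assumes that reading of “margin of error”; the function name `disagreement_rate` and the two code lists are hypothetical and do not reproduce the study's data.

```python
def disagreement_rate(codes_a, codes_b):
    """Share of units that two coders assigned to different categories."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same units")
    if not codes_a:
        raise ValueError("at least one coded unit is required")
    mismatches = sum(1 for a, b in zip(codes_a, codes_b) if a != b)
    return mismatches / len(codes_a)


# Hypothetical codes for 100 units; the independent coder differs on one unit.
original = ["cold"] * 60 + ["pleasant"] * 40
independent = list(original)
independent[0] = "pleasant"

rate = disagreement_rate(original, independent)
meets_threshold = rate < 0.05  # the 5% limit stated in the text
print(rate, meets_threshold)
```

A simple percentage like this does not correct for agreement that would occur by chance, so it should be read as a rough check only.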
Because the intercodal dependency was below the required 5%, the next
step is to analyze the data either numerically or graphically.
The final step is to interpret the results, with the stern warning that a
content analysis cannot give cause-effect conclusions (George, 1954).