GIS and Statistical Inference in Arizona: Monte Carlo
Significance Tests
Department of Anthropology &
Center for Advanced Spatial Technologies
University of Arkansas
Fayetteville, AR 72701 USA
Analyzing prehistoric locational behavior
Archaeologists frequently wish to make statements about possible
relationships between archaeological distributions and features
of a region's environment. The features of interest may reflect
the social environment, such as viewsheds, cognitive landscapes, or
perhaps distances or cost-surfaces to ceremonial or economic
centers. More commonly, archaeologists have analyzed
characteristics of the physical environment, including
topographic, landform, soils, hydrographic, and geological
features for statistical associations with archaeological
distributions.
Monte Carlo significance tests

In grid-based or raster GIS contexts we have the advantage that
an entire spatial population (of rows x columns) can be encoded
in digital form. This circumstance allows an alternative to
traditional statistical testing through use of Monte Carlo
significance tests. Our interest typically lies in a sample of
archaeological locations, that is, a subset (S') composed of
n cases from the population. An additional k samples
of size n may then be selected at random from the same
population (S_1, S_2, S_3, ..., S_k). For
each sample, including the sample of interest, a summary
statistic, t, is computed, yielding k+1
values (t', t_1, t_2, t_3, ..., t_k).
Assuming that each sample is a realizable and equally probable
outcome from the population, the statistical significance of
the difference between the sample of interest and the population
may be estimated by ranking the k+1 values of t from smallest to
largest and computing p = R(t')/(k+1), where R(t') is the rank
of t'. For example, if k=999 and the rank of t' for the sample
of interest is R(t') <= 50 or R(t') >= 950, then p is either
less than or equal to .05 or greater than or equal to .95,
respectively, placing t' toward the lower or upper extreme of
the ranked values.
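
The ranking procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the function name, the flat list used to hold the raster cells, and the handling of ties (the observed value is ranked just above any strictly smaller random values) are all assumptions made for the example.

```python
import random
from statistics import fmean

def monte_carlo_rank_test(population, sample_indices, statistic=fmean,
                          k=999, seed=0):
    """Rank the observed statistic t' among k random samples of equal size.

    population     -- flat list of cell values (rows x columns, flattened)
    sample_indices -- positions of the n archaeological cells in that list
    Returns (rank, p) with rank 1 = smallest and p = rank / (k + 1).
    """
    rng = random.Random(seed)
    n = len(sample_indices)
    t_obs = statistic([population[i] for i in sample_indices])
    # k random samples of n cells, each drawn without replacement.
    t_rand = [statistic(rng.sample(population, n)) for _ in range(k)]
    # Assumption: ties are not counted as smaller than the observed value.
    rank = 1 + sum(t < t_obs for t in t_rand)
    return rank, rank / (k + 1)
```

A rank near 1 marks an unusually small value of the statistic and a rank near k+1 an unusually large one.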

Example application

The region of interest is an area of east-central Arizona,
measuring 9 x 8 km (72 sq. km), encoded within a raster composed
of 100 x 100 m grid cells (for a finite population of N=7,200
locations). Within this region are n=30 multi-room
villages (pueblos) dating primarily from the 13th to 14th centuries.
Two variables with decidedly non-normal distributions are
examined: slope and distance to nearest water. Focusing on
the slope data and the sample mean, m, as the statistic of
interest, the Monte Carlo test (with k=999) yields the
third smallest mean (R[m]=3), giving p = .003.
For the distance to water data, the Monte Carlo test
shows the archaeological sample mean to be the most extreme
(R[m]=1), yielding p = .001. These findings
provide convincing evidence that the archaeological sample is
unusually located with respect to these variables. The
archaeological interpretation is that the prehistoric inhabitants
selected locations with level ground and close to water for their
settlements.
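
The figures quoted in this example follow from simple arithmetic, which can be checked as below. The helper names are hypothetical, and treating the 9 km side as the raster's width (columns) is an assumption; the text does not say which side is which.

```python
def grid_population(width_km, height_km, cell_m):
    """Return (rows, cols, N) for a region tiled by square cells."""
    cols = (width_km * 1000) // cell_m
    rows = (height_km * 1000) // cell_m
    return rows, cols, rows * cols

def rank_to_p(rank, k):
    """Monte Carlo significance: p = R / (k + 1)."""
    return rank / (k + 1)

rows, cols, N = grid_population(9, 8, 100)
print(rows, cols, N)          # 80 90 7200
print(rank_to_p(3, 999))      # slope: 0.003
print(rank_to_p(1, 999))      # distance to water: 0.001
```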

Advantages of Monte Carlo significance tests
Use of a Monte Carlo test for statistical inference can be
advantageous in certain contexts, and may even provide a greater
degree of freedom and flexibility when compared with limitations
imposed by conventional statistical tests.
- In the simplest case, a randomization test might be employed
when we have a random sample of archaeological sites from a
region, but cannot meet the assumptions of a traditional
statistical test (e.g., normality).
- Alternatively, we might be dealing with a complex variable
such as viewshed. Here, the population would consist of all
possible (rows x columns) viewsheds, making the determination of
population parameters computationally difficult even for
moderately sized regions. A randomization approach requiring
computation of only k+1 viewsheds presents a more
tractable problem.
- In many contexts we are faced with the common problem of not
having a random sample of archaeological sites. In this case,
randomization methods allow comparison of the unusualness of the
realized sample (in terms of t) against k other
samples drawn from the same region.
- Finally, randomization techniques potentially allow the
examination of other sampling models and provide some freedom in
the face of autocorrelation problems. For example, we may not be
able to assume that archaeological sites are independently
placed; there may instead be some spatial dependency between them.
In other words, it might be more appropriate to assume that the
placement of a site is partially dependent on the locations of
preexisting sites. If the dependency rules can be specified, then
it may be possible to compare a realized sample against k
others obtained under the same sampling criteria.
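
As one illustration of the last point, the sketch below draws a random sample under an explicit dependency rule. The rule used here, a minimum spacing between sites measured in grid cells, is purely hypothetical and is not taken from the study; the function name and parameters are likewise assumptions. Any rule that can be written as an accept/reject check on a candidate cell could be substituted.

```python
import random

def sample_with_min_spacing(rows, cols, n, min_dist, rng, max_tries=100_000):
    """Sequentially place n cells so that no two lie closer than min_dist.

    Each candidate cell is accepted only if it respects the spacing rule
    with respect to every previously placed cell, so later placements
    depend on earlier ones. Distances are Euclidean, in cell units.
    """
    chosen = []
    tries = 0
    while len(chosen) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place n cells under the spacing rule")
        r, c = rng.randrange(rows), rng.randrange(cols)
        if all((r - r0) ** 2 + (c - c0) ** 2 >= min_dist ** 2
               for r0, c0 in chosen):
            chosen.append((r, c))
    return chosen

# Example: 30 cells on an 80 x 90 grid, at least 5 cells (500 m) apart.
cells = sample_with_min_spacing(80, 90, 30, 5, random.Random(1))
```

Repeating the draw k times and computing t for each sample gives a reference set generated under the same dependency rule as the one assumed for the realized sample.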
Comparison with conventional statistical tests
A one-sample parametric test for means takes the difference
between the archaeological sample mean, m, computed from the
n sites, and the population mean, divides it by the standard
error, and refers the resulting z statistic to the standard
normal distribution. A benefit of GIS is
that the entire raster of N = r x c (rows x columns) defines a
finite population from which we can easily compute the population
parameters. Although this test assumes a normally distributed
population, because the statistic m is based on a sum, the
forces behind the Central Limit Theorem ensure that, regardless of
the population's distributional form, the sampling distribution
of m will approximate normality if n is large.
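
A minimal sketch of this test, with the population parameters computed directly from the raster cells, might look as follows. The function name is an assumption, and no finite population correction is applied, since the text does not mention one.

```python
import math
from statistics import NormalDist, fmean, pstdev

def one_sample_z(sample, population):
    """z = (m - mu) / (sigma / sqrt(n)), referred to the standard normal.

    mu and sigma are computed from the whole raster (the finite
    population). Assumption: no finite population correction.
    Returns (z, lower-tail probability).
    """
    mu = fmean(population)
    sigma = pstdev(population, mu)          # population SD (divisor N)
    se = sigma / math.sqrt(len(sample))     # standard error of the mean
    z = (fmean(sample) - mu) / se
    return z, NormalDist().cdf(z)
```

The lower-tail probability is returned because the hypothesis of interest in the Arizona example is an unusually small mean; an upper-tail or two-sided value could be derived from z in the same way.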
By way of contrast we might also consider a parametric test of
variance which depends heavily on a normality assumption. It
makes great sense to perform a variance test in regional
archaeological location studies. If we assume that past peoples
were selecting for particular contexts at which to place their
activities or settlements -- places that were advantageous in
terms of view quality, shelter, soil quality, or access to water
or other resources, for example -- then such contexts would
represent a small subset (or niche) compared with the entire
range possible in the environment. Consequently, archaeological
samples should yield relatively small variance statistics. In
this case we may compute (n-1) times the ratio of the
sample to population variance, which is distributed as chi-square
with n-1 degrees of freedom, but only when the population
is normally distributed.
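
The variance statistic just described can be computed as in the sketch below. The function name is an assumption. Only the statistic and its degrees of freedom are returned; referring the value to a chi-square distribution requires tables or a statistics library and, as noted above, is valid only for a normal population.

```python
from statistics import pvariance, variance

def variance_chi_square(sample, population):
    """Return ((n - 1) * s^2 / sigma^2, n - 1).

    s^2 is the sample variance (divisor n - 1); sigma^2 is the variance
    of the whole raster, treated as the population (divisor N).
    """
    n = len(sample)
    s2 = variance(sample)
    sigma2 = pvariance(population)
    return (n - 1) * s2 / sigma2, n - 1
```

Small values of the statistic correspond to the case of interest here, a sample whose variance is small relative to the population.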
The Arizona data exhibit strongly non-normal populations.
Although the parametric means test is robust against departures
from normality, comparison of the empirical sampling
distribution of z generated by the 999 Monte Carlo runs
with the theoretical distribution reveals that problems exist,
particularly in the tail areas, where the data reveal
z-scores as great as +5, an unlikely circumstance in truly
normal populations.
The theoretical and empirical chi-square distributions reveal
great divergence, clearly showing the weakness of the
conventional test when applied to non-normal data. For example,
according to theory about 10% of a chi-square distribution with
29 df should fall below a computed value of 19.8; the empirical
results reveal that about 27% of the samples fall below this
value. We can therefore infer that in some contexts it is quite
likely that strongly different conclusions could arise between
conventional and Monte Carlo tests, and that the former could be
in error owing to a failure to meet required assumptions.
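
The kind of comparison reported above can be sketched by counting how many random samples give a variance statistic below a chosen critical value. The example uses a synthetic, exponentially distributed population as a stand-in for a skewed raster; it is not the Arizona data, and the function and variable names are assumptions. No particular output is claimed for it.

```python
import random
from statistics import pvariance, variance

def empirical_lower_tail(population, n, critical, k=999, seed=0):
    """Fraction of k random samples with (n-1)*s^2/sigma^2 below critical."""
    rng = random.Random(seed)
    sigma2 = pvariance(population)
    below = 0
    for _ in range(k):
        s = rng.sample(population, n)       # n cells, without replacement
        if (n - 1) * variance(s) / sigma2 < critical:
            below += 1
    return below / k

# Synthetic skewed "raster" of 7,200 cells (assumption, not real data).
gen = random.Random(42)
skewed = [gen.expovariate(1.0) for _ in range(7200)]

# Theory puts about 10% of a chi-square(29) distribution below 19.8;
# the empirical fraction for a skewed population can be compared with that.
frac = empirical_lower_tail(skewed, n=30, critical=19.8)
print(frac)
```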

References
- Randomization Methods for Statistical Inference in Raster GIS
Contexts. In The Colloquia of the XIII International Congress
of Prehistoric and Protohistoric Sciences, Vol. 1: Theoretical
and Methodological Problems. A. Bietti, A. Cazzella, I.
Johnson, and A. Voorrips, eds., pp. 107-114, ABACO, Forli, Italy
(1996).
(last updated: 2/99)