GIS and Statistical Inference in Arizona: Monte Carlo Significance Tests

by Kenneth L. Kvamme

Department of Anthropology &
Center for Advanced Spatial Technologies
University of Arkansas
Fayetteville, AR 72701 USA


Analyzing prehistoric locational behavior

Archaeologists frequently wish to make statements about possible relationships between archaeological distributions and features of a region's environment. The features of interest may reflect the social environment, in the form of viewsheds, cognitive landscapes, or distance or cost surfaces to ceremonial or economic centers. More commonly, archaeologists have analyzed characteristics of the physical environment, including topographic, landform, soil, hydrographic, and geological features, for statistical associations with archaeological distributions.


Monte Carlo significance tests

In grid-based or raster GIS contexts we have the advantage that an entire spatial population (all rows x columns grid cells) can be encoded in digital form. This circumstance allows an alternative to traditional statistical testing through the use of Monte Carlo significance tests. Our interest typically lies in a sample of archaeological locations, a subset (S') composed of n cases from the population. An additional k samples of size n may be selected at random from the same population (S1, S2, S3,..., Sk). For each sample, including the sample of interest, a summary statistic, t, is computed, yielding k+1 values (t', t1, t2, t3,..., tk). Assuming that each sample is a realizable and equally probable outcome from the population, the statistical significance of the difference between the sample of interest and the population may be estimated by ranking the k+1 values of t and computing p=R(t')/(k+1), where R(t') is the rank of t'. For example, if k=999 and the rank of t' is R(t') <=50 or R(t') >=950, then p is either less than or equal to .05 or greater than or equal to .95, respectively.
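The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, not the original implementation: the function name, the use of sampling without replacement, the fixed random seed, and the handling of ties (the rank counts only strictly smaller values) are all assumptions made for the example.

```python
import random
from statistics import mean

def monte_carlo_rank_test(population, sample_of_interest, k=999,
                          statistic=mean, seed=0):
    """Return (rank, p) for the sample of interest among k random samples.

    Hypothetical helper: `population` is a flat list of raster cell values,
    `sample_of_interest` holds the values at the n archaeological locations.
    """
    rng = random.Random(seed)          # fixed seed only for reproducibility
    n = len(sample_of_interest)
    t_obs = statistic(sample_of_interest)          # t' in the text

    # Draw k random samples of size n (without replacement) and compute t for each.
    t_random = [statistic(rng.sample(population, n)) for _ in range(k)]

    # Rank of t' among the k+1 values, counted from the smallest (1 = smallest).
    # Ties are resolved in favour of t' (an assumption of this sketch).
    rank = 1 + sum(t < t_obs for t in t_random)
    return rank, rank / (k + 1)
```

With k=999 the denominator is 1,000, so a rank of 3 corresponds to p=.003. Any statistic that accepts a list of values (a median or a variance, for instance) could be passed in place of the mean.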


Example application

The region of interest is an area of east-central Arizona, measuring 9 x 8 km (72 sq. km), encoded within a raster composed of 100 x 100 m grid cells (for a finite population of N=7,200 locations). Within this region are n=30 multi-room villages (pueblos) dating primarily from the 13th-14th centuries. Two variables with decidedly non-normal distributions are examined: slope and distance to nearest water. Focusing on the slope data and the sample mean, m, as the statistic of interest, the Monte Carlo test (with k=999) ranks the archaeological sample mean third smallest of the 1,000 values (R[m]=3), giving p =.003. For the distance to water data, the Monte Carlo test shows the archaeological sample mean to be the most extreme (R[m]=1), yielding p =.001. These findings provide convincing evidence that the archaeological samples are unusually located with respect to these variables. The archaeological interpretation is that the prehistoric inhabitants selected settlement locations on level ground and close to water.
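For readers who want to check the figures, the arithmetic behind the population size and the two p-values can be written out as follows. The subscripted labels are introduced here only for clarity and do not appear in the original.

```latex
N = \frac{9000\ \text{m}}{100\ \text{m}} \times \frac{8000\ \text{m}}{100\ \text{m}} = 90 \times 80 = 7200
\qquad
p_{\text{slope}} = \frac{R(m)}{k+1} = \frac{3}{1000} = .003
\qquad
p_{\text{water}} = \frac{R(m)}{k+1} = \frac{1}{1000} = .001
```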


Advantages of Monte Carlo significance tests

Use of a Monte Carlo test for statistical inference can be advantageous in certain contexts, and may offer greater freedom and flexibility than the limitations of conventional statistical tests allow.

  1. In the simplest case, a randomization test might be employed when we have a random sample of archaeological sites from a region, but cannot meet the assumptions of a traditional statistical test (e.g., normality).

  2. Alternatively, we might be dealing with a complex variable such as viewshed. Here, the population would consist of all possible (rows x columns) viewsheds, making the determination of population parameters computationally difficult even for moderately sized regions. A randomization approach requiring computation of only k+1 viewsheds presents a more tractable problem.

  3. In many contexts we are faced with the common problem of not having a random sample of archaeological sites. In this case, randomization methods allow comparison of the unusualness of the realized sample (in terms of t) against k other samples drawn from the same region.

  4. Finally, randomization techniques potentially allow the examination of other sampling models and provide some freedom in the face of autocorrelation problems. For example, we may be unable to assume that archaeological sites were placed independently; there may instead be some spatial dependency among them. In other words, it might be more appropriate to assume that the placement of a site is partially dependent on the locations of preexisting sites. If the dependency rules can be specified, then it may be possible to compare a realized sample against k others obtained under the same sampling criteria.
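As an illustration of the fourth point, the sketch below draws one random sample under a simple, purely hypothetical dependency rule: sites are placed one at a time, and each new site must lie at least a minimum distance (in grid cells) from every site already placed. The rule, the function name, and the parameter values are assumptions chosen for the example; the text does not specify any particular dependency model.

```python
import random

def sample_with_min_spacing(rows, cols, n, min_dist, rng, max_tries=100_000):
    """Place n grid cells sequentially under a hypothetical spacing rule.

    Each new cell must be at least `min_dist` cells (Euclidean distance)
    from all previously placed cells. Returns a list of (row, col) tuples.
    """
    placed = []
    tries = 0
    while len(placed) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all sites under the spacing rule")
        r, c = rng.randrange(rows), rng.randrange(cols)
        # Accept the candidate only if it respects the spacing to earlier sites.
        if all((r - pr) ** 2 + (c - pc) ** 2 >= min_dist ** 2 for pr, pc in placed):
            placed.append((r, c))
    return placed

# Example: one constrained sample of 30 cells on an 80 x 90 raster
# (the row/column orientation is assumed, not taken from the original).
sites = sample_with_min_spacing(80, 90, 30, min_dist=5, rng=random.Random(1))
```

Repeating this draw k times and computing t for each constrained sample would give a reference distribution generated under the same rule as the one assumed for the realized sample.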


Comparison with conventional statistical tests

A one-sample parametric test for means takes the difference between the archaeological sample mean, m, computed from the n sites, and the population mean, divides it by the standard error, and refers the resulting z statistic to the standard normal distribution. A benefit of GIS is that the entire raster of N = r x c (rows x columns) cells defines a finite population from which we can easily compute the population parameters. Although this test assumes a normally distributed population, because the statistic m is based on a sum, the Central Limit Theorem ensures that, regardless of the population's distributional form, the sampling distribution of m will approximate normality if n is large.
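A minimal sketch of this z statistic, computed directly from the raster population, might look like the following. The function name is hypothetical, and the standard error is taken as the population standard deviation divided by the square root of n; no finite-population correction is applied, which is an assumption of this sketch rather than something stated in the text.

```python
from math import sqrt
from statistics import mean, pstdev

def one_sample_z(population, sample):
    """z = (m - mu) / (sigma / sqrt(n)), with mu and sigma from the full raster.

    Assumption: no finite-population correction is applied.
    """
    mu = mean(population)        # population mean from all N cells
    sigma = pstdev(population)   # population standard deviation (divisor N)
    n = len(sample)
    standard_error = sigma / sqrt(n)
    return (mean(sample) - mu) / standard_error
```

Applying the same function to each of the k random samples gives an empirical sampling distribution of z that can be set beside the theoretical standard normal curve.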

By way of contrast we might also consider a parametric test of variance which depends heavily on a normality assumption. It makes great sense to perform a variance test in regional archaeological location studies. If we assume that past peoples were selecting for particular contexts at which to place their activities or settlements -- places that were advantageous in terms of view quality, shelter, soil quality, or access to water or other resources, for example -- then such contexts would represent a small subset (or niche) compared with the entire range possible in the environment. Consequently, archaeological samples should yield relatively small variance statistics. In this case we may compute (n-1) times the ratio of the sample to population variance, which is distributed as chi-square with n-1 degrees of freedom, but only when the population is normally distributed.
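In symbols, the variance statistic just described can be written as below. The notation is introduced here for illustration: s^2 is assumed to be the sample variance computed with divisor n-1, and \sigma^2 the population variance obtained from the full raster.

```latex
X^2 = \frac{(n-1)\,s^2}{\sigma^2},
\qquad
X^2 \sim \chi^2_{\,n-1} \quad \text{(only if the population is normally distributed)}
```

A small value of X^2 relative to the chi-square distribution with n-1 degrees of freedom would point to a sample variance that is unusually small compared with the population variance.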

The Arizona data exhibit strongly non-normal populations. Although the parametric means test is robust against departures from normality, comparison of the empirical sampling distribution of z generated by the 999 Monte Carlo runs with the theoretical distribution reveals that problems exist, particularly in the tail areas, where the data reveal z-scores as great as +5, an unlikely circumstance in truly normal populations.

The theoretical and empirical chi-square distributions reveal great divergence, clearly showing the weakness of the conventional test when applied to non-normal data. For example, according to theory about 10% of a chi-square distribution with 29 df should fall below a computed value of 19.8; the empirical results reveal that about 27% of the samples fall below this value. We can therefore infer that in some contexts it is quite likely that strongly different conclusions could arise between conventional and Monte Carlo tests, and that the former could be in error owing to a failure to meet required assumptions.
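The kind of comparison described here can be imitated with a small simulation. The sketch below does not use the Arizona data: it builds a synthetic, strongly right-skewed population (exponential values, an assumption made purely for illustration), draws k=999 samples of n=30, and reports the fraction of variance statistics falling below 19.8. The function name and seeds are likewise hypothetical.

```python
import random
from statistics import pvariance, variance

def empirical_lower_tail(population, n, critical, k=999, seed=0):
    """Fraction of k random samples whose (n-1)*s^2/sigma^2 falls below `critical`."""
    rng = random.Random(seed)
    sigma2 = pvariance(population)             # population variance (divisor N)
    below = 0
    for _ in range(k):
        s2 = variance(rng.sample(population, n))   # sample variance (divisor n-1)
        if (n - 1) * s2 / sigma2 < critical:
            below += 1
    return below / k

# Synthetic stand-in population of 7,200 skewed values (NOT the Arizona data).
gen = random.Random(42)
skewed_population = [gen.expovariate(1.0) for _ in range(7200)]

fraction = empirical_lower_tail(skewed_population, n=30, critical=19.8)
print(f"fraction below 19.8: {fraction:.3f} (chi-square theory, 29 df: about 0.10)")
```

For a heavily skewed population of this kind the simulated fraction tends to lie well above the nominal 10%, which is the same sort of divergence the text reports for the Arizona variables.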


(last updated: 2/99)