Kvamme, Kenneth L. (1993). A Simple Pseudo-Random Normal Deviate Generator. Archaeological Computing Newsletter 37:8-10.

(Copyright is retained by the author; download PDF version, 138 KB)

More about Archaeological Computing Newsletter (ACN)

A SIMPLE PSEUDO-RANDOM
NORMAL DEVIATE GENERATOR

Pseudo-random number generators (hereafter random number generators) that are features of most programming languages, or even data base programs, produce rectangular distributions. That is, a floating-point or real number is produced in the 0-1 range such that each value has an equal probability of occurrence. A graph of this distribution (Figure 1a) looks like a rectangle, hence the above name, but it is more commonly referred to as the uniform distribution or U{0,1} in short-hand notation where the zero and one represent the lower and upper bounds, respectively, of its range.

Random number generators form the basis of most computer simulations (e.g., see Hodder 1978). In the simulation of human or other biological systems the uniform distribution is not ideal because frequently these phenomena follow a normal distribution. Likewise, in statistical simulations randomly generated normal deviates are frequently required because many statistical tests are based on the assumption that the data follow normal distributions. Sadly, archaeological simulations too often have incorrectly utilized the uniform distribution when, in fact, the normal was called for. It therefore is not only useful, but necessary, to possess an easy means to generate normally distributed random deviates.


THE NORMAL DISTRIBUTION

The normal distribution follows a bell-shaped curve with observations most probable about a characteristic central value, the arithmetic mean, and probability of occurrence decreasing as one moves in either direction away from the mean to the tail areas (Figures 1b,c). The normal is a two-parameter distribution where M is the mean, about which the distribution is symmetric, and V is the variance which determines the width of the distribution (compare Figures 1b,c). In functional form the normal distribution is specified by:

f(x) = exp[-(x-M)**2/(2.0*V)]/sqrt(2.0*3.14159*V)

(where exp is the exponential function, sqrt is the square root function, * indicates multiply, **2 indicates square, and 3.14159 represents the constant pi). The square root of the variance, V, is known as the standard deviation, S=sqrt(V), which has special significance when working with the normal. It is well-known, and illustrated in tables in any statistics book, that approximately two-thirds of normally distributed data lie within (+/-)1S of the mean; about 95% lie within 2S; and roughly 99% fall within 3S. Finally, we should make note of a special normal distribution, referred to as the "standard normal," where M=0 and V=1 or, in short-hand notation, the N{0,1}. If we obtain data following a standard normal, written as x ~ N{0,1} (where x is the random variable and "~" means "is distributed as"), then it is possible to transform the data into any other normal distribution. For example, if we desire y ~ N{10,4} (i.e., a normal with M=10, V=4), we simply multiply x by the standard deviation and add the mean: y=x*sqrt(V)+M = x*2+10. The latter distribution is, of course, much more variable than the former and is centered about the value of 10 rather than zero (compare Figures 1b,c). This principle will be used in the program below.


GENERATING RANDOM NORMAL DEVIATES

It is, in fact, rather easy to program a random normal generator that is capable of outputting observations from any hypothetical normal population (with any specified mean and variance). While there are more efficient procedures for generating normally distributed deviates (e.g., see Ross 1989:486) the approach used here has the advantage of illustrating very well the workings of statistical theory by making use of a fundamental statistical principle known as the Central Limit Theorem (CLT). The CLT basically states that the sum of n identically distributed random variables tends toward normality as n grows large (e.g., see Hays 1988:232). So if we take n U{0,1} random variables (each having an identical distribution), and add these up, the resultant sum should be normally distributed if n is sufficiently large. With many distributions, including the uniform, n does not have to be great owing to the power of the CLT. In fact, one can readily show that with n as small as three, the resultant sum of three uniform variables portrays a distribution that is bell-like in form, if not actually normally distributed. (In other words, sum three U{0,1} variables; do this 1000 times and make a histogram: its form should be roughly bell-shaped.)

Statisticians would typically employ an n that is a good deal larger; n=12 is not only sufficiently large for most work, it is convenient owing to a property of the uniform distribution. The variance of a uniform random variable is given by: V=(b-a)**2/12, where a is the lower and b is the upper bound of the distribution's range (Olkin 1980:285). For a single U{0,1}, then, the variance is V=(1-0)**2/12=1/12. Another well-known theorem in statistics is: "the variance of a sum is equal to the sum of the variances" (see Hays 1988:870). Consequently, if we sum twelve U{0,1} random variables the resultant variance must equal unity. The CLT ensures that this sum is normally distributed, but note that the range of the result must be between 0 and 12 (zero occurs when all twelve U{0,1}=0 and "12" occurs when all the U{0,1}=1). Simply by subtracting six from the sum we can center the distribution on zero, produce a data range between -6 and +6, and achieve a standard normal deviate. Stated differently,

[U{0,1}+U{0,1}+U{0,1}+U{0,1}+U{0,1}+U{0,1}+U{0,1}+U{0,1}+U{0,1} +U{0,1}+U{0,1}+U{0,1}]-6 ~ N{0,1}

Since in a true normal population less than .0000001 lies beyond six standard deviations of the mean, the truncated range should not pose significant problems in most contexts. The aforementioned transformation y=x*sqrt(V)+M can then be employed to produce any normal distribution with specified mean and variance (N{M,V}).

EXAMPLE CODE (FORTRAN 77)

SUBROUTINE rannor (x,m,s)

Generates a pseudo-random normal deviate (x) with mean (m) and standard deviation (s) using a sum of uniform deviates tactic. Calls system subroutine RANDOM, a psuedo-random uniform U{0,1} generator with argument (uniform) that contains the uniform variate. RANDOM must be initialized with a proper seed prior to the first call of rannor.

input: m,s - mean and standard deviation of normal
output: x - pseudo random normal deviate

REAL x,m,s,uniform
INTEGER i
x = 0.0
DO i = 1,12

  CALL RANDOM ( uniform )
  x = uniform + x

ENDDO
x = s * ( x - 6.0 ) + m
END

 

REFERENCES

Hays, W.L.
1988 Statistics (4th ed.). Holt, Rinehart and Winston, New York.

Hodder, Ian (ed.)
1978 Simulation Studies in Archaeology. Cambridge University Press, Cambridge.

Olkin, I., L.J. Gleser, and C. Derman
1980 Probability Models and Applications. Macmillan Publishing, New York.

Ross, Sheldon M.
1989 Introduction to Probability Models (4th ed.). Academic Press, Boston.


Kenneth L. Kvamme