|
Kvamme, Kenneth L. (1993). A Simple Pseudo-Random Normal Deviate Generator.
Archaeological Computing Newsletter 37:8-10.
(Copyright is retained by the author; download
PDF version, 138 KB)
More
about Archaeological Computing Newsletter (ACN)

 |
A SIMPLE PSEUDO-RANDOM
NORMAL DEVIATE GENERATOR |
Pseudo-random number generators (hereafter random number generators)
that are features of most programming languages, or even data base programs,
produce rectangular distributions. That is, a floating-point or real number
is produced in the 0-1 range such that each value has an equal probability
of occurrence. A graph of this distribution (Figure 1a) looks like a rectangle,
hence the above name, but it is more commonly referred to as the uniform
distribution or U{0,1} in short-hand notation where the zero and one represent
the lower and upper bounds, respectively, of its range.
Random number generators form the basis of most computer simulations
(e.g., see Hodder 1978). In the simulation of human or other biological
systems the uniform distribution is not ideal because frequently these
phenomena follow a normal distribution. Likewise, in statistical simulations
randomly generated normal deviates are frequently required because many
statistical tests are based on the assumption that the data follow normal
distributions. Sadly, archaeological simulations too often have incorrectly
utilized the uniform distribution when, in fact, the normal was called
for. It therefore is not only useful, but necessary, to possess an easy
means to generate normally distributed random deviates.

THE NORMAL DISTRIBUTION
The normal distribution follows a bell-shaped curve with observations
most probable about a characteristic central value, the arithmetic mean,
and probability of occurrence decreasing as one moves in either direction
away from the mean to the tail areas (Figures 1b,c). The normal is a two-parameter
distribution where M is the mean, about which the distribution is symmetric,
and V is the variance which determines the width of the distribution (compare
Figures 1b,c). In functional form the normal distribution is specified
by:
f(x) = exp[-(x-M)**2/(2.0*V)]/sqrt(2.0*3.14159*V)
(where exp is the exponential function, sqrt is the square root function,
* indicates multiply, **2 indicates square, and 3.14159 represents the
constant pi). The square root of the variance, V, is known as the standard
deviation, S=sqrt(V), which has special significance when working with
the normal. It is well-known, and illustrated in tables in any statistics
book, that approximately two-thirds of normally distributed data lie within
(+/-)1S of the mean; about 95% lie within 2S; and roughly 99% fall within
3S. Finally, we should make note of a special normal distribution, referred
to as the "standard normal," where M=0 and V=1 or, in short-hand
notation, the N{0,1}. If we obtain data following a standard normal, written
as x ~ N{0,1} (where x is the random variable and "~" means
"is distributed as"), then it is possible to transform the data
into any other normal distribution. For example, if we desire y ~ N{10,4}
(i.e., a normal with M=10, V=4), we simply multiply x by the standard
deviation and add the mean: y=x*sqrt(V)+M = x*2+10. The latter distribution
is, of course, much more variable than the former and is centered about
the value of 10 rather than zero (compare Figures 1b,c). This principle
will be used in the program below.
GENERATING RANDOM NORMAL DEVIATES
It is, in fact, rather easy to program a random normal generator that
is capable of outputting observations from any hypothetical normal population
(with any specified mean and variance). While there are more efficient
procedures for generating normally distributed deviates (e.g., see Ross
1989:486) the approach used here has the advantage of illustrating very
well the workings of statistical theory by making use of a fundamental
statistical principle known as the Central Limit Theorem (CLT). The CLT
basically states that the sum of n identically distributed random variables
tends toward normality as n grows large (e.g., see Hays 1988:232). So
if we take n U{0,1} random variables (each having an identical distribution),
and add these up, the resultant sum should be normally distributed if
n is sufficiently large. With many distributions, including the uniform,
n does not have to be great owing to the power of the CLT. In fact, one
can readily show that with n as small as three, the resultant sum of three
uniform variables portrays a distribution that is bell-like in form, if
not actually normally distributed. (In other words, sum three U{0,1} variables;
do this 1000 times and make a histogram: its form should be roughly bell-shaped.)
Statisticians would typically employ an n that is a good deal larger;
n=12 is not only sufficiently large for most work, it is convenient owing
to a property of the uniform distribution. The variance of a uniform random
variable is given by: V=(b-a)**2/12, where a is the lower and b is the
upper bound of the distribution's range (Olkin 1980:285). For a single
U{0,1}, then, the variance is V=(1-0)**2/12=1/12. Another well-known theorem
in statistics is: "the variance of a sum is equal to the sum of the
variances" (see Hays 1988:870). Consequently, if we sum twelve U{0,1}
random variables the resultant variance must equal unity. The CLT ensures
that this sum is normally distributed, but note that the range of the
result must be between 0 and 12 (zero occurs when all twelve U{0,1}=0
and "12" occurs when all the U{0,1}=1). Simply by subtracting
six from the sum we can center the distribution on zero, produce a data
range between -6 and +6, and achieve a standard normal deviate. Stated
differently,
[U{0,1}+U{0,1}+U{0,1}+U{0,1}+U{0,1}+U{0,1}+U{0,1}+U{0,1}+U{0,1}
+U{0,1}+U{0,1}+U{0,1}]-6 ~ N{0,1}
Since in a true normal population less than .0000001 lies beyond six
standard deviations of the mean, the truncated range should not pose significant
problems in most contexts. The aforementioned transformation y=x*sqrt(V)+M
can then be employed to produce any normal distribution with specified
mean and variance (N{M,V}).
|
EXAMPLE
CODE (FORTRAN 77)
SUBROUTINE
rannor (x,m,s)
|
Generates
a pseudo-random normal deviate (x) with mean (m) and standard
deviation (s) using a sum of uniform deviates tactic. Calls
system subroutine RANDOM, a psuedo-random uniform U{0,1} generator
with argument (uniform) that contains the uniform variate.
RANDOM must be initialized with a proper seed prior to the
first call of rannor.
input:
m,s - mean and standard deviation of normal
output: x - pseudo random normal deviate
|
REAL
x,m,s,uniform
INTEGER i
x = 0.0
DO i = 1,12
| |
CALL
RANDOM ( uniform )
x = uniform + x
|
ENDDO
x = s * ( x - 6.0 ) + m
END
|
REFERENCES
Hays, W.L.
1988 Statistics (4th ed.). Holt, Rinehart and Winston, New York.
Hodder, Ian (ed.)
1978 Simulation Studies in Archaeology. Cambridge University Press,
Cambridge.
Olkin, I., L.J. Gleser, and C. Derman
1980 Probability Models and Applications. Macmillan Publishing,
New York.
Ross, Sheldon M.
1989 Introduction to Probability Models (4th ed.). Academic Press,
Boston.

Kenneth L. Kvamme
|