Central Limit Theorem
Choosing a discrete or continuous probability distribution for representing a
process depends on prior knowledge (however scanty) of the physical system
that we assume underlies the process. We have many choices available, of
course, and there is not room to discuss them here. However, the widespread
use of the normal distribution in statistical inference is due to the fact that the
sampling distribution of means tends to be normal. More precisely, the central limit theorem states that as the number of independent, identically distributed random variables with finite variance increases, the distribution
of their mean becomes increasingly normal. Furthermore, the variance of the
mean is inversely proportional to the sample size: if the population has variance σ², the mean of n independent observations has variance σ²/n. We call the square root of the variance of the mean, σ/√n, the standard error of the mean.
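The σ/√n relationship can be checked by simulation. The sketch below is not part of the applet. It assumes an exponential population with rate 1 (so σ = 1), draws many samples of size n, and compares the observed standard deviation of the sample means with σ/√n. The function name `simulate_standard_error` and its parameters are hypothetical, chosen for illustration.

```python
import math
import random
import statistics


def simulate_standard_error(draw, sigma, n, reps=10_000, seed=0):
    """Estimate the standard error of the mean by simulation (illustrative sketch).

    Draws `reps` independent samples of size `n` using `draw(rng)`, computes the
    mean of each sample, and returns a pair:
    (observed SD of the sample means, predicted sigma / sqrt(n)).
    """
    rng = random.Random(seed)  # seeded so repeated runs give identical results
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means), sigma / math.sqrt(n)


def exponential(rng):
    # Assumed population: exponential with rate 1 (mean 1, standard deviation 1).
    return rng.expovariate(1.0)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        observed, predicted = simulate_standard_error(exponential, sigma=1.0, n=n)
        print(f"n={n:3d}  observed SE={observed:.3f}  sigma/sqrt(n)={predicted:.3f}")
```

Each fourfold increase in n should roughly halve the observed standard error, as the formula predicts.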
There are several things to try in this applet:
1) Notice how the sampling distribution of the mean (blue curve) compares to the population distribution (red curve). The population distribution has a larger standard deviation by a factor of √n.
2) Pick the arcsine distribution and set the left slider to 80 or 100. Notice how samples from an extremely non-normal distribution can have means that are fairly normally distributed when n is relatively large. If you have fine motor control, try setting the left slider almost to zero. You will then see a sample from the arcsine distribution itself, since n=1.
3) Pick the binomial distribution and run the left slider from 0 to 100. Notice how the histogram of means tends toward the normal when the slider is near 100. Even when n is relatively small (left slider set at 20), the distribution of means looks approximately normal, although it is discrete. For small n, the blue normal curve representing the sampling distribution of means lies below the tops of the spikes: the curve spreads its probability over a continuous range, while the sample means pile up on a few discrete values with a lot of white area between them that is not part of the sample distribution.
4) Pick the exponential distribution. Notice how a skewed distribution like this produces a skewed distribution of sample means (when you set the left slider well below 20). It takes larger sample sizes to remove skewness, because the skewness of the mean shrinks only in proportion to 1/√n.
5) Set both sliders to the far right and run through the distributions at the top. Notice how small the standard error of the mean is for samples from the normal population. The Central Limit Theorem will help you the most if your data are normal to begin with.
6) If you want to see the two theoretical distributions without any sample data, just set the right slider to zero. You can then move the left slider to see how the sampling distribution of means changes with n. You will also see that the buttons at the top are ordered by the size of the variance of the sampling distribution of means (smallest on left, largest on right). What features of the population distributions do you think account for this ordering?
7) Try clicking repeatedly on the distribution buttons at the top of the display. Each click will generate a new sample, so you can get an idea of the variability across samples for a given sample size and number of samples.
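Exercises 2 and 4 above can also be reproduced offline. The sketch below is an assumption-laden stand-in for the applet, not its code: it assumes the standard arcsine distribution on [0, 1] (generated as sin² of a uniform angle in [0, π/2]) and a rate-1 exponential, then reports the skewness and excess kurtosis of the sample means for several n. Both statistics are 0 for a normal distribution. All function names are hypothetical.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness: mean cubed deviation over the (population) SD cubed."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean((x - m) ** 3 for x in xs) / s ** 3


def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean((x - m) ** 4 for x in xs) / s ** 4 - 3.0


def arcsine(rng):
    # Assumed arcsine on [0, 1]: sin^2 of a uniform angle in [0, pi/2].
    # Symmetric (skewness 0) but U-shaped (excess kurtosis -1.5).
    return math.sin(rng.uniform(0.0, math.pi / 2)) ** 2


def exponential(rng):
    # Assumed rate-1 exponential: strongly right-skewed (skewness 2).
    return rng.expovariate(1.0)


def means_of(draw, n, reps=10_000, seed=1):
    """Means of `reps` independent samples of size n drawn with `draw(rng)`."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


if __name__ == "__main__":
    for name, draw in (("arcsine", arcsine), ("exponential", exponential)):
        for n in (1, 5, 20, 100):
            ms = means_of(draw, n)
            print(f"{name:12s} n={n:3d}  skewness={skewness(ms):+.3f}  "
                  f"excess kurtosis={excess_kurtosis(ms):+.3f}")
```

In theory, the exponential means have skewness 2/√n and excess kurtosis 6/n, while the arcsine means have skewness 0 and excess kurtosis -1.5/n. This is why the symmetric arcsine looks normal sooner than the skewed exponential, even though the arcsine looks far less normal at n=1.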