Confidence Intervals for the Mean
This section explains the need for confidence intervals and why the confidence level should not be read as the probability that a particular interval contains the parameter. It then discusses how to compute a confidence interval on the mean when sigma is unknown and must be estimated, and when to use the t distribution rather than the normal distribution. Next, it covers the difference between the shape of the t distribution and that of the normal distribution, and how this difference is affected by the degrees of freedom. Finally, it explains the procedure for computing a confidence interval on the difference between means.
- State the difference between the shape of the t distribution and the normal distribution
- State how the difference between the shape of the t distribution and the normal distribution is affected by the degrees of freedom
- Use a t table to find the value of t to use in a confidence interval
- Use the t calculator to find the value of t to use in a confidence interval
In the introduction to normal distributions it was shown that 95% of the area of a normal distribution is within 1.96 standard deviations of the mean. Therefore, if you randomly sampled a value from a normal distribution with a mean of 100, the probability it would be within $1.96\sigma$ of 100 is 0.95. Similarly, if you sample $N$ values from the population, the probability that the sample mean ($M$) will be within $1.96\sigma_M$ of 100 is 0.95.
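The two probabilities above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the population standard deviation (15) and sample size (25) are illustrative assumptions, since the text only fixes the mean at 100.

```python
from statistics import NormalDist

# Area of the standard normal distribution within 1.96 standard deviations of the mean.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
area_within = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# The same reasoning applied to a sample mean. sigma and n are assumed values
# chosen for illustration; only the population mean of 100 comes from the text.
mu, sigma, n = 100.0, 15.0, 25
sigma_m = sigma / n ** 0.5  # standard error of the mean: sigma / sqrt(N)
sampling_dist = NormalDist(mu=mu, sigma=sigma_m)
prob_mean_within = (sampling_dist.cdf(mu + 1.96 * sigma_m)
                    - sampling_dist.cdf(mu - 1.96 * sigma_m))

print(f"area within 1.96 sd of the mean: {area_within:.4f}")
print(f"P(|M - mu| <= 1.96 sigma_M):     {prob_mean_within:.4f}")
```

Both probabilities come out at about 0.95 whatever values of sigma and N are assumed, because the interval is expressed in units of the (known) standard error.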
Now consider the case in which you have a normal distribution but do not know the standard deviation. You sample $N$ values, compute the sample mean ($M$), and estimate the standard error of the mean ($\sigma_M$) with $s_M$. What is the probability that $M$ will be within $1.96 s_M$ of the population mean ($\mu$)? This is a difficult problem because there are two ways in which $M$ could be more than $1.96 s_M$ from $\mu$: (1) $M$ could, by chance, be either very high or very low and (2) $s_M$ could, by chance, be very low. Intuitively, the probability of being within 1.96 estimated standard errors of the mean should be smaller than in the case when the standard deviation is known (and cannot be underestimated). But exactly how much smaller? Fortunately, this type of problem was solved in the early 20th century by W. S. Gosset, who determined the distribution of a mean divided by an estimate of its standard error. This distribution is called the Student's t distribution, or sometimes just the t distribution. Gosset worked out the t distribution and its associated statistical tests while working for a brewery in Ireland. Because of a contractual agreement with the brewery, he published the article under the pseudonym "Student". That is why the t test is called the "Student's t test".
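The effect of estimating the standard error can be explored by simulation. The sketch below repeatedly draws samples from a normal population and counts how often $M$ falls within 1.96 standard errors of $\mu$, once using the true $\sigma_M$ and once using the estimate $s_M$. The population parameters (mean 100, standard deviation 15), the sample size of 5, the number of replications, and the random seed are all assumptions made for illustration.

```python
import math
import random

def coverage(n, reps, mu=100.0, sigma=15.0, seed=0):
    """Return the proportions of samples whose mean lies within 1.96 standard
    errors of mu, using (a) the known sigma_M and (b) the estimated s_M."""
    rng = random.Random(seed)
    sigma_m = sigma / math.sqrt(n)  # true standard error of the mean
    hits_known = 0
    hits_estimated = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(xs) / n
        # Sample standard deviation with N - 1 in the denominator.
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
        s_m = s / math.sqrt(n)  # estimated standard error of the mean
        if abs(m - mu) <= 1.96 * sigma_m:
            hits_known += 1
        if abs(m - mu) <= 1.96 * s_m:
            hits_estimated += 1
    return hits_known / reps, hits_estimated / reps

known, estimated = coverage(n=5, reps=20000)
print(f"within 1.96 sigma_M (sigma known):     {known:.3f}")
print(f"within 1.96 s_M     (sigma estimated): {estimated:.3f}")
```

With the known standard error the proportion should be close to 0.95; with the estimated standard error it should be noticeably lower, which is the shortfall the t distribution quantifies.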
The t distribution is very similar to the normal distribution when the estimate of variance is based on many degrees of freedom, but has relatively more scores in its tails when there are fewer degrees of freedom. Figure 1 shows t distributions with 2, 4, and 10 degrees of freedom and the standard normal distribution. Notice that the normal distribution has relatively more scores in the center of the distribution and the t distribution has relatively more in the tails. The t distribution is therefore leptokurtic. The t distribution approaches the normal distribution as the degrees of freedom increase.
Figure 1. A comparison of t distributions with 2, 4, and 10 df and the standard normal distribution. The distribution with the lowest peak is the 2 df distribution, the next lowest is 4 df, the lowest after that is 10 df, and the highest is the standard normal distribution.
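The ordering of the curves described in the figure can be reproduced from the density formulas. The sketch below evaluates the t density for 2, 4, and 10 df and the standard normal density at the center (0) and in the tail (3); the evaluation point 3 is an arbitrary choice made for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

# Height at the center and in the tail for each distribution.
peaks = {df: t_pdf(0.0, df) for df in (2, 4, 10)}
tails = {df: t_pdf(3.0, df) for df in (2, 4, 10)}

print(f"{'distribution':<16}{'density at 0':>14}{'density at 3':>14}")
for df in (2, 4, 10):
    print(f"{'t, ' + str(df) + ' df':<16}{peaks[df]:>14.4f}{tails[df]:>14.4f}")
print(f"{'standard normal':<16}{normal_pdf(0.0):>14.4f}{normal_pdf(3.0):>14.4f}")
```

The peak heights rise with the degrees of freedom toward the normal value, while the tail densities fall toward it, matching the description of the t distribution as heavier-tailed.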
Since the t distribution is leptokurtic, the percentage of the distribution within 1.96 standard deviations of the mean is less than the 95% for the normal distribution. Table 1 shows the number of standard deviations from the mean required to contain 95% and 99% of the area of the t distribution for various degrees of freedom. These are the values of t that you use in a confidence interval. The corresponding values for the normal distribution are 1.96 and 2.58 respectively. Notice that with few degrees of freedom, the values of t are much higher than the corresponding values for a normal distribution and that the difference decreases as the degrees of freedom increase. The values in Table 1 can be obtained from the "Find t for a confidence interval" calculator.
Table 1. Abbreviated t table.
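Values of this kind can be recomputed numerically. The sketch below uses only the standard library: it integrates the t density with Simpson's rule and finds, by bisection, the value of t that encloses 95% or 99% of the area. The helper names (`central_area`, `t_critical`) and the list of degrees of freedom are hypothetical choices made for illustration, not necessarily the rows of the original table. Where SciPy is available, `scipy.stats.t.ppf` is the more usual tool.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def central_area(t, df, steps=400):
    """Area of the t distribution between -t and +t (composite Simpson's rule).
    steps must be even."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Integrate over [0, t] and double it, using the symmetry of the density.
    return 2 * total * h / 3

def t_critical(df, level, lo=0.0, hi=50.0):
    """Value of t such that the area between -t and +t equals level (bisection)."""
    for _ in range(40):
        mid = (lo + hi) / 2
        if central_area(mid, df) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Illustrative selection of degrees of freedom (an assumption).
print(f"{'df':>4} {'0.95':>7} {'0.99':>7}")
for df in (2, 3, 4, 5, 8, 10, 20, 50, 100):
    print(f"{df:>4} {t_critical(df, 0.95):7.3f} {t_critical(df, 0.99):7.3f}")
```

The printed values shrink toward 1.96 and 2.58 as the degrees of freedom grow, which is the pattern the paragraph above describes.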
Returning to the problem posed at the beginning of this section, suppose you sampled 9 values from a normal population and estimated the standard error of the mean ($\sigma_M$) with $s_M$. What is the probability that $M$ would be within $1.96 s_M$ of $\mu$? Since the sample size is 9, there are $N - 1 = 8$ df. From Table 1 you can see that with 8 df the probability is 0.95 that the mean will be within $2.306 s_M$ of $\mu$. The probability that it will be within $1.96 s_M$ of $\mu$ is therefore lower than 0.95.
As shown in Figure 2, the "t distribution" calculator can be used to find that 0.086 of the area of a t distribution with 8 df is more than 1.96 standard deviations from the mean, so the probability that $M$ would be less than $1.96 s_M$ from $\mu$ is 1 - 0.086 = 0.914.
As expected, this probability is less than the 0.95 that would have been obtained if $\sigma_M$ had been known instead of estimated.
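The worked example above can be reproduced without the calculator. The following sketch integrates the t density for 8 df numerically with Simpson's rule, using only the standard library; the helper name `central_area` and the number of integration steps are assumptions made for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def central_area(t, df, steps=1000):
    """Area of the t distribution between -t and +t (composite Simpson's rule).
    steps must be even."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Integrate over [0, t] and double it, using the symmetry of the density.
    return 2 * total * h / 3

n = 9
df = n - 1                         # degrees of freedom for a sample of 9
within = central_area(1.96, df)    # P(|M - mu| < 1.96 s_M)
outside = 1.0 - within             # area more than 1.96 from the mean

print(f"df = {df}")
print(f"area beyond 1.96: {outside:.3f}")
print(f"area within 1.96: {within:.3f}")
```

The two areas should agree with the 0.086 and 0.914 quoted above to three decimal places, and `central_area(2.306, df)` should return approximately 0.95.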