Basic Sample Statistics and Parameters
Course: MA121: Introduction to Statistics
Book: Basic Sample Statistics and Parameters
First, we'll discuss the basic concepts of sample statistics and population parameters. Then, we'll talk about degrees of freedom, which is the number of independent pieces of information that a point estimate is based on. Finally, we will talk about variance, whose estimate depends on the degrees of freedom.
Introduction to Estimation
- Define statistic
- Define parameter
- Define point estimate
- Define interval estimate
- Define margin of error
One of the major applications of statistics is estimating population parameters from sample statistics. For example, a poll may seek to estimate the proportion of adult residents of a city that support a proposition to build a new sports stadium. Out of a random sample of 200 people, 106 say they support the proposition. Thus in the sample, 0.53 of the people supported the proposition. This value of 0.53 is called a point estimate of the population proportion. It is called a point estimate because the estimate consists of a single value or point.
Point estimates are usually supplemented by interval estimates called confidence intervals. Confidence intervals are intervals constructed using a method that contains the population parameter a specified proportion of the time. For example, if the pollster used a method that contains the parameter 95% of the time it is used, he or she would arrive at the following 95% confidence interval: 0.46 < π < 0.60. The pollster would then conclude that somewhere between 0.46 and 0.60 of the population supports the proposal. The media usually reports this type of result by saying that 53% favor the proposition with a margin of error of 7%.
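The poll numbers above can be reproduced with a short sketch. The text does not say which method the pollster used, so the code below assumes the common normal approximation for a proportion with z = 1.96; the function name `proportion_interval` is purely illustrative.

```python
import math


def proportion_interval(successes, n, z=1.96):
    """Point estimate, margin of error, and interval for a proportion.

    Assumption: normal approximation with z = 1.96 for 95% confidence.
    The source text does not specify the pollster's actual method.
    """
    p_hat = successes / n                      # point estimate
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error
    margin = z * se                            # margin of error
    return p_hat, margin, (p_hat - margin, p_hat + margin)


# 106 supporters out of a random sample of 200, as in the example
p_hat, margin, (lower, upper) = proportion_interval(106, 200)
print(f"point estimate: {p_hat:.2f}")
print(f"margin of error: {margin:.2f}")
print(f"95% interval: {lower:.2f} to {upper:.2f}")
```

Under this assumption the interval rounds to the same 0.46 and 0.60 quoted in the text, and the margin of error rounds to 0.07.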
In an experiment on memory for chess positions, the mean recall for tournament players was 63.8 and the mean for non-players was 33.1. Therefore a point estimate of the difference between population means is 30.7. The 95% confidence interval on the difference between means extends from 19.05 to 42.35. You will see how to compute this kind of interval in another section.
Source: David M. Lane, https://onlinestatbook.com/2/estimation/intro.html
This work is in the Public Domain.
- A parameter is a value calculated in a population. A statistic is a value computed in a sample to estimate a parameter.
- The proportion of 0.63 is a statistic and point estimate because it is the proportion obtained from the sample and an estimate of the population proportion.
Degrees of Freedom
- Define degrees of freedom
- Estimate the variance from a sample of 1 if the population mean is known
- State why deviations from the sample mean are not independent
- State the general formula for degrees of freedom in terms of the number of values and the number of estimated parameters
Some estimates are based on more information than others. For example, an estimate of the variance based on a sample size of 100 is based on more information than an estimate of the variance based on a sample size of 5. The degrees of freedom (df) of an estimate is the number of independent pieces of information on which the estimate is based.
As an example, let's say that we know that the mean height of Martians is 6 and wish to estimate the variance of their heights. We randomly sample one Martian and find that its height is 8. Recall that the variance is defined as the mean squared deviation of the values from their population mean. We can compute the squared deviation of our value of 8 from the population mean of 6 to find a single squared deviation from the mean. This single squared deviation from the mean, $(8-6)^2 = 4$, is an estimate of the mean squared deviation for all Martians. Therefore, based on this sample of one, we would estimate that the population variance is 4. This estimate is based on a single piece of information and therefore has 1 df. If we sampled another Martian and obtained a height of 5, then we could compute a second estimate of the variance, $(5-6)^2 = 1$. We could then average our two estimates (4 and 1) to obtain an estimate of 2.5. Since this estimate is based on two independent pieces of information, it has two degrees of freedom. The two estimates are independent because they are based on two independently and randomly selected Martians. The estimates would not be independent if, after sampling one Martian, we decided to choose its brother as our second Martian.
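The arithmetic in this example can be checked with a minimal sketch. The helper name `variance_known_mean` is an illustrative choice, not something from the text; it simply averages squared deviations from a population mean that is assumed to be known.

```python
def variance_known_mean(values, population_mean):
    """Estimate the variance when the population mean is already known.

    Each squared deviation is an independent estimate, so the number of
    degrees of freedom equals the number of sampled values.
    """
    squared_devs = [(x - population_mean) ** 2 for x in values]
    return sum(squared_devs) / len(squared_devs)


# One Martian with height 8, known population mean 6
one_martian = variance_known_mean([8], 6)      # 4.0, with 1 df

# Two Martians with heights 8 and 5
two_martians = variance_known_mean([8, 5], 6)  # 2.5, with 2 df

print(one_martian, two_martians)
```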
As you are probably thinking, it is pretty rare that we know the population mean when we are estimating the variance. Instead, we have to first estimate the population mean ($\mu$) with the sample mean ($M$). The process of estimating the mean affects our degrees of freedom as shown below.
Returning to our problem of estimating the variance in Martian heights, let's assume we do not know the population mean and therefore we have to estimate it from the sample. We have sampled two Martians and found that their heights are 8 and 5. Therefore $M$, our estimate of the population mean, is
$$M = \frac{8 + 5}{2} = 6.5$$
We can now compute two estimates of variance:
Estimate 1 = $(8-6.5)^2 = 2.25$
Estimate 2 = $(5-6.5)^2 = 2.25$
Now for the key question: Are these two estimates independent? The answer is no because each height contributed to the calculation of $M$. Since the first Martian's height of 8 influenced $M$, it also influenced Estimate 2. If the first height had been, for example, 10, then $M$ would have been 7.5 and Estimate 2 would have been $(5-7.5)^2 = 6.25$ instead of 2.25. The important point is that the two estimates are not independent and therefore we do not have two degrees of freedom. Another way to think about the non-independence is to consider that if you knew the mean and one of the scores, you would know the other score. For example, if one score is 5 and the mean is 6.5, you can compute that the total of the two scores is 13 and therefore that the other score must be 13 - 5 = 8.
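A small sketch makes the dependence concrete. It recomputes both squared deviations after changing only the first height, then recovers a missing score from the mean. The function name `squared_devs_from_sample_mean` is hypothetical and used only for illustration.

```python
def squared_devs_from_sample_mean(heights):
    """Return the sample mean and each squared deviation from it."""
    m = sum(heights) / len(heights)
    return m, [(x - m) ** 2 for x in heights]


# Original sample: heights 8 and 5
m_a, devs_a = squared_devs_from_sample_mean([8, 5])

# Change only the first height to 10; the second Martian is untouched
m_b, devs_b = squared_devs_from_sample_mean([10, 5])

print(m_a, devs_a)  # mean 6.5, Estimate 2 is 2.25
print(m_b, devs_b)  # mean 7.5, Estimate 2 is now 6.25

# Knowing the mean and one score determines the other score
n, known_score = 2, 5
other_score = n * m_a - known_score
print(other_score)
```

Estimate 2 changes even though the second Martian's height did not, which is exactly why the two estimates share a single degree of freedom.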
In general, the degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated en route to the estimate in question. In the Martians example, there are two values (8 and 5) and we had to estimate one parameter ($\mu$) on the way to estimating the parameter of interest ($\sigma^2$). Therefore, the estimate of variance has 2 - 1 = 1 degree of freedom. If we had sampled 12 Martians, then our estimate of variance would have had 11 degrees of freedom. Therefore, the degrees of freedom of an estimate of variance is equal to $N - 1$, where $N$ is the number of observations.
Recall from the section on variability that the formula for estimating the variance in a sample is:
$$s^2 = \frac{\sum (X - M)^2}{N - 1}$$
The denominator of this formula is the degrees of freedom.
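As a rough illustration, the sample variance can be computed by dividing the sum of squared deviations by the degrees of freedom. The function `sample_variance` and its `estimated_params` argument are illustrative names, not from the text; the result is compared against `statistics.variance` from the Python standard library, which also uses an N - 1 denominator.

```python
import statistics


def sample_variance(values, estimated_params=1):
    """Estimate the population variance from a sample.

    df = number of values - number of parameters estimated en route
    (here one parameter: the mean, estimated by the sample mean).
    """
    n = len(values)
    df = n - estimated_params
    if df <= 0:
        raise ValueError("need more values than estimated parameters")
    m = sum(values) / n                        # sample mean M
    sum_sq = sum((x - m) ** 2 for x in values) # sum of squared deviations
    return sum_sq / df, df


# Martian heights 8 and 5: two values, one estimated parameter
s2, df = sample_variance([8, 5])
print(s2, df)

# The standard library's sample variance gives the same value
print(statistics.variance([8, 5]))
```

For the two Martians the sum of squared deviations is 2.25 + 2.25 = 4.5, and with 1 degree of freedom the variance estimate is 4.5.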
- There are 10 independent pieces of information, so there are 10 degrees of freedom.
- The degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated en route to the estimate in question. You have 15 values in your sample, and you need to estimate one parameter, the mean, in order to find the standard deviation. 15 - 1 = 14.
- An estimate with 2 degrees of freedom is based on the least information. It comes from the smallest sample used to compute the statistic and is therefore the most likely to be a poor representation of the population parameter.