Read this chapter, which introduces you to the three major uses of the chi-squared distribution: the goodness-of-fit test, the test of independence, and the test of a single variance. Attempt the practice problems and homework at the end of the chapter.
Test of Independence
Tests of independence involve using a contingency table of observed (data) values.
The test statistic for a test of independence is similar to that of a goodness-of-fit test:
$$\chi^2 = \sum_{(i \cdot j)} \frac{(O - E)^2}{E}$$
where:
- O = observed values
- E = expected values
- i = the number of rows in the table
- j = the number of columns in the table
There are $i \cdot j$ terms of the form $\frac{(O - E)^2}{E}$.
A test of independence determines whether two factors are independent or not. You first encountered the term independence in 3.2 Independent and Mutually Exclusive Events. As a review, consider the following example.
Note
The expected value inside each cell needs to be at least five in order for you to use this test.
Example 11.8
Suppose A = a speeding violation in the last year and B = a cell phone user while driving. If A and B are independent, then P(A ∩ B) = P(A)P(B). A ∩ B is the event that a driver received a speeding violation last year and also used a cell phone while driving. Suppose that 755 people were surveyed in a study of speeding violations in the last year and cell phone use while driving. Out of the 755, 70 had a speeding violation and 685 did not; 305 used cell phones while driving and 450 did not.
Let y = expected number of drivers who used a cell phone while driving and received speeding violations.
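If A and B are independent, then P(A ∩ B) = P(A)P(B). By substitution,
$$\frac{y}{755} = \left(\frac{70}{755}\right)\left(\frac{305}{755}\right)$$
Solve for y:
$$y = \frac{(70)(305)}{755} \approx 28.3$$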
About 28 people from the sample are expected to use cell phones while driving and to receive speeding violations.
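The same arithmetic can be checked in a few lines of Python. This is only a sketch of the calculation in this example; the variable names are chosen here for illustration and do not come from the text.

```python
# Survey totals from Example 11.8
n = 755            # total drivers surveyed
speeding = 70      # drivers with a speeding violation (event A)
cell_phone = 305   # drivers who used a cell phone while driving (event B)

# Marginal probabilities estimated from the sample
p_a = speeding / n
p_b = cell_phone / n

# Under independence, P(A and B) = P(A) * P(B), so the expected count is n * P(A) * P(B)
y = n * p_a * p_b

print(f"Expected number with both characteristics: {y:.1f}")
```

Rounding y to the nearest whole person gives the "about 28" stated above.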
In a test of independence, we state the null and alternative hypotheses in words. Since the contingency table consists of two factors, the null hypothesis states that the factors are independent and the alternative hypothesis states that they are not independent (dependent). If we do a test of independence using the example, then the null hypothesis is:
H0: Being a cell phone user while driving and receiving a speeding violation are independent events; in other words, they have no effect on each other.
If the null hypothesis were true, we would expect about 28 people to use cell phones while driving and to receive a speeding violation.
The test of independence is always right-tailed because of the calculation of the test statistic. If the expected and observed values are not close together, then the test statistic is very large and falls far out in the right tail of the chi-square curve, as it does in a goodness-of-fit test.
The number of degrees of freedom for the test of independence is:
df = (number of columns - 1)(number of rows - 1)
The following formula calculates the expected number (E):
$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}}$$
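In code, each expected count is the row total times the column total, divided by the total number surveyed. The sketch below applies that rule to the marginal totals of Example 11.8 and also computes the degrees of freedom; the helper names `expected_counts` and `degrees_of_freedom` are hypothetical and introduced only for this illustration.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence, built from the marginal totals."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row totals and column totals must add to the same n")
    # E = (row total)(column total) / (total number surveyed) for every cell
    return [[r * c / n for c in col_totals] for r in row_totals]


def degrees_of_freedom(row_totals, col_totals):
    """df = (number of columns - 1)(number of rows - 1)."""
    return (len(col_totals) - 1) * (len(row_totals) - 1)


# Marginal totals from Example 11.8
row_totals = [70, 685]    # speeding violation: yes, no
col_totals = [305, 450]   # cell phone use while driving: yes, no

expected = expected_counts(row_totals, col_totals)
df = degrees_of_freedom(row_totals, col_totals)

# The test requires every expected count to be at least five
all_at_least_five = all(e >= 5 for row in expected for e in row)

for row in expected:
    print([round(e, 2) for e in row])
print("df =", df, "| all expected counts >= 5:", all_at_least_five)
```

The first cell of the result is the expected number of drivers with both a speeding violation and cell phone use, which matches the value of about 28 found in Example 11.8.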
Try It 11.8
A sample of 300 students is taken. Of the students surveyed, 50 were music students, while 250 were not. Ninety-seven of the 300 surveyed were on the honor roll, while 203 were not. If we assume being a music student and being on the honor roll are independent events, what is the expected number of music students who are also on the honor roll?
Example 11.9
A volunteer group provides from one to nine hours each week with disabled senior citizens. The program recruits among community college students, four-year college students, and nonstudents. Table 11.14 shows a sample of the adult volunteers and the number of hours they volunteer per week.
Type of volunteer | 1–3 Hours | 4–6 Hours | 7–9 Hours | Row total |
---|---|---|---|---|
Community college students | 111 | 96 | 48 | 255 |
Four-year college students | 96 | 133 | 61 | 290 |
Nonstudents | 91 | 150 | 53 | 294 |
Column total | 298 | 379 | 162 | 839 |
Is the number of hours volunteered independent of the type of volunteer?
H0: The number of hours volunteered is independent of the type of volunteer.
Ha: The number of hours volunteered is dependent on the type of volunteer.
Type of volunteer | 1–3 Hours | 4–6 Hours | 7–9 Hours |
---|---|---|---|
Community college students | 90.57 | 115.19 | 49.24 |
Four-year college students | 103.00 | 131.00 | 56.00 |
Nonstudents | 104.42 | 132.81 | 56.77 |
Table 11.15 Number of Hours Worked Per Week by Volunteer Type (Expected). The table contains expected (E) values.

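Make a decision: The test statistic computed from the observed and expected values is χ² ≈ 12.99 with df = (3 - 1)(3 - 1) = 4, which gives a p-value of about 0.0113. Because the p-value is less than the 5% significance level, the test statistic lies in the right-tail rejection region and we reject H0. This means that the factors are not independent.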
Conclusion: At a 5% level of significance, from the data, there is sufficient evidence to conclude that the number of hours volunteered and the type of volunteer are dependent on one another.
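The expected values and the decision in this example can be reproduced with a short Python sketch. It assumes only the observed counts from Table 11.14 and uses the standard library. The helper `chi2_sf_even_df` is a hypothetical name introduced here; it uses a closed form for the right-tail area that holds only when the degrees of freedom are even, so in general a statistics library or a chi-square table would be used instead.

```python
import math

# Observed counts from Table 11.14 (rows: volunteer type, columns: hours per week)
observed = [
    [111, 96, 48],    # community college students
    [96, 133, 61],    # four-year college students
    [91, 150, 53],    # nonstudents
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: (row total)(column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square test statistic: sum of (O - E)^2 / E over all cells
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(col_totals) - 1) * (len(row_totals) - 1)


def chi2_sf_even_df(x, df):
    """Right-tail area of the chi-square distribution; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2
    return math.exp(-half) * sum(half ** m / math.factorial(m) for m in range(df // 2))


p_value = chi2_sf_even_df(chi_sq, df)
alpha = 0.05

print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

Rounded to two decimals, the entries of `expected` should agree with Table 11.15.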
Try It 11.9
Industry sector | 2000 | 2010 | 2020 | Total |
---|---|---|---|---|
Nonagriculture wage and salary | 13,243 | 13,044 | 15,018 | 41,305 |
Goods-producing, excluding agriculture | 2,457 | 1,771 | 1,950 | 6,178 |
Services-providing | 10,786 | 11,273 | 13,068 | 35,127 |
Agriculture, forestry, fishing, and hunting | 240 | 214 | 201 | 655 |
Nonagriculture self-employed and unpaid family worker | 931 | 894 | 972 | 2,797 |
Secondary wage and salary jobs in agriculture and private household industries | 14 | 11 | 11 | 36 |
Secondary jobs as a self-employed or unpaid family worker | 196 | 144 | 152 | 492 |
Total | 27,867 | 27,351 | 31,372 | 86,590 |
Table 11.16
Example 11.10
Need to succeed in school | High anxiety | Med-high anxiety | Medium anxiety | Med-low anxiety | Low anxiety | Row total |
---|---|---|---|---|---|---|
High need | 35 | 42 | 53 | 15 | 10 | 155 |
Medium need | 18 | 48 | 63 | 33 | 31 | 193 |
Low need | 4 | 5 | 11 | 15 | 17 | 52 |
Column total | 57 | 95 | 127 | 63 | 58 | 400 |
Table 11.17 Need to Succeed in School vs. Anxiety Level
a. The column total for a high anxiety level is 57. The row total for high need to succeed in school is 155. The sample size or total surveyed is 400.
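Combining these three totals with the expected-value rule (row total times column total, divided by the total surveyed) gives the expected count for one cell. The worked line below assumes this part concerns the high-need, high-anxiety cell, since those are the two totals quoted:

$$E = \frac{(155)(57)}{400} \approx 22.09$$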
Problem
b. If the two variables are independent, how many students do you expect to have a low need to succeed in school and a med-low level of anxiety?