Read this chapter and complete the questions at the end of each section. While these sections are optional, studying ANOVA may help you if you are interested in taking the Saylor Direct Credit exam for this course.
Unequal Sample Sizes
Learning Objectives
- State why unequal n can be a problem
- Define confounding
- Compute weighted and unweighted means
- Distinguish between Type I and Type III sums of squares
- Describe why the cause of the unequal sample sizes makes a difference in the interpretation
The Problem of Confounding
Whether by design, accident, or necessity, the number of subjects in each of the conditions in an experiment may not be equal. For example, the sample sizes for the "Bias Against Associates of the Obese" case study are shown in Table 1. Although the sample sizes were approximately equal, the "Acquaintance Typical" condition had the most subjects. Since n is used to refer to the sample size of an individual group, designs with unequal sample sizes are sometimes referred to as designs with unequal n.
Table 1. Sample Sizes for "Bias Against Associates of the Obese" Study.
Relationship | Companion Weight: Obese | Companion Weight: Typical |
---|---|---|
Girlfriend | 40 | 42 |
Acquaintance | 40 | 54 |
Table 2. Sample Sizes for the Diet and Exercise Example.
Diet | Exercise: Moderate | Exercise: None |
---|---|---|
Low Fat | 5 | 0 |
High Fat | 0 | 5 |
Table 3. Data for the Diet and Exercise Example.
Diet | Exercise: Moderate | Exercise: None | Mean |
---|---|---|---|
Low Fat | -20, -25, -30, -35, -15 | | -25 |
High Fat | | -20, 6, -10, -6, 5 | -5 |
Mean | -25 | -5 | -15 |
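The complete confounding in the data above can be checked with a short calculation. The sketch below is only an illustration: the record layout and the helper name `group_mean` are assumptions made here, not part of the text. Because every low-fat subject exercised and no high-fat subject did, comparing the diets and comparing the exercise conditions compares exactly the same two groups of subjects.

```python
# Scores from the table above as (diet, exercise, score) records.
records = (
    [("Low Fat", "Moderate", s) for s in (-20, -25, -30, -35, -15)]
    + [("High Fat", "None", s) for s in (-20, 6, -10, -6, 5)]
)


def group_mean(factor_index, level):
    """Mean score of every record whose factor (0 = diet, 1 = exercise) equals level."""
    scores = [r[2] for r in records if r[factor_index] == level]
    return sum(scores) / len(scores)


# Difference between diets, ignoring exercise.
diet_difference = group_mean(0, "Low Fat") - group_mean(0, "High Fat")
# Difference between exercise conditions, ignoring diet.
exercise_difference = group_mean(1, "Moderate") - group_mean(1, "None")

print(diet_difference)      # -20.0
print(exercise_difference)  # -20.0
```

The two differences are necessarily the same number, so these data cannot show whether the 20-unit difference is due to diet, to exercise, or to both.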
Weighted and Unweighted Means
The difference between weighted and unweighted means is critical for understanding how to deal with the confounding resulting from unequal n. Weighted and unweighted means will be explained using the data shown in Table 4. Here, Diet and Exercise are confounded because 80% of the subjects in the low-fat condition exercised, compared with 20% of those in the high-fat condition. However, the confounding is not complete, as it was with the data in Table 3.
Table 4. Data for Diet and Exercise with Partial Confounding.
Diet | Exercise: Moderate | Exercise: None | Weighted Mean | Unweighted Mean |
---|---|---|---|---|
Low Fat | -20, -25, -30, -35 (M = -27.5) | -20 (M = -20.0) | -26 | -23.750 |
High Fat | -15 (M = -15.0) | 6, -6, 5, -10 (M = -1.25) | -4 | -8.125 |
Weighted Mean | -25 | -5 | | |
Unweighted Mean | -21.25 | -10.625 | | |
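The marginal means in Table 4 can be reproduced with a few lines of code. This is a minimal sketch; the dictionary layout and the function names `weighted_mean` and `unweighted_mean` are illustrative choices, not part of the text. A weighted mean pools every score at a level of a factor, so each cell counts in proportion to its sample size. An unweighted mean averages the cell means, so each cell counts equally.

```python
# Weight-change scores from Table 4, keyed by (diet, exercise) cell.
cells = {
    ("Low Fat", "Moderate"): [-20, -25, -30, -35],
    ("Low Fat", "None"): [-20],
    ("High Fat", "Moderate"): [-15],
    ("High Fat", "None"): [6, -6, 5, -10],
}


def mean(values):
    return sum(values) / len(values)


def weighted_mean(level, factor_index):
    """Mean of all scores at one level of a factor (cells weighted by n)."""
    scores = [x for key, xs in cells.items() if key[factor_index] == level for x in xs]
    return mean(scores)


def unweighted_mean(level, factor_index):
    """Mean of the cell means at one level of a factor (cells weighted equally)."""
    cell_means = [mean(xs) for key, xs in cells.items() if key[factor_index] == level]
    return mean(cell_means)


# factor_index 0 selects Diet, 1 selects Exercise.
print(weighted_mean("Low Fat", 0), unweighted_mean("Low Fat", 0))    # -26.0 -23.75
print(weighted_mean("High Fat", 0), unweighted_mean("High Fat", 0))  # -4.0 -8.125
print(weighted_mean("Moderate", 1), unweighted_mean("Moderate", 1))  # -25.0 -21.25
print(weighted_mean("None", 1), unweighted_mean("None", 1))          # -5.0 -10.625
```

The difference between the two diets is -26 - (-4) = -22 using weighted means but -23.75 - (-8.125) = -15.625 using unweighted means. The weighted difference is larger because it also reflects the fact that most low-fat subjects exercised and most high-fat subjects did not.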
Statistical analysis programs use different terms for means that are computed controlling for other effects. SPSS calls them estimated marginal means, whereas SAS and SAS JMP call them least squares means.
Types of Sums of Squares
The section on Multi-Factor ANOVA stated that when there are unequal sample sizes, the sum of squares total is not equal to the sum of the sums of squares for all the other sources of variation. This is because the confounded sums of squares are not apportioned to any source of variation. For the data in Table 4, the sum of squares for Diet is 390.625, the sum of squares for Exercise is 180.625, and the sum of squares confounded between these two factors is 819.375 (the calculation of this value is beyond the scope of this introductory text). In the ANOVA Summary Table shown in Table 5, this large portion of the sums of squares is not apportioned to any source of variation and represents the "missing" sums of squares. That is, if you add up the sums of squares for Diet, Exercise, D x E, and Error, you get 902.625. If you add the confounded sum of squares of 819.375 to this value, you get the total sum of squares of 1722.000. When confounded sums of squares are not apportioned to any source of variation, the sums of squares are called Type III sums of squares. Type III sums of squares are, by far, the most common, and if sums of squares are not otherwise labeled, it can safely be assumed that they are Type III.
Table 5. ANOVA Summary Table for Type III Sums of Squares.
Source | df | SSQ | MS | F | p |
---|---|---|---|---|---|
Diet | 1 | 390.625 | 390.625 | 7.42 | 0.034 |
Exercise | 1 | 180.625 | 180.625 | 3.43 | 0.113 |
D x E | 1 | 15.625 | 15.625 | 0.30 | 0.605 |
Error | 6 | 315.750 | 52.625 | | |
Total | 9 | 1722.000 | | | |
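The sums of squares quoted above for the Table 4 data can be checked numerically. The sketch below rests on one assumption that the text does not state: in a 2 x 2 design each effect has one degree of freedom, so its Type III sum of squares can be computed as a contrast on the cell means, L² / Σ(c²/n), where L is the contrast value, c are the contrast coefficients, and n are the cell sizes. The names `contrast_ss`, `diet`, `exercise`, and `interaction` are illustrative.

```python
# Weight-change scores from Table 4, keyed by (diet, exercise) cell.
cells = {
    ("Low Fat", "Moderate"): [-20, -25, -30, -35],
    ("Low Fat", "None"): [-20],
    ("High Fat", "Moderate"): [-15],
    ("High Fat", "None"): [6, -6, 5, -10],
}


def mean(values):
    return sum(values) / len(values)


def contrast_ss(coefficients):
    """One-df sum of squares for a contrast on the cell means."""
    estimate = sum(c * mean(cells[key]) for key, c in coefficients.items())
    scale = sum(c ** 2 / len(cells[key]) for key, c in coefficients.items())
    return estimate ** 2 / scale


# Contrast coefficients in the same cell order as the dictionary above.
keys = list(cells)
diet = dict(zip(keys, (1, 1, -1, -1)))
exercise = dict(zip(keys, (1, -1, 1, -1)))
interaction = dict(zip(keys, (1, -1, -1, 1)))

all_scores = [x for xs in cells.values() for x in xs]
grand_mean = mean(all_scores)

ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
ss_error = sum((x - mean(xs)) ** 2 for xs in cells.values() for x in xs)
ss_diet = contrast_ss(diet)
ss_exercise = contrast_ss(exercise)
ss_interaction = contrast_ss(interaction)

# Whatever the four sources do not account for is the confounded portion.
ss_confounded = ss_total - (ss_diet + ss_exercise + ss_interaction + ss_error)

print(ss_diet, ss_exercise, ss_interaction)  # 390.625 180.625 15.625
print(ss_error, ss_total)                    # 315.75 1722.0
print(ss_confounded)                         # 819.375
```

Here the confounded sum of squares is obtained by subtraction from the total rather than calculated directly, which is enough to confirm that the four listed sources add up to 902.625 rather than 1722.000.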
Table 6. ANOVA Summary Table for Type I Sums of Squares (confounded sum of squares apportioned to Diet).
Source | df | SSQ | MS | F | p |
---|---|---|---|---|---|
Diet | 1 | 1210.000 | 1210.000 | 22.99 | 0.003 |
Exercise | 1 | 180.625 | 180.625 | 3.43 | 0.113 |
D x E | 1 | 15.625 | 15.625 | 0.30 | 0.605 |
Error | 6 | 315.750 | 52.625 | | |
Total | 9 | 1722.000 | | | |
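The Diet sum of squares of 1210.000 in the second summary table above can be reproduced by ignoring Exercise altogether and computing an ordinary between-groups sum of squares for the two diets. This is a sketch under the assumption that Diet is the effect given the confounded variance, which is what the jump from 390.625 to 1210.000 suggests. The variable names are illustrative.

```python
# Table 4 scores grouped by diet only, pooling over exercise.
diet_groups = {
    "Low Fat": [-20, -25, -30, -35, -20],
    "High Fat": [-15, 6, -6, 5, -10],
}


def mean(values):
    return sum(values) / len(values)


all_scores = [x for xs in diet_groups.values() for x in xs]
grand_mean = mean(all_scores)

# Between-groups sum of squares: n times the squared deviation of each
# weighted diet mean from the grand mean.
ss_diet_type1 = sum(
    len(xs) * (mean(xs) - grand_mean) ** 2 for xs in diet_groups.values()
)

print(ss_diet_type1)      # 1210.0
print(390.625 + 819.375)  # 1210.0
```

The result equals the Diet sum of squares from the first summary table plus the confounded sum of squares, so all of the variance shared by Diet and Exercise has been credited to Diet.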
Which Type of Sums of Squares to Use (optional)
Type I sums of squares allow the variance confounded between two main effects to be apportioned to one of the main effects. Unless there is a strong argument for how the confounded variance should be apportioned (which is rarely, if ever, the case), Type I sums of squares are not recommended.
There is no consensus about whether Type II or Type III sums of squares are to be preferred. On the one hand, if there is no interaction, then Type II sums of squares will be more powerful for two reasons: (1) variance confounded between the main effect and interaction is properly assigned to the main effect and (2) weighting the means by sample sizes gives better estimates of the effects. To take advantage of the greater power of Type II sums of squares, some have suggested that if the interaction is not significant, then Type II sums of squares should be used. Maxwell and Delaney (2003) caution that such an approach could result in a Type II error in the test of the interaction. That is, it could lead to the conclusion that there is no interaction in the population when there really is one. This, in turn, would increase the Type I error rate for the test of the main effect. As a result, their general recommendation is to use Type III sums of squares.
Maxwell and Delaney (2003) recognized that some researchers prefer Type II sums of squares when there are strong theoretical reasons to suspect a lack of interaction and the p value is much higher than the typical α level of 0.05. However, this argument for the use of Type II sums of squares is not entirely convincing. As Tukey (1991) and others have argued, it is doubtful that any effect, whether a main effect or an interaction, is exactly 0 in the population. Incidentally, Tukey argued that the role of significance testing is to determine whether a confident conclusion can be made about the direction of an effect, not simply to conclude that an effect is not exactly 0.
Finally, if one assumes that there is no interaction, then an ANOVA model with no interaction term should be used rather than Type II sums of squares in a model that includes an interaction term. (Models without interaction terms are not covered in this book.)
There are situations in which Type II sums of squares are justified even if there is strong interaction. This is the case because the hypotheses tested by Type II and Type III sums of squares are different, and the choice of which to use should be guided by which hypothesis is of interest. Recall that Type II sums of squares weight cells based on their sample sizes whereas Type III sums of squares weight all cells the same. Consider Figure 1 which shows data from a hypothetical A(2) x B(2) design. The sample sizes are shown numerically and are represented graphically by the areas of the endpoints.

First, let's consider the hypothesis for the main effect of B tested by the Type III sums of squares. Type III sums of squares weight the means equally and, for these data, the marginal means for
Thus, there is no main effect of B when tested using Type III sums of squares. For Type II sums of squares, the means are weighted by sample size.
Since the weighted marginal mean for