Hypothesis Testing with Two Samples

Site: Saylor Academy
Course: BUS204: Business Statistics
Book: Hypothesis Testing with Two Samples

Description

Read this chapter, which discusses how to compare data from two similar groups. This is useful when, for example, you want to analyze things like how someone's income relates to another sample that you are interested in. Make sure you read the introduction as well as sections 10.1 through 10.6. Attempt the practice problems and homework at the end of the chapter.

Introduction

Figure 10.1 If you want to test a claim that involves two groups (the types of breakfasts eaten east and west of the Mississippi River) you can use a slightly different technique when conducting a hypothesis test.

Studies often compare two groups. For example, researchers are interested in the effect aspirin has in preventing heart attacks. Over the last few years, newspapers and magazines have reported various aspirin studies involving two groups. Typically, one group is given aspirin and the other group is given a placebo. Then, the heart attack rate is studied over several years.

There are other situations that deal with the comparison of two groups. For example, studies compare various diet and exercise programs. Politicians compare the proportion of individuals from different income brackets who might vote for them. Students are interested in whether SAT or GRE preparatory courses really help raise their scores. Many business applications require comparing two groups. It may be the investment returns of two different investment strategies, or the differences in production efficiency of different management styles.

To compare two means or two proportions, you work with two groups. The groups are classified either as independent or matched pairs. Independent groups consist of two samples that are independent, that is, sample values selected from one population are not related in any way to sample values selected from the other population. Matched pairs consist of two samples that are dependent. The parameter tested using matched pairs is the population mean. The parameters tested using independent groups are either population means or population proportions of each group.


Source: OpenStax, https://openstax.org/books/introductory-business-statistics/pages/10-introduction
This work is licensed under a Creative Commons Attribution 4.0 License.

Comparing Two Independent Population Means

The comparison of two independent population means is very common and provides a way to test the hypothesis that the two groups differ from each other. Is the night shift less productive than the day shift? Are the rates of return from fixed asset investments different from those from common stock investments? An observed difference between two sample means depends on both the means and the sample standard deviations. Very different means can occur by chance if there is great variation among the individual samples. The test statistic will have to account for this fact. The test comparing two independent population means with unknown and possibly unequal population standard deviations is called the Aspin-Welch t-test. The degrees of freedom formula we will see later was developed by Aspin and Welch.

When we developed the hypothesis test for the mean and proportions we began with the Central Limit Theorem. We recognized that a sample mean came from a distribution of sample means, and sample proportions came from the sampling distribution of sample proportions. This made our sample parameters, the sample means and sample proportions, into random variables. It was important for us to know the distribution that these random variables came from. The Central Limit Theorem gave us the answer: the normal distribution. Our Z and t statistics came from this theorem. This provided us with the solution to our question of how to measure the probability that a sample mean came from a distribution with a particular hypothesized value of the mean or proportion. In both cases that was the question: what is the probability that the mean (or proportion) from our sample data came from a population distribution with the hypothesized value we are interested in?

Now we are interested in whether or not two samples have the same mean. Our question has not changed: Do these two samples come from the same population distribution? To approach this problem we create a new random variable. We recognize that we have two sample means, one from each set of data, and thus we have two random variables coming from two unknown distributions. To solve the problem we create a new random variable, the difference between the sample means. This new random variable also has a distribution and, again, the Central Limit Theorem tells us that this new distribution is normally distributed, regardless of the underlying distributions of the original data. A graph may help to understand this concept.

Figure 10.2

Pictured are two distributions of data, X1 and X2, with unknown means and standard deviations. The second panel shows the sampling distribution of the newly created random variable (\overline X_1−\overline X_2). This distribution is the theoretical distribution of many sample means from population 1 minus sample means from population 2. The Central Limit Theorem tells us that this theoretical sampling distribution of differences in sample means is normally distributed, regardless of the distribution of the actual population data shown in the top panel. Because the sampling distribution is normally distributed, we can develop a standardizing formula and calculate probabilities from the standard normal distribution in the bottom panel, the Z distribution.

The Central Limit Theorem, as before, provides us with the standard deviation of the sampling distribution, and further, that the expected value of the mean of the distribution of differences in sample means is equal to the differences in the population means. Mathematically this can be stated:

E(µ_{\overline x_1}−µ_{\overline x_2})=µ_1−µ_2

Because we do not know the population standard deviations, we estimate them using the two sample standard deviations from our independent samples. For the hypothesis test, we calculate the estimated standard deviation, or standard error, of the difference in sample means, \overline X_1−\overline X_2.

The standard error is:

\sqrt{\dfrac{(s_1)^2}{n_1}+\dfrac{(s_2)^2}{n_2}}

We remember that substituting the sample variance for the population variance when we did not have the population variance was the technique we used when building the confidence interval and the test statistic for the test of hypothesis for a single mean back in Confidence Intervals and Hypothesis Testing with One Sample. The test statistic (t-score) is calculated as follows:

t_c=\dfrac{(\overline x_1−\overline x_2)−δ_0}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}}

where:

  • s_1 and s_2, the sample standard deviations, are estimates of σ_1 and σ_2, respectively,
  • σ_1 and σ_2 are the unknown population standard deviations, and
  • \overline x_1 and \overline x_2 are the sample means; µ_1 and µ_2 are the unknown population means.

The number of degrees of freedom (df) requires a somewhat complicated calculation. The df are not always a whole number. The test statistic above is approximated by the Student's t-distribution with df as follows:

df=\dfrac{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^2}{\left(\dfrac{1}{n_1−1}\right)\left(\dfrac{s_1^2}{n_1}\right)^2+\left(\dfrac{1}{n_2−1}\right)\left(\dfrac{s_2^2}{n_2}\right)^2}

When both sample sizes n1 and n2 are 30 or larger, the Student's t approximation is very good. If each sample has more than 30 observations then the degrees of freedom can be calculated as n1 + n2 - 2.
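
These quantities are straightforward to compute directly from summary statistics. The sketch below is a minimal illustration, not part of the original chapter; it assumes Python with SciPy installed, and the function and variable names are our own. (SciPy's scipy.stats.ttest_ind_from_stats with equal_var=False carries out the same test from summary data.)

from math import sqrt
from scipy import stats

def welch_t(x1_bar, s1, n1, x2_bar, s2, n2, delta0=0.0):
    """Aspin-Welch t statistic and approximate degrees of freedom from summary statistics."""
    v1, v2 = s1**2 / n1, s2**2 / n2            # estimated variances of the two sample means
    t = (x1_bar - x2_bar - delta0) / sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

def two_tailed_p(t, df):
    """Two-tailed p-value for the computed t statistic."""
    return 2 * stats.t.sf(abs(t), df)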

Because the sampling distribution is of differences in sample means, the null and alternative hypotheses take the form:

H_0:µ_1−µ_2=δ_0

H_a:µ_1−µ_2≠δ_0

where δ_0 is the hypothesized difference between the two means. If the question is simply "is there any difference between the means?" then δ_0 = 0 and the null and alternative hypotheses become:

H_0:µ_1=µ_2

H_a:µ_1≠µ_2

An example of when δ_0 might not be zero is when the comparison of the two groups requires a specific difference for the decision to be meaningful. Imagine that you are making a capital investment. You are considering changing from your current model machine to another. You measure the productivity of your machines by the speed they produce the product. It may be that a contender to replace the old model is faster in terms of product throughput, but is also more expensive. The second machine may also have more maintenance costs, setup costs, etc. The null hypothesis would be set up so that the new machine would have to be better than the old one by enough to cover these extra costs in terms of speed and cost of production. This form of the null and alternative hypothesis shows how valuable this particular hypothesis test can be. For most of our work we will be testing simple hypotheses asking if there is any difference between the two distribution means.


Example 10.1

Independent groups
The Kona Iki Corporation produces coconut milk. They take coconuts and extract the milk inside by drilling a hole and pouring the milk into a vat for processing. They have both a day shift (called the B shift) and a night shift (called the G shift) to do this part of the process. They would like to know if the day shift and the night shift are equally efficient in processing the coconuts. A study is done sampling 9 shifts of the G shift and 16 shifts of the B shift. The resulting number of hours required to process 100 pounds of coconuts is presented in Table 10.1.

Sample Size Average Number of Hours to Process 100 Pounds of Coconuts Sample Standard Deviation
G Shift 9 2 0.866
B Shift 16 3.2 1.00

Table 10.1

Problem
Is there a difference in the mean amount of time for each shift to process 100 pounds of coconuts? Test at the 5% level of significance.

Solution 1
The population standard deviations are not known and cannot be assumed to equal each other. Let g be the subscript for the G Shift and b be the subscript for the B Shift. Then, μg is the population mean for G Shift and μb is the population mean for B Shift. This is a test of two independent groups, two population means.

Random variable: \overline X_g− \overline X_b = difference in the sample mean amount of time the G Shift and the B Shift take to process the coconuts.

H0: μg = μb  H0: μg – μb = 0
Ha: μg ≠ μb  Ha: μg – μb ≠ 0

The words "the same" tell you H0 has an "=". Since there are no other words to indicate Ha, is either faster or slower. This is a two tailed test.

Distribution for the test: Use tdf where df is calculated using the df formula for independent groups, two population means above. Using a calculator, df is approximately 18.8462.

Graph:

Figure 10.3

t_c=\dfrac{(\overline X_1−\overline X_2)−δ_0}{\sqrt{\dfrac{S^2_1}{n_1}+\dfrac{S^2_2}{n_2}}} = −3.01

We next find the critical value on the t-table using the degrees of freedom from above. The critical value, 2.093, is found in the 0.025 column (this is α/2) at 19 degrees of freedom. (The convention is to round up the degrees of freedom to make the conclusion more conservative.) Next we calculate the test statistic and mark it on the t-distribution graph.

Make a decision: Since the calculated t-value is in the tail, we cannot accept the null hypothesis that there is no difference between the two groups. The means are different.

The graph has included the sampling distribution of the differences in the sample means to show how the t-distribution aligns with the sampling distribution data. We see in the top panel that the calculated difference in the two means is -1.2 and the bottom panel shows that this is 3.01 standard deviations from the mean. Typically we do not need to show the sampling distribution graph and can rely on the graph of the test statistic, the t-distribution in this case, to reach our conclusion.

Conclusion: At the 5% level of significance, the sample data show there is sufficient evidence to conclude that the mean number of hours that the G Shift takes to process 100 pounds of coconuts is different from the B Shift (mean number of hours for the B Shift is greater than the mean number of hours for the G Shift).
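
As a rough check of this example, assuming Python with SciPy (variable names are ours; the exact t value depends on how the summary statistics in Table 10.1 were rounded):

from math import sqrt
from scipy import stats

# Summary statistics from Table 10.1
x_g, s_g, n_g = 2.0, 0.866, 9     # G Shift
x_b, s_b, n_b = 3.2, 1.00, 16     # B Shift

v_g, v_b = s_g**2 / n_g, s_b**2 / n_b
t = (x_g - x_b) / sqrt(v_g + v_b)
df = (v_g + v_b)**2 / (v_g**2 / (n_g - 1) + v_b**2 / (n_b - 1))

print(round(df, 2))                       # about 18.85, rounded up to 19 for the table
print(round(stats.t.ppf(0.975, 19), 3))   # 2.093, the critical value at alpha/2 = 0.025
print(round(t, 2))                        # negative and well beyond -2.093, so in the rejection region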

NOTE
When the sum of the sample sizes is larger than 30 (n1 + n2 > 30) you can use the normal distribution to approximate the Student's t.


Example 10.2
A study is done to determine if Company A retains its workers longer than Company B. It is believed that Company A has a higher retention than Company B. The study finds that in a sample of 11 workers at Company A their average time with the company is four years with a standard deviation of 1.5 years. A sample of 9 workers at Company B finds that the average time with the company was 3.5 years with a standard deviation of 1 year. Test this proposition at the 1% level of significance.

Problem
a. Is this a test of two means or two proportions?

Solution 1
a. two means because time is a continuous random variable.

Problem
b. Are the population standard deviations known or unknown?

Solution 2
b. unknown

Problem
c. Which distribution do you use to perform the test?

Solution 3
c. Student's t

Problem
d. What is the random variable?

Solution 4
d. \overline X_A− \overline X_B

Problem
e. What are the null and alternate hypotheses?

Solution 5
e.
  • H_o:μ_A≤μ_B
  • H_a:μ_A>μ_B
Problem
f. Is this test right-, left-, or two-tailed?

Solution 6
f. right one-tailed test

Figure 10.4

Problem
g. What is the value of the test statistic?

Solution 7
g. t_c=\dfrac{(\overline X_1− \overline X_2)−δ_0}{\sqrt{\dfrac{S^2_1}{n_1}+\dfrac{S^2_2}{n_2}}}=0.89

Problem
h. Can you accept/reject the null hypothesis?

Solution 8
h. Cannot reject the null hypothesis that there is no difference between the two groups. Test statistic is not in the tail. The critical value of the t-distribution is 2.764 with 10 degrees of freedom. This example shows how difficult it is to reject a null hypothesis with a very small sample. The critical values require very large test statistics to reach the tail.

Problem
i. Conclusion:

Solution 9
i. At the 1% level of significance, from the sample data, there is not sufficient evidence to conclude that the retention of workers at Company A is longer than Company B, on average.
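
A quick check of these figures, assuming Python with SciPy (variable names are ours):

from math import sqrt
from scipy import stats

# Example 10.2: Company A vs. Company B retention (years)
x_a, s_a, n_a = 4.0, 1.5, 11
x_b, s_b, n_b = 3.5, 1.0, 9

t = (x_a - x_b) / sqrt(s_a**2 / n_a + s_b**2 / n_b)
print(round(t, 2))                       # about 0.89
print(round(stats.t.ppf(0.99, 10), 3))   # about 2.764, the 1% right-tail critical value at 10 df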


Example 10.3
Problem
An interesting research question is the effect, if any, that different types of teaching formats have on the grade outcomes of students. To investigate this issue one sample of students' grades was taken from a hybrid class and another sample taken from a standard lecture format class. Both classes were for the same subject. The mean course grade in percent for the 35 hybrid students is 74 with a standard deviation of 16. The mean grade of the 40 students from the standard lecture class was 76 percent with a standard deviation of 9. Test at 5% to see if there is any significant difference in the population mean grades between the standard lecture course and the hybrid class.

Solution 1
We begin by noting that we have two groups, students from a hybrid class and students from a standard lecture format class. We also note that the random variable, what we are interested in, is students' grades, a continuous random variable. We could have asked the research question in a different way and had a binary random variable. For example, we could have studied the percentage of students with a failing grade, or with an A grade. Both of these would be binary and thus a test of proportions and not a test of means as is the case here. Finally, there is no presumption as to which format might lead to higher grades so the hypothesis is stated as a two-tailed test.

H_0: µ_1 = µ_2
H_a: µ_1 ≠ µ_2

As would virtually always be the case, we do not know the population variances of the two distributions and thus our test statistic is:

t_c=\dfrac{(\overline X_1− \overline X_2)−δ_0}{\sqrt{\dfrac{S^2_1}{n_1}+\dfrac{S^2_2}{n_2}}}= \dfrac{(74−76)−0}{\sqrt{\dfrac{16^2}{35}+\dfrac{9^2}{40}}}=−0.65

To determine the critical value of the Student's t we need the degrees of freedom. For this case we use: df = n1 + n2 − 2 = 35 + 40 − 2 = 73. This is large enough to treat the distribution as normal, thus t_{α/2} = 1.96. Again, as always, we determine whether the calculated value is in the tail marked off by the critical value. In this case we do not even need to look up the critical value: the calculated value shows that the difference in these two average grades is less than one standard error from zero, certainly not in the tail.


Conclusion: Cannot reject the null at α=5%. Therefore, evidence does not exist to prove that the grades in hybrid and standard classes differ.
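
A quick check of this calculation, assuming Python with SciPy (variable names are ours):

from math import sqrt
from scipy import stats

# Example 10.3: hybrid class (1) vs. standard lecture class (2)
x1, s1, n1 = 74, 16, 35
x2, s2, n2 = 76, 9, 40

t = (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)
df = n1 + n2 - 2
print(round(t, 2), df)                   # about -0.65 with 73 degrees of freedom
print(round(stats.norm.ppf(0.975), 2))   # about 1.96, the two-tailed critical value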

Cohen's Standards for Small, Medium, and Large Effect Sizes

Cohen's d is a measure of "effect size" based on the differences between two means. Cohen's d, named for United States statistician Jacob Cohen, measures the relative strength of the differences between the means of two populations based on sample data. The calculated value of effect size is then compared to Cohen's standards of small, medium, and large effect sizes.

Size of effect d
Small 0.2
Medium 0.5
Large 0.8

Table 10.2 Cohen's Standard Effect Sizes

Cohen's d is the measure of the difference between two means divided by the pooled standard deviation: d=\dfrac{\overline x_1−\overline x_2}{s_{pooled}}

where s_{pooled}=\sqrt{\dfrac{(n_1−1)s^2_1+(n_2−1)s^2_2}{n_1+n_2−2}}

It is important to note that Cohen's d does not provide a level of confidence as to the magnitude of the size of the effect comparable to the other tests of hypothesis we have studied. The sizes of the effects are simply indicative.


Example 10.4
Problem
Calculate Cohen's d for example 10.2. Is the size of the effect small, medium, or large? Explain what the size of the effect means for this problem.

Solution 1

\overline x_1 = 4 s_1 = 1.5 n_1 = 11

\overline x_2 = 3.5 s_2 = 1 n_2 = 9

d = 0.384

The effect is small because 0.384 is between Cohen's value of 0.2 for small effect size and 0.5 for medium effect size. The size of the differences of the means for the two companies is small indicating that there is not a significant difference between them.
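
A sketch of this calculation in Python (variable names are ours), reusing the summary statistics from Example 10.2:

from math import sqrt

# Example 10.4: Cohen's d from the Example 10.2 summary statistics
x1, s1, n1 = 4.0, 1.5, 11
x2, s2, n2 = 3.5, 1.0, 9

s_pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (x1 - x2) / s_pooled
print(round(d, 3))   # about 0.384, a small effect by Cohen's standards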

Test for Differences in Means: Assuming Equal Population Variances

Typically we can never expect to know any of the population parameters, mean, proportion, or standard deviation. When testing hypotheses concerning differences in means we are faced with the difficulty of two unknown variances that play a critical role in the test statistic. We have been substituting the sample variances just as we did when testing hypotheses for a single mean. And as we did before, we used a Student's t to compensate for this lack of information on the population variance. There may be situations, however, when we do not know the population variances, but we can assume that the two populations have the same variance. If this is true then the pooled sample variance will be smaller than the individual sample variances. This will give more precise estimates and reduce the probability of discarding a good null. The null and alternative hypotheses remain the same, but the test statistic changes to:

t_c=\dfrac{(\overline x_1−\overline x_2)−δ_0}{\sqrt{S_p^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}

where S_p^2 is the pooled variance given by the formula:

S_p^2=\dfrac{(n_1−1)s^2_1+(n_2−1)s^2_2}{n_1+n_2−2}

Example 10.5

Problem
A drug trial is attempted using a real drug and a pill made of just sugar. 18 people are given the real drug in hopes of increasing the production of endorphins. The increase in endorphins is found to be on average 8 micrograms per person, and the sample standard deviation is 5.4 micrograms. 11 people are given the sugar pill, and their average endorphin increase is 4 micrograms with a standard deviation of 2.4. From previous research on endorphins it is determined that the variances of the two populations can be assumed to be equal. Test at 5% to see if the population mean for the real drug had a significantly greater impact on the endorphins than the population mean with the sugar pill.

Solution 1
First we begin by designating one of the two groups Group 1 and the other Group 2. This will be needed to keep track of the null and alternative hypotheses. Let's set Group 1 as those who received the actual new medicine being tested and therefore Group 2 is those who received the sugar pill. We can now set up the null and alternative hypothesis as:

H0: µ1 ≤ µ2
H1: µ1 > µ2

This is set up as a one-tailed test with the claim in the alternative hypothesis that the medicine will produce more endorphins than the sugar pill. We now calculate the test statistic which requires us to calculate the pooled variance, S_p^2 using the formula above.

 t_c=\dfrac{(\overline x_1−\overline x _2)−δ_0}{\sqrt{S_p^2(\dfrac{1}{n_1}+\dfrac{1}{n_2})}} =\dfrac{(8−4)−0}{\sqrt{20.4933(\dfrac{1}{18}+\dfrac{1}{11})}}=2.31

We then compare the test statistic with the critical value, t_α:

t_α=1.703 at df=n_1+n_2−2=18+11−2=27

The test statistic is clearly in the tail, 2.31 is larger than the critical value of 1.703, and therefore we cannot maintain the null hypothesis. Thus, we conclude that there is significant evidence at the 95% level of confidence that the new medicine produces the effect desired.
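
A rough check of the pooled variance and the test statistic, assuming Python with SciPy (variable names are ours):

from math import sqrt
from scipy import stats

# Example 10.5: real drug (1) vs. sugar pill (2), equal population variances assumed
x1, s1, n1 = 8.0, 5.4, 18
x2, s2, n2 = 4.0, 2.4, 11

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # pooled variance
t = (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))
print(round(sp2, 4), round(t, 2))                # about 20.4933 and 2.31
print(round(stats.t.ppf(0.95, n1 + n2 - 2), 3))  # about 1.703, the right-tail critical value at 27 df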

Comparing Two Independent Population Proportions

When conducting a hypothesis test that compares two independent population proportions, the following characteristics should be present:

  1. The two independent samples are random samples that are independent.
  2. The number of successes is at least five, and the number of failures is at least five, for each of the samples.
  3. Growing literature states that the population must be at least ten or even perhaps 20 times the size of the sample. This keeps each population from being over-sampled and causing biased results.

Comparing two proportions, like comparing two means, is common. If two estimated proportions are different, it may be due to a difference in the populations or it may be due to chance in the sampling. A hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the two population proportions.

Like the case of differences in sample means, we construct a sampling distribution for differences in sample proportions: (p′_A−p′_B), where p′_A=\dfrac{x_A}{n_A} and p′_B=\dfrac{x_B}{n_B} are the sample proportions for the two sets of data in question. x_A and x_B are the number of successes in each sample group respectively, and n_A and n_B are the respective sample sizes from the two groups. Again we go to the Central Limit Theorem to find the distribution of this sampling distribution for the differences in sample proportions. And again we find that this sampling distribution, like the ones before, is normally distributed, as proved by the Central Limit Theorem and seen in Figure 10.5.

Figure 10.5

Generally, the null hypothesis allows for the test of a difference of a particular value, δ_0, just as we did for the case of differences in means.

H_0:p_1−p_2=δ_0

H_1:p_1−p_2≠δ_0

Most common, however, is the test that the two proportions are the same. That is,

H_0:p_A=p_B

H_a:p_A≠p_B

To conduct the test, we use a pooled proportion, pc.

The pooled proportion is calculated as follows:

pc=\dfrac{x_A+x_B}{n_A+n_B}

The test statistic (z-score) is:

Z_c=\dfrac{(p′_A−p′_B)−δ_0}{\sqrt{p_c(1−p_c)\left(\dfrac{1}{n_A}+\dfrac{1}{n_B}\right)}}

where δ_0 is the hypothesized difference between the two proportions and p_c is the pooled proportion from the formula above.
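
A minimal sketch of this test, assuming Python with SciPy (the function and variable names are our own), using the data from Example 10.6 below:

from math import sqrt
from scipy import stats

def two_proportion_z(x_a, n_a, x_b, n_b, delta0=0.0):
    """Pooled two-proportion z statistic, as described above."""
    p_a, p_b = x_a / n_a, x_b / n_b
    p_c = (x_a + x_b) / (n_a + n_b)        # pooled proportion
    return (p_a - p_b - delta0) / sqrt(p_c * (1 - p_c) * (1 / n_a + 1 / n_b))

z = two_proportion_z(20, 200, 12, 200)      # defaults in area A vs. area B
print(round(z, 2))                          # about 1.47
print(round(2 * stats.norm.sf(abs(z)), 2))  # two-tailed p-value, about 0.14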


Example 10.6

Problem
A bank has recently acquired a new branch and thus has customers in this new territory. They are interested in the default rate in their new territory. They wish to test the hypothesis that the default rate is different from their current customer base. They sample 200 files in area A, their current customers, and find that 20 have defaulted. In area B, the new customers, another sample of 200 files shows 12 have defaulted on their loans. At a 10% level of significance can we say that the default rates are the same or different?

Solution 1
This is a test of proportions. We know this because the underlying random variable is binary, default or not default. Further, we know it is a test of differences in proportions because we have two sample groups, the current customer base and the newly acquired customer base. Let A and B be the subscripts for the two customer groups. Then pA and pB are the two population proportions we wish to test.

Random Variable:
P′_A – P′_B = difference in the proportions of customers who defaulted in the two groups.

H_0:p_A=p_B

H_a:p_A≠p_B

The words "is a difference" tell you the test is two-tailed.

Distribution for the test: Since this is a test of two binomial population proportions, the distribution is normal:

p_c=\dfrac{x_A+x_B}{n_A+n_B}=\dfrac{20+12}{200+200}=0.08 \qquad 1−p_c=0.92

(p′_A – p′_B) = 0.04 follows an approximate normal distribution.

Estimated proportion for group A: p′_A=\dfrac{x_A}{n_A}=\dfrac{20}{200}=0.1

Estimated proportion for group B: p′_B=\dfrac{x_B}{n_B}=\dfrac{12}{200}=0.06

The estimated difference between the two groups is : p′_A – p′_B = 0.1 – 0.06 = 0.04.

Figure 10.6

 Z_c=\frac{\left(\mathrm{P}_A^{\prime}-\mathrm{P}_B^{\prime}\right)-\delta_0}{\sqrt{P_c\left(1-P_c\right)\left(\frac{1}{n_A}+\frac{1}{n_B}\right)}}=1.47

The calculated test statistic is 1.47 and is not in the tail of the distribution.

Make a decision: Since the calculated test statistic is not in the tail of the distribution we cannot reject H0.

Conclusion: At a 10% level of significance, from the sample data, there is not sufficient evidence to conclude that there is a difference between the proportions of customers who defaulted in the two groups.


Try It 10.6

Two types of valves are being tested to determine if there is a difference in pressure tolerances. Fifteen out of a random sample of 100 of Valve A cracked under 4,500 psi. Six out of a random sample of 100 of Valve B cracked under 4,500 psi. Test at a 5% level of significance.

Two Population Means with Known Standard Deviations

Even though this situation is not likely (knowing the population standard deviations is very unlikely), the following example illustrates hypothesis testing for independent means with known population standard deviations. The sampling distribution for the difference between the means is normal in accordance with the central limit theorem. The random variable is \overline X_1 – \overline X_2. The normal distribution has the following format:

The standard deviation is:

\sqrt{\dfrac{(σ_1)^2}{n_1}+\dfrac{(σ_2)^2}{n_2}}

The test statistic (z-score) is:

Z_c=\dfrac{(\overline x_1– \overline x_2)–δ_0}{\sqrt{\dfrac{(σ_1)^2}{n_1}+\dfrac{(σ_2)^2}{n_2}}}

Example 10.7

Independent groups, population standard deviations known: The mean lasting time of two competing floor waxes is to be compared. Twenty floors are randomly assigned to test each wax. Both populations have normal distributions. The data are recorded in Table 10.3.

Wax Sample mean number of months floor wax lasts Population standard deviation
1 3 0.33
2 2.9 0.36

Table 10.3

Problem
Do the data indicate that wax 1 is more effective than wax 2? Test at a 5% level of significance.

Solution 1
This is a test of two independent groups, two population means, population standard deviations known.

Random Variable: \overline X_1 – \overline X_2 = difference in the mean number of months the competing floor waxes last.

H_0:μ_1≤μ_2

H_a:μ_1>μ_2

The words "is more effective" says that wax 1 lasts longer than wax 2, on average. "Longer" is a ">" symbol and goes into Ha. Therefore, this is a right-tailed test.

Distribution for the test: The population standard deviations are known so the distribution is normal. Using the formula for the test statistic we find the calculated value for the problem.

Z_c=\dfrac{(\overline x_1−\overline x_2)−δ_0}{\sqrt{\dfrac{σ^2_1}{n_1}+\dfrac{σ^2_2}{n_2}}}=\dfrac{(3−2.9)−0}{\sqrt{\dfrac{0.33^2}{20}+\dfrac{0.36^2}{20}}}≈0.92

Figure 10.7

The estimated difference between the two means is: \overline X_1 – \overline X_2 = 3 – 2.9 = 0.1

Compare the calculated value with the critical value, Z_α: We mark the calculated value on the graph and find that it is not in the tail; therefore, we cannot reject the null hypothesis.

Make a decision: the calculated value of the test statistic is not in the tail, therefore you cannot reject H0.

Conclusion: At the 5% level of significance, from the sample data, there is not sufficient evidence to conclude that the mean time wax 1 lasts is longer (wax 1 is more effective) than the mean time wax 2 lasts.
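
A quick check of this calculation, assuming Python with SciPy (variable names are ours):

from math import sqrt
from scipy import stats

# Example 10.7: wax 1 vs. wax 2, population standard deviations known
x1, sigma1, n1 = 3.0, 0.33, 20
x2, sigma2, n2 = 2.9, 0.36, 20

z = (x1 - x2) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)
print(round(z, 2))                      # about 0.92
print(round(stats.norm.ppf(0.95), 3))   # about 1.645, the right-tail critical value at 5%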


Try It 10.7
The means of the number of revolutions per minute of two competing engines are to be compared. Thirty engines of each type are randomly assigned to be tested. Both populations have normal distributions. Table 10.4 shows the result. Do the data indicate that Engine 2 has higher RPM than Engine 1? Test at a 5% level of significance.

Engine Sample mean number of RPM Population standard deviation
1 1,500 50
2 1,600 60

Table 10.4


Example 10.8
An interested citizen wanted to know if Democratic U.S. senators are older than Republican U.S. senators, on average. On May 26, 2013, the mean age of 30 randomly selected Republican senators was 61 years 247 days old (61.675 years) with a standard deviation of 10.17 years. The mean age of 30 randomly selected Democratic senators was 61 years 257 days old (61.704 years) with a standard deviation of 9.55 years.

Problem
Do the data indicate that Democratic senators are older than Republican senators, on average? Test at a 5% level of significance.

Solution 1
This is a test of two independent groups, two population means. The population standard deviations are unknown, but the sum of the sample sizes is 30 + 30 = 60, which is greater than 30, so we can use the normal approximation to the Student’s-t distribution. Subscripts: 1: Democratic senators 2: Republican senators

Random variable: \overline X_1 – \overline X_2= difference in the mean age of Democratic and Republican U.S. senators.

H_0:μ_1≤μ_2  H_0:μ_1−μ_2≤0

H_a:μ_1 > μ_2 H_a:μ_1−μ_2 > 0

The words "older than" translates as a ">" symbol and goes into Ha. Therefore, this is a right-tailed test.

Figure 10.8

Make a decision: The p-value is larger than 5%, therefore we cannot reject the null hypothesis. By calculating the test statistic we would find that it does not fall in the tail, so again we cannot reject the null hypothesis. We reach the same conclusion using either method of making this statistical decision.

Conclusion: At the 5% level of significance, from the sample data, there is not sufficient evidence to conclude that the mean age of Democratic senators is greater than the mean age of the Republican senators.
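
For reference, the test statistic and the p-value referred to in the decision can be computed from the summary statistics; a rough sketch in Python with SciPy (variable names are ours):

from math import sqrt
from scipy import stats

# Example 10.8: Democratic senators (1) vs. Republican senators (2), normal approximation
x1, s1, n1 = 61.704, 9.55, 30
x2, s2, n2 = 61.675, 10.17, 30

z = (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)
p_value = stats.norm.sf(z)               # right-tailed p-value
print(round(z, 3), round(p_value, 4))    # test statistic near 0, p-value near 0.5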

Matched or Paired Samples

In most cases of economic or business data we have little or no control over the process of how the data are gathered. In this sense the data are not the result of a planned controlled experiment. In some cases, however, we can develop data that are part of a controlled experiment. This situation occurs frequently in quality control settings. Imagine that the production rates of two machines built to the same design, but at different manufacturing plants, are being tested for differences in some production metric such as speed of output or meeting some production specification such as strength of the product. The test is the same in format as what we have been doing, but here we can form matched pairs and test whether differences exist. Each observation has its matched pair against which differences are calculated. First, the differences in the metric to be tested between the two lists of observations must be calculated, and this is typically labeled with the letter "d". Then, the average of these matched differences, \overline X_d, is calculated, as is its standard deviation, S_d. We expect that the standard deviation of the differences of the matched pairs will be smaller than for unmatched pairs because the correlation between the two groups removes some of the variation.

When using a hypothesis test for matched or paired samples, the following characteristics may be present:

  1. Simple random sampling is used.
  2. Sample sizes are often small.
  3. Two measurements (samples) are drawn from the same pair of individuals or objects.
  4. Differences are calculated from the matched or paired samples.
  5. The differences form the sample that is used for the hypothesis test.
  6. Either the matched pairs have differences that come from a population that is normal or the number of differences is sufficiently large so that distribution of the sample mean of differences is approximately normal.

In a hypothesis test for matched or paired samples, subjects are matched in pairs and differences are calculated. The differences are the data. The population mean for the differences, μd, is then tested using a Student's-t test for a single population mean with n – 1 degrees of freedom, where n is the number of differences, that is, the number of pairs not the number of observations.

The null and alternative hypotheses for this test are:

H_0:µ_d=0

H_a:µ_d≠0

The test statistic is:

t_c=\dfrac{\overline x_d−μ_d}{\dfrac{s_d}{\sqrt{n}}}


Example 10.9

Problem
A company has developed a training program for its entering employees because they have become concerned with the results of the six-month employee review. They hope that the training program can result in better six-month reviews. Each trainee constitutes a "pair": the score the employee received when first entering the firm and the score given at the six-month review. The difference in the two scores was calculated for each employee, and the means for before and after the training program were calculated. The sample mean before the training program was 20.4 and the sample mean after the training program was 23.9. The standard deviation of the differences in the two scores across the 20 employees was 3.8 points. Test at the 10% significance level the null hypothesis that the two population means are equal against the alternative that the training program helps improve the employees' scores.

Solution 1
The first step is to identify this as a two sample case: before the training and after the training. This differentiates this problem from simple one sample issues. Second, we determine that the two samples are "paired". Each observation in the first sample has a paired observation in the second sample. This information tells us that the null and alternative hypotheses should be:

H_0:µ_d≤0

H_a:µ_d > 0

This form reflects the implied claim that the training course improves scores; the test is one-tailed and the claim is in the alternative hypothesis. Because the experiment was conducted as a matched paired sample, rather than simply comparing scores from people who took the training course with scores from those who didn't, we use the matched pair test statistic:

Test Statistic: t_c=\dfrac{\overline X_d−µ_d}{\dfrac{S_d}{\sqrt n}} =\dfrac{(23.9−20.4)−0}{(\dfrac{3.8}{\sqrt{20}})}=4.12

In order to solve this equation, the individual scores, pre-training course and post-training course need to be used to calculate the individual differences. These scores are then averaged and the average difference is calculated:

\overline X_d= \overline x_1−\overline x_2

From these differences we can calculate the standard deviation across the individual differences:

S_d=\sqrt{\dfrac{Σ(d_i−\overline X_d)^2}{n−1}} where d_i=x_{1i}−x_{2i}

We can now compare the calculated value of the test statistic, 4.12, with the critical value. The critical value is a Student's t with degrees of freedom equal to the number of pairs, not observations, minus 1. In this case, with 20 pairs, df = 20 − 1 = 19 and the critical value is 1.729. The calculated test statistic is most certainly in the tail of the distribution and thus we cannot accept the null hypothesis that the training program has no effect. The evidence seems to indicate that the training helps employees gain higher scores.
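
A quick check of the test statistic and the critical value, assuming Python with SciPy (variable names are ours):

from math import sqrt
from scipy import stats

# Example 10.9: mean difference in review scores across 20 matched employees
x_d, s_d, n = 23.9 - 20.4, 3.8, 20

t = x_d / (s_d / sqrt(n))
print(round(t, 2))                          # about 4.12
print(round(stats.t.ppf(0.95, n - 1), 3))   # about 1.729, the critical value cited above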


Example 10.10

Problem

A study was conducted to investigate the effectiveness of hypnotism in reducing pain. Results for randomly selected subjects are shown in Table 10.5. A lower score indicates less pain. The "before" value is matched to an "after" value and the differences are calculated. Are the sensory measurements, on average, lower after hypnotism? Test at a 5% significance level.

Subject: A B C D E F G H
Before 6.6 6.5 9.0 10.3 11.3 8.1 6.3 11.6
After 6.8 2.4 7.4 8.5 8.1 6.1 3.4 2.0

Table 10.5

Solution 1
Corresponding "before" and "after" values form matched pairs. (Calculate "after" – "before").

After data Before data Difference
6.8 6.6 0.2
2.4 6.5 -4.1
7.4 9 -1.6
8.5 10.3 -1.8
8.1 11.3 -3.2
6.1 8.1 -2
3.4 6.3 -2.9
2 11.6 -9.6

Table 10.6

The data for the test are the differences: {0.2, –4.1, –1.6, –1.8, –3.2, –2, –2.9, –9.6}

The sample mean and sample standard deviation of the differences are \overline x_d = –3.13 and s_d = 2.91. Verify these values.
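
These values can be verified with a few lines of Python (variable names are ours):

import statistics

# Differences (after - before) for the eight subjects in Table 10.6
diffs = [0.2, -4.1, -1.6, -1.8, -3.2, -2.0, -2.9, -9.6]

print(round(statistics.mean(diffs), 3))   # -3.125, reported above as -3.13
print(round(statistics.stdev(diffs), 2))  # about 2.91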

Let μ_d be the population mean for the differences. We use the subscript d to denote "differences".

Random variable: \overline X_d = the mean difference of the sensory measurements

H_0: μ_d ≥ 0

The null hypothesis is zero or positive, meaning that there is the same or more pain felt after hypnotism; that is, the subject shows no improvement. (μ_d is the population mean of the differences.)

H_a: μ_d < 0

The alternative hypothesis is negative, meaning there is less pain felt after hypnotism. That means the subject shows improvement. The score should be lower after hypnotism, so the difference ought to be negative to indicate improvement.

Distribution for the test: The distribution is a Student's t with df = n – 1 = 8 – 1 = 7. Use t7. (Notice that the test is for a single population mean).

Calculate the test statistic and look up the critical value using the Student's-t distribution: The calculated value of the test statistic is −3.06, and the critical value of the t-distribution with 7 degrees of freedom at the 5% level of significance is −1.895 for this left-tailed test.
Figure 10.9 Normal distribution curve of the average difference of the sensory measurements, with a vertical line at −3.13 and the p-value indicated in the area to its left.


Compare the critical value for alpha against the calculated test statistic.

The comparison of the calculated test statistic with the critical value gives us the result. In this question the calculated test statistic is −3.06 and the critical value is −1.895. The test statistic is clearly in the left tail, and thus we cannot accept the null hypothesis that there is no difference between the two situations, hypnotized and not hypnotized.

Make a decision: Cannot accept the null hypothesis, H0. This means that μd < 0 and there is a statistically significant improvement.

Conclusion: At a 5% level of significance, from the sample data, there is sufficient evidence to conclude that the sensory measurements, on average, are lower after hypnotism. Hypnotism appears to be effective in reducing pain.


Example 10.11
A college football coach was interested in whether the college's strength development class increased his players' maximum lift (in pounds) on the bench press exercise. He asked four of his players to participate in a study. The amount of weight they could each lift was recorded before they took the strength development class. After completing the class, the amount of weight they could each lift was again measured. The data are as follows:

Weight (in pounds) Player 1 Player 2 Player 3 Player 4
Amount of weight lifted prior to the class 205 241 338 368
Amount of weight lifted after the class 295 252 330 360

Table 10.7

The coach wants to know if the strength development class makes his players stronger, on average.

Record the differences data. Calculate the differences by subtracting the amount of weight lifted prior to the class from the weight lifted after completing the class. The data for the differences are: {90, 11, -8, -8}.

\overline x_d= 21.3, s_d = 46.7

Using the difference data, this becomes a test of a single mean.

Define the random variable: \overline X_d mean difference in the maximum lift per player.

The distribution for the hypothesis test is a Student's t with 3 degrees of freedom.

H_0: μ_d ≤ 0, H_a: μ_d > 0

Figure 10.10
Figure 10.10

Calculate the test statistic and look up the critical value: The calculated value of the test statistic is 0.91. The critical value of the Student's t at the 5% level of significance and 3 degrees of freedom is 2.353.

Decision: If the level of significance is 5%, we cannot reject the null hypothesis, because the calculated value of the test statistic is not in the tail.

What is the conclusion?
At a 5% level of significance, from the sample data, there is not sufficient evidence to conclude that the strength development class helped to make the players stronger, on average.
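
A quick check of these figures, assuming Python with SciPy (variable names are ours):

from math import sqrt
import statistics
from scipy import stats

# Example 10.11: differences (after - before) for the four players
diffs = [90, 11, -8, -8]

x_d = statistics.mean(diffs)            # 21.25, reported above as 21.3
s_d = statistics.stdev(diffs)           # about 46.7
t = x_d / (s_d / sqrt(len(diffs)))
print(round(t, 2))                      # about 0.91
print(round(stats.t.ppf(0.95, 3), 3))   # about 2.353, the critical value at 3 df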