## The Observed Significance of a Test

This section explains what the observed significance of a test is, including how to compute and use it in the p-value approach.

### The p-value Approach to Hypothesis Testing

In Note 8.27 "Example 4" in Section 8.2 "Large Sample Tests for a Population Mean", the test was performed at the 5% level of significance: the definition of a "rare" event was probability $\alpha=0.05$ or less. We saw above that the observed significance of the test was $p=0.0294$, or about 3%. Since $p=0.0294$ (3%) is less than 5%, the decision turned out to be to reject: what was observed was sufficiently unlikely to qualify as an event so rare as to be regarded as (practically) incompatible with $H_0$.

In Note 8.28 "Example 5" in Section 8.2 "Large Sample Tests for a Population Mean", the test was performed at the 1% level of significance: the definition of a "rare" event was probability $\alpha=0.01$ or less. The observed significance of the test was computed in Note 8.34 "Example 6" as $p=0.0128$, or about 1.3%. Since $p=0.0128 > 0.01 = \alpha$ (1.3% is greater than 1%), the decision turned out to be not to reject. The event observed was unlikely, but not sufficiently unlikely to lead to rejection of the null hypothesis.

The reasoning just presented is the basis for a slightly different but equivalent formulation of the hypothesis testing process. The first three steps are the same as before, but instead of using $\alpha$ to compute critical values and construct a rejection region, one computes the $p$-value $p$ of the test and compares it to $\alpha$, rejecting $H_0$ if $p \leq \alpha$ and not rejecting it if $p > \alpha$.
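The equivalence of the two formulations can be checked numerically. The Python sketch below assumes a left-tailed large-sample $z$ test and uses the standard library's `statistics.NormalDist` in place of a printed normal table; the function names and the test statistics tried are hypothetical. Because the cumulative standard normal probability increases with $z$, $P(Z \le z) \le \alpha$ holds exactly when $z$ is at or below the critical value $-z_{\alpha}$, so both rules reach the same decision.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, standard deviation 1


def reject_by_critical_value(z: float, alpha: float) -> bool:
    """Left-tailed test: reject H0 when z lies in the rejection region z <= -z_alpha."""
    critical = STD_NORMAL.inv_cdf(alpha)  # equals -z_alpha for a left-tailed test
    return z <= critical


def reject_by_p_value(z: float, alpha: float) -> bool:
    """Left-tailed test: reject H0 when the p-value P(Z <= z) is at most alpha."""
    p = STD_NORMAL.cdf(z)  # area of the left tail cut off by z
    return p <= alpha


# Hypothetical test statistics and significance levels; the two rules agree on each pair.
for z in (-3.0, -2.5, -1.55, -1.0, 0.0, 1.2):
    for alpha in (0.01, 0.05, 0.10):
        assert reject_by_critical_value(z, alpha) == reject_by_p_value(z, alpha)
print("critical-value and p-value decisions agree")
```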

#### Systematic Hypothesis Testing Procedure: $p$-Value Approach

1. Identify the null and alternative hypotheses.
2. Identify the relevant test statistic and its distribution.
3. Compute from the data the value of the test statistic.
4. Compute the $p$-value of the test.
5. Compare the value computed in Step 4 to the significance level $\alpha$ and make a decision: reject $H_0$ if $p \leq \alpha$ and do not reject $H_0$ if $p > \alpha$. Formulate the decision in the context of the problem, if applicable.
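Steps 4 and 5 can be sketched in Python as follows, assuming a large-sample $z$ test so that the $p$-value is a standard normal tail area. The names `z_test_p_value` and `decide` and the `tail` labels are hypothetical, and the exact cumulative probability from the standard library's `statistics.NormalDist` stands in for the table lookup.

```python
from statistics import NormalDist


def z_test_p_value(z: float, tail: str) -> float:
    """Step 4: p-value of a z test statistic.

    tail is "left" (Ha: mu < mu0), "right" (Ha: mu > mu0) or "two" (Ha: mu != mu0).
    """
    phi = NormalDist().cdf  # cumulative standard normal probability
    if tail == "left":
        return phi(z)  # area of the left tail cut off by z
    if tail == "right":
        return 1 - phi(z)  # area of the right tail cut off by z
    if tail == "two":
        return 2 * (1 - phi(abs(z)))  # double the tail area beyond |z|
    raise ValueError(f"unknown tail: {tail!r}")


def decide(p: float, alpha: float) -> str:
    """Step 5: reject H0 if p <= alpha, otherwise do not reject."""
    return "reject H0" if p <= alpha else "do not reject H0"
```

For instance, `decide(z_test_p_value(-1.55, "left"), 0.05)` returns `"do not reject H0"`, since the left-tail area cut off by $-1.55$ is about 0.0606.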

#### Example 7

The total score in a professional basketball game is the sum of the scores of the two teams. An expert commentator claims that the average total score for NBA games is 202.5. A fan suspects that this is an overstatement and that the actual average is less than 202.5. He selects a random sample of 85 games and obtains a mean total score of 199.2 with standard deviation 19.63. Determine, at the 5% level of significance, whether there is sufficient evidence in the sample to reject the expert commentator's claim.

#### Solution:

• Step 1. Let $\mu$ be the true average total game score of all NBA games. The relevant test is

$$
\begin{aligned}
H_0: \mu &= 202.5 \\
\text{vs. } H_a: \mu &< 202.5 \quad @\ \alpha = 0.05
\end{aligned}
$$

• Step 2. The sample is large and the population standard deviation is unknown. Thus the test statistic is

$Z=\frac{\bar{x}-\mu_{0}}{s / \sqrt{n}}$

and has the standard normal distribution.

• Step 3. Inserting the data into the formula for the test statistic gives

$Z=\frac{\bar{x}-\mu_{0}}{s / \sqrt{n}}=\frac{199.2-202.5}{19.63 / \sqrt{85}}=-1.55$

• Step 4. The area of the left tail cut off by $z=-1.55$ is, by Figure 12.2 "Cumulative Normal Probability", 0.0606, as illustrated in Figure 8.8. Since the test is left-tailed, the $p$-value is just this number, $p=0.0606$.

• Step 5. Since $p=0.0606 > 0.05=\alpha$, the decision is not to reject $H_{0}$. In the context of the problem our conclusion is:

The data do not provide sufficient evidence, at the 5% level of significance, to conclude that the average total score of NBA games is less than 202.5.

Figure 8.8: Test Statistic for Note 8.36 "Example 7"
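Steps 3 to 5 of Example 7 can be reproduced in a few lines of Python. This is a sketch rather than the textbook's method: it uses the exact standard normal cumulative probability from `statistics.NormalDist` and the unrounded test statistic instead of a table lookup at $z=-1.55$, so digits beyond the fourth decimal place may differ from a table.

```python
from math import sqrt
from statistics import NormalDist

# Data from Example 7
mu0 = 202.5   # hypothesized mean total score (the commentator's claim)
xbar = 199.2  # sample mean total score
s = 19.63     # sample standard deviation
n = 85        # number of games sampled
alpha = 0.05  # significance level

z = (xbar - mu0) / (s / sqrt(n))  # Step 3: value of the test statistic
p = NormalDist().cdf(z)           # Step 4: left-tailed test, so p = P(Z <= z)
reject = p <= alpha               # Step 5: reject H0 only if p <= alpha

print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Run as-is, it prints `z = -1.55, p = 0.0606, reject H0: False`, matching the decision not to reject $H_0$.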

#### Example 8

Mr. Prospero has been teaching Algebra II from a particular textbook at Remote Isle High School for many years. Over the years students in his Algebra II classes have consistently scored an average of 67 on the end-of-course (EOC) exam. This year Mr. Prospero used a new textbook in the hope that the average score on the EOC exam would be higher. The EOC exam scores of the 64 students who took Algebra II from Mr. Prospero this year had mean 69.4 and sample standard deviation 6.1. Determine whether these data provide sufficient evidence, at the 1% level of significance, to conclude that the average EOC exam score is higher with the new textbook.

#### Solution:

• Step 1. Let $\mu$ be the true average score on the EOC exam of all of Mr. Prospero's students who take the Algebra II course with the new textbook. The natural statement that would be assumed true unless there were strong evidence to the contrary is that the new book is about the same as the old one. The alternative, which it takes evidence to establish, is that the new book is better, which corresponds to a higher value of $\mu$. Thus the relevant test is

$$
\begin{aligned}
H_0: \mu &= 67 \\
\text{vs. } H_a: \mu &> 67 \quad @\ \alpha = 0.01
\end{aligned}
$$

• Step 2. The sample is large and the population standard deviation is unknown. Thus the test statistic is

$Z=\frac{\bar{x}-\mu_{0}}{s / \sqrt{n}}$

and has the standard normal distribution.

• Step 3. Inserting the data into the formula for the test statistic gives

$Z=\frac{\bar{x}-\mu_{0}}{s / \sqrt{n}}=\frac{69.4-67}{6.1 / \sqrt{64}}=3.15$

• Step 4. The area of the right tail cut off by $z=3.15$ is, by Figure 12.2 "Cumulative Normal Probability", $1-0.9992=0.0008$, as shown in Figure 8.9. Since the test is right-tailed, the $p$-value is just this number, $p=0.0008$.

• Step 5. Since $p=0.0008 < 0.01=\alpha$, the decision is to reject $H_{0}$. In the context of the problem our conclusion is:

The data provide sufficient evidence, at the 1% level of significance, to conclude that the average EOC exam score of students taking the Algebra II course from Mr. Prospero using the new book is higher than the average score of those taking the course from him but using the old book.

Figure 8.9: Test Statistic for Note 8.37 "Example 8"
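The same computation for Example 8, again as a Python sketch that uses `statistics.NormalDist` and the unrounded test statistic in place of the table; the only change from the left-tailed case is that the $p$-value is now the right-tail area $1 - P(Z \le z)$.

```python
from math import sqrt
from statistics import NormalDist

# Data from Example 8
mu0 = 67      # long-run average EOC score with the old textbook
xbar = 69.4   # sample mean this year
s = 6.1       # sample standard deviation
n = 64        # number of students
alpha = 0.01  # significance level

z = (xbar - mu0) / (s / sqrt(n))  # Step 3: value of the test statistic
p = 1 - NormalDist().cdf(z)       # Step 4: right-tailed test, so p = P(Z >= z)
reject = p <= alpha               # Step 5: reject H0 only if p <= alpha

print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Run as-is, it prints `z = 3.15, p = 0.0008, reject H0: True`, matching the decision to reject $H_0$.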

#### Example 9

For the surface water in a particular lake, local environmental scientists would like to maintain an average pH level at 7.4. Water samples are routinely collected to monitor the average pH level. If there is evidence of a shift in pH value in either direction, then remedial action will be taken. On a particular day 30 water samples are taken and yield an average pH reading of 7.7 with sample standard deviation 0.5. Determine, at the 1% level of significance, whether there is sufficient evidence in the sample to indicate that remedial action should be taken.

#### Solution:

• Step 1. Let $\mu$ be the true average $\mathrm{pH}$ level at the time the samples were taken. The relevant test is

$$
\begin{aligned}
H_0: \mu &= 7.4 \\
\text{vs. } H_a: \mu &\neq 7.4 \quad @\ \alpha = 0.01
\end{aligned}
$$

• Step 2. The sample is large and the population standard deviation is unknown. Thus the test statistic is

$Z=\frac{\bar{x}-\mu_{0}}{s / \sqrt{n}}$

and has the standard normal distribution.

• Step 3. Inserting the data into the formula for the test statistic gives

$Z=\frac{\bar{x}-\mu_{0}}{s / \sqrt{n}}=\frac{7.7-7.4}{0.5 / \sqrt{30}}=3.29$

• Step 4. The area of the right tail cut off by $z=3.29$ is, by Figure 12.2 "Cumulative Normal Probability", $1-0.9995=0.0005$, as illustrated in Figure 8.10. Since the test is two-tailed, the $p$-value is twice this number, $p=2 \times 0.0005=0.0010$.

• Step 5. Since $p=0.0010 < 0.01=\alpha$, the decision is to reject $H_{0}$. In the context of the problem our conclusion is:

The data provide sufficient evidence, at the 1% level of significance, to conclude that the average pH of surface water in the lake is different from 7.4. That is, remedial action is indicated.

Figure 8.10: Test Statistic for Note 8.38 "Example 9"
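For the two-tailed test of Example 9 the $p$-value doubles the tail area beyond $|z|$. The Python sketch below shows this, with the same assumptions as before: `statistics.NormalDist` replaces the table, and the test statistic is left unrounded.

```python
from math import sqrt
from statistics import NormalDist

# Data from Example 9
mu0 = 7.4     # target average pH
xbar = 7.7    # sample mean pH
s = 0.5       # sample standard deviation
n = 30        # number of water samples
alpha = 0.01  # significance level

z = (xbar - mu0) / (s / sqrt(n))     # Step 3: value of the test statistic
p = 2 * (1 - NormalDist().cdf(abs(z)))  # Step 4: two-tailed, so double the tail area
reject = p <= alpha                  # Step 5: reject H0 only if p <= alpha

print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Run as-is, it prints `z = 3.29, p = 0.0010, reject H0: True`, so remedial action is indicated, as in the solution.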