MA121 Study Guide


Unit 5: Hypothesis Test

5a. Differentiate between type I and type II errors, and find the probability of these errors

  • What is hypothesis testing? How is it related to confidence intervals?
  • What is a null and alternate hypothesis?
  • What is an error in the context of hypothesis testing? What is the difference between a Type I and Type II error? How do they relate to each other? Which is more serious?
  • How are the Type I and Type II errors calculated or determined?

Hypothesis testing is a form of inferential statistics similar to confidence intervals. We have a null hypothesis (H0), which we assume to be true by default, and an alternative hypothesis (H1 or Ha), for which we can find, or fail to find, supporting evidence based on sample data.

We have three types of alternative hypotheses:

  1. A right-tailed test has an alternative hypothesis that the parameter is greater than the null value,
  2. A left-tailed test has an alternative hypothesis that the parameter is less than the null value,
  3. A two-tailed test checks for both higher and lower values. While this may seem more convenient, the downside of a two-tailed test is that it reduces the power of the test, making it less likely to detect a change from the null hypothesis.

The conclusion of a hypothesis test is to reject, or fail to reject, the null hypothesis. In other words, we assume the null is true, then we either find evidence (based on the p-value) that it is false, or fail to find such evidence.

You can think about this situation as a jury trial. The null hypothesis is innocence, and the prosecutor tries to get a guilty verdict by providing evidence which makes the null hypothesis unlikely if all that evidence is true.

No hypothesis test is perfect, nor can it produce results that are absolutely, 100 percent reliable. This is because the samples are not completely like the population.

  • A Type I Error results when the null hypothesis is incorrectly rejected. For our jury example, this means the jury just convicted an innocent defendant.
  • A Type II Error is the opposite: failing to detect a difference from the null and incorrectly failing to reject the null hypothesis. For our jury example, the jury failed to convict a guilty defendant.

Remember: the term error does not necessarily mean the researcher made a mistake in their calculation. An error in statistics occurs when we do not have access to the population, and we may, by random chance, get a sample that does not properly represent the population.

For example, a drug study may show that a drug is ineffective simply because a large percentage of the sample had a genetic trait that made the drug less effective for them. With a more representative sample, the drug would have shown as effective. A mistake did not cause the error; the sample was unusual. By random chance, the researcher simply chose a group that was less helped by the drug.

Type I and Type II errors are related in that, all other things being equal, they are inversely related. The researcher chooses the Type I error rate (𝛂) in advance. The Type II error rate (𝜷) is then calculated based on possible alternative values for the mean, or whatever else is being estimated. When you decrease one, all else being equal, you inevitably increase the other.

Calculating a Type II error is beyond the scope of this course. However, this relationship shows why researchers do not simply choose a tiny value for the Type I error rate: doing so would inflate the Type II error, making it harder to detect a true difference from the null. Which error is more serious depends on the situation.
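Although the full calculation is beyond the scope of the course, a small numerical sketch can make the inverse relationship concrete. The scenario below (a right-tailed z-test of a mean, with made-up values for the null mean, a hypothetical true mean, the standard deviation, and the sample size) shows that shrinking 𝛂 inflates 𝜷:

```python
# Illustrative sketch only: compute beta (Type II error) for a
# right-tailed z-test of a mean at a hypothetical true mean, and
# watch beta grow as alpha shrinks. All numbers are invented.
from statistics import NormalDist

mu0 = 100.0      # null-hypothesis mean
mu_true = 104.0  # hypothetical true mean under the alternative
sigma = 10.0     # assumed known population standard deviation
n = 25           # sample size
se = sigma / n ** 0.5  # standard error of the sample mean

def beta(alpha):
    """P(fail to reject H0) when the true mean is mu_true."""
    # Rejection region: sample means above this cutoff
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Beta = chance the sample mean still falls below the cutoff
    return NormalDist(mu_true, se).cdf(cutoff)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}  ->  beta = {beta(alpha):.3f}")
```

Running this shows beta climbing as alpha drops: demanding stronger evidence before rejecting the null makes it harder to detect a real difference.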

For example, in our jury trial scenario, we want to err on the side of not sending an innocent person to prison. Since convicting an innocent defendant is a Type I error, we would consider the Type I error more serious, and thus lower 𝛂, accepting the chance that this raises 𝜷.

If, in a drug study, the null hypothesis is that a drug is safe for consumption, a Type II error would mean failing to detect that the drug is dangerous, so that error would be more serious. In this case, you might choose a larger value for alpha, even though it increases the probability that a safe drug will be rejected.



5b. Describe and conduct hypothesis testing, calculate the p-value, and accept or reject the null hypothesis

  • What is the p-value in hypothesis testing, and what does it represent?
  • What does the p-value tell you about whether to reject or fail to reject the null hypothesis?

The p-value of a hypothesis test is the key to its conclusion. The p-value is the probability of obtaining a sample value equal to, or more extreme than (see the note below), the one we got, assuming the null hypothesis is true.

A very low p-value (let's say 0.005) means that "if the defendant really is innocent, the probability that we could have obtained the blood and DNA evidence we did is extremely small". This is why a smaller p-value will cause us to reject the null hypothesis.

A very large p-value (usually greater than 0.10) means, for example: "we assume by default the drug is ineffective ... there is a 10 percent chance we could have gotten the results we did even if the drug is ineffective". This is much less impressive, and would lead us to fail to reject the null hypothesis, since we have not found enough evidence that the drug is effective.

The proper cutoff (below which we reject the null, and above which we fail to reject) is subjective. The standard is 0.05, but it can be as low as 0.01 for a more stringent test, or as high as 0.10 for a more lenient one. See the discussion above of cases where a Type I or Type II error is more serious.

Note: The definition of extreme depends on whether we are conducting a right-tail test (the probability of a result higher than the result we got), a left-tail test (lower result), or a two-tail test (higher OR lower).
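The note above can be sketched numerically. Using an example test statistic of 1.85 (an invented value) and the standard normal distribution from Python's standard library, the three tail directions give three different p-values:

```python
# Illustrative sketch: how the tail direction changes the p-value
# for the same test statistic. The value of z is invented.
from statistics import NormalDist

z = 1.85                       # example test statistic
phi = NormalDist().cdf         # standard normal CDF

p_right = 1 - phi(z)           # right-tailed: P(Z > z)
p_left = phi(z)                # left-tailed:  P(Z < z)
p_two = 2 * (1 - phi(abs(z)))  # two-tailed:   P(Z > |z|) + P(Z < -|z|)

print(f"right-tailed p = {p_right:.4f}")
print(f"left-tailed  p = {p_left:.4f}")
print(f"two-tailed   p = {p_two:.4f}")
```

Note that the two-tailed p-value is exactly double the right-tailed one here, which is why a two-tailed test makes it harder to reach a given cutoff.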



5c. Explain how to conduct hypothesis tests for a single population mean and population proportion, when the population standard deviation is unknown; perform this task; and interpret the results

  • How do you know when to use the z or t distributions in a hypothesis test for the mean? What about for a proportion? 
  • Under what circumstances could neither z, nor t be used?

You would use either the Z or T distribution based on the same criteria you would use to generate a confidence interval. If you know the population standard deviation and have a reasonably large sample size, you use the Z distribution form of the given equations. If the population standard deviation is unknown or the sample size is small, you use T.

Remember that the Central Limit Theorem still applies. In other words, if the sample size is small, you MUST have a normally distributed population, or else you cannot use either Z or T.

The steps for performing a p-value test are:

  1. Decide or know what value of Type I error (𝛂) you will use.
  2. Use the appropriate formula to calculate the test statistic (or test value). The correct formula is determined by the parameter you are testing (mean, proportion, etc.) and, within each, which distribution you are using (Z or T test for the mean).
  3. Use technology or a distribution table to look up the probability of getting a value greater than the test value (right-tailed), less than the test value (left-tailed), or the combination of higher and lower (two-tailed). If you are running a two-tailed test, for example, and you get a test value of 1.85, you want to find p(Z > 1.85) + p(Z < −1.85). This probability is your p-value.
  4. Compare the p-value to alpha. If it is lower, reject the null hypothesis; if it is higher, fail to reject.



Unit 5 Vocabulary

  • Alternative hypothesis
  • Central Limit Theorem
  • Error
  • Fail to reject the null hypothesis
  • Left-tailed test
  • Null hypothesis
  • P-value
  • Reject the null hypothesis
  • Right-tailed test
  • Test statistic
  • Two-tailed test
  • Type I Error
  • Type II Error