Hypothesis Testing
Site: | Saylor Academy |
Course: | CS250: Python for Data Science |
Book: | Hypothesis Testing |
Description
Alongside confidence intervals, hypothesis testing is another way to make statistical inferences. The process weighs two opposing hypotheses about a given data set (referred to as the null hypothesis and the alternative hypothesis). Hypothesis testing determines whether the sample data provide enough evidence to reject the null hypothesis.
Null and Alternative Hypotheses
The actual test begins by considering two hypotheses. They are called the null hypothesis and the alternative hypothesis. These hypotheses contain opposing viewpoints.
H0, the null hypothesis: a statement of no difference between sample means or proportions, or no difference between a sample mean or proportion and a population mean or proportion. In other words, the difference equals 0.
Ha, the alternative hypothesis: a claim about the population that is contradictory to H0 and what we conclude when we reject H0.
Since the null and alternative hypotheses are contradictory, you must examine the evidence, in the form of sample data, to decide whether it is strong enough to reject the null hypothesis.
After you have determined which hypothesis the sample supports, you make a decision. There are two options: reject H0 if the sample information favors the alternative hypothesis, or do not reject H0 (also phrased as decline to reject H0) if the sample information is insufficient to reject the null hypothesis.
Mathematical Symbols Used in H0 and Ha:
H0 | Ha |
---|---|
equal (=) | not equal (≠) or greater than (>) or less than (<) |
greater than or equal to (≥) | less than (<) |
less than or equal to (≤) | greater than (>) |
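The pairings in the table can be sketched as a small lookup. This is only an illustration: the names `ALTERNATIVES` and `is_valid_pair` are hypothetical and do not come from the course material.

```python
# Allowed pairings from the table: symbol used in H0 -> symbols permitted in Ha.
ALTERNATIVES = {
    "=": ("≠", ">", "<"),
    "≥": ("<",),
    "≤": (">",),
}

def is_valid_pair(h0_symbol, ha_symbol):
    """Return True if ha_symbol is an allowed alternative for h0_symbol."""
    return ha_symbol in ALTERNATIVES.get(h0_symbol, ())

print(is_valid_pair("≥", "<"))  # e.g. H0: μ ≥ 5 paired with Ha: μ < 5
print(is_valid_pair("≤", "<"))  # these two statements do not contradict each other
```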
Example 9.1
Try It 9.1
Example 9.2
H0 : μ = 2.0
Try It 9.2
- H0 : μ __ 66
- Ha : μ __ 66
Example 9.3
H0 : μ ≥ 5
Try It 9.3
- H0 : μ __ 45
- Ha : μ __ 45
Example 9.4
H0 : p ≤ 0.066
Try It 9.4
- H0 : p __ 0.40
- Ha : p __ 0.40
Collaborative Exercise
Source: OpenStax, https://openstax.org/books/statistics/pages/9-introduction
This work is licensed under a Creative Commons Attribution 4.0 License.
Outcomes and the Type I and Type II Errors
When you perform a hypothesis test, there are four possible outcomes depending on the actual truth, or falseness, of the null hypothesis H0 and the decision to reject or not. The outcomes are summarized in the following table:
Action | H0 is actually true | H0 is actually false |
---|---|---|
Do not reject H0 | Correct outcome | Type II error |
Reject H0 | Type I error | Correct outcome |
Table 9.2
The four possible outcomes in the table are as follows:
- The decision is not to reject H0 when H0 is true (correct decision).
- The decision is to reject H0 when, in fact, H0 is true (incorrect decision known as a Type I error).
- The decision is not to reject H0 when, in fact, H0 is false (incorrect decision known as a Type II error).
- The decision is to reject H0 when H0 is false (correct decision whose probability is called the Power of the Test).
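The four outcomes can be written as a short function, which may help when labelling simulated decisions. The function name `classify_outcome` and its arguments are hypothetical, chosen only for this sketch.

```python
def classify_outcome(h0_is_true, reject_h0):
    """Name the cell of the outcome table for one decision."""
    if reject_h0:
        # Rejecting a true H0 is a Type I error; rejecting a false H0 is correct.
        return "Type I error" if h0_is_true else "Correct outcome"
    # Not rejecting a true H0 is correct; not rejecting a false H0 is a Type II error.
    return "Correct outcome" if h0_is_true else "Type II error"

for h0_is_true in (True, False):
    for reject_h0 in (False, True):
        print(h0_is_true, reject_h0, classify_outcome(h0_is_true, reject_h0))
```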
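Each of the errors occurs with a particular probability, denoted by the Greek letters α and β. α = probability of a Type I error = P(Type I error) = probability of rejecting the null hypothesis when the null hypothesis is true. β = probability of a Type II error = P(Type II error) = probability of not rejecting the null hypothesis when the null hypothesis is false. α and β should be as small as possible because they are probabilities of errors. They are rarely zero.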
The Power of the Test is 1 – β. Ideally, we want a high power that is as close to one as possible. Increasing the sample size can increase the Power of the Test.
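To see how sample size affects power, here is a minimal sketch for one specific case: a right-tailed z-test (Ha: μ > μ0) with a known population standard deviation. The function name `z_test_power` and all numeric values are assumptions made up for illustration. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Power (1 - beta) of a right-tailed z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)      # reject H0 when z exceeds this value
    shift = (mu_true - mu0) * sqrt(n) / sigma   # mean of the z statistic under mu_true
    return 1 - std_normal.cdf(z_crit - shift)   # P(reject H0 | mu = mu_true)

# Hypothetical values: H0 mean 5.0, true mean 5.5, sigma 1.5.
for n in (10, 30, 100):
    print(n, round(z_test_power(mu0=5.0, mu_true=5.5, sigma=1.5, n=n), 3))
```

With the true mean held fixed above μ0, the printed power rises as n grows. When the true mean equals μ0, the same formula returns α, the Type I error probability.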
Example 9.5
Suppose the null hypothesis, H0, is: Frank's rock climbing equipment is safe.
Type I error: Frank does not go rock climbing because he considers that the equipment is not safe, when in fact, the equipment is really safe. Frank is making the mistake of rejecting the null hypothesis, when the equipment is actually safe!
Type II error: Frank goes climbing, thinking that his equipment is safe, but this is a mistake, and he painfully realizes that his equipment is not as safe as it should have been. Frank assumed that the null hypothesis was true, when it was not.
α = probability that Frank thinks his rock climbing equipment may not be safe when, in fact, it really is safe. β = probability that Frank thinks his rock climbing equipment may be safe when, in fact, it is not safe.
Try It 9.5
Example 9.6
Suppose the null hypothesis, H0, is: a tomato plant is alive when a class visits the school garden.
Type I error: The null hypothesis claims that the tomato plant is alive, and it is true, but the students make the mistake of thinking that the plant is already dead.
Type II error: The tomato plant is already dead (the null hypothesis is false), but the students do not notice it, and believe that the tomato plant is alive.
α = probability that the class thinks the tomato plant is dead when, in fact, it is alive = P(Type I error). β = probability that the class thinks the tomato plant is alive when, in fact, it is dead = P(Type II error).
Try It 9.6
Example 9.7
It's a Boy Genetic Labs, a genetics company, claims to be able to increase the likelihood that a pregnancy will result in a boy being born. Statisticians want to test the claim. Suppose that the null hypothesis, H0, is: It's a Boy Genetic Labs has no effect on gender outcome.
Type I error: This error results when a true null hypothesis is rejected. In the context of this scenario, we would state that we believe that It's a Boy Genetic Labs influences the gender outcome, when in fact it has no effect. The probability of this error occurring is denoted by the Greek letter alpha, α.
Type II error: This error results when we fail to reject a false null hypothesis. In context, we would state that It's a Boy Genetic Labs does not influence the gender outcome of a pregnancy when, in fact, it does. The probability of this error occurring is denoted by the Greek letter beta, β.
Try It 9.7
Example 9.8
A certain experimental drug claims a cure rate of at least 75 percent for males with a disease. Describe both the Type I and Type II errors in context. Which error is the more serious?
Type I: A patient believes the cure rate for the drug is less than 75 percent when it actually is at least 75 percent.
Type II: A patient believes the experimental drug has at least a 75 percent cure rate when it has a cure rate that is less than 75 percent.
Try It 9.8
Determine both Type I and Type II errors for the following scenario:
Assume a null hypothesis, H0, that states the percentage of adults with jobs is at least 88 percent.
Identify the Type I and Type II errors from these four possible choices.
- Not to reject the null hypothesis that the percentage of adults who have jobs is at least 88 percent when that percentage is actually less than 88 percent
- Not to reject the null hypothesis that the percentage of adults who have jobs is at least 88 percent when the percentage is actually at least 88 percent
- Reject the null hypothesis that the percentage of adults who have jobs is at least 88 percent when the percentage is actually at least 88 percent
- Reject the null hypothesis that the percentage of adults who have jobs is at least 88 percent when that percentage is actually less than 88 percent
Distribution Needed for Hypothesis Testing
Earlier in the course, we discussed sampling distributions. Particular distributions are associated with hypothesis testing. We perform tests of a population mean using a normal distribution or a Student's t-distribution. (Remember, use a Student's t-distribution when the population standard deviation is unknown and the distribution of the sample mean is approximately normal.) We perform tests of a population proportion using a normal distribution (usually when the sample size n is large).
Assumptions
When you perform a hypothesis test of a single population mean μ using a Student's t-distribution (often called a t-test), there are fundamental assumptions that need to be met in order for the test to work properly. Your data should be a simple random sample that comes from a population that is approximately normally distributed. You use the sample standard deviation to approximate the population standard deviation. Note that if the sample size is sufficiently large, a t-test will work even if the population is not approximately normally distributed.
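As a rough illustration of the t-test setup, the sketch below computes the test statistic t = (x̄ − μ0) / (s / √n) and its degrees of freedom, n − 1, using only the standard library. The function name `t_statistic` and the sample values are made up. Turning t into a probability requires the Student's t-distribution, which the standard library does not provide; if SciPy is installed, `scipy.stats.ttest_1samp` is one option.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for a test of H0: mu = mu0."""
    n = len(sample)
    s = stdev(sample)                          # sample standard deviation (n - 1 divisor)
    t = (mean(sample) - mu0) / (s / sqrt(n))   # s stands in for the unknown population sigma
    return t, n - 1

sample = [2.1, 1.9, 2.4, 2.2, 1.8, 2.3, 2.0, 2.5]  # made-up data
t, df = t_statistic(sample, mu0=2.0)
print(f"t = {t:.3f}, df = {df}")
```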
When you perform a hypothesis test of a single population mean μ using a normal distribution (often called a z-test), you take a simple random sample from the population. The population you are testing is normally distributed or your sample size is sufficiently large. You know the value of the population standard deviation which, in reality, is rarely known.
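A z-test can be sketched with the standard library alone, because the normal distribution is available as `statistics.NormalDist`. The function `z_test`, its `tail` argument, and the data are hypothetical. The sketch also returns a p-value, which this section has not yet defined: here it means the probability, assuming H0 is true, of a z statistic at least as extreme as the one observed.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test(sample, mu0, sigma, tail="two-sided"):
    """Return (z, p_value) for a one-sample z-test with known population sigma."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    cdf = NormalDist().cdf(z)
    if tail == "left":        # Ha: mu < mu0
        p_value = cdf
    elif tail == "right":     # Ha: mu > mu0
        p_value = 1 - cdf
    else:                     # Ha: mu != mu0
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value

sample = [5.4, 5.1, 5.9, 5.6, 4.8, 5.7, 5.3, 5.5, 5.2]  # made-up data
z, p_value = z_test(sample, mu0=5.0, sigma=0.5, tail="right")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```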
When you perform a hypothesis test of a single population proportion p, you take a simple random sample from the population. You must meet the conditions for a binomial distribution, which are the following: there are a certain number n of independent trials, the outcomes of any trial are success or failure, and each trial has the same probability of a success p. The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must both be greater than five (np > 5 and nq > 5). Then the binomial distribution of a sample (estimated) proportion can be approximated by the normal distribution with μ = p and σ = √(pq/n). Remember that q = 1 – p.
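The check for the normal approximation can be sketched as follows, where n is the number of trials, p is the probability of success, and q = 1 – p. The function name `normal_approx_for_proportion` and the example values are assumptions for illustration only.

```python
from math import sqrt

def normal_approx_for_proportion(n, p):
    """Check np > 5 and nq > 5, then return (mu, sigma) for the sample proportion."""
    q = 1 - p
    if n * p <= 5 or n * q <= 5:
        raise ValueError("np and nq must both be greater than 5")
    return p, sqrt(p * q / n)   # mu = p, sigma = sqrt(pq / n)

mu, sigma = normal_approx_for_proportion(n=100, p=0.40)
print(mu, round(sigma, 4))
```

Raising an error when a condition fails is one possible design choice. It keeps a caller from using the normal approximation where the binomial shape is too skewed.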