BUS204 Study Guide

Unit 5: Estimation and Hypothesis Testing

5a. Estimate intervals over which the population parameter could exist

  • What is a point estimate, and how is it used to calculate a confidence interval?
  • What is the purpose of a confidence interval?
  • What is meant by, say, a 90% confidence interval as opposed to a 99% confidence interval? If we can choose, why not a 100% confidence interval? What would that require?
  • Is a confidence interval always symmetric?

Inferential statistics are procedures used to infer properties of the population (parameters) from a sample (statistics) of the data. 

A confidence interval is a range of values in which the population parameter is likely to occur. For many statistics, it is made up of the sample statistic plus or minus some margin of error.

Confidence levels can vary, and typically we will refer to 90%, 95%, or 99% confidence intervals. Technically speaking, a 95% confidence interval means that if we repeatedly took samples of a particular size from the population and computed a confidence interval from each sample, then in the long run about 95% of those intervals would contain the true population mean. Some people interpret a confidence interval as "there's a 95% probability that the population mean is between x and y". That is pretty close in practice and probably good enough for our purposes, but not technically true: the population mean is fixed, and it is the interval that varies from sample to sample.
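The repeated-sampling interpretation can be illustrated with a small simulation. This is only a sketch: the population mean, standard deviation, and sample size below are made-up values, and the population standard deviation is treated as known so the z critical value applies.

```python
import random
from statistics import NormalDist, mean

random.seed(1)
MU, SIGMA, N = 100.0, 15.0, 40      # hypothetical population parameters and sample size
z = NormalDist().inv_cdf(0.975)     # ~1.96, the two-tailed 95% z critical value

covered = 0
trials = 2000
for _ in range(trials):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = mean(sample)
    margin = z * SIGMA / N ** 0.5   # sigma known, so the z-distribution is used
    if xbar - margin <= MU <= xbar + margin:
        covered += 1

coverage = covered / trials         # close to 0.95 in the long run
```

Running this, the fraction of intervals that capture the true mean comes out near 0.95, which is exactly what "95% confidence" refers to.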

The first step is to designate a point estimate (such as a sample mean) and then (for means and proportions) calculate a margin of error; adding that to and subtracting it from the point estimate gives the endpoints of your confidence interval.

For other parameters, such as the variance, the sampling distribution is not symmetric, so the low and high ends of the interval are not equally distant from the point estimate and must be calculated separately.
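As a sketch of the symmetric case, here is how a 95% confidence interval for a mean might be computed from summary statistics. The sample numbers are hypothetical, and the t critical value is taken from a standard t-table rather than computed:

```python
import math

def mean_confidence_interval(xbar, s, n, t_crit):
    """Point estimate plus or minus the margin of error: xbar +/- t * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample: mean 50, standard deviation 8, n = 36.
# 2.030 is the two-tailed 95% t critical value for 35 degrees of freedom
# (read from a t-table).
low, high = mean_confidence_interval(50.0, 8.0, 36, t_crit=2.030)
```

Note that the interval is symmetric: the point estimate 50 sits exactly halfway between the two endpoints.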

To review, see Computing Confidence Intervals and Confidence Intervals.

 

5b. Determine and differentiate between the null and alternative hypotheses in hypothesis testing

  • What is the difference between a null and an alternative hypothesis? Which one are we trying to prove is correct/incorrect?
  • What are the three directions that an alternative hypothesis can run in?

A null hypothesis is the default statement in running a hypothesis test. It is signified using the symbol H0 and an equals sign. It is believed to be true unless there is evidence (in the form of a p-value or critical/test value pair) that it is not.

An alternate hypothesis ( H_a ) is what the tester is trying to support in order to show that the null hypothesis is false. It is generally stated as being not equal to (≠), less than (<), or greater than (>) the value in the null hypothesis.

A good analogy is the adage "innocent until proven guilty". A person is assumed innocent ( H_0 ) until proven guilty ( H_a ). "Proof of innocence" is not a standard, and in the same way, we can never prove the null hypothesis true.

A hypothesis test is right-tailed if the alternate hypothesis is greater than the null value, left-tailed if the alternate hypothesis is less than the null value, and two-tailed if the alternate hypothesis is simply "not equal to" the null value. By "different", we mean significantly different. For example, you might have a null hypothesis where the mean is 100, and the test data might give you a sample mean of 100.5. That is different, but in the context of hypothesis testing, we would ask, "Is the difference large enough that it cannot reasonably be due to random sampling error?" A two-tailed test is a combination of greater than and less than, so the p-value doubles the right or left tail area.
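The three directions can be made concrete by computing the tail area for each one. This sketch assumes a standard normal test statistic (z-distributions are covered in the next objective) and a made-up test value of 1.8:

```python
from statistics import NormalDist

z_test = 1.8                  # hypothetical test statistic
std_normal = NormalDist()     # standard normal: mean 0, standard deviation 1

p_right = 1 - std_normal.cdf(z_test)             # right-tailed: H_a uses >
p_left = std_normal.cdf(z_test)                  # left-tailed: H_a uses <
p_two = 2 * (1 - std_normal.cdf(abs(z_test)))    # two-tailed: H_a uses !=, tail area doubled
```

The two-tailed p-value is exactly twice the right-tail area here, which is the doubling the text describes.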

To review, see Hypothesis Testing with One Sample.

 

5c. Identify when to use the z- and t-distributions and use these distributions to find probabilities

  • How do you decide whether to use a z-distribution or a t-distribution? What is the difference in the distributions and the processes used? Is more information needed when using a t-distribution? When can you not use either one?
  • If the sample size is small, what must be true about the population distribution regardless of what test you use?
  • What significance does the Central Limit Theorem have in hypothesis testing?
  • What is a degree of freedom, and where is it used?

When running a hypothesis test for the mean of a single population, part of the process involves finding a test value and possibly a critical value.

The standard normal (z) distribution or the "Student's t" (or just t) distribution is used, depending on two factors: whether the population's standard deviation  \sigma is known and what the sample size is. The rule of thumb is to use the t-distribution whenever the standard deviation is unknown and the sample size is small. The generally accepted definition of "small" is less than 30, but you can get away with a slightly smaller sample size if you are certain that the population distribution is normal.

Recall the Central Limit Theorem. The sampling distributions of the mean are normal if you have a large enough sample or the underlying population distribution is normal. Just use caution since small sample sizes still require a normal distribution. If the population distribution is non-normal or unknown and the sample size is under 30, then neither z nor t may be used!

If a t-distribution is used, we need another parameter known as degrees of freedom. For a single-mean test, the degrees of freedom are equal to  n-1 – that is, one less than the sample size.

Some resources will tell you to use the t-distribution whenever  \sigma is unknown, regardless of sample size, since it gives a more conservative estimate. However, as the sample size increases, the difference between the z- and t-distributions becomes insignificant anyway. In fact, it can be shown mathematically that the z-distribution is the t-distribution with infinite degrees of freedom. If you look at the infinite-degrees-of-freedom row of a t-distribution table, you will see the critical values used for the z-distribution. For sample sizes above 30, the difference between using the z- and t-distributions will not be very substantial.
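This convergence can be seen by reading down the two-tailed 95% column of a standard t-table. The t values below are taken from such a table (rounded to three places), not computed:

```python
from statistics import NormalDist

# Two-tailed 95% critical values from a standard t-table, keyed by degrees of freedom.
t_table_95 = {5: 2.571, 10: 2.228, 30: 2.042, 100: 1.984}

z_95 = NormalDist().inv_cdf(0.975)   # ~1.960: the t value at infinite degrees of freedom

# The gap between each t critical value and z shrinks as degrees of freedom grow.
gaps = {df: t - z_95 for df, t in t_table_95.items()}
```

Every gap is positive (t is always the wider, more conservative choice), and the gaps shrink steadily toward zero.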

If you truly can't decide between a z-distribution or a t-distribution for a one- or two-means test, then go with a t-distribution. It is always the more conservative option and will always work as long as the conditions for the Central Limit Theorem hold.

To review, see Distribution Needed for Hypothesis Testing.

 

5d. Test hypotheses of the population mean and population proportion using one or two samples

  • If you're using the p-value method for a two-tailed test, what must you do (to find the p-value) with the area to the right of the critical value?
  • In which instances can you use the z-distribution, in which cases should it be the t-distribution, and in which cases can you use neither?
  • What is the difference between the two modes of hypothesis testing (critical value vs. p-value)? Which one is more frequently used in real situations? Why do you think this might be? 

There are two general methods for running a hypothesis test:

  • The critical value method requires finding and pairing the critical value (defined by the significance level  \alpha , and the number of degrees of freedom if applicable) and the test value (achieved by a test-specific formula). If the test value falls above (right-tailed), below (left-tailed), or above the absolute value (two-tailed) of the critical value, the null hypothesis is rejected. If not, it isn't rejected.
  • The p-value method (the one used most often in real life) requires finding the test value as described above, then finding the probability (called the p-value) of a result falling above (right-tailed), below (left-tailed), or beyond the absolute value (two-tailed) of that test value. If the p-value is less than  \alpha , the null hypothesis is rejected; if not, then we fail to reject the null hypothesis.
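The two methods can be run side by side on the same data to see that they always agree. This is a sketch with hypothetical numbers: a right-tailed single-mean z-test with a known standard deviation.

```python
import math
from statistics import NormalDist

# Hypothetical right-tailed test: H0: mu = 100 vs H_a: mu > 100,
# with sigma = 15 known, n = 36, sample mean 104.5, and alpha = 0.05.
xbar, mu0, sigma, n, alpha = 104.5, 100.0, 15.0, 36, 0.05

z_test = (xbar - mu0) / (sigma / math.sqrt(n))   # test value
z_crit = NormalDist().inv_cdf(1 - alpha)         # critical value, ~1.645
p_value = 1 - NormalDist().cdf(z_test)           # area to the right of the test value

reject_by_critical = z_test > z_crit             # critical value method
reject_by_p = p_value < alpha                    # p-value method
```

Both comparisons reach the same conclusion on any data set; the p-value method is preferred in practice because the p-value also tells you *how* strong the evidence is, not just whether it crosses the cutoff.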

When testing for a single population mean, use z- or t-distributions as described above.

When testing for one or two proportions, use the z-distribution. The sampling distribution of the sample proportions can be assumed normal if  np_0 > 5 and  n(1-p_0) > 5 , where  n is the sample size and  p_0 is the hypothesized population proportion. For two proportions, this must be true for each sample, and each sample must be independent of the other. The t-distribution does not apply to proportion tests.
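A single-proportion z-test can be sketched as follows. The counts here are hypothetical, and the normality check mirrors the conditions above:

```python
import math
from statistics import NormalDist

# Hypothetical test: H0: p = 0.5 vs H_a: p > 0.5, with 60 successes in n = 100 trials.
n, successes, p0 = 100, 60, 0.5
assert n * p0 > 5 and n * (1 - p0) > 5   # normality conditions from the text

p_hat = successes / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # test value for one proportion
p_value = 1 - NormalDist().cdf(z)                 # right-tailed p-value
```

Note that the standard error uses the hypothesized  p_0 , not the sample proportion, because the test is computed assuming the null hypothesis is true.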

When testing for the difference of means, we typically use the t-distribution for all cases. The z-distribution could be used if both sample sizes are above 30 and the standard deviations are known, but the t-distribution is usually better and more conservative.

For the difference of independent means test, the degrees of freedom have a very complicated formula, but if both sample sizes are large enough, you can generally go with  n_1 + n_2 - 2 . The formula for test value depends on whether or not you assume the standard deviations in the two populations are equal. If they are, you find the pooled (combined) standard deviation and use that instead of the individual values.
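Assuming equal population standard deviations, the pooled calculation can be sketched like this (all sample numbers are made up for illustration):

```python
import math

# Hypothetical independent samples (sizes, means, sample standard deviations).
n1, xbar1, s1 = 25, 52.0, 6.0
n2, xbar2, s2 = 30, 48.0, 7.0

# Pooled standard deviation: a weighted combination of the two sample variances.
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Test value for the difference of means, using the pooled standard deviation.
t = (xbar1 - xbar2) / (sp * math.sqrt(1 / n1 + 1 / n2))

df = n1 + n2 - 2   # degrees of freedom under the equal-variance assumption
```

The weighting by  n-1 means the larger sample's variance counts for more in the pooled value, which is why pooling is only appropriate when the two populations are assumed to have equal spread.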

To review, see Full Hypothesis Test Examples and Hypothesis Testing with Two Samples.

 

5e. Define and apply the significance level, and explain its importance to hypothesis testing

  • What is the difference between Type I and Type II errors? Can you name examples where each one would be the one considered more serious? How would this affect the level of alpha chosen?
  • If the researcher is free to select the significance level ( \alpha ), why not just make it as small as possible? Why not make it zero?

A hypothesis test has two possible conclusions: we either reject the null hypothesis (find sufficient evidence that it is false) or fail to reject the null hypothesis (fail to find such evidence).

A Type I Error (also called alpha-level error) occurs when we incorrectly reject the null hypothesis – that is, the null hypothesis is true, and by random sampling error, we happen to get unusual data that "disproves" it. The probability of a Type I error is designated by the experimenter before the data is collected and signified by the Greek letter alpha,  \alpha .

A Type II Error occurs when we incorrectly fail to reject the null hypothesis. In other words, there is a difference between the null hypothesis and reality, but the test fails to find that difference. The probability of a Type II error is represented by the Greek letter beta,  \beta .

The probability of correctly rejecting the null hypothesis,  1-\beta , is known as the power of the test. A real-life use is testing for a drug's effectiveness. The null hypothesis is usually "no effect", and finding an effective drug requires rejecting the null hypothesis. If the drug is effective, the power tells us the probability of finding effectiveness.

The alpha level is selected beforehand and commonly takes the standard values of 0.01, 0.05, and 0.10. Each of these is the probability of a Type I error – rejecting the null hypothesis when it is actually true.  \alpha and  \beta are inversely related in the sense that lowering the level of  \alpha will necessarily raise the level of  \beta , and vice versa, all else being equal. So, you would typically lower  \alpha if a Type I error is more serious but raise it if a Type II error is more serious. This is a balancing act.
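The claim that  \alpha is the long-run rate of Type I errors can be checked by simulation. This is a sketch with made-up population values: the null hypothesis is arranged to be true, so every rejection is a Type I error, and rejections should occur about  \alpha of the time.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(7)
MU0, SIGMA, N, ALPHA = 100.0, 15.0, 30, 0.05   # hypothetical setup; H0 is true
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)   # two-tailed critical value

rejections = 0
trials = 2000
for _ in range(trials):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]   # data drawn under H0
    z = (mean(sample) - MU0) / (SIGMA / math.sqrt(N))
    if abs(z) > z_crit:
        rejections += 1          # a Type I error, since H0 is actually true

type_i_rate = rejections / trials   # close to ALPHA = 0.05
```

The observed rejection rate hovers near 0.05, which is why  \alpha can never be pushed to zero without also refusing to reject anything at all.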

 \beta is calculated given  \alpha , the standard deviation, and a particular alternate value for the parameter. Calculation of  \beta , often expressed as a function of the alternate parameter value, is beyond the scope of this course.

To review, see Outcomes and the Type I and Type II Errors.

 

5f. Compute a test statistic and determine a region of acceptance based on a test statistic

  • How would you find a test statistic? How would you find a critical value?
  • How would you find the p-value for a hypothesis test by using the test statistic? What special precautions must you take in a two-tailed test?

A test statistic is a value computed from the sample data using a set formula depending on the parameter being tested and the distribution used.

A critical value is the cutoff point that will have an area  \alpha to the left (left-tailed test) or right (right-tailed test). If the test is two-tailed, then the critical value will have  \frac{\alpha}{2} area to the right OR left. It is necessary to split  \alpha in half because two-tailed tests need the same significance level but split into both tails instead of one. The areas found to the right, left, or outside of the critical values are called critical regions.
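For a z-based test, the three kinds of critical value can be sketched directly from the inverse of the standard normal distribution (the  \alpha of 0.05 here is just the common default):

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

crit_left = std_normal.inv_cdf(alpha)          # left-tailed: area alpha to the LEFT
crit_right = std_normal.inv_cdf(1 - alpha)     # right-tailed: area alpha to the RIGHT
crit_two = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed: alpha/2 in EACH tail
```

Notice that the two-tailed cutoff (about 1.96) is farther out than the one-tailed cutoff (about 1.645): splitting  \alpha between two tails pushes each critical region farther from the center.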

The critical value hypothesis test will reject the null hypothesis if the test statistic falls inside the critical region(s). This tells us that the sample data is far enough away from the hypothesized mean that there is a good probability that the two are different. If this is not the case (that is, if the test value is closer to the hypothesized mean and thus not in the critical region), we fail to reject the null hypothesis.

The p-value hypothesis test (the one used most often in real life) requires finding the test value as described above, then finding the probability (called the p-value) of a result falling above (right-tailed), below (left-tailed), or beyond the absolute value (two-tailed) of that test value. If the p-value is less than  \alpha , then the null hypothesis is rejected; if it is not less than  \alpha , then we fail to reject the null hypothesis. If you use the p-value method in a two-tailed test, find the area to the right of the positive test value and double it, because you have to account for both tails.

To review, see Full Hypothesis Test Examples.

 

Unit 5 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course. 

Try to think of the reason why each term is included.

  • alpha (significance) level
  • confidence interval
  • critical value
  • critical value hypothesis test
  • degrees of freedom
  • null hypothesis
  • p-value hypothesis test
  • point estimate
  • pooled (combined) standard deviation
  • standard normal distribution
  • t distribution
  • test statistic
  • Type I Error
  • Type II Error