Confidence Intervals: Introduction | Saylor Academy

Introduction

Figure 8.1 Have you ever wondered what the average number of M&Ms in a bag at the grocery store is? You can use confidence intervals to answer this question.

Suppose you were trying to determine the mean rent of a two-bedroom apartment in your town. You might look in the classified section of the newspaper, write down several rents listed, and average them together. You would have obtained a point estimate of the true mean. If you are trying to determine the percentage of times you make a basket when shooting a basketball, you might count the number of shots you make and divide that by the number of shots you attempt. In this case, you would have obtained a point estimate for the true proportion, the parameter p in the binomial probability density function.
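The two point estimates just described can be sketched in a few lines of Python. The rents and the shooting record below are made-up numbers used only for illustration; they do not come from any real data set.

```python
# Hypothetical monthly rents (in dollars) copied from classified ads.
rents = [950, 1100, 1025, 1200, 980]

# Point estimate of the true mean rent: the average of the sampled rents.
mean_rent_estimate = sum(rents) / len(rents)

# Hypothetical shooting record: baskets made out of shots attempted.
shots_made, shots_attempted = 14, 20

# Point estimate of the true proportion p.
p_hat = shots_made / shots_attempted

print(f"estimated mean rent: {mean_rent_estimate:.2f}")
print(f"estimated proportion of baskets made: {p_hat:.2f}")
```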

We use sample data to make generalizations about an unknown population. This part of statistics is called inferential statistics. The sample data help us to make an estimate of a population parameter. We realize that the point estimate is most likely not the exact value of the population parameter, but close to it. After calculating point estimates, we construct interval estimates, called confidence intervals. What statistics provides us beyond a simple average, or point estimate, is an estimate to which we can attach a probability of accuracy, what we will call a confidence level. We make inferences with a known level of probability.

In this chapter, you will learn to construct and interpret confidence intervals. You will also learn a new distribution, the Student's-t, and how it is used with these intervals. Throughout the chapter, it is important to keep in mind that the confidence interval is a random variable. It is the population parameter that is fixed.

If you worked in the marketing department of an entertainment company, you might be interested in the mean number of songs a consumer downloads a month from iTunes. If so, you could conduct a survey and calculate the sample mean, $\overline x$ , and the sample standard deviation, s. You would use $\overline x$ to estimate the population mean and s to estimate the population standard deviation. The sample mean, $\overline x$ , is the point estimate for the population mean, μ. The sample standard deviation, s, is the point estimate for the population standard deviation, σ.

$\overline x$ and s are each called a statistic.
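As a minimal sketch, both statistics can be computed with Python's standard `statistics` module. The survey responses below are hypothetical values invented for this example.

```python
from statistics import mean, stdev

# Hypothetical survey responses: songs downloaded per month by 8 consumers.
downloads = [2, 0, 5, 3, 1, 4, 2, 3]

x_bar = mean(downloads)  # point estimate of the population mean, mu
s = stdev(downloads)     # sample standard deviation (n - 1 denominator), point estimate of sigma

print(f"sample mean x-bar = {x_bar:.2f}")
print(f"sample standard deviation s = {s:.2f}")
```

Note that `stdev` divides by n − 1, which is the usual convention for a sample standard deviation; `pstdev` would divide by n instead.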

A confidence interval is another type of estimate but, instead of being just one number, it is an interval of numbers. The interval of numbers is a range of values calculated from a given set of sample data. The confidence interval is likely to include the unknown population parameter.

Suppose, for the iTunes example, we do not know the population mean μ, but we do know that the population standard deviation is σ = 1 and our sample size is 100. Then, by the central limit theorem, the standard deviation of the sampling distribution of the sample means is

$\dfrac{σ}{\sqrt n}=\dfrac{1}{\sqrt{100}}=0.1$ .

The Empirical Rule, which applies to the normal distribution, says that in approximately 95% of the samples, the sample mean, $\overline x$ , will be within two standard deviations of the population mean μ. For our iTunes example, two standard deviations is (2)(0.1) = 0.2. The sample mean $\overline x$ is likely to be within 0.2 units of μ.
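The arithmetic for the iTunes example can be checked with a short sketch. The values of σ and n come from the text; the variable names are chosen here for illustration.

```python
import math

sigma = 1.0  # known population standard deviation (from the text)
n = 100      # sample size (from the text)

# Standard deviation of the sampling distribution of the sample mean.
standard_error = sigma / math.sqrt(n)

# Empirical Rule: about 95% of sample means fall within two of these.
margin = 2 * standard_error

print(f"standard error = {standard_error:.1f}")
print(f"two standard errors = {margin:.1f}")
```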

Because $\overline x$ is within 0.2 units of μ in about 95% of samples, the unknown μ is likewise within 0.2 units of $\overline x$ in about 95% of samples. The population mean μ is contained in an interval whose lower number is calculated by taking the sample mean and subtracting two standard deviations, (2)(0.1), and whose upper number is calculated by taking the sample mean and adding two standard deviations. In other words, μ is between $\overline x$ − 0.2 and $\overline x$ + 0.2 in 95% of all the samples.

For the iTunes example, suppose that a sample produced a sample mean $\overline x$ = 2. Then with 95% probability the unknown population mean μ is between

$\overline x−0.2=2−0.2=1.8$ and $\overline x+0.2=2+0.2=2.2$

We say that we are 95% confident that the unknown population mean number of songs downloaded from iTunes per month is between 1.8 and 2.2. The 95% confidence interval is (1.8, 2.2). Please note that we talked in terms of 95% confidence using the empirical rule. The empirical rule for two standard deviations covers only approximately 95% of the probability under the normal distribution. To be precise, two standard deviations under a normal distribution cover about 95.45% of the probability. To calculate the exact 95% confidence level we would use 1.96 standard deviations.
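Both figures can be verified numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; it only checks the normal-distribution arithmetic and is not part of the original example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability that a normal value lies within two standard deviations of its mean.
within_two_sd = std_normal.cdf(2) - std_normal.cdf(-2)

# z value that leaves 2.5% in each tail, so exactly 95% lies in the middle.
z_95 = std_normal.inv_cdf(0.975)

print(f"P(-2 < Z < 2) = {within_two_sd:.4f}")
print(f"z for an exact 95% level = {z_95:.2f}")
```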

The 95% confidence interval implies two possibilities. Either the interval (1.8, 2.2) contains the true mean μ, or our sample produced an $\overline x$ that is not within 0.2 units of the true mean μ. The first possibility happens for 95% of well-chosen samples. It is important to remember that the second possibility happens for 5% of samples, even though correct procedures are followed.
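A small simulation can make the "95% of samples" statement concrete. This is a sketch under stated assumptions: the population is taken to be normal with μ = 2 and σ = 1 (μ is fixed here only so that coverage can be checked, which is never possible in practice), and the number of trials and the random seed are arbitrary choices.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma, n = 2.0, 1.0, 100     # assumed population values for the simulation
z = NormalDist().inv_cdf(0.975)  # about 1.96
margin = z * sigma / n ** 0.5

trials = 2000
hits = 0
for _ in range(trials):
    # Draw one sample of size n and compute its mean.
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    # Count the intervals that actually contain the true mean.
    if x_bar - margin <= mu <= x_bar + margin:
        hits += 1

coverage = hits / trials
print(f"{coverage:.1%} of the intervals contained mu")
```

The interval changes from sample to sample while μ stays fixed, so the proportion of intervals that capture μ should land close to 95%, with the remainder missing it even though the procedure was followed correctly.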

Remember that a confidence interval is created for an unknown population parameter like the population mean, μ.

For the confidence interval for a mean the formula would be:

$μ = \overline X ± Z_α \dfrac{σ}{\sqrt n}$

Or written another way as:

$\overline X − Z_α \dfrac{σ}{\sqrt n} ≤ μ ≤ \overline X + Z_α \dfrac{σ}{\sqrt n}$

Where $\overline X$ is the sample mean, $Z_α$ is determined by the level of confidence desired by the analyst, and $σ / \sqrt n$ is the standard deviation of the sampling distribution for means given to us by the Central Limit Theorem.
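The formula can be sketched as a small Python function. The function name `mean_confidence_interval` and its parameters are hypothetical, introduced only for this illustration, and the Z value is taken to be the two-sided critical value of the standard normal distribution (an assumption about how the Z term is read).

```python
import math
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) bounds for mu when sigma is known."""
    alpha = 1 - confidence
    # Two-sided critical value: leaves alpha / 2 in each tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# iTunes example from the text: sample mean 2, sigma = 1, n = 100.
lower, upper = mean_confidence_interval(x_bar=2, sigma=1, n=100)
print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")
```

Because this uses 1.96 rather than the Empirical Rule's 2, the resulting interval is slightly narrower than (1.8, 2.2).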

Source: OpenStax, https://openstax.org/books/introductory-business-statistics/pages/8-introduction
This work is licensed under a Creative Commons Attribution 4.0 License.
