Confidence Intervals for Correlation and Proportion

Site: Saylor Academy
Course: MA121: Introduction to Statistics
Book: Confidence Intervals for Correlation and Proportion
Date: Saturday, April 20, 2024, 7:27 AM

Description

First, this section shows how to compute a confidence interval for Pearson's correlation. The solution uses Fisher's z transformation. Then, it explains the procedure to compute confidence intervals for population proportions where the sampling distribution needs a normal approximation.

Correlation

Learning Objectives


  1. State the standard error of z'
  2. Compute a confidence interval on ρ


The computation of a confidence interval on the population value of Pearson's correlation (ρ) is complicated by the fact that the sampling distribution of r is not normally distributed. The solution lies with Fisher's z' transformation described in the section on the sampling distribution of Pearson's r. The steps in computing a confidence interval for ρ are:

  1. Convert r to z'
  2. Compute a confidence interval in terms of z'
  3. Convert the confidence interval back to r.


Let's take the data from the case study Animal Research as an example. In this study, students were asked to rate the degree to which they thought animal research is wrong and the degree to which they thought it is necessary. As you might have expected, there was a negative relationship between these two variables: the more that students thought animal research is wrong, the less they thought it is necessary. The correlation based on 34 observations is -0.654. The problem is to compute a 95% confidence interval on ρ based on this r of -0.654.

The conversion of r to z' can be done using a calculator. This calculator shows that the z' associated with an r of -0.654 is -0.78.
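For readers without the calculator at hand, Fisher's transformation has a closed form. The block below states the standard definition and its inverse (neither is spelled out in this section) and applies the definition to r = -0.654:

```latex
z' = \dfrac{1}{2} \ln \left( \dfrac{1+r}{1-r} \right), \qquad r = \dfrac{e^{2z'} - 1}{e^{2z'} + 1}

z' = \dfrac{1}{2} \ln \left( \dfrac{1 - 0.654}{1 + 0.654} \right) = \dfrac{1}{2} \ln (0.2092) \approx -0.78
```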

The sampling distribution of z' is approximately normally distributed and has a standard error of

\dfrac{1}{\sqrt {N-3}}


For this example, N = 34 and therefore the standard error is 0.180. The Z for a 95% confidence interval (Z_{.95}) is 1.96, as can be found using the normal distribution calculator (setting the shaded area to .95 and clicking on the "Between" button). The confidence interval is therefore computed as:

\text {Lower limit} = -0.78 - (1.96)(0.18) = -1.13

\text {Upper limit} = -0.78 + (1.96)(0.18) = -0.43


The final step is to convert the endpoints of the interval back to r using a calculator. The r associated with a z' of -1.13 is -0.81 and the r associated with a z' of -0.43 is -0.41. Therefore, the population correlation (ρ) is likely to be between -0.81 and -0.41. The 95% confidence interval is:

-0.81 ≤ ρ ≤ -0.41
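The three steps can also be traced in base R without a calculator. This is a minimal sketch, assuming that `atanh()` and `tanh()` serve as Fisher's transformation and its inverse and that `qnorm(0.975)` supplies the Z for a 95% interval; the variable names are illustrative only.

```r
# Sketch of the three steps using base R only (no extra packages).
r <- -0.654   # sample correlation from the Animal Research example
N <- 34       # number of observations

z_prime <- atanh(r)          # step 1: convert r to z' (about -0.78)
se      <- 1 / sqrt(N - 3)   # standard error of z' (about 0.18)
z_crit  <- qnorm(0.975)      # Z for a 95% interval (about 1.96)

# Step 2: confidence limits in z' units
z_limits <- z_prime + c(-1, 1) * z_crit * se

# Step 3: convert the limits back to the r scale
r_limits <- tanh(z_limits)

print(round(z_limits, 2))
print(round(r_limits, 2))
```

Because the sketch carries full precision through every step, its limits can differ in the last digit from a hand calculation that rounds z' and the standard error first.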

To calculate the 99% confidence interval, you use the Z for a 99% confidence interval of 2.58 as follows:

\text {Lower limit} = -0.78 - (2.58)(0.18) = -1.24

\text {Upper limit} = -0.78 + (2.58)(0.18) = -0.32

Converting back to r, the confidence interval is:

-0.85 ≤ ρ ≤ -0.31

Naturally, the 99% confidence interval is wider than the 95% confidence interval.

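R code:

# Install once, then load the psychometric package, which provides CIr()
install.packages("psychometric")
library(psychometric)

# 95% confidence interval on rho for r = -0.654 and N = 34
CIr(r = -0.654, n = 34, level = 0.95)
# [1] -0.8124778 -0.4055190

# 99% confidence interval for the same data
CIr(r = -0.654, n = 34, level = 0.99)
# [1] -0.8468443 -0.3091669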

Source: David M. Lane, https://onlinestatbook.com/2/estimation/correlation_ci.html
This work is in the Public Domain.

Questions

Question 1 out of 3.
Select all of the following choices that are possible confidence intervals on the population value of Pearson's correlation:
(-0.4, 0.6)

(0.3, 0.5)

(-0.85, -0.47)

(0.72, 1.2)


Question 2 out of 3.
A sample of 28 was taken from a population, and r = .45. What is the 95% confidence interval for the population correlation?
(.058, .842)

(.093, .877)

(.058, .687)

(.093, .705)


Question 3 out of 3.
The sample correlation is -0.8. If the sample size was 40, then the 99% confidence interval states that the population correlation lies between -.909 and

Answers


  1. All of them are possible except for (0.72, 1.2). The population correlation cannot be above 1.

  2. The corresponding z' for r = .45 is .485. The standard error is 1/sqrt(28-3) = .20, and the Z for a 95% confidence interval is 1.96. In z' units, the upper limit is .485 + (1.96)(.20) = .877 and the lower limit is .485 - (1.96)(.20) = .093. Converting both limits back to r gives (.093, .705).

  3. The corresponding z' for r = -.8 is -1.099. The standard error is 1/sqrt(40-3) = .164, and the Z for a 99% confidence interval is 2.58. In z' units, the upper limit is -1.099 + (2.58)(.164) = -.676. Converting back to r gives -.589.
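
The answers to Questions 2 and 3 can be checked with a few lines of base R. The helper `fisher_ci` below is a hypothetical name introduced for this sketch, not a function from any package; it assumes `atanh()` and `tanh()` as Fisher's transformation and its inverse.

```r
# Hypothetical helper (not from any package): confidence interval on rho
# computed through Fisher's z' transformation.
fisher_ci <- function(r, n, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)  # two-sided critical value
  se     <- 1 / sqrt(n - 3)             # standard error of z'
  tanh(atanh(r) + c(-1, 1) * z_crit * se)
}

q2 <- fisher_ci(r = 0.45, n = 28, level = 0.95)   # Question 2
q3 <- fisher_ci(r = -0.80, n = 40, level = 0.99)  # Question 3

print(round(q2, 3))
print(round(q3, 3))
```

The results should agree with the answer key to within about 0.001; the small differences arise because the key rounds z', the standard error, and Z before combining them.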

Proportion

Learning Objectives


  1. Estimate the population proportion from sample proportions
  2. Apply the correction for continuity
  3. Compute a confidence interval


A candidate in a two-person election commissions a poll to determine who is ahead. The pollster randomly chooses 500 registered voters and determines that 260 out of the 500 favor the candidate. In other words, 0.52 of the sample favors the candidate. Although this point estimate of the proportion is informative, it is important to also compute a confidence interval. The confidence interval is computed based on the mean and standard deviation of the sampling distribution of a proportion. The formulas for these two parameters are shown below:

  \mu_p = \pi

  \sigma_p = \sqrt {\dfrac{ \pi (1- \pi) }{N}}

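
The two formulas can be checked by simulation. The sketch below assumes a population proportion of 0.52 purely for illustration (in practice π is unknown), draws many polls of 500 voters each, and compares the mean and standard deviation of the simulated sample proportions with the formulas.

```r
# Simulation sketch: draw many samples of size N and compare the mean and
# standard deviation of the sample proportions with the two formulas.
set.seed(1)        # arbitrary seed, only for reproducibility
pi_true <- 0.52    # assumed population proportion (illustrative)
N       <- 500     # sample size, as in the poll
n_sims  <- 10000   # number of simulated polls (arbitrary)

# Each simulated poll counts supporters among N voters, then divides by N
p_hat <- rbinom(n_sims, size = N, prob = pi_true) / N

sim_mean   <- mean(p_hat)                          # compare with pi_true
sim_sd     <- sd(p_hat)                            # compare with sigma_p
formula_sd <- sqrt(pi_true * (1 - pi_true) / N)    # sigma_p from the formula

print(c(sim_mean = sim_mean, sim_sd = sim_sd, formula_sd = formula_sd))
```

With this many simulated polls, the simulated mean should land close to 0.52 and the simulated standard deviation close to the formula value of roughly 0.022.
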
Since we do not know the population parameter π, we use the sample proportion p as an estimate. The estimated standard error of p is therefore

  s_p = \sqrt {\dfrac{ p (1- p) }{N}}

We start by taking our statistic (p) and creating an interval that ranges (Z_{.95})(s_p) in both directions, where Z_{.95} is the number of standard deviations extending from the mean of a normal distribution required to contain 0.95 of the area (see the section on the confidence interval for the mean). The value of Z_{.95} is computed with the normal calculator and is equal to 1.96. We then make a slight adjustment to correct for the fact that the distribution is discrete rather than continuous.

s_p is calculated as shown below:

  s_p = \sqrt {\dfrac{ .52 (1- .52) }{500}} = 0.0223

To correct for the fact that we are approximating a discrete distribution with a continuous distribution (the normal distribution), we subtract 0.5/N from the lower limit and add 0.5/N to the upper limit of the interval. Therefore the confidence interval is

 p \pm Z_{.95} \sqrt {\dfrac{p(1-p)}{N}} \pm \dfrac{0.5}{N}

Lower limit: 0.52 - (1.96)(0.0223) - 0.001 = 0.475
Upper limit: 0.52 + (1.96)(0.0223) + 0.001 = 0.565

0.475 ≤ π ≤ 0.565
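
The same calculation can be wrapped in a small function. This is a sketch under the formulas given in this section; `prop_ci_cc` is a hypothetical name introduced here, not a library function, and `qnorm()` is assumed as the source of the Z value.

```r
# Hypothetical helper (not a library function): normal-approximation
# interval for a proportion with the 0.5/N continuity correction.
prop_ci_cc <- function(x, n, level = 0.95) {
  p      <- x / n                        # sample proportion
  s_p    <- sqrt(p * (1 - p) / n)        # estimated standard error of p
  z_crit <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  margin <- z_crit * s_p + 0.5 / n       # half-width including the correction
  c(lower = p - margin, upper = p + margin)
}

poll_ci <- prop_ci_cc(x = 260, n = 500)
print(round(poll_ci, 3))
```

For the poll data this reproduces the limits 0.475 and 0.565. Calling `prop_ci_cc(x = 40, n = 100)` gives an upper limit of about 0.501 for a sample of 100 in which 40% support a proposal.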

Since the interval extends 0.045 in both directions, the margin of error is 0.045. In terms of percent, between 47.5% and 56.5% of the voters favor the candidate and the margin of error is 4.5%. Keep in mind that the margin of error of 4.5% is the margin of error for the percent favoring the candidate and not the margin of error for the difference between the percent favoring the candidate and the percent favoring the opponent. In a two-person race the opponent's share is 1 - p, so the difference is 2p - 1 and its margin of error is 9%, twice the margin of error for the individual percent. Keep this in mind when you hear reports in the media; the media often get this wrong.


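R code:

# One-sample test of a proportion with continuity correction.
# prop.test() obtains its interval by inverting the score test, so the
# limits differ slightly from the hand calculation above.
prop.test(260, 500, correct = TRUE)

#   1-sample proportions test with continuity correction
#
# data:  260 out of 500, null probability 0.5
# X-squared = 0.722, df = 1, p-value = 0.3955
# alternative hypothesis: true p is not equal to 0.5
# 95 percent confidence interval:
#  0.4752277 0.5644604
# sample estimates:
#    p
# 0.52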

Questions

Question 1 out of 3.
Why do we subtract 0.5/N from the lower limit and add 0.5/N to the upper limit when computing a confidence interval for the population proportion?
We need to correct for the fact that we are approximating a discrete distribution (the sampling distribution of p) with a continuous distribution (the normal distribution).

The estimate of the population proportion is slightly biased, and we need to correct for it.

The estimate


Question 2 out of 3.
The newspaper conducted a survey and asked some of the city's voters which candidate they preferred for mayor. The surveyors computed a 95% confidence interval and found that the percent of the voters in the city who prefer Candidate A ranges from 51% to 59%. What is the margin of error (as a percent)?

Question 3 out of 3.
A researcher was interested in knowing how many people in the city supported a new tax. She sampled 100 people from the city and found that 40% of these people supported the tax. What is the upper limit of the 95% confidence interval on the population proportion?

Answers


  1. We make these corrections because we approximate a discrete distribution with a continuous one.

  2. Because the confidence interval ranges from 51% to 59%, the newspaper must have found that 55% of their sample prefer Candidate A. Because the confidence interval extends 4% in both directions, the margin of error is 4%.

  3. The standard error is sqrt[(.4)(.6)/100] = .0490. The correction is .5/100 = .005. Thus, the upper limit of the 95% confidence interval is .4 + (1.96)(.049) + .005 = .501. Clearly, there is a large margin of error.