A Complete Example

This section works through a complete linear regression analysis. It presents the data and uses a scatter plot to identify the linear pattern, fits a linear model by least squares estimation, and then addresses statistical inference on the correlation coefficient and the slope parameter.
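For reference, the quantities computed in such an analysis can be summarized with the usual least squares formulas. The summary below is not a derivation. It uses n for the number of data pairs, x̄ and ȳ for the sample means, B_0 for the hypothesized value of β_1 (0 when testing for a linear relationship), and t_{α/2} read from a t-table with n − 2 degrees of freedom. For a right-tailed test such as β_1 > 0, the rejection region is T ≥ t_α.

$$
\begin{aligned}
SS_{xx} &= \sum x^2 - \frac{1}{n}\Big(\sum x\Big)^2, \qquad SS_{xy} = \sum xy - \frac{1}{n}\Big(\sum x\Big)\Big(\sum y\Big), \qquad SS_{yy} = \sum y^2 - \frac{1}{n}\Big(\sum y\Big)^2 \\
\hat\beta_1 &= \frac{SS_{xy}}{SS_{xx}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x \\
SSE &= SS_{yy} - \hat\beta_1 SS_{xy}, \qquad s_\varepsilon = \sqrt{\frac{SSE}{n-2}}, \qquad r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} \\
T &= \frac{\hat\beta_1 - B_0}{s_\varepsilon / \sqrt{SS_{xx}}} \quad (df = n-2), \qquad \hat\beta_1 \pm t_{\alpha/2}\,\frac{s_\varepsilon}{\sqrt{SS_{xx}}} \\
&\hat y_p \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar x)^2}{SS_{xx}}} \quad \text{(confidence interval for the mean at } x_p\text{)} \\
&\hat y_p \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar x)^2}{SS_{xx}}} \quad \text{(prediction interval at } x_p\text{)}
\end{aligned}
$$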


The exercises in this section are unrelated to those in previous sections.

1. The data give the amount x of silicofluoride in the water (mg/L) and the amount y of lead in the bloodstream (μg/dL) of ten children in various communities with and without municipal water. Perform a complete analysis of the data, in analogy with the discussion in this section (that is, make a scatter plot, do preliminary computations, find the least squares regression line, find SSE, s_ε, and r, and so on). In the hypothesis test, use β_1 > 0 as the alternative hypothesis and test at the 5% level of significance. Use a confidence level of 95% for the confidence interval for β_1. Finally, construct 95% confidence and prediction intervals at x_p = 2.

\begin{array}{c|ccccc}x & 0.0 & 0.0 & 1.1 & 1.4 & 1.6 \\\hline y & 0.3 & 0.1 & 4.7 & 3.2 & 5.1 \\x & 1.7 & 2.0 & 2.0 & 2.2 & 2.2 \\\hline y & 7.0 & 5.0 & 6.1 & 8.6 & 9.5\end{array}
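The following is a minimal sketch of the computations Exercise 1 asks for, written in Python with only the standard library. It assumes the usual least squares formulas. The critical values 1.860 (t_{0.05}) and 2.306 (t_{0.025}) for 8 degrees of freedom are hard-coded from a standard t-table rather than computed. The function name `complete_analysis` is hypothetical, and the scatter plot itself is not drawn here.

```python
from math import sqrt

# Exercise 1 data: silicofluoride x (mg/L) and blood lead y (μg/dL)
X = [0.0, 0.0, 1.1, 1.4, 1.6, 1.7, 2.0, 2.0, 2.2, 2.2]
Y = [0.3, 0.1, 4.7, 3.2, 5.1, 7.0, 5.0, 6.1, 8.6, 9.5]

# Critical values from a t-table with df = n - 2 = 8 (hard-coded assumption)
T_05_DF8 = 1.860   # t_{0.05}: right-tailed test at alpha = 0.05
T_025_DF8 = 2.306  # t_{0.025}: 95% confidence and prediction intervals


def complete_analysis(x, y, x_p, t_test, t_int):
    """Least squares fit, slope test, and intervals (hypothetical helper)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    x_bar, y_bar = sum_x / n, sum_y / n

    # Preliminary computations
    ss_xx = sum(v * v for v in x) - sum_x ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_yy = sum(v * v for v in y) - sum_y ** 2 / n

    # Least squares line, SSE, s_epsilon, and r
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = ss_yy - b1 * ss_xy
    s_eps = sqrt(sse / (n - 2))
    r = ss_xy / sqrt(ss_xx * ss_yy)

    # Right-tailed test of H0: beta_1 = 0 vs Ha: beta_1 > 0, and CI for beta_1
    se_b1 = s_eps / sqrt(ss_xx)
    t_stat = b1 / se_b1
    ci_b1 = (b1 - t_int * se_b1, b1 + t_int * se_b1)

    # Confidence interval for the mean response and prediction interval at x_p
    y_p = b0 + b1 * x_p
    h = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    m_ci = t_int * s_eps * sqrt(h)
    m_pi = t_int * s_eps * sqrt(1 + h)

    return {
        "b1": b1, "b0": b0, "sse": sse, "s_eps": s_eps, "r": r,
        "t_stat": t_stat, "reject_h0": t_stat >= t_test, "ci_b1": ci_b1,
        "y_p": y_p,
        "ci_mean": (y_p - m_ci, y_p + m_ci),
        "pi": (y_p - m_pi, y_p + m_pi),
    }


res = complete_analysis(X, Y, x_p=2, t_test=T_05_DF8, t_int=T_025_DF8)
for key, value in res.items():
    print(key, value)
```

Comparing `t_stat` with `T_05_DF8` carries out the right-tailed test. All three intervals use `T_025_DF8`, because each is a two-sided 95% interval.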

Large Data Set Exercises

3. Large Data Sets 3 and 3A list the shoe sizes and heights of 174 customers entering a shoe store. The gender of the customer is not indicated in Large Data Set 3. However, men's and women's shoes are not measured on the same scale; for example, a size 8 shoe for men is not the same size as a size 8 shoe for women. Thus it would not be meaningful to apply regression analysis to Large Data Set 3. Nevertheless, construct the scatter diagrams, with shoe size as the independent variable (x) and height as the dependent variable (y), for (i) just the data on men, (ii) just the data on women, and (iii) the full mixed data set with both men and women. Does the third, invalid scatter diagram look markedly different from the other two?
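Here is a sketch of how the three scatter diagrams in Exercise 3 might be produced in Python. It assumes the rows of Large Data Set 3A have been read into `(gender, shoe_size, height)` tuples with gender coded "M" or "F", which is a hypothetical layout. `split_series` and `plot_three_panels` are illustrative helper names. Plotting uses matplotlib, which is imported only when the plot is actually drawn.

```python
def split_series(records):
    """Split (gender, shoe_size, height) rows into x/y lists per group.

    Gender codes "M" and "F" are an assumed encoding; any other code raises KeyError.
    """
    group_of = {"M": "men", "F": "women"}
    series = {"men": ([], []), "women": ([], []), "all": ([], [])}
    for gender, shoe_size, height in records:
        # Each row goes into its own group and into the mixed "all" set
        for key in (group_of[gender], "all"):
            series[key][0].append(shoe_size)
            series[key][1].append(height)
    return series


def plot_three_panels(series):
    """Draw the men, women, and mixed scatter diagrams side by side."""
    import matplotlib.pyplot as plt  # optional dependency, needed only here

    fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
    for ax, name in zip(axes, ("men", "women", "all")):
        xs, ys = series[name]
        ax.scatter(xs, ys, s=12)
        ax.set_title(name)
        ax.set_xlabel("shoe size (x)")
    axes[0].set_ylabel("height (y)")
    fig.tight_layout()
    plt.show()
```

The panels share the y-axis so that the heights line up across all three. This makes it easier to judge whether the mixed diagram looks markedly different from the separate ones.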



5. Separate out from Large Data Set 3A just the data on women and do a complete analysis, with shoe size as the independent variable (x) and height as the dependent variable (y). Use α=0.05 and x_p=10 whenever appropriate.