Problem 1
Height and weight data. The table below, also in the data file htwt.txt, gives \(Ht=\) height in centimeters and \(Wt=\) weight in kilograms for a sample of \(n=10\) 18-year-old girls. The data are taken from a larger study described in Problem 3.1. Interest is in predicting weight from height.

a. Draw a scatterplot of \(Wt\) on the vertical axis versus \(Ht\) on the horizontal axis. On the basis of this plot, does a simple linear regression model make sense for these data? Why or why not?

b. Show that \(\bar{x}=165.52\), \(\bar{y}=59.47\), \(SXX=472.076\), \(SYY=731.961\), and \(SXY=274.786\). Compute estimates of the slope and the intercept for the regression of \(Y\) on \(X\). Draw the fitted line on your scatterplot.

c. Obtain the estimate of \(\sigma^{2}\) and find the estimated standard errors of \(\hat{\beta}_{0}\) and \(\hat{\beta}_{1}\). Also find the estimated covariance between \(\hat{\beta}_{0}\) and \(\hat{\beta}_{1}\). Compute the \(t\)-tests for the hypotheses that \(\beta_{0}=0\) and that \(\beta_{1}=0\), and find the appropriate \(p\)-values using two-sided tests.

d. Obtain the analysis of variance table and \(F\)-test for regression. Show numerically that \(F=t^{2}\), where \(t\) was computed in Problem 2.1.3 for testing \(\beta_{1}=0\).

$$\begin{array}{cc}
\hline
Ht & Wt \\
\hline
169.6 & 71.2 \\
166.8 & 58.2 \\
157.1 & 56.0 \\
181.1 & 64.5 \\
158.4 & 53.0 \\
165.6 & 52.4 \\
166.7 & 56.8 \\
156.5 & 49.2 \\
168.1 & 55.6 \\
165.3 & 77.8 \\
\hline
\end{array}$$
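The computations in parts b through d can be sketched in Python using only the standard library. This is a minimal illustration, not a prescribed solution: the function name `simple_regression` and the dictionary keys are hypothetical choices, and \(p\)-values are left out because the standard library has no \(t\) distribution.

```python
from math import sqrt

# Ht (cm) and Wt (kg), copied from the table in this problem
ht = [169.6, 166.8, 157.1, 181.1, 158.4, 165.6, 166.7, 156.5, 168.1, 165.3]
wt = [71.2, 58.2, 56.0, 64.5, 53.0, 52.4, 56.8, 49.2, 55.6, 77.8]


def simple_regression(x, y):
    """Least squares fit of y on x; returns the usual summary quantities."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

    b1 = sxy / sxx                 # slope estimate
    b0 = ybar - b1 * xbar          # intercept estimate

    rss = syy - sxy ** 2 / sxx     # residual sum of squares
    sigma2 = rss / (n - 2)         # estimate of sigma^2 on n - 2 df
    se_b1 = sqrt(sigma2 / sxx)
    se_b0 = sqrt(sigma2 * (1 / n + xbar ** 2 / sxx))
    cov_b0_b1 = -sigma2 * xbar / sxx

    ssreg = syy - rss              # regression sum of squares, 1 df
    return {
        "xbar": xbar, "ybar": ybar, "sxx": sxx, "syy": syy, "sxy": sxy,
        "b0": b0, "b1": b1, "sigma2": sigma2,
        "se_b0": se_b0, "se_b1": se_b1, "cov_b0_b1": cov_b0_b1,
        "t_b0": b0 / se_b0, "t_b1": b1 / se_b1,
        "f_stat": ssreg / sigma2,
    }


fit = simple_regression(ht, wt)
for name, value in fit.items():
    print(f"{name:>10s} = {value:.4f}")
```

Comparing `fit["f_stat"]` with `fit["t_b1"] ** 2` gives the numerical check requested in part d.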
Problem 2
More with Forbes' data. An alternative approach to the analysis of Forbes' experiments comes from the Clausius-Clapeyron formula of classical thermodynamics, which dates to Clausius (1850). According to this theory, we should find that

$$\mathrm{E}(\text{Lpres} \mid \text{Temp})=\beta_{0}+\beta_{1} \frac{1}{\text{Ktemp}} \tag{2.27}$$

where Ktemp is temperature in degrees Kelvin, which equals 255.37 plus \((5/9) \times\) Temp. If we were to graph this mean function on a plot of Lpres versus Ktemp, we would get a curve, not a straight line. However, we can estimate the parameters \(\beta_{0}\) and \(\beta_{1}\) using simple linear regression methods by defining \(u_{1}\) to be the inverse of temperature in degrees Kelvin,

$$u_{1}=\frac{1}{\text{Ktemp}}=\frac{1}{(5/9)\,\text{Temp}+255.37}$$

Then the mean function (2.27) can be rewritten as

$$\mathrm{E}(\text{Lpres} \mid \text{Temp})=\beta_{0}+\beta_{1} u_{1} \tag{2.28}$$

for which simple linear regression is suitable. The notation we have used in (2.28) is a little different, as the left side of the equation says we are conditioning on Temp, but the variable Temp does not appear explicitly on the right side of the equation.

a. Draw the plot of Lpres versus \(u_{1}\), and verify that apart from case 12 the 17 points in Forbes' data fall close to a straight line.

b. Compute the linear regression implied by (2.28), and summarize your results.

c. We now have two possible models for the same data: one based on the regression of Lpres on Temp used by Forbes, and (2.28), based on the Clausius-Clapeyron formula. To compare these two, draw the plot of the fitted values from Forbes' mean function fit versus the fitted values from (2.28). On the basis of these and any other computations you think might help, is it possible to prefer one approach over the other? Why?

d. In his original paper, Forbes provided additional data collected by the botanist Dr. Joseph Hooker on temperatures and boiling points measured often at higher altitudes in the Himalaya Mountains. The data for \(n=31\) locations are given in the file hooker.txt. Find the estimated mean function (2.28) for Hooker's data.

e. This problem is not recommended unless you have access to a package with a programming language, like R, S-plus, Mathematica, or SAS IML. For each of the cases in Hooker's data, compute the predicted values \(\hat{y}\) and the standard error of prediction. Then compute \(z=(\text{Lpres}-\hat{y})/\text{sepred}\). Each of the \(z\)s is a random variable, but if the model is correct, each has mean zero and standard deviation close to one. Compute the sample mean and standard deviation of the \(z\)s, and summarize results.

f. Repeat Problem 2.2.5, but this time predict and compute the \(z\)-scores for the 17 cases in Forbes' data, again using the fitted mean function from Hooker's data. If the mean function for Hooker's data applies to Forbes' data, then each of the \(z\)-scores should have zero mean and standard deviation close to one. Compute the \(z\)-scores, compare them to those in the last problem, and comment on the results.
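The transformation to \(u_{1}\) and the \(z\)-scores of parts e and f can be sketched in Python as follows. This is only an outline under stated assumptions: the function names `inverse_kelvin` and `prediction_z_scores` are hypothetical, sepred is taken to be the usual simple-regression standard error of prediction \(\hat{\sigma}\sqrt{1+1/n+(x_{*}-\bar{x})^{2}/SXX}\), and the numbers in the demonstration are made up rather than read from forbes.txt or hooker.txt.

```python
from math import sqrt
from statistics import mean, stdev


def inverse_kelvin(temp):
    """u1 = 1 / Ktemp, with Ktemp = (5/9) * Temp + 255.37 as in the text.

    The conversion implies Temp is in degrees Fahrenheit.
    """
    return 1.0 / ((5.0 / 9.0) * temp + 255.37)


def prediction_z_scores(x_fit, y_fit, x_new, y_new):
    """Fit y on x by least squares using (x_fit, y_fit), then return
    z = (y - yhat) / sepred for each pair in (x_new, y_new)."""
    n = len(x_fit)
    xbar = sum(x_fit) / n
    ybar = sum(y_fit) / n
    sxx = sum((xi - xbar) ** 2 for xi in x_fit)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x_fit, y_fit))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x_fit, y_fit))
    sigma = sqrt(rss / (n - 2))

    z = []
    for xs, ys in zip(x_new, y_new):
        sepred = sigma * sqrt(1 + 1 / n + (xs - xbar) ** 2 / sxx)
        z.append((ys - (b0 + b1 * xs)) / sepred)
    return z


# Made-up illustrative values only; NOT Forbes' or Hooker's measurements.
temp = [185.0, 190.0, 195.0, 200.0, 205.0, 210.0]
lpres = [1.22, 1.27, 1.31, 1.36, 1.40, 1.45]

u1 = [inverse_kelvin(t) for t in temp]
z = prediction_z_scores(u1, lpres, u1, lpres)
print("mean of z:", mean(z), "sd of z:", stdev(z))
```

For part f, the same function could be called with Hooker's cases as the fitting data and Forbes' cases as the new data, once both files have been read in.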
Problem 3
Deviations from the mean. Sometimes it is convenient to write the simple linear regression model in a different form that is a little easier to manipulate. Taking equation (2.1), adding \(\beta_{1} \bar{x}-\beta_{1} \bar{x}\), which equals zero, to the right-hand side, and combining terms, we can write

$$\begin{aligned}
y_{i} &=\beta_{0}+\beta_{1} \bar{x}+\beta_{1} x_{i}-\beta_{1} \bar{x}+e_{i} \\
&=\left(\beta_{0}+\beta_{1} \bar{x}\right)+\beta_{1}\left(x_{i}-\bar{x}\right)+e_{i} \\
&=\alpha+\beta_{1}\left(x_{i}-\bar{x}\right)+e_{i}
\end{aligned}$$

where we have defined \(\alpha=\beta_{0}+\beta_{1} \bar{x}\). This is called the deviations from the sample average form for simple regression.

a. What is the meaning of the parameter \(\alpha\)?

b. Show that the least squares estimates are

$$\hat{\alpha}=\bar{y}, \quad \hat{\beta}_{1} \text{ as given by } (2.5)$$

c. Find expressions for the variances of the estimates and the covariance between them.
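A numerical check is no substitute for the derivation asked for in part b, but it can confirm the algebra. The Python sketch below solves the two normal equations directly for an arbitrary regressor, first with \(x_{i}\) and then with \(x_{i}-\bar{x}\). The function name `solve_normal_equations` and the data are made up for illustration.

```python
def solve_normal_equations(z, y):
    """Least squares fit of y = a + b * z, found by solving the 2 x 2
    normal equations with Cramer's rule. Returns (a, b)."""
    n = len(z)
    sz = sum(z)
    szz = sum(zi * zi for zi in z)
    sy = sum(y)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = n * szz - sz * sz
    a = (sy * szz - sz * szy) / det
    b = (n * szy - sz * sy) / det
    return a, b


# Made-up illustrative data (not from the text)
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 3.5, 8.0]

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)

# Original form: y = b0 + b1 * x
b0, b1 = solve_normal_equations(x, y)

# Deviations-from-the-mean form: y = alpha + b1 * (x - xbar)
centered = [xi - xbar for xi in x]
alpha_hat, b1_centered = solve_normal_equations(centered, y)

print("alpha_hat:", alpha_hat, " ybar:", ybar)
print("slope (centered):", b1_centered, " slope (original):", b1)
print("b0 + b1 * xbar:", b0 + b1 * xbar)
print("sum of centered x:", sum(centered))
```

The last printed value, the off-diagonal entry of the normal equations in the centered form, is zero up to rounding, which is worth keeping in mind for part c.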
Problem 4
Heights of mothers and daughters.

a. For the heights data in the file heights.txt, compute the regression of Dheight on Mheight, and report the estimates, their standard errors, the value of the coefficient of determination, and the estimate of variance. Give the analysis of variance table that tests the hypothesis that \(\mathrm{E}(\text{Dheight} \mid \text{Mheight})=\beta_{0}\) versus the alternative that \(\mathrm{E}(\text{Dheight} \mid \text{Mheight})=\beta_{0}+\beta_{1}\,\text{Mheight}\), and write a sentence or two that summarizes the results of these computations.

b. Write the mean function in the deviations from the mean form as in Problem 2.3. For this particular problem, give an interpretation for the value of \(\beta_{1}\). In particular, discuss the three cases of \(\beta_{1}=1\), \(\beta_{1}<1\), and \(\beta_{1}>1\). Obtain a \(99\%\) confidence interval for \(\beta_{1}\) from the data.

c. Obtain a prediction and \(99\%\) prediction interval for a daughter whose mother is 64 inches tall.
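The two intervals in parts b and c have the same shape: an estimate plus or minus a \(t\) multiplier times a standard error. The Python sketch below shows that structure under some assumptions: the function name `regression_intervals` is hypothetical, the caller supplies the \(t\) quantile with \(n-2\) df (from a table or a statistics package), and the five data pairs are made up, not taken from heights.txt.

```python
from math import sqrt


def regression_intervals(x, y, x_star, t_crit):
    """Return (slope_ci, y_pred, pred_interval) for the regression of y on x.

    t_crit is the two-sided t multiplier with n - 2 df, supplied by the caller.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)

    # Confidence interval for the slope
    se_b1 = sqrt(sigma2 / sxx)
    slope_ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

    # Prediction and prediction interval at x_star
    y_pred = b0 + b1 * x_star
    sepred = sqrt(sigma2 * (1 + 1 / n + (x_star - xbar) ** 2 / sxx))
    pred_interval = (y_pred - t_crit * sepred, y_pred + t_crit * sepred)
    return slope_ci, y_pred, pred_interval


# Made-up illustrative heights in inches; NOT the heights.txt data.
mheight = [60.0, 62.0, 64.0, 66.0, 68.0]
dheight = [61.5, 63.0, 64.0, 66.5, 67.0]

# 0.995 quantile of t with n - 2 = 3 df, from a standard t table
t_995_3df = 5.841

slope_ci, y_pred, pred_interval = regression_intervals(
    mheight, dheight, x_star=64.0, t_crit=t_995_3df
)
print("99% CI for slope:", slope_ci)
print("prediction at 64 in:", y_pred)
print("99% prediction interval:", pred_interval)
```

With the full heights data the degrees of freedom are much larger, so the multiplier would be correspondingly smaller than the one used in this toy example.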
Problem 7
Regression through the origin. Occasionally, a mean function in which the intercept is known a priori to be zero may be fit. This mean function is given by

$$\mathrm{E}(y \mid x)=\beta_{1} x \tag{2.30}$$

The residual sum of squares for this model, assuming the errors are independent with common variance \(\sigma^{2}\), is \(RSS=\sum\left(y_{i}-\hat{\beta}_{1} x_{i}\right)^{2}\).

a. Show that the least squares estimate of \(\beta_{1}\) is \(\hat{\beta}_{1}=\sum x_{i} y_{i} / \sum x_{i}^{2}\). Show that \(\hat{\beta}_{1}\) is unbiased and that \(\operatorname{Var}\left(\hat{\beta}_{1}\right)=\sigma^{2} / \sum x_{i}^{2}\). Find an expression for \(\hat{\sigma}^{2}\). How many df does it have?

b. Derive the analysis of variance table with the larger model given by (2.16), but with the smaller model specified in (2.30). Show that the \(F\)-test derived from this table is numerically equivalent to the square of the \(t\)-test (2.23) with \(\beta_{0}^{*}=0\).

c. The data in Table 2.6 and in the file snake.txt give \(X=\) water content of snow on April 1 and \(Y=\) water yield from April to July in inches in the Snake River watershed in Wyoming for \(n=17\) years from 1919 to 1935 (from Wilm, 1950). Fit a regression through the origin and find \(\hat{\beta}_{1}\) and \(\hat{\sigma}^{2}\). Obtain a \(95\%\) confidence interval for \(\beta_{1}\). Test the hypothesis that the intercept is zero.

d. Plot the residuals versus the fitted values and comment on the adequacy of the mean function with zero intercept. In regression through the origin, in general \(\sum \hat{e}_{i} \neq 0\).
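The formulas stated in part a, and the remark about residuals in part d, can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: the function name `fit_through_origin` is hypothetical, the variance estimate divides RSS by \(n-1\) because only one parameter is fitted, and the three data points are made up rather than taken from snake.txt.

```python
from math import sqrt


def fit_through_origin(x, y):
    """Least squares fit of E(y | x) = b1 * x (no intercept).

    Returns (b1, sigma2, se_b1, residuals).
    """
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    b1 = sum_xy / sum_xx
    residuals = [yi - b1 * xi for xi, yi in zip(x, y)]
    rss = sum(e * e for e in residuals)
    sigma2 = rss / (n - 1)          # one fitted parameter, so n - 1 df
    se_b1 = sqrt(sigma2 / sum_xx)
    return b1, sigma2, se_b1, residuals


# Made-up illustrative data; NOT the Snake River measurements.
x = [1.0, 2.0, 3.0]
y = [2.0, 3.0, 7.0]

b1, sigma2, se_b1, residuals = fit_through_origin(x, y)
print("slope estimate:", b1)
print("sigma^2 estimate:", sigma2)
print("se of slope:", se_b1)
print("sum of residuals:", sum(residuals))
print("sum of x * residual:", sum(xi * e for xi, e in zip(x, residuals)))
```

In this toy example the residuals do not sum to zero, while the sum of \(x_{i} \hat{e}_{i}\) does (up to rounding), since that is the only normal equation the fit has to satisfy.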