Problem 27

Does exposure to air pollution result in decreased life expectancy? This question was examined in the article "Does Air Pollution Shorten Lives?" (Statistics and Public Policy, Reading, MA, Addison-Wesley, 1977). Data on $$ \begin{aligned} y &=\text { total mortality rate }(\text { deaths per } 10,000) \\ x_{1} &=\text { mean suspended particle reading }\left(\mu \mathrm{g} / \mathrm{m}^{3}\right) \\ x_{2} &=\text { smallest sulfate reading }\left(\left[\mu \mathrm{g} / \mathrm{m}^{3}\right] \times 10\right) \\ x_{3} &=\text { population density }\left(\text { people } / \mathrm{mi}^{2}\right) \\ x_{4} &=\text { (percent nonwhite) } \times 10 \\ x_{5} &=\text { (percent over } 65) \times 10 \end{aligned} $$ for the year 1960 were recorded for \(n=117\) randomly selected standard metropolitan statistical areas. The estimated regression equation was $$ \begin{aligned} y=& 19.607+.041 x_{1}+.071 x_{2} \\ &+.001 x_{3}+.041 x_{4}+.687 x_{5} \end{aligned} $$ a. For this model, \(R^{2}=.827\). Using a \(.05\) significance level, perform a model utility test. b. The estimated standard deviation of \(\hat{\beta}_{1}\) was \(.016\). Calculate and interpret a \(90 \%\) CI for \(\beta_{1}\). c. Given that the estimated standard deviation of \(\hat{\beta}_{4}\) is \(.007\), determine whether percent nonwhite is an important variable in the model. Use a \(.01\) significance level. d. In 1960, the values of \(x_{1}, x_{2}, x_{3}, x_{4}\), and \(x_{5}\) for Pittsburgh were \(166, 60, 788, 68\), and \(95\), respectively. Use the given regression equation to predict Pittsburgh's mortality rate. How does your prediction compare with the actual 1960 value of 103 deaths per 10,000?

Short Answer

The model is useful, and both mean suspended particle reading and percent nonwhite are significant predictors. Pittsburgh's predicted mortality rate (about 99.5 deaths per 10,000) is slightly lower than the actual 1960 value of 103.

Step by step solution

01

Understanding the Model Utility Test

The model utility test checks if the regression model is useful. It involves testing the hypothesis \( H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0 \) against \( H_a: \) at least one \( \beta_i \neq 0 \). The test statistic is \( F = \frac{MSR}{MSE} \), where MSR is the mean square regression and MSE is the mean square error. Given \( R^2 = 0.827 \), the F-statistic can be calculated as \( F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)} \). Here, \( n = 117 \) and \( k = 5 \).
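
The two forms of the statistic quoted in this step agree. A short derivation, assuming the usual decomposition \(SST = SSR + SSE\) and \(R^2 = SSR/SST\) (the sums-of-squares symbols are not named in the solution and are introduced here only for illustration):

$$ \begin{aligned} F &= \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)} \\ &= \frac{(SSR / SST) / k}{(SSE / SST) / (n - k - 1)} \\ &= \frac{R^2 / k}{(1 - R^2) / (n - k - 1)} \end{aligned} $$

The last line uses \(SSE / SST = 1 - R^2\), which is why \(R^2\), \(n\), and \(k\) alone are enough to carry out the test.
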
02

Calculate the F-Statistic

Substitute into the formula: \( F = \frac{0.827 / 5}{(1 - 0.827) / (117 - 5 - 1)} = \frac{0.1654}{0.001559} \approx 106.12 \). We compare this with the critical value from the F-distribution table for \( df_1 = 5 \) and \( df_2 = 111 \) at a \(.05\) significance level.
03

Conclusion of Model Utility Test

The F-critical value for \(.05\) significance level and \( (5, 111) \) degrees of freedom is approximately \(2.31\). Since \( F \approx 106.12 \) is much larger than \(2.31\), we reject \( H_0 \) and conclude that the model is useful.
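
The arithmetic of the model utility test can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `f_from_r2` is hypothetical, and the critical value is assumed to be read from an F table rather than computed.

```python
# Model utility F statistic computed from R^2 (values taken from the exercise).
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Return F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

r2, n, k = 0.827, 117, 5
f_stat = f_from_r2(r2, n, k)

# Assumption: critical value for (5, 111) df at alpha = .05, read from an F table.
f_crit = 2.31

reject_h0 = f_stat > f_crit
print(f"F = {f_stat:.2f} with df = ({k}, {n - k - 1})")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Because the statistic exceeds the critical value by such a wide margin, small differences between table values (for example, using 120 rather than 111 denominator degrees of freedom) do not change the conclusion.
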
04

Calculate 90% Confidence Interval for \(\beta_1\)

The confidence interval is calculated as \( b_1 \pm t_{\alpha/2} \cdot s_{b_1} \), where \( b_1 = 0.041 \) and \( s_{b_1} = 0.016 \). The critical t-value for a \(90\%\) confidence level (\(\alpha/2 = .05\)) and \( df = 111 \) is approximately \( 1.658 \). Hence, the CI is \( 0.041 \pm 1.658 \cdot 0.016 = 0.041 \pm 0.0265 \), which results in \([0.014, 0.068]\).
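
The interval can be reproduced numerically. The sketch below assumes the t critical value is taken from a table, as in the step above, and uses illustrative variable names.

```python
# 90% confidence interval for beta_1: estimate +/- t critical value * standard error.
b1 = 0.041      # estimated coefficient of x1 (from the regression equation)
se_b1 = 0.016   # estimated standard deviation of the estimator (given in part b)

# Assumption: t critical value for a 90% interval with 111 df, read from a t table.
t_crit = 1.658

margin = t_crit * se_b1
ci_low, ci_high = b1 - margin, b1 + margin
print(f"margin = {margin:.4f}")
print(f"90% CI for beta_1: ({ci_low:.3f}, {ci_high:.3f})")
```
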
05

Interpretation of Confidence Interval for \(\beta_1\)

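With the other four predictors held fixed, we are 90% confident that a \(1\ \mu\mathrm{g}/\mathrm{m}^3\) increase in the mean suspended particle reading is associated with an increase of between \(0.014\) and \(0.068\) deaths per 10,000 in the mortality rate. Because the interval \([0.014, 0.068]\) does not contain zero, the positive association is statistically significant at the corresponding \(.10\) level.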
06

Test the Importance of \( \beta_4 \)

To test \( H_0: \beta_4 = 0 \) against \( H_a: \beta_4 \neq 0 \), use the test statistic \( t = \frac{b_4}{s_{b_4}} = \frac{0.041}{0.007} \approx 5.857 \). Compare this value with the two-sided critical value for a \(.01\) significance level and \(111\) degrees of freedom, \(t_{\alpha/2} \approx 2.62\) (slightly larger than the large-sample z value of \(2.576\)).
07

Conclusion on the Importance of \( \beta_4 \)

Since \(t \approx 5.857\) is greater than the critical value of about \(2.62\), we reject the null hypothesis that \( \beta_4 = 0 \). This suggests that percent nonwhite is a significant variable in the model.
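
The t ratio and the decision can be verified as follows. This is a sketch with illustrative variable names; the critical value is an assumed table lookup, not something computed by the code.

```python
# Two-sided t test of H0: beta_4 = 0 at the .01 level.
b4 = 0.041      # estimated coefficient of x4 (from the regression equation)
se_b4 = 0.007   # estimated standard deviation of the estimator (given in part c)

t_ratio = b4 / se_b4

# Assumption: two-sided critical value for alpha = .01 with 111 df, from a t table.
# (The large-sample z value 2.576 is slightly smaller; the conclusion is the same.)
t_crit = 2.62

reject_h0 = abs(t_ratio) > t_crit
print(f"t = {t_ratio:.3f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```
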
08

Predict Mortality Rate for Pittsburgh

Substitute the Pittsburgh values into the regression equation: \( y = 19.607 + 0.041 \cdot 166 + 0.071 \cdot 60 + 0.001 \cdot 788 + 0.041 \cdot 68 + 0.687 \cdot 95 = 19.607 + 6.806 + 4.260 + 0.788 + 2.788 + 65.265 \), resulting in \( y \approx 99.514 \).
09

Compare Prediction to Actual Value

The predicted mortality rate for Pittsburgh is \(99.514\) deaths per 10,000, which is about \(3.5\) lower than the actual rate of \(103\) deaths per 10,000 in 1960. The model therefore slightly underpredicts Pittsburgh's mortality rate.
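
The substitution is easy to get wrong by hand, so it is worth checking term by term. A minimal sketch, with illustrative variable names and all numbers taken from the exercise:

```python
# Predicted mortality rate for Pittsburgh from the estimated regression equation.
intercept = 19.607
coefs = [0.041, 0.071, 0.001, 0.041, 0.687]   # coefficients of x1..x5
pittsburgh = [166, 60, 788, 68, 95]           # 1960 values of x1..x5

# Contribution of each predictor, then the fitted value.
terms = [b * x for b, x in zip(coefs, pittsburgh)]
y_hat = intercept + sum(terms)

actual = 103                                  # actual 1960 deaths per 10,000
residual = actual - y_hat

print("terms:", [round(t, 3) for t in terms])
print(f"predicted = {y_hat:.3f}, actual = {actual}, residual = {residual:.3f}")
```

A positive residual here means the observed rate is above the fitted value, that is, the equation underpredicts.
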

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Model Utility Test
A regression model's utility can be assessed using the Model Utility Test. It helps us understand if our model contributes meaningful insights beyond random chance. The test involves evaluating a null hypothesis, which assumes that all the coefficients of the independent variables are zero (i.e., the variables do not affect the outcome variable). Conversely, the alternative hypothesis posits that at least one coefficient is not zero, implying that the model is useful.

To conduct the test, we use the F-statistic, calculated through the formula: \[F = \frac{MSR}{MSE}\] where \(MSR\) is the mean square regression and \(MSE\) is the mean square error. A higher F-statistic value than the critical value from the F-distribution table suggests rejecting the null hypothesis. This means the model is statistically significant and useful for prediction or inference.

In the context of the given exercise, a high F-statistic value (approximately 106.12) compared to the critical value (approximately 2.31 at a 0.05 significance level) indicates that the regression model for mortality rate considering air pollution factors is indeed useful.
Confidence Interval
A confidence interval provides a range of plausible values for the true parameter and reflects the precision of the estimate. A 90% confidence level means that, in repeated sampling, about 90% of intervals constructed this way would contain the true parameter value.

The confidence interval for a regression coefficient \( \beta \) is given by: \[b \pm t_{\alpha/2} \cdot s_{b}\]where \(b\) is the sample estimate of \(\beta\), \(t_{\alpha/2}\) is the critical t-value, and \(s_{b}\) is the standard error of the estimate.

For example, the confidence interval for \(\beta_1\) in the exercise is \([0.014, 0.068]\). This interval does not include zero, implying a significant and positive association between suspended particles and mortality rate at a 90% confidence level. With the other predictors held fixed, we can be fairly confident that higher suspended particle readings go with higher mortality rates in the population of metropolitan areas sampled in 1960.
F-Statistic
The F-statistic is a crucial component of the Model Utility Test in regression analysis. It measures the overall significance of the regression model by comparing the model's explained variance with the residual variance.

The calculation for the F-statistic involves \[F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}\]where \(R^2\) represents the proportion of variance in the dependent variable that is predictable from the independent variables, \(k\) is the number of predictors, and \(n\) is the sample size.

A large F-statistic, compared to the critical value from an F-distribution table (based on the given degrees of freedom), signifies that the regression model provides a better fit to the data than a model containing no predictors. In the provided exercise, the calculated F-statistic was significantly larger than the critical value, affirming the overall utility of the model in explaining mortality rates.
Significance Testing
Significance testing in regression analysis involves determining whether the coefficients of the regression model differ significantly from zero. This process often uses t-tests to assess the importance of each independent variable's contribution to the model.

The null hypothesis generally states that a coefficient (\(\beta\)) is equal to zero, implying no effect of the corresponding predictor. We compute the test statistic as \[t = \frac{b}{s_b}\]where \(b\) is the estimated coefficient, and \(s_b\) is its standard error.

In hypothesis testing, if the absolute value of the calculated t-value exceeds the critical t-value from the t-distribution table, the null hypothesis can be rejected at the specified significance level (e.g., 0.01). For \(\beta_4\) in the exercise, the t-value was found to be approximately 5.857, which is greater than the critical value of about 2.62 for 111 degrees of freedom, indicating that percent nonwhite is significantly related to mortality rate. Rejecting the null hypothesis means the variable is worth keeping in the regression model alongside the other predictors.

Most popular questions from this chapter

The article "The Influence of Honing Process Parameters on Surface Quality, Productivity, Cutting Angle, and Coefficient of Friction" (Industrial Lubrication and Tribology, 2012: 77-83) included the following data on \(x_{1}=\) cutting speed \((\mathrm{m} / \mathrm{s})\), \(x_{2}=\) specific pressure of pre-honing process \(\left(\mathrm{N} / \mathrm{mm}^{2}\right)\), \(x_{3}=\) specific pressure of finishing honing process, and \(y=\) productivity in the honing process (\(\mathrm{mm}^{3} / \mathrm{s}\)) for a particular tool; productivity is the volume of the material cut in a second. a. The article proposed a multivariate power model \(Y=\alpha x_{1}^{\beta_{1}} x_{2}^{\beta_{2}} x_{3}^{\beta_{3}} \epsilon\). The implied linear regression model involves regressing \(\ln (y)\) against the three predictors \(\ln \left(x_{1}\right), \ln \left(x_{2}\right)\), and \(\ln \left(x_{3}\right)\). Partial Minitab output from fitting this latter model is as follows (the corresponding estimated power regression function appeared in the cited article). Carry out the model utility test at significance level \(.05\). b. The large \(P\)-value corresponding to the \(t\) ratio for \(\ln \left(x_{2}\right)\) suggests that this predictor can be eliminated from the model. Doing so and refitting yields the following Minitab output. c. Fit the simple linear regression model implied by your conclusion in (b) to the transformed data, and carry out a test of model utility. d. The standardized residuals from the fit referred to in (c) are \(.03, .33, 1.69, .33, -.49, .96, .57, .33, -.25, -1.28, .29, -2.26\). Plot these against \(\ln \left(x_{1}\right)\). What does the pattern suggest? e. Fitting a quadratic regression model to relate \(\ln (y)\) to \(\ln \left(x_{1}\right)\) gave the following Minitab output. Carry out a test of model utility at significance level \(.05\) (the pattern in residual plots is satisfactory). Then use the fact that \(s_{\ln \left(\tilde{Y}^{\prime}\right)}=.0178\left[Y^{\prime}=\ln (Y)\right]\) when \(x_{1}=1\) to obtain a \(95 \%\) prediction interval for productivity.

If there is at least one \(x\) value at which more than one observation has been made, there is a formal test procedure for testing \(H_{0}: \mu_{Y \cdot x}=\beta_{0}+\beta_{1} x\) for some values \(\beta_{0}, \beta_{1}\) (the true regression function is linear) versus \(H_{a}: H_{0}\) is not true (the true regression function is not linear). Suppose observations are made at \(x_{1}, x_{2}, \ldots, x_{c}\). Let \(Y_{11}, Y_{12}, \ldots, Y_{1 n_{1}}\) denote the \(n_{1}\) observations when \(x=x_{1}; \ldots; Y_{c 1}, Y_{c 2}, \ldots, Y_{c n_{c}}\) denote the \(n_{c}\) observations when \(x=x_{c}\). With \(n=\Sigma n_{i}\) (the total number of observations), SSE has \(n-2\) df. We break SSE into two pieces, SSPE (pure error) and SSLF (lack of fit), as follows: $$ \begin{aligned} \mathrm{SSPE} &=\sum_{i} \sum_{j}\left(Y_{i j}-\bar{Y}_{i \cdot}\right)^{2} \\ &=\sum \sum Y_{i j}^{2}-\sum n_{i} \bar{Y}_{i \cdot}^{2} \\ \mathrm{SSLF} &=\mathrm{SSE}-\mathrm{SSPE} \end{aligned} $$ The \(n_{i}\) observations at \(x_{i}\) contribute \(n_{i}-1\) df to SSPE, so the number of degrees of freedom for SSPE is \(\Sigma_{i}\left(n_{i}-1\right)=n-c\), and the degrees of freedom for SSLF is \(n-2-(n-c)=c-2\). Let \(\mathrm{MSPE}=\mathrm{SSPE} /(n-c)\) and \(\mathrm{MSLF}=\mathrm{SSLF} /(c-2)\). Then it can be shown that whereas \(E(\mathrm{MSPE})=\sigma^{2}\) whether or not \(H_{0}\) is true, \(E(\mathrm{MSLF})=\sigma^{2}\) if \(H_{0}\) is true and \(E(\mathrm{MSLF})>\sigma^{2}\) if \(H_{0}\) is false. The test statistic is \(F=\mathrm{MSLF} / \mathrm{MSPE}\), and the corresponding \(P\)-value is the area under the \(F_{c-2, n-c}\) curve to the right of \(f\). The following data comes from the article "Changes in Growth Hormone Status Related to Body Weight of Growing Cattle" (Growth, 1977: 241-247), with \(x=\) body weight and \(y=\) metabolic clearance rate/body weight.
$$ \begin{array}{l|ccccccc} x & 110 & 110 & 110 & 230 & 230 & 230 & 360 \\ \hline y & 235 & 198 & 173 & 174 & 149 & 124 & 115 \\ x & 360 & 360 & 360 & 505 & 505 & 505 & 505 \\ \hline y & 130 & 102 & 95 & 122 & 112 & 98 & 96 \end{array} $$ a. Test \(H_{0}\) versus \(H_{a}\) at level \(.05\) using the lack-of-fit test just described. b. Does a scatterplot of the data suggest that the relationship between \(x\) and \(y\) is linear? How does this compare with the result of part (a)? (A nonlinear regression function was used in the article.)

Cardiorespiratory fitness is widely recognized as a major component of overall physical well-being. Direct measurement of maximal oxygen uptake (\(\mathrm{VO}_{2} \max\)) is the single best measure of such fitness, but direct measurement is time-consuming and expensive. It is therefore desirable to have a prediction equation for \(\mathrm{VO}_{2} \max\) in terms of easily obtained quantities. Consider the variables $$ \begin{aligned} y &=\mathrm{VO}_{2} \max\ (\mathrm{L} / \mathrm{min}) \\ x_{1} &=\text { weight }(\mathrm{kg}) \\ x_{2} &=\text { age }(\mathrm{yr}) \\ x_{3} &=\text { time necessary to walk } 1 \text { mile (min) } \\ x_{4} &=\text { heart rate at the end of the walk (beats/min) } \end{aligned} $$ Here is one possible model, for male students, consistent with the information given in the article "Validation of the Rockport Fitness Walking Test in College Males and Females" (Research Quarterly for Exercise and Sport, 1994: 152-158): $$ Y=5.0+.01 x_{1}-.05 x_{2}-.13 x_{3}-.01 x_{4}+\epsilon \qquad \sigma=.4 $$ a. Interpret \(\beta_{1}\) and \(\beta_{3}\). b. What is the expected value of \(\mathrm{VO}_{2} \max\) when weight is \(76 \mathrm{~kg}\), age is 20 yr, walk time is \(12 \mathrm{~min}\), and heart rate is \(140 \mathrm{~b} / \mathrm{m}\)? c. What is the probability that \(\mathrm{VO}_{2} \max\) will be between \(1.00\) and \(2.60\) for a single observation made when the values of the predictors are as stated in part (b)?

An aeronautical engineering student carried out an experiment to study how \(y=\) lift/drag ratio related to the variables \(x_{1}=\) position of a certain forward lifting surface relative to the main wing and \(x_{2}=\) tail placement relative to the main wing, obtaining the following data (Statistics for Engineering Problem Solving, p. 133): a. Fitting the first-order model gives \(\mathrm{SSE}=5.18\), whereas including \(x_{3}=x_{1} x_{2}\) as a predictor results in \(\mathrm{SSE}=3.07\). Calculate and interpret the coefficient of multiple determination for each model. b. Carry out a test of model utility using \(\alpha=.05\) for each of the models described in part (a). Does either result surprise you?

A trucking company considered a multiple regression model for relating the dependent variable \(y=\) total daily travel time for one of its drivers (hours) to the predictors \(x_{1}=\) distance traveled (miles) and \(x_{2}=\) the number of deliveries made. Suppose that the model equation is $$ Y=-.800+.060 x_{1}+.900 x_{2}+\epsilon $$ a. What is the mean value of travel time when distance traveled is 50 miles and three deliveries are made? b. How would you interpret \(\beta_{1}=.060\), the coefficient of the predictor \(x_{1}\)? What is the interpretation of \(\beta_{2}=.900\)? c. If \(\sigma=.5\) hour, what is the probability that travel time will be at most 6 hours when three deliveries are made and the distance traveled is 50 miles?
