Chapter 13: Problem 27

Does exposure to air pollution result in decreased life expectancy? This question was examined in the article "Does Air Pollution Shorten Lives?" (Statistics and Public Policy, Reading, MA, Addison-Wesley, 1977). Data on $$ \begin{aligned} y &=\text { total mortality rate }(\text { deaths per } 10,000) \\ x_{1} &=\text { mean suspended particle reading }\left(\mu \mathrm{g} / \mathrm{m}^{3}\right) \\ x_{2} &=\text { smallest sulfate reading }\left(\left[\mu \mathrm{g} / \mathrm{m}^{3}\right] \times 10\right) \\ x_{3} &=\text { population density }\left(\text { people } / \mathrm{mi}^{2}\right) \\ x_{4} &=\text { (percent nonwhite) } \times 10 \\ x_{5} &=\text { (percent over } 65) \times 10 \end{aligned} $$ for the year 1960 was recorded for $n=117$ randomly selected standard metropolitan statistical areas. The estimated regression equation was $$ \begin{aligned} y=& 19.607+.041 x_{1}+.071 x_{2} \\ &+.001 x_{3}+.041 x_{4}+.687 x_{5} \end{aligned} $$

a. For this model, $R^{2}=.827$. Using a $.05$ significance level, perform a model utility test.

b. The estimated standard deviation of $\hat{\beta}_{1}$ was $.016$. Calculate and interpret a $90 \%$ CI for $\beta_{1}$.

c. Given that the estimated standard deviation of $\hat{\beta}_{4}$ is $.007$, determine whether percent nonwhite is an important variable in the model. Use a $.01$ significance level.

d. In 1960, the values of $x_{1}, x_{2}, x_{3}, x_{4}$, and $x_{5}$ for Pittsburgh were $166, 60, 788, 68$, and $95$, respectively. Use the given regression equation to predict Pittsburgh's mortality rate. How does your prediction compare with the actual 1960 value of 103 deaths per 10,000?

Short Answer

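The model utility test rejects $H_0$, so the model is useful, and both mean suspended particle reading and percent nonwhite are significant predictors. Pittsburgh's predicted mortality rate (about 99.5 deaths per 10,000) is slightly lower than the actual 1960 value of 103.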

Step by step solution

Understanding the Model Utility Test

The model utility test checks if the regression model is useful. It involves testing the hypothesis $ H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0 $ against $ H_a: $ at least one $ \beta_i \neq 0 $. The test statistic is $ F = \frac{MSR}{MSE} $, where MSR is the mean square regression and MSE is the mean square error. Given $ R^2 = 0.827 $, the F-statistic can be calculated as $ F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)} $. Here, $ n = 117 $ and $ k = 5 $.

Calculate the F-Statistic

Calculate the F-statistic using the formula: $ F = \frac{0.827 / 5}{(1 - 0.827) / (117 - 5 - 1)} = \frac{0.1654}{0.173 / 111} $. This results in $ F \approx 106.12 $. We compare this with the critical value from the F-distribution table for $ df_1 = 5 $ and $ df_2 = 111 $ at a $.05$ significance level.

Conclusion of Model Utility Test

The F-critical value for a $.05$ significance level and $ (5, 111) $ degrees of freedom is approximately $2.31$. Since $ F \approx 106.12 $ is much larger than $2.31$, we reject $ H_0 $ and conclude that the model is useful: at least one of the five predictors is linearly related to the mortality rate.
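The model utility calculation can be reproduced with a few lines of Python. This is a minimal sketch using only the numbers given in the exercise; the helper name `f_from_r_squared` is hypothetical, and the critical value is hard-coded as the approximate table value quoted in the solution rather than computed.

```python
# Model utility F statistic computed from R^2 (inputs taken from the exercise).

def f_from_r_squared(r_squared: float, n: int, k: int) -> float:
    """Return F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))

n, k = 117, 5                      # sample size and number of predictors
f_stat = f_from_r_squared(0.827, n, k)

# Assumption: approximate table value for the upper 5% point of F(5, 111).
F_CRITICAL_05 = 2.31

reject_h0 = f_stat > F_CRITICAL_05
print(f"df = ({k}, {n - k - 1}), F = {f_stat:.2f}, reject H0: {reject_h0}")
# prints: df = (5, 111), F = 106.12, reject H0: True
```

Because the statistic exceeds the critical value by such a wide margin, small differences between table values (for example 2.30 versus 2.31) do not affect the conclusion.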

Calculate 90% Confidence Interval for $\beta_1$

The confidence interval is calculated as $ b_1 \pm t_{\alpha/2} \cdot s_{b_1} $, where $ b_1 = 0.041 $ and $ s_{b_1} = 0.016 $. The critical t-value at the $90\%$ confidence level with $ df = n - k - 1 = 111 $ is approximately $ 1.658 $. Hence, the CI is $ 0.041 \pm 1.658 \cdot 0.016 = 0.041 \pm 0.0265 $, which results in $[0.014, 0.068]$.

Interpretation of Confidence Interval for $\beta_1$

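We are 90% confident that, with the other four predictors held fixed, each additional $1\ \mu \mathrm{g}/\mathrm{m}^3$ in the mean suspended particle reading is associated with an increase of between $0.014$ and $0.068$ deaths per 10,000 in the mean mortality rate. Because the interval $[0.014, 0.068]$ does not contain zero, the positive association is significant at the $.10$ level.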
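The interval arithmetic can be checked with a short Python sketch. The function name `coefficient_ci` is hypothetical, and the critical value 1.658 is the approximate t-table value quoted in the solution for 111 degrees of freedom, entered by hand rather than computed.

```python
# 90% confidence interval for beta_1: estimate +/- t_critical * standard error.

def coefficient_ci(estimate: float, std_error: float, t_critical: float):
    """Return the (lower, upper) bounds of a t-based confidence interval."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin

# Assumption: approximate table value of t for 90% confidence, df = 111.
T_CRITICAL_90 = 1.658

lower, upper = coefficient_ci(estimate=0.041, std_error=0.016,
                              t_critical=T_CRITICAL_90)
print(f"90% CI for beta_1: [{lower:.3f}, {upper:.3f}]")
# prints: 90% CI for beta_1: [0.014, 0.068]
```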

Test the Importance of $ \beta_4 $

To test $ H_0: \beta_4 = 0 $ against $ H_a: \beta_4 \neq 0 $, use the test statistic $ t = \frac{b_4}{s_{b_4}} = \frac{0.041}{0.007} \approx 5.857 $. Compare this value with the two-tailed critical value $t_{\alpha/2} \approx 2.62$ for a $.01$ significance level and $111$ degrees of freedom (the large-sample normal value is $2.576$).

Conclusion on the Importance of $ \beta_4 $

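Since $t \approx 5.857$ exceeds the critical value (about $2.62$ for $111$ degrees of freedom, and also the normal approximation $2.576$), we reject the null hypothesis that $ \beta_4 = 0 $. This suggests that percent nonwhite is a significant variable in the model, given that the other four predictors are included.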
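A small Python sketch of this t test follows. The helper name `t_ratio` is hypothetical, and both critical values are entered by hand as assumptions: roughly 2.62 is the two-tailed .01 table value near 111 degrees of freedom, and 2.576 is the standard normal value that the t distribution approaches for large samples.

```python
# t ratio for H0: beta_4 = 0 versus Ha: beta_4 != 0 (inputs from the exercise).

def t_ratio(estimate: float, std_error: float) -> float:
    """Return the test statistic b / s_b for a single coefficient."""
    return estimate / std_error

t_stat = t_ratio(0.041, 0.007)

# Assumptions: approximate two-tailed critical values at alpha = .01.
T_CRITICAL_DF_111 = 2.62    # t table, about 111 degrees of freedom
Z_CRITICAL = 2.576          # large-sample normal approximation

reject_with_t = abs(t_stat) > T_CRITICAL_DF_111
reject_with_z = abs(t_stat) > Z_CRITICAL
print(f"t = {t_stat:.3f}, reject H0 (t table): {reject_with_t}, "
      f"reject H0 (normal): {reject_with_z}")
# prints: t = 5.857, reject H0 (t table): True, reject H0 (normal): True
```

The conclusion is the same with either critical value, since the statistic is more than twice as large as both.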

Predict Mortality Rate for Pittsburgh

Substitute the Pittsburgh values into the regression equation: $ y = 19.607 + 0.041 \cdot 166 + 0.071 \cdot 60 + 0.001 \cdot 788 + 0.041 \cdot 68 + 0.687 \cdot 95 = 19.607 + 6.806 + 4.260 + 0.788 + 2.788 + 65.265 $, resulting in $ y \approx 99.514 $.

Compare Prediction to Actual Value

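The predicted mortality rate for Pittsburgh is about $99.514$ deaths per 10,000, which is slightly lower than the actual rate of $103$ deaths per 10,000 in 1960. The residual is $103 - 99.514 = 3.486$, so the model underpredicts by roughly 3.5 deaths per 10,000.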
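The substitution is easy to get wrong by hand, so it helps to evaluate it term by term. The sketch below uses only the coefficients and Pittsburgh values from the exercise; the variable names are hypothetical labels chosen for readability.

```python
# Point prediction for Pittsburgh from the estimated regression equation.

INTERCEPT = 19.607
COEFFICIENTS = {            # estimated slopes from the exercise
    "x1_particles": 0.041,
    "x2_sulfate": 0.071,
    "x3_density": 0.001,
    "x4_nonwhite": 0.041,
    "x5_over_65": 0.687,
}
PITTSBURGH = {              # 1960 predictor values from part (d)
    "x1_particles": 166,
    "x2_sulfate": 60,
    "x3_density": 788,
    "x4_nonwhite": 68,
    "x5_over_65": 95,
}

# Contribution of each predictor: coefficient times observed value.
terms = {name: COEFFICIENTS[name] * PITTSBURGH[name] for name in COEFFICIENTS}
predicted = INTERCEPT + sum(terms.values())

ACTUAL = 103                # observed 1960 mortality rate (deaths per 10,000)
residual = ACTUAL - predicted

for name, value in terms.items():
    print(f"{name}: {value:.3f}")
print(f"predicted = {predicted:.3f}, residual = {residual:.3f}")
# last line prints: predicted = 99.514, residual = 3.486
```

The largest contribution by far comes from the percent-over-65 term (65.265), which is why the prediction is dominated by the age structure of the population rather than by the pollution readings.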

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Model Utility Test

A regression model's utility can be assessed using the Model Utility Test. It helps us understand if our model contributes meaningful insights beyond random chance. The test involves evaluating a null hypothesis, which assumes that all the coefficients of the independent variables are zero (i.e., the variables do not affect the outcome variable). Conversely, the alternative hypothesis posits that at least one coefficient is not zero, implying that the model is useful.

To conduct the test, we use the F-statistic, calculated through the formula: \[F = \frac{MSR}{MSE}\] where $MSR$ is the mean square regression and $MSE$ is the mean square error. A higher F-statistic value than the critical value from the F-distribution table suggests rejecting the null hypothesis. This means the model is statistically significant and useful for prediction or inference.

In the context of the given exercise, a high F-statistic value (approximately 106.12) compared to the critical value (approximately 2.31 at a 0.05 significance level) indicates that the regression model for mortality rate considering air pollution factors is indeed useful.

Confidence Interval

A confidence interval provides a range of plausible values for the true parameter. Its width reflects the precision of the estimate. The confidence level describes the method rather than any single interval: if the same procedure were applied to many independent samples, about 90% of the resulting 90% confidence intervals would contain the true parameter value.

The confidence interval for a regression coefficient $ \beta $ is given by: \[b \pm t_{\alpha/2} \cdot s_{b}\]where $b$ is the sample estimate of $\beta$, $t_{\alpha/2}$ is the critical t-value, and $s_{b}$ is the standard error of the estimate.

For example, the confidence interval for $\beta_1$ in the exercise is $[0.014, 0.068]$. This interval does not include zero, implying a significant positive association between suspended particles and mortality rate at the 90% confidence level, with the other predictors held fixed. Because the data are observational and come from 1960, the interval supports an association rather than proving that air pollution causes higher mortality.
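The repeated-sampling meaning of a confidence level can be illustrated by simulation. The sketch below is not based on the exercise data: it assumes a hypothetical normal population (mean 10, standard deviation 2) and uses a known-sigma z interval for a mean instead of the regression t interval, purely to show that roughly 90% of such intervals cover the true value.

```python
import random
from statistics import NormalDist, fmean

def coverage_rate(true_mean=10.0, sigma=2.0, sample_size=25,
                  level=0.90, trials=2000, seed=0):
    """Fraction of simulated z intervals for a mean that contain true_mean."""
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.645 for a 90% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sample_size ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(sample_size)]
        centre = fmean(sample)
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"share of intervals covering the true mean: {rate:.3f}")
```

The printed share varies with the seed and the number of trials, but it should sit close to 0.90. Any single interval either contains the true value or does not; the 90% refers to the long-run behaviour of the procedure.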

F-Statistic

The F-statistic is a crucial component of the Model Utility Test in regression analysis. It measures the overall significance of the regression model by comparing the model's explained variance with the residual variance.

The calculation for the F-statistic involves \[F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}\]where $R^2$ represents the proportion of variance in the dependent variable that is predictable from the independent variables, $k$ is the number of predictors, and $n$ is the sample size.

A large F-statistic, compared to the critical value from an F-distribution table (based on the given degrees of freedom), signifies that the regression model provides a better fit to the data than a model containing no predictors. In the provided exercise, the calculated F-statistic (about 106) was far larger than the critical value (about 2.31), affirming the overall utility of the model in explaining mortality rates.

Significance Testing

Significance testing in regression analysis involves determining whether the coefficients of the regression model differ significantly from zero. This process often uses t-tests to assess the importance of each independent variable's contribution to the model.

The null hypothesis generally states that a coefficient ($\beta$) is equal to zero, implying no effect of the corresponding predictor. We compute the test statistic as \[t = \frac{b}{s_b}\]where $b$ is the estimated coefficient, and $s_b$ is its standard error.

In hypothesis testing, if the absolute value of the calculated t-value exceeds the critical t-value from the t-distribution table, the null hypothesis can be rejected at the specified significance level (e.g., 0.01). For $\beta_4$ in the exercise, the t-value was approximately 5.857, which is greater than the two-tailed critical value of about 2.62 for 111 degrees of freedom (2.576 is the corresponding large-sample normal value), indicating that percent nonwhite is significantly associated with mortality rate after adjusting for the other predictors. Rejecting the null hypothesis supports keeping the variable in the regression model.
