Problem 3

Use appropriate multiple regression software of your choice and enter the data. Note that the data are also available for download at the Companion Sites for this text.

Medical: Blood Pressure. The systolic blood pressure of individuals is thought to be related to both age and weight. For a random sample of 11 men, the following data were obtained:

$$\begin{array}{ccc|ccc}
\hline
\text{Systolic Blood Pressure} & \text{Age (years)} & \text{Weight (pounds)} & \text{Systolic Blood Pressure} & \text{Age (years)} & \text{Weight (pounds)} \\
\hline
x_{1} & x_{2} & x_{3} & x_{1} & x_{2} & x_{3} \\
\hline
132 & 52 & 173 & 137 & 54 & 188 \\
143 & 59 & 184 & 149 & 61 & 188 \\
153 & 67 & 194 & 159 & 65 & 207 \\
162 & 73 & 211 & 128 & 46 & 167 \\
154 & 64 & 196 & 166 & 72 & 217 \\
168 & 74 & 220 & & & \\
\hline
\end{array}$$

(a) Generate summary statistics, including the mean and standard deviation of each variable. Compute the coefficient of variation (see Section 3.2) for each variable. Relative to its mean, which variable has the greatest spread of data values? Which variable has the smallest spread of data values relative to its mean?

(b) For each pair of variables, generate the sample correlation coefficient \(r\). Compute the corresponding coefficient of determination \(r^{2}\). Which variable (other than \(x_{1}\)) has the greatest influence (by itself) on \(x_{1}\)? Would you say that both variables \(x_{2}\) and \(x_{3}\) show a strong influence on \(x_{1}\)? Explain your answer. What percent of the variation in \(x_{1}\) can be explained by the corresponding variation in \(x_{2}\)? Answer the same question for \(x_{3}\).

(c) Perform a regression analysis with \(x_{1}\) as the response variable. Use \(x_{2}\) and \(x_{3}\) as explanatory variables. Look at the coefficient of multiple determination. What percentage of the variation in \(x_{1}\) can be explained by the corresponding variations in \(x_{2}\) and \(x_{3}\) taken together?

(d) Look at the coefficients of the regression equation. Write out the regression equation. Explain how each coefficient can be thought of as a slope. If age were held fixed, but a person put on 10 pounds, what would you expect for the corresponding change in systolic blood pressure? If a person kept the same weight but got 10 years older, what would you expect for the corresponding change in systolic blood pressure?

(e) Test each coefficient to determine if it is zero or not zero. Use level of significance \(5\%\). Why would the outcome of each test help us determine whether or not a given variable should be used in the regression model?

(f) Find a \(90\%\) confidence interval for each coefficient.

(g) Suppose Michael is 68 years old and weighs 192 pounds. Predict his systolic blood pressure, and find a \(90\%\) confidence range for your prediction (if your software produces prediction intervals).

Short Answer

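Age and weight are each strongly correlated with systolic blood pressure; taken by itself, age shows the slightly stronger linear relationship with blood pressure in this sample. A regression model that uses both age and weight can then be used to predict blood pressure.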

Step by step solution

01

Calculate Summary Statistics

For each variable, calculate the mean and standard deviation. Let's denote the systolic blood pressure as \(x_1\), age as \(x_2\), and weight as \(x_3\). Use statistical software or a calculator to find these values. Once you have the mean \( \bar{x} \) and standard deviation \( s \), calculate the coefficient of variation (CV) as \( CV = \frac{s}{\bar{x}} \times 100\% \). Identify which variable has the greatest and smallest spread relative to its mean.
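
As a minimal sketch of this step, the following Python snippet computes the mean, sample standard deviation, and coefficient of variation for the three variables using only the standard library. The variable and function names (`bp`, `age`, `weight`, `coefficient_of_variation`) are illustrative choices, not part of the original problem.

```python
from statistics import mean, stdev

# Data from the problem statement (random sample of 11 men).
bp = [132, 143, 153, 162, 154, 168, 137, 149, 159, 128, 166]      # x1
age = [52, 59, 67, 73, 64, 74, 54, 61, 65, 46, 72]                 # x2
weight = [173, 184, 194, 211, 196, 220, 188, 188, 207, 167, 217]  # x3


def coefficient_of_variation(values):
    """Sample CV in percent: 100 * s / x-bar, where s uses n - 1."""
    return 100 * stdev(values) / mean(values)


summary = {}
for name, values in [("x1 (blood pressure)", bp),
                     ("x2 (age)", age),
                     ("x3 (weight)", weight)]:
    summary[name] = (mean(values), stdev(values),
                     coefficient_of_variation(values))
    m, s, cv = summary[name]
    print(f"{name}: mean={m:.2f}, s={s:.2f}, CV={cv:.1f}%")

# Variables with the greatest and smallest spread relative to the mean.
largest = max(summary, key=lambda k: summary[k][2])
smallest = min(summary, key=lambda k: summary[k][2])
print("Greatest relative spread:", largest)
print("Smallest relative spread:", smallest)
```

For these data the coefficients of variation come out to roughly 9.1% for blood pressure, 14.6% for age, and 8.9% for weight, so age has the greatest spread relative to its mean and weight the smallest.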
02

Compute the Correlation Coefficient

Use statistical software to calculate the sample correlation coefficient \( r \) for each pair \((x_1, x_2)\), \((x_1, x_3)\), and \((x_2, x_3)\). Square each \( r \) to get the coefficient of determination \( r^2 \); multiplying \( r^2 \) by \(100\%\) expresses it as the percentage of variation in one variable that is explained by the other. To decide whether \(x_2\) or \(x_3\) has the stronger influence on \(x_1\) by itself, compare the \( r^2 \) values for the pairs \((x_1, x_2)\) and \((x_1, x_3)\).
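
The pairwise correlations can be sketched in plain Python as follows. The helper `pearson_r` and the dictionary `results` are hypothetical names introduced for illustration; any statistics package would report the same quantities.

```python
from math import sqrt

# Data from the problem statement.
bp = [132, 143, 153, 162, 154, 168, 137, 149, 159, 128, 166]      # x1
age = [52, 59, 67, 73, 64, 74, 54, 61, 65, 46, 72]                 # x2
weight = [173, 184, 194, 211, 196, 220, 188, 188, 207, 167, 217]  # x3


def pearson_r(x, y):
    """Sample correlation coefficient r for paired lists x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


pairs = {"x1,x2": (bp, age), "x1,x3": (bp, weight), "x2,x3": (age, weight)}
results = {}
for label, (u, v) in pairs.items():
    r = pearson_r(u, v)
    results[label] = (r, r * r)  # (r, r squared)
    print(f"{label}: r={r:.3f}, r^2={r * r:.3f} ({100 * r * r:.1f}%)")
```

With these data, \(r^2\) is about 0.958 for \((x_1, x_2)\) and about 0.942 for \((x_1, x_3)\), so both variables are strongly related to \(x_1\), with age slightly ahead.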
03

Perform Multiple Regression Analysis

Set up a multiple regression with systolic blood pressure \(x_1\) as the dependent variable and both age \(x_2\) and weight \(x_3\) as independent variables. Use software to find the regression coefficients and the overall coefficient of multiple determination \( R^2 \). This \( R^2 \) will show the percentage of variance in \(x_1\) explained by \(x_2\) and \(x_3\) together.
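
A minimal sketch of the fit, assuming ordinary least squares solved through the normal equations on mean-centered sums (statistical software typically uses an equivalent but more general routine). The names `centered_sum`, `b0`, `b_age`, `b_weight`, and `r_squared` are illustrative.

```python
# Data from the problem statement.
bp = [132, 143, 153, 162, 154, 168, 137, 149, 159, 128, 166]      # x1
age = [52, 59, 67, 73, 64, 74, 54, 61, 65, 46, 72]                 # x2
weight = [173, 184, 194, 211, 196, 220, 188, 188, 207, 167, 217]  # x3


def centered_sum(u, v):
    """Sum of (u - mean(u)) * (v - mean(v)) over paired values."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


n = len(bp)
s11 = centered_sum(bp, bp)          # total variation in x1
s22 = centered_sum(age, age)
s33 = centered_sum(weight, weight)
s12 = centered_sum(bp, age)
s13 = centered_sum(bp, weight)
s23 = centered_sum(age, weight)

# Solve the 2x2 normal equations for the two slopes (Cramer's rule).
det = s22 * s33 - s23 ** 2
b_age = (s12 * s33 - s13 * s23) / det
b_weight = (s13 * s22 - s12 * s23) / det
b0 = sum(bp) / n - b_age * sum(age) / n - b_weight * sum(weight) / n

# Coefficient of multiple determination: explained / total variation.
r_squared = (b_age * s12 + b_weight * s13) / s11

print(f"x1 = {b0:.3f} + {b_age:.3f} x2 + {b_weight:.3f} x3")
print(f"R^2 = {r_squared:.3f}")
```

For these data the sketch gives an \(R^2\) of about 0.977, meaning roughly 97.7% of the variation in \(x_1\) is explained by \(x_2\) and \(x_3\) taken together.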
04

Construct the Regression Equation

Based on the multiple regression output, write the equation as \( x_1 = \beta_0 + \beta_1 x_2 + \beta_2 x_3 \), where \( \beta_0 \) is the intercept, and \( \beta_1 \) and \( \beta_2 \) are the coefficients for age and weight, respectively. Each coefficient acts as a slope: it is the expected change in systolic blood pressure per one-unit increase in that variable while the other variable is held fixed. Use this to predict the change in \(x_1\) when \(x_2\) is fixed and \(x_3\) changes, and vice versa.
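
To see why each coefficient behaves like a slope, compare the equation at two weights that differ by 10 pounds while age \(x_2\) stays the same. This is a short worked derivation in the notation of the equation above; no numerical coefficient values are assumed.

$$\left(\beta_0 + \beta_1 x_2 + \beta_2 (x_3 + 10)\right) - \left(\beta_0 + \beta_1 x_2 + \beta_2 x_3\right) = 10\,\beta_2$$

By the same argument, holding weight \(x_3\) fixed and increasing age by 10 years changes the predicted systolic blood pressure by \(10\,\beta_1\).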
05

Test the Significance of Coefficients

For each of \( \beta_1 \) and \( \beta_2 \), test \(H_0: \beta_i = 0\) against \(H_1: \beta_i \neq 0\) at the 5% level of significance. A P-value less than 0.05 leads to rejecting \(H_0\) and suggests the corresponding variable contributes to the model. A P-value of 0.05 or more means we fail to reject \(H_0\), which suggests the variable could be omitted from the model.
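
The test statistics can be sketched as below, assuming the usual least-squares standard errors and a two-tailed \(t\) test with \(n - 3 = 8\) degrees of freedom. The critical value 2.306 is taken from a standard \(t\) table (it is hard-coded here, not computed), and all names are illustrative.

```python
from math import sqrt

# Data from the problem statement.
bp = [132, 143, 153, 162, 154, 168, 137, 149, 159, 128, 166]      # x1
age = [52, 59, 67, 73, 64, 74, 54, 61, 65, 46, 72]                 # x2
weight = [173, 184, 194, 211, 196, 220, 188, 188, 207, 167, 217]  # x3


def centered_sum(u, v):
    """Sum of (u - mean(u)) * (v - mean(v)) over paired values."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


n = len(bp)
s11, s22, s33 = (centered_sum(bp, bp), centered_sum(age, age),
                 centered_sum(weight, weight))
s12, s13, s23 = (centered_sum(bp, age), centered_sum(bp, weight),
                 centered_sum(age, weight))

# Least-squares slopes from the 2x2 normal equations.
det = s22 * s33 - s23 ** 2
b_age = (s12 * s33 - s13 * s23) / det
b_weight = (s13 * s22 - s12 * s23) / det

# Residual variance with n - 3 degrees of freedom (intercept + 2 slopes).
df = n - 3
sse = s11 - (b_age * s12 + b_weight * s13)
s2 = sse / df

# Standard errors of the slopes and their t statistics.
se_age = sqrt(s2 * s33 / det)
se_weight = sqrt(s2 * s22 / det)
t_age = b_age / se_age
t_weight = b_weight / se_weight

T_CRIT = 2.306  # table value: two-tailed 5% critical t for 8 d.f.
reject_age = abs(t_age) > T_CRIT
reject_weight = abs(t_weight) > T_CRIT
print(f"age:    t={t_age:.2f}, reject H0: {reject_age}")
print(f"weight: t={t_weight:.2f}, reject H0: {reject_weight}")
```

For these data both \(t\) statistics exceed the critical value, so each coefficient is judged different from zero at the 5% level and both variables would be kept in the model.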
06

Confidence Intervals of Coefficients

Calculate the 90% confidence intervals for each coefficient \( \beta_1 \) and \( \beta_2 \). This interval estimates where the true population parameter lies, providing insight into the reliability of the coefficients.
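
As a sketch of the usual form of such an interval: write \(b_i\) for the estimated coefficient reported by the software, \(SE(b_i)\) for its standard error, and \(t_c\) for the critical value. These symbols are notation assumed here, not taken from the original solution. With \(n = 11\) data points and \(k = 2\) explanatory variables:

$$b_i - t_c\,SE(b_i) < \beta_i < b_i + t_c\,SE(b_i), \qquad d.f. = n - k - 1 = 11 - 2 - 1 = 8, \qquad t_c \approx 1.860 \text{ for } 90\% \text{ confidence}$$

The value \(t_c \approx 1.860\) comes from a standard \(t\) table for 8 degrees of freedom.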
07

Predict Systolic Pressure and Confidence Interval

Using the regression model, substitute Michael's age (68) and weight (192) into the equation to predict his systolic blood pressure. Also, generate a 90% prediction interval if the software provides this feature, which gives a range where Michael's systolic pressure is likely to fall.
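
A minimal sketch of the prediction, assuming the standard least-squares prediction interval for a single new observation and the table value \(t_c = 1.860\) (90% confidence, 8 degrees of freedom). Names such as `prediction`, `lower`, and `upper` are illustrative, and software output may differ slightly in rounding.

```python
from math import sqrt

# Data from the problem statement.
bp = [132, 143, 153, 162, 154, 168, 137, 149, 159, 128, 166]      # x1
age = [52, 59, 67, 73, 64, 74, 54, 61, 65, 46, 72]                 # x2
weight = [173, 184, 194, 211, 196, 220, 188, 188, 207, 167, 217]  # x3


def centered_sum(u, v):
    """Sum of (u - mean(u)) * (v - mean(v)) over paired values."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


n = len(bp)
mean_bp, mean_age, mean_weight = sum(bp) / n, sum(age) / n, sum(weight) / n
s11, s22, s33 = (centered_sum(bp, bp), centered_sum(age, age),
                 centered_sum(weight, weight))
s12, s13, s23 = (centered_sum(bp, age), centered_sum(bp, weight),
                 centered_sum(age, weight))

# Least-squares slopes and residual standard error (n - 3 d.f.).
det = s22 * s33 - s23 ** 2
b_age = (s12 * s33 - s13 * s23) / det
b_weight = (s13 * s22 - s12 * s23) / det
s = sqrt((s11 - (b_age * s12 + b_weight * s13)) / (n - 3))

# Michael: 68 years old, 192 pounds.
d_age, d_weight = 68 - mean_age, 192 - mean_weight
prediction = mean_bp + b_age * d_age + b_weight * d_weight

# Distance of the new point from the data center, scaled by the
# inverse of the centered cross-product matrix.
leverage = (s33 * d_age ** 2 - 2 * s23 * d_age * d_weight
            + s22 * d_weight ** 2) / det
se_pred = s * sqrt(1 + 1 / n + leverage)

T_CRIT = 1.860  # table value: 90% two-sided critical t for 8 d.f.
lower = prediction - T_CRIT * se_pred
upper = prediction + T_CRIT * se_pred
print(f"Predicted systolic BP: {prediction:.1f}")
print(f"90% prediction interval: {lower:.1f} to {upper:.1f}")
```

Under these assumptions the predicted systolic blood pressure for Michael is about 153.9, with a 90% prediction interval of roughly 148.3 to 159.4.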

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Systolic Blood Pressure
Systolic Blood Pressure is an essential measurement in understanding cardiovascular health. It represents the pressure in your arteries when your heart beats. Key factors influencing systolic blood pressure include age and weight.
Understanding the changes in systolic pressure helps doctors monitor conditions like hypertension. In analyzing data, especially in multiple regression, systolic blood pressure is often used as a dependent variable.
This is because it can be influenced by other factors like age and weight, which are used as independent variables. By conducting a regression analysis, you can assess how age and weight contribute to blood pressure changes, helping to predict future health risks.
Correlation Coefficient
The correlation coefficient, often denoted as \(r\), measures the strength and direction of a linear relationship between two variables. It ranges from -1 to 1: values close to 1 indicate a strong positive relationship, values near -1 indicate a strong negative relationship, and values near 0 indicate little or no linear relationship.
In our exercise, the correlation coefficient helps determine how strongly age and weight are connected to systolic blood pressure.
A correlation analysis within multiple regression helps establish whether these independent variables are worthy predictors of changes in blood pressure. Knowing the correlation also aids in understanding whether increasing age or weight is more influential in modifying systolic pressure.
Coefficient of Determination
The Coefficient of Determination, represented as \(R^2\), reveals the proportion of variance in a dependent variable explained by the independent variables in a regression model.
The value of \(R^2\) ranges between 0 and 1, where 0 indicates that the model explains none of the variability and 1 indicates it explains all the variability in the data. In multiple regression, \(R^2\) reflects how well age and weight together predict systolic blood pressure.
  • An \(R^2\) close to 1 means age and weight are good predictors of blood pressure changes.
  • An \(R^2\) closer to 0 suggests a weak predictive capability of the model.
This metric is crucial because it informs the reliability of the predictions made by the regression equation.
Confidence Interval
A Confidence Interval offers a range within which we can expect the true population parameter to lie, based on sample data. In regression analysis, confidence intervals are calculated for regression coefficients, providing insights into their accuracy and reliability.
A 90% confidence interval implies that if we were to repeat the sampling process many times, 90% of the calculated intervals would contain the true parameter.
  • Wide intervals indicate less precision and suggest a need for more data.
  • Narrow intervals imply more precise estimates and stronger conclusions.
In our exercise, understanding the confidence intervals aids in determining the reliability of predictions regarding systolic blood pressure based on age and weight.

Most popular questions from this chapter

(a) Suppose \(n=6\) and the sample correlation coefficient is \(r=0.90 .\) Is \(r\) significant at the \(1 \%\) level of significance (based on a two-tailed test)? (b) Suppose \(n=10\) and the sample correlation coefficient is \(r=0.90 .\) Is \(r\) significant at the \(1 \%\) level of significance (based on a two-tailed test)? (c) Explain why the test results of parts (a) and (b) are different even though the sample correlation coefficient \(r=0.90\) is the same in both parts. Does it appear that sample size plays an important role in determining the significance of a correlation coefficient? Explain.

What is the symbol used for the population correlation coefficient?

Serial correlation, also known as autocorrelation, describes the extent to which the result in one period of a time series is related to the result in the next period. A time series with high serial correlation is said to be very predictable from one period to the next. If the serial correlation is low (or near zero), the time series is considered to be much less predictable. For more information about serial correlation, see the book Ibbotson SBBI published by Morningstar. A research veterinarian at a major university has developed a new vaccine to protect horses from West Nile virus. An important question is: How predictable is the buildup of antibodies in the horse's blood after the vaccination is given? A large random sample of horses from Wyoming were given the vaccination. The average antibody buildup factor (as determined from blood samples) was measured each week after the vaccination for 8 weeks. Results are shown in the following time series: $$\begin{array}{l|rrrrrrrr}\hline \text{Week} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \text{Buildup Factor} & 2.4 & 4.7 & 6.2 & 7.5 & 8.0 & 9.1 & 10.7 & 12.3 \\ \hline\end{array}$$ To construct a serial correlation, we simply use data pairs \((x, y)\) where \(x=\) original buildup factor data and \(y=\) original data shifted ahead by 1 week. This gives us the following data set. Since we are shifting 1 week ahead, we now have 7 data pairs (not 8). $$\begin{array}{c|ccccccc}\hline x & 2.4 & 4.7 & 6.2 & 7.5 & 8.0 & 9.1 & 10.7 \\ \hline y & 4.7 & 6.2 & 7.5 & 8.0 & 9.1 & 10.7 & 12.3 \\ \hline\end{array}$$ (a) Use the sums provided (or a calculator with least-squares regression) to compute the equation of the sample least-squares line, \(\hat{y}=a+b x\). If the buildup factor was \(x=5.8\) one week, what would you predict the buildup factor to be the next week? (b) Compute the sample correlation coefficient \(r\) and the coefficient of determination \(r^{2}\). Test \(\rho>0\) at the \(1\%\) level of significance. Would you say the time series of antibody buildup factor is relatively predictable from one week to the next? Explain.

Please do the following. (a) Draw a scatter diagram displaying the data. (b) Verify the given sums \(\Sigma x, \Sigma y, \Sigma x^{2}, \Sigma y^{2},\) and \(\Sigma x y\) and the value of the sample correlation coefficient \(r\). (c) Find \(\bar{x}, \bar{y}, a,\) and \(b\). Then find the equation of the least-squares line \(\hat{y}=a+b x\). (d) Graph the least-squares line on your scatter diagram. Be sure to use the point \((\bar{x}, \bar{y})\) as one of the points on the line. (e) Interpretation: Find the value of the coefficient of determination \(r^{2}\). What percentage of the variation in \(y\) can be explained by the corresponding variation in \(x\) and the least-squares line? What percentage is unexplained? Answers may vary slightly due to rounding. Education: Violent Crime. The following data are based on information from the book Life in America's Small Cities (by G. S. Thomas, Prometheus Books). Let \(x\) be the percentage of 16- to 19-year-olds not in school and not high school graduates. Let \(y\) be the reported violent crimes per 1000 residents. Six small cities in Arkansas (Blytheville, El Dorado, Hot Springs, Jonesboro, Rogers, and Russellville) reported the following information about \(x\) and \(y\): Complete parts (a) through (e), given \(\Sigma x=112.8\), \(\Sigma y=32.4\), \(\Sigma x^{2}=2167.14\), \(\Sigma y^{2}=290.14\), \(\Sigma x y=665.03\), and \(r \approx 0.764\). (f) If the percentage of 16- to 19-year-olds not in school and not graduates reaches \(24\%\) in a similar city, what is the predicted rate of violent crimes per 1000 residents?

Please do the following. (a) Draw a scatter diagram displaying the data. (b) Verify the given sums \(\Sigma x, \Sigma y, \Sigma x^{2}, \Sigma y^{2},\) and \(\Sigma x y\) and the value of the sample correlation coefficient \(r\). (c) Find \(\bar{x}, \bar{y}, a,\) and \(b\). Then find the equation of the least-squares line \(\hat{y}=a+b x\). (d) Graph the least-squares line on your scatter diagram. Be sure to use the point \((\bar{x}, \bar{y})\) as one of the points on the line. (e) Interpretation: Find the value of the coefficient of determination \(r^{2}\). What percentage of the variation in \(y\) can be explained by the corresponding variation in \(x\) and the least-squares line? What percentage is unexplained? Answers may vary slightly due to rounding. Let \(x\) be the age of a licensed driver in years. Let \(y\) be the percentage of all fatal accidents (for a given age) due to failure to yield the right-of-way. For example, the first data pair states that \(5\%\) of all fatal accidents of 37-year-olds are due to failure to yield the right-of-way. The Wall Street Journal article referenced in Problem 11 reported the following data: Complete parts (a) through (e), given \(\Sigma x=372\), \(\Sigma y=112\), \(\Sigma x^{2}=24,814\), \(\Sigma y^{2}=3194\), \(\Sigma x y=8254\), and \(r \approx -0.943\). (f) Predict the percentage of all fatal accidents due to failing to yield the right-of-way for 70-year-olds.
