Problem 14


If there is at least one \(x\) value at which more than one observation has been made, there is a formal test procedure for testing \(H_{0}: \mu_{Y \cdot x}=\beta_{0}+\beta_{1} x\) for some values \(\beta_{0}, \beta_{1}\) (the true regression function is linear) versus \(H_{a}: H_{0}\) is not true (the true regression function is not linear). Suppose observations are made at \(x_{1}, x_{2}, \ldots, x_{c}\). Let \(Y_{11}, Y_{12}, \ldots, Y_{1n_{1}}\) denote the \(n_{1}\) observations when \(x=x_{1}\); \(\ldots\); \(Y_{c1}, Y_{c2}, \ldots, Y_{cn_{c}}\) denote the \(n_{c}\) observations when \(x=x_{c}\). With \(n=\Sigma n_{i}\) (the total number of observations), SSE has \(n-2\) df. We break SSE into two pieces, SSPE (pure error) and SSLF (lack of fit), as follows: $$ \begin{aligned} \mathrm{SSPE} &=\sum_{i} \sum_{j}\left(Y_{ij}-\bar{Y}_{i \cdot}\right)^{2} =\sum \sum Y_{ij}^{2}-\sum n_{i} \bar{Y}_{i \cdot}^{2} \\ \mathrm{SSLF} &=\mathrm{SSE}-\mathrm{SSPE} \end{aligned} $$ The \(n_{i}\) observations at \(x_{i}\) contribute \(n_{i}-1\) df to SSPE, so the number of degrees of freedom for SSPE is \(\Sigma_{i}\left(n_{i}-1\right)=n-c\), and the degrees of freedom for SSLF is \(n-2-(n-c)=c-2\). Let \(\mathrm{MSPE}=\mathrm{SSPE} /(n-c)\) and \(\mathrm{MSLF}=\mathrm{SSLF} /(c-2)\). Then it can be shown that whereas \(E(\mathrm{MSPE})=\sigma^{2}\) whether or not \(H_{0}\) is true, \(E(\mathrm{MSLF})=\sigma^{2}\) if \(H_{0}\) is true and \(E(\mathrm{MSLF})>\sigma^{2}\) if \(H_{0}\) is false. The test statistic is \(F=\) MSLF/MSPE, and the corresponding \(P\)-value is the area under the \(F_{c-2, n-c}\) curve to the right of \(f\). The following data come from the article "Changes in Growth Hormone Status Related to Body Weight of Growing Cattle" (Growth, 1977: 241-247), with \(x=\) body weight and \(y=\) metabolic clearance rate/body weight. 
$$ \begin{array}{l|ccccccc} x & 110 & 110 & 110 & 230 & 230 & 230 & 360 \\ \hline y & 235 & 198 & 173 & 174 & 149 & 124 & 115 \\ x & 360 & 360 & 360 & 505 & 505 & 505 & 505 \\ \hline y & 130 & 102 & 95 & 122 & 112 & 98 & 96 \end{array} $$ a. Test \(H_{0}\) versus \(H_{a}\) at level .05 using the lack-of-fit test just described. b. Does a scatterplot of the data suggest that the relationship between \(x\) and \(y\) is linear? How does this compare with the result of part (a)? (A nonlinear regression function was used in the article.)

Short Answer

Part a: \( f \approx 3.30 < F_{.05,2,10} = 4.10 \), so \( H_0 \) is not rejected; the formal test does not find significant lack of fit at level .05. Part b: The scatterplot nevertheless shows clear curvature, which conflicts with part (a); with only four distinct \( x \) values the test has limited power, consistent with the article's use of a nonlinear function.

Step by step solution

01

Compute Mean Responses for Each x

Calculate the mean of the response variable \( y \) for each value of \( x \). The levels are \( x = 110, 230, 360, 505 \), with \( n_1 = n_2 = 3 \) and \( n_3 = n_4 = 4 \), so \( n = 14 \) and \( c = 4 \):\[\bar{Y}_{1 \cdot} = \frac{235 + 198 + 173}{3} = 202.0, \quad \bar{Y}_{2 \cdot} = \frac{174 + 149 + 124}{3} = 149.0\]\[\bar{Y}_{3 \cdot} = \frac{115 + 130 + 102 + 95}{4} = 110.5, \quad \bar{Y}_{4 \cdot} = \frac{122 + 112 + 98 + 96}{4} = 107.0\]
02

Calculate SSPE (Sum of Squared Pure Errors)

SSPE is computed by summing the squares of the differences between each observation and its group mean:\[\text{SSPE} = (235-202.0)^2 + (198-202.0)^2 + (173-202.0)^2 + \ldots + (96-107.0)^2\]The group subtotals for \( x = 110, 230, 360, 505 \) are \( 1946, 1250, 713, \) and \( 452 \), so\[\text{SSPE} = 1946 + 1250 + 713 + 452 = 4361.0\]
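The group means and pure-error sum can be checked directly; a minimal stdlib-Python sketch (variable names are mine):

```python
# Pure error for the lack-of-fit test: group the y-values by x level and
# sum squared deviations of each observation from its own group mean.
groups = {
    110: [235, 198, 173],
    230: [174, 149, 124],
    360: [115, 130, 102, 95],
    505: [122, 112, 98, 96],
}

means = {x: sum(ys) / len(ys) for x, ys in groups.items()}
sspe = sum((y - means[x]) ** 2 for x, ys in groups.items() for y in ys)

print(means)  # {110: 202.0, 230: 149.0, 360: 110.5, 505: 107.0}
print(sspe)   # 4361.0
```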
03

Calculate SSE (Error Sum of Squares About the Fitted Line)

SSE is the sum of squared deviations of each \( Y_{ij} \) from the fitted least-squares line, \( \text{SSE} = \sum (Y_{ij} - \hat{y}_{ij})^2 = S_{yy} - S_{xy}^2/S_{xx} \). (Deviations from the overall mean \( \bar{Y} \) give the total sum of squares, not SSE.) With \( n = 14 \), \( \sum x = 4480 \), \( \sum y = 1923 \), \( \sum x^2 = 1{,}733{,}500 \), \( \sum y^2 = 288{,}013 \), and \( \sum xy = 544{,}730 \):\[S_{xx} = 299{,}900, \quad S_{yy} = 23{,}875.21, \quad S_{xy} = -70{,}630\]\[\text{SSE} = 23{,}875.21 - \frac{(-70{,}630)^2}{299{,}900} = 23{,}875.21 - 16{,}634.20 = 7241.0\]
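The residual sum of squares about the least-squares line can be computed from the usual summary statistics; a stdlib-Python sketch (names are mine):

```python
# SSE for the lack-of-fit test is the error sum of squares about the
# fitted least-squares line: SSE = Syy - Sxy^2 / Sxx.
xs = [110] * 3 + [230] * 3 + [360] * 4 + [505] * 4
ys = [235, 198, 173, 174, 149, 124, 115, 130, 102, 95, 122, 112, 98, 96]

n = len(xs)
sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

b1 = sxy / sxx            # slope ≈ -0.2355
sse = syy - sxy ** 2 / sxx  # ≈ 7241.0
```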
04

Calculate SSLF (Lack of Fit)

SSLF is the difference between SSE and SSPE:\[\text{SSLF} = \text{SSE} - \text{SSPE} = 7241.0 - 4361.0 = 2880.0\]
05

Degrees of Freedom, MSPE, and MSLF

Calculate the degrees of freedom:
  • \( \text{df}_{\text{SSPE}} = n - c = 14 - 4 = 10 \)
  • \( \text{df}_{\text{SSLF}} = c - 2 = 4 - 2 = 2 \)
Then the mean squares are:\[\text{MSPE} = \frac{4361.0}{10} = 436.1, \quad \text{MSLF} = \frac{2880.0}{2} = 1440.0\]
06

Calculate the F-statistic

Compute the test statistic \( F \):\[f = \frac{\text{MSLF}}{\text{MSPE}} = \frac{1440.0}{436.1} \approx 3.30\]
07

Determine the P-value

The P-value is the area to the right of \( f = 3.30 \) under the \( F \) distribution with \( (2, 10) \) degrees of freedom. Using an \( F \) table or software, \( P \approx .08 \), which exceeds .05; equivalently, \( f = 3.30 < F_{.05,2,10} = 4.10 \).
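The statistic and its P-value can be assembled without tables: when the numerator df is 2, the F survival function has the closed form \( P(F \ge f) = (1 + 2f/d_2)^{-d_2/2} \). A stdlib-Python sketch (names are mine):

```python
# Lack-of-fit F test: build MSLF and MSPE, then use the closed-form
# upper-tail probability of the F(2, d2) distribution.
sse, sspe = 7241.0, 4361.0
n, c = 14, 4

sslf = sse - sspe           # 2880.0
mspe = sspe / (n - c)       # 436.1
mslf = sslf / (c - 2)       # 1440.0
f = mslf / mspe             # ≈ 3.30

d2 = n - c
p_value = (1 + 2 * f / d2) ** (-d2 / 2)  # ≈ 0.08, above the .05 cutoff
```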
08

Make a Conclusion for Part (a)

Since the P-value exceeds the significance level .05 (equivalently, \( f = 3.30 < F_{.05,2,10} = 4.10 \)), we do not reject the null hypothesis \( H_0 \). The data do not provide statistically significant evidence of lack of fit, so the formal test does not contradict linearity.
09

Interpret the Scatterplot for Part (b)

Create a scatterplot of the given data points \((x, y)\). The pattern shows an apparent curve: \( y \) drops steeply at first and then levels off, suggesting nonlinearity. This visual impression conflicts with the formal result in part (a); with only \( c = 4 \) distinct \( x \) values and small group sizes, the lack-of-fit test has limited power, which is consistent with the article's use of a nonlinear regression function.
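A simple numeric stand-in for the scatterplot is to compare the slopes between successive group means; a straight-line relationship would give roughly equal slopes (stdlib-Python sketch, names mine):

```python
# Slopes between consecutive (x, group-mean) points. Equal slopes would
# support linearity; here the slope shrinks sharply toward zero,
# i.e. the trend flattens out -- visual evidence of curvature.
xs = [110, 230, 360, 505]
means = [202.0, 149.0, 110.5, 107.0]

slopes = [(means[i + 1] - means[i]) / (xs[i + 1] - xs[i]) for i in range(3)]
# ≈ [-0.442, -0.296, -0.024]: steep drop, then nearly flat
```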


Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Linear Regression
Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. The simplest form, simple linear regression, involves only one independent variable. The formula for a linear regression line is \( y = \beta_0 + \beta_1 x + \epsilon \), where:
  • \( y \) is the dependent variable.
  • \( x \) is the independent variable.
  • \( \beta_0 \) is the y-intercept.
  • \( \beta_1 \) is the slope of the line, representing the change in \( y \) for a one-unit change in \( x \).
  • \( \epsilon \) is the error term, accounting for variability not explained by the model.
Linear regression assumes a linear relationship between \( x \) and \( y \). It is useful for making predictions or assessing the strength and direction of a relationship between variables. However, it's important to validate the model assumptions before relying on the results. This can be done through graphical analysis like scatterplots, which help to visually assess the appropriateness of a linear relationship.
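The closed-form least-squares solution described above can be written in a few lines; a minimal stdlib-Python sketch (function and variable names are illustrative):

```python
# Simple linear regression by the normal equations:
# slope b1 = Sxy / Sxx, intercept b0 = ybar - b1 * xbar.
def fit_line(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx           # estimated slope
    b0 = ybar - b1 * xbar    # estimated intercept
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # data on the line y = 1 + 2x
```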
Sum of Squares
Sum of squares is a method used in statistics to determine the variance within a data set. In regression analysis, it helps us understand the variability explained by the model versus the variability left unexplained.

Total Sum of Squares (SST)

The total sum of squares (SST) measures the total variation in the dependent variable and is calculated by taking the sum of the squared differences between each observed value and the overall mean:\[SST = \sum (Y_{ij} - \bar{Y})^2\]

Error Sum of Squares (SSE)

In the lack-of-fit setting, SSE measures the variation of the observations about the fitted regression line:\[SSE = \sum (Y_{ij} - \hat{y}_{ij})^2\]The within-group quantity \( \sum (Y_{ij} - \bar{Y}_{i \cdot})^2 \), the sum of squared differences between observed values and their respective group means, is the pure error sum of squares (SSPE), one component of SSE.

Lack of Fit Sum of Squares (SSLF)

Lack of fit sum of squares (SSLF) measures how far the group means depart from the fitted line, i.e. the part of the error variation not attributable to pure error. It is the difference between the error sum of squares and the pure error:\[SSLF = SSE - SSPE\]By analyzing these sums of squares, we can assess the model's effectiveness in explaining variance in the data.
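These decompositions can be verified numerically on the cattle data from the worked problem: total variation splits into regression plus error, and error further splits into pure error plus lack of fit (stdlib-Python sketch, names mine):

```python
# Check: Syy = SSR + SSPE + SSLF for the cattle growth data.
xs = [110] * 3 + [230] * 3 + [360] * 4 + [505] * 4
ys = [235, 198, 173, 174, 149, 124, 115, 130, 102, 95, 122, 112, 98, 96]
n = len(xs)

syy = sum(y * y for y in ys) - sum(ys) ** 2 / n          # total variation
sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
sse = syy - sxy ** 2 / sxx                               # error about the line
ssr = syy - sse                                          # explained by the line

groups = {}
for x, y in zip(xs, ys):
    groups.setdefault(x, []).append(y)
sspe = sum((y - sum(g) / len(g)) ** 2 for g in groups.values() for y in g)
sslf = sse - sspe                                        # lack of fit
```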
Degrees of Freedom
Degrees of freedom are important in hypothesis testing because they relate to the number of independent values or quantities which can vary in an analysis. They influence the shape of various statistical distributions, including the F-distribution used in this test.
  • For SSPE (Sum of Squared Pure Errors), the degrees of freedom is \( n-c \), where \( n \) is the total number of observations and \( c \) is the number of different \( x \) levels.
  • For SSLF (Sum of Squared Lack of Fit), the degrees of freedom is \( c-2 \): the \( c \) group means could be fitted freely, but the straight line already uses 2 parameters, leaving \( c-2 \) degrees of freedom for departures from linearity.
Understanding and correctly calculating degrees of freedom are crucial, as they affect the computation of mean squares (MSPE and MSLF) which are critical for conducting F-tests to determine the fit of the regression model.
F-test
The F-test is a statistical test used to compare the variances of two populations to determine if they are significantly different. In the context of regression analysis, the F-test is used as a part of hypothesis testing for model parameters. In this case, the F-test helps evaluate whether the lack-of-fit in a regression model is statistically significant, suggesting a non-linear relationship.

Calculating F-statistic

The test statistic \( F \) is calculated from the mean sum of squares due to lack of fit (MSLF) and due to pure error (MSPE):\[F = \frac{\text{MSLF}}{\text{MSPE}}\]A higher \( F \)-value indicates a greater degree of variation due to lack of fit than would be expected by random chance.

P-value and Decision Making

The P-value from the F-distribution is used to make a decision about the null hypothesis. If the P-value is less than the significance level (commonly 0.05), we reject the null hypothesis, indicating sufficient evidence that the model is not linear. This F-test is essential for assessing model adequacy when exploring relationships between variables, ensuring any proposed model fits the data well enough to yield meaningful insights.


Most popular questions from this chapter

The accompanying scatterplot is based on data provided by authors of the article "Spurious Correlation in the USEPA Rating Curve Method for Estimating Pollutant Loads" (J. of Envir. Engr., 2008: 610-618); here discharge is in \(\mathrm{ft}^{3} / \mathrm{s}\) as opposed to \(\mathrm{m}^{3} / \mathrm{s}\) used in the article. The point on the far right of the plot corresponds to the observation \((140, 1529.35)\). The resulting standardized residual is 3.10. Minitab flags the observation with an \(R\) for large residual and an \(X\) for potentially influential observation. Fit a line to the following data on \(x=\) prepreg thickness (mm) and \(y=\) core crush (%): $$ \begin{array}{c|cccccccc} x & .246 & .250 & .251 & .251 & .254 & .262 & .264 & .270 \\ \hline y & 16.0 & 11.0 & 15.0 & 10.5 & 13.5 & 7.5 & 6.1 & 1.7 \\ x & .272 & .277 & .281 & .289 & .290 & .292 & .293 & \\ \hline y & 3.6 & 0.7 & 0.9 & 1.0 & 0.7 & 3.0 & 3.1 & \end{array} $$ a. Fit the simple linear regression model. What proportion of the observed variation in core crush can be attributed to the model relationship? b. Construct a scatterplot. Does the plot suggest that a linear probabilistic relationship is appropriate? c. Obtain the residuals and standardized residuals, and then construct residual plots. What do these plots suggest? What type of function should provide a better fit to the data than does a straight line?

Let \(y=\) sales at a fast-food outlet (1000s of \$), \(x_{1}=\) number of competing outlets within a 1-mile radius, \(x_{2}=\) population within a 1-mile radius (1000s of people), and \(x_{3}\) be an indicator variable that equals 1 if the outlet has a drive-up window and 0 otherwise. Suppose that the true regression model is $$ Y=10.00-1.2 x_{1}+6.8 x_{2}+15.3 x_{3}+\epsilon $$ a. What is the mean value of sales when the number of competing outlets is 2, there are 8000 people within a 1-mile radius, and the outlet has a drive-up window? b. What is the mean value of sales for an outlet without a drive-up window that has three competing outlets and 5000 people within a 1-mile radius? c. Interpret \(\beta_{3}\).

An aeronautical engineering student carried out an experiment to study how \(y=\) lift/drag ratio related to the variables \(x_{1}=\) position of a certain forward lifting surface relative to the main wing and \(x_{2}=\) tail placement relative to the main wing, obtaining the following data (Statistics for Engineering Problem Solving, p. 133): a. Fitting the first-order model gives \(\mathrm{SSE}=5.18\), whereas including \(x_{3}=x_{1} x_{2}\) as a predictor results in \(\mathrm{SSE}=3.07\). Calculate and interpret the coefficient of multiple determination for each model. b. Carry out a test of model utility using \(\alpha=.05\) for each of the models described in part (a). Does either result surprise you?

Cardiorespiratory fitness is widely recognized as a major component of overall physical well-being. Direct measurement of maximal oxygen uptake (\(\mathrm{VO}_{2}\) max) is the single best measure of such fitness, but direct measurement is time-consuming and expensive. It is therefore desirable to have a prediction equation for \(\mathrm{VO}_{2}\) max in terms of easily obtained quantities. Consider the variables $$ \begin{aligned} &y=\mathrm{VO}_{2} \max (\mathrm{L} / \mathrm{min}) \quad x_{1}=\text { weight }(\mathrm{kg}) \\ &x_{2}=\text { age }(\mathrm{yr}) \\ &x_{3}=\text { time necessary to walk } 1 \text { mile (min) } \\ &x_{4}=\text { heart rate at the end of the walk (beats/min) } \end{aligned} $$ Here is one possible model, for male students, consistent with the information given in the article "Validation of the Rockport Fitness Walking Test in College Males and Females" (Research Quarterly for Exercise and Sport, 1994: 152-158): $$ Y=5.0+.01 x_{1}-.05 x_{2}-.13 x_{3}-.01 x_{4}+\epsilon, \quad \sigma=.4 $$ a. Interpret \(\beta_{1}\) and \(\beta_{3}\). b. What is the expected value of \(\mathrm{VO}_{2}\) max when weight is 76 kg, age is 20 yr, walk time is 12 min, and heart rate is 140 b/m? c. What is the probability that \(\mathrm{VO}_{2}\) max will be between 1.00 and 2.60 for a single observation made when the values of the predictors are as stated in part (b)?

The article "The Influence of Honing Process Parameters on Surface Quality, Productivity, Cutting Angle, and Coefficient of Friction" (Industrial Lubrication and Tribology, 2012: 77-83) included the following data on \(x_{1}=\) cutting speed \((\mathrm{m} / \mathrm{s})\), \(x_{2}=\) specific pressure of pre-honing process \(\left(\mathrm{N} / \mathrm{mm}^{2}\right)\), \(x_{3}=\) specific pressure of finishing honing process, and \(y=\) productivity in the honing process (\(\mathrm{mm}^{3} / \mathrm{s}\) for a particular tool; productivity is the volume of the material cut in a second). a. The article proposed a multivariate power model \(Y=\alpha x_{1}^{\beta_{1}} x_{2}^{\beta_{2}} x_{3}^{\beta_{3}} \epsilon\). The implied linear regression model involves regressing \(\ln (y)\) against the three predictors \(\ln \left(x_{1}\right), \ln \left(x_{2}\right)\), and \(\ln \left(x_{3}\right)\). Partial Minitab output from fitting this latter model is as follows (the corresponding estimated power regression function appeared in the cited article). Carry out the model utility test at significance level \(.05\). b. The large \(P\)-value corresponding to the \(t\) ratio for \(\ln \left(x_{2}\right)\) suggests that this predictor can be eliminated from the model. Doing so and refitting yields the following Minitab output. c. Fit the simple linear regression model implied by your conclusion in (b) to the transformed data, and carry out a test of model utility. d. The standardized residuals from the fit referred to in (c) are .03, .33, 1.69, .33, -.49, .96, .57, .33, -.25, -1.28, .29, -2.26. Plot these against \(\ln \left(x_{1}\right)\). What does the pattern suggest? e. Fitting a quadratic regression model to relate \(\ln (y)\) to \(\ln \left(x_{1}\right)\) gave the following Minitab output. Carry out a test of model utility at significance level \(.05\) (the pattern in residual plots is satisfactory). 
Then use the fact that \(s_{\ln \left(\tilde{Y}^{\prime}\right)}=.0178\left[Y^{\prime}=\ln (Y)\right]\) when \(x_{1}=1\) to obtain a \(95 \%\) prediction interval for productivity.
