Problem 11


Suppose that we wish to construct the likelihood ratio statistic for comparison of the two linear models \(y=X_{1} \beta_{1}+\varepsilon\) and \(y=X_{1} \beta_{1}+X_{2} \beta_{2}+\varepsilon\), where the components of \(\varepsilon\) are independent normal variables with mean zero and variance \(\sigma^{2} ;\) call the corresponding residual sums of squares \(S S_{1}\) and \(S S\) on \(v_{1}\) and \(v\) degrees of freedom. (a) Show that the maximum value of the log likelihood is \(-\frac{1}{2} n(\log S S+1-\log n)\) for a model whose residual sum of squares is \(S S\), and deduce that the likelihood ratio statistic for comparison of the models above is \(W=n \log \left(S S_{1} / S S\right)\). (b) By writing \(S S_{1}=S S+\left(S S_{1}-S S\right)\), show that \(W\) is a monotonic function of the \(F\) statistic for comparison of the models. (c) Show that \(W \doteq\left(v_{1}-v\right) F\) when \(n\) is large and \(v\) is close to \(n\), and say why \(F\) would usually be preferred to \(W\).

Short Answer

The maximized log likelihood of a normal linear model with residual sum of squares \( SS \) is \( -\frac{1}{2}n(\log SS + 1 - \log n) \), so the likelihood ratio statistic for comparing the two models is \( W = n \log(SS_1 / SS) \). \( W \) is a strictly increasing function of the \( F \) statistic, and \( W \doteq (v_1-v)F \) when \( n \) is large and \( v \) is close to \( n \). \( F \) is usually preferred because its exact null distribution is known for every sample size.

Step by step solution

01

Understanding the Likelihood Function

The likelihood function for the normal linear model is \( L(\beta,\sigma^2) = (2\pi\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\text{RSS}\right) \), where RSS is the residual sum of squares. For model comparison we maximize this likelihood separately under each model.
02

Finding the Maximum Log Likelihood

For a model with residual sum of squares \( SS \), the likelihood is maximized at \( \widehat{\sigma}^2 = SS/n \). Substituting this into the log likelihood gives \[ \log L = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log \left(\frac{SS}{n}\right) - \frac{n}{2}, \] which equals \( -\frac{1}{2}n( \log SS + 1 - \log n) \) up to the additive constant \( -\frac{n}{2}\log(2\pi) \); this constant is the same for both models and cancels when they are compared.
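As a numerical check, the following sketch (simulated data and hypothetical parameter values, not from the exercise) fits a linear model by least squares and verifies that the log likelihood evaluated at the MLEs matches \( -\frac{1}{2}n(\log SS + 1 - \log n) \) once the constant \( -\frac{n}{2}\log 2\pi \) is restored:

```python
import numpy as np

# Simulated data under a two-parameter linear model (illustrative values).
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Least squares fit and residual sum of squares SS.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
SS = resid @ resid
sigma2_hat = SS / n  # MLE of sigma^2

# Log likelihood at the MLEs versus the closed-form expression.
loglik = -0.5 * n * np.log(2 * np.pi * sigma2_hat) - SS / (2 * sigma2_hat)
formula = -0.5 * n * (np.log(SS) + 1 - np.log(n)) - 0.5 * n * np.log(2 * np.pi)
assert np.isclose(loglik, formula)
```

The two quantities agree exactly, confirming that only \( SS \) (and the constants \( n \), \( 2\pi \)) enters the maximized log likelihood.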
03

Deriving the Likelihood Ratio Statistic W

The likelihood ratio statistic \( W \) is twice the difference between the maximized log likelihoods of the larger model (residual sum of squares \( SS \)) and the smaller model (residual sum of squares \( SS_1 \)): \[ W = 2\left\{-\frac{1}{2}n(\log SS + 1 - \log n)\right\} - 2\left\{-\frac{1}{2}n(\log SS_1 + 1 - \log n)\right\} = n \log(SS_1 / SS). \]
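A short numerical sketch (simulated data, hypothetical coefficient values) of computing \( W \) for two nested least-squares fits:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.2 * x2 + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

X1 = np.column_stack([np.ones(n), x1])        # smaller model: intercept + x1
X = np.column_stack([np.ones(n), x1, x2])     # larger model: adds x2
SS1, SS = rss(X1, y), rss(X, y)

W = n * np.log(SS1 / SS)                      # likelihood ratio statistic
```

Because the models are nested, \( SS_1 \geq SS \), so \( W \geq 0 \) always.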
04

Expressing SS1 in Terms of SS and F

Writing \( SS_1 = SS + (SS_1 - SS) \) gives \( SS_1/SS = 1 + (SS_1 - SS)/SS \). The \( F \) statistic for comparing the models is \[ F = \frac{(SS_1 - SS)/(v_1 - v)}{SS/v}, \] so \( (SS_1 - SS)/SS = (v_1 - v)F/v \).
05

Showing the Monotonicity in Relation to F

Substituting \( SS_1/SS = 1 + (v_1 - v)F/v \) into \( W = n \log(SS_1 / SS) \) gives \[ W = n \log\left\{1 + \frac{(v_1 - v)F}{v}\right\}. \] Since the logarithm is strictly increasing and \( (v_1 - v)/v > 0 \), \( W \) is a strictly increasing, hence monotonic, function of \( F \).
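The identity \( W = n\log\{1 + (v_1 - v)F/v\} \) can be checked numerically on simulated nested fits (illustrative data, not from the exercise):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + x1 + 0.3 * x2 + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

X1 = np.column_stack([np.ones(n), x1])        # smaller model, p1 = 2
X = np.column_stack([np.ones(n), x1, x2])     # larger model, p = 3
SS1, SS = rss(X1, y), rss(X, y)
v1, v = n - 2, n - 3                          # residual degrees of freedom

F = ((SS1 - SS) / (v1 - v)) / (SS / v)        # F statistic
W = n * np.log(SS1 / SS)                      # likelihood ratio statistic

# Check the identity W = n log{1 + (v1 - v) F / v}.
assert np.isclose(W, n * np.log(1 + (v1 - v) * F / v))
```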
06

Approximating W with the F Statistic

When \( n \) is large and \( v \) is close to \( n \), the quantity \( x = (v_1 - v)F/v \) is small, so \( \log(1 + x) \approx x \) and \[ W = n \log\left\{1 + \frac{(v_1 - v)F}{v}\right\} \approx \frac{n}{v}(v_1 - v)F \doteq (v_1 - v)F, \] since \( n/v \approx 1 \).
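The quality of this approximation can be gauged directly (the values of \( n \), \( v \), and \( F \) below are illustrative):

```python
import numpy as np

# Accuracy of W ~ (v1 - v) F for large n with v close to n.
n = 10_000
v1, v = n - 1, n - 3          # two extra parameters in the larger model

for F in (0.5, 2.5, 5.0):
    W = n * np.log(1 + (v1 - v) * F / v)
    approx = (v1 - v) * F
    # Relative error is tiny for these values.
    assert abs(W - approx) / approx < 0.01
```

For moderate \( F \) the relative error here is well under one percent, which is why the two statistics are practically interchangeable in large samples.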
07

Preference for F over W

Typically \( F \) is preferred to \( W \) because, under the null hypothesis \( \beta_2 = 0 \), \( F \) has an exact \( F_{v_1 - v,\, v} \) distribution for every sample size, whereas the null distribution of \( W \) is only approximately \( \chi^2_{v_1 - v} \) in large samples. \( F \) also fits directly into the analysis of variance (ANOVA) framework for nested models.


Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Linear Models
Linear Models are fundamental in statistical analysis for understanding relationships between variables. They express the response variable, often denoted as \( y \), as a linear combination of predictor variables, plus an error term \( \varepsilon \). In the exercise at hand, we compare two such models where:
  • Model 1: \( y = X_1 \beta_1 + \varepsilon \)
  • Model 2: \( y = X_1 \beta_1 + X_2 \beta_2 + \varepsilon \)
These models assess how additional variables may improve the explanation of the response variable. The error term represents the part of \( y \) not explained by the predictors and is assumed to be normally distributed with mean zero and variance \( \sigma^2 \). Understanding these models helps in gauging the effect of different predictors on the outcome variable.
Likelihood Function
The Likelihood Function is a core concept in parameter estimation, especially in the context of statistical modeling like linear models. It quantifies how likely it is to observe the given data under specific parameter values.
For models with normally distributed errors, the likelihood function is defined as:\[ L(\beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\text{RSS}\right) \]where \( \text{RSS} \) is the residual sum of squares.

The goal is to maximize this likelihood with respect to the parameters \( \beta \) and \( \sigma^2 \). Doing so provides maximum likelihood estimates (MLEs), which are parameter values that make the observed data most probable. In the exercise's context, the maximum log likelihood for a model is expressed as:

\[ -\frac{1}{2} n( \log SS + 1 - \log n) \]where \( SS \) is the residual sum of squares for the model. This formulation simplifies the computation and comparison of models, leading us to the likelihood ratio test.
F Statistic
The F Statistic is a fundamental tool for comparing statistical models. Particularly in linear models, it helps determine if the addition of new predictors significantly improves the model.
  • Calculated as: \[ F = \frac{(SS_1 - SS)/(v_1 - v)}{SS/v} \]where \( SS_1 \) and \( SS \) are the residual sum of squares for the models being compared, and \( v_1 \) and \( v \) are their degrees of freedom, respectively.
  • The F statistic quantifies whether the model with more predictors is significantly better fitting compared to a simpler model.
  • When the F statistic is large, it indicates that the additional predictors significantly reduce the error from the model, suggesting better performance.
This measure is often preferred over the likelihood ratio because it fits into the Analysis of Variance (ANOVA) framework, providing a direct pathway to hypothesis testing.
Residual Sum of Squares
Residual Sum of Squares (RSS) plays a crucial role in assessing the fit of a linear model. It measures the total of squared differences between observed values and the values predicted by the model. The RSS is calculated as:
\[ RSS = \sum (y_i - \hat{y}_i)^2 \]where \( y_i \) are the observed values and \( \hat{y}_i \) are the predicted values from the model.

A smaller RSS indicates a better fit, as it signifies that the model's predictions are closer to actual observations. In model comparison, choosing a model with a lower RSS typically means a better overall fit.
  • For Model 1, the RSS is \( SS_1 \), and for Model 2, it is \( SS \).
  • These RSS values are pivotal in calculating the likelihood function and, subsequently, the likelihood ratio statistic.
The comparison between different models often hinges on analyzing these RSS values to determine improvement in model fit when additional variables are included.
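As a concrete illustration of the RSS formula (simulated data and hypothetical coefficients, not from the exercise), a straight-line fit and its residual sum of squares can be computed as:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=30)

X = np.column_stack([np.ones_like(x), x])        # design matrix
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat                             # fitted values

rss = np.sum((y - y_hat) ** 2)                   # sum of squared residuals
```

A smaller `rss` signals predictions closer to the observations; in the likelihood derivations above this quantity is exactly the \( SS \) entering \( \widehat{\sigma}^2 = SS/n \).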


Most popular questions from this chapter

Write down the linear model corresponding to a simple random sample \(y_{1}, \ldots, y_{n}\) from the \(N\left(\mu, \sigma^{2}\right)\) distribution, and find the design matrix. Verify that $$ \widehat{\mu}=\left(X^{\mathrm{T}} X\right)^{-1} X^{\mathrm{T}} y=\bar{y}, \quad s^{2}=S S(\widehat{\beta}) /(n-p)=(n-1)^{-1} \sum\left(y_{j}-\bar{y}\right)^{2} $$

Consider a normal linear regression \(y=\beta_{0}+\beta_{1} x+\varepsilon\) in which the parameter of interest is \(\psi=\beta_{0} / \beta_{1}\), to be estimated by \(\widehat{\psi}=\widehat{\beta}_{0} / \widehat{\beta}_{1}\); let \(\operatorname{var}\left(\widehat{\beta}_{0}\right)=\sigma^{2} v_{00}, \operatorname{cov}\left(\widehat{\beta}_{0}, \widehat{\beta}_{1}\right)=\sigma^{2} v_{01}\) and \(\operatorname{var}\left(\widehat{\beta}_{1}\right)=\sigma^{2} v_{11}\). (a) Show that $$ \frac{\widehat{\beta}_{0}-\psi \widehat{\beta}_{1}}{\left\{s^{2}\left(v_{00}-2 \psi v_{01}+\psi^{2} v_{11}\right)\right\}^{1 / 2}} \sim t_{n-p} $$ and hence deduce that a \((1-2 \alpha)\) confidence interval for \(\psi\) is the set of values of \(\psi\) satisfying the inequality $$ \widehat{\beta}_{0}^{2}-s^{2} t_{n-p}^{2}(\alpha) v_{00}+2 \psi\left\{s^{2} t_{n-p}^{2}(\alpha) v_{01}-\widehat{\beta}_{0} \widehat{\beta}_{1}\right\}+\psi^{2}\left\{\widehat{\beta}_{1}^{2}-s^{2} t_{n-p}^{2}(\alpha) v_{11}\right\} \leq 0 $$ How would this change if the value of \(\sigma\) was known? (b) By considering the coefficients on the left-hand side of the inequality in (a), show that the confidence set can be empty, a finite interval, semi-infinite intervals stretching to \(\pm \infty\), the entire real line, two disjoint semi-infinite intervals - six possibilities in all. In each case illustrate how the set could arise by sketching a set of data that might have given rise to it. (c) A government Department of Fisheries needed to estimate how many of a certain species of fish there were in the sea, in order to know whether to continue to license commercial fishing. Each year an extensive sampling exercise was based on the numbers of fish caught, and this resulted in three numbers, \(y, x\), and a standard deviation for \(y, \sigma\).
A simple model of fish population dynamics suggested that \(y=\beta_{0}+\beta_{1} x+\varepsilon\), where the errors \(\varepsilon\) are independent, and the original population size was \(\psi=\beta_{0} / \beta_{1}\). To simplify the calculations, suppose that in each year \(\sigma\) equalled 25. If after five years the values of \(y\) and \(x\) had been $$ \begin{array}{lccccc} y: & 160 & 150 & 100 & 80 & 100 \\ x: & 140 & 170 & 200 & 230 & 260 \end{array} $$ give a \(95 \%\) confidence interval for \(\psi\). Do you find it plausible that \(\sigma=25\)? If not, give an appropriate interval for \(\psi\).

The usual linear model \(y=X \beta+\varepsilon\) is thought to apply to a set of data, and it is assumed that the \(\varepsilon_{j}\) are independent with means zero and variances \(\sigma^{2}\), so that the data are summarized in terms of the usual least squares estimates and estimate of \(\sigma^{2}, \widehat{\beta}\) and \(S^{2}\). Unknown to the unfortunate investigator, in fact \(\operatorname{var}\left(\varepsilon_{j}\right)=v_{j} \sigma^{2}\), and \(v_{1}, \ldots, v_{n}\) are unequal. Show that \(\widehat{\beta}\) remains unbiased for \(\beta\) and find its actual covariance matrix.

Data \(\left(x_{1}, y_{1}\right), \ldots,\left(x_{n}, y_{n}\right)\) satisfy the straight-line regression model (5.3). In a calibration problem the value \(y_{+}\) of a new response independent of the existing data has been observed, and inference is required for the unknown corresponding value \(x_{+}\) of \(x\). (a) Let \(s_{x}^{2}=\sum\left(x_{j}-\bar{x}\right)^{2}\) and let \(S^{2}\) be the unbiased estimator of the error variance \(\sigma^{2}\). Show that $$ T\left(x_{+}\right)=\frac{Y_{+}-\widehat{\gamma}_{0}-\widehat{\gamma}_{1}\left(x_{+}-\bar{x}\right)}{\left[S^{2}\left\{1+n^{-1}+\left(x_{+}-\bar{x}\right)^{2} / s_{x}^{2}\right\}\right]^{1 / 2}} $$ is a pivot, and explain why the set $$ \mathcal{X}_{1-2 \alpha}=\left\{x_{+}: t_{n-2}(\alpha) \leq T\left(x_{+}\right) \leq t_{n-2}(1-\alpha)\right\} $$ contains \(x_{+}\) with probability \(1-2 \alpha\). (b) Show that the function \(g(u)=(a+b u) /\left(c+u^{2}\right)^{1 / 2}, c>0, a, b \neq 0\), has exactly one stationary point, at \(\tilde{u}=-b c / a\), that \(\operatorname{sign} g(\tilde{u})=\operatorname{sign} a\), that \(g(\tilde{u})\) is a local maximum if \(a>0\) and a local minimum if \(a<0\), and that \(\lim _{u \rightarrow \pm \infty} g(u)=\mp b\). Hence sketch \(g(u)\) in the four possible cases \(a, b<0, a, b>0, a<0

In the normal straight-line regression model it is thought that a power transformation of the covariate may be needed, that is, the model $$ y=\beta_{0}+\beta_{1} x^{(\lambda)}+\varepsilon $$ may be suitable, where \(x^{(\lambda)}\) is the power transformation $$ x^{(\lambda)}= \begin{cases}\frac{x^{\lambda}-1}{\lambda}, & \lambda \neq 0 \\ \log x, & \lambda=0\end{cases} $$ (a) Show by Taylor series expansion of \(x^{(\lambda)}\) at \(\lambda=1\) that a test for power transformation can be based on the reduction in sum of squares when the constructed variable \(x \log x\) is added to the model with linear predictor \(\beta_{0}+\beta_{1} x\). (b) Show that the profile log likelihood for \(\lambda\) is equivalent to \(\ell_{\mathrm{p}}(\lambda) \equiv-\frac{n}{2} \log \operatorname{SS}\left(\widehat{\beta}_{\lambda}\right)\), where \(S S\left(\widehat{\beta}_{\lambda}\right)\) is the residual sum of squares for regression of \(y\) on the \(n \times 2\) design matrix with a column of ones and the column consisting of the \(x_{j}^{(\lambda)}\). Why is a Jacobian for the transformation not needed in this case, unlike in Example \(8.23\)? (Box and Tidwell, 1962)
