Problem 3

By writing \(\sum\left\{y_{j}-\widehat{g}\left(x_{j}\right)\right\}^{2}=(y-\widehat{g})^{\mathrm{T}}(y-\widehat{g})\) and recalling that \(y=g+\varepsilon\) and \(\widehat{g}=S y\), where \(S\) is a smoothing matrix, show that $$ \mathrm{E}\left[\sum_{j=1}^{n}\left\{y_{j}-\widehat{g}\left(x_{j}\right)\right\}^{2}\right]=\sigma^{2}\left(n-2 v_{1}+v_{2}\right)+g^{\mathrm{T}}(I-S)^{\mathrm{T}}(I-S) g $$ Hence explain the use of \(s^{2}(h)\) as an estimator of \(\sigma^{2}\). Under what circumstances is it unbiased?

Short Answer

\(s^2(h)\) is unbiased for \(\sigma^2\) exactly when the bias term \(g^\mathrm{T}(I-S)^\mathrm{T}(I-S)g\) is zero, i.e. when the smoother reproduces the true function \(g\) without distortion; it is approximately unbiased when that term is negligible.

Step by step solution

01

Express the Sum of Squares

We start with the expression \( \sum (y_j - \widehat{g}(x_j))^2 \), which can be rewritten in matrix notation as \((y - \widehat{g})^\mathrm{T}(y - \widehat{g})\). Given that \( y = g + \varepsilon \), substituting \( \widehat{g} = S y = S(g + \varepsilon) = Sg + S\varepsilon \) yields \( y - \widehat{g} = (g + \varepsilon) - (Sg + S\varepsilon) = (I - S)(g + \varepsilon) = (I - S)g + (I - S)\varepsilon \).
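As a quick sanity check of this residual identity, the sketch below (our own illustration, with an arbitrary matrix standing in for a smoother) confirms numerically that \( y - \widehat{g} = (I-S)g + (I-S)\varepsilon \):

```python
import numpy as np

# Quick numerical check of the identity y - ghat = (I - S)g + (I - S)eps
# for an arbitrary matrix S (a stand-in for a smoother; not from the text).
rng = np.random.default_rng(3)
n = 8
S = rng.normal(size=(n, n)) / n
g = rng.normal(size=n)
eps = rng.normal(size=n)

y = g + eps          # observed responses
ghat = S @ y         # linear-smoother fit
I = np.eye(n)
assert np.allclose(y - ghat, (I - S) @ g + (I - S) @ eps)
```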
02

Compute the Expected Value

To find \( \mathrm{E}\left[\sum (y_j - \widehat{g}(x_j))^2\right] \), expand \(( (I-S)g + (I-S)\varepsilon )^\mathrm{T} ( (I-S)g + (I-S)\varepsilon )\) and take expectations. By linearity of expectation this gives three terms: \( g^\mathrm{T}(I-S)^\mathrm{T}(I-S)g + 2\, g^\mathrm{T}(I-S)^\mathrm{T}(I-S)\mathrm{E}[\varepsilon] + \mathrm{E}[\varepsilon^\mathrm{T}(I-S)^\mathrm{T}(I-S)\varepsilon] \). The middle term vanishes because \( \mathrm{E}[\varepsilon] = 0 \). Therefore, the expected value is \( g^\mathrm{T}(I-S)^\mathrm{T}(I-S)g + \mathrm{E}[\varepsilon^\mathrm{T}(I-S)^\mathrm{T}(I-S)\varepsilon] \).
03

Evaluate Variance Component

The variance component \( \mathrm{E}[\varepsilon^\mathrm{T}(I-S)^\mathrm{T}(I-S)\varepsilon] \) equals \( \sigma^2 \mathrm{tr}((I-S)^\mathrm{T}(I-S)) \), using the identity \( \mathrm{E}[\varepsilon^\mathrm{T} A \varepsilon] = \sigma^2 \mathrm{tr}(A) \) for a vector \( \varepsilon \) of uncorrelated errors with mean zero and variance \( \sigma^2 \). Expanding the trace gives \( \mathrm{tr}(I) - \mathrm{tr}(S) - \mathrm{tr}(S^\mathrm{T}) + \mathrm{tr}(S^\mathrm{T} S) = n - 2v_1 + v_2 \), where \( v_1 = \mathrm{tr}(S) \) and \( v_2 = \mathrm{tr}(S^\mathrm{T} S) \). Hence the variance component is \( \sigma^2 (n - 2v_1 + v_2) \).
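The trace expansion above can be verified numerically; the following sketch (an arbitrary, deliberately non-symmetric matrix plays the role of \(S\)) checks the identity \( \mathrm{tr}((I-S)^\mathrm{T}(I-S)) = n - 2v_1 + v_2 \):

```python
import numpy as np

# Numerically check tr((I - S)^T (I - S)) = n - 2*v1 + v2 with
# v1 = tr(S), v2 = tr(S^T S), for a non-symmetric S.
rng = np.random.default_rng(0)
n = 6
S = rng.normal(size=(n, n)) / n   # arbitrary matrix standing in for a smoother
I = np.eye(n)

lhs = np.trace((I - S).T @ (I - S))
v1 = np.trace(S)
v2 = np.trace(S.T @ S)
rhs = n - 2 * v1 + v2
assert np.isclose(lhs, rhs)
```

Note that \(v_2 = \mathrm{tr}(S^\mathrm{T} S)\) coincides with \(\mathrm{tr}(S^2)\) only when \(S\) is symmetric.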
04

Final Expected Value

Combine results from steps 2 and 3 to conclude: \[ \mathrm{E}\left[\sum_{j=1}^n(y_j - \widehat{g}(x_j))^2\right] = \sigma^2(n - 2 v_1 + v_2) + g^\mathrm{T}(I-S)^\mathrm{T}(I-S)g \]. This expression represents the expected value of the sum of squared errors, including both a variance term and a bias term related to the function \( g \).
05

Estimation of \( \sigma^2 \) Using \( s^2(h) \)

The estimator is \( s^2(h) = \sum_{j=1}^n \{y_j - \widehat{g}(x_j)\}^2 / (n - 2v_1 + v_2) \), so by the result above \( \mathrm{E}[s^2(h)] = \sigma^2 + g^\mathrm{T}(I-S)^\mathrm{T}(I-S)g / (n - 2v_1 + v_2) \). It is exactly unbiased for \( \sigma^2 \) when \( (I-S)g = 0 \), that is, when the smoother reproduces the true function \( g \) without distortion. More generally it is approximately unbiased when the smoothing bias term \( g^\mathrm{T}(I-S)^\mathrm{T}(I-S)g \) is negligible, which happens when \( \widehat{g} \) captures \( g \) well.
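The unbiased case can be illustrated concretely. In the sketch below (our own construction, not from the text), \(S\) is the hat matrix of a straight-line fit, which reproduces any linear \(g\) exactly, so \( (I-S)g = 0 \) and \( s^2(h) \) averages to \( \sigma^2 \):

```python
import numpy as np

# Illustration: s^2(h) = RSS / (n - 2*v1 + v2) is unbiased when (I - S) g = 0.
# Here S is a projection onto straight lines, so any linear g is reproduced exactly.
rng = np.random.default_rng(2)
n, sigma = 40, 2.0
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])
S = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix of the linear fit

g = 3.0 + 1.5 * x                       # linear, so (I - S) @ g is (numerically) zero
v1, v2 = np.trace(S), np.trace(S.T @ S)

ests = []
for _ in range(4000):
    y = g + sigma * rng.normal(size=n)
    r = y - S @ y
    ests.append((r @ r) / (n - 2 * v1 + v2))
print(np.mean(ests))  # should be close to sigma^2 = 4.0
```

For a symmetric idempotent \(S\) of rank 2, \(v_1 = v_2 = 2\) and the divisor reduces to the familiar \(n - 2\) of simple linear regression.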


Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Smoothing Matrix
The concept of a smoothing matrix is pivotal in statistical inference, especially when working with regression and smoothing splines. A smoothing matrix, often denoted as \( S \), is applied to data to create a smoothed version of the response variable. For instance, in our exercise, \( \widehat{g} = S y \) represents the smoothed estimate of the response variable \( y \).
The smoothing matrix transforms the observed data, reducing noise while retaining signal. Its rows are weights: row \( j \) determines how much each observation contributes to the fitted value at \( x_j \). The smoothing parameter (bandwidth) \( h \) controls the trade-off between fidelity to the data and smoothness of the estimate.
In the context of this problem, the smoothing matrix is crucial for defining \( \widehat{g}(x_j) \), which involves transforming the original responses using \( S \). As such, it acts as a filter that adjusts the contribution of each observation to the overall smoothed output. This helps in minimizing overfitting and providing a clearer view of the underlying trend.
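A concrete example may help. The sketch below (our own illustration, assuming a Nadaraya–Watson kernel smoother, which the text does not specify) builds an explicit \(S\) whose rows are normalized kernel weights, so that \( \widehat{g} = S y \):

```python
import numpy as np

# A kernel smoother is linear in y: ghat = S y, where row j of S holds
# the normalized kernel weights K((x_j - x_i)/h) (illustrative construction).
def smoothing_matrix(x, h):
    d = (x[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * d**2)                 # Gaussian kernel
    return K / K.sum(axis=1, keepdims=True)  # rows sum to one

x = np.linspace(0, 1, 20)
S = smoothing_matrix(x, h=0.1)
assert np.allclose(S.sum(axis=1), 1.0)  # each fit is a weighted average of y
y = np.sin(2 * np.pi * x)
ghat = S @ y                            # the smoothed fit
```

Larger \(h\) spreads the weights more widely, giving a smoother but more biased fit.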
Expected Value
In probability theory and statistics, the expected value is a fundamental concept that provides the average of a random variable when measured over a large number of trials. For the given exercise, computing the expected value involves examining the expectation of the sum of squared differences between the observed \( y_j \) and smoothed estimates \( \widehat{g}(x_j) \).
The problem's solution requires expanding the expression \[(y - \widehat{g})^\mathrm{T}(y - \widehat{g})\] and applying the linearity of expectation. Because the error term \( \varepsilon \) has mean zero, the expectation separates into two parts: the term \[g^\mathrm{T}(I-S)^\mathrm{T}(I-S)g\] is the deterministic bias component, while \[ \mathrm{E}[\varepsilon^\mathrm{T}(I-S)^\mathrm{T}(I-S)\varepsilon]\] is the random component driven by the variance of the error term.
Sum of Squares
In statistical analysis, a sum of squares measures variability. Here the relevant quantity is the residual sum of squares: the total squared discrepancy between the observed data and the fitted values from the model. Let's explore its role in this specific context.
The equation \[\sum \{ y_j - \widehat{g}(x_j) \}^2 = (y - \widehat{g})^\mathrm{T}(y - \widehat{g}) \] is a representation of the sum of squared errors, which showcases the difference between observed values \( y_j \) and estimated values from the model \( \widehat{g}(x_j) \).
This exercise further breaks down the summation by recognizing \( y = g + \varepsilon \) and substituting \( \widehat{g} = S y \), allowing us to see how the smoothing matrix modifies both signal and noise. These manipulations guide us to express the sum of squares in terms of both variance (due to \( \varepsilon \)) and bias (due to \( g \)).
Unbiased Estimator
An unbiased estimator is a statistical tool that, on average, produces the true parameter value being estimated across numerous samples. For an estimator to be unbiased, the expected value of its estimates is equal to the true parameter value.
In this exercise, \( s^2(h) \) is employed as an estimator for \( \sigma^2 \), the variance of the error term. The unbiasedness condition follows from the expected-value equation
\[ \mathrm{E}\left[\sum_{j=1}^n\{y_j - \widehat{g}(x_j)\}^2\right] = \sigma^2(n - 2v_1 + v_2) + g^\mathrm{T}(I-S)^\mathrm{T}(I-S)g. \] Dividing the residual sum of squares by \( n - 2v_1 + v_2 \) gives \( s^2(h) \). When the bias term \( g^\mathrm{T}(I-S)^\mathrm{T}(I-S)g \) is exactly zero, which happens when the smoother reproduces the true function \( g \) without distortion, \( s^2(h) \) is unbiased; when \( \widehat{g} \) merely approximates \( g \) closely, the bias is small and \( s^2(h) \) is approximately unbiased.


Most popular questions from this chapter

The rate of growth of an epidemic such as AIDS for a large population can be estimated fairly accurately and treated as a known function \(g(t)\) of time \(t\). In a smaller area where few cases have been observed the rate is hard to estimate because data are scarce. However, predictions of the numbers of future cases in such an area must be made in order to allocate resources such as hospital beds. A simple assumption is that cases in the area arise in a non-homogeneous Poisson process with rate \(\lambda g(t)\), for which the mean number of cases in period \(\left(t_{1}, t_{2}\right)\) is \(\lambda \int_{t_{1}}^{t_{2}} g(t)\, d t\). Suppose that \(N_{1}=n_{1}\) individuals with the disease have been observed in the period \((-\infty, 0)\), and that predictions are required for the number \(N_{2}\) of cases to be observed in a future period \(\left(t_{1}, t_{2}\right)\). (a) Find the conditional distribution of \(N_{2}\) given \(N_{1}+N_{2}\), and show it to be free of \(\lambda\). Deduce that a \((1-2 \alpha)\) prediction interval \(\left(n_{-}, n_{+}\right)\) for \(N_{2}\) is found by solving approximately the equations $$ \begin{aligned} &\alpha=\operatorname{Pr}\left(N_{2} \leq n_{-} \mid N_{1}+N_{2}=n_{1}+n_{-}\right) \\ &\alpha=\operatorname{Pr}\left(N_{2} \geq n_{+} \mid N_{1}+N_{2}=n_{1}+n_{+}\right) \end{aligned} $$ (b) Use a normal approximation to the conditional distribution in (a) to show that for moderate to large \(n_{1}\), \(n_{-}\) and \(n_{+}\) are the solutions to the quadratic equation $$ (1-p)^{2} n^{2}+p(p-1)\left(2 n_{1}+z_{\alpha}^{2}\right) n+n_{1} p\left\{n_{1} p-(1-p) z_{\alpha}^{2}\right\}=0 $$ where \(\Phi\left(z_{\alpha}\right)=\alpha\) and $$ p=\int_{t_{1}}^{t_{2}} g(t)\, d t \Big/ \left\{\int_{t_{1}}^{t_{2}} g(t)\, d t+\int_{-\infty}^{0} g(t)\, d t\right\} $$ (c) Find approximate \(0.90\) prediction intervals for the special case where \(g(t)=2^{t / 2}\), so that the doubling time for the epidemic is two years, \(n_{1}=10\) cases have been observed until time 0, and \(t_{1}=0\), \(t_{2}=1\) (next year) (Cox and Davison, 1989).

Consider a linear smoother with \(n \times n\) smoothing matrix \(S_{h}\), so \(\widehat{g}=S_{h} y\), and show that the function \(a_{j}(u)\) giving the fitted value at \(x_{j}\) as a function of the response \(u\) there satisfies $$ a_{j}(u)= \begin{cases}\widehat{g}\left(x_{j}\right), & u=y_{j} \\ \widehat{g}_{-j}\left(x_{j}\right), & u=\widehat{g}_{-j}\left(x_{j}\right)\end{cases} $$ Explain why this implies that \(S_{j j}(h)\left\{y_{j}-\widehat{g}_{-j}\left(x_{j}\right)\right\}=\widehat{g}\left(x_{j}\right)-\widehat{g}_{-j}\left(x_{j}\right)\), and hence obtain \((10.42)\).

Suppose that the cumulant-generating function of \(X\) can be written in the form \(m\{b(\theta+t)-b(\theta)\}\). Let \(\mathrm{E}(X)=\mu=m b^{\prime}(\theta)\) and let \(\kappa_{2}(\mu)\) and \(\kappa_{3}(\mu)\) be the variance and third cumulant respectively of \(X\), expressed in terms of \(\mu\); \(\kappa_{2}(\mu)\) is the variance function \(V(\mu)\). (a) Show that $$ \kappa_{3}(\mu)=\kappa_{2}(\mu) \kappa_{2}^{\prime}(\mu) \quad \text { and } \quad \frac{\kappa_{3}}{\kappa_{2}^{2}}=\frac{d}{d \mu} \log \kappa_{2}(\mu) $$ Verify that the binomial cumulants have this form with \(b(\theta)=\log \left(1+e^{\theta}\right)\). (b) Show that if the derivatives of \(b(\theta)\) are all \(O(1)\), then \(Y=g(X)\) is approximately symmetrically distributed if \(g\) satisfies the second-order differential equation $$ 3 \kappa_{2}^{2}(\mu) g^{\prime \prime}(\mu)+g^{\prime}(\mu) \kappa_{3}(\mu)=0 $$ Show that if \(\kappa_{2}(\mu)\) and \(\kappa_{3}(\mu)\) are related as in (a), then $$ g(x)=\int^{x} \kappa_{2}^{-1 / 3}(\mu)\, d \mu $$ (c) Hence find symmetrizing transformations for Poisson and binomial variables. (McCullagh and Nelder, 1989, Section 4.8)

A positive stable random variable \(U\) has \(\mathrm{E}\left(e^{-s U}\right)=\exp \left(-\delta s^{\alpha} / \alpha\right)\), \(0<\alpha \leq 1\). (a) Show that if \(Y\) follows a proportional hazards model with cumulative hazard function \(u \exp \left(x^{\mathrm{T}} \beta\right) H_{0}(y)\), conditional on \(U=u\), then \(Y\) also follows a proportional hazards model unconditionally. Are \(\beta\), \(\alpha\), and \(\delta\) estimable from data with single individuals only? (b) Consider a shared frailty model, as in the previous question, with positive stable \(U\). Show that the joint survivor function may be written as $$ \mathcal{F}\left(y_{1}, y_{2}\right)=\exp \left(-\left[\left\{-\log \mathcal{F}_{1}\left(y_{1}\right)\right\}^{1 / \alpha}+\left\{-\log \mathcal{F}_{2}\left(y_{2}\right)\right\}^{1 / \alpha}\right]^{\alpha}\right), \quad y_{1}, y_{2}>0 $$ in terms of the marginal survivor functions \(\mathcal{F}_{1}\) and \(\mathcal{F}_{2}\). Show that if the conditional cumulative hazard functions are Weibull, \(u H_{r}(y)=u \xi_{r} y^{\gamma}\), \(\gamma>0\), \(r=1,2\), then the marginal survivor functions are also Weibull. Show also that the time to the first event has a Weibull distribution.

Show that if \(Y\) is continuous with cumulative hazard function \(H(y)\), then \(H(Y)\) has the unit exponential distribution. Hence establish that \(\mathrm{E}\{H(Y) \mid Y>c\}=1+H(c)\), and explain the reasoning behind (10.55).
