Problem 37 Nonparametric regression

Nonparametric regression Nonparametric methods have also been devised for regression. Here's a simple way to estimate the slope: For each pair of subjects, the slope of the line connecting their two points is the difference between their \(y\) values divided by the difference between their \(x\) values. (See the figure.) With \(n\) subjects, we can find this slope for each pair of points. (There are \(n(n-1) / 2\) pairs of points.) A nonparametric estimate of the slope is the median of all these slopes for the various pairs of points. The ordinary slope (least squares, minimizing the sum of squared residuals) can be strongly affected by a regression outlier. Is this true also for the nonparametric estimate of the slope? Why or why not?

Short Answer

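No, the nonparametric slope estimate is not strongly affected by a regression outlier. The estimate is the median of the pairwise slopes, and the median is resistant to extreme values: a single outlier changes only the slopes of the pairs that include it, so the median barely moves.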

Step by step solution

01

Understanding Nonparametric Slope Estimation

In nonparametric regression, we estimate the slope between each pair of data points by calculating the difference in their y-values divided by the difference in their x-values. For \(n\) subjects, there are \(\frac{n(n-1)}{2}\) total pairs, each resulting in a calculated slope.
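
The pairwise calculation described above can be sketched in Python. This is a minimal illustration, not part of the textbook solution: the function name `pairwise_slopes` and the four data points are made up for the example, and skipping pairs with equal \(x\) values (whose slope is undefined) is an assumption.

```python
from itertools import combinations

def pairwise_slopes(x, y):
    """Return the slope of the line through every pair of points."""
    slopes = []
    for i, j in combinations(range(len(x)), 2):
        dx = x[j] - x[i]
        if dx != 0:  # assumption: skip pairs with equal x (undefined slope)
            slopes.append((y[j] - y[i]) / dx)
    return slopes

# Hypothetical data: n = 4 points, so 4 * 3 / 2 = 6 pairs.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
slopes = pairwise_slopes(x, y)
print(len(slopes))  # 6
print(slopes)       # [2.0, 1.5, 2.0, 1.0, 2.0, 3.0]
```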
02

Determine the Median Slope

Once all the slopes between pairs of data points have been calculated, the nonparametric estimate of the overall slope is determined by finding the median of these slopes. This approach reflects a typical or central tendency of the slopes between all pairs.
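
As a small sketch of this step, the median of the pairwise slopes can be taken with Python's standard library. The function name `median_slope` and the data are hypothetical, and pairs with equal \(x\) values are skipped by assumption. This estimator is often called the Theil-Sen estimator.

```python
from itertools import combinations
from statistics import median

def median_slope(x, y):
    """Median of the slopes over all pairs of points with distinct x."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # assumption: ignore pairs with undefined slope
    ]
    return median(slopes)

# Hypothetical data: the six pairwise slopes are 2.0, 1.5, 2.0, 1.0, 2.0, 3.0.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
estimate = median_slope(x, y)
print(estimate)  # 2.0 (average of the two middle slopes, both 2.0)
```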
03

Impact of Outliers on Nonparametric Slope

The median, as a measure of central tendency, is less sensitive to extreme values (outliers) than the mean. Therefore, if there is a regression outlier, the median of the slopes will not be affected as much as the slope calculated by ordinary least squares (OLS), which minimizes the sum of squared residuals and is sensitive to outliers.
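
To see the contrast in numbers, the sketch below compares the two slopes on made-up data: ten points lying exactly on \(y = 2x\), and the same points with the last \(y\) value replaced by 100. The data and the helper names `ols_slope` and `median_slope` are assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import mean, median

def ols_slope(x, y):
    """Least squares slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def median_slope(x, y):
    """Median of the slopes over all pairs of points with distinct x."""
    return median(
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    )

x = list(range(1, 11))            # x = 1, ..., 10
y_clean = [2 * xi for xi in x]    # points exactly on y = 2x
y_outlier = y_clean[:-1] + [100]  # last point moved from 20 to 100

ols_clean = ols_slope(x, y_clean)
ols_outlier = ols_slope(x, y_outlier)
med_clean = median_slope(x, y_clean)
med_outlier = median_slope(x, y_outlier)

print(ols_clean, round(ols_outlier, 2))  # 2.0 6.36
print(med_clean, med_outlier)            # 2.0 2.0
```

In this example one outlier more than triples the least squares slope, while the median slope is unchanged because 36 of the 45 pairwise slopes still equal 2.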

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Slope Estimation
Slope estimation in nonparametric regression may seem complicated, but it's quite simple once you get the hang of it. Imagine you have a bunch of data points plotted on a graph. To estimate the slope between these points, you pair them up. For each pair, calculate the change in their y-values and divide that by the change in their x-values. This gives you the slope for that pair.
Sounds easy, right? If you have a total of \( n \) points, you will end up calculating slopes for \( \frac{n(n-1)}{2} \) pairs. The method still summarizes the trend with a single straight-line slope, but it makes no assumption about how the data are distributed around that line (for example, normality). Instead, it works directly from the observed pairs of points.
This makes it more flexible than least squares regression when those distributional assumptions are in doubt.
Median Slope
The concept of the median slope is central to nonparametric regression. After calculating all the individual slopes between pairs of data points, the next step is to find the median of these slopes.
Why the median? Well, the median is the middle value in a list of numbers when they're sorted in ascending order. It provides a measure of central tendency. In the context of slope estimation, the median slope offers a robust estimate that represents a typical slope across all data pairs.
This approach is especially useful because it limits the effect of extremely high or low slopes, which would skew an average of the slopes. It's like finding a stable center point in a chaotic set of data. As a result, the median slope can be a more reliable indicator of the overall trend between variables, especially in the presence of noisy data.
Impact of Outliers
Outliers can often wreak havoc in statistical analysis, particularly when using methods that rely heavily on mean values. But how do they affect nonparametric slope estimation?
Ordinary least squares (OLS) regression fits the line that minimizes the sum of squared residuals, so a single point far from the rest can pull the fitted slope strongly toward itself. The median slope is much more resilient: it stays stable even when a few data points, the outliers, lie far away from the others.
The reason is simple: the median depends only on the middle of the ordered slopes, which makes it insensitive to extreme values at either end. An outlier changes only the slopes of the pairs that include it, and those slopes typically land in the tails of the ordered list rather than in the middle. The median slope therefore gives an estimate that is less likely to be skewed by anomalous data, which makes the nonparametric approach appealing for real-world data where outliers are common.
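
A rough count shows why one outlier has limited influence. This is a sketch that assumes a single outlying point among \(n\) points with distinct \(x\) values. The outlier belongs to \(n-1\) of the \(n(n-1)/2\) pairs, so the share of slopes it can distort is

$$ \frac{n-1}{n(n-1)/2} = \frac{2}{n}. $$

For \(n = 10\), that is 9 of the 45 slopes, or 20%. Whenever this share is below one half (that is, \(n > 4\)), the median cannot fall outside the range of the slopes from the pairs that exclude the outlier.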

Most popular questions from this chapter

Why nonparametrics? Present a situation for which it's preferable to use a nonparametric method instead of a parametric method and explain why.

What's the best way to learn French? Exercise 14.3 gave the data in the table for scores on the first quiz for ninth-grade students in an introductory-level French course. The instructor grouped the students in the course as follows: Group 1: Never studied foreign language before, but have good English skills. Group 2: Never studied foreign language before; have poor English skills. Group 3: Studied at least one other foreign language. The table also shows results of using MINITAB to perform the Kruskal-Wallis test. a. Find the rank associated with each observation and show how to find the mean rank for Group 1. b. Report and interpret the P-value for the test. $$ \begin{array}{ccc} \hline & \text{Scores on the quiz} & \\ \hline \text{Group 1} & \text{Group 2} & \text{Group 3} \\ \hline 4 & 1 & 9 \\ 6 & 5 & 10 \\ 8 & & 5 \\ \hline \end{array} $$

Trading volumes The following data show the number of shares of General Electric stock traded on Mondays and on Fridays from February through April of 2011. The trading volumes (rounded to the nearest million) are as follows: Mondays: 45, 43, 43, 66, 91, 53, 35, 45, 29, 64, 56. Fridays: 43, 41, 45, 46, 61, 56, 80, 40, 48, 49, 50, 41. Using software, a. Plot the data. Summarize what the plot shows. b. State the hypotheses and give the P-value for the Wilcoxon test for comparing the two groups with a two-sided alternative hypothesis. (As one option, you can find an accurate approximation of the exact P-value by using the Permutation Test for Means web app.) c. A \(95.5\%\) confidence interval for comparing the population medians equals \((-11, 13)\). Interpret and explain what (if any) effect the day of the week (Monday versus Friday) has on the median number of shares traded. d. State the assumptions for the methods in parts b and c.

Does exercise help blood pressure? Exercise 10.50 in Chapter 10 discussed a pilot study of people who suffer from abnormally high blood pressure. A medical researcher decides to test her belief that walking briskly for at least half an hour a day has the effect of lowering blood pressure. She randomly samples three of her patients who have high blood pressure. She measures their systolic blood pressure initially and then again a month later after they participate in her exercise program. The table shows the results. Show how to analyze the data with the sign test. State the hypotheses, find the P-value, and interpret. $$ \begin{array}{ccc} \hline \text{Subject} & \text{Before} & \text{After} \\ \hline 1 & 150 & 130 \\ 2 & 165 & 140 \\ 3 & 135 & 120 \\ \hline \end{array} $$

Multiple choice Nonparametric statistical methods are used a. Whenever the response variable is known to have a normal distribution. b. Whenever the assumptions for a parametric method are not perfectly satisfied. c. When the data are ranks for the subjects rather than quantitative measurements or when it's inappropriate to assume normality and the ordinary statistical method is not robust when the normal assumption is violated. d. Whenever we want to compare two methods for getting a good tan.
