Problem 31 In a study of exhaust emissions ...

Chapter 20: Problem 31

In a study of exhaust emissions from school buses, the pollution intake by passengers was determined for a sample of nine school buses used in the Southern California Air Basin. The pollution intake is the amount of exhaust emissions, in grams per person, that would be inhaled while traveling on the bus during its usual 18-mile trip on congested freeways from South Central LA to a magnet school in West LA. (As a reference, the average intake of motor emissions of carbon monoxide in the LA area is estimated to be about \(0.000046\) gram per person.) Here are the amounts for the nine buses when driven with the windows open:

1.15, 0.33, 0.40, 0.33, 1.35, 0.38, 0.25, 0.40, 0.35

(a) Make a stemplot. Are there outliers or strong skewness that would preclude use of the \(t\) procedures?

(b) A good way to judge the effect of outliers is to do your analysis twice, once with the outliers and a second time without them. Give two 90% confidence intervals, one with all the data and one with the outliers removed, for the mean pollution intake among all school buses used in the Southern California Air Basin that travel the route investigated in the study.

(c) Compare the two intervals in part (b). What is the most important effect of removing the outliers?

Short Answer

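The stemplot shows two high outliers, 1.15 and 1.35, which make the distribution strongly right-skewed. With only nine observations, they preclude using \(t\) procedures on the full data. The 90% confidence intervals are (0.299, 0.799) with all the data and (0.310, 0.387) with the outliers removed. Removing the outliers makes the interval much narrower.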

Step by step solution

Create a Stemplot

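To create a stemplot, split each pollution intake value into a stem and a leaf. Here the stems are tenths, running from 0.2 to 1.3, and each leaf is the hundredths digit. Every stem in that range is listed, even when it has no leaves, so that gaps show up:

0.2 | 5
0.3 | 3358
0.4 | 00
0.5 |
0.6 |
0.7 |
0.8 |
0.9 |
1.0 |
1.1 | 5
1.2 |
1.3 | 5

Seven of the nine values lie between 0.25 and 0.40. The other two, 1.15 and 1.35, are separated from them by a wide gap. These two high outliers make the distribution strongly skewed to the right. With a sample of only nine, such outliers preclude use of the \(t\) procedures on the full data set.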
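A short Python sketch can build the stemplot automatically. It uses stems in tenths and single-digit leaves in hundredths. It lists every stem in the range so that gaps are visible. `stemplot` is a hypothetical helper name chosen for illustration, not a library function.

```python
from collections import defaultdict

# Pollution intake (grams per person) for the nine buses, windows open.
intake = [1.15, 0.33, 0.40, 0.33, 1.35, 0.38, 0.25, 0.40, 0.35]


def stemplot(values):
    """Return stemplot rows with stems in tenths and leaves in hundredths.

    Every stem from the smallest to the largest is listed, so empty stems
    show up as gaps that make outliers easy to spot.
    """
    # Work in integer hundredths so floating-point error cannot shift a leaf.
    hundredths = sorted(round(v * 100) for v in values)
    leaves = defaultdict(list)
    for h in hundredths:
        leaves[h // 10].append(h % 10)  # stem = tenths, leaf = hundredths digit
    first, last = hundredths[0] // 10, hundredths[-1] // 10
    rows = []
    for stem in range(first, last + 1):
        leaf_digits = "".join(str(d) for d in leaves[stem])
        rows.append(f"{stem / 10:.1f} | {leaf_digits}".rstrip())
    return rows


for row in stemplot(intake):
    print(row)
```

Because the values are converted to integer hundredths before splitting, a value such as 1.15 (stored as 1.1499999...) still lands on stem 1.1 with leaf 5.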

Identify Outliers

A numerical check is the 1.5 × IQR rule. Using linear interpolation between the ordered values, the quartiles are \( Q_1 = 0.33 \) and \( Q_3 = 0.40 \), giving \( IQR = Q_3 - Q_1 = 0.07 \). A data point is flagged as an outlier if it lies below \( Q_1 - 1.5 \times IQR = 0.225 \) or above \( Q_3 + 1.5 \times IQR = 0.505 \). No value lies below the lower fence. Both 1.15 and 1.35 lie far above the upper fence, so they are outliers, which confirms what the stemplot shows.
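The 1.5 × IQR check can be sketched in Python with the standard `statistics` module. Quartile conventions differ for small samples, so this sketch computes two of them:

- `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between ordered values.
- The median of each half of the ordered data, with the overall median left out.

The helper `fences` is a hypothetical name chosen for this example.

```python
import statistics

intake = [1.15, 0.33, 0.40, 0.33, 1.35, 0.38, 0.25, 0.40, 0.35]


def fences(q1, q3):
    """Return the (lower, upper) fences of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Convention A: linear interpolation between order statistics (Q1 = 0.33, Q3 = 0.40).
q1_interp, _, q3_interp = statistics.quantiles(intake, n=4, method="inclusive")

# Convention B: median of each half, leaving out the overall median when n is odd.
ordered = sorted(intake)
half = len(ordered) // 2
q1_halves = statistics.median(ordered[:half])
q3_halves = statistics.median(ordered[-half:])

for name, q1, q3 in [("interpolation", q1_interp, q3_interp),
                     ("median of halves", q1_halves, q3_halves)]:
    low, high = fences(q1, q3)
    flagged = [x for x in intake if x < low or x > high]
    print(f"{name}: Q1={q1:.3f}, Q3={q3:.3f}, fences=({low:.4f}, {high:.4f}), flagged={flagged}")
```

For these nine values, the interpolation convention flags 1.15 and 1.35. The median-of-halves convention gives \( Q_3 = 0.775 \), pushing the upper fence up to 1.4425, so it flags nothing. The rule is a guideline, and the stemplot's gap is worth checking as well.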

Compute 90% Confidence Interval with All Data

First, calculate the sample mean \( \bar{x} \) and standard deviation \( s \): \( \bar{x} = \frac{1.15 + 0.33 + 0.40 + 0.33 + 1.35 + 0.38 + 0.25 + 0.40 + 0.35}{9} = \frac{4.94}{9} \approx 0.549 \) and \( s \approx 0.403 \). With \( n = 9 \), use the \( t \) distribution with 8 degrees of freedom, whose 90% critical value is \( t^* = 1.860 \): \[ \bar{x} \pm t^* \frac{s}{\sqrt{n}} = 0.549 \pm 1.860 \times \frac{0.403}{3} \approx 0.549 \pm 0.250. \] The interval is (0.299, 0.799). Because the data contain two outliers, the \( t \) procedure is not reliable here, so interpret this interval with caution.
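Here is a minimal Python sketch of the one-sample \( t \) interval, using only the standard library. The standard library has no \( t \) distribution, so this sketch assumes critical values copied from a standard \( t \) table. `t_interval_90` is a hypothetical helper name.

```python
import math
import statistics

intake = [1.15, 0.33, 0.40, 0.33, 1.35, 0.38, 0.25, 0.40, 0.35]

# 90% critical values t* copied from a standard t table, keyed by degrees of
# freedom (hard-coded because the standard library has no t distribution).
T_STAR_90 = {6: 1.943, 7: 1.895, 8: 1.860}


def t_interval_90(sample):
    """Return the 90% one-sample t confidence interval for the population mean."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    margin = T_STAR_90[n - 1] * s / math.sqrt(n)
    return xbar - margin, xbar + margin


low, high = t_interval_90(intake)
print(f"mean = {statistics.mean(intake):.3f}, s = {statistics.stdev(intake):.3f}")
print(f"90% CI: ({low:.3f}, {high:.3f})")
```

Running it prints a mean of 0.549, s = 0.403, and the interval (0.299, 0.799).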

Recalculate 90% Confidence Interval Without the Outliers

Now remove the two outliers, 1.15 and 1.35, and recalculate with the remaining \( n = 7 \) values: \( \bar{x} = \frac{0.33 + 0.40 + 0.33 + 0.38 + 0.25 + 0.40 + 0.35}{7} = \frac{2.44}{7} \approx 0.3486 \) and \( s \approx 0.0527 \). With 6 degrees of freedom, \( t^* = 1.943 \): \[ \bar{x} \pm t^* \frac{s}{\sqrt{n}} = 0.3486 \pm 1.943 \times \frac{0.0527}{\sqrt{7}} \approx 0.3486 \pm 0.0387. \] The interval is (0.310, 0.387).

Compare Intervals

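Removing the outliers changes the interval dramatically. The standard deviation falls from 0.403 to 0.053, so the margin of error drops from about 0.250 to about 0.039. That makes the interval more than six times narrower. The center also moves down, from 0.549 to 0.349. The most important effect of removing the outliers is this much narrower interval: without them, the mean pollution intake is estimated far more precisely. Whether the narrower interval is appropriate depends on whether the two high readings are legitimate. If they reflect genuine bus-to-bus variation, discarding them understates the variability.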
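To compare the two intervals side by side, the following Python sketch computes both and the ratio of their margins of error. It drops the outliers by keeping values below 1.0. That cutoff is an assumption chosen only because it separates the two high readings from the rest of these data. The 90% \( t^* \) values are hard-coded from a \( t \) table, and `summarize` is a hypothetical helper.

```python
import math
import statistics

intake = [1.15, 0.33, 0.40, 0.33, 1.35, 0.38, 0.25, 0.40, 0.35]
trimmed = [x for x in intake if x < 1.0]  # drop the two high outliers, 1.15 and 1.35

# 90% t* values from a standard t table, keyed by degrees of freedom.
T_STAR_90 = {6: 1.943, 8: 1.860}


def summarize(sample):
    """Return (mean, s, margin of error) for a 90% one-sample t interval."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)
    return xbar, s, T_STAR_90[n - 1] * s / math.sqrt(n)


for label, sample in [("all 9 buses", intake), ("outliers removed", trimmed)]:
    xbar, s, m = summarize(sample)
    print(f"{label}: mean={xbar:.3f} s={s:.3f} margin={m:.3f} "
          f"CI=({xbar - m:.3f}, {xbar + m:.3f})")

full_margin = summarize(intake)[2]
trim_margin = summarize(trimmed)[2]
print(f"margin of error shrinks by a factor of about {full_margin / trim_margin:.1f}")
```

The margin of error for the trimmed data is more than six times smaller, which is the main effect of removing the outliers.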

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Stemplot

A stemplot is a simple method to visualize numerical data while retaining the original values. It works similarly to a histogram but shows more detail. When creating a stemplot, you break down each data point into two parts: the stem and the leaf.
The stem consists of the leading digits of the number, and the leaf is the final digit. For example, with stems in tenths, the emission value 1.15 has stem 1.1 and leaf 5. This makes the stemplot a practical tool for exploring the distribution of the data.
Stemplots help identify general patterns such as skewness, outliers, and the overall shape of the distribution. In this exercise, the stemplot revealed two high outliers, 1.15 and 1.35, separated from the rest of the data by a wide gap. This warns that \(t\) procedures on the full sample are unreliable. Stemplots are particularly useful with small data sets, where anomalies stand out at a glance.

Confidence Interval

A confidence interval provides a range within which we believe the true population mean lies. The width of this interval gives us insight into the precision of our estimate: the narrower, the more precise.
In this exercise, a 90% confidence interval was calculated for the mean pollution intake. The 90% describes the method: if the study were repeated many times, about 90% of the intervals produced this way would contain the true mean pollution intake.
The investigation included calculating two intervals: one with all data points and another after removing the two high outliers, 1.15 and 1.35.
By comparing these intervals, students see how extreme values inflate the standard deviation. A larger standard deviation widens the interval and reduces the precision of the estimate of the mean.
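The repeated-sampling meaning of "90% confidence" can be checked by simulation. This Python sketch assumes a hypothetical Normal population, with mean 0.35 and standard deviation 0.05 chosen only for illustration and not estimated from the study. It draws many samples of nine, builds a 90% \( t \) interval from each, and counts how often the interval contains the true mean.

```python
import math
import random
import statistics

# Hypothetical population for illustration only: Normal with mean 0.35 and
# standard deviation 0.05 (not estimates from the study).
TRUE_MEAN, TRUE_SD = 0.35, 0.05
N = 9           # sample size, matching the nine buses
T_STAR = 1.860  # 90% t* for N - 1 = 8 degrees of freedom (standard t table)


def covers(rng):
    """Draw one sample, build a 90% t interval, and report whether it contains TRUE_MEAN."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    xbar = statistics.mean(sample)
    margin = T_STAR * statistics.stdev(sample) / math.sqrt(N)
    return xbar - margin <= TRUE_MEAN <= xbar + margin


rng = random.Random(2024)  # fixed seed so the run is repeatable
reps = 10_000
coverage = sum(covers(rng) for _ in range(reps)) / reps
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should land close to 0.90. Any single interval either contains the true mean or does not; the 90% describes the long-run behavior of the procedure.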

Outliers

Outliers are data points significantly different from others in a dataset. They can distort statistical analyses, especially when calculating measures like the mean.
In this example, outliers were checked using the interquartile range (IQR) method, calculating:

The first quartile (Q1)
The third quartile (Q3)
The IQR, as Q3 minus Q1

An outlier lies outside the range defined by Q1 - 1.5*IQR and Q3 + 1.5*IQR. Here, with Q1 = 0.33 and Q3 = 0.40, the values 1.15 and 1.35 lie above the upper bound of 0.505.
Removing these two values shows how strongly they drive the statistics: without them, the sample standard deviation drops by a factor of more than seven, from 0.403 to 0.053.

t-distribution

The t-distribution is used in place of the normal distribution when the population standard deviation is unknown and has to be estimated by the sample standard deviation \( s \). This is the usual situation in practice, and the distinction matters most when samples are small.
In this case, the t procedures fit the setting because there are only 9 buses and the population standard deviation is unknown. With so small a sample, however, they are sensitive to outliers. The two high values therefore had to be examined before the interval could be trusted.
The t-distribution has heavier tails than the normal distribution, which accounts for the extra uncertainty that comes from estimating the standard deviation. As the degrees of freedom increase, it approaches the normal distribution.
Understanding when and why to use the t-distribution is crucial in statistics, as it helps ensure conclusions drawn from sample data are as accurate as possible given the available information.
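The heavier tails of the \( t \) distribution show up directly in its critical values. Python's standard library has no \( t \) distribution, so this sketch integrates the \( t \) density numerically with Simpson's rule and inverts the CDF by bisection. In practice, a library routine such as SciPy's `scipy.stats.t.ppf` would be used instead. The helper names `t_pdf`, `t_cdf`, and `t_quantile` are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0, using symmetry plus Simpson's rule on [0, x].

    steps must be even for Simpson's rule.
    """
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Upper critical value t* with P(T <= t*) = p (for p > 0.5), found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_star = NormalDist().inv_cdf(0.95)  # 90% two-sided critical value for the normal
for df in (6, 8, 30, 1000):
    print(f"df={df:5d}: t*={t_quantile(0.95, df):.3f}  (z*={z_star:.3f})")
```

The 90% critical value is 1.943 for 6 degrees of freedom and 1.860 for 8. Both are larger than the normal value \( z^* \approx 1.645 \), and \( t^* \) shrinks toward 1.645 as the degrees of freedom grow.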
