Problem 31 In a study of exhaust emissions ...

In a study of exhaust emissions from school buses, the pollution intake by passengers was determined for a sample of nine school buses used in the Southern California Air Basin. The pollution intake is the amount of exhaust emissions, in grams per person, that would be inhaled while traveling on the bus during its usual 18-mile trip on congested freeways from South Central LA to a magnet school in West LA. (As a reference, the average intake of motor emissions of carbon monoxide in the LA area is estimated to be about \(0.000046\) gram per person.) Here are the amounts for the nine buses when driven with the windows open:
\(\begin{array}{lllllllll}1.15 & 0.33 & 0.40 & 0.33 & 1.35 & 0.38 & 0.25 & 0.40 & 0.35\end{array}\)
(a) Make a stemplot. Are there outliers or strong skewness that would preclude use of the \(t\) procedures?
(b) A good way to judge the effect of outliers is to do your analysis twice, once with the outliers and a second time without them. Give two \(90 \%\) confidence intervals, one with all the data and one with the outliers removed, for the mean pollution intake among all school buses used in the Southern California Air Basin that travel the route investigated in the study.
(c) Compare the two intervals in part (b). What is the most important effect of removing the outliers?

Short Answer

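The stemplot shows two high outliers (1.15 and 1.35), which make the \(t\) procedures questionable for a sample of only nine buses. Removing the outliers lowers the estimated mean and makes the 90% confidence interval much narrower.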

Step by step solution

01

Create a Stemplot

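To create a stemplot, split each pollution intake value into a stem and a leaf. Because seven of the nine values lie between 0.25 and 0.40, use the ones and tenths digits as the stem and the hundredths digit as the leaf:
0.2 | 5
0.3 | 3 3 5 8
0.4 | 0 0
0.5 to 1.0 | (no leaves)
1.1 | 5
1.2 |
1.3 | 5
The stemplot shows seven values clustered between 0.25 and 0.40 and two high outliers, 1.15 and 1.35, separated from the rest by a wide gap. With only nine observations, outliers this extreme make the \(t\) procedures questionable, so any \(t\) interval from these data should be interpreted with caution.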
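The splitting into stems and leaves can be sketched in a few lines of Python. This is only an illustrative helper (the function name `stemplot` and the choice of tenths as stems are assumptions, not part of the original solution); it prints every stem between the smallest and largest so that gaps in the data stay visible.

```python
def stemplot(values):
    """Return stemplot lines using tenths as stems and hundredths as leaves."""
    # Work in whole hundredths to avoid floating-point digit errors.
    hundredths = sorted(round(v * 100) for v in values)
    lines = []
    for stem in range(hundredths[0] // 10, hundredths[-1] // 10 + 1):
        leaves = [str(h % 10) for h in hundredths if h // 10 == stem]
        # Empty stems are kept so a gap in the data shows up as blank rows.
        lines.append(f"{stem / 10:.1f} | {' '.join(leaves)}".rstrip())
    return lines


intake = [1.15, 0.33, 0.40, 0.33, 1.35, 0.38, 0.25, 0.40, 0.35]
for line in stemplot(intake):
    print(line)
```

The run of empty stems between 0.4 and 1.1 in the printed output is what marks the two largest values as standing apart from the rest.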
02

Identify Outliers

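Apply the 1.5 times the interquartile range (IQR) rule as a numerical check. Sorted, the data are 0.25, 0.33, 0.33, 0.35, 0.38, 0.40, 0.40, 1.15, 1.35. Using linearly interpolated quartiles (a common software default), \( Q1 = 0.33 \) and \( Q3 = 0.40 \), giving \( IQR = Q3 - Q1 = 0.07 \). An outlier is a data point below \( Q1 - 1.5 \times IQR = 0.225 \) or above \( Q3 + 1.5 \times IQR = 0.505 \). The values 1.15 and 1.35 lie above the upper bound, so both are flagged as outliers. Quartile conventions differ for samples this small (taking \( Q3 \) as the median of the four values above the median gives 0.775, and the rule then flags nothing), so the wide gap visible in the stemplot is the more reliable evidence.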
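A minimal sketch of the 1.5 × IQR check, assuming the linearly interpolated ("inclusive") quartile convention offered by Python's standard `statistics.quantiles`. The helper name `iqr_outliers` is hypothetical, and a different quartile convention can give different bounds on a sample this small.

```python
import statistics


def iqr_outliers(values):
    """Return (lower bound, upper bound, flagged values) for the 1.5*IQR rule."""
    # method="inclusive" interpolates linearly between order statistics.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    flagged = sorted(v for v in values if v < low or v > high)
    return low, high, flagged


intake = [1.15, 0.33, 0.40, 0.33, 1.35, 0.38, 0.25, 0.40, 0.35]
low, high, flagged = iqr_outliers(intake)
print(f"bounds: ({low:.3f}, {high:.3f}), flagged: {flagged}")
```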
03

Compute 90% Confidence Interval with All Data

First, calculate the mean \( \bar{x} \) and standard deviation \( s \) of the data. \( \bar{x} = \frac{1.15 + 0.33 + 0.40 + 0.33 + 1.35 + 0.38 + 0.25 + 0.40 + 0.35}{9} = \frac{4.94}{9} \approx 0.549 \), and \( s \approx 0.403 \). Use the \( t \)-distribution with \( n - 1 = 8 \) degrees of freedom:\[ \bar{x} \pm t^* \frac{s}{\sqrt{n}} \]With \( t^* = 1.860 \), calculate: \[ 0.549 \pm 1.860 \times \frac{0.403}{3} \approx 0.549 \pm 0.250 \]The interval is (0.299, 0.799).
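The arithmetic can be checked directly from the nine listed values. The sketch below uses only the Python standard library; the critical values in `T_STAR_90` are copied from a standard \( t \) table rather than computed, and the function name `t_interval_90` is an illustrative assumption.

```python
import math
import statistics

# 90% two-sided critical values t* by degrees of freedom (from a t table).
T_STAR_90 = {6: 1.943, 7: 1.895, 8: 1.860}


def t_interval_90(values):
    """One-sample 90% t interval: mean +/- t* * s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    margin = T_STAR_90[n - 1] * s / math.sqrt(n)
    return mean - margin, mean + margin


intake = [1.15, 0.33, 0.40, 0.33, 1.35, 0.38, 0.25, 0.40, 0.35]
low, high = t_interval_90(intake)
print(f"mean = {statistics.mean(intake):.3f}, s = {statistics.stdev(intake):.3f}")
print(f"90% CI: ({low:.3f}, {high:.3f})")
```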
04

Recalculate 90% Confidence Interval Without Outliers

Remove the two high values \( 1.15 \) and \( 1.35 \), which lie far above the rest of the data, and recalculate with the remaining 7 data points. \( \bar{x} = \frac{0.33 + 0.40 + 0.33 + 0.38 + 0.25 + 0.40 + 0.35}{7} = \frac{2.44}{7} \approx 0.3486 \) and \( s \approx 0.0527 \). With \( n - 1 = 6 \) degrees of freedom:\[ \bar{x} \pm t^* \frac{s}{\sqrt{n}} \]With \( t^* = 1.943 \), calculate: \[ 0.3486 \pm 1.943 \times \frac{0.0527}{2.646} \approx 0.3486 \pm 0.0387 \]The interval is (0.310, 0.387).
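The "analyze twice" idea from part (b) can be sketched as a small loop over the two versions of the data. The code assumes that the two high values 1.15 and 1.35 are the ones dropped, and it takes the 90% critical values for 8 and 6 degrees of freedom from a \( t \) table. The helper name `margin_of_error` is hypothetical.

```python
import math
import statistics


def margin_of_error(values, t_star):
    """Margin of error t* * s / sqrt(n) for a one-sample t interval."""
    return t_star * statistics.stdev(values) / math.sqrt(len(values))


intake = [1.15, 0.33, 0.40, 0.33, 1.35, 0.38, 0.25, 0.40, 0.35]
outliers = (1.15, 1.35)  # assumption: these two high values are removed
trimmed = [v for v in intake if v not in outliers]

# 90% critical values copied from a t table: df = 8 and df = 6.
runs = {"all data": (intake, 1.860), "outliers removed": (trimmed, 1.943)}
results = {}
for label, (values, t_star) in runs.items():
    mean = statistics.mean(values)
    moe = margin_of_error(values, t_star)
    results[label] = (mean - moe, mean + moe)
    print(f"{label}: n = {len(values)}, mean = {mean:.4f}, "
          f"margin = {moe:.4f}, CI = ({mean - moe:.3f}, {mean + moe:.3f})")
```

Printing the margin next to each interval makes it easy to see how much of the interval's width comes from the two extreme values.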
05

Compare Intervals

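The interval computed with the outliers removed is narrower and is centered at a lower value. The most important effect of removing the outliers is the drop in the sample standard deviation, which shrinks the margin of error; the shift in the center is secondary. The narrower interval is only a better estimate if the removed values are genuinely erroneous or unrepresentative. If they are legitimate measurements, dropping them understates the real variability in pollution intake.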

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Stemplot
A stemplot is a simple method to visualize numerical data while retaining the original values. It works similarly to a histogram but shows more detail. When creating a stemplot, you break down each data point into two parts: the stem and the leaf.
The stem consists of the leading digits of the number, and the leaf is the final digit. For example, for the emission value 1.15, the stem is "1.1" and the leaf is "5". This makes the stemplot a practical tool for exploring the distribution of the data.
Stemplots help identify general patterns such as skewness, outliers, and overall shape. In this exercise, the stemplot shows two values (1.15 and 1.35) far above the other seven, so t procedures applied to a sample of only nine observations call for caution. Stemplots are particularly useful with small datasets, where anomalies can be spotted at a glance.
Confidence Interval
A confidence interval gives a range of plausible values for the true population mean. Its width reflects the precision of the estimate: the narrower the interval, the more precise the estimate.
In this exercise, 90% confidence intervals were calculated for the mean pollution intake. The 90% describes the method: if the study were repeated many times, about 90% of the intervals constructed this way would contain the true mean.
Two intervals were calculated: one with all nine data points and one with the outliers removed.
Comparing them shows how strongly a few extreme values can inflate the standard deviation and, with it, the margin of error.
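The "repeat the study many times" interpretation can be illustrated with a small simulation. Everything about the population here is an assumption made for the sketch (a Normal population with mean 0.35 and standard deviation 0.05, samples of size 9, a fixed random seed); none of it comes from the bus study itself.

```python
import math
import random
import statistics


def coverage(n=9, reps=5000, mu=0.35, sigma=0.05, t_star=1.860, seed=1):
    """Fraction of simulated 90% t intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        margin = t_star * statistics.stdev(sample) / math.sqrt(n)
        hits += (mean - margin <= mu <= mean + margin)
    return hits / reps


print(f"share of intervals covering the true mean: {coverage():.3f}")
```

When the population really is Normal, the printed share should land close to 0.90. The guarantee weakens when a small sample comes from a population that produces extreme outliers.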
Outliers
Outliers are data points significantly different from others in a dataset. They can distort statistical analyses, especially when calculating measures like the mean.
In this example, outliers were checked using the interquartile range (IQR) method, calculating:
  • The first quartile (Q1)
  • The third quartile (Q3)
  • The IQR, as Q3 minus Q1
An outlier lies below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. Here, with Q1 = 0.33 and Q3 = 0.40, these bounds are 0.225 and 0.505, and the values 1.15 and 1.35 fall above the upper bound.
Repeating the analysis with and without these two values shows how much the results depend on them.
t-distribution
The t-distribution is the probability distribution used in place of the standard normal distribution when the population standard deviation is unknown and must be estimated by the sample standard deviation, which is the typical real-world situation. The difference between the two matters most when samples are small.
In this case, with a sample of 9 buses and an unknown population standard deviation, the t-distribution with 8 degrees of freedom is the appropriate reference distribution, although the outliers make the resulting interval less trustworthy.
The t-distribution has fatter tails than the normal distribution, which means it can better accommodate the uncertainty in the estimate of the standard deviation, making it ideal for smaller sample sizes.
Understanding when and why to use the t-distribution is crucial in statistics, as it helps ensure conclusions drawn from sample data are as accurate as possible given the available information.
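As a concrete illustration of the fatter tails, compare the 90% critical values (taken from standard tables, rounded to three decimals) for the standard normal distribution and for the t-distribution with 8 degrees of freedom:
\[ z^* = 1.645, \qquad t^*_{8} = 1.860, \qquad \frac{t^*_{8}}{z^*} \approx 1.13 \]
For the same \( s \) and \( n \), the \( t \) interval is therefore about 13% wider than an interval built with \( z^* \). That extra width is the price of estimating the standard deviation from only nine observations.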

Most popular questions from this chapter

Does a football filled with helium travel farther than one filled with ordinary air? To test this, the Columbus Dispatch conducted a study. Two identical footballs, one filled with helium and one filled with ordinary air, were used. A casual observer was unable to detect a difference in the two footballs. A novice kicker was used to punt the footballs. A trial consisted of kicking both footballs in a random order. The kicker did not know which football (the helium-filled or the air-filled football) he was kicking. The distance of each punt was recorded. Then another trial was conducted. A total of 39 trials were run. Here are the data for the 39 trials, in yards that the footballs traveled. The difference (helium minus air) is the response variable.
$$ \begin{array}{l|rrrrrrrrrr} \hline \text { Helium } & 25 & 16 & 25 & 14 & 23 & 29 & 25 & 26 & 22 & 26 \\ \text { Air } & 25 & 23 & 18 & 16 & 35 & 15 & 26 & 24 & 24 & 28 \\ \hline \text { Difference } & 0 & -7 & 7 & -2 & -12 & 14 & -1 & 2 & -2 & -2 \\ \hline \text { Helium } & 12 & 28 & 28 & 31 & 22 & 29 & 23 & 26 & 35 & 24 \\ \text { Air } & 25 & 19 & 27 & 25 & 34 & 26 & 20 & 22 & 33 & 29 \\ \hline \text { Difference } & -13 & 9 & 1 & 6 & -12 & 3 & 3 & 4 & 2 & -5 \\ \hline \text { Helium } & 31 & 34 & 39 & 32 & 14 & 28 & 30 & 27 & 33 & 11 \\ \text { Air } & 31 & 27 & 22 & 29 & 28 & 29 & 22 & 31 & 25 & 20 \\ \hline \text { Difference } & 0 & 7 & 17 & 3 & -14 & -1 & 8 & -4 & 8 & -9 \\ \hline \text { Helium } & 26 & 32 & 30 & 29 & 30 & 29 & 29 & 30 & 26 & \\ \text { Air } & 27 & 26 & 28 & 32 & 28 & 25 & 31 & 28 & 28 & \\ \hline \text { Difference } & -1 & 6 & 2 & -3 & 2 & 4 & -2 & 2 & -2 & \\ \hline \end{array} $$
(a) Examine the data. Is it reasonable to use the \(t\) procedures?
(b) If your conclusion in part (a) is Yes, do the data give convincing evidence that the helium-filled football travels farther than the air-filled football?

The Trial Urban District Assessment (TUDA) is a government-sponsored study of student achievement in large urban school districts. TUDA gives a mathematics test scored from 0 to 500. A score of 262 is a "basic" mathematics level and a score of 299 is "proficient." Scores for a random sample of 1100 eighth-graders in Dallas had \(\bar{x}=271\) with standard error 1.3. (a) We don't have the 1100 individual scores, but use of the \(t\) procedures is surely safe. Why? (b) Give a \(99 \%\) confidence interval for the mean score of all Dallas eighth-graders. (Be careful: the report gives the standard error of \(\bar{x}\), not the standard deviation \(s\).) (c) Urban children often perform below the basic level. Is there good evidence that the mean for all Dallas eighth-graders is more than the basic level?

Because the \(t\) procedures are robust, the most important condition for their safe use is that (a) the sample size is at least \(15 .\) (b) the population distribution is exactly Normal. (c) the data can be regarded as an SRS from the population.

Do wearable devices that monitor diet and physical activity help people lose weight? Researchers had 237 subjects, already involved in a program of diet and exercise, use wearable technology for 24 months. They measured their weight (in kilograms) before using the technology and 24 months after using the technology. (a) Explain why the proper procedure to compare the mean weight before using the wearable technology and 24 months after using the wearable technology is a matched pairs \(t\) test. (b) The 237 differences in weight (weight after 24 months minus weight before using the wearable technology) had \(\bar{x}=-3.5\) and \(s=7.8\). Is there significant evidence of a reduction in weight after using the wearable technology?

Fortunately, we aren't really interested in the number of seeds velvetleaf plants produce (see Exercise \(20.41\)). The velvetleaf seed beetle feeds on the seeds and might be a natural weed control. Here are the total seeds, seeds infected by the beetle, and percent of seeds infected for 28 velvetleaf plants:
$$ \begin{array}{l|rrrrrrrrrr} \hline \text { Seeds } & 2450 & 2504 & 2114 & 1110 & 2137 & 8015 & 1623 & 1531 & 2008 & 1716 \\ \text { Infected } & 135 & 101 & 76 & 24 & 121 & 189 & 31 & 44 & 73 & 12 \\ \text { Percent } & 5.5 & 4.0 & 3.6 & 2.2 & 5.7 & 2.4 & 1.9 & 2.9 & 3.6 & 0.7 \\ \hline \text { Seeds } & 721 & 863 & 1136 & 2819 & 1911 & 2101 & 1051 & 218 & 1711 & 164 \\ \text { Infected } & 27 & 40 & 41 & 79 & 82 & 85 & 42 & 0 & 64 & 7 \\ \text { Percent } & 3.7 & 4.6 & 3.6 & 2.8 & 4.3 & 4.0 & 4.0 & 0.0 & 3.7 & 4.3 \\ \hline \text { Seeds } & 2228 & 363 & 5973 & 1050 & 1961 & 1809 & 130 & 880 & & \\ \text { Infected } & 156 & 31 & 240 & 91 & 137 & 92 & 5 & 23 & & \\ \text { Percent } & 7.0 & 8.5 & 4.0 & 8.7 & 7.0 & 5.1 & 3.8 & 2.6 & & \\ \hline \end{array} $$
Do a complete analysis of the percent of seeds infected by the beetle. Include a \(90 \%\) confidence interval for the mean percent infected in the population of all velvetleaf plants. Do you think that the beetle is very helpful in controlling the weed? Why is analyzing percent of seeds infected more useful than analyzing number of seeds infected?
