Problem 63

The amount of aluminum contamination (in parts per million) in plastic was determined for a sample of 26 plastic specimens, resulting in the following data ("The Log Normal Distribution for Modeling Quality Data When the Mean Is Near Zero," Journal of Quality Technology [1990]: 105-110):

\(\begin{array}{rrrrrrrrr}30 & 30 & 60 & 63 & 70 & 79 & 87 & 90 & 101 \\ 102 & 115 & 118 & 119 & 119 & 120 & 125 & 140 & 145 \\ 172 & 182 & 183 & 191 & 222 & 244 & 291 & 511 & \end{array}\)

Construct a boxplot that shows outliers, and comment on the interesting features of this plot.

Short Answer

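The boxplot shows that the middle half of the data lies between the first quartile, 87 ppm, and the third quartile, 182 ppm, with a median of 119 ppm. The observations range from 30 ppm to 511 ppm. The value 511 ppm is an outlier because it lies above the upper fence, Q3 + 1.5 * IQR = 324.5 ppm, so the upper whisker ends at 291 ppm, the largest observation that is not an outlier. The upper whisker is longer than the lower one and the only outlier is on the high side, which indicates a distribution skewed toward large values.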

Step by step solution

01

Sorting the data

Start by sorting the data from least to greatest. This makes it easier to find the key features needed for the boxplot (minimum, maximum, and quartiles).
02

Determine key features

Next, identify the minimum and maximum, the smallest and largest values in the sorted dataset: 30 and 511. Then find the quartiles Q1, Q2, and Q3, which divide the sorted list into four parts. With 26 observations, the median (Q2) is the average of the 13th and 14th values in the sorted dataset: \( \frac{119+119}{2}=119 \). Q1 is the median of the lower half of the data (the 13 smallest values) and Q3 is the median of the upper half (the 13 largest values). Each half contains an odd number of values, so its median is the middle (7th) value of that half: Q1 is 87 and Q3 is 182. Finally, calculate the interquartile range (IQR), which is Q3 - Q1 = 182 - 87 = 95.
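The quartile calculation can be checked with a short script. This is a minimal sketch that assumes the median-of-halves convention (for an odd sample size, the overall median is left out of both halves); the function name `quartiles` is illustrative, and statistical software often uses other quartile conventions that give slightly different values.

```python
from statistics import median

# Aluminum contamination (ppm) for the 26 specimens, as listed in the problem.
data = [30, 30, 60, 63, 70, 79, 87, 90, 101,
        102, 115, 118, 119, 119, 120, 125, 140, 145,
        172, 182, 183, 191, 222, 244, 291, 511]

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Assumes at least two values. With an odd number of values the
    overall median is excluded from both halves.
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]    # smallest half of the data
    upper = s[-half:]   # largest half of the data
    return median(lower), median(s), median(upper)

q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print("Q1 =", q1, "median =", q2, "Q3 =", q3, "IQR =", iqr)
```

For this dataset the script gives Q1 = 87, a median of 119, Q3 = 182, and an IQR of 95.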
03

Identify Outliers

Outliers are data points that lie unusually far from the rest of the data. A common rule flags any value that falls below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. These two cutoffs are often called fences. They are not the whiskers themselves, because the whiskers end at actual observations. Calculate the fences as follows: lower fence = 87 - 1.5*95 = -55.5, upper fence = 182 + 1.5*95 = 324.5. Any point below -55.5 or above 324.5 is an outlier. No observation lies below the lower fence and only one lies above the upper fence, so there is a single outlier: 511.
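The 1.5 * IQR rule is easy to apply in code. The sketch below recomputes the quartiles with the median-of-halves convention (an assumption, as other conventions exist) and then lists every value outside the fences; the variable names are illustrative.

```python
from statistics import median

# Aluminum contamination (ppm) for the 26 specimens, as listed in the problem.
data = [30, 30, 60, 63, 70, 79, 87, 90, 101,
        102, 115, 118, 119, 119, 120, 125, 140, 145,
        172, 182, 183, 191, 222, 244, 291, 511]

s = sorted(data)
half = len(s) // 2
q1 = median(s[:half])    # median of the lower half
q3 = median(s[-half:])   # median of the upper half
iqr = q3 - q1

# Cutoffs beyond which a value is flagged as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in s if x < lower_fence or x > upper_fence]
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

Because contamination levels cannot be negative, the negative lower fence simply means that no low outlier is possible for this dataset.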
04

Construct the boxplot

Draw a number line that covers the entire range of the data. Draw a box from Q1 (87) to Q3 (182), with a vertical line inside the box at the median (119). Draw lines (or 'whiskers') from the box out to the smallest and largest observations that are not outliers: the lower whisker ends at 30 and the upper whisker ends at 291, the largest value below the upper fence of 324.5. Mark each outlier with a separate symbol. Here that is the single point at 511.
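The drawing steps can be sketched as a rough text rendering. This is only an illustration: the `render` helper and its character choices are made up for this example, the quartiles again assume the median-of-halves convention, and a plotting library would normally be used for a real figure.

```python
from statistics import median

# Aluminum contamination (ppm) for the 26 specimens, as listed in the problem.
data = [30, 30, 60, 63, 70, 79, 87, 90, 101,
        102, 115, 118, 119, 119, 120, 125, 140, 145,
        172, 182, 183, 191, 222, 244, 291, 511]

s = sorted(data)
half = len(s) // 2
q1, med, q3 = median(s[:half]), median(s), median(s[-half:])
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Whiskers end at the most extreme observations that are not outliers.
inside = [x for x in s if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = inside[0], inside[-1]
outliers = [x for x in s if x < lower_fence or x > upper_fence]

def render(width=60):
    """Draw the boxplot as one line of text scaled to the data range."""
    lo, hi = s[0], s[-1]

    def col(x):
        # Map a data value to a column index between 0 and width - 1.
        return round((x - lo) / (hi - lo) * (width - 1))

    row = [" "] * width
    for c in range(col(whisker_low), col(whisker_high) + 1):
        row[c] = "-"       # whiskers
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="       # box from Q1 to Q3
    row[col(med)] = "|"    # median line
    for x in outliers:
        row[col(x)] = "*"  # outliers get their own symbol
    return "".join(row)

plot = render()
print(plot)
```

In the printed line the box sits toward the left, the right-hand whisker is longer than the left-hand one, and the lone `*` at the far right is the 511 ppm specimen.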


Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Statistical Data Analysis

Statistical data analysis involves collecting, presenting, and interpreting data to make decisions. At its core, it includes summarizing data in a way that provides insight into the data set’s characteristics. When analyzing the aluminum contamination in plastic specimens, we begin by organizing the data to understand its distribution. This step is crucial as it helps us identify key metrics such as the median, quartiles, and outliers. These metrics provide a way to summarize complex data into a few informative numbers and detect patterns and anomalies.


The use of boxplots in the example given is an excellent method for visually summarizing a data set’s distribution. By understanding the principles of statistical data analysis, students can gain valuable insights into their data and make informed conclusions based on empirical evidence.

Interquartile Range (IQR)

The Interquartile Range (IQR) is a measure of variability that describes the spread of the middle 50% of values when ordered from lowest to highest. To calculate the IQR, we subtract the first quartile (Q1) from the third quartile (Q3), which is essentially the distance between the 25th and 75th percentiles. In the plastic specimen data, Q1 is 87 and Q3 is 182, so the IQR is 182 - 87 = 95.


Importance of IQR

The IQR is less affected by extreme values or outliers, making it a more robust measure than the range. It helps us determine the spread of the majority of the data and is crucial for later stages of analysis, such as outlier detection. By understanding the IQR, students can better grasp the variation in their data and how spread out the central values are.
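One way to see this robustness is to drop the largest observation and compare how much each measure changes. The sketch below assumes the median-of-halves convention for quartiles; the helper names `data_range` and `iqr` are illustrative.

```python
from statistics import median

# Aluminum contamination (ppm) for the 26 specimens, as listed in the problem.
data = [30, 30, 60, 63, 70, 79, 87, 90, 101,
        102, 115, 118, 119, 119, 120, 125, 140, 145,
        172, 182, 183, 191, 222, 244, 291, 511]

def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def iqr(values):
    """Q3 - Q1, with quartiles taken as medians of the lower and upper halves."""
    s = sorted(values)
    half = len(s) // 2
    return median(s[-half:]) - median(s[:half])

without_max = sorted(data)[:-1]  # same data with the largest value (511) removed

range_all, range_trim = data_range(data), data_range(without_max)
iqr_all, iqr_trim = iqr(data), iqr(without_max)
print("range:", range_all, "->", range_trim)
print("IQR:  ", iqr_all, "->", iqr_trim)
```

Removing the single value 511 cuts the range from 481 to 261, while the IQR only moves from 95 to 94.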

Data Visualization

Data visualization is the graphical representation of information and data. It uses visual elements like charts, graphs, and plots to provide an accessible way to see and understand trends, outliers, and patterns in data. In the context of constructing a boxplot, it serves as an intuitive way to display the distribution of a data set at a glance.


  • A boxplot clearly highlights the median, quartiles, and outliers.
  • It shows the spread and symmetry of the data.
  • It helps identify if the data is clustered around a central value or if there are distinct gaps.

A well-constructed boxplot, like the one for the aluminum contamination levels in plastic, allows a quick visual assessment of the data’s central tendency, variability, and skewness. It becomes a foundation for more in-depth statistical analysis and interpretation.

Outliers Detection

Outliers are data points that differ significantly from the rest of the data. Detecting outliers is vital because they can indicate data variability and lead to improvements in the data collection process or provide insights into unusual occurrences. In the boxplot for the plastic specimen data, outliers are determined using the IQR. By multiplying the IQR by 1.5 and adding this value to Q3 or subtracting it from Q1, we establish boundaries beyond which points are considered outliers.


Impact of Outliers

Outliers can affect the mean and standard deviation of a data set, leading to potentially misleading interpretations. A single outlier, such as the 511 parts per million aluminum contamination level, can highlight an issue worthy of further investigation. With this knowledge in hand, students can carefully scrutinize data points that fall outside expected ranges and decide how to handle them within their analyses.
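The effect of the 511 ppm value on the usual summary statistics can be checked directly. This is a small sketch using Python's standard `statistics` module; the variable names are illustrative, and dropping the value here is only for comparison, not a recommendation to discard it.

```python
from statistics import mean, median, stdev

# Aluminum contamination (ppm) for the 26 specimens, as listed in the problem.
data = [30, 30, 60, 63, 70, 79, 87, 90, 101,
        102, 115, 118, 119, 119, 120, 125, 140, 145,
        172, 182, 183, 191, 222, 244, 291, 511]

without_outlier = [x for x in data if x != 511]

# Summary statistics with and without the outlier.
mean_all, mean_trim = mean(data), mean(without_outlier)
sd_all, sd_trim = stdev(data), stdev(without_outlier)  # sample standard deviations
median_all, median_trim = median(data), median(without_outlier)

print("mean:  ", round(mean_all, 2), "->", round(mean_trim, 2))
print("stdev: ", round(sd_all, 2), "->", round(sd_trim, 2))
print("median:", median_all, "->", median_trim)
```

The mean drops from about 142.65 to 127.92 when the outlier is removed and the standard deviation shrinks as well, while the median stays at 119.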


Most popular questions from this chapter

The San Luis Obispo Telegram-Tribune (October 1, 1994) reported the following monthly salaries for supervisors from six different counties: \(\$5354\) (Kern), \(\$5166\) (Monterey), \(\$4443\) (Santa Cruz), \(\$4129\) (Santa Barbara), \(\$2500\) (Placer), and \(\$2220\) (Merced). San Luis Obispo County supervisors are supposed to be paid the average of the two counties among these six in the middle of the salary range. Which measure of center determines this salary, and what is its value? Why is the other measure of center featured in this section not as favorable to these supervisors (although it might appeal to taxpayers)?

The standard deviation alone does not measure relative variation. For example, a standard deviation of \(\$1\) would be considered large if it is describing the variability from store to store in the price of an ice cube tray. On the other hand, a standard deviation of \(\$1\) would be considered small if it is describing store-to-store variability in the price of a particular brand of freezer. A quantity designed to give a relative measure of variability is the coefficient of variation. Denoted by CV, the coefficient of variation expresses the standard deviation as a percentage of the mean. It is defined by the formula \(CV=100\left(\frac{s}{\bar{x}}\right)\). Consider two samples. Sample 1 gives the actual weight (in ounces) of the contents of cans of pet food labeled as having a net weight of 8 oz. Sample 2 gives the actual weight (in pounds) of the contents of bags of dry pet food labeled as having a net weight of 50 lb. The weights for the two samples are:

\(\begin{array}{lrrrrr}\text{Sample 1} & 8.3 & 7.1 & 7.6 & 8.1 & 7.6 \\ & 8.3 & 8.2 & 7.7 & 7.7 & 7.5 \\ \text{Sample 2} & 52.3 & 50.6 & 52.1 & 48.4 & 48.8 \\ & 47.0 & 50.4 & 50.3 & 48.7 & 48.2\end{array}\)

a. For each of the given samples, calculate the mean and the standard deviation.
b. Compute the coefficient of variation for each sample. Do the results surprise you? Why or why not?

A student took two national aptitude tests. The national average and standard deviation were 475 and 100, respectively, for the first test and 30 and 8, respectively, for the second test. The student scored 625 on the first test and 45 on the second test. Use \(z\) scores to determine on which exam the student performed better relative to the other test takers.

In 1997 a woman sued a computer keyboard manufacturer, charging that her repetitive stress injuries were caused by the keyboard (Genessey v. Digital Equipment Corporation). The jury awarded about \(\$3.5\) million for pain and suffering, but the court then set aside that award as being unreasonable compensation. In making this determination, the court identified a "normative" group of 27 similar cases and specified a reasonable award as one within 2 standard deviations of the mean of the awards in the 27 cases. The 27 award amounts were (in thousands of dollars):

\(\begin{array}{rrrrrrrr}37 & 60 & 75 & 115 & 135 & 140 & 149 & 150 \\ 238 & 290 & 340 & 410 & 600 & 750 & 750 & 750 \\ 1050 & 1100 & 1139 & 1150 & 1200 & 1200 & 1250 & 1576 \\ 1700 & 1825 & 2000 & & & & & \end{array}\)

What is the maximum possible amount that could be awarded under the "2-standard deviations rule"?

The article "Taxable Wealth and Alcoholic Beverage Consumption in the United States" (Psychological Reports [1994]: 813-814) reported that the mean annual adult consumption of wine was 3.15 gal and that the standard deviation was 6.09 gal. Would you use the Empirical Rule to approximate the proportion of adults who consume more than 9.24 gal (i.e., the proportion of adults whose consumption value exceeds the mean by more than 1 standard deviation)? Explain your reasoning.
