Problem 24

The following data set lists the number of women from each of 10 different countries who were on the Rolex Women's World Golf Rankings Top 25 list as of March 31, 2009. The data, entered in that order, are for the following countries: Australia, Brazil, England, Japan, Korea, Mexico, Norway, Sweden, Taiwan, and United States. \(\begin{array}{llllllllll}2 & 1 & 1 & 2 & 9 & 1 & 1 & 2 & 2 & 4\end{array}\)
a. Calculate the mean and median for these data.
b. Identify the outlier in this data set. Drop the outlier and recalculate the mean and median. Which of these two summary measures changes by a larger amount when you drop the outlier?
c. Which is the better summary measure for these data, the mean or the median? Explain.

Short Answer

The mean and median of the data are 2.5 and 2.0, respectively. After identifying and dropping the outlier 9, nine values remain, and the new mean and median are approximately 1.78 and 2.0, respectively. The mean therefore changes by the larger amount (about 0.72, versus 0 for the median). The better summary measure for these data is the median because it is less sensitive to outliers.

Step by step solution

01

Calculation of mean and median

The mean is calculated by summing all the values and dividing the result by the number of values. The median is the middle value of the data once sorted in ascending or descending order. Given this data set: \([2, 1, 1, 2, 9, 1, 1, 2, 2, 4]\). Sum of data = \(2 + 1 + 1 + 2 + 9 + 1 + 1 + 2 + 2 + 4 = 25\) and number of values = 10. Mean = \(25 / 10 = 2.5\). For the median, ordering the data gives \([1, 1, 1, 1, 2, 2, 2, 2, 4, 9]\). With 10 values, the median is the average of the 5th and 6th values, so Median = \((2+2)/2 = 2.0\).
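The calculation above can be reproduced with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names (`counts`, `data_mean`, `data_median`) are illustrative and not part of the original solution.

```python
from statistics import mean, median

# Top-25 counts in the order given: Australia, Brazil, England, Japan,
# Korea, Mexico, Norway, Sweden, Taiwan, United States.
counts = [2, 1, 1, 2, 9, 1, 1, 2, 2, 4]

data_mean = mean(counts)      # sum of the values divided by how many there are
data_median = median(counts)  # averages the two middle values when n is even

print(f"mean = {data_mean}, median = {data_median}")
```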
02

Identification of outlier and recalculation

The outlier in this data set is 9, from Korea, which is clearly larger than the rest of the data points. After dropping the outlier, the data set becomes \([2, 1, 1, 2, 1, 1, 2, 2, 4]\). The new sum = \(2 + 1 + 1 + 2 + 1 + 1 + 2 + 2 + 4 = 16\) and the count = 9. So the new mean = \(16 / 9 \approx 1.78\). Ordering the data gives \([1, 1, 1, 1, 2, 2, 2, 2, 4]\). With nine values the median is the 5th value, which is 2, so the median remains 2.0.
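The recalculation can be checked the same way. The sketch below assumes the outlier is simply the largest value in the list, which matches this data set but is not a general outlier rule; the names `trimmed`, `new_mean`, and `new_median` are illustrative.

```python
from statistics import mean, median

counts = [2, 1, 1, 2, 9, 1, 1, 2, 2, 4]

# Assumption for this data set only: the outlier is the single largest
# value (9, Korea). Removing it leaves nine observations.
outlier = max(counts)
trimmed = [c for c in counts if c != outlier]

new_mean = mean(trimmed)      # 16 / 9
new_median = median(trimmed)  # 5th of the 9 sorted values

print(f"n = {len(trimmed)}, mean = {new_mean:.2f}, median = {new_median}")
```

Note that the count drops from 10 to 9 once the outlier is removed, so the new sum must be divided by 9.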
03

Comparing change in mean and median

Comparing the change in the mean and median after dropping the outlier, the mean changes by \(2.5 - 1.78 \approx 0.72\) while the median remains the same. The mean is therefore more heavily influenced by the outlier.
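As a check on the arithmetic, the two means can be kept as exact fractions (nine values with a sum of 16 remain once the 9 is removed). The symbols \(\Delta_{\text{mean}}\) and \(\Delta_{\text{median}}\) are introduced here only for illustration:

\[\Delta_{\text{mean}} = \frac{25}{10} - \frac{16}{9} = \frac{225 - 160}{90} = \frac{65}{90} \approx 0.72, \qquad \Delta_{\text{median}} = 2 - 2 = 0\]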
04

Choosing better measure

Considering the outlier and its effect on the mean, it would be more appropriate to use the median as the measure of central tendency for this data. The median is less sensitive to extreme values (outliers), and hence provides a better summary measure for this data.

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Mean Calculation
Calculating the mean, or average, is one of the foundational tasks in data analysis.
It gives a central value for a data set. To find it, sum up all the numbers and then divide by how many numbers there are.
  • For our golf ranking data: \[\frac{2 + 1 + 1 + 2 + 9 + 1 + 1 + 2 + 2 + 4}{10} = \frac{25}{10} = 2.5\]
The mean tells us what each value would be if the total were shared equally among all observations. It's a useful measure but can be heavily influenced by outliers. If one value is much higher or lower than the others, it pulls the mean toward itself.
This is why understanding the mean's limitations is also important when analyzing data.
Median Calculation
The median is a useful way to find the "middle" of a data set.
Unlike the mean, it does not get skewed by extremely high or low values, making it a robust measure of central tendency.
To calculate it:
  • Sort the data set in order: \[ [1, 1, 1, 1, 2, 2, 2, 2, 4, 9] \]
  • Find the middle number(s).
  • With an even number of observations, take the average of the two middle values.
  • In our example: \[\frac{2+2}{2} = 2.0\]
The median is an excellent representation of a data set's central point, especially when the data contains outliers.
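The sorting and middle-value steps listed above can be written out by hand. This is an illustrative sketch; the function name `median_by_hand` is hypothetical, and in practice `statistics.median` from the standard library does the same job.

```python
def median_by_hand(values):
    """Median via the sort-then-pick-the-middle steps."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


counts = [2, 1, 1, 2, 9, 1, 1, 2, 2, 4]
print(median_by_hand(counts))
```

For the ten golf-ranking values the even-count branch runs and averages the 5th and 6th sorted values.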
Outlier Analysis
Outliers are values that differ significantly from other observations in a data set.
They can lead to misinterpretations if not handled properly.
In our data, the number 9 is much higher than the others, making it an outlier.
When analyzing data:
  • Check for any values that do not appear to fit the pattern of the rest.
  • Outliers can skew the mean.
  • They often point to special circumstances worth further investigation.
Removing outliers can sometimes provide a clearer view of the data, but understanding their origin is also crucial.
Central Tendency
Measures of central tendency are statistical tools that describe a central value for a data set.
They summarize data with a single number that represents the "center" of the data.
Common measures include mean, median, and mode.
  • The mean uses every data point, so a single extreme value can shift it noticeably.
  • The median is less sensitive to outliers, providing a more stable measure.
  • The mode (not covered here explicitly) shows the most frequently occurring value.
Choosing the right measure depends on the data's distribution and the presence of outliers.
For asymmetric data or data with outliers, the median is often preferred.
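To see the three measures side by side for the golf ranking data, here is a hedged sketch using the standard `statistics` module. The mode is not part of the original solution; `multimode` (Python 3.8+) is used because the data may have more than one most-frequent value, and the `summary` dictionary is just an illustrative container.

```python
from statistics import mean, median, multimode

counts = [2, 1, 1, 2, 9, 1, 1, 2, 2, 4]

summary = {
    "mean": mean(counts),
    "median": median(counts),
    # multimode returns every value tied for the highest frequency
    "modes": sorted(multimode(counts)),
}

print(summary)
```

In this data set the values 1 and 2 each occur four times, so there are two modes.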
Data Set Analysis
Analyzing a data set involves understanding its distribution and identifying key characteristics like central tendencies and outliers.
The overall goal is to derive insights from the data that inform decisions or contribute to a deeper understanding.
  • Is the data symmetric or skewed?
  • Which values are most common?
  • Are there outliers that need handling?
For our golf ranking data, we examined how the presence of an outlier affected the mean more than the median.
The process of data set analysis helps in choosing the appropriate statistical measures to accurately represent the data.

Most popular questions from this chapter

Nixon Corporation manufactures computer monitors. The following data give the numbers of computer monitors produced at the company for a sample of 30 days. \(\begin{array}{llllllllll}24 & 32 & 27 & 23 & 33 & 33 & 29 & 25 & 23 & 36 \\ 26 & 26 & 31 & 20 & 27 & 33 & 27 & 23 & 28 & 29 \\ 31 & 35 & 34 & 22 & 37 & 28 & 23 & 35 & 31 & 43\end{array}\) a. Calculate the values of the three quartiles and the interquartile range. Where does the value of 31 lie in relation to these quartiles? b. Find the (approximate) value of the 65th percentile. Give a brief interpretation of this percentile. c. For what percentage of the days was the number of computer monitors produced 32 or higher? Answer by finding the percentile rank of 32.

The following data are the ages (in years) of six students. \(\begin{array}{llllll}19 & 19 & 19 & 19 & 19 & 19\end{array}\) Calculate the standard deviation. Is its value zero? If yes, why?

According to Fair Isaac, "The Median FICO (Credit) Score in the U.S. is 723" (The Credit Scoring Site, 2009). Suppose the following data represent the credit scores of 22 randomly selected loan applicants. \(\begin{array}{lllllllllll}494 & 728 & 468 & 533 & 747 & 639 & 430 & 690 & 604 & 422 & 356 \\ 805 & 749 & 600 & 797 & 702 & 628 & 625 & 617 & 647 & 772 & 572\end{array}\) a. Calculate the values of the three quartiles and the interquartile range. Where does the value 617 fall in relation to these quartiles? b. Find the approximate value of the 30th percentile. Give a brief interpretation of this percentile. c. Calculate the percentile rank of 533. Give a brief interpretation of this percentile rank.

Suppose the average credit card debt for households currently is \(\$ 9500\) with a standard deviation of \(\$ 2600\). a. Using Chebyshev's theorem, find at least what percentage of current credit card debts for all households are between i. \(\$ 4300\) and \(\$ 14,700\) ii. \(\$ 3000\) and \(\$ 16,000\). b. Using Chebyshev's theorem, find the interval that contains credit card debts of at least \(89 \%\) of all households.

The 2009 gross sales of all companies in a large city have a mean of \(\$ 2.3\) million and a standard deviation of \(\$ .6\) million. Using Chebyshev's theorem, find at least what percentage of companies in this city had 2009 gross sales of a. \(\$ 1.1\) to \(\$ 3.5\) million b. \(\$ .8\) to \(\$ 3.8\) million c. \(\$ .5\) to \(\$ 4.1\) million
