Q25 E The accompanying data consists o... [FREE SOLUTION]


The accompanying data consists of prices (\$) for one sample of California cabernet sauvignon wines that received ratings of 93 or higher in the May 2013 issue of Wine Spectator and another sample of California cabernets that received ratings of 89 or lower in the same issue.

\(\begin{array}{*{20}{c}}{ \ge 93:}&{100}&{100}&{60}&{135}&{195}&{195}&{}\\{}&{125}&{135}&{95}&{42}&{75}&{72}&{}\\{ \le 89:}&{80}&{75}&{75}&{85}&{75}&{35}&{85}\\{}&{65}&{45}&{100}&{28}&{38}&{50}&{28}\end{array}\)

Assume that these are both random samples of prices from the population of all wines recently reviewed that received ratings of at least 93 and at most 89, respectively.

a. Investigate the plausibility of assuming that both sampled populations are normal.

b. Construct a comparative boxplot. What does it suggest about the difference in true average prices?

c. Calculate a confidence interval at the \(95\% \) confidence level to estimate the difference between \({\mu _1}\), the mean price in the higher rating population, and \({\mu _2}\), the mean price in the lower rating population. Is the interval consistent with the statement "Price rarely equates to quality" made by a columnist in the cited issue of the magazine?

Short Answer


(a) Plausible

(b) The boxplots suggest a large difference in true average prices.

(c) \((16.1180,81.9534)\)

The interval is not consistent with the statement.

Step by step solution

01

a) Step 1: Determine the normal probability plot

Given:

\(\begin{array}{l} \ge 93:100,100,60,135,195,195,125,135,95,42,75,72\\ \le 89:80,75,75,85,75,35,85,65,45,100,28,38,50,28\end{array}\)

If we want to use a two-sample \(t\) procedure (test or confidence interval), then we require that both sampling distributions of the sample mean are approximately normal.

FIRST DATA SET

We will create a normal probability plot.

The data values are on the horizontal axis and the standardized normal scores are on the vertical axis.

If the data contains \(n\) data values, then the standardized normal scores are the z-scores in the normal probability table of the appendix corresponding to an area of \(\frac{{j - 0.5}}{n}\) (or the closest area) with \(j \in \{ 1,2,3, \ldots ,n\} \).

The smallest standardized score corresponds with the smallest data value, the second smallest standardized score corresponds with the second smallest data value, and so on.
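The pairing described above can be sketched in Python. This is a minimal illustration, not the textbook's procedure: it replaces the table lookup with the exact inverse normal CDF from the standard library, and the names `normal_scores`, `high` and `pairs` are chosen here for illustration only.

```python
from statistics import NormalDist

def normal_scores(n):
    """z-scores whose cumulative area is (j - 0.5)/n for j = 1..n."""
    return [NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]

# Prices of the wines rated 93 or higher (from the problem statement).
high = [100, 100, 60, 135, 195, 195, 125, 135, 95, 42, 75, 72]

# Smallest score goes with smallest price, second smallest with second, etc.
pairs = list(zip(sorted(high), normal_scores(len(high))))
for price, z in pairs:
    print(f"{price:>4}  {z:+.3f}")
```

Plotting the first column against the second gives the normal probability plot. The same call with the 14 lower-rated prices gives the plot for the second data set.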

02

a) Step 2: Determine the normal probability plot for the second data set

SECOND DATA SET

We will create a normal probability plot.

The data values are on the horizontal axis and the standardized normal scores are on the vertical axis.

If the data contains \(n\) data values, then the standardized normal scores are the z-scores in the normal probability table of the appendix corresponding to an area of \(\frac{{j - 0.5}}{n}\) (or the closest area) with \(j \in \{ 1,2,3, \ldots ,n\} \).

The smallest standardized score corresponds with the smallest data value, the second smallest standardized score corresponds with the second smallest data value, and so on.

If the pattern in the normal probability plot is roughly linear and does not contain strong curvature, then the population distribution is approximately normal.

Neither probability plot shows strong curvature and both are roughly linear, so it is plausible that both population distributions are approximately normal.

Since the population distributions are approximately normal, the sampling distributions of the sample means \({\bar x_1}\) and \({\bar x_2}\) are also approximately normal, and thus it is appropriate to use two-sample \(t\) procedures.

03

b) Step 3: Find the quartiles for the first data set

Given:

\(\begin{array}{l} \ge 93:100,100,60,135,195,195,125,135,95,42,75,72\\ \le 89:80,75,75,85,75,35,85,65,45,100,28,38,50,28\end{array}\)

Sort the data values from smallest to largest:

\(\begin{array}{l} \ge 93:42,60,72,75,95,100,100,125,135,135,195,195\\ \le 89:28,28,35,38,45,50,65,75,75,75,80,85,85,100\end{array}\)

FIRST DATA SET

The minimum is \(42\).

Since the number of data values is even, the median is the average of the two middle values of the sorted data set:

\(M = {Q_2} = \frac{{100 + 100}}{2} = 100\)

The first quartile is the median of the data values below the median (or at \(25\% \) of the data):

\({Q_1} = \frac{{72 + 75}}{2} = 73.5\)

The third quartile is the median of the data values above the median (or at \(75\% \) of the data):

\({Q_3} = \frac{{135 + 135}}{2} = 135\)

The maximum is \(195\).

04

Find the quartiles for the second data set

SECOND DATA SET

The minimum is \(28\).

Since the number of data values is even, the median is the average of the two middle values of the sorted data set:

\(M = {Q_2} = \frac{{65 + 75}}{2} = 70\)

The first quartile is the median of the data values below the median (or at \(25\% \) of the data):

\({Q_1} = 38\)

The third quartile is the median of the data values above the median (or at \(75\% \) of the data):

\({Q_3} = 80\)

The maximum is \(100\).
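As a cross-check of the two five-number summaries, here is a minimal Python sketch of the "median of each half" rule used above. The helper name `five_number_summary` is an assumption made for this illustration. Statistical software often uses other quartile conventions, so its output can differ slightly.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); quartiles are medians of the halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]    # values below the median position
    upper = xs[-half:]   # values above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

high = [100, 100, 60, 135, 195, 195, 125, 135, 95, 42, 75, 72]    # ratings >= 93
low = [80, 75, 75, 85, 75, 35, 85, 65, 45, 100, 28, 38, 50, 28]   # ratings <= 89

summary_high = five_number_summary(high)
summary_low = five_number_summary(low)
print(summary_high)
print(summary_low)
```

The results agree with the values found by hand: minimum 42, quartiles 73.5, 100 and 135, maximum 195 for the first data set, and minimum 28, quartiles 38, 70 and 80, maximum 100 for the second.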

05

Mapping the graph

The whiskers of the boxplot are at the minimum and maximum value. The box starts at the first quartile, ends at the third quartile and has a vertical line at the median.

The first quartile is at \(25\% \) of the sorted data list, the median at \(50\% \) and the third quartile at \(75\% \).

There appears to be a large difference between the true average prices, because the vertical lines marking the medians in the two boxplots are not at roughly the same location on the horizontal axis (100 for the higher-rated sample versus 70 for the lower-rated sample).
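Drawing the whiskers at the minimum and maximum is only appropriate when no observation is flagged as an outlier. The sketch below checks this with the common \(1.5 \times IQR\) rule. That rule is an assumption added here, since the solution does not state one, and the function name `outliers` is illustrative.

```python
def outliers(data, q1, q3):
    """Values lying more than 1.5 * IQR outside the quartiles."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

high = [100, 100, 60, 135, 195, 195, 125, 135, 95, 42, 75, 72]    # ratings >= 93
low = [80, 75, 75, 85, 75, 35, 85, 65, 45, 100, 28, 38, 50, 28]   # ratings <= 89

# Quartiles taken from the hand calculation above.
outliers_high = outliers(high, q1=73.5, q3=135)
outliers_low = outliers(low, q1=38, q3=80)
print(outliers_high, outliers_low)
```

Both lists come back empty, so under this rule the whiskers do extend to the minimum and maximum of each sample.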

06

c) Step 6: Determine the standard deviation

Given:

\(\begin{array}{l} \ge 93:100,100,60,135,195,195,125,135,95,42,75,72\\ \le 89:80,75,75,85,75,35,85,65,45,100,28,38,50,28\end{array}\)

The mean is the sum of all values divided by the number of values:

\(\begin{array}{l}{{\bar x}_1} = \frac{{100 + 100 + 60 + \ldots + 42 + 75 + 72}}{{12}} = 110.75\\{{\bar x}_2} = \frac{{80 + 75 + 75 + \ldots + 38 + 50 + 28}}{{14}} \approx 61.7143\end{array}\)

The variance is the sum of squared deviations from the mean divided by \(n - 1\). The standard deviation is the square root of the variance:

\(\begin{array}{l}{s_1} = \sqrt {\frac{{{{(100 - 110.75)}^2} + \ldots + {{(72 - 110.75)}^2}}}{{12 - 1}}} \approx 48.7445\\{s_2} = \sqrt {\frac{{{{(80 - 61.7143)}^2} + \ldots + {{(28 - 61.7143)}^2}}}{{14 - 1}}} \approx 23.8438\end{array}\)
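The sample means and standard deviations can be verified with Python's standard library. This is a minimal sketch: `statistics.stdev` uses the \(n - 1\) divisor, matching the formula above, and the variable names are chosen for illustration.

```python
from statistics import mean, stdev

high = [100, 100, 60, 135, 195, 195, 125, 135, 95, 42, 75, 72]    # ratings >= 93
low = [80, 75, 75, 85, 75, 35, 85, 65, 45, 100, 28, 38, 50, 28]   # ratings <= 89

x1, x2 = mean(high), mean(low)      # sample means
s1, s2 = stdev(high), stdev(low)    # sample standard deviations (n - 1 divisor)

print(f"mean1 = {x1:.4f}, s1 = {s1:.4f}")
print(f"mean2 = {x2:.4f}, s2 = {s2:.4f}")
```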

07

Find the endpoint of the confidence interval

Given:

\(c = 95\% = 0.95\)

Determine the degrees of freedom (rounded down to the nearest integer):

\(df = \frac{{{{\left( {\frac{{s_1^2}}{{{n_1}}} + \frac{{s_2^2}}{{{n_2}}}} \right)}^2}}}{{\frac{{{{\left( {s_1^2/{n_1}} \right)}^2}}}{{{n_1} - 1}} + \frac{{{{\left( {s_2^2/{n_2}} \right)}^2}}}{{{n_2} - 1}}}} = \frac{{{{\left( {\frac{{{{48.7445}^2}}}{{12}} + \frac{{{{23.8438}^2}}}{{14}}} \right)}^2}}}{{\frac{{{{\left( {{{48.7445}^2}/12} \right)}^2}}}{{12 - 1}} + \frac{{{{\left( {{{23.8438}^2}/14} \right)}^2}}}{{14 - 1}}}} \approx 15.43\)

Rounded down to the nearest integer, this gives \(df = 15\).
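The degrees-of-freedom formula is easy to mistype on a calculator, so a short Python sketch may help as a check. The function name `welch_df` is an assumption made here, and the inputs are the rounded standard deviations from the previous step.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the two-sample t procedure."""
    v1 = s1 ** 2 / n1   # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2   # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

df_exact = welch_df(48.7445, 12, 23.8438, 14)
df = math.floor(df_exact)   # round down, as the solution does
print(round(df_exact, 2), df)
```

Rounding down rather than to the nearest integer is the conservative choice: a smaller degrees-of-freedom value gives a slightly larger critical value and therefore a slightly wider interval.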

Determine the t-value by looking in the row with degrees of freedom \(df = 15\) and in the column with \((1 - c)/2 = 0.025\) in the Student's t distribution table in the appendix:

\({t_{\alpha /2}} = 2.131\)

The margin of error is then:

\(E = {t_{\alpha /2}} \cdot \sqrt {\frac{{s_1^2}}{{{n_1}}} + \frac{{s_2^2}}{{{n_2}}}} = 2.131 \cdot \sqrt {\frac{{{{48.7445}^2}}}{{12}} + \frac{{{{23.8438}^2}}}{{14}}} \approx 32.9177\)

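The endpoints of the confidence interval for \({\mu _1} - {\mu _2}\) are:

\(\begin{array}{l}\left( {{{\bar x}_1} - {{\bar x}_2}} \right) - E = (110.75 - 61.7143) - 32.9177 = 49.0357 - 32.9177 = 16.1180\\\left( {{{\bar x}_1} - {{\bar x}_2}} \right) + E = (110.75 - 61.7143) + 32.9177 = 49.0357 + 32.9177 = 81.9534\end{array}\)

Since the entire interval lies above 0, the data suggest that the mean price of the higher-rated wines exceeds that of the lower-rated wines by between about \$16 and \$82. The interval is therefore not consistent with the statement "Price rarely equates to quality".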
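The margin of error and the interval endpoints can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the critical value 2.131 is taken from the table lookup above rather than computed, and the function name `two_sample_t_interval` is illustrative.

```python
import math

def two_sample_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """Confidence interval for mu1 - mu2 without assuming equal variances."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # standard error of the difference
    margin = t_crit * se                          # margin of error E
    diff = x1 - x2
    return diff - margin, diff + margin

# Summary statistics from the previous steps; t_crit is the table value for df = 15.
lower, upper = two_sample_t_interval(110.75, 48.7445, 12,
                                     61.7143, 23.8438, 14, t_crit=2.131)
print(f"({lower:.3f}, {upper:.3f})")
```

Both endpoints are positive, which matches the conclusion that the higher-rated wines have the larger mean price.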


Most popular questions from this chapter

Using the traditional formula, a \(95\% \) CI for \({p_1} - {p_2}\) is to be constructed based on equal sample sizes from the two populations. For what value of \(n( = m)\) will the resulting interval have a width of at most .1, irrespective of the results of the sampling?

Shoveling is not exactly a high-tech activity, but it will continue to be a required task even in our information age. The article "A Shovel with a Perforated Blade Reduces Energy Expenditure Required for Digging Wet Clay" (Human Factors, 2010: 492-502) reported on an experiment in which 13 workers were each provided with both a conventional shovel and a shovel whose blade was perforated with small holes. The authors of the cited article provided the following data on stable energy expenditure (kcal/kg(subject)/lb(clay)):

\(\begin{array}{*{20}{l}}{\text{Worker:}}&1&2&3&4&5&6&7\\{\text{Conventional:}}&{.0011}&{.0014}&{.0018}&{.0022}&{.0010}&{.0016}&{.0028}\\{\text{Perforated:}}&{.0011}&{.0010}&{.0019}&{.0013}&{.0011}&{.0017}&{.0024}\end{array}\)

\(\begin{array}{*{20}{l}}{\text{Worker:}}&8&9&{10}&{11}&{12}&{13}\\{\text{Conventional:}}&{.0020}&{.0015}&{.0014}&{.0023}&{.0017}&{.0020}\\{\text{Perforated:}}&{.0020}&{.0013}&{.0013}&{.0017}&{.0015}&{.0013}\end{array}\)

a. Calculate a confidence interval at the 95 % confidence level for the true average difference between energy expenditure for the conventional shovel and the perforated shovel (the relevant normal probability plot shows a reasonably linear pattern). Based on this interval, does it appear that the shovels differ with respect to true average energy expenditure? Explain.

b. Carry out a test of hypotheses at significance level .05 to see if true average energy expenditure using the conventional shovel exceeds that using the perforated shovel.

The article "Urban Battery Litter" cited in Example 8.14 gave the following summary data on zinc mass (g) for two different brands of size D batteries:

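\(\begin{array}{*{20}{l}}{\text{Brand}}&{\text{Sample Size}}&{\text{Sample Mean}}&{\text{Sample SD}}\\{\text{Duracell}}&{15}&{138.52}&{7.76}\\{\text{Energizer}}&{20}&{149.07}&{1.52}\end{array}\)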

Assuming that both zinc mass distributions are at least approximately normal, carry out a test at significance level .05 to decide whether true average zinc mass is different for the two types of batteries.

Cushing's disease is characterized by muscular weakness due to adrenal or pituitary dysfunction. To provide effective treatment, it is important to detect childhood Cushing's disease as early as possible. Age at onset of symptoms and age at diagnosis (months) for 15 children suffering from the disease were given in the article "Treatment of Cushing's Disease in Childhood and Adolescence by Transphenoidal Microadenomectomy" (New Engl. J. of Med., 1984: 889). Here are the values of the differences between age at onset of symptoms and age at diagnosis:

\(\begin{array}{*{20}{l}}{ - 24}&{ - 12}&{ - 55}&{ - 15}&{ - 30}&{ - 60}&{ - 14}&{ - 21}\\{ - 48}&{ - 12}&{ - 25}&{ - 53}&{ - 61}&{ - 69}&{ - 80}&{}\end{array}\)

a. Does the accompanying normal probability plot cast strong doubt on the approximate normality of the population distribution of differences?

b. Calculate a lower \(95\% \) confidence bound for the population mean difference, and interpret the resulting bound.

c. Suppose the (age at diagnosis) - (age at onset) differences had been calculated. What would be a \(95\% \) upper confidence bound for the corresponding population mean difference?

The invasive diatom species Didymosphenia geminata has the potential to inflict substantial ecological and economic damage in rivers. The article "Substrate Characteristics Affect Colonization by the Bloom-Forming Diatom Didymosphenia geminata" (Aquatic Ecology, 2010: 33-40) described an investigation of colonization behavior. One aspect of particular interest was whether the roughness of stones impacted the degree of colonization. The authors of the cited article kindly provided the accompanying data on roughness ratio (dimensionless) for specimens of sandstone and shale.

\(\begin{array}{*{20}{l}}{ Sandstone: }&{5.74}&{2.07}&{3.29}&{0.75}&{1.23}\\{}&{2.95}&{1.58}&{1.83}&{1.61}&{1.12}\\{}&{2.91}&{3.22}&{2.84}&{1.97}&{2.48}\\{}&{3.45}&{2.17}&{0.77}&{1.44}&{3.79}\end{array}\)

\(\begin{array}{*{20}{l}}{ Shale: }&{.56}&{.84}&{.40}&{.55}&{.36}&{.72}\\{}&{.29}&{.47}&{.66}&{.48}&{.28}&{}\\{}&{.72}&{.31}&{.35}&{.32}&{.37}&{.43}\\{}&{.60}&{.54}&{.43}&{.51}&{}&{}\end{array}\)

Normal probability plots of both samples show a reasonably linear pattern. Estimate the difference between true average roughness for sandstone and that for shale in a way that provides information about reliability and precision, and interpret your estimate. Does it appear that true average roughness differs for the two types of rocks (a formal test of this was reported in the article)? (Note: The investigators concluded that more diatoms colonized the rougher surface than the smoother surface.)
