Problem 24

A randomly chosen group of 20,000 nonsmokers and one of 10,000 smokers were followed over a 10-year period. The data in Table \(5.34\) give the number in each group that developed lung cancer during the period. Test the hypothesis that smoking and lung cancer are independent. Use the \(1\%\) level of significance.
$$ \begin{array}{l} \text{Table 5.34 Data for Problem 5.24} \\ \begin{array}{l|r|r} \hline & \text{Smokers} & \text{Nonsmokers} \\ \hline \text{Lung cancer} & 62 & 14 \\ \hline \text{No lung cancer} & 9938 & 19986 \\ \hline \end{array} \end{array} $$

Short Answer

Based on a chi-square test of independence at the 1% level of significance, we reject the hypothesis that smoking and lung cancer are independent.

Step by step solution

01

Obtain Observed Frequencies

The observed frequencies from Table \(5.34\), indexed here with the groups as rows and the outcomes as columns, are: \(O_{11} = 62\) (smokers with lung cancer), \(O_{12} = 9938\) (smokers without lung cancer), \(O_{21} = 14\) (nonsmokers with lung cancer) and \(O_{22} = 19986\) (nonsmokers without lung cancer).
02

Calculate Row and Column Totals

Calculate the row and column totals: \(R_1 = O_{11} + O_{12} = 10000\) (total smokers), \(R_2 = O_{21} + O_{22} = 20000\) (total nonsmokers), \(C_1 = O_{11} + O_{21} = 76\) (total with lung cancer), and \(C_2 = O_{12} + O_{22} = 29924\) (total without lung cancer). Also calculate the grand total \(N = R_1 + R_2 = 30000\).
03

Calculate Expected Frequencies

Given that the row and column totals are fixed, the expected frequencies under the null hypothesis (independence) are given by \(E_{ij} = (R_i \times C_j) / N\). Hence, \(E_{11} = (R_1 \times C_1)/N = 25.33\), \(E_{12} = (R_1 \times C_2)/N = 9974.67\), \(E_{21} = (R_2 \times C_1)/N = 50.67\) and \(E_{22} = (R_2 \times C_2)/N = 19949.33\).
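The totals and expected frequencies above can be reproduced with a short script. This is a minimal sketch in plain Python; the function name `expected_counts` and the list-of-lists layout (groups as rows, outcomes as columns) are illustrative choices, not part of the original solution.

```python
# Observed counts: rows = smokers, nonsmokers; columns = lung cancer, no lung cancer.
observed = [[62, 9938], [14, 19986]]

def expected_counts(table):
    """Return E_ij = R_i * C_j / N for a two-way table of counts."""
    row_totals = [sum(row) for row in table]        # R_1, R_2
    col_totals = [sum(col) for col in zip(*table)]  # C_1, C_2
    n = sum(row_totals)                             # grand total N
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
# [25.33, 9974.67]
# [50.67, 19949.33]
```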
04

Calculate Chi-Square Statistic

The chi-square statistic is given by \(X^2 = \sum (O_{ij} - E_{ij})^2 / E_{ij}\). Hence, \(X^2 = (62-25.33)^2/25.33 + (9938-9974.67)^2/9974.67 + (14-50.67)^2/50.67 + (19986-19949.33)^2/19949.33 \approx 53.07 + 0.13 + 26.54 + 0.07 = 79.81\).
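As a check on the arithmetic, the sum can be evaluated directly. The sketch below is plain Python with an illustrative helper name, `chi_square_statistic`; it recomputes the expected counts from the observed table so that it runs on its own.

```python
# Observed counts: rows = smokers, nonsmokers; columns = lung cancer, no lung cancer.
observed = [[62, 9938], [14, 19986]]

def chi_square_statistic(table):
    """Pearson X^2 = sum over cells of (O - E)^2 / E, with E = R_i * C_j / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count for cell (i, j)
            total += (o - e) ** 2 / e
    return total

x2 = chi_square_statistic(observed)
print(round(x2, 2))  # 79.81
```

Evaluating the four terms this way gives a statistic of about 79.81, most of which comes from the two lung cancer cells.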
05

Find Chi-Square Critical Value and Decision

The critical value of \(X^2\) at the \(1\%\) level of significance with \(1\) degree of freedom is \(6.635\). Since the calculated statistic (\(79.81\)) is greater than the critical value, we reject the null hypothesis. Therefore, there is enough evidence to conclude that smoking and lung cancer are not independent.
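The decision step can also be sketched in code. The example below uses only Python's standard library and assumes the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so its upper-tail probability is `erfc(sqrt(x / 2))`. The helper name `chi2_upper_tail_df1` is hypothetical, and the statistic is computed with the standard shortcut formula for a 2x2 table rather than cell by cell.

```python
import math

def chi2_upper_tail_df1(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    # A chi-square(1) variable is Z**2 for standard normal Z,
    # so P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

alpha = 0.01
critical_value = 6.635  # tabulated value for alpha = 0.01, 1 degree of freedom

# 2x2 shortcut: X^2 = N (ad - bc)^2 / (R1 * R2 * C1 * C2)
a, b, c, d = 62, 9938, 14, 19986
n = a + b + c + d
statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

p_value = chi2_upper_tail_df1(statistic)
reject = statistic > critical_value
print(f"X^2 = {statistic:.2f}, p-value = {p_value:.3g}, reject H0: {reject}")
```

Comparing the statistic with the critical value and comparing the p-value with \(\alpha\) are equivalent decision rules; here both lead to rejecting the null hypothesis.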


Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Observed Frequencies
In a Chi-Square Test of Independence, observed frequencies are the actual counts collected from your data. They represent what you find in reality. For the lung cancer study in smokers and nonsmokers:
  • \(O_{11} = 62\), which stands for 62 smokers who developed lung cancer.
  • \(O_{12} = 9938\), representing smokers without lung cancer.
  • \(O_{21} = 14\), illustrating nonsmokers with lung cancer.
  • \(O_{22} = 19986\), indicating nonsmokers without lung cancer.
This step is essentially the foundation of your statistical analysis.
Observing these frequencies helps in comparing what actually happened against what was expected under an assumption of independence.
Expected Frequencies
Expected frequencies are the counts you would expect to see if the null hypothesis, which states that there is no relationship between the variables, were true. To find them, follow these steps:
  • First, compute the row totals: total smokers (\(R_1 = 10000\)) and total nonsmokers (\(R_2 = 20000\)).
  • Next, compute the column totals across both groups: 76 persons with lung cancer (\(C_1\)) and 29924 without lung cancer (\(C_2\)).
  • Now, apply the formula \(E_{ij} = (R_i \times C_j) / N\), where \(N\) is the grand total of the study subjects, which is 30000 in this case.
This results in expected values such as \(E_{11} = 25.33\) for smokers with lung cancer, the count you would expect if smoking did not affect lung cancer development.
Chi-Square Statistic
The Chi-Square statistic helps determine if what you observe in your sample is significantly different from what you’d expect. You calculate it as:
  • Take each observed frequency and subtract the corresponding expected frequency.
  • Square the result so that positive and negative deviations do not cancel.
  • Divide by the expected frequency to gauge the relative difference.
  • Sum up all these values across categories.
For the lung cancer data, these steps give a chi-square statistic of \(X^2 = 79.81\). The larger the statistic, the stronger the evidence that the variables (here, smoking and lung cancer) are not independent.
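For a 2x2 table, the same statistic can be written in a closed form that avoids computing the expected frequencies one by one. This shortcut is a standard algebraic identity rather than a step of the solution above; it uses the same observed counts and totals:

$$ X^2 = \frac{N\,(O_{11} O_{22} - O_{12} O_{21})^2}{R_1 R_2 C_1 C_2} = \frac{30000\,(62 \times 19986 - 9938 \times 14)^2}{10000 \times 20000 \times 76 \times 29924} = \frac{30000 \times (1100000)^2}{10000 \times 20000 \times 76 \times 29924} \approx 79.81 $$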
Level of Significance
The level of significance (\(\alpha\)) is the probability of rejecting the null hypothesis when it is actually true. It sets the cut-off point for deciding whether to reject the null hypothesis.
  • Common choices are 0.05 and 0.01. Here, it's 0.01, meaning that if smoking and lung cancer truly were independent, there would be only a 1% chance of rejecting the null hypothesis because of random sampling variation.
  • To conclude your test, compare the Chi-Square statistic against a critical value derived from Chi-Square distribution tables, specific to your level of significance and degrees of freedom.
  • In this scenario, the critical value at a 1% significance level with one degree of freedom is 6.635.
Since the calculated chi-square statistic of 79.81 exceeds the critical value of 6.635, we reject the null hypothesis. This is strong evidence that smoking is related to lung cancer occurrence, beyond what random chance alone would explain.


Most popular questions from this chapter

In a certain chemical process, it is very important that a particular solution that is to be used as a reactant has a pH of exactly \(8.20\). A method for determining pH that is available for solutions of this type is known to give measurements that are normally distributed with a mean equal to the actual pH and with a standard deviation of \(0.02\). Suppose ten independent measurements yielded the following pH values: \(8.18, 8.17, 8.16, 8.15, 8.17, 8.21, 8.22, 8.16, 8.19, 8.18\). 1. What conclusion can be drawn at the \(\alpha=0.10\) level of significance? 2. What about at the \(\alpha=0.05\) level of significance?

There has been a great deal of controversy in recent years over the possible dangers of living near a high-level electromagnetic field (EMF). After hearing many anecdotal tales of large increases in cancer among children living near EMF, one researcher decided to study the possible dangers. To do his study, he took the following steps: (a) studied maps to find the locations of electric power lines, (b) used these maps to select a fairly large community that was located in a high-level EMF area. He interviewed people in the local schools, hospitals, and public health facilities in order to discover the number of children who had been affected by any type of cancer in the previous 3 years. He found that there had been 32 such cases. According to a government public health committee, the average number of cases of childhood cancer over a 3-year period in such a community was \(16.2\), with a standard deviation of \(4.7\). Is the discovery of 32 cases of childhood cancer significantly large enough, in comparison with the average number of \(16.2\), for the researcher to conclude that there is some special factor in the community being studied that increases the chance for children to contract cancer? Or is it possible that there is nothing special about the community and that the greater number of cancers is solely due to chance?

Let \(X_{1}, X_{2}, \ldots, X_{n}\) be a random sample from a distribution with the following PDF $$ f(x ; \theta)=\left\{\begin{array}{ll} \theta x^{\theta-1}, & 0<x<1 \\ 0, & \text{otherwise} \end{array}\right. $$ where \(\theta>0\). Find a sufficient statistic for \(\theta\) and show that a uniformly most powerful test of \(H_{0}: \theta=6\) against \(H_{1}: \theta<6\) is based on this statistic.

In a study of the effect of two treatments on the survival of patients with a certain disease, each of the 156 patients was equally likely to be given either one of the two treatments. The result of the above was that 39 of the 72 patients given the first treatment survived and 44 of the 84 patients given the second treatment survived. Test the null hypothesis that the two treatments are equally effective at \(\alpha=0.05\) level of significance.

Random samples of the yields from the usage of two different brands of fertilizers produced the following results: \(n_{1}=10, \bar{X}=90.13, s_{1}^{2}=4.02\); \(n_{2}=10, \bar{Y}=92.70, s_{2}^{2}=3.98\). Test at the \(1\%\) and \(5\%\) levels of significance whether the difference between the two sample means is significant.
