Chapter 10 Inference Procedures and Type I, Type II Error Classroom Ex.
Name:______________________________________________ Period:___________________
1. **This Wine Stinks** Sulfur compounds cause “off-odors” in wine, so winemakers want to know the odor threshold, the lowest concentration of a compound that the human nose can detect. The odor threshold for dimethyl sulfide (DMS) in trained wine tasters is about 25 micrograms per liter of wine (µg/l). The untrained noses of consumers may be less sensitive, however. Here are the DMS odor thresholds for 10 untrained students.
31 31 43 36 23 34 32 30 20 24
Assume that the standard deviation of the odor threshold for untrained noses is known to be σ = 7 µg/l.
a) Make a stemplot to verify that the distribution of the sample is roughly symmetric with no outliers. This will give you some faith that it was drawn from a reasonably normal population. (More data would confirm that there are no meaningful departures from normality).
A stemplot would reveal no strong non-normality in the sample, which suggests the population (and thus the sampling distribution) is also plausibly normal. The point is to satisfy the conditions for the procedure. You could also make a normal probability plot (NPP) or a histogram of the sample. Both would show no strong departure from normality to be cautious about.
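For anyone who wants to check the stemplot without drawing it by hand, here is a minimal sketch in Python. The function name `stemplot` and its layout (tens digit as stem, ones digit as leaf) are choices made for this illustration, not part of the worksheet.

```python
from collections import defaultdict

# DMS odor thresholds (µg/l) for the 10 untrained students
thresholds = [31, 31, 43, 36, 23, 34, 32, 30, 20, 24]

def stemplot(values):
    """Return the lines of a simple stemplot: tens digit | ones digits."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Walk every stem from smallest to largest so gaps would still show
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

for line in stemplot(thresholds):
    print(line)
# 2 | 034
# 3 | 011246
# 4 | 3
```

With only three stems the picture is coarse, but it shows a single peak in the 30s and no values far from the rest.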
b) Give a 95% confidence interval for the mean DMS odor threshold among all students. (Toolbox!!)
Step 1: Population: Untrained wine tasting students
Parameter: Mean DMS threshold, µ
Step 2: Procedure: 1-Sample Confidence Interval, known σ
Conditions: SRS? We must assume these 10 students are a reasonably random sample drawn from all wine tasting students
Step 3: x-bar ± z*(σ/√n)
30.4 ± 1.96 (7/√10)
(26.06, 34.74)
Step 4: We are 95% confident, based on this sample, that the mean DMS threshold for wine students is between 26.06 and 34.74 µg/l.
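The interval arithmetic in Step 3 can be double-checked with a short Python sketch. It uses the standard library's `statistics.NormalDist` to get z* instead of a table; the helper name `z_interval` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist, mean

thresholds = [31, 31, 43, 36, 23, 34, 32, 30, 20, 24]
sigma = 7  # population standard deviation, given as known in the problem

def z_interval(data, sigma, confidence=0.95):
    """One-sample z interval for a mean with known sigma."""
    n = len(data)
    x_bar = mean(data)
    # z* leaves (1 - confidence)/2 in each tail of the standard normal curve
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = z_interval(thresholds, sigma)
print(f"({low:.2f}, {high:.2f})")  # (26.06, 34.74)
```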
c) Are you convinced that the mean odor threshold for students is higher than the published threshold, 25 µg/l? Carry out a significance test to justify your answer. (Another toolbox. If this toolbox has elements in common with part b), you may say, “See above.”)
Step 1: Population: See above
Parameter: See above
Ho: The mean DMS level is as quoted, 25 µg/l (µ = 25)
Ha: The mean DMS level is higher than 25 µg/l (µ > 25)
Step 2: Procedure: 1-Sample z-test for means
Conditions: See above
Step 3: z = (x-bar - µ₀)/(σ/√n) (INCLUDE DRAWING OF A STANDARD NORMAL CURVE)
= (30.4 – 25)/(7/√10)
= 2.44
P(Z ≥ 2.44) = .00735. Because the P-value is so small, reject Ho.
Step 4: Evidence from this sample suggests that the mean DMS threshold among wine students is statistically significantly greater than 25 µg/l.
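As a check on the test statistic and P-value, here is a small Python sketch of the one-sided z-test. The function name `z_test_greater` is hypothetical; the numbers plugged in are the ones from this problem.

```python
from math import sqrt
from statistics import NormalDist

def z_test_greater(x_bar, mu_0, sigma, n):
    """One-sample z-test of Ho: mu = mu_0 against Ha: mu > mu_0 (sigma known)."""
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    # Upper-tail area under the standard normal curve beyond z
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

z, p_value = z_test_greater(x_bar=30.4, mu_0=25, sigma=7, n=10)
print(f"z = {z:.2f}, P-value = {p_value:.5f}")
```

The printed values should agree with the hand calculation above up to rounding.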
2. **Explaining Statistical Significance** When asked to explain the meaning of “the P-value was P = .03,” a student says, “This means there is only a probability of .03 that the null hypothesis is true.”
a) Is this an essentially correct explanation? Explain your answer.
Yikes! This is horrible. By definition, a P-value is the probability of finding a statistic as extreme or more extreme than what you found, __assuming__ the null hypothesis were true and sampling were random. Because the calculation starts by assuming the null is true, it cannot tell us the probability that the null is true, so making any claim about its likelihood is inappropriate.
b) Another student, when asked why statistical significance appears so often in research reports, says, “Because saying that results are significant tells us that they cannot easily be explained by chance alone.” Do you think that this statement is essentially correct?
This actually isn’t bad. It appropriately recognizes the role chance plays in assessing results, and speaks to the definition of “statistically significant,” which is what we call a result so extreme that it would rarely occur if chance alone were at work.
c) What might you add to the statement in Part b) to make it even better?
I’d add, “…assuming the null hypothesis were true to begin with and that poor randomization was not the cause of the finding.”
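The definition of a P-value used in this problem can be illustrated by simulation. The sketch below borrows the numbers from Problem 1 (µ = 25 under Ho, σ = 7, n = 10, observed x-bar = 30.4) and, as a simplifying assumption, draws each sample mean directly from its normal sampling distribution instead of drawing 10 individual thresholds. The seed and the number of repetitions are arbitrary choices.

```python
import random
from math import sqrt

random.seed(1)  # arbitrary seed so the run is repeatable

mu_0, sigma, n = 25, 7, 10   # the null hypothesis world from Problem 1
observed_mean = 30.4         # what the 10 students actually gave us
reps = 100_000

# Standard deviation of x-bar when Ho is true
se = sigma / sqrt(n)

# Count how often chance alone produces a sample mean at least this large
as_extreme = sum(random.gauss(mu_0, se) >= observed_mean for _ in range(reps))
simulated_p = as_extreme / reps
print(f"Simulated P-value: {simulated_p:.4f}")
```

Every simulated sample comes from a world where Ho is true, so the resulting proportion describes how rare the data are under Ho. It says nothing about how likely Ho itself is, which is exactly the first student's mistake.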
3. **Opening a Restaurant** You are thinking about opening a restaurant and are searching for a good location. From research you have done, you know that the mean income of those living near the restaurant must be over $45,000 to support the type of upscale restaurant you wish to open. You decide to take a simple random sample of 50 people living near one potential location. Based on the mean income of this sample, you will decide whether to open a restaurant there. A number of similar studies have shown that σ = $5000.
a) Describe the two types of errors that you might make. Identify which is a Type 1 error and which is a Type 2 error.
A Type 1 error would be opening a restaurant where the mean income is actually too low, only to watch it fail.
A Type 2 error would be not opening the restaurant, then later finding out that the location would have been very successful, maybe because another restaurant opened up instead and is doing very well.
b) Which of the two types of error is more serious? Explain!
A Type 1 error costs you money in this setting. A Type 2 error may be disappointing in terms of lost opportunity, but in real dollars it hasn’t cost anything. That makes the Type 1 error the more serious of the two.
c) State the null and alternate hypotheses.
This question really should come before part a). The null would be, “A restaurant put here will not succeed” (mean income is $45,000, µ = 45000). The alternate is that it will succeed (mean income is greater than $45,000, µ > 45000).
d) If you had to choose one of the “standard” significance levels for your significance test, would you choose α = .01, .05, or .10? Justify your choice.
Tough question! Because of the money involved (and the risk of losing it), you could argue for a .01 level. On the other hand, you may not mind a little risk because of the potential upside to the investment. That would argue for a .10, maybe. This is why the idea of a significance level (an α level) may not be that useful. Look at the P-value and make your decision based on how you feel about the P-value in the context of the research.
e) Based on your choice in part d), how high will the sample mean need to be before you decide to open a restaurant in that area? (Hint: This is finding x-bar*, the critical value for the test: the line in the sand beyond which you would consider a sample average income to be sufficiently high to justify opening the restaurant.)
Assume α = .05. x-bar* = invnormal(.95, 45000, 5000/√50) = 46163.1
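The calculator step for x-bar* can be reproduced in Python. This is a sketch using `statistics.NormalDist.inv_cdf` in place of the calculator's inverse normal function; the helper name `critical_mean` is invented for the example.

```python
from math import sqrt
from statistics import NormalDist

def critical_mean(mu_0, sigma, n, alpha):
    """Smallest sample mean that rejects Ho: mu = mu_0 in favor of mu > mu_0."""
    # Sampling distribution of x-bar when Ho is true
    sampling_dist = NormalDist(mu_0, sigma / sqrt(n))
    # Cut-off with area alpha to its right
    return sampling_dist.inv_cdf(1 - alpha)

x_bar_star = critical_mean(mu_0=45000, sigma=5000, n=50, alpha=0.05)
print(f"x-bar* at alpha = .05: {x_bar_star:.1f}")  # 46163.1

# The other "standard" levels move the line in the sand up or down
for alpha in (0.01, 0.10):
    print(f"x-bar* at alpha = {alpha}: {critical_mean(45000, 5000, 50, alpha):.1f}")
```

A smaller α pushes x-bar* higher, which is the cautious investor's choice; a larger α lowers it.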
f) You take a sample of 50 households and find the mean to be $46200. Should you open the restaurant? Use all of your stats superpowers to justify your decision, such as by running a hypothesis test.
Step 1: Population: Households in the area around the restaurant site.
Parameter: Mean household income, µ
Ho: Mean household income is $45,000 (µ = 45000)
Ha: Mean household income is greater than $45,000 (µ > 45000)
Step 2: Procedure: One-sample z-test for means, σ known
Conditions: SRS? We must assume these 50 households are an SRS drawn from all households in this area.
Normality of the sampling distribution: With a sample this large, the x-bar distribution is safely approximately normal.
Step 3: (Drawing, P-statement, etc.) Because the sample mean (46200) is greater than the calculated x-bar* (46163), the P-value must be less than .05. Reject Ho.
Step 4: This sample provides significant evidence that the mean household income in this neighborhood is greater than $45000. It appears safe to open the restaurant.
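For the restaurant decision, the P-value itself can be computed instead of only being bounded by .05. The sketch below assumes the same setup as the toolbox (Ho: µ = 45000, σ = 5000, n = 50, observed mean 46200); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n = 45000, 5000, 50
x_bar = 46200   # observed sample mean income
alpha = 0.05    # significance level assumed in part e)

# Test statistic: how many standard errors the sample mean sits above mu_0
z = (x_bar - mu_0) / (sigma / sqrt(n))

# One-sided P-value: area to the right of z under the standard normal curve
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("Reject Ho" if p_value < alpha else "Fail to reject Ho")
```

The P-value lands just under .05, so the decision to reject Ho holds at α = .05 but would not hold at α = .01. That is worth mentioning when you justify the decision.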
Add sheets as necessary to produce a professional product. Word processing is authorized but not required. Make it good. Classroom presenters will be chosen by random selection on my return.
Thanks for being my favorite course!
JFM