- IQR = Q3 − Q1
- Outlier if: x < Q1 − 1.5·IQR OR x > Q3 + 1.5·IQR
- Mean is NOT resistant to outliers; Median IS resistant
(a) Calculate the IQR.
(b) Determine whether the maximum value of 67 is an outlier. Show your work using the 1.5 × IQR rule.
- SOCS: Shape, Outliers, Center, Spread — always in context
- Right-skewed → mean pulled right → mean > median
- Left-skewed → mean pulled left → mean < median
1h: ● ●
2h: ● ● ● ●
3h: ● ● ● ● ● ●
4h: ● ● ● ● ●
5h: ● ● ●
9h: ●
Describe the distribution of study hours completely. Be sure to address shape, center, spread, and outliers in context.
- z = (x − μ) / σ
- 68% within ±1σ, 95% within ±2σ, 99.7% within ±3σ
- P(X < x) = normalcdf(−∞, z) on calculator
(a) What is the z-score of a student who scored 4.4?
(b) What proportion of students scored between 2.4 and 4.0? Use the 68-95-99.7 rule and show your reasoning.
- ŷ = a + bx (ŷ = predicted value)
- b = r · (Sy / Sx)
- a = ȳ − b · x̄
- LSRL passes through (x̄, ȳ)
| Statistic | Sleep (hrs) | Reaction Time (ms) |
|---|---|---|
| Mean (x̄, ȳ) | 7.2 | 312 |
| Std Dev (Sx, Sy) | 1.1 | 45 |
| Correlation r | −0.78 | |
(a) Find the equation of the LSRL for predicting reaction time from hours of sleep.
(b) Predict the reaction time for a student who sleeps 6 hours. Interpret this value in context.
(c) Interpret the slope in context.
- residual = y − ŷ (observed minus predicted)
- r² = proportion of variation in y explained by x
- Good residual plot: random scatter, no pattern, no fan shape
(a) Calculate the residual for this student. Show your work.
(b) Interpret the residual in context.
(c) The r² value for this regression is 0.608. Interpret this value in context.
- Experiment → can conclude causation
- Observational study → association only (confounding possible)
- Confounding variable: related to both explanatory and response variables
- Double-blind: neither subjects nor evaluators know treatment
(a) Identify the explanatory variable and the response variable.
(b) Is this an experiment or an observational study? Justify your answer.
(c) Explain why the random assignment is important in this study.
- SRS: every subset of size n equally likely
- Voluntary response bias: overrepresents strong opinions
- Convenience sample: often biased, not representative
- Stratified: sample separately within homogeneous strata; reduces the variability of the estimate
(a) Identify the sampling method used.
(b) Describe one potential source of bias in this sample and explain how it might affect the results.
(c) Propose a better sampling method and explain why it would reduce bias.
- P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
- P(A|B) = P(A ∩ B) / P(B)
- Independent: P(A ∩ B) = P(A) · P(B)
- Mutually exclusive: P(A ∩ B) = 0
(a) What is the probability that a randomly selected student is enrolled in Math or English?
(b) What is the probability that a student is enrolled in English, given that they are enrolled in Math?
(c) Are enrollment in Math and English independent? Justify your answer numerically.
- μX = Σ[x · P(x)]
- σ²X = Σ[(x − μ)² · P(x)]
- If X and Y are independent: Var(X ± Y) = Var(X) + Var(Y) (variances add for both sums and differences)
- μ(aX + b) = a·μX + b; σ(aX + b) = |a|·σX
| Payoff X ($) | −5 | 0 | 10 | 20 |
|---|---|---|---|---|
| P(X = x) | 0.30 | 0.25 | 0.35 | 0.10 |
(a) Verify that this is a valid probability distribution.
(b) Calculate the expected value E(X). Interpret this value in context.
(c) Would you recommend playing this game? Justify your answer using your calculation.
- BINS: Binary, Independent, Number fixed, Same p
- P(X=k) = C(n,k) · p^k · (1−p)^(n−k)
- μ = np; σ = √(np(1−p))
- 10% condition: n ≤ 0.10·N for independence
(a) Verify that this situation satisfies the conditions for a binomial distribution (BINS).
(b) What is the probability that the student gets exactly 3 questions correct? Show your calculation.
(c) What is the expected number of correct answers, and what is the standard deviation?
- μ(x̄) = μ (sampling dist. centered at population mean)
- σ(x̄) = σ / √n (standard error of the mean)
- CLT: n ≥ 30 → sampling dist. approximately normal
- Larger n → smaller SE → less variability
(a) Describe the shape, mean, and standard error of the sampling distribution of x̄. Justify the shape.
(b) What is the probability that the sample mean weight exceeds 190 grams? Show your z-score calculation.
- μ(p̂) = p
- σ(p̂) = √(p(1−p)/n)
- Normality: np ≥ 10 AND n(1−p) ≥ 10 (Large Counts)
- Independence: n ≤ 0.10·N (10% condition)
(a) Verify the conditions for the sampling distribution of p̂ to be approximately normal. Show all checks.
(b) Describe the sampling distribution of p̂ (mean and standard deviation).
(c) What is the probability that the sample proportion is less than 0.60? Show your z-score calculation.
- CI: p̂ ± z* · √(p̂(1−p̂)/n)
- z*: 90% = 1.645, 95% = 1.960, 99% = 2.576
- Conditions: Random, 10%, Large Counts (np̂ ≥ 10, n(1−p̂) ≥ 10)
- Wider CI: higher confidence OR smaller n
(a) Check the conditions required to construct a confidence interval for the proportion.
(b) Construct a 95% confidence interval for the true proportion of voters who support the ordinance. Show all work.
(c) Interpret the interval in context. Does it suggest that a majority supports the ordinance? Explain.
- H₀: p = p₀ (null uses p₀, not p̂)
- z = (p̂ − p₀) / √(p₀(1−p₀)/n)
- P-value: P(data this extreme or more extreme | H₀ true)
- Small P-value → strong evidence against H₀
(a) State the null and alternative hypotheses.
(b) Check the conditions for the z-test.
(c) Calculate the test statistic and P-value (use z-table: P(Z < −1.12) ≈ 0.131).
(d) At α = 0.05, state your conclusion in context.
- CI: x̄ ± t* · (s/√n)
- df = n − 1 (degrees of freedom)
- SE(x̄) = s / √n
- Use t when σ unknown; use z when σ known
(a) Check the conditions for a t-interval.
(b) Construct a 90% confidence interval for the true mean sugar content. Use t* = 1.729 for df = 19.
(c) Interpret the interval in context.
- t = (x̄ − μ₀) / (s/√n), df = n−1
- Paired data: d̄ = mean of differences, sd = std dev of differences
- Two-sided Hₐ: μ ≠ μ₀ → P-value = 2·P(T > |t|)
| Client | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Before | 78 | 82 | 75 | 90 | 85 | 79 | 88 | 72 |
| After | 72 | 78 | 74 | 84 | 80 | 76 | 81 | 70 |
(a) Why is a paired t-test appropriate here?
(b) Calculate the differences (Before − After) and find d̄ and sd. (You may verify: d̄ = 4.25, sd ≈ 2.121)
(c) State H₀ and Hₐ, calculate the test statistic t, and use the fact that P-value ≈ 0.0004 to make a conclusion at α = 0.05.
- χ² = Σ[(O − E)² / E]
- df = k − 1 (k = number of categories)
- E = n · p (expected = total × claimed proportion)
- Condition: all expected counts ≥ 5
| Phenotype | A | B | C | D |
|---|---|---|---|---|
| Observed (O) | 92 | 28 | 25 | 15 |
| Expected (E) | 90 | 30 | 30 | 10 |
(a) State H₀ and Hₐ for this test.
(b) Verify the expected counts condition. Calculate the chi-square test statistic. Show your work for each category.
(c) With df = 3, the critical value at α = 0.05 is 7.815. State your conclusion.
- H₀: β = 0 (no linear relationship)
- t = b / SEb, df = n − 2
- CI: b ± t* · SEb
- Conditions: LINEAR, INDEPENDENT, NORMAL residuals, EQUAL variance, RANDOM (LINER)
| Term | Coef | SE Coef | t | P |
|---|---|---|---|---|
| Constant | 45.2 | 8.3 | 5.45 | <0.001 |
| Adspend | 3.84 | 1.12 | ? | 0.003 |
(a) Write the equation of the LSRL and interpret the slope in context.
(b) Calculate the t-statistic for the slope (show work).
(c) Using α = 0.01, is there convincing evidence of a positive linear relationship between advertising spend and sales? Justify with reference to the P-value.
(d) Construct a 95% confidence interval for the slope β. Use t* = 2.086 for df = 20.
- t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)
- df = min(n₁−1, n₂−1) (conservative; the df from technology is at least this large)
- Conditions: Two independent random samples, normality/CLT for each
| Group | n | x̄ | s |
|---|---|---|---|
| Method A | 30 | 83.4 | 9.2 |
| Method B | 35 | 78.1 | 11.6 |
(a) State appropriate hypotheses to test whether Method A produces higher mean scores than Method B.
(b) Check the conditions for a two-sample t-test.
(c) Calculate the test statistic t. Use df = 29 (conservative). Given that P-value ≈ 0.031, state your conclusion at α = 0.05 in context.
- E = (row total × col total) / n
- df = (r−1)(c−1)
- χ² = Σ[(O−E)²/E]
- H₀: variables are independent (no association)
| | Good Sleep | Poor Sleep | Row Total |
|---|---|---|---|
| Exercises Regularly | 72 | 28 | 100 |
| Does Not Exercise | 48 | 52 | 100 |
| Col Total | 120 | 80 | 200 |
(a) State H₀ and Hₐ.
(b) Calculate all four expected counts. Verify the condition (all E ≥ 5).
(c) Calculate the chi-square test statistic. Show work for each cell.
(d) With df = 1 and χ²_critical = 3.841 (α = 0.05), state your conclusion in context.
Complete Answer Key & Solutions
Detailed step-by-step explanations for all 20 questions
(a) IQR = Q3 − Q1 = 34 − 18 = 16
(b) Upper fence = Q3 + 1.5 × IQR = 34 + 1.5(16) = 34 + 24 = 58
Since 67 > 58, the maximum value of 67 IS an outlier by the 1.5 × IQR rule.
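The fence arithmetic can be checked with a short Python sketch. The values Q1 = 18, Q3 = 34, and maximum = 67 come from the solution above; the variable names are illustrative.

```python
# Values from the worked solution: Q1 = 18, Q3 = 34, maximum = 67.
q1, q3, maximum = 18, 34, 67

iqr = q3 - q1                     # interquartile range
lower_fence = q1 - 1.5 * iqr      # values below this are flagged
upper_fence = q3 + 1.5 * iqr      # values above this are flagged

is_outlier = maximum < lower_fence or maximum > upper_fence
print(iqr, lower_fence, upper_fence, is_outlier)
```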
✓ IQR = 16; Upper fence = 58; 67 is an outlier
Shape: The distribution is approximately unimodal and slightly right-skewed, with most values clustered between 2 and 4 hours.
Outliers: The value at 9 hours is a potential outlier — it is far separated from the bulk of the data.
Center: The median is approximately 3 hours of study time.
Spread: The data range from 1 to 9 hours, with an IQR of approximately 2 hours (Q1≈2, Q3≈4).
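The summary values quoted above can be recomputed from the dotplot. This is a sketch that assumes the dot counts are read as 2, 4, 6, 5, 3, and 1 students at 1, 2, 3, 4, 5, and 9 hours, and that quartiles are the medians of the lower and upper halves (the median itself excluded when n is odd).

```python
from statistics import mean, median

# Dot counts read off the dotplot: hours -> number of students (assumed reading).
counts = {1: 2, 2: 4, 3: 6, 4: 5, 5: 3, 9: 1}
hours = sorted(h for h, c in counts.items() for _ in range(c))

n = len(hours)
half = n // 2
lower = hours[:half]                                  # values below the median position
upper = hours[half + 1:] if n % 2 else hours[half:]   # values above it

q1, med, q3 = median(lower), median(hours), median(upper)
iqr = q3 - q1
flagged = [h for h in hours if h < q1 - 1.5 * iqr or h > q3 + 1.5 * iqr]
print(n, med, q1, q3, iqr, flagged, round(mean(hours), 2))
```

The mean comes out above the median, which agrees with the right-skewed description, and the 9-hour value is the only one flagged by the 1.5 × IQR rule.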
✓ Must address SOCS in context of study hours
(a) z = (x − μ) / σ = (4.4 − 3.2) / 0.8 = 1.2 / 0.8 = 1.5
A score of 4.4 is 1.5 standard deviations above the mean.
(b) 2.4 = μ − σ (one SD below) and 4.0 = μ + σ (one SD above).
By the 68-95-99.7 rule, approximately 68% of scores fall within ±1 standard deviation of the mean, so P(2.4 < X < 4.0) ≈ 0.68.
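As a check, the z-score and the exact normal area can be computed with Python's standard library. This sketch assumes the scores follow a normal model with μ = 3.2 and σ = 0.8, the values used in the solution above.

```python
from statistics import NormalDist

mu, sigma = 3.2, 0.8                 # mean and SD used in the solution above
scores = NormalDist(mu, sigma)

z = (4.4 - mu) / sigma               # standardized score
p_between = scores.cdf(4.0) - scores.cdf(2.4)   # exact area within 1 SD of the mean
print(round(z, 2), round(p_between, 4))
```

The exact area is slightly above 0.68, which is why the empirical rule is stated as an approximation.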
✓ z = 1.5; P(2.4 < X < 4.0) ≈ 68%
(a) b = r · (Sy/Sx) = (−0.78)(45/1.1) = (−0.78)(40.909) ≈ −31.91 ms per hour
a = ȳ − b·x̄ = 312 − (−31.91)(7.2) = 312 + 229.75 ≈ 541.75
ŷ = 541.75 − 31.91x
(b) For x = 6: ŷ = 541.75 − 31.91(6) = 541.75 − 191.46 ≈ 350.3 ms
We predict a student who sleeps 6 hours will have a reaction time of about 350 ms.
(c) Slope: For each additional hour of sleep, the model predicts reaction time decreases by approximately 31.91 ms.
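The slope, intercept, and prediction can be reproduced from the summary table. This is a minimal sketch; `predict` is a hypothetical helper name, and the code keeps full precision rather than the rounded coefficients.

```python
# Summary statistics from the table: sleep (x, hours) and reaction time (y, ms).
x_bar, y_bar = 7.2, 312.0
s_x, s_y = 1.1, 45.0
r = -0.78

b = r * s_y / s_x          # slope: b = r * (Sy / Sx)
a = y_bar - b * x_bar      # intercept: a = y-bar - b * x-bar

def predict(sleep_hours):
    """Predicted reaction time (ms) from the least-squares line."""
    return a + b * sleep_hours

print(round(b, 2), round(a, 2), round(predict(6), 1))
```

Evaluating `predict(x_bar)` returns ȳ, which illustrates that the LSRL passes through (x̄, ȳ).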
✓ b ≈ −31.91, a ≈ 541.75; prediction ≈ 350.3 ms
(a) ŷ = 630 − 44.05(8) = 630 − 352.4 = 277.6 ms
Residual = observed − predicted = 290 − 277.6 = +12.4 ms
(b) The student's actual reaction time was 12.4 ms higher than what the model predicted for someone sleeping 8 hours.
(c) r² = 0.608 means approximately 60.8% of the variation in reaction time is explained by the linear relationship with hours of sleep. The remaining 39.2% is due to other factors.
✓ Residual = +12.4 ms; r² = 60.8% of variation explained
(a) Explanatory variable: whether a student receives the tutoring program. Response variable: SAT score (post-program).
(b) This is an experiment because the researcher actively assigns students to the tutoring program or control group (imposes a treatment). Causation can be inferred.
(c) Random assignment helps ensure that the two groups are roughly equivalent on potential confounding variables (prior knowledge, motivation, socioeconomic status) before the treatment, allowing any difference in outcomes to be attributed to the tutoring program rather than pre-existing differences.
✓ Experiment; random assignment controls for confounding variables
(a) This is systematic random sampling (every 5th student).
(b) Potential bias: Undercoverage — students who do not eat in the cafeteria (e.g., those who bring lunch or leave campus) are excluded. Students who use the cafeteria may have more positive opinions about the food, causing the results to overestimate satisfaction.
(c) Better method: Use a simple random sample from the complete student roster (a list of all students). This gives every student an equal chance of being selected, reducing undercoverage bias and making the results more representative of all students.
✓ Systematic sampling; undercoverage bias; SRS from school roster is better
(a) P(M ∪ E) = P(M) + P(E) − P(M ∩ E) = 0.45 + 0.30 − 0.18 = 0.57
(b) P(E|M) = P(E ∩ M) / P(M) = 0.18 / 0.45 = 0.40
(c) If independent: P(M) × P(E) = 0.45 × 0.30 = 0.135 ≠ 0.18 = P(M ∩ E).
Since P(M ∩ E) ≠ P(M)·P(E), Math and English enrollment are NOT independent.
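The three calculations can be verified in a few lines of Python. The probabilities are those used in the solution above; the variable names are illustrative.

```python
from math import isclose

p_math, p_english, p_both = 0.45, 0.30, 0.18   # P(M), P(E), P(M and E)

p_union = p_math + p_english - p_both          # general addition rule
p_english_given_math = p_both / p_math         # conditional probability
independent = isclose(p_both, p_math * p_english)

print(round(p_union, 2), round(p_english_given_math, 2), independent)
```

An equivalent independence check is whether P(E|M) equals P(E); here 0.40 differs from 0.30.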
✓ P(M∪E)=0.57; P(E|M)=0.40; NOT independent (0.135≠0.18)
(a) Sum of probabilities: 0.30 + 0.25 + 0.35 + 0.10 = 1.00 ✓. All P(x) ≥ 0 ✓. Valid distribution.
(b) E(X) = (−5)(0.30) + (0)(0.25) + (10)(0.35) + (20)(0.10)
= −1.50 + 0 + 3.50 + 2.00 = $4.00
(c) Since E(X) = $4.00 > 0, on average a player wins $4 per game. The game is favorable for the player and is worth playing — in the long run, the player expects to profit.
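The validity check and expected value can be reproduced directly from the payoff table. The sketch also computes the variance and standard deviation with the σ² formula from this section, which the question does not ask for.

```python
from math import isclose, sqrt

payoffs = [-5, 0, 10, 20]             # dollars, from the table
probs = [0.30, 0.25, 0.35, 0.10]

valid = isclose(sum(probs), 1.0) and all(p >= 0 for p in probs)
mu = sum(x * p for x, p in zip(payoffs, probs))                 # E(X)
var = sum((x - mu) ** 2 * p for x, p in zip(payoffs, probs))    # Var(X)
sd = sqrt(var)

print(valid, round(mu, 2), round(var, 2), round(sd, 2))
```

The standard deviation is roughly twice the expected value, so individual plays vary a lot even though the long-run average is positive.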
✓ Probabilities sum to 1; E(X) = $4.00; game favors the player
(a) BINS check: Binary (correct/incorrect) ✓; Independent (each question guessed independently) ✓; Number of trials fixed (n = 10) ✓; Same p = 0.25 for each question ✓. Conditions met.
(b) P(X = 3) = C(10,3) · (0.25)³ · (0.75)⁷
= 120 · 0.015625 · 0.133484 ≈ 0.2503
There is approximately a 25.0% chance of guessing exactly 3 correct.
(c) μ = np = 10 × 0.25 = 2.5 expected correct answers.
σ = √(np(1−p)) = √(10 × 0.25 × 0.75) = √1.875 ≈ 1.369
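A short Python check of the binomial calculations, using n = 10 and p = 0.25 from the solution above. `binom_pmf` is a hypothetical helper name written out from the formula.

```python
from math import comb, sqrt

n, p = 10, 0.25

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_exactly_3 = binom_pmf(3, n, p)
mean_correct = n * p
sd_correct = sqrt(n * p * (1 - p))

print(round(p_exactly_3, 4), mean_correct, round(sd_correct, 3))
```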
✓ BINS verified; P(X=3) ≈ 0.2503; μ=2.5, σ≈1.369
(a) Shape: Although the population is skewed right, because n = 49 ≥ 30, the Central Limit Theorem guarantees the sampling distribution of x̄ is approximately normal.
Mean: μ(x̄) = μ = 185 g
Standard Error: SE = σ/√n = 30/√49 = 30/7 ≈ 4.286 g
(b) z = (190 − 185) / 4.286 = 5/4.286 ≈ 1.167
P(x̄ > 190) = P(Z > 1.167) ≈ 1 − 0.8784 ≈ 0.122 (approximately 12.2%)
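The standard error, z-score, and tail probability can be checked with Python's standard library, assuming the normal approximation justified in part (a).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 185, 30, 49            # population mean, population SD, sample size

se = sigma / sqrt(n)                  # standard error of the sample mean
z = (190 - mu) / se
p_exceeds = 1 - NormalDist().cdf(z)   # P(sample mean > 190)

print(round(se, 3), round(z, 3), round(p_exceeds, 3))
```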
✓ x̄ ~ N(185, 4.286) by CLT; P(x̄>190) ≈ 0.122
(a) Conditions:
• Random: random sample ✓
• 10% condition: n = 80 ≤ 0.10 × 2000 = 200 ✓
• Large Counts: np = 80(0.65) = 52 ≥ 10 ✓; n(1−p) = 80(0.35) = 28 ≥ 10 ✓
(b) μ(p̂) = 0.65; σ(p̂) = √(0.65×0.35/80) = √(0.002844) ≈ 0.05333
(c) z = (0.60 − 0.65) / 0.05333 = −0.05 / 0.05333 ≈ −0.938
P(p̂ < 0.60) = P(Z < −0.938) ≈ 0.174 (approximately 17.4%)
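The condition checks and the probability can be reproduced in Python. The values n = 80, p = 0.65, and N = 2000 are taken from the solution above.

```python
from math import sqrt
from statistics import NormalDist

n, p, population = 80, 0.65, 2000

ten_percent_ok = n <= 0.10 * population               # 10% condition
large_counts_ok = n * p >= 10 and n * (1 - p) >= 10   # Large Counts condition

sd_p_hat = sqrt(p * (1 - p) / n)      # SD of the sampling distribution of p-hat
z = (0.60 - p) / sd_p_hat
p_below = NormalDist().cdf(z)         # P(p-hat < 0.60)

print(ten_percent_ok, large_counts_ok, round(sd_p_hat, 4), round(z, 3), round(p_below, 3))
```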
✓ All conditions met; σ(p̂)≈0.0533; P(p̂<0.60)≈0.174
p̂ = 87/150 = 0.58
(a) Conditions: Random sample ✓; 10% (150 ≤ 10% of all voters) ✓; Large Counts: np̂ = 87 ≥ 10 ✓, n(1−p̂) = 63 ≥ 10 ✓
(b) SE = √(0.58×0.42/150) = √(0.001624) ≈ 0.04030
CI = 0.58 ± 1.960 × 0.04030 = 0.58 ± 0.0790
95% CI: (0.501, 0.659)
(c) We are 95% confident that the true proportion of voters who support the ordinance is between 50.1% and 65.9%. Since the entire interval is above 0.50, this provides evidence that a majority of voters support the ordinance.
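The interval can be recomputed with a short sketch. It derives z* from the standard normal distribution instead of using the table value 1.960; the counts 87 and 150 come from the solution above.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 87, 150
confidence = 0.95

p_hat = successes / n
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.960 for 95%
se = sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * se
lower, upper = p_hat - margin, p_hat + margin

print(round(p_hat, 2), round(z_star, 3), round(lower, 3), round(upper, 3))
```

Changing `confidence` to 0.90 or 0.99 shows the interval narrowing or widening, in line with the "wider CI" bullet for this topic.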
✓ p̂=0.58; 95% CI = (0.501, 0.659); majority supported
p̂ = 42/120 = 0.35
(a) H₀: p = 0.40 (40% of boxes have a prize); Hₐ: p < 0.40 (fewer than 40% have prizes) — one-sided, left-tailed.
(b) Conditions: Random ✓; 10% (120 ≤ 10% of all boxes) ✓; Large Counts under H₀: np₀=48≥10 ✓, n(1−p₀)=72≥10 ✓
(c) z = (0.35 − 0.40) / √(0.40×0.60/120) = −0.05 / √(0.002) = −0.05 / 0.04472 ≈ −1.118
P-value = P(Z < −1.12) ≈ 0.131
(d) Since P-value = 0.131 > α = 0.05, we fail to reject H₀. There is not sufficient evidence at the 5% significance level that fewer than 40% of boxes contain a prize.
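A minimal Python sketch of the same one-proportion z-test, using the counts 42 and 120 and p₀ = 0.40 from the solution above. The P-value is computed from the unrounded z, so it may differ slightly from the table value.

```python
from math import sqrt
from statistics import NormalDist

successes, n, p0, alpha = 42, 120, 0.40, 0.05

p_hat = successes / n
se_null = sqrt(p0 * (1 - p0) / n)     # standard deviation uses p0, not p-hat
z = (p_hat - p0) / se_null
p_value = NormalDist().cdf(z)         # left-tailed test: P(Z < z)
reject = p_value < alpha

print(round(z, 3), round(p_value, 3), reject)
```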
✓ z≈−1.118; P-value≈0.131 > 0.05; fail to reject H₀
(a) Conditions: Random sample of 20 cereals ✓; 10% condition (20 cereals ≤ 10% of all cereals) ✓; Population approximately normal (stated) ✓
(b) SE = s/√n = 3.8/√20 = 3.8/4.472 ≈ 0.8497
CI = 14.3 ± 1.729 × 0.8497 = 14.3 ± 1.469
90% CI: (12.83, 15.77) grams
(c) We are 90% confident that the true mean sugar content of breakfast cereals is between 12.83 and 15.77 grams.
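The interval arithmetic can be checked as follows. Python's standard library has no t distribution, so this sketch takes t* = 1.729 as given in the question; x̄ = 14.3, s = 3.8, and n = 20 come from the solution above.

```python
from math import sqrt

x_bar, s, n = 14.3, 3.8, 20      # sample mean (g), sample SD (g), sample size
t_star = 1.729                   # given critical value for 90% confidence, df = 19

se = s / sqrt(n)                 # standard error of the mean
margin = t_star * se
lower, upper = x_bar - margin, x_bar + margin

print(n - 1, round(se, 4), round(lower, 2), round(upper, 2))
```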
✓ 90% CI = (12.83, 15.77) grams
(a) A paired t-test is appropriate because the measurements are not independent: each client provides both a before and after measurement. The data are naturally paired by individual.
(b) Differences (Before − After): 6, 4, 1, 6, 5, 3, 7, 2. d̄ = 34/8 = 4.25, sd ≈ 2.121.
(c) H₀: μd = 0 (no reduction in heart rate); Hₐ: μd > 0 (heart rate decreases)
t = d̄ / (sd/√n) = 4.25 / (2.121/√8) = 4.25 / 0.750 ≈ 5.667
Since P-value ≈ 0.0004 < α = 0.05, we reject H₀. There is convincing evidence that the gym program reduces mean resting heart rate.
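The paired calculation can be recomputed directly from the Before/After table rather than from quoted summary values. This sketch stops at the test statistic because the standard library has no t distribution for the P-value.

```python
from math import sqrt
from statistics import mean, stdev

before = [78, 82, 75, 90, 85, 79, 88, 72]   # from the table
after = [72, 78, 74, 84, 80, 76, 81, 70]

diffs = [b - a for b, a in zip(before, after)]   # Before - After for each client
d_bar = mean(diffs)                              # mean difference
s_d = stdev(diffs)                               # sample SD of the differences
t_stat = d_bar / (s_d / sqrt(len(diffs)))        # df = n - 1 = 7

print(diffs, d_bar, round(s_d, 3), round(t_stat, 3))
```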
✓ d̄=4.25; t≈5.667; P≈0.0004 < 0.05 → reject H₀
(a) H₀: The offspring follow the 9:3:3:1 ratio (the model fits). Hₐ: The offspring do NOT follow the 9:3:3:1 ratio.
(b) All expected counts (90, 30, 30, 10) ≥ 5 ✓
χ² = (92−90)²/90 + (28−30)²/30 + (25−30)²/30 + (15−10)²/10 = 4/90 + 4/30 + 25/30 + 25/10 = 0.044 + 0.133 + 0.833 + 2.500 = 3.511
(c) Since χ² = 3.511 < χ²_critical = 7.815 (α = 0.05, df = 3), we fail to reject H₀. There is not convincing evidence that the offspring distribution differs from the predicted 9:3:3:1 ratio.
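The goodness-of-fit statistic can be reproduced from the observed counts and the 9:3:3:1 ratio. The critical value 7.815 is taken as given in the question; the variable names are illustrative.

```python
observed = [92, 28, 25, 15]      # phenotypes A, B, C, D
ratio = [9, 3, 3, 1]             # claimed 9:3:3:1 model
critical_value = 7.815           # given: alpha = 0.05, df = 3

n = sum(observed)
expected = [n * part / sum(ratio) for part in ratio]   # E = n * p
condition_ok = all(e >= 5 for e in expected)

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
reject = chi_sq > critical_value

print(expected, round(chi_sq, 3), df, reject)
```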
✓ χ² ≈ 3.511 < 7.815; fail to reject H₀; data consistent with 9:3:3:1
(a) ŷ = 45.2 + 3.84x. Slope interpretation: For each additional $1,000 spent on advertising, monthly sales are predicted to increase by approximately $3,840.
(b) t = b / SEb = 3.84 / 1.12 ≈ 3.429
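(c) The P-value in the output, 0.003, is two-sided; for the one-sided alternative Hₐ: β > 0 (with b > 0) the P-value is about 0.0015 < α = 0.01. We reject H₀: β = 0. There is convincing evidence of a positive linear relationship between advertising spend and monthly sales.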
(d) 95% CI: b ± t*·SEb = 3.84 ± 2.086(1.12) = 3.84 ± 2.336
95% CI for β: (1.504, 6.176) (thousands of dollars)
Since 0 is not in the interval, this confirms a positive linear relationship.
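The slope test statistic and interval can be checked from the regression output. The critical value t* = 2.086 (df = 20) is taken as given in the question.

```python
b, se_b = 3.84, 1.12     # slope coefficient and its standard error, from the output
t_star = 2.086           # given critical value for 95% confidence, df = 20

t_stat = b / se_b                      # test statistic for H0: beta = 0
margin = t_star * se_b
lower, upper = b - margin, b + margin  # 95% CI for the slope
zero_excluded = not (lower <= 0 <= upper)

print(round(t_stat, 3), round(lower, 3), round(upper, 3), zero_excluded)
```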
✓ t≈3.429; P=0.003 < 0.01; 95% CI = (1.504, 6.176)
(a) H₀: μA − μB = 0 (no difference in mean scores); Hₐ: μA − μB > 0 (Method A produces higher mean scores); one-sided, right-tailed.
(b) Conditions: Two independent random samples ✓; Both n ≥ 30, so CLT guarantees approximate normality ✓
(c) SE = √(s²A/nA + s²B/nB) = √(9.2²/30 + 11.6²/35) = √(2.8213 + 3.8446) = √6.6659 ≈ 2.582
t = (83.4 − 78.1) / 2.582 = 5.3 / 2.582 ≈ 2.053
P-value ≈ 0.031 < α = 0.05 → Reject H₀.
There is convincing evidence at the 5% level that Method A produces a higher mean exam score than Method B.
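The standard error and test statistic can be recomputed from the summary table. This sketch stops at t and the conservative df, since the standard library has no t distribution for the P-value.

```python
from math import sqrt

# Summary statistics from the table: (n, mean, SD) for each method.
n_a, mean_a, s_a = 30, 83.4, 9.2
n_b, mean_b, s_b = 35, 78.1, 11.6

se = sqrt(s_a**2 / n_a + s_b**2 / n_b)   # unpooled standard error
t_stat = (mean_a - mean_b) / se
df_conservative = min(n_a - 1, n_b - 1)

print(round(se, 3), round(t_stat, 3), df_conservative)
```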
✓ t≈2.053; P≈0.031 < 0.05 → reject H₀; Method A is better
(a) H₀: Sleep quality and exercise habits are independent (no association). Hₐ: Sleep quality and exercise habits are NOT independent (there is an association).
(b) Expected counts [E = (row total × col total) / n]:
• Exercises / Good Sleep: (100 × 120)/200 = 60
• Exercises / Poor Sleep: (100 × 80)/200 = 40
• No Exercise / Good Sleep: (100 × 120)/200 = 60
• No Exercise / Poor Sleep: (100 × 80)/200 = 40
All expected counts ≥ 5 ✓
(c) χ² = (72−60)²/60 + (28−40)²/40 + (48−60)²/60 + (52−40)²/40 = 144/60 + 144/40 + 144/60 + 144/40 = 2.4 + 3.6 + 2.4 + 3.6 = 12.0
(d) χ² = 12.0 > χ²_critical = 3.841 (α = 0.05, df = 1) → Reject H₀. There is convincing evidence of an association between regular exercise and sleep quality.
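The expected counts and test statistic can be reproduced from the two-way table. The critical value 3.841 is taken as given in the question; the variable names are illustrative.

```python
# Rows: exercises regularly, does not exercise. Columns: good sleep, poor sleep.
observed = [[72, 28], [48, 52]]
critical_value = 3.841           # given: alpha = 0.05, df = 1

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# E = (row total * column total) / n for each cell
expected = [[r * c / n for c in col_totals] for r in row_totals]
condition_ok = all(e >= 5 for row in expected for e in row)

chi_sq = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
reject = chi_sq > critical_value

print(expected, round(chi_sq, 1), df, reject)
```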
✓ All E≥5; χ²=12.0 > 3.841 → reject H₀; association exists