Unit 1 · Exploring One-Variable Data
01
Unit 1 · Summarizing Data
Measures of Center & Spread
📖 Core Concept
The five-number summary (Min, Q1, Median, Q3, Max) describes the distribution of a quantitative variable. IQR = Q3 − Q1 measures spread of the middle 50%. An observation is an outlier if it falls below Q1 − 1.5·IQR or above Q3 + 1.5·IQR.
🧠 Memorize
  • IQR = Q3 − Q1
  • Outlier if: x < Q1 − 1.5·IQR OR x > Q3 + 1.5·IQR
  • Mean is NOT resistant to outliers; Median IS resistant
✏️ Worked Example
Data: {2, 5, 7, 9, 12, 15, 22}. Find IQR and identify any outliers.
Q1=5, Q3=15, IQR=10. Fences: [−10, 30]. No outliers.
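The fence calculation above can be checked with a short Python sketch. It assumes the median-of-halves quartile rule (Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when n is odd); other software may use a different quartile rule. The function name `five_number_summary` is illustrative only.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + n % 2:]      # skips the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

data = [2, 5, 7, 9, 12, 15, 22]
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```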
A data set of 7 values has the following five-number summary: Min = 12, Q1 = 18, Median = 25, Q3 = 34, Max = 67.

(a) Calculate the IQR.
(b) Determine whether the maximum value of 67 is an outlier. Show your work using the 1.5 × IQR rule.
Your Answer
Answer:
02
Unit 1 · Distributions
Describing Shape, Center, Spread
📖 Core Concept
When describing a distribution, always address S·O·C·S: Shape (symmetric/skewed left/skewed right, unimodal/bimodal), Outliers, Center (mean or median), and Spread (IQR or standard deviation). In a right-skewed distribution, mean > median.
🧠 Memorize
  • SOCS: Shape, Outliers, Center, Spread — always in context
  • Right-skewed → mean pulled right → mean > median
  • Left-skewed → mean pulled left → mean < median
✏️ Worked Example
A histogram of household incomes is strongly right-skewed. Which measure of center is more appropriate, and why?
Median is better — it is resistant to the high-income outliers that pull the mean upward.
The following dotplot shows the number of hours students spent studying for an AP exam:

1h: ● ●
2h: ● ● ● ●
3h: ● ● ● ● ● ●
4h: ● ● ● ● ●
5h: ● ● ●
9h: ●

Describe the distribution of study hours completely. Be sure to address shape, center, spread, and outliers in context.
Your Answer
Answer:
03
Unit 1 · Normal Distribution
Z-Score & Standardization
📖 Core Concept
A z-score measures how many standard deviations an observation is from the mean: z = (x − μ) / σ. The standard normal distribution N(0,1) allows us to find probabilities using a z-table. The 68-95-99.7 rule gives approximate probabilities for 1, 2, and 3 standard deviations.
🧠 Memorize
  • z = (x − μ) / σ
  • 68% within ±1σ, 95% within ±2σ, 99.7% within ±3σ
  • P(X < x) = normalcdf(−∞, z) on calculator
✏️ Worked Example
Heights of adult males ~ N(70, 3) inches. What proportion are taller than 73 inches?
z = (73−70)/3 = 1.0 → P(Z > 1) = 1 − 0.8413 = 0.1587 ≈ 15.87%
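As a rough check of the table lookup, the same probability can be computed with Python's standard library. This is a sketch that treats N(70, 3) as mean 70 and standard deviation 3, matching the worked example.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=3)   # N(70, 3): mean 70 in, SD 3 in
x = 73
z = (x - heights.mean) / heights.stdev  # standardize the observation
p_taller = 1 - heights.cdf(x)           # area to the right of x
print(f"z = {z:.2f}, P(X > {x}) = {p_taller:.4f}")
```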
Scores on an AP Statistics exam are approximately normally distributed with mean μ = 3.2 and standard deviation σ = 0.8 (on the 1–5 scale).

(a) What is the z-score of a student who scored 4.4?
(b) What proportion of students scored between 2.4 and 4.0? Use the 68-95-99.7 rule and show your reasoning.
Your Answer
Answer:
Unit 2 · Exploring Two-Variable Data
04
Unit 2 · Linear Regression
Least-Squares Regression Line
📖 Core Concept
The LSRL (least-squares regression line) is ŷ = a + bx, where b = r·(Sy/Sx) and a = ȳ − b·x̄. The slope b represents the predicted change in y per 1-unit increase in x. The LSRL always passes through the point (x̄, ȳ).
🧠 Memorize
  • ŷ = a + bx (ŷ = predicted value)
  • b = r · (Sy / Sx)
  • a = ȳ − b · x̄
  • LSRL passes through (x̄, ȳ)
✏️ Worked Example
r = 0.85, x̄ = 10, ȳ = 50, Sx = 2, Sy = 8. Find the LSRL equation.
b = 0.85·(8/2) = 3.4; a = 50 − 3.4·10 = 16; ŷ = 16 + 3.4x
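The two formulas can be wrapped in a small helper. The sketch below uses the summary statistics from the worked example; the function name `lsrl_from_summary` is hypothetical.

```python
def lsrl_from_summary(r, x_bar, y_bar, s_x, s_y):
    """Return (intercept a, slope b) of the LSRL from summary statistics."""
    slope = r * (s_y / s_x)              # b = r * (Sy / Sx)
    intercept = y_bar - slope * x_bar    # a = y-bar - b * x-bar
    return intercept, slope

a, b = lsrl_from_summary(r=0.85, x_bar=10, y_bar=50, s_x=2, s_y=8)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```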
A study of 25 students recorded hours of sleep (x) and reaction time in milliseconds (y). The summary statistics are:

Statistic | Sleep (hrs) | Reaction Time (ms)
Mean (x̄, ȳ) | 7.2 | 312
Std Dev (Sx, Sy) | 1.1 | 45
Correlation r | −0.78

(a) Find the equation of the LSRL for predicting reaction time from hours of sleep.
(b) Predict the reaction time for a student who sleeps 6 hours. Interpret this value in context.
(c) Interpret the slope in context.
Your Answer
Answer:
05
Unit 2 · Residuals & r²
Residual Analysis & Coefficient of Determination
📖 Core Concept
A residual = observed y − predicted ŷ. The coefficient of determination r² represents the proportion of variation in y that is explained by the linear relationship with x. A residual plot with no pattern indicates a linear model is appropriate.
🧠 Memorize
  • residual = y − ŷ (observed minus predicted)
  • r² = proportion of variation in y explained by x
  • Good residual plot: random scatter, no pattern, no fan shape
✏️ Worked Example
ŷ = 50 + 3x. For x = 4, observed y = 65. Find the residual.
ŷ = 50 + 3(4) = 62; residual = 65 − 62 = +3 (the model underpredicts by 3)
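The same residual can be computed in a couple of lines of Python. This is a minimal sketch; the helper names `predict` and `residual` are illustrative, and the default intercept and slope are the ones from the worked example.

```python
def predict(x, intercept=50, slope=3):
    """Predicted value y-hat from the worked example's line y-hat = 50 + 3x."""
    return intercept + slope * x

def residual(x, observed_y):
    """Residual = observed y minus predicted y-hat."""
    return observed_y - predict(x)

res = residual(4, 65)
print(res)  # positive residual: the model underpredicts
```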
Suppose the LSRL for predicting reaction time (y, in ms) from hours of sleep (x) is ŷ = 630 − 44.05x. A student who slept 8 hours had an actual reaction time of 290 ms.

(a) Calculate the residual for this student. Show your work.
(b) Interpret the residual in context.
(c) The r² value for this regression is 0.608. Interpret this value in context.
Your Answer
Answer:
Unit 3 · Collecting Data
06
Unit 3 · Study Design
Experiments vs. Observational Studies
📖 Core Concept
In an experiment, researchers actively impose treatments and randomly assign subjects to treatment groups — only experiments can establish causation. In an observational study, researchers observe without interfering — only association (not causation) can be concluded. Key principles of experiment design: Control, Randomization, Replication, Blinding.
🧠 Memorize
  • Experiment → can conclude causation
  • Observational study → association only (confounding possible)
  • Confounding variable: related to both explanatory and response variables
  • Double-blind: neither subjects nor evaluators know treatment
✏️ Worked Example
Students who eat breakfast score higher on tests. Can we conclude breakfast causes better scores?
No — this is observational. Confounders (sleep, socioeconomic status) may explain the association.
A researcher wants to determine whether a new tutoring program improves SAT scores. She recruits 60 students and randomly assigns 30 to receive tutoring and 30 to a control group that studies independently. Pre- and post-scores are recorded.

(a) Identify the explanatory variable and the response variable.
(b) Is this an experiment or an observational study? Justify your answer.
(c) Explain why the random assignment is important in this study.
Your Answer
Answer:
07
Unit 3 · Sampling Methods
Random Sampling & Bias
📖 Core Concept
A simple random sample (SRS) gives every group of n individuals an equal chance of selection. Stratified random sampling divides the population into subgroups and randomly samples from each. Cluster sampling randomly selects entire groups. Common biases: undercoverage, voluntary response bias, nonresponse bias.
🧠 Memorize
  • SRS: every subset of size n equally likely
  • Voluntary response bias: overrepresents strong opinions
  • Convenience sample: often biased, not representative
  • Stratified: works best when individuals within each stratum are similar; reduces variability of the estimate
✏️ Worked Example
A radio station asks listeners to call in their opinion on a new policy. What type of bias is present?
Voluntary response bias: people with strong opinions (often negative ones) are the most likely to call in, so the results are unrepresentative of all listeners.
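The three random sampling methods from the Core Concept can be sketched in Python. The roster below is entirely hypothetical (200 made-up student IDs in four grade levels), and the sample sizes are chosen only for illustration.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical roster: 50 made-up student IDs in each of four grades
roster = {g: [f"G{g}-{i:02d}" for i in range(50)] for g in (9, 10, 11, 12)}
everyone = [student for group in roster.values() for student in group]

# Simple random sample: every subset of 20 students is equally likely
srs = random.sample(everyone, 20)

# Stratified random sample: an SRS of 5 from each grade (the strata)
stratified = [s for group in roster.values() for s in random.sample(group, 5)]

# Cluster sample: randomly choose one whole grade and take everyone in it
cluster_grade = random.choice(list(roster))
cluster = roster[cluster_grade]

print(len(srs), len(stratified), cluster_grade, len(cluster))
```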
A school administrator wants to survey students about lunch quality. She stands at the cafeteria entrance and surveys every 5th student who enters on Tuesday.

(a) Identify the sampling method used.
(b) Describe one potential source of bias in this sample and explain how it might affect the results.
(c) Propose a better sampling method and explain why it would reduce bias.
Your Answer
Answer:
Unit 4 · Probability, Random Variables & Distributions
08
Unit 4 · Probability Rules
Addition, Multiplication & Conditional Probability
📖 Core Concept
Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Multiplication rule: P(A ∩ B) = P(A) · P(B|A). Events are independent if P(A|B) = P(A). Conditional probability: P(A|B) = P(A ∩ B) / P(B).
🧠 Memorize
  • P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
  • P(A|B) = P(A ∩ B) / P(B)
  • Independent: P(A ∩ B) = P(A) · P(B)
  • Mutually exclusive: P(A ∩ B) = 0
✏️ Worked Example
P(A) = 0.4, P(B) = 0.3, P(A ∩ B) = 0.12. Are A and B independent?
Check: P(A)·P(B) = 0.4·0.3 = 0.12 = P(A∩B). Yes, A and B are independent.
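The rules can be checked numerically. The sketch below reuses the probabilities from the worked example and compares with a tolerance (`math.isclose`) because floating-point products are not always exact.

```python
import math

p_a, p_b, p_a_and_b = 0.4, 0.3, 0.12

p_a_or_b = p_a + p_b - p_a_and_b       # addition rule
p_a_given_b = p_a_and_b / p_b          # conditional probability P(A|B)
independent = math.isclose(p_a * p_b, p_a_and_b)  # P(A and B) = P(A)P(B)?

print(round(p_a_or_b, 2), round(p_a_given_b, 2), independent)
```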
At a college, 45% of students are enrolled in Math (M), 30% are enrolled in English (E), and 18% are enrolled in both.

(a) What is the probability that a randomly selected student is enrolled in Math or English?
(b) What is the probability that a student is enrolled in English, given that they are enrolled in Math?
(c) Are enrollment in Math and English independent? Justify your answer numerically.
Your Answer
Answer:
09
Unit 4 · Discrete Random Variables
Expected Value & Variance of a Random Variable
📖 Core Concept
For a discrete random variable X: μX = Σ[x · P(x)] (expected value). σ²X = Σ[(x − μ)² · P(x)] (variance). If X and Y are independent, then σ²(X+Y) = σ²X + σ²Y and σ²(X−Y) = σ²X + σ²Y (variances always add for independent variables).
🧠 Memorize
  • μX = Σ[x · P(x)]
  • σ²X = Σ[(x − μ)² · P(x)]
  • If independent: Var(X ± Y) = Var(X) + Var(Y) always
  • μ(aX + b) = a·μX + b; σ(aX + b) = |a|·σX
✏️ Worked Example
X: P(0)=0.2, P(1)=0.5, P(2)=0.3. Find E(X).
E(X) = 0(0.2) + 1(0.5) + 2(0.3) = 0 + 0.5 + 0.6 = 1.1
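A short Python sketch of both formulas, using the distribution from the worked example. The variance and standard deviation are not computed in the example above; they are included here only to illustrate the second formula.

```python
import math

dist = {0: 0.2, 1: 0.5, 2: 0.3}   # value -> probability

total = sum(dist.values())                                # should be 1
mean = sum(x * p for x, p in dist.items())                # E(X)
variance = sum((x - mean) ** 2 * p for x, p in dist.items())
sd = math.sqrt(variance)

print(round(total, 2), round(mean, 2), round(variance, 2), round(sd, 2))
```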
A game has the following payoff distribution:

Payoff X ($) | −5 | 0 | 10 | 20
P(X = x) | 0.30 | 0.25 | 0.35 | 0.10

(a) Verify that this is a valid probability distribution.
(b) Calculate the expected value E(X). Interpret this value in context.
(c) Would you recommend playing this game? Justify your answer using your calculation.
Your Answer
Answer:
10
Unit 4 · Binomial Distribution
Binomial Probability
📖 Core Concept
A binomial setting requires: Binary outcomes, Independent trials, fixed Number of trials, and same probability of Success (BINS). Formula: P(X = k) = C(n,k) · p^k · (1−p)^(n−k). Mean: μ = np. SD: σ = √(np(1−p)).
🧠 Memorize
  • BINS: Binary, Independent, Number fixed, Same p
  • P(X=k) = C(n,k) · p^k · (1−p)^(n−k)
  • μ = np; σ = √(np(1−p))
  • 10% condition: n ≤ 0.10·N for independence
✏️ Worked Example
n = 5, p = 0.3. Find P(X = 2).
C(5,2)·(0.3)²·(0.7)³ = 10·0.09·0.343 = 0.3087
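The binomial formula translates directly to Python with `math.comb`. This sketch uses n = 5 and p = 0.3 from the worked example; the function name `binomial_pmf` is illustrative.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.3
prob = binomial_pmf(2, n, p)
mean = n * p                   # mu = np
sd = sqrt(n * p * (1 - p))     # sigma = sqrt(np(1 - p))

print(f"P(X = 2) = {prob:.4f}, mean = {mean:.2f}, sd = {sd:.3f}")
```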
A multiple-choice test has 10 questions, each with 4 choices. A student who did not study guesses randomly on every question.

(a) Verify that this situation satisfies the conditions for a binomial distribution (BINS).
(b) What is the probability that the student gets exactly 3 questions correct? Show your calculation.
(c) What is the expected number of correct answers, and what is the standard deviation?
Your Answer
Answer:
Unit 5 · Sampling Distributions
11
Unit 5 · Central Limit Theorem
Sampling Distribution of x̄
📖 Core Concept
The Central Limit Theorem (CLT): For sufficiently large n (rule of thumb: n ≥ 30), the sampling distribution of x̄ from random samples is approximately normal with mean μ(x̄) = μ and standard deviation σ(x̄) = σ/√n (the standard error, SE), regardless of the population's shape. This allows us to calculate probabilities for sample means.
🧠 Memorize
  • μ(x̄) = μ (sampling dist. centered at population mean)
  • σ(x̄) = σ / √n (standard error of the mean)
  • CLT: n ≥ 30 → sampling dist. approximately normal
  • Larger n → smaller SE → less variability
✏️ Worked Example
Population: μ = 100, σ = 15. Sample n = 36. Find the SE and describe the sampling distribution.
SE = 15/√36 = 2.5. By CLT: x̄ ~ N(100, 2.5) approximately.
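A small simulation can make the CLT concrete. The worked example does not say what the population looks like, so the sketch below assumes a right-skewed population (a shifted exponential with mean 100 and SD 15) purely for illustration; the number of repetitions (5000) is also an arbitrary choice.

```python
import random
from math import sqrt
from statistics import fmean, stdev

random.seed(1)  # fixed seed so the sketch is reproducible

MU, SIGMA, N = 100, 15, 36
theoretical_se = SIGMA / sqrt(N)   # 15 / 6

def draw_skewed():
    # Assumed population: shifted exponential with mean 100 and SD 15
    return (MU - SIGMA) + random.expovariate(1 / SIGMA)

# Take 5000 samples of size 36 and record each sample mean
sample_means = [fmean([draw_skewed() for _ in range(N)]) for _ in range(5000)]

print(theoretical_se, fmean(sample_means), stdev(sample_means))
```

The mean of the simulated sample means should land near μ and their standard deviation near σ/√n, even though the assumed population is skewed.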
The weights of apples at a farm are skewed right with mean μ = 185 grams and standard deviation σ = 30 grams. A random sample of n = 49 apples is selected.

(a) Describe the shape, mean, and standard error of the sampling distribution of x̄. Justify the shape.
(b) What is the probability that the sample mean weight exceeds 190 grams? Show your z-score calculation.
Your Answer
Answer:
12
Unit 5 · Sampling Distribution of p̂
Sampling Distribution of a Sample Proportion
📖 Core Concept
The sampling distribution of p̂ has mean μ(p̂) = p and standard deviation σ(p̂) = √(p(1−p)/n). For the distribution to be approximately normal, both np ≥ 10 and n(1−p) ≥ 10 must hold (Large Counts condition). Also need the 10% condition: n ≤ 0.10·N.
🧠 Memorize
  • μ(p̂) = p
  • σ(p̂) = √(p(1−p)/n)
  • Normality: np ≥ 10 AND n(1−p) ≥ 10 (Large Counts)
  • Independence: n ≤ 0.10·N (10% condition)
✏️ Worked Example
p = 0.6, n = 100. Verify normality and find σ(p̂).
np = 60 ≥ 10 ✓, n(1−p) = 40 ≥ 10 ✓. σ(p̂) = √(0.6·0.4/100) = 0.049
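The Large Counts check and the standard deviation formula can be sketched as one helper. The function name `p_hat_distribution` is hypothetical; the inputs are the worked example's p = 0.6 and n = 100.

```python
from math import sqrt

def p_hat_distribution(p, n):
    """Return (large_counts_ok, sd) for the sampling distribution of p-hat."""
    large_counts_ok = n * p >= 10 and n * (1 - p) >= 10
    sd = sqrt(p * (1 - p) / n)
    return large_counts_ok, sd

ok, sd_p_hat = p_hat_distribution(p=0.6, n=100)
print(ok, round(sd_p_hat, 3))
```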
According to a national survey, 65% of high school students own a smartphone. A researcher surveys a random sample of 80 students from a large high school with 2,000 students.

(a) Verify the conditions for the sampling distribution of p̂ to be approximately normal. Show all checks.
(b) Describe the sampling distribution of p̂ (mean and standard deviation).
(c) What is the probability that the sample proportion is less than 0.60? Show your z-score calculation.
Your Answer
Answer:
Unit 6 · Inference for Categorical Data: Proportions
13
Unit 6 · Confidence Intervals
One-Sample z-Interval for a Proportion
📖 Core Concept
A confidence interval for a proportion: p̂ ± z* · √(p̂(1−p̂)/n). The confidence level (e.g., 95%) means: if we repeated this procedure many times, about 95% of resulting intervals would capture the true population proportion p. Common z*: 90% → 1.645, 95% → 1.960, 99% → 2.576.
🧠 Memorize
  • CI: p̂ ± z* · √(p̂(1−p̂)/n)
  • z*: 90% = 1.645, 95% = 1.960, 99% = 2.576
  • Conditions: Random, 10%, Large Counts (np̂ ≥ 10, n(1−p̂) ≥ 10)
  • Wider CI: higher confidence OR smaller n
✏️ Worked Example
n = 200, p̂ = 0.54. Construct a 95% CI for p.
0.54 ± 1.96·√(0.54·0.46/200) = 0.54 ± 0.069 = (0.471, 0.609)
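The interval can be reproduced in Python. This sketch gets z* from the standard normal inverse CDF instead of a table, so it uses 1.95996... rather than the rounded 1.960; the function name `proportion_ci` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """One-sample z-interval for a proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(p_hat=0.54, n=200)
print(f"({low:.3f}, {high:.3f})")
```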
In a random sample of 150 voters, 87 said they support a new city ordinance.

(a) Check the conditions required to construct a confidence interval for the proportion.
(b) Construct a 95% confidence interval for the true proportion of voters who support the ordinance. Show all work.
(c) Interpret the interval in context. Does it suggest that a majority supports the ordinance? Explain.
Your Answer
Answer:
14
Unit 6 · Significance Testing
One-Sample z-Test for a Proportion
📖 Core Concept
A significance test uses sample data to evaluate evidence against H₀. Steps: State H₀ and Hₐ, check conditions, calculate the test statistic z = (p̂ − p₀) / √(p₀(1−p₀)/n), find the P-value, and make a conclusion. Reject H₀ if P-value < α.
🧠 Memorize
  • H₀: p = p₀ (null uses p₀, not p̂)
  • z = (p̂ − p₀) / √(p₀(1−p₀)/n)
  • P-value: P(data at least this extreme | H₀ true)
  • Small P-value → strong evidence against H₀
✏️ Worked Example
H₀: p = 0.5, p̂ = 0.56, n = 100. Find the test statistic.
z = (0.56 − 0.50) / √(0.5·0.5/100) = 0.06 / 0.05 = 1.20
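The test statistic and its P-value can be sketched in Python. The worked example does not state Hₐ, so both a right-tailed and a two-sided P-value are shown as assumptions; the function name `one_prop_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z(p_hat, p0, n):
    """z statistic for a one-sample proportion test (SE uses p0, not p-hat)."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z = one_prop_z(p_hat=0.56, p0=0.5, n=100)
std_normal = NormalDist()
p_right = 1 - std_normal.cdf(z)              # if Ha: p > 0.5 (assumed)
p_two = 2 * (1 - std_normal.cdf(abs(z)))     # if Ha: p != 0.5 (assumed)

print(f"z = {z:.2f}, right-tailed P = {p_right:.4f}, two-sided P = {p_two:.4f}")
```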
A cereal company claims that 40% of boxes contain a prize. A consumer advocacy group suspects the true proportion is less than 40%. They randomly sample 120 boxes and find 42 with prizes.

(a) State the null and alternative hypotheses.
(b) Check the conditions for the z-test.
(c) Calculate the test statistic and P-value (use z-table: P(Z < −1.12) ≈ 0.131).
(d) At α = 0.05, state your conclusion in context.
Your Answer
Answer:
Unit 7 · Inference for Quantitative Data: Means
15
Unit 7 · t-Procedures
One-Sample t-Interval for a Mean
📖 Core Concept
When σ is unknown, use the t-distribution with df = n − 1 degrees of freedom. Confidence interval: x̄ ± t* · (s/√n). Conditions: Random sample, 10% condition, and either population is normal OR n ≥ 30 (CLT). Use t-table with df = n − 1.
🧠 Memorize
  • CI: x̄ ± t* · (s/√n)
  • df = n − 1 (degrees of freedom)
  • SE(x̄) = s / √n
  • Use t when σ unknown; use z when σ known
✏️ Worked Example
x̄ = 82, s = 10, n = 16. Construct a 95% CI. (t* with df=15 is 2.131)
82 ± 2.131·(10/4) = 82 ± 5.33 = (76.67, 87.33)
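Python's standard library has no t-distribution, so the sketch below takes t* as an input read from a t-table, just as in the worked example. The function name `t_interval` is illustrative.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_star):
    """One-sample t-interval; t_star must come from a t-table with df = n - 1."""
    margin = t_star * s / sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = t_interval(x_bar=82, s=10, n=16, t_star=2.131)
print(f"({low:.2f}, {high:.2f})")
```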
A nutritionist measures the sugar content (in grams) of a random sample of 20 breakfast cereals. The sample mean is x̄ = 14.3 g and the sample standard deviation is s = 3.8 g. Assume the data come from an approximately normal population.

(a) Check the conditions for a t-interval.
(b) Construct a 90% confidence interval for the true mean sugar content. Use t* = 1.729 for df = 19.
(c) Interpret the interval in context.
Your Answer
Answer:
16
Unit 7 · t-Test
One-Sample t-Test for a Mean
📖 Core Concept
The one-sample t-test statistic is t = (x̄ − μ₀) / (s/√n), with df = n − 1. The P-value is found from the t-distribution. A paired t-test is used when data are naturally paired (before/after, matched pairs) — compute differences d = x₁ − x₂ and treat as one-sample t-test on d.
🧠 Memorize
  • t = (x̄ − μ₀) / (s/√n), df = n−1
  • Paired data: d̄ = mean of differences, sd = std dev of differences
  • Two-sided Hₐ: μ ≠ μ₀ → P-value = 2·P(T > |t|)
✏️ Worked Example
H₀: μ = 50, x̄ = 47, s = 6, n = 9. Find t.
t = (47 − 50)/(6/3) = −3/2 = −1.5, df = 8
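Both the one-sample and the paired version reduce to the same formula, as the sketch below shows. The before/after lists are small made-up numbers used only to exercise the paired helper; the function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(x_bar, mu0, s, n):
    """Return (t, df) for a one-sample t-test."""
    return (x_bar - mu0) / (s / sqrt(n)), n - 1

def paired_t(before, after):
    """Paired t-test: a one-sample t-test on the differences, H0: mu_d = 0."""
    diffs = [b - a for b, a in zip(before, after)]
    return t_statistic(mean(diffs), 0, stdev(diffs), len(diffs))

t, df = t_statistic(x_bar=47, mu0=50, s=6, n=9)   # worked example
print(t, df)

# Hypothetical paired data, not from this guide
t_paired, df_paired = paired_t(before=[10, 12, 9, 11], after=[8, 11, 9, 8])
print(round(t_paired, 3), df_paired)
```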
A gym claims their 8-week program reduces resting heart rate (BPM) in clients. The resting heart rates (BPM) for 8 randomly selected clients before and after the program:

Client | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Before | 78 | 82 | 75 | 90 | 85 | 79 | 88 | 72
After | 72 | 78 | 74 | 84 | 80 | 76 | 81 | 70

(a) Why is a paired t-test appropriate here?
(b) Calculate the differences (Before − After) and find d̄ and sd. (You may verify: d̄ = 4.25, sd ≈ 2.121)
(c) State H₀ and Hₐ, calculate the test statistic t, and use the fact that the P-value is less than 0.001 to make a conclusion at α = 0.05.
Your Answer
Answer:
Unit 8 · Inference for Categorical Data: Chi-Square
17
Unit 8 · Chi-Square Tests
Chi-Square Test for Goodness of Fit
📖 Core Concept
The chi-square goodness-of-fit test tests whether observed categorical counts match a claimed distribution. Test statistic: χ² = Σ[(O − E)² / E], where O = observed count and E = expected count. df = number of categories − 1. Conditions: Random sample, all expected counts ≥ 5.
🧠 Memorize
  • χ² = Σ[(O − E)² / E]
  • df = k − 1 (k = number of categories)
  • E = n · p (expected = total × claimed proportion)
  • Condition: all expected counts ≥ 5
✏️ Worked Example
Fair die rolled 60 times. Expected per face = 10. Observed face-1 count = 14. Contribution to χ²?
(14−10)²/10 = 16/10 = 1.6
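The full statistic is the sum of such contributions over all categories. In the sketch below only the face-1 count of 14 comes from the worked example; the other five observed counts are hypothetical values chosen to total 60 rolls.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E; refuses to run if any expected count is below 5."""
    if any(e < 5 for e in expected):
        raise ValueError("all expected counts must be at least 5")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [14, 8, 11, 9, 10, 8]   # face 1 from the example; faces 2-6 hypothetical
expected = [60 / 6] * 6            # fair die: 10 per face
chi_sq = chi_square_statistic(observed, expected)
df = len(observed) - 1

print(round(chi_sq, 3), df)
```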
A genetics experiment predicts offspring will appear in ratio 9:3:3:1 for four phenotypes. In a random sample of 160 offspring, the observed counts are:

Phenotype | A | B | C | D
Observed (O) | 92 | 28 | 25 | 15
Expected (E) | 90 | 30 | 30 | 10

(a) State H₀ and Hₐ for this test.
(b) Verify the expected counts condition. Calculate the chi-square test statistic. Show your work for each category.
(c) With df = 3, the critical value at α = 0.05 is 7.815. State your conclusion.
Your Answer
Answer:
Unit 9 · Inference for Quantitative Data: Slopes
18
Unit 9 · Regression Inference
t-Test & CI for the Slope β
📖 Core Concept
To test whether a linear relationship exists in the population, we test H₀: β = 0 using t = b / SEb with df = n − 2. A confidence interval for the true slope β is b ± t* · SEb. If 0 is NOT in the confidence interval, we have evidence of a linear relationship.
🧠 Memorize
  • H₀: β = 0 (no linear relationship)
  • t = b / SEb, df = n − 2
  • CI: b ± t* · SEb
  • Conditions: LINEAR, INDEPENDENT, NORMAL residuals, EQUAL variance, RANDOM (LINER)
✏️ Worked Example
b = 2.5, SEb = 0.8. Find the t-statistic for H₀: β = 0.
t = 2.5 / 0.8 = 3.125. With small P-value, reject H₀ — linear relationship exists.
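A sketch of the slope test statistic and interval. The worked example gives no sample size, so the t* = 2.086 used for the interval assumes df = 20 (n = 22) purely for illustration.

```python
def slope_t(b, se_b):
    """t statistic for H0: beta = 0."""
    return b / se_b

def slope_ci(b, se_b, t_star):
    """Confidence interval b +/- t* * SE_b; t_star comes from a t-table."""
    margin = t_star * se_b
    return b - margin, b + margin

t = slope_t(b=2.5, se_b=0.8)
low, high = slope_ci(b=2.5, se_b=0.8, t_star=2.086)  # assumes df = 20

print(t, round(low, 4), round(high, 4))
```

If 0 lies outside the resulting interval, that agrees with rejecting H₀: β = 0 at the matching significance level.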
A regression analysis of advertising spend (x, in $1000s) vs. monthly sales (y, in $1000s) for 22 retail stores yields the following computer output:

Term | Coef | SE Coef | t | P
Constant | 45.2 | 8.3 | 5.45 | <0.001
Adspend | 3.84 | 1.12 | ? | 0.003

(a) Write the equation of the LSRL and interpret the slope in context.
(b) Calculate the t-statistic for the slope (show work).
(c) Using α = 0.01, is there convincing evidence of a positive linear relationship between advertising spend and sales? Justify with reference to the P-value.
(d) Construct a 95% confidence interval for the slope β. Use t* = 2.086 for df = 20.
Your Answer
Answer:
19
Unit 7 · Two-Sample Inference
Two-Sample t-Test for Difference of Means
📖 Core Concept
The two-sample t-test compares the means of two independent populations. H₀: μ₁ − μ₂ = 0. Test statistic: t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂). Use the conservative df = min(n₁−1, n₂−1), or technology gives exact df. CI: (x̄₁ − x̄₂) ± t* · SE.
🧠 Memorize
  • t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)
  • df = min(n₁−1, n₂−1) (conservative; technology gives a df at least this large)
  • Conditions: Two independent random samples, normality/CLT for each
✏️ Worked Example
Group 1: x̄=80, s=10, n=25. Group 2: x̄=74, s=12, n=30. Find the t-statistic.
SE = √(100/25 + 144/30) = √(4+4.8) = √8.8 ≈ 2.97; t = (80−74)/2.97 ≈ 2.02
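The calculation can be sketched in Python, including both choices of df. The Welch-Satterthwaite formula used for the technology df is the standard one but is not stated in this guide, so treat it as an added assumption; the function name is illustrative.

```python
from math import sqrt

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """Return (t, SE, conservative df, Welch df) for H0: mu1 - mu2 = 0."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = sqrt(v1 + v2)
    t = (x1 - x2) / se
    df_conservative = min(n1 - 1, n2 - 1)
    # Welch-Satterthwaite approximation (what technology typically reports)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, se, df_conservative, df_welch

t, se, df_cons, df_welch = two_sample_t(80, 10, 25, 74, 12, 30)
print(round(se, 2), round(t, 2), df_cons, round(df_welch, 1))
```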
A researcher compares exam scores between two teaching methods. Randomly selected students from each group:

Group | n | x̄ | s
Method A | 30 | 83.4 | 9.2
Method B | 35 | 78.1 | 11.6

(a) State appropriate hypotheses to test whether Method A produces higher mean scores than Method B.
(b) Check the conditions for a two-sample t-test.
(c) Calculate the test statistic t. Use df = 29 (conservative). Given that P-value ≈ 0.025, state your conclusion at α = 0.05 in context.
Your Answer
Answer:
20
Unit 8 · Chi-Square Test of Independence
Chi-Square Test for Association / Independence
📖 Core Concept
The chi-square test of independence tests whether two categorical variables are associated. Expected counts: E = (row total × column total) / table total. df = (rows − 1)(columns − 1). H₀: the two variables are independent (no association).
🧠 Memorize
  • E = (row total × col total) / n
  • df = (r−1)(c−1)
  • χ² = Σ[(O−E)²/E]
  • H₀: variables are independent (no association)
✏️ Worked Example
Row total = 60, col total = 80, n = 200. Find E.
E = (60 × 80) / 200 = 4800 / 200 = 24
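The expected-count formula extends to every cell of a two-way table. The 2×2 table in the sketch below is hypothetical; it was chosen so that its first cell has row total 60, column total 80, and n = 200, matching the worked example.

```python
def expected_counts(table):
    """Expected count for each cell: (row total * column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (chi-square statistic, df) for a two-way table of counts."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

observed = [[30, 30], [50, 90]]   # hypothetical counts, not from this guide
expected = expected_counts(observed)
chi_sq, df = chi_square_independence(observed)
print(expected, round(chi_sq, 3), df)
```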
A survey of 200 adults asked about exercise habits and sleep quality:

Exercise Habit | Good Sleep | Poor Sleep | Row Total
Exercises Regularly | 72 | 28 | 100
Does Not Exercise | 48 | 52 | 100
Col Total | 120 | 80 | 200

(a) State H₀ and Hₐ.
(b) Calculate all four expected counts. Verify the condition (all E ≥ 5).
(c) Calculate the chi-square test statistic. Show work for each cell.
(d) With df = 1 and χ²_critical = 3.841 (α = 0.05), state your conclusion in context.
Your Answer
Answer:

Complete Answer Key & Solutions

Detailed step-by-step explanations for all 20 questions

01
Unit 1 · IQR & Outlier Detection

(a) IQR = Q3 − Q1 = 34 − 18 = 16

(b) Upper fence = Q3 + 1.5 × IQR = 34 + 1.5(16) = 34 + 24 = 58

Since 67 > 58, the maximum value of 67 IS an outlier by the 1.5 × IQR rule.

✓ IQR = 16; Upper fence = 58; 67 is an outlier
02
Unit 1 · Describing Distributions (SOCS)

Shape: The distribution is approximately unimodal and slightly right-skewed, with most values clustered between 2–4 hours.

Outliers: The value at 9 hours is an outlier by the 1.5 × IQR rule: the upper fence is Q3 + 1.5 × IQR = 4 + 1.5(2) = 7, and 9 > 7. It is also far separated from the bulk of the data.

Center: The median is 3 hours of study time (the 11th of the 21 ordered values).

Spread: The data range from 1 to 9 hours, with an IQR of 2 hours (Q1 = 2, Q3 = 4).

✓ Must address SOCS in context of study hours
03
Unit 1 · Z-Score & 68-95-99.7 Rule

(a) z = (x − μ) / σ = (4.4 − 3.2) / 0.8 = 1.2 / 0.8 = 1.5

A score of 4.4 is 1.5 standard deviations above the mean.

(b) 2.4 = μ − σ (one SD below) and 4.0 = μ + σ (one SD above).

By the 68-95-99.7 rule, approximately 68% of scores fall within ±1 standard deviation of the mean, so P(2.4 < X < 4.0) ≈ 0.68.

✓ z = 1.5; P(2.4 < X < 4.0) ≈ 68%
04
Unit 2 · Least-Squares Regression Line

(a) b = r · (Sy/Sx) = (−0.78)(45/1.1) = (−0.78)(40.909) ≈ −31.91 ms per hour

a = ȳ − b·x̄ = 312 − (−31.91)(7.2) = 312 + 229.75 ≈ 541.75

ŷ = 541.75 − 31.91x

(b) For x = 6: ŷ = 541.75 − 31.91(6) = 541.75 − 191.46 ≈ 350.3 ms

We predict a student who sleeps 6 hours will have a reaction time of about 350 ms.

(c) Slope: For each additional hour of sleep, the model predicts reaction time decreases by approximately 31.91 ms.

✓ b ≈ −31.91, a ≈ 541.75; prediction ≈ 350.3 ms
05
Unit 2 · Residuals & r²

(a) ŷ = 630 − 44.05(8) = 630 − 352.4 = 277.6 ms

Residual = observed − predicted = 290 − 277.6 = +12.4 ms

(b) The student's actual reaction time was 12.4 ms higher than what the model predicted for someone sleeping 8 hours.

(c) r² = 0.608 means approximately 60.8% of the variation in reaction time is explained by the linear relationship with hours of sleep. The remaining 39.2% is due to other factors.

✓ Residual = +12.4 ms; r² = 60.8% of variation explained
06
Unit 3 · Experiment Design

(a) Explanatory variable: whether a student receives the tutoring program. Response variable: SAT score (post-program).

(b) This is an experiment because the researcher actively assigns students to the tutoring program or control group (imposes a treatment). Causation can be inferred.

(c) Random assignment helps ensure that the two groups are roughly equivalent on potential confounding variables (prior knowledge, motivation, socioeconomic status) before the treatment, allowing any difference in outcomes to be attributed to the tutoring program rather than pre-existing differences.

✓ Experiment; random assignment controls for confounding variables
07
Unit 3 · Sampling Methods & Bias

(a) This is systematic random sampling (every 5th student).

(b) Potential bias: Undercoverage — students who do not eat in the cafeteria (e.g., those who bring lunch or leave campus) are excluded. Students who use the cafeteria may have more positive opinions about the food, causing the results to overestimate satisfaction.

(c) Better method: Use a simple random sample from the complete student roster (a list of all students). This gives every student an equal chance of being selected, reducing undercoverage bias and making the results more representative of all students.

✓ Systematic sampling; undercoverage bias; SRS from school roster is better
08
Unit 4 · Probability Rules

(a) P(M ∪ E) = P(M) + P(E) − P(M ∩ E) = 0.45 + 0.30 − 0.18 = 0.57

(b) P(E|M) = P(E ∩ M) / P(M) = 0.18 / 0.45 = 0.40

(c) If independent: P(M) × P(E) = 0.45 × 0.30 = 0.135 ≠ 0.18 = P(M ∩ E).

Since P(M ∩ E) ≠ P(M)·P(E), Math and English enrollment are NOT independent.

✓ P(M∪E)=0.57; P(E|M)=0.40; NOT independent (0.135≠0.18)
09
Unit 4 · Expected Value of a Discrete RV

(a) Sum of probabilities: 0.30 + 0.25 + 0.35 + 0.10 = 1.00 ✓. All P(x) ≥ 0 ✓. Valid distribution.

(b) E(X) = (−5)(0.30) + (0)(0.25) + (10)(0.35) + (20)(0.10)

= −1.50 + 0 + 3.50 + 2.00 = $4.00

(c) Since E(X) = $4.00 > 0, on average a player wins $4 per game. The game is favorable for the player and is worth playing — in the long run, the player expects to profit.

✓ Probabilities sum to 1; E(X) = $4.00; game favors the player
10
Unit 4 · Binomial Distribution

(a) BINS check: Binary (correct/incorrect) ✓; Independent (each question guessed independently) ✓; Number of trials fixed (n = 10) ✓; Same p = 0.25 for each question ✓. Conditions met.

(b) P(X = 3) = C(10,3) · (0.25)³ · (0.75)⁷

= 120 · 0.015625 · 0.133484 ≈ 0.2503

There is approximately a 25.0% chance of guessing exactly 3 correct.

(c) μ = np = 10 × 0.25 = 2.5 expected correct answers.

σ = √(np(1−p)) = √(10 × 0.25 × 0.75) = √1.875 ≈ 1.369

✓ BINS verified; P(X=3) ≈ 0.2503; μ=2.5, σ≈1.369
11
Unit 5 · Central Limit Theorem

(a) Shape: Although the population is skewed right, because n = 49 ≥ 30, the Central Limit Theorem guarantees the sampling distribution of x̄ is approximately normal.

Mean: μ(x̄) = μ = 185 g

Standard Error: SE = σ/√n = 30/√49 = 30/7 ≈ 4.286 g

(b) z = (190 − 185) / 4.286 = 5/4.286 ≈ 1.167

P(x̄ > 190) = P(Z > 1.167) ≈ 1 − 0.8784 ≈ 0.122 (approximately 12.2%)

✓ x̄ ~ N(185, 4.286) by CLT; P(x̄>190) ≈ 0.122
12
Unit 5 · Sampling Distribution of p̂

(a) Conditions:

Random: random sample ✓

10% condition: n = 80 ≤ 0.10 × 2000 = 200 ✓

Large Counts: np = 80(0.65) = 52 ≥ 10 ✓; n(1−p) = 80(0.35) = 28 ≥ 10 ✓

(b) μ(p̂) = 0.65; σ(p̂) = √(0.65×0.35/80) = √(0.002844) ≈ 0.05333

(c) z = (0.60 − 0.65) / 0.05333 = −0.05 / 0.05333 ≈ −0.938

P(p̂ < 0.60) = P(Z < −0.938) ≈ 0.174 (approximately 17.4%)

✓ All conditions met; σ(p̂)≈0.0533; P(p̂<0.60)≈0.174
13
Unit 6 · One-Sample z-Interval for Proportion

p̂ = 87/150 = 0.58

(a) Conditions: Random sample ✓; 10% (150 ≤ 10% of all voters) ✓; Large Counts: np̂ = 87 ≥ 10 ✓, n(1−p̂) = 63 ≥ 10 ✓

(b) SE = √(0.58×0.42/150) = √(0.001624) ≈ 0.04030

CI = 0.58 ± 1.960 × 0.04030 = 0.58 ± 0.0790

95% CI: (0.501, 0.659)

(c) We are 95% confident that the true proportion of voters who support the ordinance is between 50.1% and 65.9%. Since the entire interval is above 0.50, this provides evidence that a majority of voters support the ordinance.

✓ p̂=0.58; 95% CI = (0.501, 0.659); majority supported
14
Unit 6 · One-Sample z-Test for Proportion

p̂ = 42/120 = 0.35

(a) H₀: p = 0.40 (40% of boxes have a prize); Hₐ: p < 0.40 (fewer than 40% have prizes) — one-sided, left-tailed.

(b) Conditions: Random ✓; 10% (120 ≤ 10% of all boxes) ✓; Large Counts under H₀: np₀=48≥10 ✓, n(1−p₀)=72≥10 ✓

(c) z = (0.35 − 0.40) / √(0.40×0.60/120) = −0.05 / √(0.002) = −0.05 / 0.04472 ≈ −1.118

P-value = P(Z < −1.12) ≈ 0.131

(d) Since P-value = 0.131 > α = 0.05, we fail to reject H₀. There is not sufficient evidence at the 5% significance level that fewer than 40% of boxes contain a prize.

✓ z≈−1.118; P-value≈0.131 > 0.05; fail to reject H₀
15
Unit 7 · One-Sample t-Interval for Mean

(a) Conditions: Random sample of 20 cereals ✓; 10% condition (20 cereals ≤ 10% of all cereals) ✓; Population approximately normal (stated) ✓

(b) SE = s/√n = 3.8/√20 = 3.8/4.472 ≈ 0.8497

CI = 14.3 ± 1.729 × 0.8497 = 14.3 ± 1.469

90% CI: (12.83, 15.77) grams

(c) We are 90% confident that the true mean sugar content of breakfast cereals is between 12.83 and 15.77 grams.

✓ 90% CI = (12.83, 15.77) grams
16
Unit 7 · Paired t-Test

(a) A paired t-test is appropriate because the measurements are not independent — each client provides both a before and after measurement. The data are naturally paired by individual.

(b) Differences (Before − After): 6, 4, 1, 6, 5, 3, 7, 2. d̄ = 34/8 = 4.25, sd = √(31.5/7) = √4.5 ≈ 2.121.

(c) H₀: μd = 0 (no change in mean resting heart rate); Hₐ: μd > 0 (mean heart rate decreases), where μd is the true mean difference (Before − After).

t = d̄ / (sd/√n) = 4.25 / (2.121/√8) = 4.25 / 0.750 ≈ 5.667, df = 7

Since the P-value is less than 0.001 < α = 0.05, we reject H₀. There is convincing evidence that the true mean resting heart rate of the gym's clients is lower after the program.

✓ d̄=4.25; t≈5.667; P<0.001 < 0.05 → reject H₀
17
Unit 8 · Chi-Square Goodness of Fit

(a) H₀: The offspring follow the 9:3:3:1 ratio (the model fits). Hₐ: The offspring do NOT follow the 9:3:3:1 ratio.

(b) All expected counts (90, 30, 30, 10) ≥ 5 ✓

χ² = (92−90)²/90 + (28−30)²/30 + (25−30)²/30 + (15−10)²/10 = 4/90 + 4/30 + 25/30 + 25/10 = 0.044 + 0.133 + 0.833 + 2.500 = 3.511

(c) Since χ² = 3.511 < χ²_critical = 7.815 (α = 0.05, df = 3), we fail to reject H₀. There is not convincing evidence that the offspring distribution differs from the predicted 9:3:3:1 ratio.

✓ χ² ≈ 3.511 < 7.815; fail to reject H₀; data consistent with 9:3:3:1
18
Unit 9 · Inference for Slope

(a) ŷ = 45.2 + 3.84x. Slope interpretation: For each additional $1,000 spent on advertising, monthly sales are predicted to increase by approximately $3,840.

(b) t = b / SEb = 3.84 / 1.12 ≈ 3.429

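(c) The output's P-value of 0.003 is two-sided; for the one-sided alternative Hₐ: β > 0 (with b > 0), the P-value is half of that, about 0.0015. Either way the P-value is less than α = 0.01, so we reject H₀: β = 0. There is convincing evidence of a positive linear relationship between advertising spend and monthly sales.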

(d) 95% CI: b ± t*·SEb = 3.84 ± 2.086(1.12) = 3.84 ± 2.336

95% CI for β: (1.504, 6.176) (thousands of dollars)

Since 0 is not in the interval, this confirms a positive linear relationship.

✓ t≈3.429; P=0.003 < 0.01; 95% CI = (1.504, 6.176)
19
Unit 7 · Two-Sample t-Test

(a) H₀: μA − μB = 0 (no difference in mean scores); Hₐ: μA − μB > 0 (Method A produces higher mean scores) — one-sided, right-tailed.

(b) Conditions: Two independent random samples ✓; Both n ≥ 30, so CLT guarantees approximate normality ✓

(c) SE = √(s²A/nA + s²B/nB) = √(9.2²/30 + 11.6²/35) = √(2.8213 + 3.8446) = √6.6659 ≈ 2.582

t = (83.4 − 78.1) / 2.582 = 5.3 / 2.582 ≈ 2.053

P-value ≈ 0.025 (one-sided, df = 29) < α = 0.05 → Reject H₀.

There is convincing evidence at the 5% level that Method A produces a higher mean exam score than Method B.

✓ t≈2.053; P≈0.025 < 0.05 → reject H₀; Method A has the higher mean score
20
Unit 8 · Chi-Square Test of Independence

(a) H₀: Sleep quality and exercise habits are independent (no association). Hₐ: Sleep quality and exercise habits are NOT independent (there is an association).

(b) Expected counts [E = (row total × col total) / n]:

• Exercises / Good Sleep: (100 × 120)/200 = 60

• Exercises / Poor Sleep: (100 × 80)/200 = 40

• No Exercise / Good Sleep: (100 × 120)/200 = 60

• No Exercise / Poor Sleep: (100 × 80)/200 = 40

All expected counts ≥ 5 ✓

(c) χ² = (72−60)²/60 + (28−40)²/40 + (48−60)²/60 + (52−40)²/40 = 144/60 + 144/40 + 144/60 + 144/40 = 2.4 + 3.6 + 2.4 + 3.6 = 12.0

(d) χ² = 12.0 > χ²_critical = 3.841 (α = 0.05, df = 1) → Reject H₀. There is convincing evidence of an association between regular exercise and sleep quality.

✓ All E≥5; χ²=12.0 > 3.841 → reject H₀; association exists