Unit 01
Exploring Data
A dataset of exam scores has a mean of 72 and median of 68.
Which of the following best describes the distribution?
Memory Key
MEAN follows the TAIL
→ the mean is pulled toward extreme values (the tail).
Explanation
When the mean (72) > median (68), the distribution is right-skewed (positively skewed). A few high outlier values pull the mean up toward the right tail. Think: mean chases the tail. Left skew = mean < median.
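To check the rule of thumb, the sketch below uses a small hypothetical set of exam scores (invented for illustration, not taken from the question) with a long right tail. It compares the mean and median using Python's standard `statistics` module. Mean > median hints at right skew but does not prove it; a plot is still the best check.

```python
from statistics import mean, median

# Hypothetical exam scores: most cluster in the 60s and 70s, two high scores form a right tail.
scores = [58, 62, 64, 66, 68, 68, 70, 71, 95, 98]

m = mean(scores)
med = median(scores)
print(f"mean = {m}, median = {med}")

if m > med:
    print("mean > median: consistent with right (positive) skew")
elif m < med:
    print("mean < median: consistent with left (negative) skew")
else:
    print("mean = median: consistent with a roughly symmetric shape")
```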
The five-number summary of a dataset is: Min = 10, Q1 = 25, Median = 40, Q3 = 60, Max = 95.
An observation is considered an outlier if it falls below the lower fence or above the upper fence. The upper fence is: \[ \text{Upper Fence} = Q3 + 1.5 \times IQR \] What is the upper fence?
Memory Key
IQR = Q3 − Q1 · FENCES = Q1 − 1.5 × IQR and Q3 + 1.5 × IQR
Explanation
IQR = Q3 − Q1 = 60 − 25 = 35. Upper Fence = Q3 + 1.5 × IQR = 60 + 1.5(35) = 60 + 52.5 = 112.5.
For comparison, the lower fence is Q1 − 1.5 × IQR = 25 − 52.5 = −27.5. Since Min = 10 and Max = 95 both lie between the fences, this dataset has no outliers. A value of 97.5 comes from mistakenly using 25 (Q1) as the IQR: 60 + 1.5(25) = 97.5. That is a trap, not the upper fence. If 112.5 does not appear among the answer choices, recheck the arithmetic; the formula is always Q3 + 1.5 × IQR.
IQR = Q3 − Q1 = 35 | Upper Fence = 60 + 52.5 = 112.5
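The fence arithmetic can be sketched in a few lines of Python. The function name `fences` and its `k` parameter are illustrative choices, not part of the question. It applies the 1.5 × IQR rule to the five-number summary above and checks whether Min or Max falls outside the fences.

```python
def fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences using the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Five-number summary from the question.
minimum, q1, median, q3, maximum = 10, 25, 40, 60, 95
lower, upper = fences(q1, q3)
print(lower, upper)  # -27.5 112.5

# Only the extremes can be outliers, so checking Min and Max is enough.
has_outliers = minimum < lower or maximum > upper
print(has_outliers)  # False
```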
Unit 02
Normal Distribution & z-Scores
Heights of adult males are normally distributed with \(\mu = 70\) inches and \(\sigma = 3\) inches.
What is the z-score for a man who is 64 inches tall?
\[ z = \frac{x - \mu}{\sigma} \]
Memory Key
z = (VALUE − MEAN) ÷ SD
→ z tells you how many SDs a value lies from the mean.
Explanation
z = (64 − 70) / 3 = −6 / 3 = −2
A z-score of −2 means the man's height is 2 standard deviations below the mean. Negative z = below the mean; positive z = above the mean.
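The z-score formula translates directly into a small function. The name `z_score` is illustrative. The sign of the result tells you which side of the mean the value is on.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(z_score(64, 70, 3))  # -2.0  (2 SDs below the mean)
print(z_score(76, 70, 3))  # 2.0   (2 SDs above the mean)
```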
IQ scores are normally distributed with \(\mu = 100\) and \(\sigma = 15\).
Using the Empirical Rule (68-95-99.7), approximately what percentage of people have IQ scores between 70 and 130?
Memory Key
68 – 95 – 99.7 → 1σ – 2σ – 3σ
Explanation
70 to 130 is μ ± 2σ (100 ± 30). The Empirical Rule states that approximately 95% of data falls within 2 standard deviations of the mean. By comparison, 85–115 (1σ) would contain about 68%, and 55–145 (3σ) about 99.7%.
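The Empirical Rule is a rounded version of exact Normal probabilities. The sketch below uses `statistics.NormalDist` from the Python standard library to compute the share of a Normal(100, 15) distribution within 1, 2, and 3 SDs of the mean.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

for k in (1, 2, 3):
    lo, hi = 100 - k * 15, 100 + k * 15
    share = iq.cdf(hi) - iq.cdf(lo)   # P(lo < X < hi)
    print(f"{lo}-{hi}: {share:.4f}")
```

The printed shares come out to about 0.6827, 0.9545, and 0.9973, which round to the familiar 68, 95, and 99.7.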
Unit 03
Correlation & Least-Squares Regression
A researcher finds that the correlation between study hours and exam scores is \(r = 0.87\).
Which statement is most accurate?
Memory Key
r ∈ [−1, +1] · CORRELATION ≠ CAUSATION
Explanation
r = 0.87 indicates a strong, positive, linear relationship. Key traps: (A) correlation ≠ causation, (B) r is not a percentage of people, (D) slope ≠ r. The regression slope is \(b = r \cdot \frac{s_y}{s_x}\), so it also depends on the standard deviations of x and y.
A least-squares regression line has \(r = 0.6\).
What does \(r^2 = 0.36\) tell us?
Memory Key
R² = "variation EXPLAINED by x"
→ always say "% of variation in y explained by x"
Explanation
R² (the coefficient of determination) is the proportion of the variability in y that is accounted for by the least-squares regression line on x. Always state it in context: "36% of the variation in y is explained by the linear relationship with x." The remaining 64% is not explained by the line (other variables and random variation).
A regression line predicts a student's score as 78, but the actual score is 85.
What is the residual, and what does it mean?
\[ \text{Residual} = \text{Actual} - \text{Predicted} \]
Memory Key
Residual = ACTUAL − PREDICTED (A minus P)
→ Positive residual = model UNDERestimated
Explanation
Residual = 85 − 78 = +7. A positive residual means the actual value is higher than predicted, so the model underestimated. A negative residual means the model overestimated.
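The sign convention is easy to get backwards, so here it is as a tiny function (the name `residual` is illustrative): actual minus predicted, with the sign telling you which way the model missed.

```python
def residual(actual, predicted):
    """Residual = actual - predicted; positive means the model underestimated."""
    return actual - predicted

e = residual(85, 78)
print(e)  # 7
if e > 0:
    print("model underestimated")
elif e < 0:
    print("model overestimated")
else:
    print("prediction was exact")
```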
Unit 04
Designing Studies
A researcher wants to study whether a new drug reduces blood pressure.
She randomly assigns 200 patients to receive either the drug or a placebo.
This study is best described as:
Memory Key
EXPERIMENT = RANDOM ASSIGNMENT of treatment
→ only experiments can establish causation!
Explanation
The key phrase is "randomly assigns." Any study that randomly assigns subjects to treatment conditions is an experiment, which allows causal conclusions. Observational studies only observe; they never impose a treatment.
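Random assignment is a mechanical step, sketched below with Python's `random` module. The patient IDs 1 to 200 are hypothetical labels, and the 100/100 split is an assumption (the question does not say the groups are equal). A fixed seed makes the split reproducible.

```python
import random

# Hypothetical patient IDs 1..200.
patients = list(range(1, 201))

rng = random.Random(2024)   # fixed seed so the assignment can be reproduced
rng.shuffle(patients)       # random order removes any systematic pattern

drug_group = sorted(patients[:100])      # first 100 after shuffling get the drug
placebo_group = sorted(patients[100:])   # the rest get the placebo
print(len(drug_group), len(placebo_group))  # 100 100
```

Because chance alone decides who gets the drug, other differences between patients tend to balance out across the two groups, which is what supports a causal conclusion.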
A school has 400 freshmen, 350 sophomores, 300 juniors, and 250 seniors.
A researcher selects students proportionally from each grade level.
This sampling method is called:
Memory Key
STRATA → STRATIFIED · WHOLE GROUPS → CLUSTER · EVERY nth → SYSTEMATIC
Explanation
When you divide a population into groups (strata) and randomly sample from each group proportionally, it's stratified random sampling. Cluster sampling selects entire groups randomly. Systematic picks every nth person. Convenience uses whoever is available.
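Proportional stratified sampling has two steps: split the total sample size across strata in proportion to their sizes, then take a simple random sample inside each stratum. The sketch assumes a total sample of n = 130 (not given in the question) and treats the fourth group of 250 as the seniors. The student IDs and the helper name `proportional_allocation` are hypothetical.

```python
import random

def proportional_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to their sizes.
    Rounding can make the parts sum to slightly more or less than n."""
    total = sum(strata_sizes.values())
    return {name: round(n * size / total) for name, size in strata_sizes.items()}

grades = {"freshmen": 400, "sophomores": 350, "juniors": 300, "seniors": 250}
allocation = proportional_allocation(grades, n=130)  # n = 130 is an assumption
print(allocation)  # {'freshmen': 40, 'sophomores': 35, 'juniors': 30, 'seniors': 25}

# Simple random sample within each stratum (hypothetical student IDs 0..size-1).
rng = random.Random(7)
sample = {g: rng.sample(range(size), allocation[g]) for g, size in grades.items()}
```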
Unit 05
Probability
In a class, 40% of students play sports, 30% play music, and 15% do both.
Given that a student plays sports, what is the probability they also play music?
\[ P(B \mid A) = \frac{P(A \cap B)}{P(A)} \]
Memory Key
P(B|A) = "BOTH" ÷ "GIVEN (A)"
→ reduce the sample space to A, then find B within it.
Explanation
P(Music | Sports) = P(Both) / P(Sports) = 0.15 / 0.40 = 0.375
15% of all students play both and 40% play sports, so among sports players the share who also play music is 15/40 = 37.5%.
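The conditional-probability formula is a single division; the helper name `conditional` is illustrative. Dividing by P(A) is what "reduces the sample space to A."

```python
def conditional(p_both, p_given):
    """P(B | A) = P(A and B) / P(A)."""
    return p_both / p_given

p = conditional(p_both=0.15, p_given=0.40)
print(round(p, 3))  # 0.375
```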
Events A and B are independent if and only if:
\[ P(A \cap B) = P(A) \cdot P(B) \]
If P(A) = 0.4, P(B) = 0.5, and P(A ∩ B) = 0.20, are A and B independent?
Memory Key
INDEPENDENT: P(A ∩ B) = P(A)·P(B) · MUTUALLY EXCLUSIVE: P(A ∩ B) = 0
→ these are completely different concepts!
Explanation
Check: P(A) × P(B) = 0.4 × 0.5 = 0.20 = P(A ∩ B), so A and B are independent. Note: mutually exclusive events with positive probabilities are never independent, a common trap!
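The two definitions can be checked side by side. The helper names are illustrative, and `math.isclose` is used because products of decimal probabilities are not always exact in floating point. The last line shows the trap: with P(A ∩ B) = 0, the same P(A) and P(B) fail the independence test.

```python
import math

def independent(p_a, p_b, p_a_and_b):
    """A and B are independent iff P(A and B) = P(A) * P(B) (up to float rounding)."""
    return math.isclose(p_a * p_b, p_a_and_b)

def mutually_exclusive(p_a_and_b):
    """A and B are mutually exclusive iff they cannot happen together."""
    return p_a_and_b == 0

print(independent(0.4, 0.5, 0.20))   # True
print(mutually_exclusive(0.20))      # False
print(independent(0.4, 0.5, 0.0))    # False: mutually exclusive, so not independent
```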
Unit 06
Random Variables & Distributions
A game pays $10 with probability 0.3, $0 with probability 0.5, and loses $5 with probability 0.2.
What is the expected value of the game?
\[ E(X) = \sum x_i \cdot P(x_i) \]
Memory Key
E(X) = Σ(VALUE × PROBABILITY) → long-run average
Explanation
E(X) = 10(0.3) + 0(0.5) + (−5)(0.2) = 3 + 0 − 1 = $2.00
On average, you win $2 per game in the long run. This is the long-run average, not what you'll win on any single play.
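The sketch below computes the expected value exactly and then simulates many plays to show the "long-run average" idea. It uses `random.Random.choices` with the game's probabilities as weights; the seed and the 100,000 plays are arbitrary choices.

```python
import random

outcomes = [10, 0, -5]
probs = [0.3, 0.5, 0.2]

# Exact expected value: sum of value x probability.
ev = sum(x * p for x, p in zip(outcomes, probs))
print(round(ev, 2))  # 2.0

# Long-run average of many simulated plays (fixed seed for reproducibility).
rng = random.Random(1)
plays = rng.choices(outcomes, weights=probs, k=100_000)
print(round(sum(plays) / len(plays), 2))  # close to 2, but not exactly 2
```

No single play ever pays $2; only the average over many plays settles near it.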
Which conditions must be satisfied for a situation to follow a Binomial distribution?
Select the answer that lists all four required conditions correctly.
Memory Key
BINS → Binary · Independent · Number fixed · Same probability
Explanation
The four BINS conditions: Binary outcomes, Independent trials, Number of trials fixed (n), Same probability of success (p) on each trial. When these hold: μ = np and σ = √(np(1 − p)).
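Once the BINS conditions hold, the binomial formulas follow directly. The sketch assumes n = 10 trials with p = 0.7 (hypothetical values, echoing the free-throw example). It uses `math.comb` (Python 3.8+) for the probability mass function and checks that the probabilities sum to 1.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.7  # hypothetical: 10 free throws, each made with probability 0.7
mu = n * p
sigma = sqrt(n * p * (1 - p))
print(round(mu, 4), round(sigma, 4))  # mean np = 7, SD sqrt(2.1)

total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(round(total, 10))  # 1.0
```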
A basketball player makes each free throw independently with probability 0.7.
What is the probability that the first miss occurs on the 3rd shot?
(Geometric: \( P(X = k) = (1-p)^{k-1} \cdot p \), where p = P(success on each trial). Here a "success" is a miss, so p = 0.3.)
Memory Key
GEOMETRIC = "first success" · BINOMIAL = "exactly k successes in n"
Explanation
First miss on 3rd shot: must MAKE shots 1 and 2, then MISS on 3.
P = (0.7)² × (0.3) = 0.49 × 0.3 = 0.147
Make, Make, Miss → (0.7)(0.7)(0.3) = 0.147. Using the geometric formula with p = P(miss) = 0.3: P(X = 3) = (1 − 0.3)² · (0.3) = 0.147.
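The geometric formula as a function (the name `geometric_pmf` is illustrative). The only subtle step is defining "success" as the event you are waiting for, here a miss.

```python
def geometric_pmf(k, p):
    """P(first success occurs on trial k) = (1 - p)^(k - 1) * p."""
    return (1 - p) ** (k - 1) * p

p_miss = 1 - 0.7  # here a "success" is a miss
print(round(geometric_pmf(3, p_miss), 3))  # 0.147
```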
Unit 07
Sampling Distributions & CLT
A population has mean \(\mu = 50\) and standard deviation \(\sigma = 20\).
A random sample of \(n = 100\) is taken. What is the standard error of the sample mean \(\bar{x}\)?
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
Memory Key
Standard Error = σ ÷ √n · bigger n → smaller SE → more precise
Explanation
SE = σ / √n = 20 / √100 = 20 / 10 = 2
The standard error measures how much the sample mean typically varies from sample to sample. Larger n → smaller SE → sample means cluster more tightly around μ. By the CLT, the sampling distribution of x̄ is approximately normal for large n.
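The formula σ/√n can be checked by simulation. The sketch assumes a Normal population with μ = 50 and σ = 20 (the question gives only μ and σ, not the shape), draws 2,000 samples of size 100 with `random.Random.gauss`, and measures the spread of the sample means.

```python
import random
from math import sqrt
from statistics import stdev

mu, sigma, n = 50, 20, 100
print(sigma / sqrt(n))  # 2.0, the theoretical standard error

# Simulation check with an assumed Normal(50, 20) population.
rng = random.Random(42)
sample_means = [
    sum(rng.gauss(mu, sigma) for _ in range(n)) / n   # mean of one sample of size n
    for _ in range(2000)
]
print(round(stdev(sample_means), 2))  # close to 2.0
```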
The Central Limit Theorem guarantees that the sampling distribution of \(\bar{x}\) is approximately Normal when:
Memory Key
CLT kicks in at n ≥ 30 (for skewed populations) · Normal population → any n works
Explanation
The sampling distribution of x̄ is approximately Normal when either: (1) the population is itself Normal (any sample size; x̄ is then exactly Normal), or (2) n ≥ 30 for moderately skewed populations (larger n for heavily skewed ones), which is the case the CLT itself covers. Answer D is the most complete. Answer B is partially correct but misses condition (1).
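To see condition (2) in action, the sketch below draws samples of size 30 from a strongly right-skewed population. The choice of an exponential population with mean 1 and SD 1 is an assumption for illustration. It then checks that the sample means center on 1, have spread close to 1/√30, and land within 1.96 standard errors about 95% of the time, as a Normal shape predicts.

```python
import random
from math import sqrt
from statistics import mean, stdev

# Assumed skewed population: exponential with mean 1 and SD 1.
rng = random.Random(0)
n, reps = 30, 5000

# Each entry is the mean of one random sample of size n.
sample_means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

print(round(mean(sample_means), 3))   # near the population mean, 1
print(round(stdev(sample_means), 3))  # near 1 / sqrt(30), about 0.183

# Share of sample means within 1.96 standard errors of 1; a Normal shape predicts about 0.95.
se = 1 / sqrt(n)
share = sum(abs(m - 1) < 1.96 * se for m in sample_means) / reps
print(round(share, 3))
```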
Unit 08
Inference: Confidence Intervals & Hypothesis Testing
A 95% confidence interval for a population mean is calculated as (42, 58).
Which interpretation is correct?
Memory Key
CI = "WE are 95% confident the TRUE mean is in (42, 58)" → NOT about individual values!
Explanation
The correct interpretation is C. The true mean μ is fixed, not random, so we cannot say there is a 95% probability that μ lies in (42, 58). Instead, if we repeated this process many times, about 95% of the resulting intervals would capture μ. For a single computed interval, we say we are "95% confident" it contains μ.
⚠️ Answer A is the most common wrong answer on AP exams.
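The "repeat the process many times" interpretation can be simulated. The sketch assumes a Normal population with known σ (μ = 50, σ = 20, n = 100, all hypothetical) so that a z-interval applies. It builds 2,000 intervals and counts how many capture the true μ, using `statistics.NormalDist().inv_cdf` for the critical value.

```python
import random
from math import sqrt
from statistics import NormalDist

# Assumed setup: Normal population with known sigma, so a z-interval applies.
mu, sigma, n, reps = 50, 20, 100, 2000
z_star = NormalDist().inv_cdf(0.975)   # about 1.96 for 95% confidence
margin = z_star * sigma / sqrt(n)

rng = random.Random(3)
hits = 0
for _ in range(reps):
    x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
    if x_bar - margin <= mu <= x_bar + margin:   # did this interval capture mu?
        hits += 1

coverage = hits / reps
print(coverage)  # close to 0.95
```

Each individual interval either contains μ or it does not; the 95% describes the long-run capture rate of the method.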
A hypothesis test yields a p-value of 0.03. The significance level is \(\alpha = 0.05\).
What is the correct conclusion?
Memory Key
p < α → REJECT H₀ · p > α → FAIL to reject H₀ · "If the p is low, H₀ must go!"
Explanation
Since p = 0.03 < α = 0.05, we reject H₀. The p-value means: if H₀ were true, there would be only a 3% chance of observing results at least this extreme, which is unlikely enough to reject H₀. We NEVER "accept" H₀ or Hₐ; we only reject or fail to reject H₀.
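The decision rule is a single comparison. The function name `decide` is illustrative, and it follows the Memory Key above (reject when p < α). Note that there is no "accept H₀" branch. The second call shows that the same p-value can lead to a different conclusion at a stricter α.

```python
def decide(p_value, alpha=0.05):
    """Compare the p-value to alpha; the only outcomes are reject or fail to reject H0."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.03))              # reject H0
print(decide(0.03, alpha=0.01))  # fail to reject H0
```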
A pharmaceutical company tests whether a new drug works. The null hypothesis is "the drug has no effect."
They reject H₀, but in reality the drug has no effect.
This is an example of:
Memory Key
TYPE I = False POSITIVE (reject a true H₀) · TYPE II = False NEGATIVE (fail to reject a false H₀)
→ α = P(Type I Error)
Explanation
Type I Error: rejecting a true H₀. Here, H₀ ("no effect") is true, but we rejected it: a false positive. P(Type I Error) = α. Type II Error: failing to reject a false H₀ (missing a real effect). Power = 1 − P(Type II Error), the probability of correctly detecting a real effect.
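The claim α = P(Type I Error) can be checked by simulating many tests in which H₀ really is true. The setup is assumed for illustration: H₀: μ = 0, data drawn from Normal(0, 1) with σ known, n = 30, two-sided one-sample z-test, α = 0.05. Every rejection in this simulation is a Type I error.

```python
import random
from math import sqrt
from statistics import NormalDist

# Assumed setup: H0 says mu = 0, and H0 really is true.
alpha, n, reps = 0.05, 30, 4000
std_normal = NormalDist()
rng = random.Random(11)

false_positives = 0
for _ in range(reps):
    x_bar = sum(rng.gauss(0, 1) for _ in range(n)) / n
    z = x_bar / (1 / sqrt(n))                     # one-sample z statistic, sigma = 1 known
    p_value = 2 * (1 - std_normal.cdf(abs(z)))    # two-sided p-value
    if p_value < alpha:
        false_positives += 1                      # rejected a true H0: Type I error

print(false_positives / reps)  # close to alpha = 0.05
```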
A chi-square test for goodness-of-fit is used to determine whether:
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
where O = observed count and E = expected count.
What does a large χ² value indicate?
Memory Key
BIG χ² → BIG difference between Observed & Expected → small p-value → reject H₀
Explanation
A large χ² means the observed counts are far from the expected counts, which is unlikely under H₀ → small p-value → reject H₀. The formula squares the differences (so positive and negative differences don't cancel) and divides by E (so each gap is measured relative to the count expected in that category). A χ² near 0 means observed ≈ expected, so the data are consistent with H₀.
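The χ² statistic is a direct sum over categories. The counts below are hypothetical (100 trials spread over 4 categories that H₀ says are equally likely), and `chi_square_stat` is an illustrative name. A perfect fit gives 0; larger gaps give a larger statistic.

```python
def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [25, 25, 25, 25]   # hypothetical: 100 trials, 4 equally likely categories
print(round(chi_square_stat([25, 25, 25, 25], expected), 2))  # 0.0, perfect fit
print(round(chi_square_stat([18, 22, 30, 30], expected), 2))  # 4.32, larger gap from expected
```

Turning the statistic into a p-value needs the chi-square distribution with k − 1 = 3 degrees of freedom, for example `scipy.stats.chi2.sf(stat, 3)` if SciPy is available; the standard library does not provide it.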