Statistics Mastery

Distribution Shapes · Scatter Plots · Correlation · Linear Regression

Distribution Shapes: Symmetric · Skewed Right · Skewed Left · Bimodal · Uniform

Distribution Shape — Key Concepts

Example Household income in the US → Skewed Right (most earn $40k–$80k, but a few billionaires create a long right tail)
Q 01 Distribution Shapes
Most students at a school score between 85 and 100 on their final exam, but a few students score very low (near 0–30). Which distribution shape best describes the exam scores?
📝 Explanation The correct answer is D — Skewed Left. Most data (high scores 85–100) is on the right side, but the tail stretches to the left where a few very low scores exist. The mean is pulled left by those outliers: Mean < Median.
Q 02 Distribution Shapes
A survey records how many minutes people exercise per week. Most people don't exercise at all (0 min), but some exercise up to 600 minutes. What is the likely shape?
📝 Explanation A — Skewed Right. Most people cluster at 0 (the left), and the tail extends far to the right for the heavy exercisers. This is the classic skewed-right pattern: Mean > Median > Mode.
Q 03 Distribution Shapes
A music festival attracts both young adults (ages 18–25) and middle-aged fans (ages 45–55) in roughly equal numbers, with few attendees in between. What distribution shape does this represent?
📝 Explanation B — Bimodal. Two separate peaks exist — one around age 20, one around age 50 — with a valley in between. Bimodal distributions signal two distinct sub-groups in the data.
Q 04 Distribution Shapes — Mean vs Median
For a dataset that is skewed right, which of the following is typically true about the relationship between the mean and median?
📝 Explanation C — Mean > Median. In a right-skewed distribution, the extreme high values in the right tail pull the mean upward more than the median (which is resistant to outliers). Key rule: Skewed Right → Mean > Median | Skewed Left → Mean < Median.
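The key rule above can be checked on small samples. The sketch below uses two made-up datasets (hypothetical values, chosen only to mimic the exercise-minutes and exam-score scenarios from Q 01 and Q 02) and compares the mean with the median using Python's standard library.

```python
from statistics import mean, median

# Hypothetical weekly exercise minutes: many zeros, a few large values (skewed right).
right_skewed = [0, 0, 0, 0, 10, 20, 30, 60, 300, 600]
# Hypothetical exam scores: mostly high, a few very low (skewed left).
left_skewed = [5, 20, 85, 88, 90, 92, 95, 97, 99, 100]

def compare(data):
    """Describe where the mean sits relative to the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "mean > median"
    if m < med:
        return "mean < median"
    return "mean = median"

print(compare(right_skewed))  # long right tail pulls the mean up
print(compare(left_skewed))   # long left tail pulls the mean down
```

The comparison is a rule of thumb for typical skewed data, not a guarantee for every possible dataset.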
Q 05 Distribution Shapes
A teacher assigns a fair six-sided die roll as each student's grade (1–6). Every outcome is equally likely. Which distribution shape describes the grades?
📝 Explanation D — Uniform. Each value (1 through 6) has exactly 1/6 probability. The histogram is perfectly flat with no peaks or tails. A uniform distribution has no mode.
Scatter Plots & Correlation: Direction · Strength · Outliers · Correlation Coefficient r

Scatter Plot Correlation — Key Concepts

Example Hours studied (x) vs test score (y) → Strong positive correlation (r ≈ 0.92)
Q 06 Correlation Direction
A scatter plot shows that as the number of absences from school increases, the student's final grade decreases. What type of correlation is this?
📝 Explanation B — Negative Correlation. As x (absences) goes up, y (grade) goes down. The pattern slopes downward. This is negative (inverse) correlation. The r-value would be negative (e.g., r ≈ −0.85).
Q 07 Correlation Coefficient r
Which value of the correlation coefficient r indicates the strongest linear relationship?
📝 Explanation D — r = −0.95. Strength is measured by |r|. Comparing: |0.75| = 0.75, |−0.50| = 0.50, |0.10| = 0.10, |−0.95| = 0.95 — the largest. The sign only indicates direction, not strength.
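The comparison of |r| values can be written in a couple of lines. This is a minimal sketch; the list name `candidates` is just an illustrative label for the four values compared in the explanation.

```python
# The four r values compared in the explanation above.
candidates = [0.75, -0.50, 0.10, -0.95]

# Strength depends on the absolute value, so rank by abs() rather than by the raw number.
strongest = max(candidates, key=abs)
print(strongest)
```

Using `key=abs` keeps the sign of the winning value, so the direction is still visible after the strength comparison.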
Q 08 Correlation vs Causation
A study finds a strong positive correlation (r = 0.91) between ice cream sales and drowning rates. What is the best conclusion?
📝 Explanation C — Lurking variable. Hot weather increases both ice cream sales AND swimming (thus drowning risk). This is the classic example: Correlation does not imply causation. A lurking (confounding) variable drives both variables simultaneously.
Q 09 Scatter Plot — Outliers
In a scatter plot showing hours of study vs. quiz score, most points follow a strong positive trend, but one student studied 1 hour and scored 98. This point is called a(n):
📝 Explanation B — Outlier. A point that does not follow the general pattern of the rest of the data is an outlier. It falls far from the regression line and can significantly affect the correlation coefficient r and the line of best fit.
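To see how much one outlier can move r, the sketch below computes the Pearson correlation with and without a point like the one described (1 hour of study, score of 98). The other six data points are hypothetical, invented only to form a strong positive trend, and `pearson_r` is a helper written for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient from the deviation sums."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical study hours and quiz scores with a strong positive trend.
hours = [2, 3, 4, 5, 6, 7]
scores = [60, 66, 69, 76, 79, 86]

r_without = pearson_r(hours, scores)
# Add the outlier from the question: 1 hour of study, score of 98.
r_with = pearson_r(hours + [1], scores + [98])

print(round(r_without, 3), round(r_with, 3))
```

With this made-up data, the single outlier is enough to pull r from close to 1 down to near 0.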
Q 10 Interpreting r²
A linear regression for sleep hours vs. quiz score gives r = 0.98. What percentage of the variation in quiz scores is explained by the number of hours of sleep?
📝 Explanation C — 96.04%. The coefficient of determination is r² = (0.98)² = 0.9604 = 96.04%. This means 96.04% of the variability in quiz scores can be explained by sleep hours. The remaining 3.96% is due to other factors.
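The same arithmetic as a short sketch (variable names are illustrative):

```python
r = 0.98

# Coefficient of determination: square the correlation coefficient.
r_squared = r ** 2

# Express as percentages, rounded to two decimals to hide floating-point noise.
percent_explained = round(r_squared * 100, 2)
percent_unexplained = round(100 - r_squared * 100, 2)

print(percent_explained, percent_unexplained)
```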
Line of Best Fit (Linear Regression): Equation · Slope · Intercept · Prediction

Linear Regression — Key Concepts

Example — Sleep & Quiz Score Data: (1,52) (2,58) (3,63) (4,67) (5,74) (6,79) (7,84) (8,88)
Line of best fit:
ŷ = 5.20x + 47.21
Slope = 5.20 → each extra hour of sleep predicts 5.20 more points on the quiz.
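The line of best fit above can be reproduced from the eight data points. The sketch below uses the standard least-squares formulas (slope = Sxy / Sxx, intercept = ȳ − slope · x̄); the function name `fit_line` is an illustrative choice, not something from the quiz.

```python
# Sleep hours (x) and quiz scores (y) from the example above.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [52, 58, 63, 67, 74, 79, 84, 88]

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))
```

Rounded to two decimals, the result matches the coefficients in ŷ = 5.20x + 47.21.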
Q 11 Interpreting Slope
The line of best fit for sleep hours (x) and quiz score (y) is ŷ = 5.20x + 47.21. What is the correct interpretation of the slope 5.20?
📝 Explanation B. The slope always means: "for each 1-unit increase in x, y changes by [slope] units." Template: "For each additional [x-unit], the predicted [y] increases/decreases by [|slope|]."
Q 12 Prediction Using Regression
Using the equation ŷ = 5.20x + 47.21, predict the quiz score for a student who slept 6 hours.
📝 Explanation C — 78.41. Substitute x = 6:
ŷ = 5.20(6) + 47.21 = 31.20 + 47.21 = 78.41
The actual data value at x = 6 is 79, so our prediction is very close!
Q 13 Y-Intercept Interpretation
In the equation ŷ = 5.20x + 47.21, what does the y-intercept 47.21 represent?
📝 Explanation A. The y-intercept is always the predicted y when x = 0. Here: a student sleeping 0 hours is predicted to score 47.21. Note: this may not be realistic (extrapolation below data range), but that is the mathematical interpretation.
Q 14 Residuals
A student slept 4 hours and scored 67 on the quiz. The regression equation ŷ = 5.20x + 47.21 predicts 68.01. What is the residual?
📝 Explanation C — −1.01. Residual = Actual − Predicted:
e = y − ŷ = 67 − 68.01 = −1.01
A negative residual means the actual value is below the regression line — the model slightly overestimated.
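As a quick check, the prediction and residual can be computed together. This sketch hard-codes the rounded coefficients from the quiz equation; `predict` is an illustrative helper name.

```python
def predict(x, slope=5.20, intercept=47.21):
    """Predicted quiz score from the quiz's regression equation."""
    return slope * x + intercept

actual = 67          # observed score for the student who slept 4 hours
predicted = predict(4)

# Residual = actual - predicted; negative means the point lies below the line.
residual = actual - predicted
print(round(predicted, 2), round(residual, 2))
```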
Q 15 Extrapolation Warning
The data for sleep vs. quiz score covers x = 1 to 8 hours. A teacher uses the model ŷ = 5.20x + 47.21 to predict the score for 20 hours of sleep. This is an example of:
📝 Explanation B — Extrapolation. x = 20 is far outside the observed range of 1–8. Using the regression line here is unreliable because we don't know if the linear relationship continues. Predicted score would be 5.20(20) + 47.21 = 151.21 — impossible on a 100-point quiz!
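One way to guard against accidental extrapolation is to check x against the observed range before trusting a prediction. The sketch below is only an illustration; the helper names and the idea of returning a flag alongside the prediction are assumptions, not part of the quiz.

```python
X_MIN, X_MAX = 1, 8  # observed range of sleep hours in the data

def predict(x):
    """Predicted quiz score from y-hat = 5.20x + 47.21."""
    return 5.20 * x + 47.21

def in_observed_range(x):
    """True when x lies inside the range the model was fitted on."""
    return X_MIN <= x <= X_MAX

for x in (6, 20):
    label = "interpolation" if in_observed_range(x) else "extrapolation"
    print(x, round(predict(x), 2), label)
```

For x = 20 the formula still returns a number, which is why the range check has to be done separately.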
Mixed Challenge Problems: Combining Concepts · High Difficulty
Q 16 Reading Scatter Plot Data
Use the table below. A student wants to find the mean quiz score.
Hours of Sleep (x) |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8
Quiz Score (y)     | 52 | 58 | 63 | 67 | 74 | 79 | 84 | 88
What is the mean quiz score?
📝 Explanation C — 70.625.
ȳ = (52+58+63+67+74+79+84+88) ÷ 8 = 565 ÷ 8 = 70.625
Q 17 Matching r to Scatter Plot Description
A scatter plot of temperature (°F) vs. hot cocoa sales shows points scattered with no clear pattern — the points look like a random cloud. Which r value fits best?
📝 Explanation B — r = 0.05. A random cloud with no pattern means no linear correlation → r ≈ 0. r = 0.05 is closest to zero. (Note: in reality, temperature and hot cocoa sales have a strong negative correlation — but the question describes a random cloud specifically.)
Q 18 Regression Line — New Scenario
A regression line for advertising spend (x, in $thousands) vs. sales revenue (y, in $thousands) is ŷ = 8.5x + 12. A company spends $5,000 on advertising. What is the predicted revenue?
📝 Explanation B — $54,500. x = 5 (thousands). Substitute:
ŷ = 8.5(5) + 12 = 42.5 + 12 = 54.5 thousand = $54,500
Don't forget: x = 5 because the units are already in $thousands.
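The unit conversion is the step most likely to go wrong, so it helps to make it explicit. In this sketch the function name and the choice to accept and return plain dollars are illustrative assumptions.

```python
def predicted_revenue_dollars(ad_spend_dollars):
    """Apply y-hat = 8.5x + 12, where x and y-hat are both in $thousands."""
    x = ad_spend_dollars / 1000          # convert dollars to $thousands
    y_hat_thousands = 8.5 * x + 12       # prediction in $thousands
    return y_hat_thousands * 1000        # convert back to dollars

print(predicted_revenue_dollars(5000))
```

Plugging in x = 5000 directly would give 42,512 thousand dollars, which is the mistake the explanation warns about.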
Q 19 Distribution Shape from Statistics
A dataset has Mean = 45, Median = 62, and Mode = 70. What is the most likely shape of this distribution?
📝 Explanation C — Skewed Left. Rule: Mean < Median < Mode → Skewed Left. Here 45 < 62 < 70 confirms the distribution is skewed left (the mean is dragged down by low outliers in the left tail).
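The rule can be sketched as a small helper. This is a rule of thumb based only on the mean and median comparison used in this quiz, and `likely_shape` is a hypothetical name; it is not a formal test of skewness.

```python
def likely_shape(mean, median):
    """Guess the skew direction from the mean/median comparison."""
    if mean < median:
        return "skewed left"
    if mean > median:
        return "skewed right"
    return "roughly symmetric"

print(likely_shape(45, 62))
```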
Q 20 Comprehensive — All Topics
A researcher finds r = −0.87 between stress level (x) and sleep quality (y). Which pair of statements is BOTH correct?
📝 Explanation B. |r| = 0.87 → strong (≥ 0.8). Negative sign → as stress increases, sleep quality decreases (negative direction). We say "tends to predict" — not "causes" — because correlation ≠ causation. Option D is wrong because it reverses the cause-effect claim incorrectly.
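Reading strength and direction off r can be sketched as below. The 0.8 cutoff for "strong" comes from the explanation above; the label "not strong" and the function name `describe_r` are placeholders chosen for this illustration.

```python
def describe_r(r, strong_cutoff=0.8):
    """Return (strength, direction) labels for a correlation coefficient."""
    strength = "strong" if abs(r) >= strong_cutoff else "not strong"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(describe_r(-0.87))
```

The labels describe association only; nothing in r says which variable, if either, causes the other.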
