Statistics
Box Plots & Correlation

Master the concepts, learn from examples, and conquer exam questions — with instant feedback

BOX-AND-WHISKER · QUARTILES & IQR · OUTLIER TEST · CORRELATION · REGRESSION · SPEARMAN'S rₛ · CASIO fx GUIDE

📚 Core Concepts & Examples

Study these before attempting the questions.

📦 Concept 1 — Box-and-Whisker Diagrams
DEFINITION A box-and-whisker diagram (box plot) displays the five-number summary of a dataset:
Minimum · Q1 (lower quartile) · Median (Q2) · Q3 (upper quartile) · Maximum
FIVE-NUMBER SUMMARY \[\text{Min} \quad Q_1 \quad \text{Median} \quad Q_3 \quad \text{Max}\] \[\text{IQR} = Q_3 - Q_1\]

How to read a box plot:

  • The box spans from Q1 to Q3 — this contains the middle 50% of data.
  • The vertical line inside the box marks the median (Q2).
  • The whiskers extend to the smallest and largest values that are not outliers.
  • Any point beyond the outlier fences is shown as an isolated dot.
WORKED EXAMPLE
Data (ordered): 0.24, 0.26, 0.27, 0.28, 0.29, 0.31, 0.35, 0.42, 0.46
Solution:
n = 9, so:
• Median = 5th value = 0.29
• Q1 = median of lower half (0.24,0.26,0.27,0.28) = (0.26+0.27)/2 = 0.265
• Q3 = median of upper half (0.31,0.35,0.42,0.46) = (0.35+0.42)/2 = 0.385
• IQR = 0.385 − 0.265 = 0.12
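The same working can be checked in code. The sketch below is a minimal illustration, assuming the median-of-halves rule used in the solution above (for odd n the middle value is left out of both halves); the function name `five_number_summary` is an assumption, and other quartile conventions, including some calculators, may give slightly different values.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + (n % 2):]    # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]


data = [0.24, 0.26, 0.27, 0.28, 0.29, 0.31, 0.35, 0.42, 0.46]
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1

# Rounded to 3 d.p. to hide floating-point noise
print(minimum, round(q1, 3), med, round(q3, 3), maximum, round(iqr, 3))
```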
🖩 CASIO fx — Finding Quartiles
MODE 2 (STAT) → 1 (1-VAR)
Enter the data in the list, then press SHIFT + 1 (STAT menu)
5 (Var) → scroll to find Q1, Med, Q3
💡 Use the arrow keys to scroll through the statistics
🎯 Concept 2 — Outlier Detection
OUTLIER FENCES A data value \(x\) is an outlier if it lies beyond the inner fences:
OUTLIER TEST (IQR METHOD) \[\text{Lower fence} = Q_1 - 1.5 \times \text{IQR}\] \[\text{Upper fence} = Q_3 + 1.5 \times \text{IQR}\] A value is an outlier if \(x < \text{Lower fence}\) or \(x > \text{Upper fence}\)
WORKED EXAMPLE — Is 0.46 an outlier?
Given: Q1 = 0.27, Q3 = 0.35, IQR = 0.08
Upper fence = 0.35 + 1.5(0.08) = 0.35 + 0.12 = 0.47
Since 0.46 < 0.47, 0.46 is NOT an outlier. ✓
(In fact 0.46 is the maximum — the whisker extends to it.)
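The fence test is simple arithmetic, so it is easy to sketch in code. The example below uses the given values from this worked example (Q1 = 0.27, Q3 = 0.35); the function names `outlier_fences` and `is_outlier` are assumptions chosen for illustration.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the IQR method."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3):
    """True if x lies strictly beyond either fence."""
    lower, upper = outlier_fences(q1, q3)
    return x < lower or x > upper


lower, upper = outlier_fences(0.27, 0.35)
suspect_is_outlier = is_outlier(0.46, 0.27, 0.35)

print(round(lower, 2), round(upper, 2), suspect_is_outlier)
```

Rounding before printing only hides floating-point noise; the comparison itself uses the unrounded fences.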
🖩 CASIO fx — Outlier Check
After finding Q1, Q3 manually:
Calculate: Upper fence = Q3 + 1.5 × IQR
0.35 + 1.5 × 0.08 =
Compare result against the suspect value.
📊 Concept 3 — Distribution Shape from Box Plots
SKEWNESS The position of the median within the box and the lengths of the whiskers reveal the shape.
Shape | Median position | Whisker lengths
Symmetric | Centre of box | Equal
Positive skew (right) | Closer to Q1 | Right whisker longer
Negative skew (left) | Closer to Q3 | Left whisker longer
WORKED EXAMPLE
A box plot has Q1 = 0.27, Median = 0.28, Q3 = 0.35.
The median is closer to Q1, and the right whisker is longer.
→ This is positively skewed (skewed right). The long tail extends to higher values.
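As a rough check, the median-position rule can be written as a comparison of the two parts of the box. This is a simplified sketch: it looks only at where the median sits between Q1 and Q3 and ignores the whisker lengths, so it is an assumption-laden shortcut rather than a full test of shape. The name `box_shape` is hypothetical.

```python
def box_shape(q1, med, q3, tol=1e-9):
    """Classify skew from the position of the median inside the box."""
    lower_part = med - q1   # distance from Q1 to the median
    upper_part = q3 - med   # distance from the median to Q3
    if abs(lower_part - upper_part) <= tol:
        return "roughly symmetric"
    if lower_part < upper_part:
        return "positive skew"   # median closer to Q1
    return "negative skew"       # median closer to Q3


shape = box_shape(0.27, 0.28, 0.35)
print(shape)
```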
📈 Concept 4 — Pearson's Correlation Coefficient (r)
DEFINITION Pearson's r measures the strength and direction of a linear relationship between two variables. Always: \(-1 \leq r \leq 1\)
INTERPRETATION GUIDE \[r = 1\text{: perfect positive} \quad r = -1\text{: perfect negative} \quad r = 0\text{: no linear correlation}\]
|r| > 0.75: Strong    0.5 < |r| ≤ 0.75: Moderate    |r| ≤ 0.5: Weak
WORKED EXAMPLE
For the homework tasks vs math marks data, r = 0.412.
|0.412| ≤ 0.5 → Weak positive correlation
As the number of homework tasks submitted increases, the math mark tends to increase slightly, but the relationship is weak.
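For readers who want to see what the calculator is doing, here is a minimal sketch of Pearson's r together with the strength bands from the interpretation guide. The function names are assumptions, and the three-point dataset used to check `pearson_r` is made up for illustration; it is not the homework data, which is not listed in this guide.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from the sums of squares and products of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe(r):
    """Label r using the bands in the interpretation guide above."""
    if r == 0:
        return "no linear correlation"
    size = abs(r)
    if size > 0.75:
        strength = "strong"
    elif size > 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


label = describe(0.412)                            # r from the worked example
r_check = pearson_r([1, 2, 3], [2, 4, 6])          # made-up, perfectly linear data
print(label, r_check)
```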
🖩 CASIO fx — Finding Pearson's r
MODE 3 (STAT) → 2 (A+BX)
Enter x-values in column 1, y-values in column 2
Press SHIFT 1 5 (Reg) → 3 → get r
💡 Make sure DiagnosticON is set: SHIFT MODE DiagON
📉 Concept 5 — Regression Line & Interpretation
REGRESSION EQUATION \[y = mx + c\] where \(m\) = slope (gradient), \(c\) = y-intercept
INTERPRETING THE SLOPE m "For every 1-unit increase in x, y increases (if m > 0) or decreases (if m < 0) by |m| units on average."

VALID USE OF REGRESSION LINE Only valid to predict within the range of the data (interpolation). Predicting outside the data range is extrapolation — unreliable.
WORKED EXAMPLE
Regression line: y = 0.158x + 4.26 (x = homework tasks, y = math mark)
Slope m = 0.158: For each additional homework task submitted, the math mark increases by 0.158 marks on average.

If x = 0 (no tasks): Predicted mark = 4.26.
But x = 0 is outside the data range (x ranges from 5 to 25), so this prediction is not reliable (extrapolation).
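The prediction and the range check can be combined in a few lines. This sketch uses the regression line and data range from the worked example; the function name `predict_mark` and the test input x = 10 are assumptions chosen for illustration.

```python
SLOPE = 0.158          # m, from the worked example
INTERCEPT = 4.26       # c, from the worked example
X_MIN, X_MAX = 5, 25   # range of x in the data


def predict_mark(tasks):
    """Return (predicted mark, True if the prediction is an interpolation)."""
    estimate = SLOPE * tasks + INTERCEPT
    within_range = X_MIN <= tasks <= X_MAX
    return estimate, within_range


inside = predict_mark(10)   # hypothetical x inside the data range
outside = predict_mark(0)   # x = 0 lies below the data range

print(inside, outside)
```

Returning the flag alongside the estimate keeps the warning attached to the number, so an extrapolated value is never reported on its own.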
🖩 CASIO fx — Regression Line
After entering bivariate data in STAT mode (A+BX):
SHIFT 1 5 (Reg) → 1 for A (= c, the intercept)
SHIFT 1 5 (Reg) → 2 for B (= m, the slope)
To predict: type x-value → SHIFT 1 5 5 (ŷ) → =
🏅 Concept 6 — Spearman's Rank Correlation (rₛ)
SPEARMAN'S FORMULA \[r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)}\] where \(d_i\) = difference in ranks for pair \(i\), \(n\) = number of pairs
TIED RANKS If two values tie, assign each the average of the ranks they would have occupied.
e.g. two values tied for 2nd and 3rd place each get rank 2.5
WORKED EXAMPLE — Mini dataset
x: 5, 6, 10   y: 4, 4, 8
Rank x: 1, 2, 3   Rank y: 1.5, 1.5, 3  (tie → 1.5)
d: −0.5, 0.5, 0   d²: 0.25, 0.25, 0   Σd² = 0.5
\[r_s = 1 - \frac{6(0.5)}{3(9-1)} = 1 - \frac{3}{24} = 1 - 0.125 = 0.875\] Strong positive rank correlation.
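The ranking and the formula can be reproduced in code. The sketch below is a minimal illustration, assuming ranks run from 1 for the smallest value and that ties share the average of the ranks they occupy, as described above; the function names are assumptions.

```python
def average_ranks(values):
    """Rank values from smallest (1) upward; tied values share their mean rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1     # first rank position held by v
        count = ordered.count(v)         # how many values tie with v
        ranks.append(first + (count - 1) / 2)
    return ranks


def spearman_formula(xs, ys):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), using average ranks."""
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


x = [5, 6, 10]
y = [4, 4, 8]
rank_y = average_ranks(y)
rs = spearman_formula(x, y)
print(rank_y, rs)
```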
🖩 CASIO fx — Spearman's rₛ
There is no direct rₛ button. Method:
1. Write ranked data as new x, y columns
2. Enter ranked data into STAT → A+BX mode
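3. Read off Pearson's r for the ranked data; this is rₛ
Or: calculate Σd² by hand and substitute it into the formula above. With tied ranks the two methods can differ slightly: for the mini dataset, Pearson's r on the ranks is about 0.866, while the Σd² formula gives 0.875.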
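Steps 1 to 3 amount to computing Pearson's r on the ranks. A minimal sketch, using the ranks from the mini dataset and a hand-written `pearson_r` (an assumed helper name, not a calculator function); the printed value can be compared with the 0.875 obtained from the Σd² formula.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from the sums of squares and products of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rank_x = [1, 2, 3]        # ranks of x: 5, 6, 10
rank_y = [1.5, 1.5, 3]    # ranks of y: 4, 4, 8 (tie shares rank 1.5)

rs_from_ranks = pearson_r(rank_x, rank_y)
print(round(rs_from_ranks, 3))   # about 0.866
```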

🧪 Practice Questions

20 exam-style questions.
