📊 Topic 1 — Measures of Central Tendency & Spread
Mean · Median · Mode · Range · Variance · Standard Deviation · IQR
- Mean — arithmetic average: add all values ÷ count
- Median — middle value when data is ordered. For \(n\) values: position = \(\frac{n+1}{2}\); if \(n\) is even, average the two middle values
- Mode — most frequently occurring value(s)
- For grouped frequency tables, estimated mean = \(\dfrac{\sum f \cdot x}{\sum f}\)
Enter all data values with EXE after each. Then press AC → SHIFT + 1 → 5 (Var) → 2 for \(\bar{x}\) (mean), 3 for \(\sigma x\) (population SD), 4 for \(sx\) (sample SD).
For frequency data: use LIST mode — enter x values in List 1, frequencies in List 2, then SET Freq = List 2.
Data: 3, 5, 7, 7, 9, 11, 11, 11, 13 (n = 9)
Median: Position = (9+1)/2 = 5th value = 9
Mode: 11 appears 3 times → 11
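To check the worked example without a calculator, a minimal Python sketch using the standard `statistics` module is shown here. The variable name `data` is just a label chosen for this illustration.

```python
from statistics import mean, median, multimode

data = [3, 5, 7, 7, 9, 11, 11, 11, 13]

print(median(data))          # 9: the 5th of the 9 ordered values
print(multimode(data))       # [11]: list of every most-frequent value
print(round(mean(data), 2))  # 8.56, since the values sum to 77 and 77 / 9 = 8.555...
```

`multimode` returns a list, so it also handles data with more than one mode.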
A student recorded the number of books read each month for 8 months:
2, 5, 3, 8, 5, 4, 5, 8
The student claims the mean number of books read per month is exactly 5. Verify whether this claim is correct and state the mean.
- Sum all values: 2 + 5 + 3 + 8 + 5 + 4 + 5 + 8 = 40
- Count: n = 8
- Mean = 40 ÷ 8 = 5.0 ✓
- The student's claim IS correct.
The number of siblings of students in a class is:
0, 3, 2, 1, 1, 0, 2, 1, 0, 0, 0, x, 2, 1, 0, 3
The mean number of siblings is equal to 1. Find the value of \(x\).
- Sum without x: 0+3+2+1+1+0+2+1+0+0+0+2+1+0+3 = 16
- Total n = 16 values. Mean = 1 → total sum must = 16
- 16 + x = 16 → \(x = 0\)
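The same rearrangement (required total minus known total) can be sketched in Python. The names `known`, `n` and `target_mean` are illustrative only.

```python
# The 15 known sibling counts (everything except x)
known = [0, 3, 2, 1, 1, 0, 2, 1, 0, 0, 0, 2, 1, 0, 3]

n = len(known) + 1      # 16 values once x is included
target_mean = 1

# mean = total / n, so the required total is mean * n
x = target_mean * n - sum(known)
print(x)  # 0
```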
The ages of 11 students in a class are:
14, 16, 15, 13, 17, 15, 14, 16, 15, 14, 13
What is the median age?
- Order: 13, 13, 14, 14, 14, 15, 15, 15, 16, 16, 17
- n = 11 → median position = (11+1)/2 = 6th value
- 6th value = 15
The number of cups of coffee consumed by 55 teachers is shown below:
| Cups (x) | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Frequency | 4 | 6 | 10 | 15 | 12 | 8 |
What are the mode and median for this data?
- Mode = value with highest frequency = 3 (f = 15)
- n = 55 → median position = (55+1)/2 = 28th value
- Cumulative freq: 0→4, 1→10, 2→20, 3→35. The 28th value falls in the group x = 3
- Median = 3
Enter x values (0–5) in List 1, frequencies in List 2 → AC → SHIFT 1 → 5 → 2 for mean, 3 for σ.
Using the same coffee data from Q4, calculate the mean number of cups consumed.
- \(\sum fx\) = 0×4 + 1×6 + 2×10 + 3×15 + 4×12 + 5×8 = 0+6+20+45+48+40 = 159
- \(\sum f\) = 4+6+10+15+12+8 = 55
- Mean = 159 ÷ 55 = 2.89 (to 3 s.f.)
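As a cross-check on the coffee table, the sketch below computes the mode, median and mean straight from the frequency table. It assumes the table is stored as two parallel lists, `cups` and `freq` (names chosen for this example), and finds the median by expanding the table into the full list of 55 values.

```python
from statistics import median

cups = [0, 1, 2, 3, 4, 5]
freq = [4, 6, 10, 15, 12, 8]

n = sum(freq)                                       # total number of teachers
mean = sum(f * x for x, f in zip(cups, freq)) / n   # sum(f*x) / sum(f)

# Mode: the x value paired with the largest frequency
mode = max(zip(cups, freq), key=lambda pair: pair[1])[0]

# Median: repeat each x value f times, then take the middle value
expanded = [x for x, f in zip(cups, freq) for _ in range(f)]
med = median(expanded)

print(n, mode, med, round(mean, 2))  # 55 3 3 2.89
```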
📐 Topic 2 — Standard Deviation & Consistency
Variance · Population SD · Sample SD · Coefficient of Variation
- Standard deviation (σ or s) measures how spread out data is from the mean
- Low σ → data clustered near mean → more consistent
- High σ → data spread out → less consistent
- To compare consistency across different scales, use the Coefficient of Variation (CV): \(\text{CV} = \dfrac{\sigma}{\bar{x}} \times 100\%\)
On the calculator's Var menu (SHIFT + 1 → 5):
3 = \(\sigma x\) (population SD, used when data = entire group)
4 = \(sx\) (sample SD, used when data = sample from larger group)
For school/exam problems, use \(\sigma x\) unless told otherwise.
Grade 9: mean = 2.5, SD = 1.2 | Grade 10: mean = 2.1, SD = 1.4
Which grade has more consistent attendance?
Lower SD → data closer to mean → Grade 9 is more consistent.
Two basketball players' points per game over a season are recorded:
Player A: mean = 18.4, standard deviation = 3.2
Player B: mean = 21.0, standard deviation = 2.1
Which player is more consistent and why?
- Consistency is measured by standard deviation, not the mean
- Player A: σ = 3.2 | Player B: σ = 2.1
- 2.1 < 3.2 → Player B has smaller spread → more consistent
Using the coffee consumption data (Q4), the mean is 2.89 cups. What is the standard deviation (to 2 decimal places)?
- \(\sum f(x-\bar{x})^2\): use \(\bar{x} = 2.89\)
- = 4(0-2.89)²+6(1-2.89)²+10(2-2.89)²+15(3-2.89)²+12(4-2.89)²+8(5-2.89)²
- ≈ 33.41+21.43+7.92+0.18+14.79+35.62 = 113.35
- σ = √(113.35/55) = √2.0609 ≈ 1.44
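The hand calculation can be checked with a short Python sketch. It assumes the coffee table is held in two parallel lists (`cups`, `freq`, names chosen here) and uses the unrounded mean, so intermediate terms may differ slightly from the rounded working. The sample SD is included only for comparison.

```python
from math import sqrt

cups = [0, 1, 2, 3, 4, 5]
freq = [4, 6, 10, 15, 12, 8]

n = sum(freq)
mean = sum(f * x for x, f in zip(cups, freq)) / n

# Sum of f * (x - mean)^2 over the table
ss = sum(f * (x - mean) ** 2 for x, f in zip(cups, freq))

sigma = sqrt(ss / n)      # population SD (divide by n)
s = sqrt(ss / (n - 1))    # sample SD (divide by n - 1)

print(round(sigma, 2))  # 1.44
```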
Two factories produce bolts. Factory X: mean length = 50 mm, SD = 2 mm. Factory Y: mean length = 80 mm, SD = 2.8 mm.
Using the Coefficient of Variation, which factory produces more consistent bolts?
- CV = (σ/mean) × 100%
- Factory X: CV = (2/50) × 100% = 4%
- Factory Y: CV = (2.8/80) × 100% = 3.5%
- Lower CV = more consistent → Factory Y (3.5% < 4%)
- Note: comparing SD alone would favour Factory X (2 mm < 2.8 mm), which is misleading when the means differ!
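The CV comparison can be sketched as a small helper. The function name `coefficient_of_variation` is made up for this illustration and is not a library function.

```python
def coefficient_of_variation(sd, mean):
    """Return the SD as a percentage of the mean."""
    return sd / mean * 100

cv_x = coefficient_of_variation(2, 50)     # Factory X
cv_y = coefficient_of_variation(2.8, 80)   # Factory Y

# The lower CV marks the more consistent factory
more_consistent = "Y" if cv_y < cv_x else "X"
print(more_consistent)  # Y
```

Because CV divides by the mean, it only makes sense when the mean is positive and not close to zero.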
χ² Topic 3 — Chi-Squared Test of Independence
Hypotheses · Expected Frequencies · Degrees of Freedom · Critical Values
- Tests whether two categorical variables are independent
- H₀ (null): The two variables are independent (no association)
- H₁/Hₐ (alternative): The two variables are NOT independent (there is an association)
- Expected frequency for cell (row i, col j): \(E_{ij} = \dfrac{(\text{row total}) \times (\text{col total})}{\text{grand total}}\)
- Degrees of freedom: \(df = (\text{rows} - 1) \times (\text{cols} - 1)\)
For each cell: store O value, calculate E = (row total × col total) / grand total
Then compute (O−E)² / E for each cell and sum them.
Casio fx-991EX/CG series supports χ² directly via STAT → TEST → CHI.
Fish store data (from exam): 2 rows (Male/Female) × 4 cols (fish types). Grand total = 393.
Expected frequency (Females, Goldfish) = (170 × 119) / 393 ≈ 51.5
df = (2−1)(4−1) = 3
A researcher tests whether gender and preferred music genre are independent. Which pair of hypotheses is correct for a chi-squared test?
- The null hypothesis H₀ always states no effect / independence
- H₀: Gender and music preference are independent
- H₁: Gender and music preference are not independent (there is an association)
In a survey of 200 people, data on pet preference (Cat/Dog/Fish) and age group (Under 30 / 30 and over) is collected. What are the degrees of freedom for a chi-squared test of independence?
- Rows = 2 (age groups), Columns = 3 (pet types)
- df = (r − 1)(c − 1) = (2−1)(3−1) = 1 × 2 = 2
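The degrees-of-freedom formula is a one-liner in code. The helper name `chi2_df` is hypothetical, chosen for this sketch.

```python
def chi2_df(rows, cols):
    """Degrees of freedom for a chi-squared test of independence."""
    return (rows - 1) * (cols - 1)

print(chi2_df(2, 3))  # 2: two age groups by three pet types
print(chi2_df(2, 4))  # 3: a 2 x 4 table
```

Count only the category rows and columns, not the "Total" row or column.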
In a chi-squared test, the following data is observed:
| | Barbs | Guppies | Goldfish | Tetras | Total |
|---|---|---|---|---|---|
| Males | 57 | 65 | 67 | 34 | 223 |
| Females | 45 | 48 | 52 | 25 | 170 |
| Total | 102 | 113 | 119 | 59 | 393 |
What is the expected number of females who purchase Goldfish?
- E = (row total × column total) / grand total
- Row total (Females) = 170
- Column total (Goldfish) = 119
- Grand total = 393
- E = (170 × 119) / 393 = 20230 / 393 ≈ 51.5
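The sketch below applies the expected-frequency formula to every cell of the fish table and then sums (O − E)² / E to get the test statistic. The dictionary layout and variable names are assumptions made for this example, not part of the original question.

```python
observed = {
    "Males":   [57, 65, 67, 34],   # Barbs, Guppies, Goldfish, Tetras
    "Females": [45, 48, 52, 25],
}

row_totals = {sex: sum(counts) for sex, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# E = (row total * column total) / grand total, for each cell
expected = {
    sex: [row_totals[sex] * c / grand_total for c in col_totals]
    for sex in observed
}

# Chi-squared statistic: sum of (O - E)^2 / E over all cells
chi2 = sum(
    (o - e) ** 2 / e
    for sex in observed
    for o, e in zip(observed[sex], expected[sex])
)

print(round(expected["Females"][2], 1))  # 51.5 (Females, Goldfish)
```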
A chi-squared test at the 5% significance level gives \(\chi^2_{calc} = 5.2\) with \(df = 3\). The critical value is \(\chi^2_{crit} = 7.815\).
What is the correct conclusion?
- Decision rule: Reject H₀ if \(\chi^2_{calc} > \chi^2_{crit}\)
- Here: 5.2 < 7.815 → we fail to reject H₀
- Conclusion: Insufficient evidence at 5% level to conclude an association exists
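The decision rule can be written as a tiny function. `chi2_conclusion` is an illustrative name; the critical value still has to come from a table or the question.

```python
def chi2_conclusion(chi2_calc, chi2_crit):
    """Apply the rule: reject H0 only if the statistic exceeds the critical value."""
    if chi2_calc > chi2_crit:
        return "reject H0"
    return "fail to reject H0"

print(chi2_conclusion(5.2, 7.815))  # fail to reject H0
```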
🔢 Topic 4 — Data Types, IQR & Outliers
Discrete · Continuous · Quartiles · Box Plots · Outliers
- Discrete: countable, specific values (e.g. number of children, goals scored)
- Continuous: measurable, any value in range (e.g. height, time, temperature)
- IQR = Q3 − Q1 (interquartile range — middle 50% of data)
- Outlier rule: a value is an outlier if it is below Q1 − 1.5×IQR or above Q3 + 1.5×IQR
6 → minX, 7 → Q1, 8 → Med, 9 → Q3, 0 → maxX
(Menu numbers may vary by model — look for "Min/Q1/Med/Q3/Max")
Which of the following is an example of continuous data?
- A, B — countable whole numbers → discrete
- D — shoe sizes come in fixed values (6, 6.5, 7…) → discrete
- C — time can be any value (e.g. 12.347 seconds) → continuous ✓
The ordered test scores of 10 students are:
42, 55, 58, 63, 67, 71, 74, 78, 83, 90
Find the interquartile range (IQR).
- n = 10. Lower half: 42,55,58,63,67 → Q1 = median = 58
- Upper half: 71,74,78,83,90 → Q3 = median = 78
- IQR = Q3 − Q1 = 78 − 58 = 20
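The "median of each half" method used above can be sketched as follows. Note this is an assumption about which quartile convention to use: calculators and libraries (including Python's own `statistics.quantiles`) may use other conventions and give slightly different quartiles. The function name `quartiles` is chosen for this example, and it assumes at least two data values.

```python
from statistics import median

def quartiles(values):
    """Q1, median, Q3 using the median-of-halves method.

    For an odd number of values the middle value is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(ordered), median(upper)

scores = [42, 55, 58, 63, 67, 71, 74, 78, 83, 90]
q1, med, q3 = quartiles(scores)
print(q1, q3, q3 - q1)  # 58 78 20
```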
A dataset has Q1 = 20, Q3 = 35. Using the 1.5 × IQR rule, which value would be classified as an outlier?
- IQR = 35 − 20 = 15
- Lower fence: Q1 − 1.5×IQR = 20 − 22.5 = −2.5
- Upper fence: Q3 + 1.5×IQR = 35 + 22.5 = 57.5
- Values outside [−2.5, 57.5] are outliers
- A = 5 ✓ (within range), B = 10 ✓, C = 58 → outlier! (58 > 57.5), D = 55 ✓
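A short sketch of the 1.5 × IQR rule, using the four candidate values listed in the working above. The helper name `outlier_fences` is made up for this illustration.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

low, high = outlier_fences(20, 35)
candidates = {"A": 5, "B": 10, "C": 58, "D": 55}

# A value is an outlier if it lies strictly outside the fences
outliers = [label for label, v in candidates.items() if v < low or v > high]

print(low, high)   # -2.5 57.5
print(outliers)    # ['C']
```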
🎯 Topic 5 — Grouped Data, Histograms & Probability
Grouped frequency · Histograms · Cumulative frequency · Basic probability
- For grouped data, use the midpoint of each class interval as \(x\)
- Estimated mean = \(\dfrac{\sum f \cdot m}{\sum f}\) where \(m\) = midpoint
- You cannot determine the actual mean exactly from grouped data — only an estimate
- Histogram: y-axis = frequency density = frequency ÷ class width (for unequal widths)
The ages of 40 concert attendees are grouped as follows:
| Age group | 10–20 | 20–30 | 30–40 | 40–50 |
|---|---|---|---|---|
| Frequency | 8 | 14 | 12 | 6 |
What is the estimated mean age?
- Midpoints: 15, 25, 35, 45
- ∑fm = 8×15 + 14×25 + 12×35 + 6×45 = 120+350+420+270 = 1160
- ∑f = 40
- Estimated mean = 1160/40 = 29.0
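The midpoint method can be sketched in Python. Each class is stored here as a `(lower, upper, frequency)` tuple, which is a layout assumed for this example.

```python
# (lower bound, upper bound, frequency) for each age group
classes = [(10, 20, 8), (20, 30, 14), (30, 40, 12), (40, 50, 6)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
total_f = sum(f for _, _, f in classes)

# Estimated mean = sum(f * midpoint) / sum(f)
est_mean = sum(f * m for (_, _, f), m in zip(classes, midpoints)) / total_f

print(midpoints)  # [15.0, 25.0, 35.0, 45.0]
print(est_mean)   # 29.0
```

The result is only an estimate because every value in a class is treated as if it sat at the class midpoint.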
From the concert data in Q16, what is the cumulative frequency for ages up to 30?
- Cumulative frequency up to 30 = freq(10–20) + freq(20–30)
- = 8 + 14 = 22
- This means 22 out of 40 attendees are under 30 years old (55%)
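A running total gives every cumulative frequency at once. This sketch uses `itertools.accumulate` from the standard library; the list names are illustrative.

```python
from itertools import accumulate

upper_bounds = [20, 30, 40, 50]
freq = [8, 14, 12, 6]

cumulative = list(accumulate(freq))   # running total of the frequencies
print(cumulative)                     # [8, 22, 34, 40]

# Cumulative frequency for ages up to 30
up_to_30 = cumulative[upper_bounds.index(30)]
print(up_to_30, up_to_30 / sum(freq))  # 22 0.55
```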
From the fish store data in Q11, a customer is chosen at random. What is the probability that the customer is a female who bought Tetras?
- P(female AND Tetras) = (number of females buying Tetras) / (total customers)
- = 25 / 393
- Note: 25/170 would be P(Tetras | Female) — a conditional probability
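To keep the two probabilities apart, the sketch below stores them as exact fractions with Python's `fractions.Fraction`. The variable names are chosen for this example.

```python
from fractions import Fraction

females_tetras = 25
total_females = 170
total_customers = 393

# P(female AND Tetras): divide by ALL customers
joint = Fraction(females_tetras, total_customers)

# P(Tetras | female): divide by females only
conditional = Fraction(females_tetras, total_females)

print(joint)        # 25/393
print(conditional)  # 5/34
```

The only difference between the two is the denominator: the whole sample for the joint probability, the row total for the conditional one.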
A histogram uses unequal class widths. The class 20 ≤ x < 25 has frequency 30 and class width 5. The class 25 ≤ x < 35 has frequency 40 and class width 10. Which class has the greater frequency density?
- Frequency Density = Frequency ÷ Class Width
- Class 20–25: FD = 30 ÷ 5 = 6
- Class 25–35: FD = 40 ÷ 10 = 4
- 6 > 4 → Class 20–25 has greater frequency density
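The frequency density comparison can be sketched as a small helper. `frequency_density` is an illustrative name, and the class boundaries are passed in as lower and upper limits.

```python
def frequency_density(frequency, lower, upper):
    """Frequency divided by class width."""
    return frequency / (upper - lower)

fd_20_25 = frequency_density(30, 20, 25)
fd_25_35 = frequency_density(40, 25, 35)

print(fd_20_25, fd_25_35)  # 6.0 4.0
```

The class with the larger frequency (40) has the smaller density, which is why bar height, not raw frequency, is compared when widths are unequal.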
A teacher wants to determine whether there is a statistically significant relationship between students' preferred learning style (Visual / Auditory / Kinaesthetic) and their exam grade (A / B / C / D). The teacher collects data from 120 students.
Which statistical test is most appropriate, and what are the degrees of freedom?
- Two categorical variables → use chi-squared test of independence (not t-test)
- Rows: 3 learning styles, Columns: 4 grade categories
- df = (r−1)(c−1) = (3−1)(4−1) = 2 × 3 = 6