📊 AP Statistics Quiz

20 questions · Key concept review · Basic → Advanced

🌱 Basic (Q1–5) · Descriptive Statistics
1
📐 Measures of Center & Spread
⭐ Easy
💡 Key concepts
Mean: sum of all values ÷ number of values
Median: the middle value after sorting → robust to outliers
Variance: s² = Σ(xᵢ − x̄)²/(n − 1)
What is the median of the following data set?

{ 3, 7, 2, 9, 1, 5, 8 }
📖 Explanation
Sorted: 1, 2, 3, 5, 7, 8, 9
7 data points → the 4th value, 5, is the median.
The mean is (1+2+3+5+7+8+9)/7 = 35/7 = 5 → equal to the median here by coincidence, but the two differ when outliers are present!
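The arithmetic above can be checked with a short Python sketch using the standard `statistics` module. The extra value 100 is a hypothetical outlier added only to illustrate that the median is more robust than the mean.

```python
import statistics

data = [3, 7, 2, 9, 1, 5, 8]
median = statistics.median(data)  # middle value of the sorted data
mean = statistics.mean(data)      # sum of values / number of values

# Hypothetical outlier (not part of the quiz data): the mean moves a lot,
# the median only a little.
with_outlier = data + [100]
median_out = statistics.median(with_outlier)
mean_out = statistics.mean(with_outlier)

print(median, mean, median_out, mean_out)
```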
2
📦 Boxplot & IQR
⭐ Easy
💡 Key concepts
IQR = Q3 − Q1
Outlier criterion: below Q1 − 1.5×IQR or above Q3 + 1.5×IQR
For a data set with Q1 = 20 and Q3 = 40, which value is classified as an outlier?
[Figure: boxplot with Q1 = 20 and Q3 = 40]
📖 Explanation
IQR = 40 − 20 = 20
Upper fence = Q3 + 1.5×IQR = 40 + 30 = 70
Lower fence = Q1 − 1.5×IQR = 20 − 30 = −10
→ 70 lies exactly on the upper fence, so it is a boundary value and not an outlier: the criterion is strictly "above the fence".
AP exam convention: only values beyond a fence (above 70 or below −10) count as outliers.
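A minimal Python sketch of the 1.5×IQR rule from the key concepts, using the strict "below / above" comparison stated there. The function names are illustrative, not from any library.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k*IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3):
    """Strict rule: a value exactly on a fence is not flagged."""
    lower, upper = iqr_fences(q1, q3)
    return x < lower or x > upper


lower, upper = iqr_fences(20, 40)
print(lower, upper)
print(is_outlier(70, 20, 40), is_outlier(71, 20, 40))
```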
3
📊 Distribution Shape
⭐ Easy
💡 Key concepts
Right skew: Mean > Median (tail on the right)
Left skew: Mean < Median (tail on the left)
Symmetric: Mean ≈ Median
What shape does an income distribution typically have, and how do its mean and median compare?
[Figure: right-skewed curve with the median to the left of the mean and a long tail to the right]
📖 Explanation
Income is right-skewed: most people earn relatively little, and a few high earners stretch the tail, so Mean > Median.
Exam tip: tail direction = skew direction → right tail → right skew → the mean is larger than the median.
4
🔢 z-score & Normal Distribution
⭐ Easy
💡 Key concepts
z = (x − μ) / σ
z-score: how many standard deviations a value lies from the mean
68-95-99.7 Rule: μ±1σ ≈ 68%, μ±2σ ≈ 95%, μ±3σ ≈ 99.7%
Exam scores: μ = 70 points, σ = 10 points (normally distributed).
What is the z-score of a student who scored 85?
📖 Explanation
z = (85 − 70) / 10 = 15/10 = 1.5
→ This student is 1.5 standard deviations above the mean.
z > 0: above the mean / z < 0: below the mean
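As a quick check, the z-score and the 68-95-99.7 rule can be reproduced with Python's standard `statistics.NormalDist`. The helper name `z_score` is illustrative.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma


z = z_score(85, 70, 10)

# Area of the standard normal curve within k standard deviations of the mean.
std_normal = NormalDist()
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

print(z)
print({k: round(v, 4) for k, v in within.items()})
```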
5
🔗 Correlation
⭐ Easy
💡 Key concepts
Correlation coefficient r: −1 ≤ r ≤ 1
The closer |r| is to 1, the stronger the linear relationship / r = 0 → no linear relationship
Correlation ≠ Causation!
Which of the following statements about the correlation coefficient (r) is NOT correct?
📖 Explanation
E is the incorrect statement!
r = 0 means there is no linear relationship, not that there is no relationship at all.
Example: a strong nonlinear relationship such as y = x² can still give r ≈ 0.
→ A classic AP exam trap!
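The y = x² example can be verified with a small hand-written Pearson correlation in Python. The x values below are an assumed symmetric grid chosen for illustration; with x values that are not symmetric about their mean, r would generally not be zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]      # symmetric about 0 (assumption)
ys = [x ** 2 for x in xs]          # perfectly determined by x, but not linear
r_quadratic = pearson_r(xs, ys)
r_linear = pearson_r(xs, [2 * x + 1 for x in xs])

print(r_quadratic, r_linear)
```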
🌿 Intermediate (Q6–12) · Probability & Inference
6
🎲 Probability Rules
🔥 Medium
💡 Key concepts
Independent events: P(A∩B) = P(A)·P(B)
Mutually exclusive (disjoint) events: P(A∪B) = P(A) + P(B)
General rule: P(A∪B) = P(A) + P(B) − P(A∩B)
P(A) = 0.4, P(B) = 0.3, and A and B are independent.
P(A ∪ B) = ?
📖 Explanation
Because A and B are independent, P(A∩B) = 0.4 × 0.3 = 0.12
P(A∪B) = 0.4 + 0.3 − 0.12 = 0.58
Do not confuse disjoint with independent!
Disjoint → P(A∩B) = 0 / Independent → P(A∩B) = P(A)·P(B)
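The contrast between the two cases can be sketched in Python. Both function names are illustrative; each simply applies the general addition rule with the appropriate value of P(A∩B).

```python
def union_independent(p_a, p_b):
    """P(A or B) when A and B are independent: P(A and B) = P(A) * P(B)."""
    return p_a + p_b - p_a * p_b


def union_disjoint(p_a, p_b):
    """P(A or B) when A and B are disjoint: P(A and B) = 0."""
    return p_a + p_b


p_indep = union_independent(0.4, 0.3)
p_disj = union_disjoint(0.4, 0.3)
print(round(p_indep, 2), round(p_disj, 2))
```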
7
🔀 Conditional Probability & Bayes
🔥 Medium
💡 Key concepts
P(A|B) = P(A∩B) / P(B)
Using a two-way table is the key!
From the table below, what is the probability that a student who likes sports is male?
์Šคํฌ์ธ  ์ข‹์•„ํ•จ์Šคํฌ์ธ  ์‹ซ์–ดํ•จํ•ฉ๊ณ„
๋‚จํ•™์ƒ602080
์—ฌํ•™์ƒ4080120
ํ•ฉ๊ณ„100100200
📖 Explanation
P(male | sports) = P(male ∩ sports) / P(sports)
= (60/200) ÷ (100/200) = 60/100 = 0.60
Conditional probability replaces the denominator with the total for the conditioning event!
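A small Python sketch of the same calculation, storing the two-way table as a dictionary. The keys and variable names are assumptions made for illustration. It also shows that reversing the condition changes the denominator and therefore the answer.

```python
# Two-way table counts keyed by (group, preference).
counts = {
    ("male", "likes"): 60, ("male", "dislikes"): 20,
    ("female", "likes"): 40, ("female", "dislikes"): 80,
}

# P(male | likes sports): the denominator is the "likes sports" column total.
likes_total = sum(v for (_, pref), v in counts.items() if pref == "likes")
p_male_given_likes = counts[("male", "likes")] / likes_total

# P(likes sports | male): the denominator is the "male" row total.
male_total = sum(v for (group, _), v in counts.items() if group == "male")
p_likes_given_male = counts[("male", "likes")] / male_total

print(p_male_given_likes, p_likes_given_male)
```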
8
📈 Binomial Distribution
🔥 Medium
💡 Key concepts
Binomial conditions: BINS
· Binary (success/failure) · Independent · Number of trials fixed · Same p
P(X=k) = C(n,k) · pᵏ · (1−p)ⁿ⁻ᵏ
μ = np, σ = √(np(1−p))
A coin is tossed 10 times. What is the probability of getting exactly 3 heads?
(Round to two decimal places.)
📖 Explanation
P(X=3) = C(10,3) × (0.5)³ × (0.5)⁷
= 120 × (1/8) × (1/128)
= 120/1024 ≈ 0.117 ≈ 0.12
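The binomial formula can be checked with `math.comb` from the Python standard library. The function name `binom_pmf` is illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p3 = binom_pmf(3, 10, 0.5)
total = sum(binom_pmf(k, 10, 0.5) for k in range(11))  # probabilities sum to 1

print(comb(10, 3), round(p3, 2), total)
```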
9
🧪 Sampling Distribution & CLT
🔥 Medium
💡 Key concepts
Central Limit Theorem (CLT): regardless of the population distribution, if n is large enough (n ≥ 30),
the distribution of the sample mean → approximately normal
x̄ ~ N(μ, σ/√n)
Population: μ = 50, σ = 12. For samples of size n = 36, what is the
standard error of the sample mean x̄?
📖 Explanation
SE = σ/√n = 12/√36 = 12/6 = 2
→ The larger the sample, the smaller the SE = the more precise the estimate!
Do not confuse SE with σ (the population standard deviation).
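A minimal Python sketch of the standard error formula. The second call uses a hypothetical larger sample (n = 144) to show that quadrupling n halves the SE.

```python
import math


def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)


se_36 = standard_error(12, 36)
se_144 = standard_error(12, 144)  # hypothetical larger sample
print(se_36, se_144)
```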
10
📏 Confidence Interval
🔥 Medium
💡 Key concepts
Confidence interval for μ (σ known): x̄ ± z* · (σ/√n)
z* = 1.645 (90%), 1.960 (95%), 2.576 (99%)
Interpretation: "95% of the intervals built with this method contain the parameter"
Given a sample with n=100, x̄=75, and σ=20, what is the 95% confidence interval?
📖 Explanation
ME = z* × SE = 1.960 × (20/√100) = 1.960 × 2 = 3.92
CI: 75 ± 3.92 → (71.08, 78.92)
Caution: the interpretation "there is a 95% probability that this particular interval contains the parameter" is wrong!
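The interval can be reproduced in Python, computing z* from `statistics.NormalDist` instead of reading it from a table. The function name `z_interval` is illustrative.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


low, high = z_interval(75, 20, 100)
print(round(low, 2), round(high, 2))
```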
11
⚖️ Hypothesis Testing: Type I & II Error
🔥 Medium
💡 Key concepts
Type I Error (α): rejecting H₀ when it is true = "False Positive"
Type II Error (β): failing to reject H₀ when it is false = "False Negative"
Power = 1 − β: the probability of correctly rejecting a false H₀
A new drug actually has no effect, but the statistical test concluded that it works.
What kind of error is this?
📖 Explanation
No effect = H₀ is true → H₀ was rejected → Type I Error
Type I: H₀ true + rejected (probability = α)
Type II: H₀ false + not rejected (probability = β)
In medicine, Type I = prescribing a drug that does not work, often treated as the more serious mistake!
12
📉 Linear Regression
🔥 Medium
💡 Key concepts
Regression line: ŷ = b₀ + b₁x
Slope: b₁ = r·(s_y/s_x)
Residual = y − ŷ (observed − predicted)
r² = coefficient of determination: the percentage of the variation in y explained by x
For the regression line ŷ = 3 + 2x, the observed value at x = 5 was y = 16.
What is the residual?
📖 Explanation
ŷ = 3 + 2(5) = 13
Residual = y − ŷ = 16 − 13 = 3
→ Positive residual: the observed value is larger than the predicted value (the line underestimates)
If the residual plot is random with no pattern, a linear model is appropriate!
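The residual calculation as a short Python sketch. The default coefficients come from the question's line ŷ = 3 + 2x, and the function names are illustrative.

```python
def predict(x, b0=3, b1=2):
    """Predicted value from the regression line y_hat = b0 + b1 * x."""
    return b0 + b1 * x


def residual(x, y):
    """Observed minus predicted; positive means the line underestimates."""
    return y - predict(x)


y_hat = predict(5)
res = residual(5, 16)
print(y_hat, res)
```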
🌳 Advanced (Q13–17) · Inference Deep Dive
13
🧮 One-Sample t-test
💎 Hard
💡 Key concepts
σ unknown → use a t-test
t = (x̄ − μ₀) / (s/√n), df = n − 1
Conditions: random sample; normal population or n ≥ 30 (CLT)
A coffee shop claims "the average cup of coffee is 250 ml." A consumer group measures 16 cups:
x̄ = 245 ml, s = 8 ml. What is the t test statistic for μ₀ = 250 ml?
📖 Explanation
t = (245 − 250) / (8/√16) = −5 / (8/4) = −5/2 = −2.50
df = 16 − 1 = 15
|t| = 2.50 > t*(df=15, α=0.05, one-sided) ≈ 1.753 → reject H₀!
→ Evidence that the shop is actually serving less than 250 ml.
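A minimal Python sketch of the test statistic and the one-sided decision. The critical value 1.753 is copied from the explanation above instead of being computed, since the standard library has no t-distribution. The function name is illustrative.

```python
import math


def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))


t = t_statistic(245, 250, 8, 16)
df = 16 - 1
t_critical = 1.753                 # one-sided, alpha = 0.05, df = 15 (from the text)
reject_h0 = t < -t_critical        # lower-tail test: H1 is mu < 250

print(t, df, reject_h0)
```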
14
🎯 Interpreting the p-value
💎 Hard
💡 Key concepts
p-value: the probability, when H₀ is true, of a statistic at least as extreme as the one observed
p < α → reject H₀ (statistically significant)
p ≥ α → fail to reject H₀
A hypothesis test gives p-value = 0.03 with significance level α = 0.05.
Which of the following is the correct interpretation?
📖 Explanation
B is correct! The precise definition of the p-value:
"Assuming H₀ is true, the probability of a test statistic at least as extreme as the one we obtained"
p = 0.03 < α = 0.05 → reject H₀
A (probability that H₀ is true) and C (probability that H₁ is true) → interpretations that frequentist statistics does not allow!
15
🔲 Chi-Square Test
💎 Hard
💡 Key concepts
Goodness of Fit test: does the observed distribution match the expected distribution?
Test of Independence: are two categorical variables independent?
χ² = Σ (O−E)²/E, with every expected count ≥ 5
Test whether a die is fair. In 60 rolls, the faces came up {8,12,10,9,11,10} times.
What is the χ² test statistic? (Expected count: 10 per face)
📖 Explanation
E = 10 for each face
χ² = (8−10)²/10 + (12−10)²/10 + (10−10)²/10 + (9−10)²/10 + (11−10)²/10 + (10−10)²/10
= 4/10 + 4/10 + 0 + 1/10 + 1/10 + 0 = 10/10 = 1.0
df = k−1 = 5, χ²(5, 0.05) = 11.07 → 1.0 < 11.07 → fail to reject H₀ (no evidence the die is unfair)
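The statistic can be recomputed with a few lines of Python. The critical value 11.07 is taken from the explanation above instead of being computed.

```python
observed = [8, 12, 10, 9, 11, 10]
expected = [sum(observed) / len(observed)] * len(observed)  # fair die: 60 / 6 = 10

# Chi-square statistic: sum of (O - E)^2 / E over the six faces.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
critical_value = 11.07             # chi-square(df = 5, alpha = 0.05), from the text
reject_h0 = chi2 > critical_value

print(round(chi2, 4), df, reject_h0)
```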
16
🔬 Two-Sample Inference
💎 Hard
💡 Key concepts
Comparing two population means: t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)
Comparing two population proportions: z = (p̂₁ − p̂₂) / √(p̂c(1−p̂c)(1/n₁+1/n₂))
Use the pooled proportion p̂c (under H₀: p₁=p₂)
Class A (n=50): pass rate 72%. Class B (n=80): pass rate 60%.
When testing H₀ that the two proportions are equal, what is the pooled proportion p̂c?
📖 Explanation
Number of successes: A = 50×0.72 = 36, B = 80×0.60 = 48
p̂c = (36+48)/(50+80) = 84/130 ≈ 0.6462 ≈ 0.646
The pooled proportion combines the two groups into one overall success rate!
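A short Python sketch that computes the pooled proportion and, going one step beyond the question, plugs it into the two-proportion z formula from the key concepts. The function name is illustrative.

```python
import math


def pooled_two_prop_z(x1, n1, x2, n2):
    """Return (pooled proportion, z statistic) for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return pooled, (p1 - p2) / se


pooled, z = pooled_two_prop_z(36, 50, 48, 80)
print(round(pooled, 4), round(z, 2))
```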
17
🌊 Geometric Distribution
💎 Hard
💡 Key concepts
Geometric distribution: number of trials up to and including the first success
P(X=k) = (1−p)^(k−1) · p
μ = 1/p, σ² = (1−p)/p²
A basketball player makes free throws with probability p = 0.40. What is the probability that the first make comes on exactly the 3rd attempt?
📖 Explanation
P(X=3) = (1−0.4)^(3−1) × 0.4
= (0.6)² × 0.4 = 0.36 × 0.4 = 0.144
Interpretation: the first 2 attempts miss (0.6²), then the 3rd succeeds (0.4)
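The geometric formula as a minimal Python sketch, together with the mean 1/p from the key concepts. The function name `geom_pmf` is illustrative.

```python
def geom_pmf(k, p):
    """P(first success occurs on trial k): k - 1 failures, then one success."""
    return (1 - p) ** (k - 1) * p


p_third = geom_pmf(3, 0.4)
expected_trials = 1 / 0.4          # mean number of attempts until the first make

print(round(p_third, 3), expected_trials)
```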
🔥 Hardest (Q18–20) · Expert Level
18
📐 Regression Inference & r²
💎 Hard
💡 Key concepts
t-test for the regression slope: H₀: β₁ = 0
t = b₁ / SE(b₁), df = n−2
r² = SSModel/SSTotal: proportion of variation explained
Pattern in the residual plot → sign that a linear model does not fit
Regression output: r = −0.87, n = 25.
Which of the following statements are correct? Select all that apply.

Ⅰ. As x increases by 1, y tends to decrease.
Ⅱ. r² = 0.7569, so about 75.7% of the variation in y is explained by x.
Ⅲ. r = −0.87 implies a causal relationship.
📖 Explanation
Ⅰ ✅: r < 0 → negative linear relationship → as x↑, y tends to ↓
Ⅱ ✅: r² = (−0.87)² = 0.7569 ≈ 75.7% explained
Ⅲ ❌: the correlation coefficient measures the strength of a linear relationship; causation can only be established through a designed experiment
→ Answer: Ⅰ and Ⅱ
19
🔭 Power of a Test
💎 Hard
💡 Key concepts
Power = 1 − β = probability of rejecting H₀ when H₁ is true
Ways to increase power:
· Increase α · Increase sample size (n) · Larger effect size · Smaller σ
A researcher wants to raise the statistical power of a test.
Which of the following does NOT increase power?
📖 Explanation
C is correct: lowering α shrinks the rejection region → H₀ becomes harder to reject → power decreases
Power and α trade off against each other!
α↓ → Type I Error↓, BUT Power↓, Type II Error↑
A concept that appears very often on the AP exam.
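The trade-off can be illustrated with a Python sketch for a one-sided (upper-tail) z test with known σ, an assumed setting chosen because its power has a closed form. The effect size, σ, and sample sizes below are hypothetical numbers, not values from the quiz.

```python
import math
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha):
    """Power of an upper-tail z test of H0: mu = mu0 vs H1: mu = mu0 + delta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # rejection cutoff under H0
    shift = delta * math.sqrt(n) / sigma        # how far H1 moves the statistic
    return 1 - NormalDist().cdf(z_alpha - shift)


base = z_test_power(delta=2, sigma=10, n=50, alpha=0.05)
lower_alpha = z_test_power(delta=2, sigma=10, n=50, alpha=0.01)  # power drops
larger_n = z_test_power(delta=2, sigma=10, n=100, alpha=0.05)    # power rises
smaller_sigma = z_test_power(delta=2, sigma=5, n=50, alpha=0.05) # power rises

print(round(lower_alpha, 3), round(base, 3), round(larger_n, 3), round(smaller_sigma, 3))
```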
20
๐Ÿ† Experimental Design & Lurking Variables
๐Ÿ’Ž Hard
๐Ÿ’ก ํ•ต์‹ฌ ๊ฐœ๋…
๋ฌด์ž‘์œ„ ์‹คํ—˜(RCT): ์ธ๊ณผ๊ด€๊ณ„ ์ฃผ์žฅ ๊ฐ€๋Šฅ
๊ด€์ฐฐ ์—ฐ๊ตฌ: ์ƒ๊ด€๊ด€๊ณ„๋งŒ, ์ธ๊ณผ๊ด€๊ณ„ X
๊ต๋ž€๋ณ€์ˆ˜(Confounding/Lurking variable): ๋‘ ๋ณ€์ˆ˜ ๋ชจ๋‘์— ์˜ํ–ฅ
๋ธ”๋กœํ‚น(Blocking): ์•Œ๋ ค์ง„ ๋ณ€๋™ ์›์ธ ํ†ต์ œ
"์•„์ด์Šคํฌ๋ฆผ ํŒ๋งค๋Ÿ‰๊ณผ ์ต์‚ฌ ์‚ฌ๊ณ  ๊ฑด์ˆ˜ ์‚ฌ์ด์— r = 0.89์˜ ๊ฐ•ํ•œ ์–‘์˜ ์ƒ๊ด€์ด ์žˆ๋‹ค."
์ด์— ๋Œ€ํ•œ ๊ฐ€์žฅ ์ ์ ˆํ•œ ์„ค๋ช…์€?
📖 Explanation
D is correct: a textbook confounding-variable problem!
In summer, when the temperature rises:
→ ice cream sales↑ AND number of swimmers↑ → drowning accidents↑
Even with a strong correlation (r=0.89), causation can only be established with a randomized experiment (RCT).
Most frequently tested AP Statistics concept: Correlation ≠ Causation 🎯