Sampling & evidence

You can't ask everyone, test everything, or rerun history. Statistics' deepest trick is measuring a small sample and saying something honest about the whole — including exactly how unsure you are.

Build the intuition

Samples wobble — predictably

Poll 100 people twice and you'll get different answers; that's sampling noise. The miracle: the wobble shrinks like 1/√n. Quadruple the sample, halve the noise. This one formula explains why polls of just 1,000 can read a nation of millions — and why precision gets expensive fast.

\text{noise} \propto \frac{1}{\sqrt{n}}
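To put numbers on the wobble, here is a minimal Python sketch. It assumes the textbook standard error of a proportion, sqrt(p(1 − p)/n), a simple random sample, and a 95% margin of about 1.96 standard errors. The function names and the worst-case p = 0.5 are illustrative choices, not part of the lesson.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion (simple random sample assumed)."""
    return math.sqrt(p * (1 - p) / n)

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error via the normal approximation."""
    return z * standard_error(p, n)

if __name__ == "__main__":
    # p = 0.5 is the noisiest case, so these are worst-case margins.
    for n in (100, 400, 1000, 4000):
        moe = margin_of_error(0.5, n)
        print(f"n = {n:>5}: margin of error is about ±{100 * moe:.1f} points")
```

Going from 100 to 400 respondents halves the margin, and a poll of 1,000 lands near ±3 points, however large the nation.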

The null question

Your new web design converted 54% vs the old 50%. Real, or luck? Statistics asks: if nothing were truly different, how often would chance alone produce a gap this big? If that's rare (the famous p < 0.05), the result earns “significant.” Surprise-under-boredom is the entire idea of hypothesis testing.
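One way to ask the null question directly is to simulate the boring world and count. The sketch below assumes 500 visitors per design (the lesson gives only the percentages, so the traffic figures are hypothetical), gives both designs the same shared conversion rate, and reports how often chance alone matches or beats the observed gap.

```python
import random

def chance_of_gap(n: int, conv_old: int, conv_new: int,
                  trials: int = 2000, seed: int = 0) -> float:
    """Fraction of no-difference simulations whose gap is at least the observed one."""
    rng = random.Random(seed)
    # The null hypothesis: both designs share one true conversion rate.
    shared_rate = (conv_old + conv_new) / (2 * n)
    observed_gap = abs(conv_new - conv_old)
    hits = 0
    for _ in range(trials):
        sim_old = sum(rng.random() < shared_rate for _ in range(n))
        sim_new = sum(rng.random() < shared_rate for _ in range(n))
        if abs(sim_new - sim_old) >= observed_gap:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    # Hypothetical traffic: 500 visitors per design, 250 vs 270 conversions (50% vs 54%).
    print(chance_of_gap(500, 250, 270))
```

With these assumed group sizes, a gap that large shows up in roughly one simulated run in five, so it would not clear the p < 0.05 bar. With ten times the traffic, the same percentages would almost never arise by luck. The verdict depends on n, not just on the gap.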

Significant ≠ important, and other fine print

With enormous samples, trivial differences become “significant”: detectable isn't the same as mattering. And a one-in-twenty threshold means one-in-twenty flukes: a lab that tests 20 hypotheses where nothing is really going on should expect about one false alarm. Replication, effect sizes, and pre-registered claims are the honest follow-through.
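The one-in-twenty arithmetic can be sketched in a few lines of Python. This assumes the tests are independent and that no real effect exists in any of them; both are simplifying assumptions, and the function names are illustrative.

```python
def expected_false_alarms(tests: int, alpha: float = 0.05) -> float:
    """Average number of 'significant' results when every null is true."""
    return tests * alpha

def chance_of_any_false_alarm(tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of the independent tests is a fluke."""
    return 1 - (1 - alpha) ** tests

if __name__ == "__main__":
    print(expected_false_alarms(20))                 # about 1 on average
    print(round(chance_of_any_false_alarm(20), 2))   # about 0.64
```

So a lab running 20 such tests has roughly a 64% chance of at least one false alarm, which is why a single significant result out of many attempts deserves a replication.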

See it move

Interactive: The law of large numbers

After 40 flips: 23 heads = 57.5%. Single flips are pure chance — but the running average is drawn, inevitably, toward 50%. Randomness has long-run structure.

Sampling in miniature: with 20 flips the proportion lies freely; with 400 it can barely stray from the truth. Noise shrinks like 1/√n.
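If the interactive isn't to hand, the same experiment can be sketched in Python. This assumes a fair coin and repeats each experiment 2,000 times with a fixed seed; the repeat count, seed, and function names are arbitrary illustrative choices.

```python
import random
import statistics

def heads_proportion(flips: int, rng: random.Random) -> float:
    """Share of heads in one run of fair coin flips."""
    return sum(rng.random() < 0.5 for _ in range(flips)) / flips

def spread_of_proportion(flips: int, repeats: int = 2000, seed: int = 0) -> float:
    """Standard deviation of the heads share across many repeated runs."""
    rng = random.Random(seed)
    shares = [heads_proportion(flips, rng) for _ in range(repeats)]
    return statistics.pstdev(shares)

if __name__ == "__main__":
    for flips in (20, 400):
        spread = spread_of_proportion(flips)
        print(f"{flips} flips: heads share typically strays ±{100 * spread:.1f} points")
```

Theory predicts a spread of 0.5/√n: about 11 points for 20 flips and 2.5 points for 400. Twenty times the flips buys roughly 4.5 times less wobble.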

A worked example

Is the new button actually better?

Old checkout: 500 visitors, 50 purchases (10%). New: 500 visitors, 65 purchases (13%).
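Sampling noise for each 500-person group is roughly ±1.5 points (one standard error). The gap compares two noisy numbers, so its own noise is larger, about ±2 points, which puts the 3-point gap at about 1.5 standard errors. If nothing had changed, luck alone would produce a gap that big in either direction roughly 14% of the time (about 7% counting only gaps in the new button's favor).
Verdict: promising, but short of the usual p < 0.05 bar. Keep the test running, or re-measure after launch, before calling it real. With 50 visitors instead of 500, the same percentages would have meant nothing.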
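The checkout numbers can be run through a standard two-proportion z-test. This is a minimal sketch assuming the pooled normal approximation and a two-sided test; the lesson does not say which test it has in mind, so treat the choice as an assumption.

```python
import math

def two_proportion_z_test(x_old: int, n_old: int, x_new: int, n_new: int):
    """Return (z, two-sided p-value) for the gap between two conversion rates."""
    p_old, p_new = x_old / n_old, x_new / n_new
    # Under the null, both groups share one rate, estimated by pooling.
    pooled = (x_old + x_new) / (n_old + n_new)
    se_gap = math.sqrt(pooled * (1 - pooled) * (1 / n_old + 1 / n_new))
    z = (p_new - p_old) / se_gap
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area
    return z, p_value

if __name__ == "__main__":
    z, p = two_proportion_z_test(50, 500, 65, 500)
    print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Under these assumptions the 3-point gap sits about 1.5 standard errors from zero, with a two-sided p-value near 0.14. The noise on a difference is bigger than the noise on either group alone, because both sides wobble.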

Out in the world

Why drug trials need thousands

A drug helping 1 person in 50 is a huge public-health win that's statistically invisible in a 40-person trial. Trial sizes are computed from the 1/√n law before a single patient enrolls — statistics decides medicine's experiments before medicine runs them.
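Here is a rough sketch of such a sample-size calculation, using the common normal-approximation formula for comparing two proportions. The baseline recovery rate (50%), the 5% two-sided significance level, and the 80% power are assumptions chosen for illustration. The lesson specifies only the 1-in-50 benefit, taken here as a 2-point absolute improvement.

```python
import math
from statistics import NormalDist

def patients_per_arm(p_control: float, p_treated: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed in each arm to detect the difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance cutoff
    z_power = NormalDist().inv_cdf(power)          # cutoff for the desired power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_treated - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

if __name__ == "__main__":
    # Hypothetical baseline: 50% recover untreated, 52% with the drug.
    print(patients_per_arm(0.50, 0.52))
```

Under these assumptions the answer is roughly 9,800 patients per arm. Because the effect is squared in the denominator, halving the effect you want to detect roughly quadruples the trial: the 1/√n law read backwards.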

Common confusion, cleared

“A bigger population needs a proportionally bigger sample.”

Noise depends on sample size, barely on population size: 1,000 people read a city or a continent almost equally well. (Soup tasting: one spoonful works for any size pot — if stirred.)
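The "barely" can be made concrete with the finite population correction, a standard adjustment for sampling without replacement. The sketch below assumes a simple random sample and worst-case p = 0.5; the population figures are hypothetical.

```python
import math

def margin_of_error(n: int, population: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error with the finite population correction applied."""
    se = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))  # approaches 1 as population grows
    return z * se * fpc

if __name__ == "__main__":
    for label, size in (("city", 100_000), ("continent", 1_000_000_000)):
        moe = margin_of_error(1000, size)
        print(f"{label:>9}: ±{100 * moe:.2f} points")
```

Both margins come out near ±3.1 points, differing only in the second decimal place. The population size matters only when the sample is a sizeable fraction of it.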

“p < 0.05 means 95% chance the effect is real.”

It means: if there were no effect, data at least this extreme would occur less than 5% of the time. That is a statement about the data given no effect, not about the chance the effect is real, and the gap between the two is where most statistical misreporting lives.
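To see how far apart the two statements can be, here is a small sketch using Bayes' rule. It needs two numbers the lesson does not supply, so both are loudly hypothetical: the share of tested ideas that are truly real (10% here) and the chance a real effect gets detected (80% power).

```python
def chance_effect_is_real(prior_real: float, power: float = 0.80,
                          alpha: float = 0.05) -> float:
    """Share of 'significant' results that reflect a real effect (Bayes' rule)."""
    true_hits = prior_real * power           # real effects that get detected
    false_alarms = (1 - prior_real) * alpha  # null effects that pass by luck
    return true_hits / (true_hits + false_alarms)

if __name__ == "__main__":
    # Hypothetical: only 1 in 10 tested ideas is actually real.
    print(round(chance_effect_is_real(0.10), 2))   # 0.64
```

Under those assumptions, a significant result is real only about 64% of the time, not 95%. The answer moves with the prior, which the p-value alone never tells you.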

Check yourself

Practice: Quick check

To halve your poll's margin of error, you need…
About what fraction of bell-curve values fall within ±2σ?

Recap

Sample noise shrinks like 1/√n — small samples can speak for millions.
Hypothesis testing asks how surprising the data would be if nothing were real.
Significant ≠ important; replication and effect size complete the story.
