Center, spread & honest summaries
A thousand numbers are unreadable; a summary is unavoidable. The craft of statistics starts with summarizing without distorting: knowing what the mean hides, what the median resists (outliers), and what spread reveals.
Build the intuition
Mean vs median: the billionaire test
Nine people earn $50k a year; a billionaire earning $1B a year walks into the room. The mean salary rockets past $100M ((9 × $50k + $1B) / 10 ≈ $100.05M), while the median stays at $50k. Means follow the money; medians follow the middle person. Skewed data (incomes, house prices, wait times) is median territory, so headlines quoting an "average" deserve suspicion.
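The billionaire arithmetic can be checked with Python's standard `statistics` module. The salary list below is only the hypothetical room from the paragraph.

```python
from statistics import mean, median

# Hypothetical room: nine salaries of $50k plus one salary of $1B.
salaries = [50_000] * 9 + [1_000_000_000]

print(f"mean:   ${mean(salaries):,.0f}")    # mean:   $100,045,000
print(f"median: ${median(salaries):,.0f}")  # median: $50,000
```

One extreme value moves the mean by a factor of about 2,000. The median does not budge, because the fifth and sixth salaries in sorted order are still $50k.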
Spread is half the story
Two cities can share a mean temperature of 15°C: one ranges from 5°C to 25°C, the other from −20°C to 50°C. Same center, utterly different lives. Standard deviation (σ) measures the typical distance of values from the mean; precisely, it is the square root of the average squared distance. Small σ means clustered and predictable; large σ means scattered and volatile.
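Here is a minimal sketch of the definition in code. The five readings per city are made-up values, chosen so that both cities average exactly 15°C. The hand-written `sigma` helper is illustrative only; it is cross-checked against the standard library's `pstdev` (population standard deviation).

```python
from math import sqrt
from statistics import mean, pstdev

def sigma(xs):
    """Population standard deviation: square root of the mean squared distance from the mean."""
    mu = mean(xs)
    return sqrt(mean((x - mu) ** 2 for x in xs))

# Hypothetical temperature readings (°C) with the same mean of 15.
city_a = [5, 10, 15, 20, 25]
city_b = [-20, -5, 15, 35, 50]

for name, temps in (("A", city_a), ("B", city_b)):
    print(f"city {name}: mean {mean(temps)}, sigma {sigma(temps):.2f}")
# city A: mean 15, sigma 7.07
# city B: mean 15, sigma 25.50
```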
Distribution: the full portrait
Beyond two numbers lies the shape: pile the data into a histogram and look. A symmetric bell? One lonely peak or two (often a sign of two populations mixed together)? A long tail of extremes? Many wrong conclusions die the moment someone actually plots the data.
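Even a crude text histogram can expose a second peak. The sketch below assumes integer data and fixed-width bins. The wait times are invented to mimic two mixed groups, and `histogram` is a hypothetical helper rather than a library function.

```python
from collections import Counter
from statistics import mean

def histogram(data, bin_width):
    """Count integer values per bin [k*w, (k+1)*w), including empty bins in between."""
    counts = Counter((x // bin_width) * bin_width for x in data)
    lo, hi = min(counts), max(counts)
    return [(start, counts.get(start, 0)) for start in range(lo, hi + 1, bin_width)]

# Hypothetical wait times (minutes): a quick group and a slow group mixed together.
wait_times = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14, 15, 15, 16, 16, 16, 17, 18]

for start, n in histogram(wait_times, 3):
    print(f"{start:>2}-{start + 2:<2} {'#' * n}")
print(f"mean: {mean(wait_times):.1f}")
```

The mean, about 9.6 minutes, falls in the empty 9–11 bin. No one in this data actually waits that long, and only the plot reveals it.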
See it move
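For a bell-shaped (normal) distribution, two dials describe the whole population: the mean μ slides the center, and the standard deviation σ stretches the spread. That shortcut only works when the shape really is a bell.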
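One consequence of "two dials describe everything" is that the share of a normal population within k standard deviations of the mean is the same for every μ and σ. The sketch uses the standard library's `statistics.NormalDist`; `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Fraction of a normal population lying within k standard deviations of the mean."""
    d = NormalDist(mu, sigma)
    return d.cdf(mu + k * sigma) - d.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

This is the familiar 68–95–99.7 rule. Calling `share_within(2, mu=15, sigma=10)` gives the same 0.9545 (up to rounding). It holds only for bell-shaped data.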
A worked example
Which commute is better?
Route A: mean 30 min, σ = 2. Route B: mean 28 min, σ = 12.
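B is faster on average, but its spread means 50+ minute disasters happen regularly: if commute times are roughly bell-shaped, B runs past 50 minutes on about 3% of days.
With a 35-minute deadline, A almost never fails (about 0.6% of days under the same assumption), while B fails on roughly 28% of days. The mean said B; the spread said A, and the spread was right.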
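These failure rates can be computed directly. The sketch assumes both routes' commute times follow a normal distribution; the example does not specify a shape, so treat the numbers as illustrative.

```python
from statistics import NormalDist

# Assumption: commute times are normally distributed with the stated mean and sigma.
route_a = NormalDist(mu=30, sigma=2)
route_b = NormalDist(mu=28, sigma=12)

deadline = 35  # minutes
for name, route in (("A", route_a), ("B", route_b)):
    late = 1 - route.cdf(deadline)  # probability a trip exceeds the deadline
    print(f"Route {name}: late on {late:.1%} of days")

print(f"Route B over 50 min: {1 - route_b.cdf(50):.1%} of days")
```

The deadline sits 2.5 σ above A's mean but less than 0.6 σ above B's. The distance measured in σ, not in minutes, decides the outcome.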
Out in the world
Manufacturing lives on σ
A bolt factory's mean diameter can be exactly on target while variance quietly produces out-of-tolerance parts. Quality control is variance control: "Six Sigma" is literally a standard-deviation target, asking that the nearest specification limit sit six σ away from the process mean. Consistency, not averages, is what you can build bridges on.
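The claim can be checked numerically: keep the mean perfect and change only σ. The tolerance (10.00 mm ± 0.05 mm) and both σ values are hypothetical. Diameters are assumed to be normally distributed, and `defect_rate` is an illustrative helper, not a quality-control standard.

```python
from statistics import NormalDist

def defect_rate(mean, sigma, lower, upper):
    """Fraction of parts outside [lower, upper], assuming normally distributed diameters."""
    d = NormalDist(mean, sigma)
    return d.cdf(lower) + (1 - d.cdf(upper))

# Hypothetical spec: 10.00 mm +/- 0.05 mm; the mean is exactly on target in both cases.
tight = defect_rate(10.00, 0.01, 9.95, 10.05)  # limits sit 5 sigma from the mean
loose = defect_rate(10.00, 0.03, 9.95, 10.05)  # limits sit about 1.67 sigma from the mean

print(f"sigma 0.01 mm: {tight * 1e6:.2f} defects per million")
print(f"sigma 0.03 mm: {loose:.1%} defective")
```

Both processes have an identical mean. Tripling σ turns well under one bad bolt per million into nearly one in ten.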
Common confusion, cleared
“The average person earns the average salary.”
With right-skewed data (a long tail of high values), most people sit below the mean because a few giants drag it up. The median is where the middle person actually stands.
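A quick count makes "most people sit below the mean" concrete. The incomes below are invented to have a long right tail; they are not real survey data.

```python
from statistics import mean, median

# Hypothetical right-skewed incomes in $k: mostly modest values plus a couple of giants.
incomes = [22, 25, 28, 30, 31, 35, 38, 42, 55, 90, 250]

avg = mean(incomes)
below = sum(x < avg for x in incomes)  # True counts as 1
print(f"mean {avg:.1f}k, median {median(incomes)}k, {below} of {len(incomes)} below the mean")
# mean 58.7k, median 35k, 9 of 11 below the mean
```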
“More data automatically beats better data.”
A biased mountain loses to an honest hill: polling a million gym members about exercise tells you about gym members. How data was gathered outranks how much.
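A small simulation shows the mountain-versus-hill effect. The setup is entirely hypothetical and scaled down from the million-member poll: 10% of a made-up population are gym members, members exercise regularly with probability 0.9, and everyone else with probability 0.25. `simulate` is an illustrative helper. Exact outputs depend on the random seed, so only approximate values are meaningful.

```python
import random

def simulate(n_people=100_000, sample_size=1_000, seed=0):
    rng = random.Random(seed)
    # Hypothetical population: (is_gym_member, exercises_regularly) pairs with made-up rates.
    population = []
    for _ in range(n_people):
        member = rng.random() < 0.10
        exercises = rng.random() < (0.90 if member else 0.25)
        population.append((member, exercises))

    true_rate = sum(ex for _, ex in population) / n_people

    # The "mountain": poll every gym member (roughly 10,000 people), a biased frame.
    members = [ex for is_member, ex in population if is_member]
    biased_rate = sum(members) / len(members)

    # The "hill": a simple random sample of 1,000 from the whole population.
    sample = rng.sample(population, sample_size)
    random_rate = sum(ex for _, ex in sample) / sample_size

    return true_rate, biased_rate, random_rate

true_rate, biased_rate, random_rate = simulate()
print(f"true {true_rate:.3f}  gym poll {biased_rate:.3f}  random sample {random_rate:.3f}")
```

The true rate comes out near 0.315. The gym poll is about ten times larger than the random sample, yet it reports roughly 0.9. The small random sample lands within a few points of the truth.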
Recap
- Mean follows the money; median follows the middle; check which fits.
- Spread (σ) tells you what to expect around the center — always ask for it.
- Plot the shape before trusting any summary.