Skip to content
LearnMathora

Probability & statistics · 04 · Reading a distribution · 8 min

Easy

Shape, percentiles & spread

A single average hides almost everything interesting about data. The shape of a distribution — where it piles up, how it spreads, whether it leans — is where the real information lives. This lesson teaches you to read that shape from figures.

Build the intuition

Percentiles & quartiles: position by rank

The 90th percentile is the value below which 90% of the data falls. Quartiles cut the data into four equal-count groups: Q1 (25th), the median (50th), Q3 (75th). 'Your baby is in the 60th percentile for height' means 60% of babies are shorter — a position, not a score. Percentiles describe data by rank, immune to outliers.

The IQR and the box plot

The interquartile range, IQR = Q3 − Q1, captures the middle 50% of the data in one number — a spread measure that ignores extremes. A box plot draws it: a box from Q1 to Q3, a line at the median, whiskers to the typical range, and dots for outliers. One glance reveals center, spread, skew, and anomalies.

IQR=Q3Q1\text{IQR} = Q_3 - Q_1

Skewness: which way it leans

Symmetric data (heights) has a centered peak with matching tails. Right-skewed data (incomes, house prices, wait times) has a long right tail — a few huge values pulling the mean above the median. Left-skewed leans the other way. The mean-vs-median gap is a skewness detector: mean above median means right-skewed, and the median is the more honest 'typical' value.

See it move

InteractiveThe bell curve
0
1
1
Mean 0, spread σ = 1. Within ±1σ of the mean lives 68.3% of everything. (±1σ ≈ 68%, ±2σ ≈ 95% — the most useful rule of thumb in statistics.)

Start with the symmetric bell, then carry its ±σ and percentile intuition to skewed real-world data — where mean and median part ways.

A worked example

Read a salary box plot

  1. A company's salaries: Q1 = $52k, median = $68k, Q3 = $85k, with dots out near $400k (executives).

  2. IQR = 85 − 52 = $33k — the middle half spans that range.

  3. The long upper whisker and far outliers mean right skew: the mean salary is dragged well above the $68k median.

  4. Quoting the mean here would mislead; the median $68k is the honest 'typical' salary. The box plot showed it instantly.

Out in the world

ML feature diagnostics

Before training, data scientists box-plot every feature: skewed features get log-transformed, outliers get investigated, and weird shapes reveal data-quality bugs. Reading distribution shape is step one of every serious modeling pipeline.

Common confusion, cleared

The 90th percentile means a score of 90%.

It means 90% of values fall below it — a rank position. The actual value could be anything; percentile is about standing, not magnitude.

Outliers should always be removed.

Sometimes they're errors; sometimes they're the most important data (fraud, failures, breakthroughs). Box plots flag them for investigation, not automatic deletion.

Check yourself

PracticeQuick check

  1. Mean = $71k but median = $58k. The distribution is…

Recap

  • Percentiles and quartiles describe data by rank, resisting outliers.
  • IQR = Q3 − Q1 is the middle-50% spread; box plots draw it at a glance.
  • Skew shows as a mean–median gap and a long tail — trust the median when skewed.

Progress saves in this browser.