Random variables & distributions

A random variable is a number whose value is set by chance — a die roll, tomorrow's demand, a measurement's error. Its distribution is the full menu of possible values and their probabilities. A handful of named distributions describe a startling share of the random world.

Build the intuition

Random variables and their distributions

A random variable X assigns a number to each random outcome; its distribution describes how probability is spread across those numbers. Discrete variables (counts) assign a probability to each value, and those probabilities sum to 1. Continuous variables (measurements) have a density curve with total area 1, so probability lives in areas, not points.

Expectation and variance: center and spread of chance

The expected value E[X] is the long-run average: each value weighted by its probability, the balance point of the distribution. Variance is the expected squared distance from that center, so it measures how widely outcomes scatter; the standard deviation is its square root, in the variable's own units. Together they summarize a distribution's location and width, the random-variable echo of mean and spread for data.

E[X] = \sum_x x\,P(x), \qquad \mathrm{Var}(X) = E[(X - E[X])^2]
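To see the two formulas at work, here is a minimal sketch that evaluates them exactly for a small discrete distribution. The example (number of heads in two fair coin flips) and the names `pmf`, `expectation`, and `variance` are assumptions chosen for illustration, not part of the lesson.

```python
from fractions import Fraction

# Assumed example: number of heads in two fair coin flips.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expectation(pmf):
    """E[X] = sum over x of x * P(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - E[X])^2]."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

assert sum(pmf.values()) == 1  # a valid discrete distribution sums to 1

mean = expectation(pmf)
var = variance(pmf)
print(mean, var)  # 1 1/2
```

Using `Fraction` keeps the arithmetic exact, so the results match what you would get by hand.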

The distributions worth knowing by name

Bernoulli: one yes/no trial. Binomial: successes in n trials (conversions, defects). Poisson: counts of rare independent events in a window (arrivals, typos). Uniform: total ignorance between bounds. Exponential: waiting time until the next event. Normal: the bell that sums and averages converge to. Each has a parameter or two that fully determines its shape — learn the stories and you can model most of the random world.
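One way to make these stories concrete is to turn each into a sampler. The sketch below is illustrative only: the function names and parameter values (p = 0.3, n = 12, rate 3, bounds 0 and 10) are assumptions. It uses Python's standard `random` module, and it deliberately builds Binomial draws from Bernoulli trials and Poisson counts from Exponential waiting times to mirror the stories above.

```python
import random
from statistics import fmean

rng = random.Random(0)  # fixed seed so runs are repeatable

def bernoulli(p):
    """One yes/no trial: 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

def binomial(n, p):
    """Successes in n independent Bernoulli trials."""
    return sum(bernoulli(p) for _ in range(n))

def exponential(rate):
    """Waiting time until the next event at the given rate."""
    return rng.expovariate(rate)

def poisson(rate):
    """Events in one unit window, counted by adding up exponential waits."""
    count, t = 0, exponential(rate)
    while t <= 1.0:
        count += 1
        t += exponential(rate)
    return count

def uniform(a, b):
    """Any value between a and b, all equally likely."""
    return rng.uniform(a, b)

def normal(mu, sigma):
    """Bell curve centered at mu with standard deviation sigma."""
    return rng.gauss(mu, sigma)

N = 20_000  # draws per distribution
sample_means = {
    "bernoulli": fmean([bernoulli(0.3) for _ in range(N)]),      # theory: 0.3
    "binomial": fmean([binomial(12, 0.5) for _ in range(N)]),    # theory: 6
    "poisson": fmean([poisson(3) for _ in range(N)]),            # theory: 3
    "uniform": fmean([uniform(0, 10) for _ in range(N)]),        # theory: 5
    "exponential": fmean([exponential(3) for _ in range(N)]),    # theory: 1/3
    "normal": fmean([normal(0, 1) for _ in range(N)]),           # theory: 0
}

for name, m in sample_means.items():
    print(f"{name:12s} sample mean = {m:.3f}")
```

Each sample mean should land close to the theoretical mean noted in the comments, which is the expected value showing up as a long-run average.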

See it move

Interactive: The distribution lab

p(success) = 0.5

Binomial. Successes in 12 independent tries at p = 0.5 — conversions, defects, heads. Skewed when p strays from ½; already going bell-shaped in the middle.

Six distributions, one stage. Switch between them, move each one's dial, and read where it shows up in the world — the named shapes of chance.
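Without the interactive, the Binomial case can be roughly reproduced in a few lines. This is a sketch, not the lab's own code: it prints a text histogram for 12 tries at p = 0.5 and, as an assumed contrast, at p = 0.1. The helper names `binomial_pmf` and `text_histogram` are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent tries)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def text_histogram(n, p, width=40):
    """One line per count k, with a bar proportional to P(X = k)."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    peak = max(probs)
    return [
        f"{k:2d} {'#' * round(width * q / peak):<{width}} {q:.3f}"
        for k, q in enumerate(probs)
    ]

n = 12
for p in (0.5, 0.1):  # 0.5 matches the setting above; 0.1 is an assumed contrast
    print(f"n = {n}, p = {p}")
    print("\n".join(text_histogram(n, p)))
    print()
```

At p = 0.5 the bars are symmetric around 6; at p = 0.1 they pile up near 0 and 1 with a tail to the right.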

A worked example

Model support-ticket arrivals

A help desk gets 3 tickets per hour on average, arriving independently. How many next hour?
Independent rare events at a fixed rate → Poisson with λ = 3, so P(exactly k) = e^{−3} · 3^k / k!
P(0 tickets) = e^{−3} ≈ 5%; the most likely counts are 2 and 3, each at about 22%; P(7 or more) is small but nonzero, about 3%.
Staffing, queue design, and SLA promises all flow from naming the distribution correctly.
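The numbers in this example can be checked directly from the Poisson formula. The sketch below assumes nothing beyond λ = 3 from the example; the function and variable names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """P(exactly k events) when the average count per window is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3  # average tickets per hour, from the example

p0 = poisson_pmf(0, lam)
probs = {k: poisson_pmf(k, lam) for k in range(7)}  # counts 0 through 6
p_7_plus = 1 - sum(probs.values())                  # complement: 7 or more

peak = max(probs.values())
modes = [k for k, q in probs.items() if math.isclose(q, peak)]

print(f"P(0)  = {p0:.4f}")        # P(0)  = 0.0498
print(f"modes = {modes}")         # modes = [2, 3]
print(f"P(7+) = {p_7_plus:.4f}")  # P(7+) = 0.0335
```

The tie between 2 and 3 is not a coincidence: whenever λ is a whole number, the counts λ − 1 and λ are equally likely.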

Out in the world

Distributions are the priors of ML

Naive Bayes assumes a distribution for each feature given the class; least-squares linear regression is the maximum-likelihood fit under normally distributed errors; generative models learn distributions outright. Choosing or learning the right distribution is where domain knowledge enters a model, and where wrong assumptions quietly wreck predictions.

Common confusion, cleared

“Expected value is the most likely outcome.”

It's the long-run average, which may be impossible: the expected value of a die is 3.5. It's a balance point, not a prediction of any single roll.
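A quick simulation makes the distinction visible. This is a sketch using an assumed seed and an assumed sample size of 100,000 rolls.

```python
import random
from fractions import Fraction

# Exact expected value of a fair die: each face weighted by 1/6.
expected = sum(Fraction(face, 6) for face in range(1, 7))

rng = random.Random(0)  # fixed seed so runs are repeatable
rolls = [rng.randint(1, 6) for _ in range(100_000)]
average = sum(rolls) / len(rolls)

print(float(expected))   # 3.5
print(3.5 in rolls)      # False
print(f"long-run average = {average:.3f}")
```

No single roll ever equals 3.5, yet the average of many rolls settles close to it.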

“For continuous variables, P(X = exactly 5) is some small number.”

It's exactly zero — continuous probability lives in intervals (areas under the density), not single points. You ask P(4.9 < X < 5.1), never P(X = 5).
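To watch the probability drain away as the interval narrows, the sketch below assumes X is Normal with mean 5 and standard deviation 1. That choice is purely illustrative; the text does not specify a distribution. The CDF is computed with the standard library's `math.erf`.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a Normal(mu, sigma) variable."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def interval_probability(lo, hi, mu=5.0, sigma=1.0):
    """P(lo < X < hi): the area under the density between lo and hi."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)

# Shrink the interval around 5 until it collapses to a single point.
for half_width in (0.1, 0.01, 0.001, 0.0):
    p = interval_probability(5 - half_width, 5 + half_width)
    print(f"half-width {half_width}: probability = {p:.6f}")
```

Under this assumption, P(4.9 < X < 5.1) is roughly 8%, and the probability falls toward 0 as the interval narrows, reaching exactly 0 at the single point.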

Check yourself

Practice: Quick check

Counting independent rare events (calls per minute) is best modeled by…
The expected value of one fair six-sided die roll is…

Recap

A random variable's distribution maps values to probabilities (mass or density).
E[X] is the probability-weighted center; variance is the spread.
Bernoulli, Binomial, Poisson, Uniform, Exponential, Normal model most of the random world.
