Orthogonality, projections & least squares

Perpendicularity sounds like a geometric curiosity. It's actually the engine of approximation: orthogonal directions can be measured independently, and projecting onto them gives the closest simplified version (in the least-squares sense) of anything: data, signals, images.

Build the intuition

Projection: the shadow that's closest

Project vector v onto direction u: the shadow is the point on the line through u that lies closest to v, and the dot product computes it. The leftover (v minus its shadow) is perpendicular to u: error and estimate at right angles. That right angle is the signature of every least-squares approximation.

\text{proj}_u(v) = \frac{v \cdot u}{u \cdot u}\, u
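A minimal sketch of the formula in plain Python, assuming vectors are lists of floats. The helper names `dot` and `project` and the example vectors are illustrative choices, not part of the lesson.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(v, u):
    """Projection of v onto the direction u (u must be nonzero)."""
    scale = dot(v, u) / dot(u, u)
    return [scale * x for x in u]


# Example vectors chosen for illustration.
v = [1.0, 3.0]
u = [2.0, 1.0]

shadow = project(v, u)                         # closest point to v along u
leftover = [a - b for a, b in zip(v, shadow)]  # v minus its shadow

print(shadow)            # [2.0, 1.0]
print(leftover)          # [-1.0, 2.0]
print(dot(leftover, u))  # 0.0: error and estimate at right angles
```

Dividing by u · u rather than requiring a unit-length u keeps the function usable for any nonzero direction.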

Orthogonal bases: coordinates without interference

In a basis of mutually perpendicular directions, each coordinate is found by one independent projection — measuring one never disturbs another, and dropping one removes exactly its contribution. Sines and cosines of different frequencies are such a perpendicular family: the reason a spectrum's bins can be read (and discarded) independently.
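To make "one independent projection per coordinate" concrete, here is a small sketch under assumed inputs: a hand-picked orthogonal (not unit-length) basis of 3D space and an arbitrary vector. The names `basis`, `coords`, and `rebuild` are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rebuild(coords, basis):
    """Sum of coordinate * basis vector, component by component."""
    size = len(basis[0])
    return [sum(c * u[i] for c, u in zip(coords, basis)) for i in range(size)]


# Three mutually perpendicular directions (illustrative values).
basis = [
    [1.0, 1.0, 0.0],
    [1.0, -1.0, 0.0],
    [0.0, 0.0, 2.0],
]
v = [3.0, 1.0, 4.0]

# Each coordinate is found on its own: (v . u) / (u . u).
coords = [dot(v, u) / dot(u, u) for u in basis]
print(coords)                  # [2.0, 1.0, 2.0]
print(rebuild(coords, basis))  # [3.0, 1.0, 4.0]

# Drop the second coordinate; the other two stay exactly as they were.
kept = [coords[0], 0.0, coords[2]]
print(rebuild(kept, basis))    # [2.0, 2.0, 4.0]
```

The last result differs from v by exactly 1.0 times the second basis vector, which is the "removes exactly its contribution" claim in miniature.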

Least squares: projection wearing a lab coat

Fitting a line to scattered data points means solving unsolvable equations: too many demands, only two knobs (slope and intercept). The resolution: project the data vector onto the plane of achievable predictions; the result minimizes total squared error. Every ordinary least-squares regression, trendline, and calibration curve is this one geometric move.

A^TA\,\hat{x} = A^T b
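One way to see where this equation comes from is to write the right-angle condition in matrix form. The sketch below assumes the usual reading of the symbols: the columns of A span the achievable predictions and b is the data vector. The residual symbol r is introduced here only for illustration.

```latex
r = b - A\hat{x}
\qquad\text{(residual: data minus its projection)}

A^T r = 0
\qquad\text{(residual is perpendicular to every column of } A\text{)}

A^T (b - A\hat{x}) = 0
\;\Longrightarrow\;
A^T A\,\hat{x} = A^T b
```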

See it move

Interactive. Vectors: arrows that add

Sliders: a = (2, 1), b = (-1, 2).

a + b = (2, 1) + (-1, 2) = (1, 3), tip to tail. Dot product 0: the two are exactly perpendicular; they share nothing.

Steer b until its dot product with a hits zero: you've separated estimate from error. Projection lives entirely in this readout.

A worked example

Fit a line by projection

Data: at x = 0, 1, 2 you measured y = 1, 3, 4. Want y = mx + c through all three — impossible (they're not collinear).
Least squares asks: which (m, c) makes the predictions closest to (1, 3, 4)? The normal equations, 5m + 3c = 11 and 3m + 3c = 8, give m = 1.5, c = 7/6 ≈ 1.17.
Check the signature: the residuals (−0.17, +0.33, −0.17) are orthogonal to both the constant and the x direction — the error has no fittable component left. That's optimality, verified by a right angle.
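The three steps above can be checked numerically. This is a sketch in plain Python that builds the 2×2 normal equations for the line fit and solves them with Cramer's rule; the variable names are illustrative, and a real project would more likely call a linear-algebra library.

```python
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 4.0]

# Columns of A: the x direction and the constant direction,
# so A applied to (m, c) gives the predictions m*x + c.
n = len(xs)
sxx = sum(x * x for x in xs)              # entries of A^T A
sx = sum(xs)
sxy = sum(x * y for x, y in zip(xs, ys))  # entries of A^T b
sy = sum(ys)

# Solve [[sxx, sx], [sx, n]] (m, c) = (sxy, sy) by Cramer's rule.
det = sxx * n - sx * sx
m = (sxy * n - sx * sy) / det
c = (sxx * sy - sx * sxy) / det

residuals = [y - (m * x + c) for x, y in zip(xs, ys)]

print(round(m, 2), round(c, 2))          # 1.5 1.17
print([round(r, 2) for r in residuals])  # [-0.17, 0.33, -0.17]

# Residuals against the constant direction and the x direction:
# both dot products vanish up to floating-point rounding.
against_constant = sum(residuals)
against_x = sum(r * x for r, x in zip(residuals, xs))
print(abs(against_constant) < 1e-12, abs(against_x) < 1e-12)  # True True
```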

Out in the world

Compression is dropped coordinates

JPEG and MP3 express blocks of image or sound in an orthogonal cosine-wave basis (JPEG uses the discrete cosine transform on 8×8 pixel blocks), then store the coordinates that matter least coarsely or not at all. Orthogonality guarantees that what's dropped takes only its own contribution with it, which is why aggressive compression degrades gracefully instead of catastrophically.
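A toy version of the idea, not the real JPEG or MP3 pipeline: those codecs quantize coefficients, whereas this sketch simply keeps the largest few. It assumes a one-dimensional block of 8 made-up samples and an orthonormal DCT-II basis; the block length, sample values, and the choice to keep 3 coordinates are all illustrative.

```python
import math

N = 8  # block length (illustrative; JPEG works on 8x8 blocks in 2D)


def dct_basis(k):
    """k-th orthonormal DCT-II basis vector of length N."""
    scale = math.sqrt((1.0 if k == 0 else 2.0) / N)
    return [scale * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N)]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


basis = [dct_basis(k) for k in range(N)]
block = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]  # made-up samples

# One independent projection per basis vector.
coeffs = [dot(block, u) for u in basis]

# Keep the 3 largest coordinates by magnitude and zero the rest.
keep = sorted(range(N), key=lambda k: abs(coeffs[k]), reverse=True)[:3]
kept = [c if k in keep else 0.0 for k, c in enumerate(coeffs)]

approx = [sum(c * u[n] for c, u in zip(kept, basis)) for n in range(N)]

# Squared error of the approximation vs. squared size of what was dropped.
error_energy = sum((a - b) ** 2 for a, b in zip(block, approx))
dropped_energy = sum(c * c for k, c in enumerate(coeffs) if k not in keep)

print(error_energy, dropped_energy)  # the two agree up to rounding
```

Because the basis is orthonormal, the squared error equals the sum of squares of the dropped coordinates, so the cost of discarding each one is known in advance.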

Common confusion, cleared

“Least squares is a statistics recipe, separate from geometry.”

It is geometry: project the data onto the model's reachable subspace. The famous normal equations just say “make the error perpendicular to everything you can adjust.”

“Any basis is as good as any other.”

Skewed bases make coordinates interdependent — change one, recompute all. Orthogonal bases decouple them. That convenience is why Fourier, JPEG, and statistics all engineer orthogonality in.
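A small sketch of the difference, using two hand-picked 2D bases (both hypothetical). Reading each coordinate off with its own projection reproduces the vector when the basis is orthogonal and fails when it is skewed, where the coordinates have to be solved for together.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def naive_coords(v, basis):
    """One projection per basis vector; only valid for an orthogonal basis."""
    return [dot(v, u) / dot(u, u) for u in basis]


def rebuild(coords, basis):
    """Sum of coordinate * basis vector, component by component."""
    size = len(basis[0])
    return [sum(c * u[i] for c, u in zip(coords, basis)) for i in range(size)]


v = [2.0, 3.0]
orthogonal = [[1.0, 0.0], [0.0, 1.0]]
skewed = [[1.0, 0.0], [1.0, 1.0]]

print(rebuild(naive_coords(v, orthogonal), orthogonal))  # [2.0, 3.0]
print(rebuild(naive_coords(v, skewed), skewed))          # [4.5, 2.5], not v

# The correct skewed coordinates come from solving both equations at once:
# a + b = 2 and b = 3, so (a, b) = (-1, 3).
print(rebuild([-1.0, 3.0], skewed))                      # [2.0, 3.0]
```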

Recap

Projection finds the closest point; the error stands perpendicular.
Orthogonal bases give independent, droppable coordinates.
Least squares = projecting data onto achievable predictions — geometry as estimation.
