Convolution

Convolution is how an LTI (linear time-invariant) system actually computes its output: at every moment, a weighted blend of the input's recent past. It sounds exotic, but you've already done it whenever you've averaged a few days of temperatures to see the trend.

Build the intuition

Flip, slide, multiply, sum

To convolve input x with kernel h: at each output position, lay the (flipped) kernel over the input, multiply overlapping values, and total them. Then slide one step and repeat. The kernel is the system's memory — its weights decide how much each recent input moment matters now.

$y[n] = \sum_k x[k]\, h[n-k]$
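The sum above can be written out almost word for word. This is a minimal sketch in plain Python, not an optimized routine; the function name `convolve` and the sample values are chosen here for illustration, and both signals are assumed to be finite lists that start at index 0.

```python
def convolve(x, h):
    """Full discrete convolution: y[n] = sum over k of x[k] * h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):          # slide: one output position at a time
        for k in range(len(x)):
            j = n - k                # the minus sign is the "flip"
            if 0 <= j < len(h):      # only overlapping samples contribute
                y[n] += x[k] * h[j]  # multiply and sum
    return y


print(convolve([1, 0, 2], [1, 0.5]))  # [1.0, 0.5, 2.0, 1.0]
```

Each input sample shows up in the output once at full strength and once more, one step later, at half strength: the kernel [1, 0.5] acting as a short memory.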

The moving average is convolution with training wheels

Average each point with its neighbors, using the kernel [⅓, ⅓, ⅓], and random noise largely averages out while slower trends survive: a smoothing filter. Make the kernel longer for a calmer output, shorter for more detail. Every choice of kernel is a different system: smoothers, sharpeners, echoes, edge detectors. The kernel is the personality.
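Here is a small sketch of that three-point average. The readings in `noisy` are made up for illustration, and the function only produces outputs where the window fits entirely inside the signal (one of several reasonable ways to treat the edges). Because the kernel is symmetric, flipping it changes nothing.

```python
def moving_average(x, width=3):
    """Smooth x with an equal-weight kernel of the given width."""
    kernel = [1.0 / width] * width
    out = []
    for n in range(len(x) - width + 1):
        window = x[n:n + width]
        out.append(sum(w * v for w, v in zip(kernel, window)))
    return out


# Made-up readings: a jittery low level that steps up to a jittery high level.
noisy = [10, 12, 9, 11, 20, 22, 19, 21]
smooth = moving_average(noisy, 3)
print([round(v, 2) for v in smooth])  # [10.33, 10.67, 13.33, 17.67, 20.33, 20.67]
```

The jitter shrinks, and the sudden step from about 10 to about 20 turns into a ramp: calmer, but with softer edges.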

The impulse response is the kernel

Feed an LTI system a single unit kick (the impulse) and the output traces out exactly h — the kernel itself. That's why one clap reveals a concert hall: the input was an impulse, so the recording was the hall's h. Convolution then predicts the hall's effect on any music. Last lesson's promise, kept.

y = x * h
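To see the clap idea in miniature, the sketch below treats a tiny echo machine as a black box. The `echo_system` function is a hypothetical LTI system invented for this example (the sound plus a half-strength echo two samples later), and `music` is just a handful of made-up sample values.

```python
def echo_system(x):
    """Hypothetical LTI system: the input plus a half-strength echo 2 samples later."""
    y = [0.0] * (len(x) + 2)
    for n, v in enumerate(x):
        y[n] += v
        y[n + 2] += 0.5 * v
    return y


def convolve(x, h):
    """Full discrete convolution: y[n] = sum over k of x[k] * h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y


# One "clap": a single unit impulse. The output is the system's h.
h = echo_system([1.0])
print(h)  # [1.0, 0.0, 0.5]

# Knowing only h, convolution predicts what the system does to any input.
music = [3.0, 1.0, 4.0, 1.0]
print(convolve(music, h))   # [3.0, 1.0, 5.5, 1.5, 2.0, 0.5]
print(echo_system(music))   # [3.0, 1.0, 5.5, 1.5, 2.0, 0.5]
```

The prediction and the real output match, even though `convolve` never looked inside `echo_system`; it only used the response to one impulse.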

See it move

Interactive: Convolution: the moving average (sliders: kernel width, position; shown at width 9, position 60)

At each position, the kernel (gold window) overlaps 9 input values, multiplies, and totals — producing one output point (orange). Slide it across the whole signal and the output curve emerges. Width 9: noise fades, edges soften.

Flip, slide, multiply, sum — performed live. The gold window is the kernel; each position it visits contributes one orange output point.

A worked example

Convolve by hand, once

Input x = [1, 2, 3], kernel h = [1, 1] (a two-point summer).
Slide and sum: y[0] = 1·1 = 1; y[1] = 1·1 + 2·1 = 3; y[2] = 2·1 + 3·1 = 5; y[3] = 3·1 = 3.
Result:
$y = [1, 3, 5, 3]$
Notice the output is longer than the input (3 + 2 − 1 = 4 points) — the kernel's memory smears the signal outward. Do this once by hand and convolution is never mystical again.
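If you'd like to check the hand calculation, this short sketch prints every overlap for the same x and h. It is only a trace of the arithmetic above; the variable names are chosen here for illustration.

```python
x = [1, 2, 3]
h = [1, 1]

y = []
for n in range(len(x) + len(h) - 1):
    # Pair each input sample x[k] with the kernel weight h[n - k] that overlaps it.
    pairs = [(x[k], h[n - k]) for k in range(len(x)) if 0 <= n - k < len(h)]
    total = sum(a * b for a, b in pairs)
    steps = " + ".join(f"{a}*{b}" for a, b in pairs)
    print(f"y[{n}] = {steps} = {total}")
    y.append(total)

print(y)  # [1, 3, 5, 3]
```

The first and last outputs have only one term each: those are the positions where the kernel hangs partly off the end of the input.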

Out in the world

Convolutional neural networks

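The “convolutional” in CNN refers to this sliding operation: small learned kernels slide across an image, detecting edges and textures in early layers, then parts such as eyes and wheels in deeper layers. (Most deep learning libraries skip the flip, which strictly makes the operation cross-correlation; because the kernels are learned, the difference does not matter.) Computer vision's revolution was the discovery that the right kernels can be learned, but the sliding machinery is this lesson's.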
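The same sliding works in two dimensions. The sketch below is a toy, not a neural network: the 3×4 `image` and the `edge_kernel` are hand-picked for illustration rather than learned, and the window is slid without flipping, which is how most deep learning libraries do it.

```python
def slide_2d(image, kernel):
    """Slide a small 2D kernel over an image; multiply and sum at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


# Dark on the left, bright on the right: one vertical edge.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
edge_kernel = [[-1, 1]]  # responds to a jump in brightness from left to right

print(slide_2d(image, edge_kernel))  # [[0, 9, 0], [0, 9, 0], [0, 9, 0]]
```

The output is zero wherever the image is flat and lights up only at the column where dark meets bright. A CNN arrives at kernels like this one by training instead of by hand.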

Common confusion, cleared

“Convolution is just multiplication of two signals.”

Point-wise multiplying two signals is a different operation entirely. Convolution slides one across the other, blending neighborhoods — the result depends on alignment and memory, not just matching points.
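A quick side-by-side makes the difference concrete. The two short signals below are arbitrary values picked for illustration.

```python
a = [1, 2, 3]
b = [4, 5, 6]

# Point-wise multiplication: only matching positions meet.
pointwise = [p * q for p, q in zip(a, b)]

# Convolution: every sample of a meets every sample of b, sorted by alignment.
convolved = [0] * (len(a) + len(b) - 1)
for i, p in enumerate(a):
    for j, q in enumerate(b):
        convolved[i + j] += p * q

print(pointwise)  # [4, 10, 18]
print(convolved)  # [4, 13, 28, 27, 18]
```

Even the lengths differ: three points in, three out for multiplication, but five out for convolution.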

“The flip in ‘flip and slide’ is a technicality to ignore.”

The flip is causality's bookkeeping: it ensures the kernel weights the input's past, in the right order. Skip it and echoes arrive before the sound.
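You can watch the echo jump ahead of the sound in a few lines. In this sketch the kernel `h` is a made-up echo (full strength now, half strength two samples later), the input is a single click at position 3, and both functions keep the output the same length as the input for easy comparison.

```python
def with_flip(x, h):
    """Convolution: output n uses x[k] weighted by h[n - k], so only the past counts."""
    return [sum(x[k] * h[n - k] for k in range(len(x)) if 0 <= n - k < len(h))
            for n in range(len(x))]


def without_flip(x, h):
    """No flip: output n uses x[k] weighted by h[k - n], so it reaches into the future."""
    return [sum(x[k] * h[k - n] for k in range(len(x)) if 0 <= k - n < len(h))
            for n in range(len(x))]


h = [1.0, 0.0, 0.5]                  # sound, then a half-strength echo 2 samples later
x = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]   # one click at position 3

print(with_flip(x, h))     # [0.0, 0.0, 0.0, 1.0, 0.0, 0.5]
print(without_flip(x, h))  # [0.0, 0.5, 0.0, 1.0, 0.0, 0.0]
```

With the flip, the echo lands at position 5, after the click. Without it, the echo shows up at position 1, two samples before the click has even happened.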

Check yourself

Practice: Quick check

An LTI system's response to a single unit impulse is…

Recap

Convolution: at each position, a kernel-weighted blend of the input.
The kernel = the impulse response = the system's complete personality.
Smoothing, blur, reverb, and CNN layers all run on the same sliding operation.
