Optimization & gradient descent

How machines learn — calculus turned into training.

Hard

How does a model actually learn? It defines what 'wrong' means (a loss), computes which way is downhill (the gradient), and steps that way over and over (gradient descent). This course is the engine room of machine learning — where calculus, linear algebra, and statistics combine into training.
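The three steps named above (define a loss, compute the gradient, step downhill repeatedly) can be sketched on a one-parameter toy problem. The target value 3.0, the learning rate, and the function names are illustrative assumptions, not part of the course material.

```python
def loss(w):
    # "Wrong" is measured as squared distance from an assumed target of 3.0.
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of the loss with respect to w.
    return 2.0 * (w - 3.0)

def gradient_descent(w0, learning_rate=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)  # step against the gradient
    return w

w_final = gradient_descent(w0=0.0)
print(w_final, loss(w_final))
```

Each step shrinks the distance to the minimum by a constant factor here, which is why a fixed number of steps is enough for this toy loss.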

5 lessons

The big ideas

Learning is minimizing a loss

A loss function scores how wrong the model is; training searches the parameter landscape for its lowest valley.
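As a minimal sketch of "scoring how wrong the model is", the snippet below evaluates mean squared error for a one-feature linear model at two parameter settings. The data, the model form, and the names `predict` and `mse_loss` are assumptions chosen for illustration.

```python
def predict(w, b, x):
    # A one-feature linear model.
    return w * x + b

def mse_loss(w, b, xs, ys):
    # Mean squared error: average squared gap between prediction and target.
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1

bad = mse_loss(0.0, 0.0, xs, ys)   # parameters far from the truth
good = mse_loss(2.0, 1.0, xs, ys)  # parameters that match the data exactly
print(bad, good)
```

Training amounts to searching over `(w, b)` for the setting where this score is lowest.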

The gradient points the way

Partial derivatives bundle into the gradient — the steepest-uphill direction. Step against it to descend.
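To make the "bundle of partial derivatives" concrete, here is a small sketch that compares a hand-derived gradient with a finite-difference estimate and checks that stepping against it lowers the function. The function `f(x, y) = x^2 + 3y^2`, the test point, and the step size are assumptions for illustration.

```python
def f(p):
    x, y = p
    return x ** 2 + 3.0 * y ** 2

def analytic_gradient(p):
    # One partial derivative per parameter: [df/dx, df/dy].
    x, y = p
    return [2.0 * x, 6.0 * y]

def numerical_gradient(func, p, eps=1e-6):
    # Central differences: nudge one coordinate at a time.
    grad = []
    for i in range(len(p)):
        up = list(p)
        down = list(p)
        up[i] += eps
        down[i] -= eps
        grad.append((func(up) - func(down)) / (2.0 * eps))
    return grad

point = [1.0, 2.0]
g_exact = analytic_gradient(point)
g_approx = numerical_gradient(f, point)

step = 0.01
downhill = [p - step * g for p, g in zip(point, g_exact)]  # against the gradient
uphill = [p + step * g for p, g in zip(point, g_exact)]    # along the gradient
print(f(downhill), f(point), f(uphill))
```

The finite-difference version is useful mainly as a correctness check; it needs two function evaluations per parameter, which is too slow for large models.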

Backpropagation is the chain rule

Deep models are nested functions; the chain rule, applied from the output backward while reusing stored intermediate values, computes every gradient efficiently.
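A hand-written sketch of this idea for a tiny nested function, `loss = (sigmoid(w2 * tanh(w1 * x)) - target)^2`, is shown below. The network shape, the parameter values, and the `forward`/`backward` names are assumptions for illustration; real frameworks automate the same bookkeeping.

```python
import math

def forward(w1, w2, x, target):
    # Forward pass: keep the intermediates the backward pass will reuse.
    a = w1 * x
    h = math.tanh(a)
    z = w2 * h
    y = 1.0 / (1.0 + math.exp(-z))
    loss = (y - target) ** 2
    cache = (x, h, y, w2, target)
    return loss, cache

def backward(cache):
    # Backward pass: chain rule from the loss back toward the inputs.
    x, h, y, w2, target = cache
    dloss_dy = 2.0 * (y - target)
    dloss_dz = dloss_dy * y * (1.0 - y)    # sigmoid'(z) = y * (1 - y)
    dloss_dw2 = dloss_dz * h               # z = w2 * h
    dloss_dh = dloss_dz * w2               # reused for the earlier layer
    dloss_da = dloss_dh * (1.0 - h ** 2)   # tanh'(a) = 1 - h^2
    dloss_dw1 = dloss_da * x               # a = w1 * x
    return dloss_dw1, dloss_dw2

w1, w2, x, target = 0.5, -1.5, 2.0, 1.0
loss_value, cache = forward(w1, w2, x, target)
g1, g2 = backward(cache)

# Finite-difference check of both gradients.
eps = 1e-6
n1 = (forward(w1 + eps, w2, x, target)[0] - forward(w1 - eps, w2, x, target)[0]) / (2 * eps)
n2 = (forward(w1, w2 + eps, x, target)[0] - forward(w1, w2 - eps, x, target)[0]) / (2 * eps)
print(g1, n1, g2, n2)
```

Note that `dloss_dz` is computed once and feeds both gradients; that sharing is the "reuse" that keeps the cost of all gradients close to the cost of one forward pass.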

Shape decides difficulty

For a convex loss, any local minimum is the global minimum, so descent can reach it; non-convex losses (neural nets) have many local minima and saddle points, yet still train well in practice in high dimensions.
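The effect of shape can be seen on a one-dimensional non-convex function with two valleys: the starting point decides which minimum plain gradient descent reaches. The function `w^4 - 3w^2 + w`, the starting points, and the learning rate are assumptions picked for illustration.

```python
def f(w):
    # A non-convex function with two separate valleys.
    return w ** 4 - 3.0 * w ** 2 + w

def df(w):
    return 4.0 * w ** 3 - 6.0 * w + 1.0

def descend(w0, learning_rate=0.01, steps=2000):
    w = w0
    for _ in range(steps):
        w -= learning_rate * df(w)
    return w

left = descend(-2.0)   # settles in the left valley
right = descend(2.0)   # settles in the right valley
print(left, f(left))
print(right, f(right))
```

Both runs stop where the derivative is essentially zero, but only the left valley is the global minimum. A one-dimensional toy like this shows the local-minimum problem only; it does not capture the high-dimensional behaviour mentioned above.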

The course — start at lesson one

Out in the world

Training neural networks

Backpropagation plus stochastic gradient descent (or a close variant) is how deep models learn their weights, which can number in the billions.
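A minimal sketch of the "stochastic" part is shown below: each update uses the gradient from a small random batch rather than the full dataset. It fits a linear model rather than a neural network to stay short; the data, batch size, learning rate, and the `sgd` name are assumptions for illustration.

```python
import random

random.seed(0)  # fixed seed so the shuffling is repeatable
xs = [i / 10.0 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]  # noise-free data from y = 2x + 1

def sgd(xs, ys, learning_rate=0.1, batch_size=4, epochs=300):
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        random.shuffle(indices)  # new random batches every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * error * xs[i]
                grad_b += 2.0 * error
            # Update from the batch-average gradient only.
            w -= learning_rate * grad_w / len(batch)
            b -= learning_rate * grad_b / len(batch)
    return w, b

w_fit, b_fit = sgd(xs, ys)
print(w_fit, b_fit)
```

In a deep model the per-example gradients inside the batch loop would come from backpropagation instead of a hand-derived formula.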

Regression & classification

Fitting a model usually means minimizing a loss: least squares for regression, log loss for logistic regression, and beyond.
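For the classification case, the sketch below fits a one-feature logistic regression by gradient descent on the average log loss. The toy dataset, learning rate, and function names are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, xs, labels):
    # Average negative log-likelihood of the labels.
    total = 0.0
    for x, y in zip(xs, labels):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(xs)

def fit(xs, labels, learning_rate=0.5, steps=200):
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, labels):
            error = sigmoid(w * x + b) - y  # d(log loss)/d(logit)
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / len(xs)
        b -= learning_rate * grad_b / len(xs)
    return w, b

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 1, 1, 1]

initial_loss = log_loss(0.0, 0.0, xs, labels)
w_fit, b_fit = fit(xs, labels)
final_loss = log_loss(w_fit, b_fit, xs, labels)
predictions = [1 if sigmoid(w_fit * x + b_fit) >= 0.5 else 0 for x in xs]
print(initial_loss, final_loss, predictions)
```

The only change from a least-squares fit is the loss; the descent loop has the same structure.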

Support vector machines

Constrained optimization with Lagrange multipliers finds the widest-margin classifier.
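One standard way to write this down is the hard-margin formulation, sketched here for linearly separable data. The symbols (weights w, bias b, training pairs (x_i, y_i) with labels y_i in {-1, +1}, multipliers alpha_i) are notation introduced for this sketch, not taken from the course.

```latex
% Primal problem: the margin width is 2 / \|w\|, so maximizing it
% is equivalent to minimizing \|w\|^2 under the classification constraints.
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w^{\top} x_i + b) \ge 1 \quad \text{for all } i

% Lagrangian, with one multiplier \alpha_i \ge 0 per constraint:
\mathcal{L}(w, b, \alpha)
  = \tfrac{1}{2}\lVert w \rVert^{2}
  - \sum_{i} \alpha_i \left[\, y_i \,(w^{\top} x_i + b) - 1 \,\right]

% Setting the derivatives with respect to w and b to zero gives:
w = \sum_{i} \alpha_i \, y_i \, x_i ,
\qquad
\sum_{i} \alpha_i \, y_i = 0
```

Only the points whose constraints are tight end up with nonzero multipliers; these are the support vectors that give the method its name.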

This topic connects to

Eigenvectors & SVD

Statistics for machine learning

Correlation & regression