Optimization & gradient descent
How machines learn — calculus turned into training.
How does a model actually learn? It defines what 'wrong' means (a loss), computes which way is downhill (the gradient), and steps that way over and over (gradient descent). This course is the engine room of machine learning — where calculus, linear algebra, and statistics combine into training.
The big ideas
Learning is minimizing a loss
A loss function scores how wrong the model is; training searches the parameter landscape for its lowest valley.
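As a minimal sketch of what "scoring how wrong the model is" can look like, the snippet below uses mean squared error on a one-parameter model. The model y_hat = w * x, the toy data, and the name mse_loss are all assumptions made for illustration; they are not taken from the course itself.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the hypothetical one-parameter model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data generated by y = 2x, so the lowest point of the landscape is at w = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# Scan a few candidate parameter values: this is a crude walk across the landscape.
candidates = [0.0, 1.0, 2.0, 3.0, 4.0]
losses = [mse_loss(w, xs, ys) for w in candidates]
best_w = min(candidates, key=lambda w: mse_loss(w, xs, ys))

for w, loss in zip(candidates, losses):
    print(f"w = {w:.1f}  loss = {loss:.3f}")
print("lowest loss among candidates at w =", best_w)
```

Scanning a grid like this only works for one or two parameters; the gradient is what makes the search practical when there are many.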
The gradient points the way
Partial derivatives bundle into the gradient — the steepest-uphill direction. Step against it to descend.
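One way to see "one partial derivative per parameter, bundled into a vector" is to estimate each partial numerically. The sketch below assumes a hypothetical function f(x, y) = x^2 + 3y^2 and a central-difference estimate; a real training setup would normally use analytic or automatic differentiation instead.

```python
def f(p):
    """Hypothetical two-parameter function f(x, y) = x^2 + 3y^2."""
    x, y = p
    return x ** 2 + 3 * y ** 2

def numerical_gradient(func, p, h=1e-6):
    """Estimate each partial derivative with a central difference."""
    grad = []
    for i in range(len(p)):
        up = list(p)
        down = list(p)
        up[i] += h      # nudge only parameter i upward
        down[i] -= h    # and downward, holding the others fixed
        grad.append((func(up) - func(down)) / (2 * h))
    return grad

point = [1.0, 2.0]
grad = numerical_gradient(f, point)  # analytic gradient is [2x, 6y]

# Step against the gradient: the function value should drop.
step_size = 0.1
new_point = [p - step_size * g for p, g in zip(point, grad)]

print("gradient:", grad)
print("f before:", f(point), "f after:", f(new_point))
```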
Backpropagation is the chain rule
Deep models are nested functions; the chain rule, run backward with reuse, computes every gradient efficiently.
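To make "the chain rule, run backward with reuse" concrete, here is a hand-written sketch for a tiny nested function. The two-weight model z = w2 * tanh(w1 * x) with a squared-error loss is an assumption chosen for brevity, and forward_backward is a hypothetical name, not a library function.

```python
import math

def forward_backward(w1, w2, x, target):
    """Return the loss and its gradients with respect to w1 and w2."""
    # Forward pass: compute and keep every intermediate value.
    a = w1 * x
    h = math.tanh(a)
    z = w2 * h
    loss = (z - target) ** 2

    # Backward pass: start at the loss and multiply local rates of change.
    dloss_dz = 2 * (z - target)
    dloss_dw2 = dloss_dz * h             # z = w2 * h, so dz/dw2 = h
    dloss_dh = dloss_dz * w2             # reuses dloss_dz
    dloss_da = dloss_dh * (1 - h ** 2)   # d tanh(a)/da = 1 - tanh(a)^2
    dloss_dw1 = dloss_da * x             # a = w1 * x, so da/dw1 = x
    return loss, dloss_dw1, dloss_dw2

loss, grad_w1, grad_w2 = forward_backward(0.5, -1.5, 2.0, 1.0)
print("loss:", loss, "dL/dw1:", grad_w1, "dL/dw2:", grad_w2)
```

Note that dloss_dz is computed once and reused for both weights; that sharing is what keeps the backward pass cheap as the number of layers grows.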
Shape decides difficulty
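Convex losses have no misleading valleys: any minimum that descent reaches is the global one. Non-convex losses (neural nets) have many local minima and saddle points, yet in practice they often train well in high dimensions.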
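The difference in shape can be sketched in one dimension. The two functions below, the convex (x - 1)^2 and the non-convex x^4 - 2x^2, are assumptions picked for illustration, as are the learning rate and step count. A one-dimensional example cannot show the high-dimensional behavior of neural nets; it only shows why the starting point matters once the loss is non-convex.

```python
def descend(grad, x, lr=0.05, steps=500):
    """Plain gradient descent on a one-dimensional function."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def convex_grad(x):
    # Derivative of f(x) = (x - 1)^2, which has a single minimum at x = 1.
    return 2 * (x - 1)

def nonconvex_grad(x):
    # Derivative of g(x) = x^4 - 2x^2, which has minima at x = -1 and x = 1.
    return 4 * x ** 3 - 4 * x

# Same two starting points for both functions.
convex_from_left = descend(convex_grad, -2.0)
convex_from_right = descend(convex_grad, 2.0)
nonconvex_from_left = descend(nonconvex_grad, -2.0)
nonconvex_from_right = descend(nonconvex_grad, 2.0)

print("convex:    ", convex_from_left, convex_from_right)
print("non-convex:", nonconvex_from_left, nonconvex_from_right)
```

On the convex function both runs end at the same point; on the non-convex one each run settles in the valley nearest its start.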
The course — start at lesson one
- 01. Loss functions & objectives: What 'wrong' means; the loss landscape. (Medium)
- 02. Partial derivatives & the gradient: One knob at a time, then all at once; steepest descent. (Medium)
- 03. The chain rule & backpropagation: Nested functions, multiplied rates, and the engine of deep learning. (Hard)
- 04. Gradient descent & learning rate: The update loop, the step-size sweet spot, and SGD. (Hard)
- 05. Convexity & constrained optimization: One valley vs many; Lagrange multipliers and SVMs. (Hard)
Out in the world
Training neural networks
Backpropagation plus stochastic gradient descent (or one of its variants) is how most deep models learn their weights, often billions of them.
Regression & classification
Fitting any model is minimizing its loss — least squares, logistic regression, and beyond.
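As a rough sketch of fitting by loss minimization, the loop below fits a straight line by running gradient descent on the least-squares loss. The toy data, the learning rate, the step count, and the name fit_line are assumptions for illustration; in practice least squares is usually solved in closed form or with a library routine.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y_hat = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line on every data point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # The update loop: step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = fit_line(xs, ys)
print("slope:", w, "intercept:", b)
```

Swapping the squared error for a different loss changes only the two gradient lines; the loop itself stays the same, which is the sense in which logistic regression and other models reuse this recipe.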
Support vector machines
Constrained optimization with Lagrange multipliers finds the widest-margin classifier.