LearnMathora

Topic · full course live

Optimization & gradient descent

How machines learn — calculus turned into training.

Hard

How does a model actually learn? It defines what 'wrong' means (a loss), computes which way is downhill (the gradient), and steps that way over and over (gradient descent). This course is the engine room of machine learning — where calculus, linear algebra, and statistics combine into training.
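As a compact summary of that loop, the update is usually written as below. The symbols are conventional notation assumed here, not taken from the course: \theta for the model's parameters, L for the loss, and \eta for the step size (learning rate).

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} L(\theta_t)
```

Each pass evaluates the gradient at the current parameters and moves a small distance in the opposite direction.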

5 lessons

The big ideas

Learning is minimizing a loss

A loss function scores how wrong the model is; training searches the parameter landscape for its lowest valley.
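To make "scores how wrong the model is" concrete, here is a minimal sketch using mean squared error on a straight-line model. The function name `mse_loss` and the toy data are illustrative assumptions, not part of the course.

```python
def mse_loss(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the data."""
    squared_errors = [(w * x + b - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / len(squared_errors)

# Toy data generated by y = 2x + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

good = mse_loss(2.0, 1.0, xs, ys)   # parameters that fit the data exactly
bad = mse_loss(0.0, 0.0, xs, ys)    # a model that predicts 0 everywhere

print(good, bad)
```

The well-fitting parameters score lower than the poor ones; training is the search for the parameters with the lowest score.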

The gradient points the way

Partial derivatives bundle into the gradient — the steepest-uphill direction. Step against it to descend.
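A small sketch of that idea, assuming a hypothetical two-variable function `f(x, y) = x^2 + 3y^2`: each partial derivative is estimated with a central difference, the results are bundled into a gradient, and a step is taken both with and against it.

```python
def f(x, y):
    # Example bowl-shaped function (an assumption for illustration).
    return x ** 2 + 3 * y ** 2

def numerical_gradient(func, point, h=1e-6):
    """Estimate each partial derivative by nudging one coordinate at a time."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        grad.append((func(*up) - func(*down)) / (2 * h))
    return grad

point = [1.0, 2.0]
grad = numerical_gradient(f, point)   # analytic gradient is (2x, 6y) = (2, 12)

step = 0.1
downhill = [p - step * g for p, g in zip(point, grad)]  # step against the gradient
uphill = [p + step * g for p, g in zip(point, grad)]    # step along the gradient

print(f(*downhill), f(*point), f(*uphill))
```

Stepping against the gradient lowers the function value, and stepping along it raises the value.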

Backpropagation is the chain rule

Deep models are nested functions; the chain rule, run backward with reuse, computes every gradient efficiently.
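The sketch below works through the chain rule by hand on a tiny nested function, `loss = (w2 * tanh(w1 * x) - y)^2`. The function and the names `w1`, `w2`, `forward`, and `backward` are assumptions chosen for illustration; real frameworks automate the same bookkeeping. A finite-difference estimate is included only as a sanity check.

```python
import math

def forward(w1, w2, x, y):
    """Nested function: loss = (w2 * tanh(w1 * x) - y)^2."""
    h = math.tanh(w1 * x)   # inner function
    p = w2 * h              # outer function (the prediction)
    loss = (p - y) ** 2
    return loss, h, p

def backward(w1, w2, x, y):
    """Apply the chain rule from the loss back toward the parameters."""
    loss, h, p = forward(w1, w2, x, y)
    dloss_dp = 2.0 * (p - y)                    # computed once, reused twice below
    dloss_dw2 = dloss_dp * h
    dloss_dh = dloss_dp * w2
    dloss_dw1 = dloss_dh * (1.0 - h ** 2) * x   # d tanh(z)/dz = 1 - tanh(z)^2
    return dloss_dw1, dloss_dw2

def finite_difference(w1, w2, x, y, eps=1e-6):
    """Slow numerical check of both derivatives."""
    d1 = (forward(w1 + eps, w2, x, y)[0] - forward(w1 - eps, w2, x, y)[0]) / (2 * eps)
    d2 = (forward(w1, w2 + eps, x, y)[0] - forward(w1, w2 - eps, x, y)[0]) / (2 * eps)
    return d1, d2

analytic = backward(0.5, -1.5, 2.0, 1.0)
numeric = finite_difference(0.5, -1.5, 2.0, 1.0)
print(analytic, numeric)
```

The intermediate `dloss_dp` is the "reuse" in action: it is computed once and shared by both parameter gradients, whereas the finite-difference check needs two extra forward passes per parameter.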

Shape decides difficulty

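In a convex loss every local minimum is a global one, so descent cannot get trapped in a worse valley; non-convex losses (neural nets) have many local minima and saddle points, yet in practice they often train well in high dimensions.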
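A one-dimensional sketch of the difference, using two functions picked for illustration: the convex `(x - 1)^2` and the non-convex `x^4 - 3x^2 + x`. Plain gradient descent is run from two starting points on each.

```python
def gradient_descent(grad, start, learning_rate=0.01, steps=2000):
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

def convex_grad(x):
    # Derivative of (x - 1)^2, which has a single valley at x = 1.
    return 2.0 * (x - 1.0)

def nonconvex_grad(x):
    # Derivative of x^4 - 3x^2 + x, which has two separate valleys.
    return 4.0 * x ** 3 - 6.0 * x + 1.0

convex_from_left = gradient_descent(convex_grad, -2.0)
convex_from_right = gradient_descent(convex_grad, 2.0)
nonconvex_from_left = gradient_descent(nonconvex_grad, -2.0)
nonconvex_from_right = gradient_descent(nonconvex_grad, 2.0)

print(convex_from_left, convex_from_right)
print(nonconvex_from_left, nonconvex_from_right)
```

On the convex function both runs arrive at the same point; on the non-convex one the answer depends on where the run started.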

The course — start at lesson one

  1. Loss functions & objectives. What 'wrong' means; the loss landscape.
  2. Partial derivatives & the gradient. One knob at a time, then all at once: steepest descent.
  3. The chain rule & backpropagation. Nested functions, multiplied rates, and the engine of deep learning.
  4. Gradient descent & learning rate. The update loop, the step-size sweet spot, and SGD.
  5. Convexity & constrained optimization. One valley vs many; Lagrange multipliers and SVMs.
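As a preview of the step-size sweet spot from lesson four, here is a minimal sketch on the assumed toy function `f(x) = x^2`, whose gradient is `2x`. The three learning rates are illustrative picks, not values from the course.

```python
def descend(learning_rate, start=1.0, steps=50):
    """Run the update x <- x - learning_rate * f'(x) for f(x) = x^2."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2.0 * x
    return x

too_small = descend(0.001)   # barely moves from the start
sweet_spot = descend(0.4)    # shrinks quickly toward the minimum at 0
too_large = descend(1.1)     # overshoots further on every step and blows up

print(too_small, sweet_spot, too_large)
```

For this function each update multiplies `x` by `1 - 2 * learning_rate`, so the run converges only while that factor has magnitude below 1.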

Out in the world

Training neural networks

Backpropagation plus stochastic gradient descent (or a close variant) is how nearly all deep models learn their weights, which can number in the billions.
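The "stochastic" part can be sketched on a model far smaller than a neural network: a line `y = w*x + b` fitted one randomly chosen example at a time. The data, seed, and learning rate are assumptions for illustration.

```python
import random

# Noise-free toy data from y = 2x + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

rng = random.Random(0)   # fixed seed so the run is repeatable
w, b = 0.0, 0.0
learning_rate = 0.05

for _ in range(5000):
    i = rng.randrange(len(xs))             # pick one example at random
    error = w * xs[i] + b - ys[i]
    w -= learning_rate * 2.0 * error * xs[i]   # gradient of error^2 w.r.t. w
    b -= learning_rate * 2.0 * error           # gradient of error^2 w.r.t. b

print(w, b)
```

Each update uses the gradient of a single example's squared error rather than the full dataset, which is what keeps the per-step cost low when the dataset is large.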

Regression & classification

Fitting most models means minimizing a loss: squared error for least squares, cross-entropy for logistic regression, and so on.
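A minimal sketch of least squares treated as loss minimization: full-batch gradient descent on the mean squared error of a line. The helper name `fit_line`, the toy data, and the hyperparameters are illustrative assumptions.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated by y = 2x + 1 (assumed for illustration).
w, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w, b)
```

Least squares also has a closed-form solution; the iterative route is shown here because the same loop carries over to models that lack one.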

Support vector machines

Constrained optimization with Lagrange multipliers finds the widest-margin classifier.
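For reference, the standard hard-margin formulation behind that statement is sketched below. The notation is assumed rather than taken from the course: training points x_i with labels y_i in {-1, +1}, weight vector w, bias b, and multipliers \alpha_i.

```latex
\begin{aligned}
&\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
  \quad\text{subject to}\quad y_i\,(w^{\top}x_i + b) \ge 1 \quad \text{for all } i, \\[4pt]
&\mathcal{L}(w, b, \alpha) = \tfrac{1}{2}\lVert w\rVert^{2}
  - \sum_i \alpha_i \bigl[\, y_i\,(w^{\top}x_i + b) - 1 \,\bigr],
  \qquad \alpha_i \ge 0, \\[4pt]
&\frac{\partial \mathcal{L}}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i\, y_i\, x_i,
  \qquad
  \frac{\partial \mathcal{L}}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i\, y_i = 0 .
\end{aligned}
```

The margin width is 2 / \lVert w \rVert, so minimizing \lVert w \rVert^2 under these constraints is what makes the margin as wide as possible.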

This topic connects to