Find the direction the function climbs fastest.
The contour plot below shows a function $f(x, y)$. Drag anywhere. Then rotate the coral arrow $\mathbf{u}$, your “direction of travel.”
The right panel plots $D_{\mathbf{u}} f$ (the rate of change of $f$ when you walk in direction $\mathbf{u}$) as a function of the angle of $\mathbf{u}$. As you rotate, that rate traces a curve.
Three things to find:
- Rotate until the rate is at its peak. There’s exactly one direction per point that’s “steepest uphill.”
- Rotate to the opposite direction. The rate is the negative of the peak: that’s “steepest downhill.”
- Rotate until the arrow is perpendicular to either of those directions. The rate is zero: you’re walking along a contour, so the function isn’t changing.
Press snap to ∇f on the widget to align with the special “peak” direction. Note its name. We’re going to call that direction the gradient and spend the rest of the lesson explaining it.
The definition
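For $f(x_1, \dots, x_n)$, the gradient is the tuple of all partial derivatives:
$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)$$
The symbol $\nabla$ is called “nabla” or just “del.” It means “compute every partial and stack them into a tuple.”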
For : , , so .
(For our widget’s function , the gradient is . That’s the violet arrow you saw in the input plane.)
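As a rough illustration of “compute every partial and stack them into a tuple,” the sketch below approximates a gradient with central differences and compares it to hand-computed partials. The function `g`, the helper `numerical_gradient`, and the step size `h` are all made up for this example; `g` is not the widget’s function.

```python
def numerical_gradient(f, point, h=1e-6):
    """Approximate the gradient of f at `point` with central differences.

    Returns one partial derivative per input, stacked into a tuple.
    """
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h   # nudge only the i-th input up
        backward[i] -= h  # and down
        grad.append((f(*forward) - f(*backward)) / (2 * h))
    return tuple(grad)


# Hypothetical example function: g(x, y) = x^2 * y + 3y
def g(x, y):
    return x**2 * y + 3 * y


# Partials worked out by hand: dg/dx = 2xy, dg/dy = x^2 + 3
def grad_g(x, y):
    return (2 * x * y, x**2 + 3)


approx = numerical_gradient(g, (1.0, 2.0))
exact = grad_g(1.0, 2.0)
print("numerical:", approx)
print("by hand:  ", exact)
```

The tuple has two components because `g` has two inputs. Hand `numerical_gradient` a function of five inputs and it returns five numbers.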
The gradient lives where the inputs live
This matters and the instinct is usually wrong, so we labour it for one sentence.
$f(x, y)$ has 2 inputs and 1 output. Its graph is a surface in 3D. Its gradient is a vector in 2D, with the same number of components as the input. The gradient is NOT an arrow on the tilted surface. It lives in the input plane, next to the contour lines.
In higher dimensions the surface picture stops working: a neural-network loss has millions of inputs and one output. Its gradient is a vector with millions of components. There’s no surface to draw. Reasoning in the input space is the only thing that scales.
Directional derivative: what the cosine curve was
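For a unit vector $\mathbf{u}$ (meaning $|\mathbf{u}| = 1$), the rate of change of $f$ in the direction $\mathbf{u}$, starting at point $\mathbf{p}$, is
$$D_{\mathbf{u}} f(\mathbf{p}) = \nabla f(\mathbf{p}) \cdot \mathbf{u}$$
Dot product the gradient with your unit direction. The number you get is the slope of $f$ along that direction at that point.
The cosine you watched on the right side of the widget is this dot product, plotted as $\mathbf{u}$ rotates. From the geometric form $\nabla f \cdot \mathbf{u} = |\nabla f|\,|\mathbf{u}| \cos\theta = |\nabla f| \cos\theta$, where $\theta$ is the angle between $\nabla f$ and $\mathbf{u}$:
- Same direction as $\nabla f$ ($\theta = 0$): rate is $|\nabla f|$, the maximum.
- Perpendicular to $\nabla f$: rate is $0$, walking along the contour.
- Opposite to $\nabla f$: rate is $-|\nabla f|$, the minimum.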
You never have to redo the limit definition. One gradient, evaluated once, answers the rate in every direction.
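To see “one gradient, every direction” in numbers, here is a small sketch that sweeps a unit vector around the circle and dots it with a single gradient. It assumes a hypothetical function $g(x, y) = x^2 y + 3y$ with hand-computed gradient $(2xy,\ x^2 + 3)$, evaluated at $(1, 2)$; none of these choices come from the widget.

```python
import math


# Gradient of the hypothetical g(x, y) = x^2 * y + 3y, computed by hand
def grad_g(x, y):
    return (2 * x * y, x**2 + 3)


def directional_derivative(grad, u):
    """Rate of change along unit vector u: the dot product grad . u"""
    return sum(gi * ui for gi, ui in zip(grad, u))


grad = grad_g(1.0, 2.0)                     # evaluated once
grad_norm = math.hypot(grad[0], grad[1])    # |grad g|
grad_angle = math.atan2(grad[1], grad[0])   # angle the gradient points at

rates = {}        # rate from the dot product, keyed by angle in degrees
cosine_form = {}  # rate from |grad g| * cos(theta), for comparison
for degrees in range(0, 360, 45):
    angle = math.radians(degrees)
    u = (math.cos(angle), math.sin(angle))  # unit vector at this angle
    rates[degrees] = directional_derivative(grad, u)
    cosine_form[degrees] = grad_norm * math.cos(angle - grad_angle)

for degrees, rate in rates.items():
    print(f"{degrees:3d} deg: rate = {rate:+.4f}")
```

At this point the gradient happens to point at 45°, so the sweep hits the peak at 45°, the negative of the peak at 225°, and zero at 135° and 315°.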
Direct hit
Room temperature is . You’re at and walk in the direction (already a unit vector; check: ).
How fast does the temperature change per unit step?
Steepest ascent: the slogan
Two facts, bolted together, form one of the most important statements in calculus:
The gradient at a point tells you which direction increases the function fastest, and how fast it increases in that direction.
- Direction: $\nabla f$ (normalize to get a unit vector).
- Rate in that direction: $|\nabla f|$.
- Direction of steepest descent: $-\nabla f$.
Machine learning is powered by one sentence: to reduce a loss, step a small amount in the direction $-\nabla f$. That’s gradient descent. You now know why it works and why it’s the right direction.
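Here is a minimal sketch of that one sentence as a loop. The loss function, the starting point, the step size of 0.1, and the step count are all assumptions chosen for illustration; real training code picks these with far more care.

```python
# Hypothetical loss with its minimum at (3, -1):
#   loss(x, y) = (x - 3)^2 + 2 * (y + 1)^2
def loss(x, y):
    return (x - 3) ** 2 + 2 * (y + 1) ** 2


# Gradient worked out by hand: (2(x - 3), 4(y + 1))
def grad_loss(x, y):
    return (2 * (x - 3), 4 * (y + 1))


def gradient_descent(start, step_size=0.1, steps=100):
    """Repeatedly step a small amount in the direction of minus the gradient."""
    x, y = start
    history = [loss(x, y)]
    for _ in range(steps):
        gx, gy = grad_loss(x, y)
        x -= step_size * gx  # move against the gradient
        y -= step_size * gy
        history.append(loss(x, y))
    return (x, y), history


final_point, history = gradient_descent(start=(0.0, 0.0))
print("final point:", final_point)
print("first loss:", history[0], "last loss:", history[-1])
```

With this step size the loss shrinks on every step. A step that is too large can overshoot the minimum, which is why the slogan says “a small amount.”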
How fast does it climb?
For , what is the maximum rate of increase at ?
(Decimal to 3 places.)
Gradient ⊥ level curve
One more geometric fact, the one that makes contours useful as a visual. The gradient at $\mathbf{p}$ is perpendicular to the level curve through $\mathbf{p}$.
One-line proof. Let $\mathbf{r}(t)$ trace a curve that stays on a level set, i.e., $f(\mathbf{r}(t)) = c$ for all $t$. Differentiate both sides with respect to $t$. The right side is a constant, so $\frac{d}{dt} c = 0$. The left side, by a chain rule we formalize two lessons from now, is $\nabla f(\mathbf{r}(t)) \cdot \mathbf{r}'(t)$. So $\nabla f(\mathbf{r}(t)) \cdot \mathbf{r}'(t) = 0$, meaning the gradient is perpendicular to the tangent direction of any curve sitting on a level set.
Visually: in a topographic map, the gradient points straight uphill, crossing contour lines at right angles. You’ll never see a gradient arrow go along a contour. It always goes across one, at 90°.
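The right angle can be spot-checked numerically. The sketch below assumes a hypothetical function $f(x, y) = x^2 + 4y^2$, whose level curve $f = 4$ is an ellipse that can be traced as $(2\cos t, \sin t)$. It dots the gradient with the curve’s tangent at several points along the way.

```python
import math


# Hypothetical function: f(x, y) = x^2 + 4y^2
def f(x, y):
    return x**2 + 4 * y**2


# Gradient worked out by hand: (2x, 8y)
def grad_f(x, y):
    return (2 * x, 8 * y)


# A curve that stays on the level set f = 4 (an ellipse)
def curve(t):
    return (2 * math.cos(t), math.sin(t))


# Its tangent direction: the derivative of curve(t) with respect to t
def tangent(t):
    return (-2 * math.sin(t), math.cos(t))


levels = []  # value of f along the curve; should stay at 4
dots = []    # gradient . tangent; should stay at 0
for k in range(8):
    t = k * math.pi / 4
    x, y = curve(t)
    gx, gy = grad_f(x, y)
    tx, ty = tangent(t)
    levels.append(f(x, y))
    dots.append(gx * tx + gy * ty)

for level, dot in zip(levels, dots):
    print(f"f = {level:.6f}, grad . tangent = {dot:+.2e}")
```

Every dot product comes out as zero up to floating-point rounding, even though the gradient and the tangent both change from point to point.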
Check the perpendicularity
For , the level curve through is the circle . A tangent direction to that circle at is .
Compute .