Multivariable Calculus: Partial Derivatives, Gradients, Jacobians · 16 min

The Gradient: every partial, bundled into one arrow

Pack all your partials into a tuple. That tuple is a direction, specifically the direction things get worse fastest. Or better fastest, depending on the sign.


Find the direction the function climbs fastest.

The contour plot below shows $f(x, y) = x^2 + 2y^2$. Drag $\mathbf{p}$ anywhere. Then rotate the coral arrow $\mathbf{u}$, your “direction of travel.”

The right panel plots $D_{\mathbf{u}} f$ (the rate of change of $f$ when you walk in direction $\mathbf{u}$) as a function of $\theta$. As you rotate, that rate traces a curve.

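[Interactive widget. Left panel: contour plot of $f(x, y) = x^2 + 2y^2$; drag $\mathbf{p}$, rotate $\mathbf{u}$, watch the cosine emerge. Right panel: $D_{\mathbf{u}} f$ as you rotate $\mathbf{u}$. Sample readout: peak at $\theta = 50^\circ$ (the angle of $\nabla f$); max value $3.12 = |\nabla f|$.]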

Three things to find:

  1. Rotate $\mathbf{u}$ until the rate is at its peak. Away from the bottom of the bowl at the origin, there’s exactly one direction per point that’s “steepest uphill.”
  2. Rotate $\mathbf{u}$ to the opposite direction. The rate is the negative of the peak: that’s “steepest downhill.”
  3. Rotate $\mathbf{u}$ perpendicular to either one. The rate is zero: you’re walking along a contour, and the function isn’t changing.

Press “snap to ∇f” on the widget to align $\mathbf{u}$ with the special “peak” direction. Note its name. We’re going to call that direction the gradient and spend the rest of the lesson explaining it.

The definition

For $f: \mathbb{R}^n \to \mathbb{R}$, the gradient is the tuple of all partial derivatives:

$$\nabla f(\mathbf{x}) \;=\; \left(\, \frac{\partial f}{\partial x_1},\; \frac{\partial f}{\partial x_2},\; \dots,\; \frac{\partial f}{\partial x_n} \,\right).$$

The symbol $\nabla$ is called “nabla” or just “del.” It means “compute every partial and stack them into a tuple.”

For $f(x, y) = x^2 + y^2$: $\dfrac{\partial f}{\partial x} = 2x$, $\dfrac{\partial f}{\partial y} = 2y$, so $\nabla f = (2x, 2y)$.

(For our widget’s function $f = x^2 + 2y^2$, the gradient is $(2x, 4y)$. That’s the violet arrow you saw in the input plane.)
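One way to sanity-check a hand-computed gradient is to compare it with a finite-difference estimate. The sketch below does that for the widget’s function; the helper names, the step size `h`, and the sample point `(1.0, 0.6)` are illustrative assumptions, not part of the lesson.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x with central differences.

    f takes a sequence of n numbers and returns one number;
    the result has n components, one per input.
    """
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h   # nudge only the i-th input up
        backward[i] -= h  # and down
        grad.append((f(forward) - f(backward)) / (2 * h))
    return tuple(grad)


def f(v):
    # The widget's function f(x, y) = x^2 + 2y^2.
    x, y = v
    return x**2 + 2 * y**2


def analytic_gradient(v):
    # Hand-computed gradient (2x, 4y).
    x, y = v
    return (2 * x, 4 * y)


p = (1.0, 0.6)  # hypothetical sample point
approx = numerical_gradient(f, p)
exact = analytic_gradient(p)
print("finite differences:", approx)
print("analytic (2x, 4y): ", exact)
```

The two tuples should agree to several decimal places, and each has exactly as many components as the function has inputs.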

The gradient lives where the inputs live

This matters, and the instinct is usually wrong, so we labour the point for a moment.

$f$ has 2 inputs and 1 output. Its graph is a surface in 3D. Its gradient is a vector in 2D, with the same number of components as the input. The gradient is NOT an arrow on the tilted surface. It lives in the input plane, next to the contour lines.

In higher dimensions the surface picture stops working altogether: a neural-network loss has millions of inputs and one output, so its gradient is a vector with millions of components. There’s no surface to draw. Contour-space reasoning is the only thing that scales.

Directional derivative: what the cosine curve was

For a unit vector $\mathbf{u}$ (meaning $|\mathbf{u}| = 1$), the rate of change of $f$ in the direction $\mathbf{u}$, starting at point $\mathbf{p}$, is

$$D_{\mathbf{u}} f(\mathbf{p}) \;=\; \nabla f(\mathbf{p}) \cdot \mathbf{u}.$$

Dot product the gradient with your unit direction. The number you get is the slope of $f$ along that direction at that point.

The cosine you watched on the right side of the widget is this dot product, plotted as $\mathbf{u}$ rotates. From the geometric form $\nabla f \cdot \mathbf{u} = |\nabla f| \cos\theta$, where $\theta$ here is the angle between $\nabla f$ and $\mathbf{u}$ (the widget reports $\mathbf{u}$’s own angle, so its curve is the same cosine shifted to peak at the angle of $\nabla f$):

  • Same direction as $\nabla f$ ($\theta = 0$): rate is $|\nabla f|$, the maximum.
  • Perpendicular to $\nabla f$: rate is $0$, walking along the contour.
  • Opposite to $\nabla f$: rate is $-|\nabla f|$, the minimum.

You never have to redo the limit definition. One gradient, evaluated once, answers the rate in every direction.
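The sweep the widget performs can be sketched in a few lines: compute the gradient once, then dot it with a unit vector at each angle. The point `(1.0, 0.6)` is an assumption, picked because it gives numbers close to the widget readout quoted earlier (peak near 50°, value near 3.12); the lesson does not state which point produced that readout.

```python
import math


def grad_f(x, y):
    # Gradient of the widget's function f(x, y) = x^2 + 2y^2.
    return (2 * x, 4 * y)


def directional_derivative(grad, theta):
    # u is the unit vector at angle theta from the positive x-axis.
    u = (math.cos(theta), math.sin(theta))
    return grad[0] * u[0] + grad[1] * u[1]


g = grad_f(1.0, 0.6)                  # hypothetical sample point
grad_norm = math.hypot(g[0], g[1])    # |grad f|
grad_angle = math.atan2(g[1], g[0])   # angle of grad f, in radians

# Rate of change for every whole-degree direction.
rates = {deg: directional_derivative(g, math.radians(deg)) for deg in range(360)}
best_deg = max(rates, key=rates.get)

peak = directional_derivative(g, grad_angle)                  # along grad f
opposite = directional_derivative(g, grad_angle + math.pi)    # against grad f
sideways = directional_derivative(g, grad_angle + math.pi / 2)  # along the contour

print("best whole-degree direction:", best_deg)
print("angle of grad f in degrees: ", math.degrees(grad_angle))
print("peak rate vs |grad f|:      ", peak, grad_norm)
print("opposite and sideways rates:", opposite, sideways)
```

The gradient is evaluated a single time; every other number in the sweep is just a dot product against it.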

Direct hit

Room temperature is $T(x, y) = 100 - x^2 - 2y^2$. You’re at $(3, 2)$ and walk in the direction $\mathbf{u} = (3/5, 4/5)$ (already a unit vector; check: $9/25 + 16/25 = 1$).

How fast does the temperature change per unit step?

Steepest ascent: the slogan

Two facts, bolted together, form one of the most important statements in calculus:

The gradient $\nabla f$ at a point tells you which direction increases the function fastest, and how fast it increases in that direction.

  • Direction: $\nabla f / |\nabla f|$ (normalize to get a unit vector; this needs $\nabla f \neq \mathbf{0}$).
  • Rate in that direction: $|\nabla f|$.
  • Direction of steepest descent: $-\nabla f / |\nabla f|$.

Machine learning is powered by one sentence: to reduce a loss, step a small amount in the direction $-\nabla L$. That’s gradient descent. You now know why it works and why it’s the right direction.
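As a rough illustration of that sentence, here is a minimal gradient-descent loop that uses the widget’s function as a stand-in loss. The starting point, step size, and step count are arbitrary assumptions chosen so the loop converges; real training code picks these with far more care.

```python
def f(x, y):
    # Stand-in "loss": the widget's function x^2 + 2y^2, minimized at (0, 0).
    return x**2 + 2 * y**2


def grad_f(x, y):
    return (2 * x, 4 * y)


def gradient_descent(start, step_size=0.1, steps=50):
    """Repeatedly step a small amount in the direction of -grad f."""
    x, y = start
    history = [f(x, y)]
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x -= step_size * gx  # move against the gradient
        y -= step_size * gy
        history.append(f(x, y))
    return (x, y), history


final_point, losses = gradient_descent((3.0, 2.0))
print("start loss:", losses[0])
print("final loss:", losses[-1])
print("final point:", final_point)
```

With these settings the loss shrinks at every step and the point drifts toward the origin. A step size that is too large would overshoot instead, which is why the slogan says “a small amount.”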

How fast does it climb?

For $f(x, y) = x^2 + y^2$, what is the maximum rate of increase at $(1, 2)$?

(Decimal to 3 places.)

Gradient ⊥ level curve

One more geometric fact, the one that makes contours useful as a visual. The gradient at $\mathbf{p}$ is perpendicular to the level curve through $\mathbf{p}$.

One-line proof. Let $\mathbf{r}(t)$ trace a curve that stays on a level set, i.e., $f(\mathbf{r}(t)) = c$ for all $t$. Differentiate both sides with respect to $t$. The right side is a constant, so its derivative is $0$. The left side, by a chain rule we formalize two lessons from now, is $\nabla f(\mathbf{r}(t)) \cdot \mathbf{r}'(t)$. So $\nabla f \cdot \mathbf{r}'(t) = 0$, meaning the gradient is perpendicular to the tangent direction of any curve sitting on a level set.
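Since the chain rule is only formalized later, it may help to see what that step looks like in two variables. Writing the curve in components as $\mathbf{r}(t) = (x(t), y(t))$ (names introduced here for illustration), the left side expands as

$$\frac{d}{dt}\, f\big(x(t), y(t)\big) \;=\; \frac{\partial f}{\partial x}\, x'(t) + \frac{\partial f}{\partial y}\, y'(t) \;=\; \left(\frac{\partial f}{\partial x},\; \frac{\partial f}{\partial y}\right) \cdot \big(x'(t),\; y'(t)\big) \;=\; \nabla f(\mathbf{r}(t)) \cdot \mathbf{r}'(t).$$

Each input contributes its own partial derivative times how fast that input is moving, and the sum of those contributions is exactly a dot product.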

Visually: in a topographic map, the gradient points straight uphill, crossing contour lines at right angles. You’ll never see a gradient arrow go along a contour. It always goes across one, at 90°.
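A quick numerical check of the right-angle claim, sketched for the widget’s function. The level set $x^2 + 2y^2 = c$ is an ellipse, parametrized here as $(\sqrt{c}\cos t,\ \sqrt{c/2}\,\sin t)$; that parametrization, the level $c = 9$, and the twelve sample points are assumptions made for this example.

```python
import math


def f(x, y):
    return x**2 + 2 * y**2


def grad_f(x, y):
    return (2 * x, 4 * y)


def level_curve_point(c, t):
    # A point on the ellipse x^2 + 2y^2 = c.
    return (math.sqrt(c) * math.cos(t), math.sqrt(c / 2) * math.sin(t))


def level_curve_tangent(c, t):
    # Derivative of level_curve_point with respect to t.
    return (-math.sqrt(c) * math.sin(t), math.sqrt(c / 2) * math.cos(t))


c = 9.0  # hypothetical level
dots = []
for k in range(12):
    t = 2 * math.pi * k / 12
    x, y = level_curve_point(c, t)
    gx, gy = grad_f(x, y)
    tx, ty = level_curve_tangent(c, t)
    dots.append(gx * tx + gy * ty)  # gradient dotted with tangent

largest_violation = max(abs(d) for d in dots)
print("largest |grad f . tangent| over the samples:", largest_violation)
```

Every dot product should come out as zero up to floating-point rounding, at every sampled point around the ellipse.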

Check the perpendicularity

For $f(x, y) = x^2 + y^2$, the level curve through $(3, 4)$ is the circle $x^2 + y^2 = 25$. A tangent direction to that circle at $(3, 4)$ is $\mathbf{t} = (-4, 3)$.

Compute $\nabla f(3, 4) \cdot \mathbf{t}$.
