Multivariable Calculus: Partial Derivatives, Gradients, Jacobians · 15 min

Partial Derivatives: one knob at a time

Two inputs instead of one. Freeze one, move the other, and the ordinary derivative still works. That's all a partial derivative is.


Freeze one knob. Drag the other.

The function below has two inputs, $x$ and $y$. The two panels show what happens when you freeze one and slide the other.

f(x, y) = x²y + 3y²

[Interactive figure: freeze one variable; the slope of the 1D slice is the partial derivative.
Left panel: slice at y = 1.00, showing ∂f/∂x = 3.00 at the current point.
Right panel: slice at x = 1.50, showing ∂f/∂y = 8.25 at the current point.]

The coral line is the tangent to each slice. Its slope is the partial. As you drag, both partials update independently; the function has two answers, one per direction.

Drag the $x$ slider with $y$ frozen. The left panel’s curve is the slice of $f$ at the current $y$: a one-variable function. It has a slope. That slope is a perfectly ordinary derivative, except its name has the word “partial” in front.

Same trick on the right, with $x$ frozen. Same rule.

That’s the whole concept. Calculus already had a tool for one-variable slopes; partial derivatives just say “freeze everything you’re not currently asking about, then use the tool you already have.”
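The "freeze, then use the tool you already have" idea can be sketched numerically. This is a minimal illustration, not part of the playground: the helper `derivative_1d`, the step size `h`, and the point `(1.5, 1.0)` are choices made here for the example.

```python
def f(x, y):
    """The lesson's example function: f(x, y) = x^2 * y + 3 * y^2."""
    return x**2 * y + 3 * y**2


def derivative_1d(g, t, h=1e-5):
    """Ordinary one-variable derivative of g at t (central difference)."""
    return (g(t + h) - g(t - h)) / (2 * h)


x0, y0 = 1.5, 1.0  # illustrative point, chosen for this sketch

# Freeze y at y0: what is left is a one-variable function of x.
def slice_in_x(x):
    return f(x, y0)

# Freeze x at x0: what is left is a one-variable function of y.
def slice_in_y(y):
    return f(x0, y)

# The partials are just the ordinary slopes of the two slices.
df_dx = derivative_1d(slice_in_x, x0)
df_dy = derivative_1d(slice_in_y, y0)

print(round(df_dx, 4))
print(round(df_dy, 4))
```

Nothing in `derivative_1d` knows about two variables. All the multivariable work happens in building the slice.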

Two pictures of a multivariable function

A function of two variables lives in a 3D picture: height $z = f(x, y)$ is a surface sitting over the $(x, y)$ plane. Pretty, but it stops working once there are more than two inputs.

A more useful picture is the contour view: a topographic map. Draw curves of constant height $f(x, y) = c$ for a few values of $c$. Tighter contours = steeper ground. We’ll spend most of our time here, because contours collapse the height axis and leave a 2D picture in input space, where the action is. (Neural-network losses have millions of inputs and one output. There’s no surface to draw. Contours are the only picture that scales.)

Notation

Three spellings of the same thing:

  • $\dfrac{\partial f}{\partial x}$: explicit, textbook-y, the one you’ll see most often. The curly $\partial$ (“partial”) distinguishes it from $d/dx$.
  • $f_x$: terse subscript form. The subscript names the variable you differentiate with respect to.
  • $\partial_x f$ or $\partial_i f$: compact, common in physics and ML matrix-calculus writeups.

All three mean “derivative of $f$ with everything except $x$ held constant.” Pick whichever is clearest in context. We’ll lean on $\dfrac{\partial f}{\partial x}$ for definitions and $f_x$ for quick calculation.
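For reference, here is the standard limit definition behind all three spellings, written at a point $(a, b)$ (the names $a$, $b$, and $h$ are introduced here only for this formula). It is the one-variable definition with the other input left untouched:

```latex
\frac{\partial f}{\partial x}(a, b) = \lim_{h \to 0} \frac{f(a + h,\, b) - f(a, b)}{h},
\qquad
\frac{\partial f}{\partial y}(a, b) = \lim_{h \to 0} \frac{f(a,\, b + h) - f(a, b)}{h}.
```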

Just use the rules you already have

To compute $\dfrac{\partial f}{\partial x}$ for $f(x, y) = x^2 y + 3 y^2$:

  • Treat $y$ as a constant. $y$ is a block of lead sitting on the table.
  • Differentiate as if $y$ were just, say, the number 7. $\dfrac{\partial}{\partial x}(x^2 \cdot y) = 2 x y$. And $3 y^2$ doesn’t contain $x$, so it’s constant with respect to $x$: its partial is $0$.
  • Answer: $\dfrac{\partial f}{\partial x} = 2 x y$.

To get $\dfrac{\partial f}{\partial y}$: swap roles. Freeze $x$, differentiate the $y$-parts. $\dfrac{\partial}{\partial y}(x^2 y) = x^2$. $\dfrac{\partial}{\partial y}(3 y^2) = 6 y$. Sum: $\dfrac{\partial f}{\partial y} = x^2 + 6 y$.

No new machinery. The power rule, the sum rule, and the constant-multiple rule all still apply. The only new habit is noticing which symbol is the variable you’re currently paying attention to.
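One way to sanity-check hand-derived partials like these is to compare them against finite differences at a few points. A rough sketch follows; the function names, the step size `h`, and the sample points are all picked here for illustration.

```python
def f(x, y):
    return x**2 * y + 3 * y**2


def f_x(x, y):
    # Hand-derived: y treated as a constant, power rule on x^2.
    return 2 * x * y


def f_y(x, y):
    # Hand-derived: x treated as a constant.
    return x**2 + 6 * y


def numeric_partials(func, x, y, h=1e-5):
    """Central-difference estimates of (df/dx, df/dy) at (x, y)."""
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return dfdx, dfdy


# Arbitrary sample points, chosen for this sketch.
sample_points = [(1.5, 1.0), (-1.0, 2.0), (0.5, -3.0)]

max_error = 0.0
for x, y in sample_points:
    num_x, num_y = numeric_partials(f, x, y)
    max_error = max(max_error,
                    abs(num_x - f_x(x, y)),
                    abs(num_y - f_y(x, y)))

# True when the formulas and the numerical slopes agree closely.
print(max_error < 1e-6)
```

If a formula were wrong (say, a forgotten factor of 2), the mismatch would show up here as a large `max_error`.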

Compute a partial

For $f(x, y) = x^2 y + 3 y^2$, compute $\dfrac{\partial f}{\partial x}$ at the point $(2, 1)$.

(You can verify with the playground above: set $x = 2$, $y = 1$, read the left panel’s slope.)

And the other one

Same $f$. Now compute $\dfrac{\partial f}{\partial y}$ at $(2, 1)$.

The Clairaut surprise

Take the partial of a partial. $\dfrac{\partial}{\partial y}\left(\dfrac{\partial f}{\partial x}\right)$: differentiate $f$ with respect to $x$ first, then with respect to $y$. Call that $f_{xy}$.

You could also go the other way: differentiate with respect to $y$ first, then $x$. Call that $f_{yx}$.

Common sense says those might differ: different order of operations, different answers. Common sense is wrong. For any reasonably smooth function (one whose second partial derivatives are continuous), they’re equal. This is Clairaut’s theorem (sometimes called Schwarz’s theorem):

$$\frac{\partial^2 f}{\partial x \partial y} \;=\; \frac{\partial^2 f}{\partial y \partial x}.$$

It takes a real proof to establish but is easy to verify on examples. Verify on one.
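As a warm-up, here is the check done by hand on the lesson's earlier function, $f(x, y) = x^2 y + 3 y^2$, reusing the first partials computed above. Each row differentiates once more in the other variable:

```latex
\begin{aligned}
f(x, y) &= x^2 y + 3 y^2 \\
f_x &= 2 x y, & f_{xy} &= \frac{\partial}{\partial y}\,(2 x y) = 2x \\
f_y &= x^2 + 6 y, & f_{yx} &= \frac{\partial}{\partial x}\,(x^2 + 6 y) = 2x
\end{aligned}
```

Both orders land on $2x$, as Clairaut promises.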

Verify Clairaut

For $f(x, y) = x^2 y + x y^3$, compute $f_{xy}$ at $(1, 2)$.

Try computing $f_{yx}$ the other way too. Clairaut says you’ll get the same number.

Lesson complete

Nice tinkering.