Freeze one knob. Drag the other.
The function below has two inputs ( and ). The two panels show what happens when you freeze one and slide the other.
Drag the slider with frozen. The left panel’s curve is the slice of at the current : a one-variable function. It has a slope. That slope is a perfectly ordinary derivative, except its name has the word “partial” in front.
Same trick on the right, with frozen. Same rule.
That’s the whole concept. Calculus already had a tool for one-variable slopes; partial derivatives just say “freeze everything you’re not currently asking about, then use the tool you already have.”
Two pictures of a multivariable function
A function of two variables lives in a 3D picture: height is a surface sitting over the plane. Pretty, but bad at high dimensions.
A more useful picture is the contour view: a topographic map. Draw curves of constant height for a few values of . Tighter contours = steeper ground. We’ll spend most of our time here, because contours collapse the height axis and leave a 2D picture in input space, where the action is. (Neural-network losses have millions of inputs and one output. There’s no surface to draw. Contours are the only picture that scales.)
Notation
Three spellings of the same thing:
- : explicit, textbook-y, the one you’ll see most often. The curly (“partial”) distinguishes it from .
- : terse subscript form. The subscript is the variable being differentiated.
- or : compact, common in physics and ML matrix-calculus writeups.
All three mean “derivative of with everything except held constant.” Pick whichever is clearest in context. We’ll lean on for definitions and for quick calculation.
Just use the rules you already have
To compute for :
- Treat as a constant. is a block of lead sitting on the table.
- Differentiate as if were just, say, the number 7. . And doesn’t contain , so it’s constant with respect to : its partial is .
- Answer: .
To get : swap roles. Freeze , differentiate the -parts. . . Sum: .
No new machinery. The power rule, the sum rule, and the constant-multiple rule all still apply. The only new habit is noticing which symbol is the variable you’re currently paying attention to.
Compute a partial
For , compute at the point .
(You can verify with the playground above: set , , read the left panel’s slope.)
And the other one
Same . Now compute at .
The Clairaut surprise
Take the partial of a partial. : differentiate with respect to first, then with respect to . Call that .
You could also go the other way: differentiate with respect to first, then . Call that .
Common sense says those might differ: different order of operations, different answers. Common sense is wrong. For any reasonably smooth function, they’re equal. This is Clairaut’s theorem (sometimes called Schwarz’s theorem):
It takes a real proof to establish but is easy to verify on examples. Verify on one.
Verify Clairaut
For , compute at .
Try computing the other way too. Clairaut says you’ll get the same number.
Lesson complete