The wrong rule, made obvious
The sum rule was so obliging — derivative of a sum is the sum of derivatives — that you might expect the same for products. Surely the derivative of a product is the product of the derivatives?
It is not. The cleanest disproof is one line. Take $f(x) = x$ and $g(x) = x$, so the product is $x \cdot x = x^2$. The wrong rule predicts $f'(x)\,g'(x) = 1 \cdot 1 = 1$. The right answer (you already know) is $2x$.
So whatever the rule for products is, it has to recover $2x$ from $x \cdot x$. We are about to derive it geometrically, and the answer will write itself.
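The mismatch is easy to confirm numerically. The sketch below assumes the product under test is $x \cdot x$ (both factors equal to $x$); the helper `numeric_derivative` and the test point `x0` are illustrative names, not part of the lesson.

```python
def numeric_derivative(fn, x, h=1e-6):
    """Central-difference estimate of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)


def product(x):
    # Assumed example: f(x) = x and g(x) = x, so the product is x * x.
    return x * x


x0 = 3.0
wrong_rule = 1.0 * 1.0                       # f'(x0) * g'(x0): both slopes are 1
measured = numeric_derivative(product, x0)   # slope the product actually has
power_rule = 2 * x0                          # what the power rule says for x**2

print("wrong rule:", wrong_rule)
print("measured:  ", measured)
print("power rule:", power_rule)
```

The measured slope agrees with the power rule and not with the product of the derivatives, at this point and at any other you care to try.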
Two strips plus a vanishing corner
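A product $f(x)\,g(x)$ is the area of a rectangle with sides $f(x)$ and $g(x)$. Nudge $x$ by a small amount $\Delta x$. Both $f$ and $g$ grow a little, by $\Delta f$ and $\Delta g$. The new area shows up as three pieces, not two.
- A top strip of width $f$ and height $\Delta g$. Area: $f \cdot \Delta g$.
- A right strip of height $g$ and width $\Delta f$. Area: $g \cdot \Delta f$.
- A tiny corner of size $\Delta f \times \Delta g$. Area: $\Delta f \cdot \Delta g$.
Now shrink $\Delta x$. The two strips shrink linearly with $\Delta x$. The corner shrinks quadratically: it has both $\Delta f$ and $\Delta g$ in it, each going to zero. In the limit, the corner vanishes relative to the strips, and only the two strips survive.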
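To watch the corner lose the race, here is a small sketch that splits the change in area into its three pieces for shrinking step sizes. The functions `f` and `g`, the point `x0`, and the step name `h` are arbitrary choices made for illustration, not taken from the lesson.

```python
import math


def f(x):
    return x ** 2          # hypothetical first side of the rectangle


def g(x):
    return math.sin(x)     # hypothetical second side of the rectangle


def area_pieces(x, h):
    """Split the area change f(x+h)g(x+h) - f(x)g(x) into strips and corner."""
    df = f(x + h) - f(x)
    dg = g(x + h) - g(x)
    top = f(x) * dg        # top strip
    right = g(x) * df      # right strip
    corner = df * dg       # corner piece
    return top, right, corner


x0 = 1.0
rows = []
for h in (1e-1, 1e-2, 1e-3):
    top, right, corner = area_pieces(x0, h)
    rows.append((h, (top + right) / h, corner / h))
    print(f"h={h:g}  strips/h={(top + right) / h:.5f}  corner/h={corner / h:.5f}")
```

Each time the step shrinks by a factor of ten, the strips divided by the step settle toward a fixed number, while the corner divided by the step drops by roughly another factor of ten.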
The product rule
Divide the change in area by $\Delta x$ and send $\Delta x \to 0$:
$$\frac{d}{dx}\big[f(x)\,g(x)\big] = f'(x)\,g(x) + f(x)\,g'(x)$$
Read it as: derivative of the first times the second, plus the first times the derivative of the second. The “plus” matters. The wrong rule misses the second term entirely.
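The rule as read out above can be checked against a finite-difference estimate. This is a minimal sketch: the helper names `product_rule` and `numeric_derivative`, and the choice of $x^3$ and $e^x$ as the two factors, are assumptions made for the example.

```python
import math


def product_rule(f, df, g, dg, x):
    """Derivative of the first times the second, plus the first times the derivative of the second."""
    return df(x) * g(x) + f(x) * dg(x)


def numeric_derivative(fn, x, h=1e-6):
    """Central-difference estimate of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)


# Hypothetical factors: f(x) = x**3 and g(x) = exp(x), with their known derivatives.
f, df = (lambda x: x ** 3), (lambda x: 3 * x ** 2)
g, dg = math.exp, math.exp

x0 = 0.5
by_rule = product_rule(f, df, g, dg, x0)
by_limit = numeric_derivative(lambda x: f(x) * g(x), x0)
by_wrong_rule = df(x0) * dg(x0)   # product of the derivatives, for contrast

print(by_rule, by_limit, by_wrong_rule)
```

The first two numbers agree to many digits; the third does not.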
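Sanity-check it on $x^2 = x \cdot x$ with $f(x) = g(x) = x$, so $f'(x) = g'(x) = 1$:
$$\frac{d}{dx}\big[x \cdot x\big] = 1 \cdot x + x \cdot 1 = 2x$$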
And on a real example, :
Both terms have to be there.
A product, at a point
Differentiate using the product rule.
What is ?
Quotients
Division is just multiplication by a reciprocal. One quick-and-dirty option is to write $f/g = f \cdot g^{-1}$ and use the product rule with the power rule (Karpathy’s micrograd does exactly this and never needs a dedicated division op). But there is also a direct rule worth knowing:
$$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{f'(x)\,g(x) - f(x)\,g'(x)}{g(x)^2}$$
The mnemonic is centuries old: low d-high minus high d-low, square the bottom and away we go. The order in the numerator matters: flip $f$ and $g$ there and you get the negative of the right answer.
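Both routes to a quotient's derivative can be compared side by side. The sketch below is illustrative only: the function names, the choice of $\sin x$ over $1 + x^2$, and the test point are assumptions. The reciprocal route takes the derivative of $g^{-1}$, namely $-g'/g^2$, as given.

```python
import math


def quotient_rule(f, df, g, dg, x):
    """Low d-high minus high d-low, over the bottom squared."""
    return (g(x) * df(x) - f(x) * dg(x)) / g(x) ** 2


def via_reciprocal(f, df, g, dg, x):
    """Treat f/g as f * g**-1 and apply the product rule."""
    recip = g(x) ** -1
    d_recip = -(g(x) ** -2) * dg(x)   # derivative of g**-1, taken as given
    return df(x) * recip + f(x) * d_recip


def numeric_derivative(fn, x, h=1e-6):
    """Central-difference estimate of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)


# Hypothetical example: sin(x) / (1 + x**2); the denominator is never zero.
f, df = math.sin, math.cos
g, dg = (lambda x: 1 + x ** 2), (lambda x: 2 * x)

x0 = 1.2
direct = quotient_rule(f, df, g, dg, x0)
product_route = via_reciprocal(f, df, g, dg, x0)
measured = numeric_derivative(lambda x: f(x) / g(x), x0)
flipped = (f(x0) * dg(x0) - g(x0) * df(x0)) / g(x0) ** 2   # numerator in the wrong order

print(direct, product_route, measured, flipped)
```

The first three values agree, and the flipped numerator gives the same magnitude with the opposite sign.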
A quotient, at a point
Let .
Compute to three decimals.
Two more rules in the bag
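You now have:
$$\frac{d}{dx}\big[f(x)\,g(x)\big] = f'(x)\,g(x) + f(x)\,g'(x)$$
$$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{f'(x)\,g(x) - f(x)\,g'(x)}{g(x)^2}$$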
These let you differentiate any product or ratio of the elementary functions from the previous lesson. What you cannot yet do is differentiate a composition — a function inside another, like or or the sigmoid that every neural network in the world is built on.
That is the chain rule, and it is the load-bearing piece of this entire module.