A number that climbs out of a limit
Back in m2 you were told, with a promissory note, that $e$ is “the base that makes the calculus clean.” Time to cash that note.
Start with an expression that looks like it should be boring: $(1 + 1/n)^n$. Drive $n$ up with the slider in the left panel.
At $n = 1$ it is just $2$. At $n = 10$ it is about $2.5937$. At $n = 100$, about $2.7048$. At $n = 1000$, about $2.7169$. Push $n$ to a million and the dot has all but stopped moving, parked against a marker at roughly $2.718$.
This is a limit, exactly the kind from the last lesson. As $n$ runs off to infinity, $(1 + 1/n)^n$ does not blow up and does not collapse. It settles onto one specific number.
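If you would rather watch the climb in code than on the slider, here is a minimal sketch. The helper name `compound` and the particular values of $n$ are illustrative choices, not something the widget exposes.

```python
import math

def compound(n: int) -> float:
    """Return (1 + 1/n)^n, the expression the left panel plots."""
    return (1 + 1 / n) ** n

# Walk n up by powers of ten and watch the value settle.
for n in (1, 10, 100, 1000, 1_000_000):
    print(f"n = {n:>9,}  ->  {compound(n):.5f}")

# Python's built-in constant, for comparison with the last row.
print(f"math.e         ->  {math.e:.5f}")
```

Each extra factor of ten in $n$ buys roughly one more correct decimal place, which is why the dot seems to stop moving.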
Read the climb
Set the left slider to .
What value does $(1 + 1/n)^n$ show? (To four decimals.)
That number is e
The number the climb settles on is $e$:

$$e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^{n} \approx 2.71828$$
It is irrational, like $\pi$, and it is not chosen by committee. It is forced: it is wherever that limit happens to land. So far, though, it just looks like a curiosity. Why would anyone build a transformer’s softmax out of this particular number?
The answer is in the right panel.
The base whose slope is 1
The right panel draws $y = b^x$ and lets you drag the base $b$. Every one of these curves passes through $(0, 1)$, because $b^0 = 1$ for any base. What differs is how steeply each one leaves that point.
The widget draws the tangent line at $x = 0$, the straight line the curve is travelling along as it crosses the $y$-axis, and reports its slope. Drag $b$ and watch the slope readout:
- Small base, say $b = 2$: the curve leaves at a gentle slope, less than 1.
- Large base, say $b = 4$: it leaves steeply, slope more than 1.
Somewhere between them is exactly one base where the curve crosses $x = 0$ with slope exactly 1. The readout locks green right there. And the base where that happens is $e$.
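The same hunt can be run without the widget. The sketch below estimates the slope of $b^x$ at $x = 0$ with a small central difference, then bisects for the base where that slope hits 1. The function names, the step size `h`, and the starting bracket of 2 to 4 are all assumptions made for illustration.

```python
import math

def slope_at_zero(b: float, h: float = 1e-6) -> float:
    """Central-difference estimate of the slope of b**x at x = 0."""
    return (b ** h - b ** (-h)) / (2 * h)

def slope_one_base(lo: float = 2.0, hi: float = 4.0, tol: float = 1e-9) -> float:
    """Bisect between a too-gentle base and a too-steep base for slope 1."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if slope_at_zero(mid) < 1:
            lo = mid   # still too gentle: the answer is a larger base
        else:
            hi = mid   # too steep: the answer is a smaller base
    return (lo + hi) / 2

print(f"slope of 2^x at 0: {slope_at_zero(2.0):.4f}")
print(f"slope of 4^x at 0: {slope_at_zero(4.0):.4f}")
print(f"slope-1 base:      {slope_one_base():.6f}")
print(f"math.e:            {math.e:.6f}")
```

Bisection works here because the slope at zero only ever increases as the base grows, so there is exactly one crossing to find.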
The slope at zero
Drag $b$ to $e$ in the right panel. The tangent to $e^x$ at $x = 0$ locks onto a clean value.
What is the slope of $e^x$ at $x = 0$?
Why that makes calculus clean
Two definitions, as the limit of $(1 + 1/n)^n$ and as the slope-1 base, point at the same number. That is not a coincidence; it is two windows onto one object.
And the slope-1 property is the whole reason $e$ matters. Because $e^x$ leaves every point travelling at a slope equal to its own height, $e^x$ turns out to be the one function (up to a constant multiple) that is its own rate of change. The next module proves that. For now, hold onto the headline: $e$ is not “about 2.7.” It is the base that makes the calculus clean, and you have now seen, with your own hands, the property that earns it that title.
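Until the proof arrives, a numerical spot check makes the claim believable. This is a rough sketch: `slope` is a hypothetical helper that estimates steepness with a small central difference, and the sample points are arbitrary.

```python
import math

def slope(f, x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of how steeply f is rising at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

# At each sample point, compare the height of e^x with its slope.
for x in (-1.0, 0.0, 1.0, 2.0):
    height = math.exp(x)
    steepness = slope(math.exp, x)
    print(f"x = {x:>4}: height = {height:.5f}, slope = {steepness:.5f}")
```

The two columns agree to the printed precision at every point. That is evidence, not a proof.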
The natural log, briefly
Since $e^x$ is a function that never repeats a value, it has an inverse: the function that undoes it. That inverse is the natural logarithm, $\ln x$.
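It is defined only for $x > 0$, because $e^x$ is always positive, so positive numbers are the only things $\ln$ can be handed back. It obeys every log law from m2, now with base $e$:

$$\ln(ab) = \ln a + \ln b, \qquad \ln\!\left(\frac{a}{b}\right) = \ln a - \ln b, \qquad \ln(a^k) = k \ln a, \qquad \ln(e^x) = x$$

That last one is just “the inverse undoes the function.” Whatever power you raised $e$ to, $\ln$ reads it straight back off.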
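These rules are easy to spot-check with Python's `math.log`, which is the natural log. The sketch below assumes the m2 laws were the usual product, quotient, and power rules. The sample values and the helper `ln_is_defined` are made up for illustration.

```python
import math

# Arbitrary illustrative values; a and b must be positive.
a, b, k, x = 3.0, 5.0, 4.0, 2.5

product_ok = math.isclose(math.log(a * b), math.log(a) + math.log(b))
quotient_ok = math.isclose(math.log(a / b), math.log(a) - math.log(b))
power_ok = math.isclose(math.log(a ** k), k * math.log(a))
inverse_ok = math.isclose(math.log(math.exp(x)), x)  # ln undoes e^x

def ln_is_defined(value: float) -> bool:
    """Report whether math.log accepts this input."""
    try:
        math.log(value)
        return True
    except ValueError:  # raised for zero and for negative inputs
        return False

print(product_ok, quotient_ok, power_ok, inverse_ok)
print(ln_is_defined(0.5), ln_is_defined(0.0), ln_is_defined(-1.0))
```

The comparisons use `math.isclose` rather than `==` because floating-point arithmetic rounds the two sides slightly differently.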
Log of one
$\ln 1$ asks: to what power must you raise $e$ to get $1$?
What is $\ln 1$?
Log of a power
$\ln$ is the inverse of $e^x$, so $\ln(e^x) = x$.
What is ?
The one picture under all of calculus
Now the idea this whole module has been building toward.
Take any smooth curve. Pick a point on it. Zoom in. Then zoom in again. Then again.
Drag the point onto $x = 1$, then zoom: 10 times, 100 times, 1000 times. Watch what happens. The parabola, unmistakably curved at full view, flattens. By 1000x magnification it is indistinguishable from a straight line.
This is local linearity. Up close, a smooth curve looks like a straight line. Every smooth function, everywhere, is secretly linear if you look closely enough. The curve has not changed; your window has, and at small enough scale curvature simply cannot be seen.
(One caution the widget handles for you: it zooms both axes by the same factor. Stretch one axis more than the other and any curve flattens trivially. The flattening here is real.)
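To put a number on “flattens,” one option is to measure how far the curve bows away from the straight line joining the two edges of the window, as a fraction of the window's width. Dividing by the width is what zooming both axes equally amounts to. In this sketch the parabola $y = x^2$, the point $x = 1$, and the helper `bend` are illustrative assumptions, not the widget's internals.

```python
def parabola(x: float) -> float:
    return x * x

def bend(f, x0: float, zoom: float, samples: int = 201) -> float:
    """Largest gap between f and the chord across the window, relative to
    the window width. The window spans x0 - 1/zoom to x0 + 1/zoom."""
    half = 1.0 / zoom
    left, right = x0 - half, x0 + half
    chord_slope = (f(right) - f(left)) / (right - left)
    worst = 0.0
    for i in range(samples):
        x = left + (right - left) * i / (samples - 1)
        chord_y = f(left) + chord_slope * (x - left)
        worst = max(worst, abs(f(x) - chord_y))
    return worst / (right - left)

for zoom in (1, 10, 100, 1000):
    print(f"zoom {zoom:>4}x: bend = {bend(parabola, 1.0, zoom):.6f}")
```

For this parabola the visible bend shrinks in step with the zoom: ten times closer, ten times flatter. A stretched axis would not produce that pattern, since the ratio already accounts for the window's size.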
The line has a name, almost
That straight line the curve becomes has a slope. At a different point on the curve it would be a different line with a different slope.
This module stops one step short on purpose. You have built the picture: zoom into a smooth curve and it becomes a line. The next module does the one remaining thing: it measures that line’s slope and gives it a name, the derivative. Every parabola, every exponential, every sine has a derivative, and it is nothing more exotic than the slope of the line you just watched appear.
Zoom in somewhere else
Drag the point to $x = -1$ on $y = x^2$ and zoom in. The curve becomes a line again, but a differently tilted one.
What slope does that line have? (At $x = 1$ it was $2$; use the parabola’s symmetry.)
Where this shows up: the whole arc
You now hold both ends of a transformer’s loss machinery. Softmax, the layer that turns raw scores into probabilities, is $e^x$ in a costume. Cross-entropy, the loss that scores the prediction, is $\ln$ in a costume. The two functions this lesson made you fluent in are the first and last things a transformer touches when it grades itself.
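Stripped of the costumes, both pieces fit in a few lines. This is a bare-bones sketch in plain Python, not how any particular framework implements them. The function names and the sample scores are made up, and subtracting the largest score before exponentiating is a common numerical-stability trick that leaves the result unchanged.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into probabilities: exponentiate, then normalize."""
    top = max(scores)  # shift so the largest exponent is 0
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]

def cross_entropy(probs: list[float], target: int) -> float:
    """Loss for one example: minus the natural log of the probability
    assigned to the correct class."""
    return -math.log(probs[target])

scores = [2.0, 1.0, 0.1]  # hypothetical raw scores for three classes
probs = softmax(scores)
print([round(p, 4) for p in probs])
print(round(cross_entropy(probs, target=0), 4))
```

A confident correct prediction puts the target probability near 1, and $\ln 1 = 0$, so the loss goes to zero exactly when the model is right.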
And the picture you just made, zoom into a smooth curve until it looks straight, is the entire engine of how the network learns. The next module turns that line into the derivative. The module after that runs the derivative backward through every layer, and that backward pass is training. You have built the intuition; calculus is next.