Pre-calculus: the limit intuition · 30 min

The number that makes calculus clean

Meet e two completely different ways and watch them land on the same number. Recap the natural log. Then zoom into a smooth curve until it becomes a straight line, the single picture the next module turns into the derivative.

A number that climbs out of a limit

Back in m2 you were told, with a promissory note, that $e$ is “the base that makes the calculus clean.” Time to cash that note.

Start with an expression that looks like it should be boring: $(1 + 1/n)^n$. Drive $n$ up with the slider in the left panel.

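[Interactive figure, two panels. Left panel, “the limit”: the value of $(1 + 1/n)^n$ on a scale from 2 to 3, set by a slider for $n$. Right panel, “the slope-1 base”: the graph of $y = b^x$ with a readout of its slope at $x = 0$, which equals $\ln(b)$. Drag $n$ toward one million on the left; drag $b$ until the slope locks at 1 on the right.]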

At $n = 1$ it is just $2$. At $n = 10$ it is about $2.594$. At $n = 100$, about $2.705$. At $n = 1000$, about $2.717$. Push $n$ to a million and the dot has all but stopped moving, parked against a marker at roughly $2.71828$.

This is a limit, exactly the kind from the last lesson. As $n$ runs off to infinity, $(1 + 1/n)^n$ does not blow up and does not collapse. It settles onto one specific number.
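The climb can be reproduced without the widget. Here is a minimal sketch in plain Python, using the same values of n as the text; the helper name `compound` is an illustrative choice, not something from the lesson:

```python
import math

def compound(n: int) -> float:
    """Return (1 + 1/n)^n for a positive integer n."""
    return (1 + 1 / n) ** n

# Same values of n as in the text.
for n in (1, 10, 100, 1000, 1_000_000):
    value = compound(n)
    gap = math.e - value  # distance still left to the limit
    print(f"n = {n:>9,}  value = {value:.5f}  gap to e = {gap:.5f}")
```

The gap column is the useful part: it keeps shrinking as n grows, which is what “settles” means.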

Read the climb

Set the left slider to $n = 10{,}000$.

What value does $(1 + 1/n)^n$ show? (To four decimals.)

That number is e

The number the climb settles on is $e$:

$$e = \lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n \approx 2.71828$$

It is irrational, like $\pi$, and it is not chosen by committee. It is forced: it is wherever that limit happens to land. So far, though, it just looks like a curiosity. Why would anyone build a transformer’s softmax out of this particular number?

The answer is in the right panel.

The base whose slope is 1

The right panel draws $y = b^x$ and lets you drag the base $b$. Every one of these curves passes through $(0, 1)$, because $b^0 = 1$ for any positive base. What differs is how steeply each one leaves that point.

The widget draws the tangent line at $x = 0$, the straight line the curve is travelling along as it crosses the $y$-axis, and reports its slope. Drag $b$ and watch the slope readout:

  • Small base, say $b = 2$: the curve leaves $(0, 1)$ at a gentle slope, less than 1.
  • Large base, say $b = 4$: it leaves steeply, slope more than 1.

Somewhere between them is exactly one base where the curve crosses with slope exactly 1. The readout locks green right there. And the base where that happens is $e$.
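The slope readout can be approximated numerically. The sketch below is one way to check it, not the widget's actual code. It estimates the slope of $y = b^x$ at $x = 0$ with a symmetric difference quotient, then bisects between $b = 2$ and $b = 4$ for the base whose slope is 1. The names `slope_at_zero` and `find_slope_one_base` are hypothetical.

```python
import math

def slope_at_zero(b: float, h: float = 1e-6) -> float:
    """Estimate the slope of y = b**x at x = 0 with a symmetric difference."""
    return (b ** h - b ** (-h)) / (2 * h)

def find_slope_one_base(lo: float = 2.0, hi: float = 4.0, steps: int = 50) -> float:
    """Bisect on the base: the slope is below 1 at b = 2 and above 1 at b = 4."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if slope_at_zero(mid) < 1:
            lo = mid  # too gentle, move the base up
        else:
            hi = mid  # too steep, move the base down
    return (lo + hi) / 2

for b in (2.0, 4.0):
    print(f"b = {b}: slope ~ {slope_at_zero(b):.4f}, ln(b) = {math.log(b):.4f}")

print(f"slope-1 base ~ {find_slope_one_base():.5f}")
```

The first loop compares the estimated slope with $\ln(b)$, the quantity the widget reports. The last line prints the base the search lands on, which can be compared with the limit from the left panel.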

The slope at zero

Drag $b$ to $e$ in the right panel. The tangent to $y = e^x$ at $x = 0$ locks onto a clean value.

What is the slope of $y = e^x$ at $x = 0$?

Why that makes calculus clean

Two definitions, $e$ as the limit of $(1 + 1/n)^n$ and $e$ as the slope-1 base, point at the same number. That is not a coincidence; it is two windows onto one object.

And the slope-1 property is the whole reason $e$ matters. At $x = 0$ the curve $y = e^x$ has height 1 and slope 1, and the same match holds everywhere: $e^x$ leaves every point travelling at a slope equal to its own height. That makes $e^x$ the one exponential that is its own rate of change. The next module proves that. For now, hold onto the headline: $e$ is not “about 2.7.” It is the base that makes the calculus clean, and you have now seen, with your own hands, the property that earns it that title.
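That headline can be spot-checked before it is proved. Here is a rough numerical sketch, assuming a symmetric difference quotient is a good enough stand-in for the slope. The helpers `slope` and `two_to_the` are hypothetical names.

```python
import math

def slope(f, x: float, h: float = 1e-6) -> float:
    """Estimate the slope of f at x with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

def two_to_the(x: float) -> float:
    """The base-2 exponential, used only for contrast."""
    return 2.0 ** x

# For e^x, height and slope should agree at every point tried.
for x in (-1.0, 0.0, 1.0, 2.0):
    height = math.exp(x)
    print(f"e^x at x = {x:+.1f}: height = {height:.5f}, slope ~ {slope(math.exp, x):.5f}")

# For 2^x they do not agree: the slope is the height scaled by ln(2).
height_2 = two_to_the(1.0)
slope_2 = slope(two_to_the, 1.0)
print(f"2^x at x = +1.0: height = {height_2:.5f}, slope ~ {slope_2:.5f}")
```

A numerical match at four points is evidence, not a proof; the proof is the next module's job.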

The natural log, briefly

Because $e^x$ is always increasing, it has an inverse: the function that undoes it. That inverse is the natural logarithm, $\ln x = \log_e x$.

It is defined only for $x > 0$: $e^x$ is always positive, so positive numbers are the only outputs $\ln$ can ever be asked to undo. It obeys every log law from m2, now with base $e$:

$$\ln(xy) = \ln x + \ln y, \qquad \ln(x^p) = p\,\ln x$$

$$\ln 1 = 0, \qquad \ln e = 1, \qquad \ln(e^x) = x$$

That last one is just “the inverse undoes the function.” Whatever power you raised $e$ to, $\ln$ reads it straight back off.
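These laws are easy to sanity-check with Python's standard `math` module, where `math.log` with one argument is the natural log. The test values `x`, `y`, `p` below are arbitrary picks for illustration:

```python
import math

x, y, p = 3.0, 5.0, 4.0  # arbitrary positive test values

# Product law: ln(xy) = ln x + ln y
print(math.log(x * y), math.log(x) + math.log(y))

# Power law: ln(x^p) = p ln x
print(math.log(x ** p), p * math.log(x))

# Inverse pair: ln undoes exp, and exp undoes ln (for positive inputs)
print(math.log(math.exp(2.5)), math.exp(math.log(x)))

# ln is not defined for x <= 0; math.log raises ValueError there.
try:
    math.log(-1.0)
except ValueError as err:
    print("math.log(-1.0) raised ValueError:", err)
```

Each printed pair should agree up to floating-point rounding, so compare them with `math.isclose` rather than `==`.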

Log of one

$\ln x$ asks: to what power must you raise $e$ to get $x$?

What is $\ln(1)$?

Log of a power

$\ln$ is the inverse of $e^x$, so $\ln(e^x) = x$.

What is $\ln(e^3)$?

The one picture under all of calculus

Now the idea this whole module has been building toward.

Take any smooth curve. Pick a point on it. Zoom in. Then zoom in again. Then again.

[Interactive figure: a curve with a draggable point, starting at (1.000, 1.000), and a zoom control starting at 1x. Drag the dot, or use tab and the arrow keys.]

Keep zooming. The curve becomes a straight line. The next module gives that line its name.

Drag the point onto $y = x^2$, then zoom: 10 times, 100 times, 1000 times. Watch what happens. The parabola, unmistakably curved at full view, flattens. By 1000x magnification it is indistinguishable from a straight line.

This is local linearity. Up close, a smooth curve looks like a straight line. Every smooth function, everywhere, is secretly linear if you look closely enough. The curve has not changed; your window has, and at small enough scale curvature simply cannot be seen.

(One caution the widget handles for you: it zooms both axes by the same factor. Stretch one axis more than the other and any curve flattens trivially. The flattening here is real.)
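Local linearity can be measured as well as seen. The sketch below is one possible way to quantify it, under the assumption that “looks straight” means the curve stays close to the chord joining the two edges of the window. The gap is divided by the window width, which mirrors zooming both axes by the same factor. The names `bend` and `parabola` are hypothetical.

```python
def bend(f, x0: float, zoom: float, samples: int = 201) -> float:
    """Largest gap between f and the chord across the window, as a fraction of window width.

    The window is centred on x0 and has width 2/zoom.
    """
    half = 1.0 / zoom
    left, right = x0 - half, x0 + half
    chord_slope = (f(right) - f(left)) / (right - left)
    worst = 0.0
    for i in range(samples):
        x = left + (right - left) * i / (samples - 1)
        chord_y = f(left) + chord_slope * (x - left)
        worst = max(worst, abs(f(x) - chord_y))
    return worst / (right - left)

def parabola(x: float) -> float:
    return x * x

for zoom in (1, 10, 100, 1000):
    print(f"zoom {zoom:>4}x  bend = {bend(parabola, 1.0, zoom):.6f}")
```

For the parabola the bend works out to $1/(2 \cdot \text{zoom})$, so every tenfold zoom makes the visible curvature ten times smaller. The curve itself never changes.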

The line has a name, almost

That straight line the curve becomes has a slope. At a different point on the curve it would be a different line with a different slope.

This module stops one step short on purpose. You have built the picture: zoom into a smooth curve and it becomes a line. The next module does the one remaining thing: it measures that line’s slope and gives it a name, the derivative. Every parabola, every exponential, every sine has a derivative, and it is nothing more exotic than the slope of the line you just watched appear.

Zoom in somewhere else

Drag the point to $x = -1$ on $y = x^2$ and zoom in. The curve becomes a line again, but a differently tilted one.

What slope does that line have? (At $x = 1$ it was $+2$; use the parabola’s symmetry.)

Where this shows up: the whole arc

You now hold both ends of a transformer’s loss machinery. Softmax, the layer that turns raw scores into probabilities, is $e^x$ in a costume. Cross-entropy, the loss that scores the prediction, is $\ln$ in a costume. The two functions this lesson made you fluent in are the first and last things a transformer touches when it grades itself.
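To make the two costumes concrete, here is a minimal sketch of both in plain Python. It is an illustration under simplifying assumptions (one example, a short list of scores), not how any particular framework implements them, and the scores are made up.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into probabilities: exponentiate, then normalise."""
    top = max(scores)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]

def cross_entropy(probs: list[float], target: int) -> float:
    """Loss for one example: minus the natural log of the probability given to the right answer."""
    return -math.log(probs[target])

scores = [2.0, 1.0, 0.1]  # made-up raw scores for three classes
probs = softmax(scores)
print("probabilities:", [round(p, 3) for p in probs])
print("sum of probabilities:", sum(probs))
print("loss if class 0 is correct:", cross_entropy(probs, 0))
print("loss if class 2 is correct:", cross_entropy(probs, 2))
```

Because $e^x$ is always positive, every score becomes a valid probability. Because $\ln$ of a number near 1 is near 0, a confident correct prediction costs almost nothing and a confident wrong one costs a lot.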

And the picture you just made, zoom into a smooth curve until it looks straight, is the entire engine of how the network learns. The next module turns that line into the derivative. The module after that runs the derivative backward through every layer, and that backward pass is training. You have built the intuition; calculus is next.
