Exponents, and the one rule that runs the show
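$b^n$ for a counting number $n$ is $n$ copies of $b$ multiplied: $b^n = b \cdot b \cdots b$ ($n$ factors). You met this in module 1.
From that, one rule follows immediately. Multiply $b^m$ by $b^n$ and you’ve lined up $m$ copies then $n$ more, so:
$$b^m \cdot b^n = b^{m+n}$$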
Hold onto this rule. It is the load-bearing one. Everything strange-looking about exponents, the zero, the negatives, the fractions, is forced by the demand that this rule keeps working.
What zero, negative, and fraction exponents must be
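Counting copies, as in “three copies of $b$”, makes no sense for $b^0$ or $b^{-n}$ or $b^{1/2}$. So we don’t define them by counting. We define them by insisting $b^m \cdot b^n = b^{m+n}$ stays true, and see what that forces.
Zero: $b^n \cdot b^0 = b^{n+0} = b^n$. The only number that leaves $b^n$ unchanged when multiplied in is $1$. So $b^0 = 1$. Forced.
Negative: $b^n \cdot b^{-n} = b^{n-n} = b^0 = 1$. So $b^{-n}$ is whatever you multiply $b^n$ by to get $1$, namely $1/b^n$. Forced.
Fraction: $b^{1/2} \cdot b^{1/2} = b^{1/2+1/2} = b^1 = b$. So $b^{1/2}$ is the number that squares to $b$: $b^{1/2} = \sqrt{b}$. Forced.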
None of these is an arbitrary convention. Each is the only value that keeps the product rule intact.
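A quick numerical check can make the three forced values concrete. This is a minimal sketch in Python, not part of the lesson's own material: the base `5.0` and the exponent `3` are arbitrary values chosen for illustration, and `math.isclose` is used because floating-point results are only approximate.

```python
import math

b = 5.0   # arbitrary positive base, chosen only for illustration
n = 3     # arbitrary counting-number exponent

# Zero exponent: multiplying by b**0 must leave b**n unchanged.
zero_ok = math.isclose(b**n * b**0, b**n)

# Negative exponent: b**n times b**(-n) must give 1.
negative_ok = math.isclose(b**n * b**(-n), 1.0)

# Fraction exponent: b**(1/2) times itself must give b.
fraction_ok = math.isclose(b**0.5 * b**0.5, b)

print(zero_ok, negative_ok, fraction_ok)
```

Each check restates the product rule with one of the new exponents plugged in, so all three passing is the rule still holding.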
Exercise the laws
Simplify using the exponent laws. Combine the powers inside, then apply the outer exponent.
What is the value?
A fractional exponent
Evaluate . The denominator is a root, the numerator is a power: take the cube root of first, then square it.
What is the value?
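The root-then-power recipe can be sketched in Python. The base `8.0` below is an illustrative value picked for this example, not the number from the exercise, and the results are compared with `math.isclose` because fractional powers are computed in floating point.

```python
import math

base = 8.0  # illustrative value, not taken from the exercise

# Exponent 2/3: the denominator 3 is a cube root, the numerator 2 is a square.
root_first = (base ** (1 / 3)) ** 2    # cube root, then square
power_first = (base ** 2) ** (1 / 3)   # square, then cube root
direct = base ** (2 / 3)               # let Python apply the exponent in one go

# All three routes should agree up to rounding error.
agree = math.isclose(root_first, power_first) and math.isclose(root_first, direct)
print(root_first, power_first, direct, agree)
```

Taking the root first usually keeps the intermediate number small, which is why it is the easier order by hand.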
The exponential function
So far each exponent was a fixed number. Now flip it: fix the base, let the exponent be the variable. That’s the exponential function:
$$f(x) = b^x, \quad b > 0,\ b \neq 1$$
A line adds a constant per step. An exponential multiplies by a constant per step: every time $x$ goes up by $1$, the output is multiplied by $b$. $b > 1$ gives growth; $0 < b < 1$ gives decay. Populations, compound interest, radioactive material, and the loss curves you’ll stare at later all live on this shape.
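The difference between adding per step and multiplying per step is easy to see side by side. The sketch below uses two hypothetical helper functions, `linear_steps` and `exponential_steps`, with made-up starting values and constants; they are illustrations only.

```python
def linear_steps(start, step, count):
    """Add the same constant at every step."""
    return [start + step * k for k in range(count)]


def exponential_steps(start, base, count):
    """Multiply by the same constant at every step."""
    return [start * base**k for k in range(count)]


line = linear_steps(1, 2, 6)           # [1, 3, 5, 7, 9, 11]
growth = exponential_steps(1, 2, 6)    # [1, 2, 4, 8, 16, 32]
decay = exponential_steps(1, 0.5, 6)   # [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]

print(line)
print(growth)
print(decay)
```

With a base above 1 the gaps between outputs keep widening; with a base between 0 and 1 the outputs keep halving toward zero without reaching it.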
The logarithm undoes the exponential
The exponential traps a variable up in the exponent. Given $b^x = y$, what is $x$? You need the machine that runs backwards. That machine is the logarithm:
$$\log_b y = x \quad \text{exactly when} \quad b^x = y$$
In words: $\log_b y$ answers the question “to what power must I raise $b$ to get $y$?” So, for example, $\log_2 8 = 3$, because $2^3 = 8$.
That’s the entire definition. The logarithm is not a calculator button you press and trust. It is the inverse function of the exponential, exactly the inverse-machine idea from the functions lesson, applied to $b^x$. Its domain is $y > 0$, because $b^x$ never produces zero or a negative, so its inverse is never asked about them.
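The inverse relationship can be checked directly. This is a minimal sketch: `log_base` is a hypothetical wrapper written for this example around Python's standard `math.log(y, b)`, and the base 2 and input 8 are illustrative values.

```python
import math


def log_base(b, y):
    """Return the power x with b**x == y, for b > 0, b != 1, and y > 0."""
    if y <= 0:
        # The exponential never outputs zero or a negative,
        # so its inverse has nothing to say about them.
        raise ValueError("logarithm is only defined for positive inputs")
    return math.log(y, b)


x = log_base(2, 8)     # close to 3, since 2**3 == 8
round_trip = 2 ** x    # close to 8: the exponential undoes the logarithm

print(x, round_trip)
```

Running the logarithm and then the exponential lands back where it started, which is all that "inverse" means here.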
Ask the logarithm's question
Evaluate from the definition: to what power must you raise to get ?
What is ?
The log laws come from the exponent laws
Because the logarithm is the exponential’s inverse, each exponent law reflects into a log law. The product rule $b^m \cdot b^n = b^{m+n}$ is the important one. Reflected through the logarithm it becomes:
$$\log_b(xy) = \log_b x + \log_b y$$
A logarithm turns a product into a sum. Its siblings: $\log_b(x/y) = \log_b x - \log_b y$ turns a quotient into a difference, and $\log_b(x^k) = k \log_b x$ turns a power into a multiple.
One caution, the single most common log error: this is the log of a product. The log of a sum is nothing nice. $\log_b(x + y)$ does not equal $\log_b x + \log_b y$. The sum of the logs is the log of the product, never the log of the sum.
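The three laws, and the caution about sums, can be spot-checked numerically. The sketch below uses base 2 via `math.log2`; the values of `x`, `y`, and `k` are arbitrary positive numbers chosen for illustration.

```python
import math

x, y, k = 8.0, 32.0, 3.0  # arbitrary positive values for illustration

# Each law compares the "log of the combined thing" with the combined logs.
product_law = math.isclose(math.log2(x * y), math.log2(x) + math.log2(y))
quotient_law = math.isclose(math.log2(x / y), math.log2(x) - math.log2(y))
power_law = math.isclose(math.log2(x ** k), k * math.log2(x))

# The common error: the log of a sum is NOT the sum of the logs.
log_of_sum = math.log2(x + y)               # log2(40), a little over 5.3
sum_of_logs = math.log2(x) + math.log2(y)   # 3 + 5 = 8, which is log2(8 * 32)

print(product_law, quotient_law, power_law)
print(log_of_sum, sum_of_logs)
```

The last two numbers differ widely, which is the point: adding logs always corresponds to multiplying the inputs.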
e and ln, on credit
You’ll see one base constantly: $e \approx 2.718$, an irrational constant, and its logarithm $\ln x = \log_e x$, the natural log.
Why $e$ and not a rounder base like $2$ or $10$? Because $e$ is the base that makes calculus come out clean: the exponential $e^x$ is the one function that is its own rate of change. That sentence can’t be cashed in yet; it needs module 5. For now, take it on credit: $e$ is just a base, a specific number near $2.718$, and $\ln$ is its logarithm. Module 5 will tell you why it’s the base. Until then, every log law above works with $\ln$ exactly as written.
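In Python's standard `math` module, `math.e` is the constant, `math.exp` is the base-$e$ exponential, and one-argument `math.log` is the natural log. A short sketch, with the inputs 5, 2, and 3 chosen arbitrarily, shows them behaving like any other base:

```python
import math

print(math.e)  # the constant e, a number near 2.718

# ln asks: to what power must e be raised to get this input?
ln_of_e = math.log(math.e)               # close to 1, since e**1 == e

# exp and ln undo each other, like any exponential and its logarithm.
round_trip = math.exp(math.log(5.0))     # close to 5

# The product law works with ln exactly as with any other base.
product_law = math.isclose(math.log(2.0 * 3.0), math.log(2.0) + math.log(3.0))

print(ln_of_e, round_trip, product_law)
```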
The identity we live by
Here is the payoff the whole module has been walking toward.
A model that predicts a sequence assigns a probability to each piece, then multiplies them for the joint probability: $P = p_1 \cdot p_2 \cdots p_N$, a product of millions of numbers each between $0$ and $1$.
Multiplying millions of small numbers is a disaster on a real computer. The product shrinks past the smallest number the machine can represent and silently collapses to exactly $0$. That’s underflow, and once it happens every trace of the real value is gone.
The logarithm rescues this. Because $\log$ turns products into sums:
$$\log(p_1 \cdot p_2 \cdots p_N) = \log p_1 + \log p_2 + \cdots + \log p_N$$
Drag the three probabilities tiny. The multiply track collapses to $0$ and dies. The sum-of-logs track keeps the real number, every time. That collapse, and that survival, is the demo of the lesson.
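The same collapse can be reproduced in a few lines. This sketch assumes Python floats are 64-bit IEEE 754 doubles (true on standard CPython builds); the list of 100 probabilities of `1e-5` each is a hypothetical input invented for the demonstration.

```python
import math

probs = [1e-5] * 100  # hypothetical: 100 pieces, each with probability 1e-5

# Multiply track: the true value is 1e-500, far below what a 64-bit
# float can represent, so the running product underflows to zero.
product = 1.0
for p in probs:
    product *= p

# Sum-of-logs track: 100 moderate negative numbers added together.
log_sum = sum(math.log(p) for p in probs)

print(product)   # 0.0
print(log_sum)   # roughly -1151.29
```

The zero carries no information about the original probabilities, while the sum of logs still pins down the joint probability exactly as $e$ raised to that sum.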
Why a sum, specifically
Switching from a product to a sum of logs buys two things, and a model is trained on both.
It survives the arithmetic. A sum of a million moderate negative numbers is an ordinary, representable number. The product those logs came from would have underflowed to zero long ago. The sum is the only form that physically fits in the machine.
It can be differentiated cheaply. Module 5 will show that the rate of change of a sum is just the sum of the rates of change of its parts. A product of a million terms has no such mercy. Training means nudging parameters using exactly those rates, so the loss has to be a sum. That is why every loss function you’ll meet is written $-\sum_i \log p_i$, a sum of logs, not a product of probabilities.
A negative log-likelihood
A model assigns probability to an event. Its negative log-likelihood, the loss contribution, is .
Find first, then negate it. What is the negative log-likelihood?
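The find-the-log-then-negate recipe can be sketched as a small function. `negative_log_likelihood` is a hypothetical helper written for this example, and the probabilities below are illustrative values, not the one from the exercise.

```python
import math


def negative_log_likelihood(probs):
    """Sum of -ln(p) over the probabilities the model assigned."""
    return -sum(math.log(p) for p in probs)


# One event with probability 0.25: ln(0.25) is negative, so negating
# it gives a positive loss, ln(4), about 1.386.
single = negative_log_likelihood([0.25])

# Several events: the losses simply add. Here ln(2) + ln(4) + ln(8) = ln(64).
sequence = negative_log_likelihood([0.5, 0.25, 0.125])

# A certain event (probability 1) contributes no loss at all.
certain = negative_log_likelihood([1.0])

print(single, sequence, certain)
```

Smaller probabilities give larger losses, so a model is penalised most where it was most surprised.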
Where this goes next
That’s module 2. You can reshape expressions, graph lines, solve systems, wire functions into pipelines, bend parabolas, and turn a doomed product into a workable sum of logs.
g(f(x)) is one layer wired into the next, the forward pass you built in the functions lesson. log(∏) = ∑(log) is why a million tiny probabilities don’t sink training, the identity you just watched rescue a computation. Modules 8 and 9 cash in the logarithm as the loss function. Module 11 cashes in composition as the network. Everything that follows is these two facts at scale, and you now hold both.