A function is a machine
A function is a rule with one job: take an input, produce an output. Two conditions, no more.
- It is deterministic: the same input always yields the same output.
- It produces exactly one output per input. Never zero, never two.
That’s it. You already write these all day. A function in your language of choice, one that takes an argument and returns a value with no surprises, is a mathematical function. School made this feel exotic; it isn’t. It’s a pure function.
A function is not required to be a formula. It can be a lookup table, a piecewise rule with an if in it, a trained model with a billion parameters. As long as one input deterministically gives one output, it’s a function. “Function equals formula” is a school habit worth dropping today.
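As a minimal sketch of that point, here are three machines in Python: a formula, a lookup table, and a piecewise rule. The names `square`, `grade_lookup`, `relu`, and the table values are illustrative assumptions, not part of the lesson. Each qualifies as a function for the same reason: one input deterministically gives one output.

```python
# Three illustrative machines; all names and values are assumptions for this example.

def square(x: float) -> float:
    """A formula: one input, one output."""
    return x * x

# A lookup table: no formula at all, still exactly one output per input.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0}

def grade_lookup(letter: str) -> float:
    return GRADE_POINTS[letter]

def relu(x: float) -> float:
    """A piecewise rule with an if in it."""
    if x < 0:
        return 0.0
    return x

print(square(3))          # 9
print(grade_lookup("B"))  # 3.0
print(relu(-2.5))         # 0.0
```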
Domain, codomain, range
A machine comes with a spec sheet.
The domain is the set of legal inputs, the values you’re allowed to feed it. The codomain is the declared universe the outputs live in. The range is the set of outputs the machine actually produces as the input sweeps the whole domain.
For $f(x) = x^2$ with domain “all real numbers”: you may feed it anything, but the range is only the non-negative numbers, because a square is never negative. Domain is the inputs you’re allowed; range is the outputs you really get.
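The gap between allowed inputs and actual outputs can be checked by hand. This sketch squares a small sample of inputs and collects what comes out. The sample list is an assumption standing in for "all real numbers", since code can only try finitely many values.

```python
def square(x: float) -> float:
    return x * x

# A finite sample standing in for the domain "all real numbers"
# (an assumption: code can only try finitely many inputs).
sample_domain = [-3, -2, -1, 0, 1, 2, 3]

# The outputs actually produced over that sample.
observed_range = sorted({square(x) for x in sample_domain})

print(observed_range)                       # [0, 1, 4, 9]
print(all(y >= 0 for y in observed_range))  # True
```

Negative inputs are legal, yet no negative output ever appears: the observed range is smaller than the declared universe of real numbers.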
The vertical-line test is just the “exactly one output” rule, drawn. If any vertical line hits a graph twice, that $x$ has two $y$’s, and the graph is not a function.
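The same test can be run on a list of points instead of a drawing. In this sketch, `is_function` is a hypothetical helper that rejects any set of (input, output) pairs where one input shows up with two different outputs. The sample points are chosen for illustration.

```python
def is_function(pairs):
    """Return True if no input is paired with two different outputs."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False  # a vertical line at this x would hit the graph twice
        seen[x] = y
    return True

parabola = [(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)]  # points on y = x^2
circle = [(0, 1), (0, -1), (1, 0), (-1, 0)]            # points on x^2 + y^2 = 1

print(is_function(parabola))  # True
print(is_function(circle))    # False
```

The parabola repeats outputs, which is allowed. The circle repeats an input with different outputs, which is not.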
f(x) is evaluation, not multiplication
The notation $f(x)$ trips people because it looks like $f$ times $x$. It is not.
$f(x)$ means: the output of machine $f$ when the input is $x$. It is a function call. If it helps, read it as f.call(x) or f(x) the way you’d write it in code, because that is precisely what it is.
So $f(x)$ doesn’t multiply anything. It runs the machine on the input and names whatever comes out.
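Code makes the distinction hard to miss. In this sketch the rule inside `f` is a hypothetical choice for illustration. Calling `f(3)` runs the machine, while trying to multiply the machine itself by a number is an error in Python.

```python
def f(x):
    # Hypothetical rule chosen for illustration: double the input, then add one.
    return 2 * x + 1

result = f(3)   # evaluation: run the machine f on the input 3
print(result)   # 7

try:
    f * 3       # multiplying a machine by a number is not defined
except TypeError as err:
    print("TypeError:", err)
```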
Run the machine
Let . Evaluate the function at the input .
What is ?
Composition: wire the output into the next input
Now the move this whole lesson is built around. Take two machines and wire them in series: send the output of the first straight into the input of the second.
That’s composition, written $g(f(x))$. Run $f$ on $x$, take what comes out, run $g$ on that.
Pick a rule for each machine, draw the wire from one output port to the next input port, then drop an input value in and run it. Watch the value get relabeled as it clears each machine. That pipeline, an input flowing through stacked machines, is not a metaphor for a neural network. It is the shape of one.
Read composition right to left
The notation is read from the inside out. The input $x$ touches $f$ first, because $f$ is the innermost machine, the one wrapped directly around $x$. Then $g$ runs on the result.
This catches everyone once: $g(f(x))$ does not mean “do $g$ first.” The machine nearest the input runs first. In $g(f(x))$, that’s $f$. Trace the parentheses from the center outward and you’ll never get the order wrong.
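The inside-out reading can be watched directly. This sketch defines a hypothetical `compose` helper and two illustrative machines that record when they run. The rules inside `f` and `g` are assumptions, not the lesson's own.

```python
from typing import Callable

def compose(g: Callable, f: Callable) -> Callable:
    """Return the machine x -> g(f(x)): f runs first, g runs second."""
    def composed(x):
        return g(f(x))
    return composed

# Hypothetical machines that log when they run, so the order is visible.
calls = []

def f(x):
    calls.append("f")
    return x + 1

def g(x):
    calls.append("g")
    return x * 10

h = compose(g, f)
print(h(2))    # 30
print(calls)   # ['f', 'g']
```

Although `g` is written first in `compose(g, f)`, the log shows `f` running first, because it sits nearest the input.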
Compose two machines
Let and . Compute .
Run the inner machine first, then feed its output to . What is ?
Order matters
Swap the wiring and you usually get a different machine. With the same $f$ and $g$:
Different answers. In general $g(f(x)) \neq f(g(x))$. Composition is not commutative.
This is not a technicality. It’s the reason a neural network’s layer order is a design decision. Stack attention then a feed-forward block, or the other way around, and you have built two different functions. The wiring order is part of the architecture.
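A two-line check shows the effect of swapping the wiring. The machines here ("add one" and "double") are hypothetical stand-ins picked for illustration, not the lesson's own pair.

```python
def f(x):
    return x + 1    # hypothetical: add one

def g(x):
    return x * 2    # hypothetical: double

x = 3
print(g(f(x)))  # 8  (add one, then double)
print(f(g(x)))  # 7  (double, then add one)
```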
The other order
Same and . Now compute , the other order.
What is ?
The deep-network preview
Wire three machines, then four, then ninety-six. An input enters, gets transformed, passes to the next stage, again, again, and a prediction falls out the end.
That run, input flowing forward through a stack of composed machines, has a name you’ll hear from module 11 onward: the forward pass. A deep network is $f_n(\cdots f_2(f_1(x)) \cdots)$. What’s missing from today’s version is only the content of each machine: a matrix multiply and a nonlinear bend, which modules 7 and 11 supply. The skeleton, composition, you already have.
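The skeleton fits in a few lines. In this sketch, `forward` is a hypothetical helper that feeds a value through a list of machines in order. The four stand-in layers are assumptions for illustration, simple arithmetic in place of the matrix multiplies and nonlinearities a real network would use.

```python
from functools import reduce

def forward(layers, x):
    """Feed x through each machine in order: layers[0] runs first."""
    return reduce(lambda value, layer: layer(value), layers, x)

# Stand-in layers (assumptions for illustration; real layers are a matrix
# multiply followed by a nonlinearity).
layers = [
    lambda v: v + 1,
    lambda v: v * 2,
    lambda v: max(v, 0),
    lambda v: v - 3,
]

print(forward(layers, 4))   # 7
```

Going from four machines to ninety-six changes the length of the list, not the code that runs it.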
Inverses: running the machine backwards
Every machine raises a question: can you undo it? The inverse $f^{-1}$ is the machine that takes $f$’s output and hands back the original input. Composing a function with its inverse gets you nowhere, which is the point: $f^{-1}(f(x)) = x$.
One notation warning. $f^{-1}(x)$ does not mean $\frac{1}{f(x)}$. The $-1$ is not a reciprocal exponent here; it’s the symbol for “the inverse machine.” $f^{-1}(x)$ and $\frac{1}{f(x)}$ are unrelated. Different operation, unlucky collision of notation.
To find an inverse, run the construction backwards. For : multiplies by then adds , so subtracts then divides by , giving . It’s the outside-in peel from module 1, packaged as its own function.
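The backwards construction can be checked with a round trip. The numbers in this sketch (multiply by 2, add 3) are hypothetical values chosen for illustration, not necessarily the lesson's own. The point is the pattern: undo the last step first.

```python
def f(x):
    return 2 * x + 3        # hypothetical machine: multiply by 2, then add 3

def f_inverse(y):
    return (y - 3) / 2      # undo in reverse order: subtract 3, then divide by 2

for x in [-1, 0, 2.5, 10]:
    assert f_inverse(f(x)) == x   # the round trip returns the original input
    assert f(f_inverse(x)) == x   # and it works in the other direction too

print(f_inverse(11))   # 4.0
```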
Undo the machine
Let . Its inverse is .
Evaluate the inverse at . What is ?
Inverses need the right domain
Not every machine can be run backwards. sends both and to . Asked to invert , the machine can’t choose; an inverse would need to return two values, and then it isn’t a function.
The fix is not to give up, it’s to restrict the domain. Limit $f(x) = x^2$ to inputs $x \ge 0$ and now every output traces back to exactly one input. On that restricted domain $f^{-1}(x) = \sqrt{x}$ is a genuine function.
Domain restriction isn’t bureaucracy. It is the thing that makes the inverse exist. A function can be inverted exactly when it’s one-to-one, and choosing the domain is how you make it one-to-one.
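Here is a sketch of domain restriction in code, taking squaring as the machine that cannot be run backwards as-is (an assumption for this example). The function names are hypothetical. The restricted version refuses negative inputs, and that refusal is what lets the square root act as its inverse.

```python
import math

def square(x):
    return x * x

# Unrestricted: two different inputs collide on one output, so no inverse exists.
print(square(3) == square(-3))   # True

def square_nonneg(x):
    """square restricted to the domain x >= 0."""
    if x < 0:
        raise ValueError("input outside the restricted domain x >= 0")
    return x * x

def square_nonneg_inverse(y):
    """Inverse of the restricted machine: the non-negative square root."""
    return math.sqrt(y)

print(square_nonneg_inverse(square_nonneg(3)))   # 3.0
```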
Where this goes next
You can build machines, wire them in series, and run them backwards. Composition is the forward pass; inverses are the seed of running a computation in reverse, which module 12 will turn into backpropagation.
g(f(x)) is one layer wired into the next. log(∏) = ∑(log) is why a million tiny probabilities don’t sink training. Everything that follows is those two facts at scale, and you just built the first one with your own hands.