Algebra I & II · 35 min

Wiring machines together

A function is a machine with one input and one output, every single time. Composition wires one machine's output into the next one's input, which is exactly what a neural network is.

A function is a machine

A function is a rule with one job: take an input, produce an output. Two conditions, no more.

  • It is deterministic: the same input always yields the same output.
  • It produces exactly one output per input. Never zero, never two.

That’s it. You already write these all day. A function in your language of choice, one that takes an argument and returns a value with no surprises, is a mathematical function. School made this feel exotic; it isn’t. It’s a pure function.

A function is not required to be a formula. It can be a lookup table, a piecewise rule with an if in it, a trained model with a billion parameters. As long as one input deterministically gives one output, it’s a function. “Function equals formula” is a school habit worth dropping today.
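As a rough sketch of that point, here are three machines in Python: one formula, one lookup table, one piecewise rule. The names `formula`, `TABLE`, `lookup`, and `piecewise` are made up for illustration and are not part of the lesson.

```python
# Three hypothetical machines. Each gives exactly one output per legal input.

def formula(x):
    """A rule given by a formula."""
    return 2 * x + 1

# A rule given by a lookup table: the keys are the legal inputs.
TABLE = {"red": 0, "green": 1, "blue": 2}

def lookup(color):
    """A rule with no formula at all, just a table."""
    return TABLE[color]

def piecewise(x):
    """A rule with an if in it."""
    if x < 0:
        return 0
    return x

print(formula(3), lookup("green"), piecewise(-5), piecewise(5))  # 7 1 0 5
```

Call any of them twice with the same argument and you get the same answer twice, which is all the definition asks for.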

Domain, codomain, range

A machine comes with a spec sheet.

The domain is the set of legal inputs, the values you’re allowed to feed it. The codomain is the declared universe the outputs live in. The range is the set of outputs the machine actually produces as the input sweeps the whole domain.

For $f(x) = x^2$ with domain “all real numbers”: you may feed it anything, but the range is only the non-negative numbers, because a square is never negative. Domain is the inputs you’re allowed; range is the outputs you really get.
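You can see the gap between allowed inputs and actual outputs with a small experiment. This is only a sketch: a handful of integers stands in for “all real numbers,” which is an assumption made for illustration.

```python
def f(x):
    return x ** 2

# Assumption: a small integer sample stands in for the full real domain.
sample_domain = range(-3, 4)          # -3, -2, ..., 3

# The set of outputs actually produced over that sample.
outputs = {f(x) for x in sample_domain}

print(sorted(outputs))                # [0, 1, 4, 9]
print(all(y >= 0 for y in outputs))   # True: no negative number ever comes out
```

Negative inputs are legal, but no negative output ever appears.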

The vertical-line test is just the “exactly one output” rule, drawn. If any vertical line hits a graph twice, that $x$ has two $y$’s, and the graph is not a function.

f(x) is evaluation, not multiplication

The notation $f(x)$ trips people because it looks like $f$ times $x$. It is not.

$f(x)$ means: the output of machine $f$ when the input is $x$. It is a function call. If it helps, read it as f.call(x) or f(x) the way you’d write it in code, because that is precisely what it is.

So $f(7)$ doesn’t multiply anything. It runs the machine $f$ on the input $7$ and names whatever comes out.

Run the machine

Let $f(x) = 3x - 2$. Evaluate the function at the input $7$.

What is $f(7)$?

Composition: wire the output into the next input

Now the move this whole lesson is built around. Take two machines and wire them in series: send the output of the first straight into the input of the second.

That’s composition, written $(g \circ f)(x) = g(f(x))$. Run $f$ on $x$, take what comes out, run $g$ on that.

[Interactive: Function Machines, g ∘ f. Wire Machine A's output port to Machine B's input port, then run an input through both. The rightmost machine in g(f(x)) runs first; swap the wires and you usually get a different answer.]

Pick a rule for each machine, draw the wire from one output port to the next input port, then drop an input value in and run it. Watch the value get relabeled as it clears each machine. That pipeline, an input flowing through stacked machines, is not a metaphor for a neural network. It is the shape of one.

Read composition right to left

The notation $g(f(x))$ is read from the inside out. The input $x$ touches $f$ first, because $f$ is the innermost machine, the one wrapped directly around $x$. Then $g$ runs on the result.

This catches everyone once: $g \circ f$ does not mean “do $g$ first.” The machine nearest the input runs first. In $g(f(x))$, that’s $f$. Trace the parentheses from the center outward and you’ll never get the order wrong.
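If you don’t trust the parentheses, make the machines report when they run. In this sketch, the rules inside `f` and `g` and the `calls` log are invented for illustration; only the nesting `g(f(...))` matters.

```python
calls = []  # records the order in which the machines actually run

def f(x):
    calls.append("f")
    return x + 1      # hypothetical rule, chosen only for the demo

def g(x):
    calls.append("g")
    return 10 * x     # hypothetical rule, chosen only for the demo

result = g(f(5))      # g is written first, but f is nearest the input

print(calls)          # ['f', 'g']
print(result)         # 60
```

The log shows `f` before `g`: the language has to finish the inner call before it has anything to hand to the outer one.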

Compose two machines

Let $f(x) = x + 3$ and $g(x) = 2x$. Compute $(g \circ f)(4)$.

Run the inner machine $f$ first, then feed its output to $g$. What is $(g \circ f)(4)$?

Order matters

Swap the wiring and you usually get a different machine. With the same $f(x) = x + 3$ and $g(x) = 2x$:

$$(g \circ f)(4) = g(f(4)) = g(7) = 14$$

$$(f \circ g)(4) = f(g(4)) = f(8) = 11$$

Different answers. In general $g \circ f \neq f \circ g$. Composition is not commutative.

This is not a technicality. It’s the reason a neural network’s layer order is a design decision. Stack attention then a feed-forward block, or the other way around, and you have built two different functions. The wiring order is part of the architecture.
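The same comparison can be run in code. The `compose` helper below is a hypothetical convenience written for this sketch, not a standard library function; `f` and `g` are the two machines from this section.

```python
def compose(outer, inner):
    """Return the machine x -> outer(inner(x)). The inner machine runs first."""
    def composite(x):
        return outer(inner(x))
    return composite

def f(x):
    return x + 3

def g(x):
    return 2 * x

g_after_f = compose(g, f)   # g ∘ f: add 3, then double
f_after_g = compose(f, g)   # f ∘ g: double, then add 3

print(g_after_f(4))  # 14
print(f_after_g(4))  # 11
```

Same two parts, two different machines, and the only thing that changed is the wiring.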

The other order

Same $f(x) = x + 3$ and $g(x) = 2x$. Now compute $(f \circ g)(4)$, the other order.

What is $(f \circ g)(4)$?

The deep-network preview

Wire three machines, then four, then ninety-six. An input enters, gets transformed, passes to the next stage, again, again, and a prediction falls out the end.

That run, input flowing forward through a stack of composed machines, has a name you’ll hear from module 11 onward: the forward pass. A deep network is $f_{96}(\dots f_2(f_1(x)) \dots)$. What’s missing from today’s version is only the content of each machine: a matrix multiply and a nonlinear bend, which modules 7 and 11 supply. The skeleton, composition, you already have.
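Here is a minimal sketch of that skeleton, assuming each layer is just a Python callable. The `forward` helper and the toy layers are invented for illustration; a real layer would hold a matrix multiply and a nonlinearity instead.

```python
from functools import reduce

def forward(layers, x):
    """Run x through the layers in list order: layers[0] runs first."""
    return reduce(lambda value, layer: layer(value), layers, x)

# Hypothetical stand-in layers, not real network layers.
three_layers = [lambda x: x + 3, lambda x: 2 * x, lambda x: x - 1]
print(forward(three_layers, 4))        # ((4 + 3) * 2) - 1 = 13

# Ninety-six copies of "add one" wired in series.
ninety_six = [lambda x: x + 1] * 96
print(forward(ninety_six, 0))          # 96
```

Whether the list holds three machines or ninety-six, the loop is the same: take the current value, hand it to the next machine, repeat.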

Inverses: running the machine backwards

Every machine raises a question: can you undo it? The inverse $f^{-1}$ is the machine that takes $f$’s output and hands back the original input. Composing a function with its inverse gets you nowhere, which is the point: $f^{-1}(f(x)) = x$.

One notation warning. $f^{-1}$ does not mean $1/f$. The $-1$ is not a reciprocal exponent here; it’s the symbol for “the inverse machine.” $f^{-1}(x)$ and $\tfrac{1}{f(x)}$ are unrelated. Different operation, unlucky collision of notation.

To find an inverse, run the construction backwards. For $f(x) = 5x + 1$: $f$ multiplies by $5$ then adds $1$, so $f^{-1}$ subtracts $1$ then divides by $5$, giving $f^{-1}(x) = \tfrac{x - 1}{5}$. It’s the outside-in peel from module 1, packaged as its own function.
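A quick round-trip check makes the “gets you nowhere” claim concrete. This is a sketch; the name `f_inv` and the test inputs are chosen for illustration.

```python
def f(x):
    return 5 * x + 1          # multiply by 5, then add 1

def f_inv(y):
    return (y - 1) / 5        # subtract 1, then divide by 5

# Send a few inputs through f and back through f_inv.
round_trips = [f_inv(f(x)) for x in [0, 3, -2, 10]]
print(round_trips)            # [0.0, 3.0, -2.0, 10.0]
```

Every input comes back unchanged (as a float, because `/` is true division in Python).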

Undo the machine

Let $f(x) = 5x + 1$. Its inverse is $f^{-1}(x) = \tfrac{x - 1}{5}$.

Evaluate the inverse at $11$. What is $f^{-1}(11)$?

Inverses need the right domain

Not every machine can be run backwards. $f(x) = x^2$ sends both $2$ and $-2$ to $4$. Asked to invert $4$, the machine can’t choose; an inverse would need to return two values, and then it isn’t a function.

The fix is not to give up, it’s to restrict the domain. Limit $f(x) = x^2$ to inputs $x \ge 0$ and now every output traces back to exactly one input. On that restricted domain $f^{-1}(x) = \sqrt{x}$ is a genuine function.

Domain restriction isn’t bureaucracy. It is the thing that makes the inverse exist. A function can be inverted exactly when it’s one-to-one, and choosing the domain is how you make it one-to-one.
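A short sketch shows both halves of that argument: the collision that blocks an inverse, and the restricted domain that rescues it. The names `square` and `square_inv` are made up for this example.

```python
import math

def square(x):
    return x ** 2

# Two different inputs land on the same output, so the machine is not one-to-one.
print(square(2) == square(-2))      # True

def square_inv(y):
    """Inverse of square, valid only on the restricted domain x >= 0."""
    return math.sqrt(y)

print(square_inv(square(3)))        # 3.0, the original input comes back
print(square_inv(square(-3)))       # 3.0, not -3: -3 is outside the restricted domain
```

Inside the restricted domain the round trip works; outside it, the “inverse” hands back a different number than the one you started with.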

Where this goes next

You can build machines, wire them in series, and run them backwards. Composition is the forward pass; inverses are the seed of running a computation in reverse, which module 12 will turn into backpropagation.

g(f(x)) is one layer wired into the next. log(∏) = ∑(log) is why a million tiny probabilities don’t sink training. Everything that follows is those two facts at scale, and you just built the first one with your own hands.
