Lessons · Tinker

Module 1 drafting

Pre-algebra Refresh

Number line intuition, arithmetic fluency, negatives, fractions as division, order of operations, variables as placeholders, one-variable equations.

01 Quantities live on a line (18 min)
Every number, including the negative ones, is a position on one straight line. Distance from zero has a name, and you'll use it for the rest of the course.

02 There are only two operations (22 min)
Subtraction is just adding a negative. Division is just multiplying by a reciprocal. Four operations collapse into two, each with an undo button.

03 A fraction is one number (25 min)
a/b means "a divided by b," one point on the line. Fractions, decimals, and percents are three spellings of that single point.

04 Expressions are trees (20 min)
Operator precedence isn't a slogan to memorize; it's a parser. Every expression has exactly one tree, and evaluating it is a climb up that tree.

05 Variables, expressions, equations (28 min)
A variable is a named box. An equation is a constraint. Solving is pressing undo buttons on the expression tree, from the outside in.
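
The ideas in lessons 04 and 05 can be sketched in a few lines of Python. This is an illustration, not course code: the tuple encoding of the tree and the names `evaluate`, `contains_x`, and `solve` are assumptions made for the example. It uses only `+` and `*` (the two operations from lesson 02) and assumes the variable appears exactly once.

```python
import operator

# An expression tree is a number, the variable "x", or a tuple (op, left, right).
OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree, x):
    """Climb the tree: evaluate both children, then apply the operator."""
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def contains_x(tree):
    """True if the variable appears anywhere in this subtree."""
    if tree == "x":
        return True
    if isinstance(tree, tuple):
        return contains_x(tree[1]) or contains_x(tree[2])
    return False

def solve(tree, target):
    """Solve tree == target for x by undoing operators from the outside in.

    Assumes x appears exactly once and no multiplier is zero.
    """
    while tree != "x":
        op, left, right = tree
        if contains_x(left):
            branch, const = left, evaluate(right, None)
        else:
            branch, const = right, evaluate(left, None)
        # Undo "+" by subtracting, undo "*" by dividing.
        target = target - const if op == "+" else target / const
        tree = branch
    return target

expr = ("+", ("*", 2, "x"), 3)   # 2*x + 3
print(evaluate(expr, 4))         # 11
print(solve(expr, 11))           # 4.0
```

The outermost operator (`+ 3`) is undone first, then the inner one (`* 2`), which is the outside-in order the lesson describes.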

Module 2 drafting

Algebra I & II

Linear equations & graphing, slope, systems, quadratics, functions as machines, function composition, polynomials, exponents, logarithms.

Module 3 drafting

Trigonometry: compact

Unit circle, sine and cosine, angle addition, rotations in 2D, Pythagoras, polar coordinates. Only what we'll actually use.

Module 4 drafting

Pre-calculus: the limit intuition

The ten parent functions and one grammar that bends them all, sequences and the first infinite process, informal limits by table and zoom, continuity, and the number e two ways.

Module 5 drafting

Single-variable Calculus: Derivatives & the Chain Rule

The most important module in the first half of the course. The derivative. The chain rule. THIS is what runs when you call `loss.backward()`.
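
As a small sanity check of the chain rule, the sketch below compares the analytic derivative of a composed function with a finite-difference estimate. The functions `f` and `g` and the helper names are illustrative assumptions, not part of the course material.

```python
import math

def g(x):
    return x * x            # inner function

def f(u):
    return math.sin(u)      # outer function

def dg(x):
    return 2 * x            # g'(x)

def df(u):
    return math.cos(u)      # f'(u)

def chain_rule(x):
    """d/dx f(g(x)) = f'(g(x)) * g'(x)."""
    return df(g(x)) * dg(x)

def numeric_derivative(h, x, eps=1e-6):
    """Central-difference estimate of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

x = 1.5
analytic = chain_rule(x)
numeric = numeric_derivative(lambda t: f(g(t)), x)
print(abs(analytic - numeric) < 1e-6)   # True
```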

Module 6 drafting

Multivariable Calculus: Partial Derivatives, Gradients, Jacobians

Functions of many variables. The gradient as a vector pointing uphill. The Jacobian as a gradient for vector-valued functions.

Module 7 drafting

Linear Algebra

Vectors, matrices as linear transformations, dot products, matrix multiplication as composition, determinants, eigenvalues, SVD intuition.

Module 8 drafting

Probability & Statistics

Sample spaces, conditional probability, Bayes, random variables, PMF/PDF/CDF, expectation, the Gaussian, the CLT, sampling.

Module 9 drafting

Information Theory Basics

Surprise = −log p. Entropy as average surprise. Cross-entropy. KL divergence. Why cross-entropy is the right classification loss.
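
The four quantities above fit in a few lines of Python. This is a minimal sketch, assuming natural logarithms (nats) and distributions given as plain lists of probabilities; the function names are chosen for the example.

```python
import math

def surprise(p):
    """Surprise of an event with probability p, in nats."""
    return -math.log(p)

def entropy(p):
    """Average surprise under p."""
    return sum(pi * surprise(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Average surprise when events follow p but we predict with q."""
    return sum(pi * surprise(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """Extra surprise paid for using q instead of p."""
    return cross_entropy(p, q) - entropy(p)

label = [0.0, 1.0, 0.0]          # one-hot target
prediction = [0.2, 0.7, 0.1]     # model output
# With a one-hot target, cross-entropy is just the surprise of the true class.
print(math.isclose(cross_entropy(label, prediction), surprise(0.7)))   # True
```

Because a one-hot target has zero entropy, minimizing cross-entropy against it is the same as minimizing the KL divergence from the target to the prediction.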

Module 10 drafting

Optimization

Minimizing a scalar function. Gradient descent. Learning rate. SGD. Momentum. RMSProp. Adam. Loss-landscape pathologies.
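
Plain gradient descent on a one-dimensional function looks roughly like this. The objective f(x) = (x - 3)², the learning rate, and the step count are assumptions picked for the illustration.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(abs(x_min - 3) < 1e-6)   # True
```

Each step here shrinks the distance to the minimum by a factor of 0.8. For this particular function, a learning rate above 1.0 would make the iterates diverge.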

Module 11 drafting

Neural Network Fundamentals

Perceptron. Activations (ReLU, GELU, sigmoid, tanh). The XOR problem. Multilayer perceptrons. Forward pass as matrix multiplies and nonlinearities.
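
To make "forward pass as matrix multiplies and nonlinearities" concrete, here is a two-layer ReLU network that computes XOR. The weights are set by hand for the illustration, not learned, and all names are assumptions.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, x, b):
    """Compute W @ x + b with plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

# Hidden layer: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
W1, b1 = [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]
# Output layer: y = h1 - 2 * h2
W2, b2 = [[1.0, -2.0]], [0.0]

def xor_mlp(x):
    hidden = relu(matvec(W1, x, b1))
    return matvec(W2, hidden, b2)[0]

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, xor_mlp([a, b]))
```

A single linear layer cannot separate these four points. The hidden ReLU layer is what makes the problem solvable.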

Module 12 drafting

Backpropagation from Scratch

The keystone module of the course. Build micrograd node by node. Computational graph editor. Watch gradients flow backward through a tanh.
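
A minimal scalar autograd node, written in the spirit of micrograd, gives a feel for what the module builds. This is a sketch under assumptions, not the course's implementation: it supports only `+`, `*`, and `tanh`, and the attribute names are chosen for the example.

```python
import math

class Value:
    """A scalar that remembers how it was computed."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None   # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t * t) * out.grad   # d tanh(z)/dz = 1 - tanh(z)^2
        out._backward = _backward
        return out

    def backward(self):
        """Topologically sort the graph, then apply the chain rule in reverse."""
        order, seen = [], set()
        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

x, w, b = Value(0.5), Value(-2.0), Value(1.0)
y = (w * x + b).tanh()      # tanh(-2 * 0.5 + 1) = tanh(0)
y.backward()
print(x.grad, w.grad, b.grad)   # -2.0 0.5 1.0
```

Gradients are accumulated with `+=` so that a node used more than once receives the sum of the contributions from every path.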

Module 13 drafting

Training Dynamics & Modern Tricks

Over/underfitting. Train/val/test splits. L2 regularization. Dropout. Weight init. BatchNorm. LayerNorm. Residual connections. LR warmup.

Module 14 shipped

Sequence Modeling: Bigrams to RNNs

From a bigram count table to an RNN. Tokens, the chain rule of probability, perplexity, sampling, fixed-context MLPs, and the recurrent hidden state, built so that attention, in the next module, lands as a fix to a specific, named failure mode.
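
The starting point of this module, a bigram count table, can be sketched as follows. This is an illustration under assumptions: tokens are single characters, there is no smoothing (so an unseen bigram would raise an error in `perplexity`), and the function names are made up for the example.

```python
import math
from collections import Counter, defaultdict

def bigram_counts(text):
    """counts[a][b] = number of times token b follows token a."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def bigram_prob(counts, a, b):
    """Estimate P(b | a) from the count table."""
    total = sum(counts[a].values())
    return counts[a][b] / total if total else 0.0

def perplexity(counts, text):
    """exp of the average negative log-probability per predicted token."""
    log_prob = sum(math.log(bigram_prob(counts, a, b))
                   for a, b in zip(text, text[1:]))
    return math.exp(-log_prob / (len(text) - 1))

counts = bigram_counts("abracadabra")
print(bigram_prob(counts, "a", "b"))   # 0.5
print(perplexity(counts, "abracadabra"))
```

The sum of log-probabilities in `perplexity` is the chain rule of probability at work, with each factor truncated to one token of context.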

Module 15 shipped

Attention

Build attention from a soft dictionary lookup. Scaled dot-product. Q, K, V as projections of the same X. Causal masking. Permutation-equivariance and three flavors of positional encoding. Multi-head as parallel subspaces. The T² cost and the KV-cache that tames it.
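
Scaled dot-product attention with a causal mask can be written with plain Python lists. This is a single-head sketch under assumptions: `Q`, `K`, and `V` are passed in directly (the learned projections of X are skipped), and the function names are illustrative.

```python
import math

def softmax(scores):
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(Q, K, V):
    """Q, K, V are lists of T vectors. Returns T output vectors."""
    d = len(Q[0])
    out = []
    for t, q in enumerate(Q):
        # Causal mask: position t only looks at keys 0..t.
        scores = [sum(qi * ki for qi, ki in zip(q, K[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        # Soft dictionary lookup: a weighted average of the visible values.
        out.append([sum(w * V[s][j] for s, w in enumerate(weights))
                    for j in range(len(V[0]))])
    return out

Q = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # all-zero queries give uniform weights
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]]
print(causal_attention(Q, K, V))   # rows are running means of V
```

The nested loop over `t` and `s` is where the T² cost comes from.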

Module 16 shipped

The Transformer Block

Compose attention with a position-wise FFN, wrap each in residual + layer norm, stack N times, top with a final LN and a tied unembedding. Pre-LN vs post-LN. The residual stream as the noun the model operates on. Where the parameters and FLOPs actually go.

Module 17 shipped

Tokenization, Training & Sampling

How text becomes integer ids (BPE). How the M16 forward pass becomes a working model (the training loop with AdamW, warmup, cosine decay, gradient clipping). How a trained model becomes text again (autoregressive sampling with temperature, top-k, top-p, and the KV cache that cuts the attention cost of each new token from O(T²) to O(T)).
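
The sampling step can be sketched with the standard library alone. This is a minimal illustration of temperature and top-k only (top-p is omitted). The function names and the `rng` parameter are assumptions made for the example.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens them."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize; zero out the rest."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]

def sample_token(logits, temperature=1.0, k=None, rng=random):
    """Draw one token id from the (optionally top-k filtered) distribution."""
    probs = softmax(logits, temperature)
    if k is not None:
        probs = top_k_filter(probs, k)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.0]
print(sample_token(logits, k=1))   # 0, since only the top token survives
print(sample_token(logits, temperature=0.8, k=2, rng=random.Random(0)))
```

In an autoregressive loop, the sampled id is appended to the context and the model is called again for the next position.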

Module 18 shipped

Capstone: Train a Tiny Transformer in Your Browser

4 layers, 4 heads, 64-embed, 64-context. ~209k parameters. Trains in roughly 5 minutes on WebGPU. Produces Shakespeare-flavored nonsense. Yours to keep.
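
A back-of-the-envelope parameter count shows where a figure of this size comes from. The sketch rests on assumptions that are not stated above: a character-level vocabulary of 65 tokens, learned positional embeddings, a 4x-wide FFN, biases on every linear layer, and an unembedding tied to the token embedding. Under these assumptions the total is 208,320, close to the quoted ~209k. The capstone's actual configuration may differ.

```python
def count_parameters(n_layer=4, d_model=64, context=64, vocab_size=65):
    """Rough parameter count for a small GPT-style model.

    vocab_size=65 is an assumption (character-level text). The number of
    heads does not appear because heads split d_model without adding weights.
    """
    tok_emb = vocab_size * d_model                 # shared with the tied unembedding
    pos_emb = context * d_model                    # learned positions (assumed)
    attn = 4 * (d_model * d_model + d_model)       # Q, K, V, output projections + biases
    ffn = (d_model * 4 * d_model + 4 * d_model) \
        + (4 * d_model * d_model + d_model)        # up- and down-projection + biases
    layer_norms = 2 * (2 * d_model)                # two LNs per block, gain + bias each
    per_layer = attn + ffn + layer_norms
    final_ln = 2 * d_model
    return tok_emb + pos_emb + n_layer * per_layer + final_ln

print(count_parameters())   # 208320
```

Almost all of the weights sit in the four blocks. The embeddings contribute only about 8k parameters at this vocabulary size.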