How large language models actually work — tokenization, embeddings and positional encoding, attention, the transformer block, training, inference, and interpretability.
A module-by-module concept outline. Open the course to learn each topic with animated explanations, in-browser code, practice challenges, and a knowledge check.
Module 1. Tokens — The Words a Model Sees
Topics
BPE, Vocabulary, Glitch tokens, Tokenizer comparison
Sections
1. Why Tokens Exist
2. Byte-Pair Encoding (BPE) — The Algorithm
3. Vocabulary, Special Tokens, and the Pre-Tokenizer
4. Glitch Tokens and Tokenizer Fragility
5. Tokenizer Choice in 2026 — Tiktoken vs SentencePiece vs Llama
Module 2. Embeddings & Positional Encoding
Topics
Word embeddings, Sinusoidal positions, RoPE, ALiBi, NoPE
Sections
1. Tokens Become Vectors
2. The Embedding Table — A Lookup, Not a Computation
3. Why Order Matters — The Need for Positional Encoding
4. Sinusoidal Positions — The Original Trick
5. Rotary Position Embedding (RoPE) — The Modern Default