Feel the AGI: Supervised Fine-Tuning in Your Browser
Until you play with a transformer and watch it learn, you will never truly feel the AGI. The folks at OpenAI felt it firsthand, scaling from GPT-2 to GPT-3 to GPT-4, watching language models go from parlor tricks to something that felt like understanding. This page attempts to give you that same feeling: load Pythia-14M, a real 14M-parameter transformer from EleutherAI, fine-tune it on your own instruction-completion pairs, and watch a model that spits out gibberish start producing structured answers. All through SFT, completely in your browser.
1. Load Model
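If you want to reproduce this step outside the browser, the same checkpoint is available on Hugging Face. Here is a minimal Python sketch (the demo itself runs in-browser; only the EleutherAI/pythia-14m checkpoint name comes from this page):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pythia-14M: a small GPT-NeoX-style causal LM from EleutherAI.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-14m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-14m")

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # roughly 14M, embeddings included
```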
2. Training Data (15 examples)
Feel free to add your own SFT examples.
| Prompt | Completion (target) |
|---|---|
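The rows are editable in the page; under the hood the data is just a list of (prompt, completion) string pairs. The two pairs below are hypothetical placeholders for illustration, not the demo's actual examples:

```python
# Hypothetical rows, shown only to illustrate the data format.
train_pairs = [
    {"prompt": "What color is the sky?", "completion": " The sky is blue."},
    {"prompt": "Name a prime number.",   "completion": " 2 is a prime number."},
]
```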
3. Before Training (baseline)
What the model generates before any fine-tuning:
| Prompt | Model response |
|---|---|
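Each baseline response is a plain greedy generation call on the untouched pretrained model. A sketch using the tokenizer and model from step 1 (the prompt string is illustrative, not necessarily one of the page's test prompts):

```python
import torch

prompt = "What color is the sky?"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```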
4. Fine-Tune
LM Head only: freezes the transformer backbone and trains only the final output projection layer that maps hidden states to vocabulary logits. Fast, and sufficient when the pretrained representations already capture what you need.
Full Model: updates all parameters, including the embeddings and every transformer layer. This lets the model learn deeper representations, but it is slower and more prone to overfitting on small datasets.
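In code, the difference is simply which parameters keep requires_grad set. A sketch assuming the Hugging Face model object from step 1 (the function name set_trainable is made up for illustration):

```python
def set_trainable(model, mode: str = "lm_head_only"):
    """Choose which parameters the optimizer is allowed to update."""
    if mode == "lm_head_only":
        # Freeze the transformer backbone...
        for p in model.parameters():
            p.requires_grad = False
        # ...and unfreeze only the output projection (hidden states -> logits).
        for p in model.get_output_embeddings().parameters():
            p.requires_grad = True
    elif mode == "full":
        # Update everything: embeddings and every transformer layer.
        for p in model.parameters():
            p.requires_grad = True
```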
Hyperparameters
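The page exposes its own controls for these; the values below are hypothetical defaults, shown only to make the optimizer setup concrete:

```python
import torch

# Hypothetical defaults for illustration; use the page's controls instead.
hparams = {"learning_rate": 1e-3, "epochs": 20, "batch_size": 4}

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad],
    lr=hparams["learning_rate"],
)
```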
5. Test Model
Run the same baseline prompts through the current model. Use it at any time to check progress.
| Prompt | Model response |
|---|---|
Anything outside your training data will produce nonsense, but hey, at least it'll be better nonsense than the base model.
What Just Happened
You took a pretrained language model that only knew how to predict the next token and taught it to follow instructions. That is supervised fine-tuning: you provided (prompt, completion) pairs, computed a cross-entropy loss on the completion tokens only, and updated the weights with gradient descent.
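A minimal sketch of that training step, assuming the model, tokenizer, and optimizer from the earlier sketches. In the Hugging Face convention, labels set to -100 are excluded from the cross-entropy loss, which is how the prompt tokens get masked out:

```python
import torch

def sft_step(prompt, completion, tokenizer, model, optimizer):
    # Tokenize prompt and completion separately so we know where each ends.
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    completion_ids = tokenizer(completion, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, completion_ids], dim=1)

    # Labels of -100 are ignored by the loss, so the model is only
    # penalized on the completion tokens, not the prompt tokens.
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100

    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```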
This is the same first step used to build ChatGPT, Claude, and every other instruction-following LLM. The difference is scale: they use billions of parameters and millions of examples. The mechanism is identical.
SFT alone does not produce a safe or well-aligned model. It teaches format and surface-level instruction following, but not preference or judgment. That requires the next step in the pipeline: reinforcement learning from human feedback (RLHF), using algorithms like PPO or GRPO. But SFT is where the magic first becomes visible.
I hope you felt the AGI. If not fully, at least a little bit.