Blog
Machine Learning and Recommender Systems
-
Training MoE Right: Making Every Expert Count
The techniques that prevent expert collapse in Mixture-of-Experts LLMs: load balancing losses, routing strategies, shared experts, auxiliary-loss-free methods, and fine-grained expert segmentation.
-
Feel the AGI: Supervised Fine-Tuning in Your Browser
Fine-tune a 14M parameter language model in your browser. Load Pythia-14M, train on instruction-completion pairs, and watch it learn.
-
Scaling Laws for LLMs: The 3 Knobs You Actually Have
Scaling is not "bigger model." It's a budget allocation problem across parameters, tokens, and compute. A first-principles guide to what scaling laws actually say.
-
WTF Is Happening Inside a Transformer (Linear Algebra Edition)
An intuition-first guide to what transformers actually compute. Q, K, V demystified, attention as a data-dependent mixing matrix, and the MLP's expand-activate-compress pattern.
-
DeepSeek's Technical Playbook: From MLA to Conditional Memory
A deep dive into DeepSeek's key innovations: Multi-head Latent Attention, sparse MoE, sparse attention, scalable RL, and the Engram conditional memory architecture.
-
Reward Modeling and DPO: Learning What "Good" Means
How reward models turn human preferences into training signal, and how DPO skips the reward model entirely. Bradley-Terry, preference data, and offline alignment explained.
-
Positional Encodings for LLMs: From Sinusoidal to RoPE
How transformers understand token order. An intuition-first guide covering sinusoidal positional encodings and Rotary Position Embeddings (RoPE).
-
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
How DeepSeek uses idle decode-side NICs to double KV-Cache loading throughput in prefill-decode disaggregated serving.
-
Reinforcement Learning for LLMs
An intuition-first guide to the RL concepts behind RLHF, PPO, and GRPO — the background you need before diving into alignment algorithms.
-
PPO & GRPO for LLM Alignment
A first-principles guide to PPO and GRPO for LLM alignment, for ML engineers with minimal RL background.
-
Hashing for Large-Scale Similarity
-
Implementing Matrix Factorisation Using TensorFlow
My Quora response
-
How exactly is machine learning used in recommendation engines?
My Quora response