Overview
Applied ML 2026 is the successor to the 2024 Applied Machine Learning book, written from scratch for the LLM era. Where the earlier volume was a 2024 snapshot of 25 worked projects across healthcare, bioinformatics, and vision/robotics — and has been archived to the shelf below — this book is built differently, one reproducible project at a time.
This is an in-progress release. The preface and Chapter 1 are published now; further chapters are published as each project's rig lands, with runnable code and honest numbers.
Design principles
Every project in this book has to pass the same bar:
- LLM-first. The starting point is a foundation model plus retrieval, with fine-tuning as a targeted intervention. Projects are organised around the patterns the LLM era actually uses — retrieval, tool use, evaluation, distillation, quantisation, cost — rather than around architectures-in-isolation.
- Eval-first. Every chapter's first code cell is the metric, not the model. The eval is the specification; without one, you cannot tell whether anything is working.
- Reproducible rigs. Every project ships with an `experiments/run.py` that you can clone and run in under a second on a laptop. The script emits `metrics.json`, and the chapter's numbers are pulled live from that file. Webpage and PDF cannot drift.
- No API keys required to reproduce. Every rig uses only the Python standard library or a small, well-pinned dependency set. Projects that benefit from a real LLM are designed so the base rig is deterministic, with the LLM call as a clearly marked optional extension. This keeps the bar at a $200 laptop.
- Honest failure modes. Each project has a section documenting what fails, with numbers. Negative results are not a footnote.
- Grounded in Mathematical Awakening. Where the math shows up, this book points back to the relevant chapter rather than re-teaching. The two books are meant to sit together.
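The eval-first and reproducible-rig principles above can be sketched together in a few lines. This is a minimal illustration, not code from the book: the filenames `experiments/run.py` and `metrics.json` come from the text, but the exact-match metric, the toy dataset, and the `solve` lookup table are invented stand-ins.

```python
import json

def exact_match(pred: str, gold: str) -> float:
    # Eval-first: the metric is defined before any model appears.
    return float(pred.strip() == gold.strip())

def solve(question: str) -> str:
    # Hypothetical stand-in solver: a lookup table with one wrong
    # answer, so the metric has something non-trivial to measure.
    return {"2+2": "4", "3*3": "9", "10-7": "2"}.get(question, "")

def run() -> dict:
    dataset = [("2+2", "4"), ("3*3", "9"), ("10-7", "3")]
    score = sum(exact_match(solve(q), g) for q, g in dataset) / len(dataset)
    return {"exact_match": round(score, 4), "n": len(dataset)}

if __name__ == "__main__":
    # The chapter's numbers are pulled from this file, so prose cannot drift.
    with open("metrics.json", "w") as f:
        json.dump(run(), f, indent=2)
```

The point of the pattern is the direction of dependency: the prose quotes the JSON file, never the other way around, so a stale number in a chapter is impossible by construction.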
What's in now
Chapter 1 — Evaluating pass@k, and what it doesn't tell you. A 120-problem synthetic grade-school math benchmark with three difficulty tiers. Four solvers of controllable accuracy and controllable sample-level correlation expose how pass@k actually behaves:
- why pass@k = pass@1 for a deterministic solver (and what that means for papers that report pass@10 on greedy decoding);
- how the classical bound holds under independence and breaks under correlated sampling;
- how a 0.5 stickiness correlation costs ~6 percentage points at pass@5 relative to an iid oracle of the same base rate;
- how to read pass@k alongside bootstrap 95% CIs and per-difficulty breakdowns, and why headline pass@k numbers without either are half a number.
The rig is under 400 lines of standard-library Python, runs in one second, and produces the numbers that appear in the chapter.
What's coming next
As each project lands with a runnable rig, the chapter joins the book. The active queue:
- Retrieval for QA: BM25 vs embeddings vs hybrid on a small hand-built dataset.
- Distillation: a synthetic teacher-student setup to measure how much reasoning ability transfers via SFT on generated traces.
- Quantisation and the memory-quality trade-off: int8, int4, and the ablations that matter.
- Tool use: a tiny agent that calls three deterministic tools, with traceable logs and a cost ceiling.
- Domain plumbing: one healthcare, one bioinformatics, one robotics project, each built on the LLM-first primitives above.
Chapters only land when the rig does. That constraint is deliberate.
Read online vs. PDF
The web reader splits the book into its current chapters — preface and Chapter 1 — and renders inline math and code listings directly in the browser. The PDF is the same content in a single downloadable file and grows as chapters are added.