Preface
Why this book
This is the successor to Applied Machine Learning (2024). That earlier volume is still on the shelf at aresalab.com — it was a 2024 snapshot of 25 worked projects across healthcare, bioinformatics, and vision/robotics. It held up as a structured tour of classical ML patterns, but it aged quickly: the NLP projects predate the LLM era, the drug-discovery projects predate AlphaFold 3, and the code listings were static artefacts without runnable rigs. A reader in 2026 would not want to build new work from that starting point.
Applied ML 2026 is written from scratch with a different set of design choices, driven by what went wrong in the 2024 book and by what the field has converged on since.
Design principles
- LLM-first. The centre of gravity has moved. In 2024, a "healthcare AI" project meant training a custom transformer on a MIMIC-III subset. In 2026, the first thing you reach for is a foundation model plus retrieval, with fine-tuning as a targeted intervention rather than a default. Each chapter starts from that premise and is organised around the patterns the LLM era actually uses: retrieval, tool use, evaluation, distillation, quantisation, safety, cost.
- Eval-first. Every project begins with an evaluation harness before it has a model. The eval is the specification — without one, you cannot tell whether anything is working. The book is strict about this: the first code cell in every project is the metric, not the architecture.
- Reproducible rigs. Every project ships with an `experiments/run.py` that you can clone, run in seconds on a laptop, and inspect. The script emits a `metrics.json` with the headline numbers, and those numbers — not marketing bullets — are what appear in the chapter's results section. When aresalab.com displays a chapter, it reads the live `metrics.json`, so the webpage and the PDF cannot drift.
- No API keys required to reproduce. Every rig runs on the Python standard library plus a small, well-pinned dependency set. Projects that benefit from a real LLM are designed so the base rig is deterministic, with the LLM call as a clearly marked, optional extension. This means the book can be reproduced on a $200 laptop, which is the correct bar for a public research artefact.
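The rig contract is small enough to sketch. The file name `metrics.json` is the book's; everything else below — the function name, the metric names, the synthetic "evaluation" — is an invented stand-in, shown only to illustrate the shape of a deterministic, stdlib-only rig:

```python
import json
import random

def run_rig(seed: int = 0) -> dict:
    """A stand-in rig: deterministic given the seed, stdlib only."""
    rng = random.Random(seed)
    # Pretend evaluation: score 200 synthetic trials.
    scores = [1 if rng.random() < 0.87 else 0 for _ in range(200)]
    return {
        "accuracy": sum(scores) / len(scores),
        "n_trials": len(scores),
        "seed": seed,
    }

if __name__ == "__main__":
    metrics = run_rig()
    # The headline numbers the chapter (and the website) read from.
    with open("metrics.json", "w") as f:
        json.dump(metrics, f, indent=2)
    print(json.dumps(metrics))
```

The point of the fixed seed is that two runs — yours and the author's — emit byte-identical numbers, which is what lets the chapter quote the file directly.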
- Honest failure modes. Each project has a section documenting what fails, with numbers. Negative results are not a footnote; they are half the value of running a rig in the first place. If a paper claims 94% on some benchmark and our rig says 87% after controlling for a known confound, we report 87% and explain the gap.
- Mathematical grounding via Mathematical Awakening. Where a project needs calculus, linear algebra, probability, or statistics, this book points back to the relevant chapter of Mathematical Awakening rather than re-teaching. The two books are designed to sit on the shelf together.
Scope
The book is growing one project at a time, chapter by chapter, as each project's rig lands with runnable code and honest numbers. The current table of contents is short on purpose:
- Chapter 1 — Evaluating pass@k, and what it doesn't tell you. A small reasoning benchmark with three solvers of controllable accuracy. The rig computes pass@1, pass@5, pass@10 with bootstrap confidence intervals, and the chapter walks through what pass@k actually measures, what the variance looks like in practice, and where the metric misleads. No API calls; fully reproducible in under a second.
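The chapter's own rig is not reproduced here, but the standard unbiased estimator of pass@k — the probability that at least one of k samples, drawn without replacement from n attempts of which c passed, is correct — is small enough to sketch (the function name is mine, not the rig's):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n-c, k) / C(n, k).

    n: total samples per problem, c: samples that passed, k <= n.
    """
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 3 passes out of 10 samples, pass@1 is 0.3 but pass@5 is already about 0.917 — one reason the chapter stresses what the metric does and does not tell you.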
Future chapters, added as the rigs land:
- Retrieval for QA: BM25 vs embeddings vs hybrid, evaluated on a small hand-built dataset.
- Distillation: running a synthetic teacher-student setup to measure how much reasoning ability can transfer via SFT on generated traces.
- Quantisation and the memory-quality trade-off: int8, int4, and the ablations that matter.
- Evaluation harnesses: designing a benchmark that cannot be gamed.
- Tool use: a tiny agent that calls three deterministic tools, with traceable logs and a cost ceiling.
- Domain plumbing: one healthcare project, one bioinformatics project, one robotics project, each built on the LLM-first primitives above.
Any chapter that cannot be written with a reproducible rig does not get written. That constraint is deliberate: it is what separates this book from the 2024 edition.
How to read a chapter
Every chapter has the same shape:
- Problem. What is being measured, and why is it interesting.
- Eval. The metric, with formula and reference implementation. If you only read one section, this is the one.
- Method. The solver or architecture, kept compact. Math pointers back to Mathematical Awakening where appropriate.
- Rig. A walkthrough of
experiments/run.py— what it does, how long it takes, what it emits. - Numbers. Headline results from the rig, pulled live from
metrics.json. Tables with bootstrap confidence intervals, not point estimates. - What fails, and why. The section that matters most. A list of cases where the method breaks, with reproducible examples.
- Extensions. The obvious next paper for anyone who wants to take the project further.
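The bootstrap intervals behind those tables can be sketched in a few lines of stdlib Python — a percentile bootstrap, with the function name and defaults chosen here for illustration rather than taken from any rig:

```python
import random

def bootstrap_ci(values, stat=lambda xs: sum(xs) / len(xs),
                 n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a statistic (default: the mean)."""
    rng = random.Random(seed)
    # Resample with replacement, compute the statistic each time.
    stats = sorted(
        stat([rng.choice(values) for _ in values])
        for _ in range(n_resamples)
    )
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

Reporting `(lo, hi)` alongside the point estimate is what "not point estimates" means in practice: a 0.87 from 200 trials and a 0.87 from 20 trials are very different claims, and the interval width says which one you have.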
That is the shape. One chapter for now, more landing as the rigs do.
Written with a debt to Mathematical Awakening, which teaches the math this book assumes, and to the 2024 Applied Machine Learning, which taught me what not to do the second time around.