Preface
Why this book
This is the successor to Applied Machine Learning (2024). That earlier volume is still on the shelf at aresalab.com — it was a 2024 snapshot of 25 worked projects across healthcare, bioinformatics, and vision/robotics. It held up as a structured tour of classical ML patterns, but it aged quickly: the NLP projects predate the LLM era, the drug-discovery projects predate AlphaFold 3, and the code listings were static artefacts without runnable rigs. A reader in 2026 would not want to build new work from that starting point.
Applied ML 2026 is written from scratch with a different set of design choices, driven by what went wrong in the 2024 book and by what the field has converged on since.
Design principles
- LLM-first. The centre of gravity has moved. In 2024, a "healthcare AI" project meant training a custom transformer on a MIMIC-III subset. In 2026, the first thing you reach for is a foundation model plus retrieval, with fine-tuning as a targeted intervention rather than a default. Each chapter starts from that premise and is organised around the patterns the LLM era actually uses: retrieval, tool use, evaluation, distillation, quantisation, safety, cost.
- Eval-first. Every project begins with an evaluation harness before it has a model. The eval is the specification — without one, you cannot tell whether anything is working. The book is strict about this: the first code cell in every project is the metric, not the architecture.
- Reproducible rigs. Every project ships with an `experiments/run.py` that you can clone, run in seconds on a laptop, and inspect. The script emits a `metrics.json` with the headline numbers, and those numbers — not marketing bullets — are what appear in the chapter's results section. When aresalab.com displays a chapter, it reads the live `metrics.json`, so the webpage and the PDF cannot drift.
- No API keys required to reproduce. Every rig runs on the Python standard library plus a small, well-pinned dependency set. Projects that benefit from a real LLM are designed so the base rig is deterministic, with the LLM call as a clearly marked, optional extension. This means the book can be reproduced on a $200 laptop, which is the correct bar for a public research artefact.
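The rig contract is small enough to sketch. The file name `metrics.json` is the book's; everything else below — the function name, the metric names, the synthetic "evaluation" — is an invented stand-in, shown only to illustrate the shape of a deterministic, stdlib-only rig:

```python
import json
import random

def run_rig(seed: int = 0) -> dict:
    """A stand-in rig: deterministic given the seed, stdlib only."""
    rng = random.Random(seed)
    # Pretend evaluation: score 200 synthetic trials.
    scores = [1 if rng.random() < 0.87 else 0 for _ in range(200)]
    return {
        "accuracy": sum(scores) / len(scores),
        "n_trials": len(scores),
        "seed": seed,
    }

if __name__ == "__main__":
    metrics = run_rig()
    # The headline numbers the chapter (and the website) read from.
    with open("metrics.json", "w") as f:
        json.dump(metrics, f, indent=2)
    print(json.dumps(metrics))
```

The point of the fixed seed is that two runs — yours and the author's — emit byte-identical numbers, which is what lets the chapter quote the file directly.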
- Honest failure modes. Each project has a section documenting what fails, with numbers. Negative results are not a footnote; they are half the value of running a rig in the first place. If a paper claims 94% on some benchmark and our rig says 87% after controlling for a known confound, we report 87% and explain the gap.
- Mathematical grounding via Mathematical Awakening. Where a project needs calculus, linear algebra, probability, or statistics, this book points back to the relevant chapter of Mathematical Awakening rather than re-teaching. The two books are designed to sit on the shelf together.
Scope
The book is growing one project at a time, chapter by chapter, as each project's rig lands with runnable code and honest numbers. The current table of contents is short on purpose:
- Chapter 1 — Evaluating pass@k, and what it doesn't tell you. A small reasoning benchmark with three solvers of controllable accuracy. The rig computes pass@1, pass@5, pass@10 with bootstrap confidence intervals, and the chapter walks through what pass@k actually measures, what the variance looks like in practice, and where the metric misleads. No API calls; fully reproducible in under a second.
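The chapter's own rig is not reproduced here, but the standard unbiased estimator of pass@k — the probability that at least one of k samples, drawn without replacement from n attempts of which c passed, is correct — is small enough to sketch (the function name is mine, not the rig's):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n-c, k) / C(n, k).

    n: total samples per problem, c: samples that passed, k <= n.
    """
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 3 passes out of 10 samples, pass@1 is 0.3 but pass@5 is already about 0.917 — one reason the chapter stresses what the metric does and does not tell you.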
Future chapters, added as the rigs land:
- Retrieval for QA: BM25 vs embeddings vs hybrid, evaluated on a small hand-built dataset.
- Distillation: running a synthetic teacher-student setup to measure how much reasoning ability can transfer via SFT on generated traces.
- Quantisation and the memory-quality trade-off: int8, int4, and the ablations that matter.
- Evaluation harnesses: designing a benchmark that cannot be gamed.
- Tool use: a tiny agent that calls three deterministic tools, with traceable logs and a cost ceiling.
- Domain plumbing: one healthcare project, one bioinformatics project, one robotics project, each built on the LLM-first primitives above.
Any chapter that cannot be written with a reproducible rig does not get written. That constraint is deliberate: it is what separates this book from the 2024 edition.
How to read a chapter
Every chapter has the same shape:
- Problem. What is being measured, and why is it interesting.
- Eval. The metric, with formula and reference implementation. If you only read one section, this is the one.
- Method. The solver or architecture, kept compact. Math pointers back to Mathematical Awakening where appropriate.
- Rig. A walkthrough of
experiments/run.py— what it does, how long it takes, what it emits. - Numbers. Headline results from the rig, pulled live from
metrics.json. Tables with bootstrap confidence intervals, not point estimates. - What fails, and why. The section that matters most. A list of cases where the method breaks, with reproducible examples.
- Extensions. The obvious next paper for anyone who wants to take the project further.
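The bootstrap intervals behind those tables can be sketched in a few lines of stdlib Python — a percentile bootstrap, with the function name and defaults chosen here for illustration rather than taken from any rig:

```python
import random

def bootstrap_ci(values, stat=lambda xs: sum(xs) / len(xs),
                 n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a statistic (default: the mean)."""
    rng = random.Random(seed)
    # Resample with replacement, compute the statistic each time.
    stats = sorted(
        stat([rng.choice(values) for _ in values])
        for _ in range(n_resamples)
    )
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

Reporting `(lo, hi)` alongside the point estimate is what "not point estimates" means in practice: a 0.87 from 200 trials and a 0.87 from 20 trials are very different claims, and the interval width says which one you have.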
That is the shape. One chapter for now, more landing as the rigs do.
Written with a debt to Mathematical Awakening, which teaches the math this book assumes, and to the 2024 Applied Machine Learning, which taught me what not to do the second time around.