Applied Machine Learning: 25 Worked Projects Across Healthcare, Bioinformatics, and Vision/Robotics
This is the project-driven companion to Mathematical Awakening. Where the first book builds the math, this one uses it: worked machine-learning implementations you can read straight through, return to as a reference, or fork and run on your own hardware.
What's actually in this book
Let me tell you on the first page, so nothing has to be discovered later:
- Three domains, twenty-five projects.
- Chapter 1 — Healthcare & Medical AI (Projects 1–10)
- Chapter 2 — Bioinformatics & Genomic AI (Projects 11–18)
- Chapter 3 — Computer Vision & Robotics (Projects 19–25)
- Two more domains were planned but not written. The original outline included NLP (Projects 31–40) and Financial / Environmental / Autonomous Systems (Projects 41–50). Those chapters aren't in this volume, and listing them in the TOC just to imply they exist would be dishonest. If they ever get finished, it'll be in a future edition or as a separate book; pretending otherwise isn't useful to you.
- Code is shown as static listings. The Quarto source sets `execute: eval: false`, so every cell renders as a readable listing rather than a live run. That keeps the book buildable in seconds and honest about what it is: a structured pattern catalogue, not a reproducible experiment harness. Large training runs shouldn't pretend to be reproducible from a `make` build.
- Datasets are referenced, not vendored. Most projects pull from Kaggle, Hugging Face, public medical datasets (MIMIC), or open geospatial sources. Every project cites where its data comes from; reproducing a project is a fork-and-run exercise on your own machine, not a one-liner.
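For readers who fork and build the book themselves, that execution setting is a single stanza in the Quarto project file. A sketch follows; the file name `_quarto.yml` is the Quarto default, assumed here rather than quoted from the source:

```yaml
execute:
  eval: false   # render every code cell as a static listing; never run it
```

If you want one chapter to run live, Quarto lets a document's own front matter override the project default, at the cost of that chapter's runtime.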
Vintage note: this is a 2024 snapshot
Be aware of when these chapters were written. The architectures, loss choices, metrics, and failure modes called out here are all 2024-era best practice. Some of that is already aging:
- The NLP-style sections (clinical note summarization, radiology report generation, healthcare chatbots) describe pre-LLM patterns — BART / T5 fine-tuning, custom transformer stacks. Today you'd reach for a general foundation model + retrieval or light fine-tuning first, and only train from scratch when you have a compelling reason.
- The drug discovery and protein folding projects lean on AlphaFold-inspired architectures trained in-house. In practice, the modern move is to build on top of foundation models (ESM, AlphaFold2/3, DiffDock) rather than reproduce them.
- Vision Transformers for medical imaging and robotic manipulation are still correct building blocks, but the surrounding stack (distillation, LoRA, quantization, multimodal adapters) has moved forward.
What still holds up, and why I kept the book:
- Architectures as patterns. ViT-for-classification, U-Net-for-segmentation, GAT-for-pathway-graphs, PPO-for-control — the mapping from problem class to architecture family is stable, even when the specific checkpoint you'd load has moved on.
- Math-to-code mapping. Each project lands on a concrete loss function, a concrete metric, and a failure mode you should watch for. That part doesn't age with the framework version.
- Shape of the problem. Most of the chapters get the problem formulation right — what the input is, what the output is, why a specific signal is hard to learn from, what the ablation should be. That's the hardest-won knowledge in this book, and it's domain-stable.
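As one illustration of that math-to-code mapping (a sketch, not a listing from any project), here is the soft Dice loss that a binary segmentation project typically lands on, in plain NumPy; the array shapes and epsilon value are assumptions for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss for binary segmentation.

    logits, target: arrays of shape (N, H, W); target entries are 0 or 1.
    The eps guard is the failure mode to watch: an empty ground-truth mask
    would otherwise make the ratio 0/0.
    """
    prob = sigmoid(logits)
    inter = (prob * target).sum(axis=(1, 2))
    denom = prob.sum(axis=(1, 2)) + target.sum(axis=(1, 2))
    dice = (2.0 * inter + eps) / (denom + eps)
    return float(1.0 - dice.mean())
```

A confident, correct prediction drives the loss toward 0; an uninformative one (all logits zero) sits at 0.5, which makes the metric easy to sanity-check during training.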
If you're looking for current-frontier guidance (2025–2026 foundation models, agentic systems, large-scale training infra), this is not that book. Read it as a structured walk through the classical ML stack applied to real domains, circa 2024.
Who this is for
- Engineers and data scientists who can read Python and want stronger intuition for model choices.
- Researchers prototyping across domains they don't already specialize in, who need a structured starting point rather than a blank file.
- Readers of Mathematical Awakening who want a concrete place to see the math land.
If you don't already know the standard Python ML stack (NumPy / Pandas / PyTorch or TensorFlow / scikit-learn), you'll want to spend some time there first; the projects don't re-teach it.
How to read
- Skim. Read this preface, then open one chapter that matches a domain you care about. Each project section has a Problem Statement, a Mathematical Foundation, an Implementation walkthrough, and a short Advanced Extensions / Implementation Checklist at the end. Skimming those four subsections per project, without reading the full code listings, will give you the shape of each chapter.
- Study. Pick one chapter and take one or two projects end-to-end on real data. Substitute your own dataset, change one architectural choice, and watch how the metrics move. That's where the value is; reading code alone won't give it to you.
- Reference. Search by problem class (classification, retrieval, segmentation, sequence modeling, forecasting, control). The implementation patterns repeat across domains — a U-Net for medical segmentation is very close structurally to one for satellite imagery, and the book's job is to make those shared patterns visible.
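To make the "same pattern, different domain" point concrete: in practice the knobs that differ between a medical U-Net and a satellite one are mostly the channel and class counts. A hedged sketch, with illustrative config fields that are not the book's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UNetConfig:
    in_channels: int       # input modality: 1 for grayscale CT, 3 for RGB tiles
    num_classes: int       # segmentation targets
    base_width: int = 64   # filters in the first encoder stage

# Structurally the same model family; only the domain-facing knobs change.
ct_organs = UNetConfig(in_channels=1, num_classes=2)    # organ vs background
land_cover = UNetConfig(in_channels=3, num_classes=6)   # example class count
```

Everything downstream of the config (the encoder-decoder skeleton, skip connections, the loss) can be written once and reused across both domains.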
How this pairs with Mathematical Awakening
Mathematical Awakening walks through calculus, linear algebra, probability, and statistics. This book invokes that math where it does work — gradient descent in project 1, Bayesian inference in project 8, spectral decomposition in the pathway graphs of project 16, KL divergence in the RL projects of chapter 3. Where you need a refresher, the chapter references in that book are your primary path; I don't re-derive the math here.
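For a taste of how that math lands in code, plain gradient descent (the workhorse of project 1 mentioned above) fits in a few NumPy lines; the quadratic objective below is purely illustrative, not taken from the book:

```python
import numpy as np

def gradient_descent(grad, w0, lr=0.1, steps=200):
    """Iterate w <- w - lr * grad(w) and return the final iterate."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Illustrative objective f(w) = ||w - c||^2, whose gradient is 2 * (w - c);
# the minimizer is c itself, so the iterates should converge there.
c = np.array([3.0, -1.0])
w_star = gradient_descent(lambda w: 2.0 * (w - c), w0=[0.0, 0.0])
```

With this step size each iterate contracts the distance to the minimizer by a factor of 0.8, which is the kind of convergence behavior the calculus chapters of Mathematical Awakening explain.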