pass@k synthetic benchmark
chapter · Applied ML 2026 · Ch. 1: pass@k
Generates a 120-problem synthetic grade-school math benchmark across three difficulty tiers, runs four solvers (regex baseline, parser, iid noisy oracle, sticky noisy oracle) with 20 samples each, and computes pass@1 / pass@5 / pass@10 with bootstrap 95% CIs. Demonstrates how the classical 1−(1−p)^k formula, which assumes independent samples, breaks down under correlated sampling.
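A minimal sketch of the idea, not the harness's actual code: it uses the widely used unbiased pass@k estimator (c correct out of n samples per problem), a percentile bootstrap over problems, and two toy solvers. The function names, the success probability `P = 0.3`, and in particular the "sticky" model (one draw per problem decides all of its samples) are assumptions made for illustration; the real sticky oracle may correlate samples differently.

```python
import random
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: c correct out of n samples."""
    if n - c < k:  # fewer than k failures, so every k-subset contains a success
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def iid_counts(rng, problems, n, p):
    """Correct-sample counts when each sample succeeds independently with prob. p."""
    return [sum(rng.random() < p for _ in range(n)) for _ in range(problems)]


def sticky_counts(rng, problems, n, p):
    """Assumed extreme 'sticky' model: one draw per problem decides all n samples."""
    return [n if rng.random() < p else 0 for _ in range(problems)]


def mean_pass_at_k(counts, n, k):
    """Benchmark-level pass@k: average of the per-problem estimates."""
    return sum(pass_at_k(n, c, k) for c in counts) / len(counts)


def bootstrap_ci(counts, n, k, rng, reps=500, alpha=0.05):
    """Percentile bootstrap CI, resampling problems (not individual samples)."""
    stats = sorted(
        mean_pass_at_k(rng.choices(counts, k=len(counts)), n, k)
        for _ in range(reps)
    )
    lo = stats[int(alpha / 2 * reps)]
    hi = stats[int((1 - alpha / 2) * reps) - 1]
    return lo, hi


rng = random.Random(0)  # fixed seed, so the run is reproducible
N_PROBLEMS, N_SAMPLES, P = 120, 20, 0.3  # P is an illustrative value
iid = iid_counts(rng, N_PROBLEMS, N_SAMPLES, P)
sticky = sticky_counts(rng, N_PROBLEMS, N_SAMPLES, P)

for k in (1, 5, 10):
    predicted = 1 - (1 - P) ** k  # what the iid formula says pass@k should be
    print(
        f"k={k:2d}  formula={predicted:.3f}  "
        f"iid={mean_pass_at_k(iid, N_SAMPLES, k):.3f}  "
        f"sticky={mean_pass_at_k(sticky, N_SAMPLES, k):.3f}"
    )
```

Under this assumed sticky model every problem has either 0 or 20 correct samples, so pass@k stays at the pass@1 level for every k, while the iid solver tracks the formula. Calling `bootstrap_ci(sticky, N_SAMPLES, 10, rng)` gives an interval for the sticky solver's pass@10.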
Outputs & notes
- experiments/results/headline.json: overall pass@1/5/10 across solvers
- experiments/results/metrics.json: per-solver, per-tier, per-k with 95% CIs
- Applied ML 2026 Ch. 1 prose consumes the same JSON, so the PDF and web reader cannot drift apart
- Standard library only: no third-party deps, no API keys. Deterministic from `--seed`.
- A real LLM can be dropped in as a one-line adapter to replace `noisy_oracle`; the harness, grading, and CIs all work unchanged.
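One possible shape for such an adapter, as a hedged sketch: the solver interface shown here (problem text in, answer string out) is an assumption, since the actual signature of `noisy_oracle` is not given above. `make_llm_solver` and `complete` are hypothetical names; `complete` stands for whatever function you supply to call your model.

```python
import re
from typing import Callable

# Hypothetical solver interface: problem text in, answer string out.
Solver = Callable[[str], str]


def make_llm_solver(complete: Callable[[str], str]) -> Solver:
    """Wrap a user-supplied text-completion function as a solver."""

    def solve(problem: str) -> str:
        reply = complete(f"Solve and give only the final number.\n{problem}")
        # Take the last number in the reply as the answer; empty string if none.
        numbers = re.findall(r"-?\d+(?:\.\d+)?", reply)
        return numbers[-1] if numbers else ""

    return solve


# Usage with a stand-in for a real model call:
fake_complete = lambda prompt: "3 + 4 = 7, so the answer is 7."
solver = make_llm_solver(fake_complete)
print(solver("What is 3 + 4?"))
```

Keeping the model call behind a plain callable means the grading and CI code never needs to know which model, client library, or credentials are involved.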