LangChain's harness-engineering recipe treats evals as the training data for agents. But what do you do when you have no evals, can't touch production customer data, and need to probe very specific corners of agent behaviour? You build a harness whose output is the dataset itself. This post walks through a synthetic-data-generation harness that runs Claude Code in a loop over Phoenix traces, generating edge-case companies designed to break the agent under test, and grounds the design in what Meta-Harness and the LangChain canon actually say (and don't say) about how such loops should be built.
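Before digging in, here is a minimal sketch of the loop's shape, under stated assumptions: `fetch_traces` is a hypothetical stand-in for exporting spans from a Phoenix instance, and `generate_edge_case` is a hypothetical stand-in for the call out to Claude Code. Neither is a real Phoenix or Anthropic API; the point is only that failing traces seed new synthetic examples, so the harness's output is the dataset.

```python
# Hypothetical sketch of the generation loop; all names are placeholders,
# not real Phoenix or Claude Code APIs.
import json
from dataclasses import dataclass


@dataclass
class Trace:
    trace_id: str
    company: str
    outcome: str  # "pass" or "fail" from the agent under test


def fetch_traces() -> list[Trace]:
    # Stand-in for pulling recent traces from Phoenix.
    return [
        Trace("t1", "Acme GmbH", "fail"),
        Trace("t2", "Globex LLC", "pass"),
    ]


def generate_edge_case(trace: Trace) -> dict:
    # Stand-in for prompting Claude Code with a failing trace and
    # asking for a harder synthetic company in the same failure mode.
    return {
        "seed_trace": trace.trace_id,
        "company": f"{trace.company} (edge-case variant)",
    }


def run_harness() -> list[dict]:
    # The loop: only failing traces seed new edge cases, so each pass
    # grows a dataset targeted at the agent's weak spots.
    return [
        generate_edge_case(trace)
        for trace in fetch_traces()
        if trace.outcome == "fail"
    ]


if __name__ == "__main__":
    print(json.dumps(run_harness(), indent=2))
```

In a real harness the stubs would be replaced by a Phoenix trace export and a scripted Claude Code invocation, but the control flow stays this simple: traces in, synthetic edge cases out.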