
Anthropic's Clio is a privacy-preserving pipeline: it extracts facets from each conversation with Haiku, embeds them with sentence-transformers, clusters the embeddings bottom-up with k-means into a three-level hierarchy (~10/100/1000 clusters), labels each cluster with Sonnet, and enforces minimum unique-account thresholds at every step. The whole 100K-conversation run costs $48.81 and recovers a known taxonomy at 94% accuracy, versus 5% for random guessing. The architecture lifts almost unchanged to agent traces, which is exactly what Distributional has been doing: traces become the unit of analysis, facets become tool-call sequences and failure fingerprints, and clusters surface the lazy-tool-call hallucinations and resource-conservation regressions that pre-defined evals never thought to look for. This post walks through Clio's pipeline stage by stage, maps each stage onto the agent-trace setting, and pins down what the 'analytics' layer above telemetry and monitoring actually buys you.
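To make the clustering and thresholding stages concrete, here is a minimal sketch in plain Python. It is not Clio's or Distributional's code. It assumes the facet embeddings are already computed, leaves out the LLM extraction and labeling steps, and uses a toy k-means. The names `kmeans`, `build_hierarchy`, `level_sizes`, and `min_accounts` are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Toy Lloyd's k-means; returns (assignments, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def build_hierarchy(embeddings, account_ids, level_sizes, min_accounts):
    """Cluster bottom-up, dropping clusters backed by too few accounts.

    level_sizes lists cluster counts from the leaf level upward,
    e.g. (1000, 100, 10). Each returned level is a list of dicts with
    the centroid, the number of unique accounts, and the indices of
    the children in the level below (raw traces for the leaf level).
    """
    points = list(embeddings)
    accounts = [{a} for a in account_ids]  # account set behind each point
    levels = []
    for k in level_sizes:
        assign, centroids = kmeans(points, min(k, len(points)))
        per_cluster = defaultdict(set)
        for c, accts in zip(assign, accounts):
            per_cluster[c] |= accts
        # Privacy gate (assumed form): keep a cluster only if enough
        # distinct accounts contribute to it.
        kept = sorted(c for c in per_cluster if len(per_cluster[c]) >= min_accounts)
        levels.append([
            {
                "centroid": centroids[c],
                "accounts": len(per_cluster[c]),
                "children": [i for i, a in enumerate(assign) if a == c],
            }
            for c in kept
        ])
        # The surviving centroids become the points for the next level up.
        points = [centroids[c] for c in kept]
        accounts = [per_cluster[c] for c in kept]
        if not points:
            break
    return levels
```

For agent traces, `embeddings` would hold one vector per trace facet and `account_ids` whichever identifier the threshold should count (user, tenant, or deployment). A call like `build_hierarchy(vectors, owners, (1000, 100, 10), min_accounts)` mirrors the three-level shape described above. The threshold value itself is a policy choice that this sketch leaves to the caller.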

Hierarchical Clustering of Agent Traces for Discovering Unknown Failure Modes