A synthesis of recent work on LLM forecasting agents, focusing on Bridgewater’s AIA Forecaster and why blending AI with market prices can beat either alone.
MLE-Bench is a benchmark that evaluates AI agents on real-world ML engineering tasks drawn from Kaggle competitions. This post highlights key findings, including resource scaling effects, debugging challenges, and the performance of different agent frameworks.