Recursive Self-Improvement for Trading: How LLMs Can Teach Themselves to Invest

The Problem With Static AI Trading Systems

Every quantitative trading system built in the last decade shares the same fundamental limitation: it stops learning the moment you deploy it.

You train a model on historical data. You optimize its parameters. You backtest it against past regimes. Then you push it live and hope the market doesn't change too much before you retrain it next quarter. When it does change -- and it always does -- you pull the system offline, collect new data, retrain, re-validate, and redeploy. The cycle takes weeks or months. The market moved on days ago.

This is the traditional ML approach to trading: a human-in-the-loop retraining cycle where the model itself has no mechanism to recognize its own failures, reflect on what went wrong, or autonomously improve its reasoning.

Recursive Self-Improvement (RSI) changes this entirely. Instead of a static model waiting for humans to fix it, RSI creates a system that iteratively enhances its own capabilities -- refining its strategies, updating its knowledge, and improving its reasoning through automated feedback loops. The system trades, observes outcomes, critiques its own decisions, and generates better versions of itself. No human retraining required.

This is not science fiction. It is actively being built by both academic research groups and major hedge funds. The question is no longer whether self-improving trading AI is possible, but which architectures actually work.


What Recursive Self-Improvement Actually Means for LLMs

RSI in the LLM context takes several concrete forms, each operating at a different level of the system:

Prompt and scaffolding optimization. The LLM iteratively refines its own prompting strategies. Microsoft's Self-Taught Optimizer (STOP) demonstrated this: a seed improver program uses an LLM to improve itself, with the improved version generating significantly better programs. The underlying model weights never change -- all improvement happens in the scaffolding code and prompts that wrap the model.

Iterative fine-tuning with self-generated data. RISE (Recursive IntroSpEction), presented at NeurIPS 2024, fine-tunes LLMs to self-correct across multiple turns. It bootstraps rollouts, pairs failed attempts with better responses from best-of-N sampling, and trains via reward-weighted regression. The result: 17.7% improvement for LLaMA2-7B and 23.9% for Mistral-7B over five turns of introspection.

Recursive problem decomposition. LADDER enables LLMs to autonomously improve by recursively generating simpler variants of hard problems, solving them, and using reinforcement learning to bootstrap upward. A 7B parameter model achieved 73% accuracy on the 2025 MIT Integration Bee, outperforming GPT-4o at 42%.

Evolutionary code generation. Google DeepMind's AlphaEvolve uses an ensemble of Gemini models in an evolutionary loop -- mutating and combining algorithms, receiving automated evaluator feedback, and iterating. It achieved genuine recursive self-improvement by optimizing components of its own training infrastructure, producing a 32.5% speedup for FlashAttention kernels and the first improvement to matrix multiplication algorithms in 56 years.

Skill library accumulation. Voyager from NVIDIA and Caltech demonstrated lifelong learning in Minecraft by iteratively prompting GPT-4 for code, refining based on environment feedback, and storing successful programs in an expanding skill library -- obtaining 3.3x more unique items and unlocking milestones 15.3x faster than prior methods.

The common thread: none of these systems just run inference once and return a result. They run, evaluate, reflect, refine, and run again. The output of iteration N becomes the input for iteration N+1.
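That run-evaluate-refine cycle can be sketched in a few lines. Here `propose`, `evaluate`, and `critique` are placeholders -- in a real system `propose` would be an LLM call and `evaluate` a task-specific scorer -- so this is a minimal sketch of the loop shape, not any particular system's implementation:

```python
def self_improve(seed, propose, evaluate, critique, iterations=5):
    """Generic run -> evaluate -> critique -> refine loop.

    propose(candidate, feedback) -> new candidate (e.g. an LLM call)
    evaluate(candidate)          -> numeric score
    critique(candidate, score)   -> textual feedback for the next round
    """
    best, best_score = seed, evaluate(seed)
    feedback = critique(best, best_score)
    for _ in range(iterations):
        # the output of iteration N becomes the input for iteration N+1
        candidate = propose(best, feedback)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
        feedback = critique(candidate, score)
    return best, best_score
```

The loop keeps the best candidate seen so far, so a bad refinement never degrades the system's current answer.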


Mind Evolution: Poetiq's Approach to Evolving Deeper LLM Thinking

One of the most compelling RSI architectures comes from a paper that predates the company built around it. Evolving Deeper LLM Thinking, authored by Kuang-Huei Lee, Ian Fischer, Yueh-Hua Wu, Dave Marwood, Shumeet Baluja, Dale Schuurmans, and Xinyun Chen, introduces Mind Evolution -- an evolutionary search strategy that scales inference-time computation through a language-based genetic algorithm.

The core insight is elegant: LLMs are uniquely suited to serve as mutation and crossover operators in a genetic algorithm because they can meaningfully recombine and refine natural language solutions through prompting. Traditional genetic algorithms require you to formalize the search space -- encode solutions as bit strings, define crossover points, design mutation operators. Mind Evolution skips all of that. The solutions are natural language text. The LLM is the mutation engine.

How Mind Evolution Works

The algorithm fuses two cognitive modes:

Divergent thinking -- free-flowing parallel idea exploration across multiple independent populations, called islands. Each island maintains its own population of candidate solutions that evolve independently.

Convergent thinking -- idea evaluation, selection, and iterative refinement through structured self-critique.

The pipeline looks like this:

┌──────────────────────────────────────────────────────────────┐
│                        MIND EVOLUTION                        │
│                                                              │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐          │
│  │ Island 1│  │ Island 2│  │ Island 3│  │ Island 4│          │
│  │         │  │         │  │         │  │         │          │
│  │ Pop: 5  │  │ Pop: 5  │  │ Pop: 5  │  │ Pop: 5  │          │
│  └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘          │
│       │            │            │            │               │
│       ▼            ▼            ▼            ▼               │
│  ┌─────────────────────────────────────────────────┐         │
│  │            Fitness Evaluation                   │         │
│  │  • Score optimization objectives                │         │
│  │  • Verify constraint satisfaction               │         │
│  │  • Generate textual feedback                    │         │
│  └─────────────────────┬───────────────────────────┘         │
│                        │                                     │
│                        ▼                                     │
│  ┌─────────────────────────────────────────────────┐         │
│  │     Refinement through Critical Conversation    │         │
│  │                                                 │         │
│  │  Critic: "This solution fails constraint X      │         │
│  │          because of Y. The budget is exceeded   │         │
│  │          by $200 on day 3."                     │         │
│  │                                                 │         │
│  │  Author: "Revised solution swapping hotel Z     │         │
│  │          for hotel W, reallocating $200 to      │         │
│  │          day 3 dining budget."                  │         │
│  │                                                 │         │
│  │  [Repeat for multiple turns]                    │         │
│  └─────────────────────┬───────────────────────────┘         │
│                        │                                     │
│                        ▼                                     │
│  ┌─────────────────────────────────────────────────┐         │
│  │  Selection → Crossover → Next Generation        │         │
│  │  (Boltzmann tournament, 0-5 parents per child)  │         │
│  └─────────────────────────────────────────────────┘         │
│                                                              │
│  Migration: top 5 solutions clone island i → island i+1      │
│  Reset: every 3 generations, retire 2 weakest islands        │
│                                                              │
│  Repeat for 10 generations (800 candidate solutions)         │
└──────────────────────────────────────────────────────────────┘

The most critical component is Refinement through Critical Conversation (RCC). An LLM playing a critic role analyzes the current solution and identifies specific weaknesses. A separate author role then generates a refined solution addressing those weaknesses. This critic-author loop runs for multiple turns, creating a self-referential improvement cycle. The ablation study shows that removing the critic step drops performance from 95.6% to 91.1%, and removing textual feedback drops it to 76.1%.
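A minimal sketch of such a critic-author loop, assuming only a generic `llm(prompt) -> str` callable. The prompt wording here is illustrative, not taken from the paper:

```python
# Illustrative prompts for the two roles in the RCC loop.
CRITIC_PROMPT = ("You are a critic. Identify specific weaknesses and "
                 "constraint violations in this solution:\n{plan}")
AUTHOR_PROMPT = ("You are an author. Revise the solution to address the "
                 "critique.\nSolution:\n{plan}\nCritique:\n{critique}")

def refine_rcc(plan, llm, turns=3):
    """One critic-author conversation: the critic names weaknesses,
    the author revises, and the cycle repeats for several turns."""
    for _ in range(turns):
        critique = llm(CRITIC_PROMPT.format(plan=plan))
        plan = llm(AUTHOR_PROMPT.format(plan=plan, critique=critique))
    return plan
```

Because both roles are just prompts over the same model, the improvement lives entirely in the scaffolding -- no weight updates are involved.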

The Results

On the TravelPlanner benchmark -- complex multi-day travel planning with budget, dining, accommodation, and transport constraints:

Method                                       Success Rate   Cost per Problem
Single-pass baseline                          5.6%          minimal
Best-of-N (800 candidates)                   55.6%          ~$0.29
Sequential revision (10 threads, 80 turns)   82.8%          $2.75
Mind Evolution (Flash)                       95.6%          $0.29
Mind Evolution two-stage (Flash + Pro)      100%            ~$1.00

The two-stage approach achieves perfect results at a fraction of the cost of sequential revision. This is not incremental improvement -- it is a qualitative leap in capability.

Poetiq: The Company Built on This Research

Two of the paper's authors -- Ian Fischer and Shumeet Baluja -- went on to co-found Poetiq in June 2025. Their thesis: LLMs are powerful knowledge stores but poor reasoners on their own. The Mind Evolution architecture is the fix.

Poetiq is not a trading company. They build a model-agnostic meta-system -- a layer that sits on top of any existing LLM (GPT, Claude, Gemini, Llama) and enhances it through recursive self-improvement. A client provides a problem definition and a few hundred examples. Poetiq's system generates a specialized agent that recursively improves itself. No fine-tuning. No millions of training samples. Just iterative evolutionary search converging in fewer than two iterations per task.

The team comes from Google DeepMind -- 72 years of combined experience. Baluja spent 21 years at Google, contributed to 170+ patents, and co-originated YouTube's Content ID system. Fischer joined Google through the Y Combinator-backed acquisition of Apportable.

Their public validation: on the ARC-AGI-2 benchmark (considered the gold standard for AI reasoning), a six-person Poetiq team achieved 54% accuracy at $30.57 per problem, beating Google's Gemini 3 Deep Think (45% at $77.16 per problem). After incorporating GPT-5.2, they reached 75% -- surpassing the human baseline of approximately 60%.

In January 2026, they raised a $45.8M seed round led by FYRFLY Venture Partners and Surface Ventures, with participation from Y Combinator, 468 Capital, and others.


Applying Mind Evolution to Trading

The Mind Evolution paper does not discuss trading directly. But the architecture maps to financial problems with almost uncomfortable precision. Here is why.

Portfolio Optimization as Constraint Satisfaction

The TravelPlanner benchmark -- the one where Mind Evolution scored 100% -- is a multi-constraint optimization problem. You must plan a trip that satisfies budget limits, dining constraints, accommodation requirements, transportation schedules, and temporal dependencies. All expressed in natural language.

Portfolio construction is the same type of problem. You have capital allocation constraints, sector exposure limits, risk budgets, liquidity requirements, regulatory restrictions, correlation targets, and rebalancing rules. These constraints are often informal, regime-dependent, and expressed in investment committee memos rather than mathematical equations.

A Mind Evolution approach to portfolio optimization would:

  1. Initialize -- generate a population of candidate portfolios across multiple islands, each with different strategic tilts
  2. Evaluate -- run backtests, compute Sharpe ratios, check constraint violations, measure drawdowns
  3. Critique -- the LLM critic identifies specific weaknesses: excessive sector concentration, tail risk exposure, liquidity mismatches
  4. Refine -- the LLM author generates improved portfolios addressing the critique
  5. Evolve -- selection, crossover between successful portfolios, migration of top performers across islands
  6. Repeat -- for multiple generations until convergence

The key advantage over traditional portfolio optimizers: you do not need to formalize every constraint mathematically. The LLM can reason about qualitative constraints like "avoid companies with pending regulatory investigations" or "increase defensive exposure ahead of election uncertainty" alongside quantitative ones like "maximum 5% single-name concentration."
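The Evaluate step (step 2) can be made concrete with a toy fitness function. The penalty weights and the `max_single_name` threshold below are illustrative assumptions; qualitative constraints would be judged by the LLM critic rather than scored here:

```python
import statistics

def portfolio_fitness(weights, daily_returns, max_single_name=0.05):
    """Toy fitness: Sharpe-like score minus penalties for violated
    quantitative constraints.

    weights:       {ticker: weight, summing to ~1}
    daily_returns: {ticker: list of daily returns}
    """
    days = len(next(iter(daily_returns.values())))
    # portfolio daily return series
    port = [sum(w * daily_returns[t][d] for t, w in weights.items())
            for d in range(days)]
    mean, sd = statistics.mean(port), statistics.pstdev(port)
    sharpe = mean / sd if sd else 0.0
    # hard quantitative constraints become penalties
    penalty = sum(max(0.0, w - max_single_name) for w in weights.values())
    penalty += abs(sum(weights.values()) - 1.0)  # weights must sum to 1
    return sharpe - 10.0 * penalty
```

In a full system this score would be one input to the critic, alongside the textual constraint-violation feedback.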

Trading Strategy Evolution

Instead of evolving portfolios, evolve entire trading strategies. Each candidate in the population is a natural language description of a strategy -- entry rules, exit rules, position sizing, risk management. The fitness function is backtesting performance.

Island 1: Momentum strategies
  - Candidate A: "Buy stocks with 12-month momentum above 90th
    percentile, sell after 20% drawdown from peak"
  - Candidate B: "Buy sector ETFs showing 3-month relative
    strength, rotate monthly, stop loss at -8%"

Island 2: Mean reversion strategies
  - Candidate A: "Buy stocks trading below 200-day MA that show
    RSI < 30 with increasing volume, target 5% bounce"
  - Candidate B: "Pairs trading on cointegrated sector pairs,
    enter at 2σ divergence, exit at mean"

Island 3: Event-driven strategies
  - Candidate A: "Buy 3 days before earnings if sentiment is
    positive and implied vol is below 30-day average"
  - Candidate B: "Short stocks with insider selling > $1M in
    past 30 days and declining revenue estimates"

Island 4: Hybrid strategies
  - [Crossover candidates from other islands]

Each generation: backtest all candidates, critique the losers, refine, select, crossover, migrate top performers between islands, reset underperforming islands. After 10 generations, the surviving strategies have been stress-tested through evolutionary pressure against historical data.

The island model is particularly valuable for trading. It prevents premature convergence on a single strategy type -- critical in markets where regime changes can invalidate previously optimal approaches overnight.
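The generation loop above reduces to a short sketch. Here `fitness` and `mutate` stand in for the backtester and the LLM author, and integer candidates stand in for strategy text -- an illustration of the island mechanics, not a trading system:

```python
import random

def evolve_islands(islands, fitness, mutate, generations=10, migrants=1, seed=0):
    """Island-model evolution: independent populations with periodic
    migration of top performers to the next island.

    islands: list of populations (lists of candidates)
    fitness(candidate) -> float;  mutate(candidate) -> new candidate
    """
    rng = random.Random(seed)
    for _ in range(generations):
        for pop in islands:
            # refine: mutate a random parent, keep population size
            # constant by dropping the weakest candidate
            pop.append(mutate(rng.choice(pop)))
            pop.sort(key=fitness, reverse=True)
            del pop[-1]
        # migration: clone each island's best into the next island
        for i, pop in enumerate(islands):
            nxt = islands[(i + 1) % len(islands)]
            nxt.extend(pop[:migrants])
            nxt.sort(key=fitness, reverse=True)
            del nxt[len(nxt) - migrants:]
    return max((c for pop in islands for c in pop), key=fitness)
```

Migration is what keeps the islands from becoming fully isolated, while the independent populations preserve diversity between migrations.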

Learning From News, Research Reports, and Price Data

A self-improving trading LLM needs three input streams that traditional quant models handle poorly:

Unstructured news. Earnings call transcripts, Fed meeting minutes, geopolitical analysis, social media sentiment. LLMs can process all of these natively. The self-improvement loop: the agent makes predictions based on news interpretation, observes market reactions, and refines its understanding of which signals actually matter and which are noise.

Research reports. Analyst recommendations, industry reports, macroeconomic forecasts. A RAG-based architecture retrieves relevant passages and grounds decisions in evidence. The self-improvement mechanism: the agent tracks which research sources were predictive and which were not, adjusting source weighting over time.
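That source-weighting mechanism might look like the following sketch; the multiplicative update rule and the `lr` parameter are assumptions for illustration, not drawn from any cited system:

```python
def update_source_weights(weights, outcomes, lr=0.1):
    """Shift weight toward research sources whose calls matched
    realized outcomes.

    weights:  {source: current weight}
    outcomes: {source: +1 for a correct call, -1 for a wrong one}
    Returns normalized weights (summing to 1).
    """
    for src, hit in outcomes.items():
        # multiplicative update: predictive sources grow, others shrink
        weights[src] = max(0.0, weights.get(src, 1.0) * (1 + lr * hit))
    total = sum(weights.values()) or 1.0
    return {s: w / total for s, w in weights.items()}
```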

Price and technical data. Candlestick patterns, moving averages, volume profiles, order flow. The agent generates technical analysis, trades on it, and uses P&L feedback to calibrate which patterns work in which regimes.

The power of RSI is that these three streams can be integrated and improved simultaneously. A system using Mind Evolution could evolve strategies that synthesize all three -- "buy when the technical setup shows bullish divergence AND recent earnings call sentiment was positive AND the macro environment is accommodative" -- and iteratively refine the specific thresholds and conditions through evolutionary search.


The Broader RSI Trading Landscape

Poetiq's Mind Evolution is one approach among several. The research landscape for self-improving trading AI is expanding rapidly.

Memory-Based Self-Evolution: FinMem

FinMem uses a cognitive-architecture-inspired memory system with distinct layers: working memory for short-term processing, and stratified long-term memory ranked by novelty, relevance, and importance. The agent continuously refines its trading decisions by retaining critical market knowledge beyond human perceptual limits. Each trade decision updates the memory, and future decisions draw on accumulated experience.

LLM as RL Policy: FLAG-Trader

FLAG-Trader, published at ACL 2025, takes a direct approach: use the LLM itself as the policy network in a reinforcement learning loop. Trading P&L serves as the reward signal. Through Proximal Policy Optimization (PPO), the LLM's financial reasoning improves via gradient updates. This is weight-level self-improvement, not just scaffolding-level.

Self-Reflective Verbal Reasoning: SEP

The SEP (Summarize-Explain-Predict) framework uses a verbal self-reflective agent that teaches itself to generate correct stock predictions. Past mistakes become training samples without human annotation. PPO trains the model to generate optimal explanations at test time. The self-improvement loop is entirely autonomous.

Multi-Agent Debate: FinCon and TradingAgents

FinCon (NeurIPS 2024) deploys a manager-analyst hierarchy with a self-critiquing mechanism that updates systematic investment beliefs after each decision cycle. TradingAgents from Tauric Research structures a full trading-firm architecture with bull/bear debate, risk management, and a fund manager for final approval. Both use adversarial interaction as the improvement mechanism.

Curriculum RL: Trading-R1

Trading-R1 uses supervised fine-tuning followed by three-stage easy-to-hard reinforcement learning on 100K+ financial reasoning samples. It employs reverse reasoning distillation -- reconstructing reasoning traces from high-performing but opaque API models to create training data for smaller, more efficient models.

What the Hedge Funds Are Actually Doing

The academic work is public. What the hedge funds are doing is mostly private, but some details have leaked:

  • Bridgewater Associates (AIA Labs): Building systems to replicate Ray Dalio's macro process end-to-end. Already trading client money, described internally as functioning like "millions of 80th-percentile associates working in parallel."
  • D.E. Shaw: Deploying an Assistants-LLM Gateway-DocLab stack allowing any desk to create custom LLM tools with minimal code.
  • Balyasny Asset Management: Running internal LLM agents for autonomous filing synthesis, catalyst monitoring, and risk anticipation.

These are not experiments. They are production systems managing real capital. The common pattern: multi-agent architectures with self-improving feedback loops.


Why Mind Evolution Is Particularly Well-Suited for Finance

Several properties of the Mind Evolution architecture make it a natural fit for financial applications:

No formal specification required. Traditional optimizers need you to express every constraint as a mathematical equation. Financial constraints are often qualitative, regime-dependent, and expressed in natural language. Mind Evolution operates on natural language natively.

Diversity preservation through the island model. Financial markets exhibit regime changes. A system that converges on a single optimal strategy is brittle. The island model maintains multiple independent strategy populations, ensuring the system always has diverse approaches available when regimes shift.

Cost efficiency. Mind Evolution with Gemini Flash costs $0.29 per problem on TravelPlanner. This makes large-scale strategy exploration feasible -- you could evaluate thousands of strategy variants for under $300.

Global evaluation, not step-level. Unlike tree search methods that require step-level process rewards (which are extremely hard to define for trading), Mind Evolution only needs a complete-solution evaluator. Backtesting provides this naturally -- you evaluate the entire strategy on the entire historical period, not individual decisions.
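A whole-solution evaluator in this spirit scores a complete strategy over the full price history, with no per-step reward anywhere. The `signal_fn` interface below is an illustrative assumption:

```python
def backtest_fitness(signal_fn, prices):
    """Global evaluator: score an entire strategy on an entire price
    history, rather than rewarding individual decisions.

    signal_fn(prices_so_far) -> position in [-1, 1]
    Returns (total_return, max_drawdown).
    """
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for i in range(1, len(prices)):
        pos = signal_fn(prices[:i])          # decide from history only
        ret = prices[i] / prices[i - 1] - 1.0
        equity *= 1.0 + pos * ret
        peak = max(peak, equity)
        max_dd = max(max_dd, (peak - equity) / peak)
    return equity - 1.0, max_dd
```

Note that the signal is computed from `prices[:i]` only -- the evaluator never leaks future prices into the decision, which is the one property a backtesting fitness function must get right.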

The critic-author loop maps to investment committees. The RCC mechanism -- where a critic identifies weaknesses and an author addresses them -- mirrors how investment committees actually work. A portfolio manager proposes a trade. The risk manager critiques it. The PM revises. This is not an analogy; it is the same cognitive process, automated.


The Risks: Where Self-Improving Trading AI Breaks Down

RSI for trading is not a free lunch. Several failure modes are well-documented:

Hallucination and Phantom Portfolios

TradeTrap documents a failure mode called epistemic hallucination: trading agents erroneously believe they retain positions they have already liquidated. The agent makes decisions based on a phantom portfolio that does not exist, leading to strategic paralysis. In a self-improving loop, this hallucination can compound -- the system "learns" from imagined outcomes.

Model Collapse and Entropy Decay

Research on the limits of self-improvement identifies two failure modes when systems train on their own outputs:

  • Entropy Decay -- finite sampling causes monotonic loss of distributional diversity. The system's strategy space shrinks with each iteration.
  • Variance Amplification -- without persistent grounding in real data, distributional drift causes the system to converge on a distorted and impoverished version of market reality.

Without fresh authentic data injection, self-improving loops collapse into progressively narrower strategy spaces.

Overfitting to Recent Regimes

A self-improving system that optimizes itself on bull-market data will confidently increase leverage and concentration right before a crash. The fitness function (backtesting returns) rewards the exact behavior that will destroy the portfolio when the regime changes. This is the classic overfitting problem amplified by automation -- the system does not just overfit, it recursively reinforces its overfitting.

Latency

LLM inference takes seconds to minutes. This rules out high-frequency trading entirely. RSI-based systems are best suited for medium to low-frequency strategies with daily, weekly, or longer holding periods.

Evaluation Noise

Financial markets are stochastic. A strategy that loses money might be correct (and unlucky), while one that makes money might be wrong (and lucky). Unlike the TravelPlanner benchmark where constraint satisfaction is deterministic, trading fitness evaluation is noisy. This makes it hard to attribute performance to the self-improvement loop versus randomness, and introduces the risk that the system "improves" by selecting for luck rather than skill.
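One hedged way to quantify that risk is a resampling check: estimate how often a strategy with no true edge would have looked at least this good. This is a rough sketch, not a formal hypothesis test:

```python
import random
import statistics

def luck_probability(daily_returns, n_boot=2000, seed=0):
    """Estimate how often a zero-edge resampling of the strategy's own
    returns matches or beats the observed mean return -- a crude
    luck-vs-skill check for noisy fitness evaluations.
    """
    rng = random.Random(seed)
    observed = statistics.mean(daily_returns)
    centered = [r - observed for r in daily_returns]  # impose zero true edge
    hits = 0
    for _ in range(n_boot):
        sample = [rng.choice(centered) for _ in daily_returns]
        if statistics.mean(sample) >= observed:
            hits += 1
    return hits / n_boot  # small value -> less likely to be pure luck
```

A self-improvement loop could require this probability to fall below a threshold before promoting a strategy, reducing the chance that it selects for luck rather than skill.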


A Practical Architecture: Combining Mind Evolution With Trading Agents

Here is how you would actually build a self-improving trading system using the ideas from the Mind Evolution paper and the current trading agent research:

┌────────────────────────────────────────────────────────────┐
│  STRATEGY EVOLUTION LAYER                (Mind Evolution)  │
│                                                            │
│  Island 1: Momentum   Island 2: Value   Island 3: Event    │
│  ┌──────────────┐    ┌──────────────┐   ┌──────────────┐   │
│  │ Strategy pop │    │ Strategy pop │   │ Strategy pop │   │
│  │ (5 variants) │    │ (5 variants) │   │ (5 variants) │   │
│  └──────┬───────┘    └──────┬───────┘   └──────┬───────┘   │
│         └───────────────────┼───────────────────┘          │
│                             ▼                              │
│              ┌─────────────────────────┐                   │
│              │   Fitness Evaluation    │                   │
│              │   (Backtesting Engine)  │                   │
│              │                         │                   │
│              │  • Sharpe ratio         │                   │
│              │  • Max drawdown         │                   │
│              │  • Constraint adherence │                   │
│              │  • Regime robustness    │                   │
│              └───────────┬─────────────┘                   │
│                          ▼                                 │
│              ┌─────────────────────────┐                   │
│              │  Critic-Author Refine   │                   │
│              │  (RCC Loop)             │                   │
│              └───────────┬─────────────┘                   │
│                          ▼                                 │
│        Selection → Crossover → Next Generation             │
└────────────────────────────┬───────────────────────────────┘
                             │ Winning strategies
                             ▼
┌────────────────────────────────────────────────────────────┐
│  EXECUTION LAYER            (Multi-Agent Trading Firm)     │
│                                                            │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌───────────┐   │
│  │ News     │  │Technical │  │Sentiment │  │Fundamental│   │
│  │ Analyst  │  │ Analyst  │  │ Analyst  │  │  Analyst  │   │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬──────┘   │
│       └─────────────┴──────┬──────┴─────────────┘          │
│                            ▼                               │
│              ┌──────────────────────────┐                  │
│              │    Bull / Bear Debate    │                  │
│              └────────────┬─────────────┘                  │
│                           ▼                                │
│              ┌──────────────────────────┐                  │
│              │     Risk Management      │                  │
│              │  (VaR, drawdown, limits) │                  │
│              └────────────┬─────────────┘                  │
│                           ▼                                │
│              ┌──────────────────────────┐                  │
│              │       Execution          │                  │
│              └──────────────────────────┘                  │
└────────────────────────────┬───────────────────────────────┘
                             │ P&L, trade outcomes
                             ▼
┌────────────────────────────────────────────────────────────┐
│  MEMORY & LEARNING LAYER                                   │
│                                                            │
│  ┌─────────────────┐  ┌─────────────────────────────────┐  │
│  │ Working Memory  │  │ Long-Term Stratified Memory     │  │
│  │                 │  │                                 │  │
│  │ Current market  │  │ • What worked in past regimes   │  │
│  │ context, recent │  │ • Which news sources predicted  │  │
│  │ trades, active  │  │ • Strategy-regime correlations  │  │
│  │ positions       │  │ • Failure patterns to avoid     │  │
│  └─────────────────┘  └─────────────────────────────────┘  │
│                                                            │
│  Feedback loop: outcomes update memory → memory informs    │
│  next evolution cycle → strategies incorporate learned     │
│  patterns → improved fitness → better strategies           │
└────────────────────────────────────────────────────────────┘

The three layers work together:

  1. Strategy Evolution uses Mind Evolution to generate and refine trading strategies through evolutionary search. Backtesting is the fitness function. The critic-author loop identifies weaknesses in each strategy. Multiple islands maintain diversity across strategy types.

  2. Execution deploys winning strategies through a multi-agent architecture. Specialized analyst agents process different data streams (news, technicals, sentiment, fundamentals). Bull/bear debate forces consideration of counter-arguments. Risk management gates prevent catastrophic trades.

  3. Memory and Learning captures trade outcomes and feeds them back into the evolution layer. Working memory handles current market context. Long-term memory accumulates institutional knowledge -- which strategy types work in which regimes, which data sources are predictive, which failure patterns to avoid.

The recursive loop: Strategy Evolution generates strategies → Execution runs them → Memory captures outcomes → outcomes inform the next generation of Strategy Evolution. Each cycle, the system gets better. Not because someone retrained it, but because it learned from its own experience.


The Self-Improvement Frontier: What Comes Next

The ICLR 2026 Workshop on Recursive Self-Improvement -- the first major academic workshop dedicated exclusively to RSI -- is organized around six questions: what changes, when changes happen, how changes are produced, where systems operate, how to ensure alignment, and how to evaluate progress. These same questions define the research frontier for RSI in trading:

What changes? Current systems improve prompts, memory, and strategy selection. The next frontier is weight-level self-improvement -- systems like FLAG-Trader that update the model itself through RL. This closes the loop entirely: the model that generates strategies is the same model that learns from their outcomes.

When do changes happen? Current systems improve between trading sessions. Real-time adaptation -- where the system refines its approach during the trading day based on intraday feedback -- requires faster inference and lower-latency evaluation.

How are changes produced? Evolutionary search (Mind Evolution), reinforcement learning (FLAG-Trader), self-critique (SEP), and memory accumulation (FinMem) are the four main mechanisms. Combining them -- using evolutionary search to generate strategy populations, RL to optimize execution within each strategy, self-critique to identify regime changes, and memory to accumulate institutional knowledge -- is the logical next step.

How do you ensure safety? A self-improving trading system without guardrails is a recipe for catastrophic losses. Risk management gates, position limits, drawdown circuit breakers, and human oversight for strategy-level changes are non-negotiable. The system can improve how it trades within a strategy, but changing the strategy itself should require human approval.
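Such gates belong in plain code outside the self-improvement loop, so no amount of "improvement" can bypass them. A minimal sketch with illustrative thresholds:

```python
def risk_gate(proposed_weight, current_drawdown, daily_pnl,
              max_weight=0.05, dd_breaker=0.10, loss_limit=-0.02):
    """Hard guardrails around a self-improving strategy: a drawdown
    circuit breaker, a daily loss limit, and a position-size clamp.
    All thresholds here are illustrative assumptions.
    """
    if current_drawdown >= dd_breaker:
        return 0.0   # circuit breaker tripped: flatten the position
    if daily_pnl <= loss_limit:
        return 0.0   # daily loss limit hit: stop trading for the day
    # otherwise clamp the proposed position to the hard size limit
    return max(-max_weight, min(max_weight, proposed_weight))
```

Because the gate wraps every order regardless of which strategy generation produced it, the evolution layer can explore freely while the worst-case exposure stays bounded.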


Conclusion

Recursive Self-Improvement is not about building a trading model that works once. It is about building a trading system that gets better every time it runs.

The pieces are already in place. Mind Evolution provides the evolutionary search architecture. Multi-agent frameworks provide the execution layer. Memory systems provide the learning mechanism. Backtesting provides the fitness function. RL provides the gradient signal for weight-level improvement.

The companies and research groups building this -- Poetiq for the underlying RSI architecture, Tauric Research for multi-agent trading frameworks, Bridgewater and D.E. Shaw for production deployment, the AI4Finance Foundation for open-source infrastructure -- are converging on the same design pattern: systems that trade, observe, reflect, and improve in an autonomous loop.

The trading systems of the next decade will not be trained and deployed. They will be evolved.


Keywords: recursive self-improvement trading, LLM trading agents, mind evolution finance, self-improving AI trading, evolutionary AI trading strategies, poetiq ai, recursive self-improvement LLM, AI hedge fund, quantitative trading AI agents, self-improving trading system