Hubble Uses LLMs to Automate Alpha Factor Discovery

Hubble presents an LLM-driven, agentic framework that automates alpha factor discovery while enforcing deterministic safety constraints. The system combines a domain-specific operator language with an AST-based execution sandbox and an evolutionary feedback loop that returns performance diagnostics to the LLM for iterative refinement. In experiments on a panel of 30 U.S. equities over 752 trading days, Hubble evaluated 181 syntactically valid factors from 122 unique candidates across 3 rounds and achieved a peak composite score of 0.827 with 100% computational stability. The framework prioritizes interpretability and reproducibility over opaque optimization, using cross-sectional RankIC, annualized Information Ratio, and portfolio turnover as its primary evaluation signals. For quant teams, Hubble demonstrates a practical pathway to integrate LLM-based search heuristics into controlled, auditable factor mining workflows.
What happened
Hubble introduces an agentic, LLM-driven framework for automated alpha factor discovery that enforces deterministic safety and interpretability constraints. The paper reports experiments on a panel of 30 U.S. equities across 752 trading days, evaluating 181 syntactically valid factors from 122 unique candidates over 3 rounds and reaching a peak composite score of 0.827 with 100% computational stability.
Technical details
Hubble constrains LLM generation using a domain-specific operator language and executes candidate formulas inside an AST-based sandbox to prevent unsafe operations and ensure deterministic evaluation. The system measures candidate factors with a statistical pipeline centered on:
- RankIC (cross-sectional Rank Information Coefficient)
- annualized Information Ratio
- portfolio turnover
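As a rough illustration of the three metrics listed above, the sketch below computes them with the Python standard library only. It is not the paper's pipeline: the function names, the use of the RankIC series to form the Information Ratio, the 252-day annualization factor, and the one-way turnover definition are all assumptions made for this example.

```python
import math
from statistics import mean, stdev


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_ic(factor_row, fwd_return_row):
    """Cross-sectional RankIC for one date: Spearman correlation between
    factor values and next-period returns across the asset universe."""
    return pearson(ranks(factor_row), ranks(fwd_return_row))


def annualized_ir(daily_ics, periods_per_year=252):
    """Assumed definition: mean(IC) / std(IC) * sqrt(periods per year)."""
    if len(daily_ics) < 2:
        return 0.0
    sd = stdev(daily_ics)
    if sd == 0:
        return 0.0
    return mean(daily_ics) / sd * math.sqrt(periods_per_year)


def turnover(prev_weights, new_weights):
    """Assumed definition: one-way turnover, half the sum of absolute
    weight changes between two consecutive rebalances."""
    return 0.5 * sum(abs(a - b) for a, b in zip(prev_weights, new_weights))
```

In a setup like the one reported, `rank_ic` would be called once per trading day on the 30-asset cross-section, and the resulting daily series passed to `annualized_ir`.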
An evolutionary feedback mechanism returns top-performing factors and structured error diagnostics to the LLM, enabling iterative refinement across multiple generation rounds. The authors emphasize syntactic validation, computational stability checks, and a closed loop that blends heuristic search with rigorous backtest hygiene.
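To make the idea of an AST-based sandbox concrete, here is a minimal sketch built on Python's standard `ast` module. It is an assumption-laden illustration, not Hubble's implementation: the field names (`close`, `open`, `volume`), the operators (`rank`, `neg`), and the `UnsafeFormula` exception are hypothetical. The formula is parsed into a syntax tree and interpreted node by node, so anything outside the whitelist (attribute access, imports, unknown names) is rejected rather than executed.

```python
import ast
import operator

# Hypothetical whitelist of data fields a formula may reference.
ALLOWED_FIELDS = {"close", "open", "volume"}

# Arithmetic permitted between cross-sectional vectors.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _rank(xs):
    """Cross-sectional rank scaled to (0, 1]; ties broken by position."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    for pos, i in enumerate(order):
        out[i] = (pos + 1) / len(xs)
    return out


def _neg(xs):
    return [-x for x in xs]


# Hypothetical operator language: name -> implementation.
OPERATORS = {"rank": _rank, "neg": _neg}


class UnsafeFormula(ValueError):
    """Raised when a formula uses syntax outside the whitelist."""


def _eval(node, data):
    # Numeric literal: broadcast to the cross-section length.
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        n = len(next(iter(data.values())))
        return [float(node.value)] * n
    # Whitelisted data field.
    if isinstance(node, ast.Name) and node.id in ALLOWED_FIELDS:
        return list(data[node.id])
    # Element-wise arithmetic.
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        left, right = _eval(node.left, data), _eval(node.right, data)
        fn = BIN_OPS[type(node.op)]
        return [fn(a, b) for a, b in zip(left, right)]
    # Call to a whitelisted operator, positional arguments only.
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in OPERATORS
            and not node.keywords):
        args = [_eval(arg, data) for arg in node.args]
        return OPERATORS[node.func.id](*args)
    raise UnsafeFormula(f"disallowed syntax: {ast.dump(node)}")


def evaluate(formula, data):
    """Parse a formula and interpret it without calling eval()/exec()."""
    tree = ast.parse(formula, mode="eval")
    return _eval(tree.body, data)
```

Because the tree is interpreted directly, a string such as `__import__('os').system('ls')` raises `UnsafeFormula` instead of running. A fuller sandbox would also need to handle division by zero, missing data, and time-series operators, which this sketch leaves out.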
Context and significance
Hubble directly addresses common failure modes of automated factor mining, including brittle, overfit formulas from genetic programming and opaque black-box generators. By pairing LLM creativity with a deterministic AST sandbox and explicit financial metrics, the framework trades unconstrained expressivity for reproducible, auditable discovery. This sits at the intersection of agentic LLM research and practical quant engineering: it leverages natural-language-driven search while preserving the kinds of guards quants require for model governance and backtest defensibility.
Limitations and caveats: The experiments use a modest universe and historical backtests; real-world performance will depend on transaction cost modeling, out-of-sample validation across regimes, and robustness to market microstructure effects. The LLM acts as a heuristic generator, not a guarantee of alpha, so human-in-the-loop validation and risk controls remain essential.
What to watch
Key developments to watch are replication at scale, formalized out-of-sample testing protocols, integration of transaction-cost-aware simulators, and potential open-source releases of the operator language and sandbox. If those follow, Hubble could become a practical template for safe, auditable LLM-assisted search in quantitative research.
Scoring Rationale
Hubble is a solid, methodologically interesting contribution that adapts agentic LLM search to a tightly constrained, auditable factor discovery workflow. Its immediate relevance is strongest for quant researchers and ML practitioners working on safe agentic systems. The paper is not yet production-validated at scale, and it is older than three days, so the practical impact is moderate.

