Paper Demonstrates Networked LLM Embeddings Predict Returns

June 30, 2026 | By LDS Team

Relevance Score: 6.6

A new arXiv paper reports that propagating FinBERT text embeddings of company 10-K filings across a supply-chain knowledge graph produces a statistically significant stock-return predictor. A long-short portfolio built on the signal achieves a 0.86 annualized Sharpe ratio and a Fama-French five-factor alpha of 7.27% per year (t = 2.30). The study, by researcher Asef Yilki, analyzes the MD&A sections of 10-K filings for 255 S&P 500 firms over 2011-2025. It compares plain LLM embeddings against network-augmented versions in which firm-level textual signals spread through measured supplier and customer linkages. Only the network-augmented factor (net_pc_5) shows significant predictive power (Newey-West t = -2.64) after controlling for momentum, volatility, and firm size. The results hold up in out-of-sample tests, placebo experiments, sector-neutral portfolios, and subsample splits, though the paper is a single-author preprint awaiting peer review.

The interesting result for ML and quant teams is not that text embeddings predict returns, which is a known technique, but that the predictive power only shows up once those embeddings are propagated through the supply-chain graph. Direct firm-level LLM embeddings alone were not the significant factor; the network-augmented version was. That is a concrete, testable example of relational structure adding information that a per-firm text model misses on its own.

What happened

Researcher Asef Yilki proposes an asset-pricing framework that augments LLM embeddings of annual report disclosures with supply-chain knowledge-graph propagation (arXiv:2606.29290). Using FinBERT embeddings of 10-K MD&A sections for 255 S&P 500 firms over 2011-2025, the paper builds two predictor sets: direct LLM embeddings and network-augmented embeddings where firm-level signals propagate through inter-firm supplier and customer linkages. Fama-MacBeth cross-sectional regressions find that the network-augmented factor, net_pc_5, carries significant return predictability (Newey-West t-statistic of -2.64) even after controlling for momentum, volatility, and firm size. A long-short portfolio sorted on net_pc_5 achieves an annualized Sharpe ratio of 0.86 and a Fama-French five-factor alpha of 7.27% per year (t = 2.30). The predictive power survives out-of-sample tests, placebo experiments, sector-neutralization, and subsample analysis.
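
The testing procedure named above, Fama-MacBeth regressions with Newey-West t-statistics, can be sketched in a few lines. The code below is a minimal illustration on simulated data, not the paper's implementation: the function names, the lag choice, the toy slopes, and the panel shape (120 periods, 200 firms) are all assumptions made for the example. Each period gets its own cross-sectional OLS, and the t-statistic is computed on the time series of slopes with a Bartlett-kernel long-run variance.

```python
import numpy as np

def fama_macbeth(returns, factors):
    """Run one cross-sectional OLS per period.

    returns: (T, N) array of next-period returns.
    factors: (T, N, K) array of firm characteristics.
    Output:  (T, K + 1) array of slopes; column 0 is the intercept.
    """
    T, N = returns.shape
    slopes = []
    for t in range(T):
        X = np.column_stack([np.ones(N), factors[t]])
        beta, *_ = np.linalg.lstsq(X, returns[t], rcond=None)
        slopes.append(beta)
    return np.array(slopes)

def newey_west_tstat(series, lags):
    """t-statistic of the mean using a Bartlett-kernel (Newey-West) variance."""
    x = np.asarray(series, dtype=float)
    T = x.shape[0]
    mean = x.mean()
    e = x - mean
    long_run_var = e @ e / T
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)
        long_run_var += 2.0 * weight * (e[lag:] @ e[:-lag]) / T
    return mean / np.sqrt(long_run_var / T)

# Simulated panel (hypothetical numbers): column 0 is a toy signal with a
# negative true slope, column 1 is a toy control variable.
rng = np.random.default_rng(0)
T, N = 120, 200
factors = rng.normal(size=(T, N, 2))
true_slopes = np.array([-0.02, 0.01])
returns = factors @ true_slopes + rng.normal(scale=0.05, size=(T, N))

slopes = fama_macbeth(returns, factors)
print("mean slope on signal:", slopes[:, 1].mean())
print("Newey-West t-stat:", newey_west_tstat(slopes[:, 1], lags=4))
```

A negative t-statistic, like the -2.64 reported for net_pc_5, means higher values of the factor are associated with lower subsequent returns in the cross-section.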

Technical context

Prior financial-NLP work has generally treated firms as independent text-generating units; this paper's contribution is explicitly modeling inter-firm exposure via a measured supply-chain graph and testing whether propagated signals carry pricing-relevant information beyond each firm's own disclosures. The two implementable pieces are (1) extracting dense FinBERT embeddings from regulatory filings, and (2) applying a graph-propagation or diffusion operator to spread those embeddings along measured supplier/customer links before using them as a factor.
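
As a rough illustration of the second piece, the sketch below spreads per-firm embeddings along a row-normalized adjacency matrix and then extracts principal components. The summary does not give the paper's propagation operator, so the mixing weight alpha, the number of steps, and the PCA step are assumptions. The name net_pc_5 suggests a principal component of the network-augmented embeddings, but that reading is a guess. The embeddings here are tiny hand-written vectors standing in for FinBERT outputs.

```python
import numpy as np

def row_normalize(adj):
    """Scale each row to sum to 1; firms with no links keep an all-zero row."""
    adj = np.asarray(adj, dtype=float)
    row_sums = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)

def propagate(embeddings, adj, alpha=0.5, steps=1):
    """Blend each firm's own embedding with the average of its neighbors'.

    alpha and steps are hypothetical tuning parameters, not values from the paper.
    """
    w = row_normalize(adj)
    x0 = np.asarray(embeddings, dtype=float)
    x = x0
    for _ in range(steps):
        x = (1.0 - alpha) * x0 + alpha * (w @ x)
    return x

def principal_components(x, k):
    """Project centered rows of x onto the top-k principal directions (via SVD)."""
    centered = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

# Toy example: firms 0 and 1 are linked, firm 2 has no measured links.
toy_adj = np.array([[0, 1, 0],
                    [1, 0, 0],
                    [0, 0, 0]])
toy_emb = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [2.0, 2.0]])
networked = propagate(toy_emb, toy_adj, alpha=0.5, steps=1)
scores = principal_components(networked, k=1)
print(networked)
print(scores.shape)
```

In this toy setup the two linked firms end up with blended embeddings, while the isolated firm only has its own embedding scaled by (1 - alpha). How unlinked firms should be handled is a design choice the summary leaves open.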

For practitioners

Teams building financial-NLP or factor-research pipelines get a concrete template here: the paper quantifies economic magnitudes (0.86 Sharpe, 7.27% alpha) large enough to justify follow-up research, and it isolates the network-propagation step, not just the embedding step, as the source of the effect. That is a useful, falsifiable claim to test against an independent dataset or embedding model before relying on it in production.
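
One simple way to start testing the claim on independent data is a quantile-sorted long-short portfolio. The sketch below assumes equal weighting, a 20% quantile cutoff, and signals already aligned with next-period returns; none of these details come from the paper. Because the reported regression slope on net_pc_5 is negative, the default here goes long the low-signal bucket and short the high-signal bucket.

```python
import numpy as np

def long_short_returns(signal, next_returns, quantile=0.2, long_high=False):
    """Equal-weighted long-short return per period from a cross-sectional sort.

    signal, next_returns: (T, N) arrays aligned so that signal[t] is known
    before next_returns[t] is realized. quantile and long_high are
    illustrative defaults, not choices taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    next_returns = np.asarray(next_returns, dtype=float)
    T, N = signal.shape
    k = max(1, int(round(N * quantile)))  # number of firms in each leg
    out = np.empty(T)
    for t in range(T):
        order = np.argsort(signal[t])
        low, high = order[:k], order[-k:]
        spread = next_returns[t, high].mean() - next_returns[t, low].mean()
        out[t] = spread if long_high else -spread
    return out

# Toy single-period example with five firms.
toy_signal = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])
toy_next = np.array([[0.05, 0.00, 0.00, 0.00, -0.01]])
print(long_short_returns(toy_signal, toy_next))
```

Running the same sort on the plain per-firm embeddings and on the network-augmented ones gives a direct side-by-side check of whether the propagation step is what carries the effect.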

What to watch

This is a single-author arXiv preprint (submitted June 28, 2026) that has not yet been peer-reviewed, so independent replication on other datasets, embedding families (beyond FinBERT), and propagation kernels is the key next step. Watch for follow-up work testing whether the effect persists across different market regimes and whether it survives transaction-cost-adjusted backtesting, since a 0.86 Sharpe on a long-short factor can look very different after realistic trading frictions.
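
To see how quickly trading frictions can erode a Sharpe ratio, the sketch below subtracts a proportional cost from each period's gross return. The return series, the turnover, and the cost per unit of turnover are hypothetical, chosen only to make the arithmetic easy to follow. Monthly rebalancing is likewise an assumption.

```python
import numpy as np

def annualized_sharpe(returns, periods_per_year=12):
    """Annualized Sharpe ratio of a zero-investment (long-short) return series."""
    r = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

def net_of_costs(gross_returns, turnover, cost_per_unit):
    """Subtract proportional trading costs: turnover times cost per unit traded."""
    gross = np.asarray(gross_returns, dtype=float)
    return gross - np.asarray(turnover, dtype=float) * cost_per_unit

# Hypothetical monthly long-short returns with 100% turnover each month
# and a 50 basis point cost per unit of turnover.
gross = np.array([0.02, 0.00, 0.02, 0.00])
turnover = np.ones_like(gross)
net = net_of_costs(gross, turnover, cost_per_unit=0.005)

print("gross Sharpe:", annualized_sharpe(gross))
print("net Sharpe:", annualized_sharpe(net))
```

A constant cost leaves volatility unchanged and lowers only the mean, so in this toy series halving the average return halves the Sharpe ratio. A factor built from annual 10-K filings may rebalance far less often than this example assumes, which would reduce the drag.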

Key Points

1. A new arXiv paper finds network-propagated FinBERT embeddings, not raw firm-level embeddings, predict stock returns significantly.
2. A long-short portfolio built on the network-augmented factor achieved a 0.86 Sharpe ratio and 7.27 percent annual alpha.
3. Results held across out-of-sample, placebo, and sector-neutral tests, but the single-author preprint still awaits peer review.

Scoring Rationale

A methodologically concrete, well-tested (out-of-sample, placebo, sector-neutral) paper showing that network-propagated LLM text embeddings, not raw embeddings, produce a statistically and economically significant return factor. Notable for financial-ML practitioners, but it remains a single-author, non-peer-reviewed preprint with backtested (not live) results.

Sources

Public references used for this report.

arxiv.org: Supply Chain Propagation of Textual Signals: LLM Embeddings and Cross-Sectional Return Predictability
