Mamba SSM Applies LLM Reasoning to Refine Biomarker Candidates

An arXiv preprint introduces a pipeline that combines a Mamba SSM trained on TCGA-BRCA RNA-seq with structured chain-of-thought LLM evaluation (DeepSeek-R1) to filter gradient-saliency gene lists for biomarker discovery. Starting from the top-50 gradient-salient genes, the LLM reduces the set to 17 genes; the LLM-filtered set yields AUC 0.927 on the held-out split, outperforming a 5,000-gene variance baseline (AUC 0.903) and the raw 50-gene list (AUC 0.832). A faithfulness audit against COSMIC CGC, OncoKB, and PAM50 shows only 6 of the 17 selected genes are validated BRCA biomarkers, and key known genes such as FOXA1 were missed. The result highlights a practical trade-off: targeted confounder removal via LLM reasoning can boost predictive performance despite incomplete biological recall.
What happened
The authors present a hybrid pipeline that pairs gradient-saliency feature extraction from a sequence model (a Mamba SSM) with structured chain-of-thought evaluation by an LLM, DeepSeek-R1, to refine candidate biomarkers from TCGA-BRCA RNA-seq. From the top-50 gradient-saliency genes the LLM selects a 17-gene set that achieves AUC 0.927 on holdout classification. By contrast, the raw 50-gene saliency set attains AUC 0.832, and a high-dimensional variance-based baseline using 5,000 genes achieves AUC 0.903, so the LLM-filtered set outperforms both baselines while using roughly 294x fewer features than the variance baseline.
Technical details
The pipeline trains a Selective State Space Model, the Mamba SSM, on TCGA-BRCA RNA-seq and uses gradient saliency to propose the top-50 candidate genes. Each candidate is then evaluated by DeepSeek-R1 using structured chain-of-thought prompts to probe causal relevance and flag tissue-composition confounders. Key empirical findings:
- The LLM-filtered 17-gene set yields AUC 0.927 on the held-out test split.
- The raw top-50 saliency genes give AUC 0.832; a 5,000-gene variance baseline gives AUC 0.903.
- Faithfulness auditing against curated resources (COSMIC CGC, OncoKB, PAM50) finds 6 of 17 genes validated; recall for known BRCA genes in the input is 0.375, and notable biomarkers like FOXA1 were missed.
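The saliency step above is model-agnostic and can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's implementation: a small MLP stands in for the trained Mamba SSM, and random tensors stand in for TCGA-BRCA expression data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the paper's trained Mamba SSM: a small MLP
# mapping a gene-expression vector to a single tumor-vs-normal logit.
n_genes = 200
model = nn.Sequential(nn.Linear(n_genes, 32), nn.ReLU(), nn.Linear(32, 1))

# Synthetic expression matrix standing in for RNA-seq samples.
X = torch.randn(64, n_genes, requires_grad=True)

# Gradient saliency: |d logit / d input|, averaged over samples,
# gives one importance score per gene.
model(X).sum().backward()
saliency = X.grad.abs().mean(dim=0)

# Propose the top-k most salient genes as biomarker candidates;
# these indices would then be passed to the LLM filtering stage.
top50 = torch.topk(saliency, k=50).indices.tolist()
print(len(top50))  # 50 candidate gene indices
```

In the paper each of these candidates is then individually interrogated by DeepSeek-R1 with structured prompts before entering the final signature.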
Context and significance
This work probes a topical question for ML-enabled biology: can LLM chain-of-thought reasoning act as a faithful filter to remove confounded features and improve downstream predictive performance? The paper documents a clear performance gain from targeted LLM filtering while exposing a disconnect between predictive utility and domain-level faithfulness. The authors term this outcome "selective faithfulness": the LLM can remove specific confounders that matter for classifier performance without recovering the full set of biologically validated biomarkers. For practitioners, that means LLM-assisted feature refinement can produce compact, high-performing signatures but should not be treated as a substitute for experimental validation or exhaustive domain curation.
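The compact-versus-large comparison at the center of the paper's claim can be reproduced in miniature. The sketch below is purely illustrative, with synthetic data and hypothetical column indices rather than real gene signatures: it shows how a small feature subset and the full feature set are each scored by held-out AUC.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an expression matrix: 500 "genes", 17 informative.
# shuffle=False places the informative features in the first 17 columns.
X, y = make_classification(n_samples=400, n_features=500, n_informative=17,
                           shuffle=False, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

def holdout_auc(cols):
    """Train on a feature subset and report AUC on the held-out split."""
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:, cols], y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te[:, cols])[:, 1])

compact = list(range(17))   # hypothetical refined 17-feature signature
full = list(range(500))     # all features, analogous to the large baseline

print(f"compact: {holdout_auc(compact):.3f}, full: {holdout_auc(full):.3f}")
```

On data like this, where a handful of features carry the signal, the compact subset can match or beat the full set, which is the dynamic the paper reports; whether a given LLM-filtered signature does so on real cohorts still requires the kind of faithfulness audit the authors perform.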
What to watch
Validate the approach across additional cancer cohorts and noncancer datasets, run robustness checks under batch effects and adversarial confounders, and compare different LLM prompting strategies or model architectures. The balance between predictive efficiency and biological fidelity is the key open question for adoption in discovery pipelines.
Scoring Rationale
This is a notable methods paper that demonstrates a practical hybrid use of LLM chain-of-thought for feature refinement with measurable downstream gains. It is not a paradigm shift, but it highlights an important trade-off between predictive performance and biological faithfulness that practitioners must consider.

