MIRAGE translates fMRI signals into mental images
Researchers describe MIRAGE, a method for reconstructing internally generated visual imagery from fMRI by training on perception datasets, per the arXiv preprint (May 16, 2026). The authors analyze the recently released NSD-Imagery benchmark and report that state-of-the-art vision decoders do not reliably generalize from seen images to mental images. Per the paper and accompanying CCNeuro poster, MIRAGE uses a linear ridge-regression backbone, multi-modal conditioning with text and low-dimensional image embeddings, and a diffusion-based image generator (Stable Cascade). The preprint reports that feature-based metrics and human raters establish MIRAGE as state-of-the-art on NSD-Imagery; ablation studies in the paper attribute the gains to lower-dimensional embeddings, text guidance, and the combination of low- and high-level image features. Editorial summaries and a literature review corroborate the robustness focus and the training-on-perception-data experimental design.
What happened
According to the arXiv preprint (submitted May 16, 2026) and a CCNeuro proceedings PDF, the authors introduce MIRAGE, a pipeline designed to decode internally generated visual imagery from fMRI signals using models trained on perceptual data. The paper analyzes the NSD-Imagery benchmark and reports that achieving state-of-the-art performance on seen-image reconstruction does not guarantee comparable performance on mental-image reconstruction. Per the authors, MIRAGE combines a linear ridge-regression backbone with multi-modal conditioning (text plus low-dimensional image embeddings) and uses the Stable Cascade diffusion model as the generative decoder. The preprint reports that both automated feature metrics and human rater evaluations establish MIRAGE as state-of-the-art on NSD-Imagery, and includes ablation studies highlighting the contributions of lower-dimensional embeddings and multi-modal guidance.
Technical details
Per the CCNeuro PDF and arXiv preprint, MIRAGE trains exclusively on perceptual data from the Natural Scenes Dataset (NSD) and applies the learned decoder to the NSD-Imagery mental-imagery benchmark. The authors report a ridge-regression mapping from voxel space to target features, with multi-modal conditioning vectors that combine text-derived semantics and relatively low-dimensional image-derived features. The pipeline feeds those features into a diffusion-based image generator, which the paper identifies as Stable Cascade. Ablation experiments reported in the manuscript evaluate:
- embedding dimensionality
- inclusion of text guidance
- combinations of low- and high-level image features
The paper attributes measured performance gains on mental-image trials to smaller feature dimensions and explicit multi-modal guidance (arXiv preprint; CCNeuro PDF).
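The voxel-to-feature step described above can be sketched with a closed-form ridge regression. This is a minimal illustration on synthetic data, not the authors' implementation: the array sizes, the `alpha` value, the noise levels, and every variable name are assumptions chosen for the example, and the real pipeline maps to text and image embeddings that then condition Stable Cascade.

```python
import numpy as np

def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)

rng = np.random.default_rng(0)
# Hypothetical sizes, far smaller than real fMRI data.
n_train, n_test, n_voxels, embed_dim = 200, 20, 500, 16

# Synthetic stand-in for perception trials: voxel patterns and the
# low-dimensional embeddings of the images that evoked them.
W_true = rng.normal(size=(n_voxels, embed_dim)) / np.sqrt(n_voxels)
X_perception = rng.normal(size=(n_train, n_voxels))
Y_perception = X_perception @ W_true + 0.1 * rng.normal(size=(n_train, embed_dim))

# Fit the decoder on perception data only.
W_hat = fit_ridge(X_perception, Y_perception, alpha=10.0)

# Synthetic stand-in for imagery trials: the same underlying mapping,
# but with an attenuated signal and additional noise (an assumption).
X_signal = rng.normal(size=(n_test, n_voxels))
Y_imagery_true = X_signal @ W_true
X_imagery = 0.5 * X_signal + rng.normal(size=(n_test, n_voxels))

# Apply the perception-trained decoder to imagery trials. In a full
# pipeline these predictions would condition the image generator.
Y_imagery_pred = X_imagery @ W_hat
print(Y_imagery_pred.shape)
```

The example omits an intercept and any per-voxel normalization for brevity; a practical decoder would typically standardize the voxel data and select `alpha` by cross-validation.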
Editorial analysis: The reported technical choices (a linear, regularized decoder and an emphasis on lower-dimensional, multi-modal conditioning) align with approaches that trade model capacity for robustness under low signal-to-noise ratio (SNR). For fMRI signals representing mental imagery, where task-evoked responses are weaker and more variable, simpler linear mappings plus semantically rich conditioning can reduce overfitting to perceptual idiosyncrasies and stabilize the downstream generative step.
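The robustness argument can be made concrete with the standard ridge-regression objective. The notation here is chosen for illustration and is not taken from the paper: X is the trials-by-voxels response matrix, Y the trials-by-features target matrix, W the decoder weights, and λ the regularization strength.

```latex
\hat{W} \;=\; \arg\min_{W}\; \lVert Y - XW \rVert_F^2 + \lambda \lVert W \rVert_F^2
        \;=\; \left(X^\top X + \lambda I\right)^{-1} X^\top Y

% With the singular value decomposition X = U S V^\top
% (singular values s_j, left singular vectors u_j):
\hat{Y} \;=\; X\hat{W} \;=\; \sum_{j} \frac{s_j^2}{s_j^2 + \lambda}\; u_j u_j^\top Y
```

Each direction in voxel space is scaled by the factor s_j^2 / (s_j^2 + λ), so low-variance directions, which noise tends to dominate, are shrunk toward zero. This is one way to see why a regularized linear map can be more stable than a higher-capacity decoder when the signal is weak.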
Context and significance
Decoding internal mental content from brain activity sits at the intersection of neuroscience, machine learning, and generative modeling. The authors' claim that perceptual training sets can produce effective mental-image decoders, if paired with the right architecture, is notable because large-scale perceptual fMRI datasets are far more common than curated mental-imagery collections. If the paper's reported results replicate across subjects and datasets, this lowers a practical barrier for research groups that lack dedicated imagery data but have perceptual recordings.
Editorial analysis: For practitioners building brain-to-image systems, the manuscript reinforces a recurring pattern: robustness to measurement noise often benefits from lower-capacity decoders and stronger, semantically aligned priors at generation time. The reported human-rater and feature-metric evaluations are important because pixel-wise similarity can be misleading for subjective imagery reconstruction.
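To illustrate what a feature-based metric can look like, the sketch below implements two-way identification, a measure commonly used in the fMRI reconstruction literature. This summary does not state which feature metrics the MIRAGE authors used, so the choice of metric, the function name, and the random stand-in features are all assumptions for illustration. In practice the features would come from a pretrained vision network applied to the reconstructed and target images.

```python
import numpy as np

def two_way_identification(recon_feats, true_feats):
    """Fraction of (reconstruction, distractor) pairs in which the
    reconstruction's features correlate more strongly with its own
    target than with the distractor target."""
    n = recon_feats.shape[0]
    # Row-wise centering and normalization turns the dot product
    # into a Pearson correlation.
    r = recon_feats - recon_feats.mean(axis=1, keepdims=True)
    t = true_feats - true_feats.mean(axis=1, keepdims=True)
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    corr = r @ t.T  # corr[i, j]: reconstruction i vs. target j
    # Compare each matched correlation against every distractor.
    # Diagonal entries compare a value with itself and count as False.
    wins = corr.diagonal()[:, None] > corr
    return wins.sum() / (n * (n - 1))

rng = np.random.default_rng(0)
targets = rng.normal(size=(50, 64))    # hypothetical target features
unrelated = rng.normal(size=(50, 64))  # hypothetical unrelated features

perfect_score = two_way_identification(targets, targets)
chance_score = two_way_identification(unrelated, targets)
print(perfect_score, chance_score)
```

A score near 0.5 indicates chance-level reconstructions, while a score near 1.0 indicates that reconstructions are reliably closer to their own targets in feature space, regardless of pixel-level agreement.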
What to watch
- Replication: whether independent groups reproduce the SOTA numbers reported in the arXiv preprint on NSD-Imagery, and whether the results extend to other imagery datasets.
- Subject generalization: cross-subject and cross-session stability of the ridge-regression mappings.
- Ablation extension: whether the dimensionality and text-guidance effects hold when swapping in different embedding families or diffusion decoders.
- Ethical and practical evaluation: human-rater protocols and privacy implications for brain-decoding research.
Overall, the manuscript provides a concrete architecture and ablation evidence that training on large perceptual fMRI datasets, combined with conservative decoder choices and multi-modal guidance, can improve mental-image reconstruction performance as reported by the authors on NSD-Imagery.
Scoring Rationale
This paper reports a methodological advance for fMRI-to-image decoding with empirical SOTA claims on the NSD-Imagery benchmark. The result is notable for researchers working on neural decoding and generative interfaces but is narrower in scope than general foundation-model releases.

