Researchers predict protein cascade expression from H&E images
The medRxiv preprint "Predicting Protein Cascade Expression from H&E Images" (Alejandro Leyva et al., medRxiv, 2026 Jan 24; DOI 10.64898/2026.01.23.26344725) reports a method to estimate downstream protein expression from routine H&E whole-slide images using Reverse Phase Protein Array (RPPA) measurements from the Cancer Genome Atlas Breast Adenocarcinoma dataset (TCGA-BRCA). The paper targets five proteins from the apoptosis cascade and uses DNA damage and repair (DDR) proteins as a biological control, per the preprint. The authors compare patch-level Vision Transformers (ViT) against a cellular-level ViT they call CellRPPA (referred to as CellViT in parts of the manuscript). According to the preprint, the patch-level ViTs achieve R-squared values below 0.1, while CellRPPA attains R-squared values above 0.1 across five test folds. The study is posted to medRxiv and indexed in PubMed as a preprint; PubMed notes it has not yet been peer reviewed.
What happened
The preprint presents a computational pipeline that links routine hematoxylin and eosin (H&E) whole-slide images (WSIs) to downstream protein expression measured by Reverse Phase Protein Array (RPPA). According to the preprint, the authors used RPPA data paired with WSIs from the Cancer Genome Atlas Breast Adenocarcinoma dataset (TCGA-BRCA) to predict the expression of five proteins drawn from the apoptosis cascade, with DNA damage and repair (DDR) cascade proteins used as a control. The manuscript reports that patch-level Vision Transformers (ViT) produced R-squared values below 0.1, while a cellular-level Vision Transformer named CellRPPA (also cited as CellViT in the paper) produced R-squared values above 0.1 across five test folds.
Technical details
The preprint frames the task as a regression problem linking morphology to protein intensities measured by RPPA. The authors describe a cellular-level ViT architecture (CellRPPA/CellViT) that operates on single-cell or cell-centric representations rather than large image patches; the paper contrasts this with standard patch-based ViTs and reports the comparative R-squared results cited above (medRxiv preprint). The study also compares predictive performance across biological pathways, reporting higher predictive signal for morphologically indicative cascades such as apoptosis versus the DDR control, per the preprint.
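To make the reported metric concrete, here is a minimal sketch of how R-squared can be computed separately for each of five test folds. It is not the authors' evaluation code: the function names `r_squared` and `per_fold_r_squared`, the round-robin fold assignment, and the synthetic data are all assumptions made for illustration, and the preprint may define its folds and averaging differently.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return 1.0 - ss_res / ss_tot


def per_fold_r_squared(y_true, y_pred, fold_ids):
    """Return {fold id: R-squared} computed on each test fold separately."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    fold_ids = np.asarray(fold_ids)
    return {
        int(f): r_squared(y_true[fold_ids == f], y_pred[fold_ids == f])
        for f in np.unique(fold_ids)
    }


# Synthetic stand-in data (hypothetical): one RPPA intensity per slide and
# a noisy model prediction for it. No values here come from the preprint.
rng = np.random.default_rng(0)
n_slides, n_folds = 500, 5
y_true = rng.normal(size=n_slides)
y_pred = 0.2 * y_true + rng.normal(scale=0.3, size=n_slides)
fold_ids = np.arange(n_slides) % n_folds  # assumed round-robin fold split

scores = per_fold_r_squared(y_true, y_pred, fold_ids)
for fold, score in scores.items():
    print(f"fold {fold}: R^2 = {score:.3f}")
```

Computed this way on held-out data, R-squared is 1 for perfect predictions, 0 for a model that always predicts the fold mean, and negative for a model that does worse than the mean, which is why a threshold of 0.1 sits close to the no-signal baseline.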
Editorial analysis
Methodologically, the paper targets a harder task than single-protein prediction: estimating downstream cascade-level signals that reflect propagated protein activity. As a general industry pattern, prior digital pathology efforts have had limited success predicting bulk or single-marker proteomics from H&E when tissue-level signal is weak, and architectures that incorporate cellular context or segmentation-derived features often outperform coarse patch-based models on tasks sensitive to cell state and microenvironment. For practitioners, a cellular-level transformer approach aligns with a broader trend to fuse single-cell morphology with spatial omics when available.
Context and significance
Editorial analysis: The work is notable for attempting cascade-level inference from routine H&E, an approach that, if reproducible and robust across cohorts, could expand the actionable readouts derivable from archival histology. However, the reported R-squared values (around 0.1) indicate limited explained variance; readers should treat the results as early-stage proof of concept rather than production-ready biomarkers. The manuscript is posted as a preprint and indexed in PubMed; PubMed labels it as not yet peer reviewed (PubMed entry).
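As a rough guide to what "limited explained variance" means, the short derivation below relates R-squared to the residual error expressed in units of the target's standard deviation. It assumes the standard definition of R-squared evaluated on a single test set, which the preprint may or may not follow exactly; the symbols are introduced here for illustration.

```latex
\[
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
\quad\Longrightarrow\quad
\frac{\mathrm{RMSE}}{\sigma_y} = \sqrt{1 - R^2}
\]
\[
R^2 = 0.1 \quad\Longrightarrow\quad \frac{\mathrm{RMSE}}{\sigma_y} = \sqrt{0.9} \approx 0.95
\]
```

Here $y_i$ is the measured expression, $\hat{y}_i$ the prediction, $\bar{y}$ the test-set mean, and $\sigma_y$ the test-set standard deviation. Under these assumptions, a model at R-squared of 0.1 reduces the typical prediction error by only about 5 percent relative to always predicting the mean.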
What to watch
For practitioners: look for peer-reviewed publication, external validation on independent cohorts beyond TCGA-BRCA, ablation studies showing which cellular features drive signal, and comparisons to spatial proteomics modalities. Observers should also watch whether CellRPPA implementations, code, and trained weights are released for reproducibility testing.
Scoring Rationale
This is a notable methodological preprint linking routine H&E to cascade-level proteomics, which matters to computational-pathology practitioners. Reported effect sizes are modest and the work is currently a preprint, so impact is limited until peer review and external validation.