
DeepMethylation delivers tissue-specific DNA methylation predictions

By LDS Team

Relevance Score: 5.4

A team at the Shanghai Institute of Nutrition and Health published DeepMethylation, a deep learning framework that predicts CpG DNA methylation status genome-wide by combining DNA sequence with tissue-specific epigenomic features, in *PLOS Computational Biology* on July 1, 2026. The authors report an average AUROC of 0.909 across nine tissues and, according to the paper, the model can impute methylation at sites not covered by the EPIC array and extend older 450k array data to EPIC-level resolution. A companion model, Delta DeepMethylation (DDM), estimates how individual SNPs affect nearby CpG methylation; its predicted effects are reported to be consistent with known methylation QTLs but less confounded by linkage disequilibrium. Code and a demo are available on GitHub.

For genomics and bioinformatics teams, this offers a practical way to cut the cost of methylation profiling. Rather than always running new EPIC arrays or whole-genome bisulfite sequencing, researchers can train a model per tissue and computationally fill in unmeasured CpG sites, or upgrade older 450k-array datasets to EPIC-level coverage, which extends the useful life of legacy data.

What happened

Researchers Wenran Li, Shijia Yu, Yingyu Cheng, and Sijia Wang, based at the Shanghai Institute of Nutrition and Health (Chinese Academy of Sciences), published DeepMethylation in PLOS Computational Biology on July 1, 2026. The framework uses a CNN-based module to read local DNA sequence and an MLP-based module for tissue-specific epigenomic annotations (chromatin accessibility, histone marks, transcription factor binding, and genomic position features), then combines both to predict CpG methylation status. Trained and tested on GTEx EPIC array data across nine tissues (blood, lung, breast, kidney, ovary, prostate, testis, colon, and skeletal muscle), the model achieved an average accuracy of 0.847 and AUROC of 0.909, which the authors report as outperforming the prior CNN-based baseline MRCNN (accuracy 0.819, AUROC 0.852) on blood data. The paper also introduces Delta DeepMethylation (DDM), which compares predicted methylation for reference versus alternate allele sequences around a CpG site to estimate a SNP's regulatory effect; the authors report DDM's predicted effects track known methylation QTLs (mQTLs) without the linkage-disequilibrium confound that affects standard mQTL association analysis.
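The two-branch design and the reference-versus-alternate comparison described above can be illustrated with a toy sketch. This is not the authors' implementation: the layer sizes, random weights, window length, feature values, and every function name (`predict_methylation`, `delta_score`, and so on) are assumptions made for illustration, and the real model is trained rather than randomly initialized.

```python
import math
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T); unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv_branch(encoded, filters):
    """Sequence branch: 1D convolution, ReLU, then global max pooling (one value per filter)."""
    pooled = []
    for filt in filters:  # each filter is a list of k rows with 4 weights
        k = len(filt)
        best = 0.0  # starting at 0 applies the ReLU floor
        for start in range(len(encoded) - k + 1):
            act = sum(filt[i][j] * encoded[start + i][j]
                      for i in range(k) for j in range(4))
            best = max(best, act)
        pooled.append(best)
    return pooled

def dense(inputs, weights, biases, relu=True):
    """Fully connected layer, optionally followed by ReLU."""
    out = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        out.append(max(0.0, z) if relu else z)
    return out

def make_toy_params(n_filters=4, kernel=5, n_features=3, hidden=4, seed=0):
    """Random, untrained weights (hypothetical sizes) so the sketch runs end to end."""
    rng = random.Random(seed)

    def w():
        return rng.uniform(-1.0, 1.0)

    return {
        "filters": [[[w() for _ in range(4)] for _ in range(kernel)]
                    for _ in range(n_filters)],
        "mlp_w": [[w() for _ in range(n_features)] for _ in range(hidden)],
        "mlp_b": [0.0] * hidden,
        "out_w": [[w() for _ in range(n_filters + hidden)]],
        "out_b": [0.0],
    }

def predict_methylation(seq, features, params):
    """Combine the sequence branch and the feature branch into one probability."""
    seq_repr = conv_branch(one_hot(seq), params["filters"])
    feat_repr = dense(features, params["mlp_w"], params["mlp_b"])
    logit = dense(seq_repr + feat_repr, params["out_w"], params["out_b"], relu=False)[0]
    return 1.0 / (1.0 + math.exp(-logit))

def apply_snp(seq, offset, alt):
    """Return the window with the base at `offset` replaced by the alternate allele."""
    return seq[:offset] + alt + seq[offset + 1:]

def delta_score(ref_seq, offset, alt, features, params):
    """Predicted methylation for the alternate allele minus the reference allele."""
    alt_seq = apply_snp(ref_seq, offset, alt)
    return (predict_methylation(alt_seq, features, params)
            - predict_methylation(ref_seq, features, params))

if __name__ == "__main__":
    params = make_toy_params()
    window = "ACGTTGCACGATCGGATCCA"   # made-up sequence window around a CpG
    features = [0.8, 0.1, 0.5]        # made-up tissue-specific feature values
    print("P(methylated):", predict_methylation(window, features, params))
    print("delta for T>A at offset 3:", delta_score(window, 3, "A", features, params))
```

Because the score compares two predictions for the same window, it depends only on the sequence change itself. That is the intuition behind the claim that this kind of scoring avoids the linkage-disequilibrium confound of association-based mQTL analysis.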

Technical context

The model was trained on 754,119 CpG sites from the Illumina EPIC array (GTEx v8 cohort) plus 25 engineered epigenomic features, and validated against roughly 24 million CpG sites from whole-genome bisulfite sequencing data. Cross-tissue transfer held up reasonably well (AUROC/AUPRC above 0.83 in all 81 tissue-pair comparisons), though performance dropped on CpG sites not covered by any array (whole-genome AUROC of 0.618 versus 0.712 on EPIC-covered sites), which the authors flag as a harder prediction regime. Code, a demo, and usage instructions are posted on the authors' GitHub repository.
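For readers less familiar with the metrics quoted above, the following sketch shows one standard way to compute accuracy and AUROC for binary methylation calls. It assumes labels are 0 (unmethylated) or 1 (methylated) and scores are predicted probabilities; the 0.5 threshold and the four-site toy data are illustrative choices, not values from the paper.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of sites whose thresholded prediction matches the binary label."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)

def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation.

    Equals the probability that a randomly chosen methylated site receives a
    higher score than a randomly chosen unmethylated site, with ties counted as 0.5.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

if __name__ == "__main__":
    toy_labels = [0, 0, 1, 1]            # toy data, not from the paper
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(accuracy(toy_labels, toy_scores))  # 0.75
    print(auroc(toy_labels, toy_scores))     # 0.75
```

Unlike accuracy, AUROC does not depend on a decision threshold. A value of 0.5 corresponds to random ranking, which is useful context for the lower whole-genome figures.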

For practitioners

Teams working on epigenetic biomarker discovery, aging clocks, or variant-to-function pipelines get an open, tissue-specific model that can be retrained or fine-tuned on GTEx-style epigenomic annotations without needing new sequencing. The DDM variant-scoring approach is a useful complement to existing sequence-based regulatory variant tools (the paper benchmarks it favorably against DeepSEA2.0/Beluga), particularly where LD is a concern in fine-mapping candidate CpG-modifying variants.

What to watch

This is a single peer-reviewed paper published today with no independent replication or third-party coverage yet; the reported AUROC and benchmark comparisons come from the authors' own evaluation. Watch for independent reproduction using the public GitHub code, and for whether the model generalizes to tissues and populations beyond the GTEx cohort it was trained on.

Key Points

  1. DeepMethylation combines DNA sequence and tissue-specific epigenomic features to predict genome-wide CpG methylation with average AUROC 0.909 across nine tissues.
  2. The model extends coverage beyond array-measured sites and lets researchers upgrade legacy 450k methylation data to EPIC-level resolution without new wet-lab assays.
  3. Its companion DDM model scores how SNPs affect nearby methylation, giving genomics teams a less LD-confounded way to prioritize regulatory variant candidates.

Scoring Rationale

A solid, methodologically sound bioinformatics tool with strong reported benchmarks (AUROC 0.909) and a useful variant-effect companion model, but it is a single freshly published paper with no independent corroboration, a narrow (research-community) audience, and incremental rather than field-altering impact.
