Power posteriors enable robust mutational signature discovery
Researchers from Harvard Biostatistics, Boston University, and Dana-Farber Cancer Institute have published BayesPowerNMF, a Bayesian non-negative matrix factorization (NMF) method for discovering mutational signatures in cancer genomes, in PLOS Computational Biology. The approach uses a power posterior to improve robustness when the standard NMF model is misspecified, and a sparsity-inducing prior to automatically infer the number of active signatures, removing a key manual step in competing tools. In simulation studies, BayesPowerNMF recovers more true signatures with lower cosine error than the leading methods SigProfilerExtractor and SignatureAnalyzer. Applied to whole-genome sequencing data for six cancer types from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium (PCAWG), the method recovers more signatures than current state-of-the-art methods. Unlike point-estimate methods, BayesPowerNMF provides credible intervals that quantify uncertainty in each inferred signature.
Background
Mutational signature analysis decomposes a cancer genome's mutation catalog into characteristic frequency profiles - "signatures" - left by distinct mutational processes such as carcinogenic exposures, defective DNA repair, or APOBEC deaminase activity. Non-negative matrix factorization (NMF) is the dominant computational framework for this task, underpinning widely used tools like SigProfilerExtractor and SignatureAnalyzer. However, the standard NMF model is an approximation, and even modest departures from the assumed probability model can cause methods to miss real signatures or infer spurious ones, per the PLOS Computational Biology paper (Xue et al.).
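To make the decomposition concrete, here is a minimal sketch of plain NMF on a synthetic mutation-count matrix. It uses the classic multiplicative updates for squared error, not the Bayesian model from the paper, and every name and number in it (the 96 mutation channels, 20 samples, 3 signatures, `nmf_multiplicative`) is an illustrative assumption rather than anything taken from BayesPowerNMF.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, seed=0, eps=1e-10):
    """Factor a non-negative matrix X (channels x samples) as W @ H.

    Generic multiplicative-update NMF for squared error; a stand-in
    for illustration, not the BayesPowerNMF algorithm.
    """
    rng = np.random.default_rng(seed)
    n_channels, n_samples = X.shape
    W = rng.random((n_channels, k)) + eps  # signature profiles (columns)
    H = rng.random((k, n_samples)) + eps   # per-sample exposures
    errors = []
    for _ in range(n_iter):
        # Each update keeps the factors non-negative by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors

# Synthetic data (assumed sizes): 96 mutation channels, 20 samples, 3 signatures.
rng = np.random.default_rng(1)
true_W = rng.dirichlet(np.ones(96), size=3).T   # 96 x 3, columns sum to 1
true_H = rng.gamma(2.0, 50.0, size=(3, 20))     # 3 x 20 exposures
X = rng.poisson(true_W @ true_H).astype(float)  # observed mutation counts

W, H, errors = nmf_multiplicative(X, k=3)

# Rescale so each signature is a frequency profile summing to 1;
# the product W @ H is unchanged by this rescaling.
scale = W.sum(axis=0)
signatures = W / scale
exposures = H * scale[:, None]
```

Note that the number of components `k` has to be supplied by hand here, which is the manual model-selection step the article describes.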
Method
Per the paper, BayesPowerNMF uses a power posterior for a fully Bayesian NMF model. A power posterior discounts the likelihood by a fractional exponent (the "temperature"), making inference less sensitive to model misspecification without abandoning the Bayesian framework. The method also uses a sparsity-inducing prior to automatically infer the number of active signatures, removing the need for a separate model-selection step required by competing tools. As a fully Bayesian approach, BayesPowerNMF produces credible intervals that quantify uncertainty in each inferred signature - a feature absent in point-estimate methods like SigProfilerExtractor and SignatureAnalyzer.
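The effect of discounting the likelihood can be seen in a toy conjugate model. The sketch below assumes Poisson counts with a Gamma(a, b) prior on the rate, where raising the likelihood to a power `xi` gives a Gamma(a + xi * sum(x), b + xi * n) posterior. This one-parameter example, the prior values, and the choice `xi = 0.25` are assumptions for illustration only; the paper's model is a full NMF and its exponent is chosen differently.

```python
import numpy as np

def power_posterior_gamma(counts, xi, a=1.0, b=0.1):
    """Shape and rate of the power posterior for a Poisson rate.

    Prior: Gamma(shape=a, rate=b). The likelihood is raised to the
    power xi, so the data contribute xi * sum(x) and xi * n.
    xi = 1 recovers the standard posterior.
    """
    counts = np.asarray(counts, dtype=float)
    return a + xi * counts.sum(), b + xi * counts.size

def credible_interval(shape, rate, level=0.95, n_draws=20000, seed=0):
    """Monte Carlo equal-tailed credible interval for Gamma(shape, rate)."""
    rng = np.random.default_rng(seed)
    draws = rng.gamma(shape, 1.0 / rate, size=n_draws)  # NumPy takes a scale
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(draws, [tail, 1.0 - tail])
    return float(lo), float(hi)

counts = [12, 15, 9, 14, 11, 13, 10, 16]  # toy data, hypothetical

standard = credible_interval(*power_posterior_gamma(counts, xi=1.0))
tempered = credible_interval(*power_posterior_gamma(counts, xi=0.25))

width_standard = standard[1] - standard[0]
width_tempered = tempered[1] - tempered[0]
```

With `xi < 1` the data count for less, so the interval is wider: the posterior commits less strongly to what a possibly misspecified likelihood says.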
Evaluation
Per the paper, extensive simulation studies show BayesPowerNMF recovers more true signatures with greater accuracy and lower cosine error compared to leading methods when the generating model is misspecified. On real whole-genome sequencing data for six cancer types from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium (PCAWG), the method recovers more signatures than the current state-of-the-art.
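For readers unfamiliar with the metric, the sketch below shows one common way to score a recovered signature against a known one: cosine error, taken here to be one minus the cosine similarity of the two profiles. The definition, the helper names, and the best-match rule are assumptions for illustration; the paper's exact evaluation protocol may differ.

```python
import numpy as np

def cosine_error(a, b):
    """One minus the cosine similarity of two signature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def best_match_errors(true_sigs, est_sigs):
    """For each true signature (column), the error of its closest estimate.

    Simple best-match scoring: it does not enforce a one-to-one
    assignment between true and estimated signatures.
    """
    return [min(cosine_error(t, e) for e in est_sigs.T) for t in true_sigs.T]

# Toy example: the estimates are the true signatures in a different order.
true_sigs = np.array([[0.7, 0.1],
                      [0.2, 0.1],
                      [0.1, 0.8]])
est_sigs = true_sigs[:, ::-1]

errors = best_match_errors(true_sigs, est_sigs)
```

Because NMF factors are identifiable only up to column order, some matching step like this is needed before errors can be compared across methods.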
Significance for practitioners
For data scientists and computational biologists working in cancer genomics, a more robust signature extractor with automatic signature-count inference reduces a common failure mode - choosing the wrong number of NMF components - and provides principled uncertainty estimates for downstream annotation and attribution tasks. The PLOS Computational Biology publication (peer-reviewed) increases confidence in the method relative to preprint-only tools.
What to watch
Watch for independent benchmarks comparing BayesPowerNMF against newer tools and against updated versions of SigProfiler and SignatureAnalyzer; community uptake, as measured by citations and software reuse; and practical guidance on tuning the power posterior temperature hyperparameter for new cancer cohorts.
Scoring Rationale
A published, peer-reviewed Bayesian methods contribution to cancer mutational signature analysis with a clear methodological advantage over SigProfilerExtractor and SignatureAnalyzer. Relevant primarily to computational biologists and cancer genomics practitioners; not broadly impactful across AI/ML.

