Researchers integrate ML binding predictions into mechanistic signaling models
A study by Holly A. Huber and Stacey D. Finley of the University of Southern California presents a multiscale, probabilistic framework that uses Bayesian inference to fold machine-learning predictions of protein binding affinity into mechanistic ODE models of cell signaling. The pipeline chains AlphaFold 3, which predicts a protein-complex structure from amino acid sequence, with PPI-Affinity, an SVM model that estimates binding affinity. It then integrates those estimates using affine-invariant MCMC and quantifies the information gained via KL divergence. Tested on the well-characterized EGFR and GPCR signaling models, the augmented data consistently sharpened estimates of protein unbinding rates, pushing them toward experimentally reported values. First released as a 2025 bioRxiv preprint, the work has now appeared in PLOS Computational Biology. The authors describe it as the first such approach tested on dynamic signaling models, and they have released their code openly.
What happened
Holly A. Huber and Stacey D. Finley of the University of Southern California have published a multiscale, probabilistic modeling framework that uses Bayesian inference to augment mechanistic models of cell signaling with machine-learning predictions of protein binding affinity. First posted as a bioRxiv preprint in 2025, the study has now appeared in PLOS Computational Biology (DOI 10.1371/journal.pcbi.1014321).
The problem
Mechanistic models of intracellular signaling, typically systems of ordinary differential equations (ODEs), are often underdetermined: key parameters such as binding and unbinding rates are rarely measured directly, and the available time-series data on protein concentrations are sparse, relative, and noisy. The authors set out to bring more abundant data, namely amino acid sequences and protein structures, to bear on this parameter-estimation problem despite the mismatch in scale.
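One familiar source of this underdetermination can be shown with a toy example. The sketch below is not the authors' model: it integrates a single reversible binding reaction A + B <-> AB with forward Euler, using hypothetical rate constants and concentrations in arbitrary units. Two parameter sets that share the same ratio k_off / k_on (the dissociation constant KD) reach the same steady state, so late or sparse measurements alone cannot separate the two rates; only the early transient differs.

```python
def simulate_binding(k_on, k_off, t_end, a0=1.0, b0=1.0, dt=0.001):
    """Integrate A + B <-> AB with forward Euler and return [AB] at t_end.

    All values are in arbitrary units; this is a toy model for illustration.
    """
    a, b, ab = a0, b0, 0.0
    for _ in range(round(t_end / dt)):
        net_rate = k_on * a * b - k_off * ab  # binding minus unbinding
        a -= net_rate * dt
        b -= net_rate * dt
        ab += net_rate * dt
    return ab


# Two hypothetical parameter sets with the same KD = k_off / k_on = 0.1
slow = {"k_on": 1.0, "k_off": 0.1}
fast = {"k_on": 10.0, "k_off": 1.0}

steady_slow = simulate_binding(t_end=50.0, **slow)
steady_fast = simulate_binding(t_end=50.0, **fast)
early_slow = simulate_binding(t_end=0.5, **slow)
early_fast = simulate_binding(t_end=0.5, **fast)

print(f"steady state: slow={steady_slow:.4f}, fast={steady_fast:.4f}")
print(f"t = 0.5:      slow={early_slow:.4f}, fast={early_fast:.4f}")
```

An independent estimate of KD ties the two rates together, which is one way extra affinity information could help pin down an unbinding rate.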
The method
The framework chains two web-based ML tools into a pipeline that predicts the dissociation constant (KD), a measure of binding affinity, for each binding reaction. Sequence data come from UniProt and experimental structures from the Protein Data Bank. When an experimental complex structure is unavailable, AlphaFold 3 predicts the structure from amino acid sequence; PPI-Affinity, a support-vector-machine regression model, then estimates the binding free energy, which is converted to KD. These predictions enter parameter estimation through a Bayesian likelihood. The team samples the posterior with affine-invariant ensemble MCMC and uses Kullback-Leibler (KL) divergence to quantify how much information the added data contribute, comparing a baseline dataset against an augmented one.
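The free-energy-to-KD step and the likelihood term can be sketched as follows. The conversion uses the standard thermodynamic relation ΔG = RT ln(KD). The temperature, the Gaussian form of the likelihood in log10 space, the `sigma_log10` width, and the function names are all assumptions made for illustration; this article does not specify the authors' exact likelihood.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dg_to_kd(delta_g_kcal, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to KD in molar.

    Uses dG = RT ln(KD); the default temperature is an assumption.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


def log_likelihood_kd(k_on, k_off, kd_pred, sigma_log10=1.0):
    """Hypothetical Gaussian log-likelihood of a predicted KD, in log10 space.

    k_on is in 1/(M*s) and k_off in 1/s, so k_off / k_on is in molar.
    """
    kd_model = k_off / k_on
    resid = math.log10(kd_pred) - math.log10(kd_model)
    norm = math.log(sigma_log10 * math.sqrt(2.0 * math.pi))
    return -0.5 * (resid / sigma_log10) ** 2 - norm


kd_pred = dg_to_kd(-10.0)  # a hypothetical predicted free energy
print(f"predicted KD = {kd_pred:.2e} M")

# Rates whose ratio matches the prediction score higher than rates that do not.
matching = log_likelihood_kd(k_on=1e6, k_off=1e6 * kd_pred, kd_pred=kd_pred)
mismatched = log_likelihood_kd(k_on=1e6, k_off=1.0, kd_pred=kd_pred)
print(matching > mismatched)
```

In a sampler, a term like this would be added to the log-likelihood of the time-series data, so that each proposed pair of rates is scored against both data sources.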
What they found
The framework was tested on two well-established cases: EGFR signaling (a 50-parameter model) and GPCR signaling (an 8-parameter model). GPCRs are biologically notable because roughly a third of FDA-approved drugs target them. In both cases, the ML pipeline beat an uninformative prior at predicting KD. Augmenting with sequence and structure data consistently yielded the most information about protein unbinding rates and significantly improved those estimates, nudging them toward experimentally reported values. Predictions on held-out test data were not fundamentally changed, though other model outputs shifted more, an effect the authors trace to each output's sensitivity to unbinding rates.
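The information comparison described above can be illustrated with a closed-form case. The sketch below assumes that information gain is measured as the KL divergence of a parameter's posterior from its prior, and that both are approximately Gaussian (for example, over a log-transformed unbinding rate). The means and standard deviations are hypothetical, and the authors' actual estimator may differ.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) in nats for univariate normal distributions P and Q."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


# Hypothetical (mean, standard deviation) pairs for one parameter
prior = (0.0, 2.0)
posterior_baseline = (0.1, 1.5)   # time-series data only
posterior_augmented = (0.5, 0.5)  # time-series data plus predicted KD

gain_baseline = kl_gaussian(*posterior_baseline, *prior)
gain_augmented = kl_gaussian(*posterior_augmented, *prior)

print(f"baseline gain:  {gain_baseline:.3f} nats")
print(f"augmented gain: {gain_augmented:.3f} nats")
```

A narrower posterior sits further from the prior in the KL sense, so a parameter that the added data constrain more tightly shows a larger gain.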
Why it matters
The work is a concrete example of coupling AlphaFold-era structure and affinity prediction with mechanistic, dynamic systems-biology models in a way that makes uncertainty explicit. The authors describe it as the first such approach tested on dynamic signaling models, and they release their code openly, offering modelers a reusable route to constrain underdetermined parameters.
Caveats and what to watch
The pipeline was evaluated on just 10 binding reactions, with only one experimental complex structure available, so generalization across larger reaction sets remains to be shown. Worth watching: independent benchmarks on additional signaling models, and whether improvements in affinity predictors further tighten parameter estimates.
Scoring Rationale
Now peer-reviewed in PLOS Computational Biology (originally a 2025 bioRxiv preprint), the study couples AlphaFold 3 and an ML affinity predictor with mechanistic ODE signaling models via Bayesian inference, releases open code, and shows that sequence and structure data most improve unbinding-rate estimates on EGFR and GPCR test cases. It is a solid, methodologically interesting advance of primary interest to systems-biology and scientific-ML practitioners rather than a broad industry development.

