FeatureDCA Enables Controllable Protein Sequence Generation
Caredda et al. (published Feb 19, 2026) introduce FeatureDCA, an autoregressive extension of Direct Coupling Analysis (DCA) that conditions sequence generation on principal components derived from multiple sequence alignments (MSAs). The model matches or surpasses unconditioned Potts and autoregressive baselines in reproducing higher-order sequence statistics while preserving sequence diversity, and structures predicted for its generated sequences with AlphaFold and ESMFold are consistent with the targets. Because generation is conditioned on interpretable features, FeatureDCA enables targeted sampling toward functional and structural subtypes for protein design.
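The conditioning features are described only as principal components derived from the MSA. As a rough illustration of what such a feature can look like, the sketch below one-hot encodes aligned sequences and extracts the first principal component by power iteration, giving each sequence a scalar coordinate that a feature-conditioned generator could take as input. The 21-letter alphabet, the one-hot encoding, and the helper names `one_hot` and `first_principal_component` are assumptions for illustration; this is not the FeatureDCA authors' pipeline or code release.

```python
import math
import random

# Assumed MSA alphabet: 20 amino acids plus the gap symbol (illustrative choice).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot(seq):
    """Hypothetical helper: flatten an aligned sequence into a length L*q binary vector."""
    q = len(ALPHABET)
    vec = [0.0] * (len(seq) * q)
    for pos, aa in enumerate(seq):
        vec[pos * q + INDEX[aa]] = 1.0
    return vec


def first_principal_component(msa, iters=200, seed=0):
    """Hypothetical helper: return (direction, per-sequence scores) for the top PC.

    Centers the one-hot matrix X and runs power iteration with X^T X,
    without ever building the covariance matrix explicitly.
    """
    X = [one_hot(s) for s in msa]
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]

    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # u = X v: one score per sequence
        u = [sum(r[j] * v[j] for j in range(d)) for r in Xc]
        # w = X^T u, then normalize to unit length
        w = [sum(Xc[i][j] * u[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Project each centered sequence onto the top principal direction.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in Xc]
    return v, scores


if __name__ == "__main__":
    toy_msa = ["AAAA", "AAAA", "AAAC", "CCCC", "CCCC", "CCCA"]
    _, toy_scores = first_principal_component(toy_msa)
    print([round(s, 3) for s in toy_scores])
```

The sign of a principal component is arbitrary, so only the relative placement of scores is meaningful; on the toy alignment above, the `AAAA`-like and `CCCC`-like sequences land on opposite sides of zero.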
Scoring Rationale
High novelty and peer-reviewed validation across multiple protein families, supported by a code release; however, the scope remains focused on protein design.

