Transformer Model Predicts And Designs Biosynthetic Gene Clusters
Researchers (Kawano et al.) published on February 23, 2026 a transformer-based framework that treats protein domains as tokens to predict and design biosynthetic gene clusters (BGCs). The models were trained on four datasets, including 2,492 MIBiG-validated BGCs. They ranked the true domain first more than 50% of the time and exceeded 70% classification accuracy for major classes. In experimental validation, expression produced a previously unknown cyclooctatin derivative.
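To make the "ranked true domains first" figure concrete, here is a minimal sketch of how a top-1 ranking rate could be computed when a gene cluster is written as a sequence of domain tokens and one token is hidden. The domain labels, the scores, and the helper names `rank_of_true` and `top1_rate` are all illustrative assumptions; none of them come from the paper, and no real model is called.

```python
from typing import Dict, List


def rank_of_true(scores: Dict[str, float], true_token: str) -> int:
    """Return the 1-based rank of the true token when candidates
    are sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(true_token) + 1


def top1_rate(predictions: List[Dict[str, float]], truths: List[str]) -> float:
    """Fraction of hidden positions where the true token is ranked first."""
    hits = sum(1 for s, t in zip(predictions, truths) if rank_of_true(s, t) == 1)
    return hits / len(truths)


# Hypothetical cluster written as domain tokens (illustrative labels only).
bgc_tokens = ["KS", "AT", "ACP", "TE"]

# Made-up candidate scores for two hidden positions; a trained model
# would supply these in practice.
toy_scores = [
    {"KS": 0.1, "AT": 0.2, "ACP": 0.6, "TE": 0.1},  # true token: ACP
    {"KS": 0.5, "AT": 0.3, "ACP": 0.1, "TE": 0.1},  # true token: AT
]
toy_truths = ["ACP", "AT"]

rate = top1_rate(toy_scores, toy_truths)
print(f"top-1 rate: {rate:.2f}")
```

In this toy case the true token is ranked first at one of the two hidden positions, so the rate is 0.50. The reported "over 50%" figure would be the same kind of fraction taken over the authors' evaluation set.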
Scoring Rationale
Strong experimental validation and public models increase utility, but the approach targets microbial BGCs, which limits its immediate broader applicability.
