Researchers Release GENERator Long-Context Genomic Model

Researchers released GENERator, a generative genomic foundation model (submitted to arXiv on Jan 22, 2026) that models DNA with a 98,000-nucleotide context window and is pre-trained on 386 billion eukaryotic nucleotides. Without fine-tuning, it yields phylogenetically coherent embeddings and competitive zero-shot variant effect prediction; with task-specific fine-tuning, it achieves state-of-the-art benchmark results and enables the design of protein-coding sequences and cis-regulatory elements validated by UMI-STARR-seq.
Scoring Rationale
High novelty and broad applicability from large-scale, long-context pretraining, offset by preprint status and limited peer review.
