Bolmo Converts Olmo 3 Into Byte-Level Models

The Allen Institute for AI today introduced Bolmo, a new family of byte-level language models created by byteifying Olmo 3 into Bolmo 7B and Bolmo 1B, with performance competitive with or superior to subword models. Bolmo reuses the Olmo 3 transformer and adds a lightweight local encoder and a boundary predictor. It requires 9.8B tokens for initial training and 39.3B tokens for full fine-tuning, and it improves accuracy on character-level benchmarks by nearly 20 points.
Key Points
- Demonstrates byteifying Olmo 3 into Bolmo 7B and 1B with retained transformer backbone
- Shows substantial character-level gains: nearly twenty-point accuracy improvement on CUTE/EXECUTE benchmarks
- Enables practical deployment: faster training, tunable bytes-per-patch compression, and zero-cost capability transfer
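To make the "bytes-per-patch" idea concrete, here is a minimal Python sketch of how byte-level input can be grouped into patches and how the compression ratio is computed. It is an illustration under stated assumptions, not Bolmo's implementation: the function names are hypothetical, and the whitespace rule `ends_at_space` is only a stand-in for Bolmo's boundary predictor, whose internals this summary does not describe.

```python
from typing import Callable, List


def to_byte_ids(text: str) -> List[int]:
    """Encode text as UTF-8 and return one integer (0-255) per byte."""
    return list(text.encode("utf-8"))


def split_into_patches(
    byte_ids: List[int], is_boundary: Callable[[int], bool]
) -> List[List[int]]:
    """Group bytes into patches, closing a patch after each boundary byte."""
    patches: List[List[int]] = []
    current: List[int] = []
    for b in byte_ids:
        current.append(b)
        if is_boundary(b):
            patches.append(current)
            current = []
    if current:  # keep any trailing bytes as a final patch
        patches.append(current)
    return patches


def bytes_per_patch(patches: List[List[int]]) -> float:
    """Average number of bytes per patch (the compression ratio)."""
    return sum(len(p) for p in patches) / len(patches)


def ends_at_space(byte_id: int) -> bool:
    """Toy boundary rule (an assumption): a patch ends at an ASCII space."""
    return byte_id == 0x20


byte_ids = to_byte_ids("byte level models")
patches = split_into_patches(byte_ids, ends_at_space)
ratio = bytes_per_patch(patches)
print(len(byte_ids), [len(p) for p in patches], ratio)
```

In this toy setting, a boundary rule that fires less often produces longer patches and a higher bytes-per-patch ratio, which is the quantity the summary describes as tunable.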
Scoring Rationale
High novelty and practical applicability across models, supported by public benchmarks; limited to compatibility with Olmo-family embeddings.