Large Language Models Struggle With DFA Construction

Researchers have introduced a benchmark that tests the ability of large language models to construct deterministic finite automata (DFAs) from regular-language descriptions; the preprint was submitted Jan. 19, 2026. Models achieve perfect accuracy on factual items and 84–90% on seen constructions, but accuracy drops by 30–64% on unseen problems, both handcrafted and generated via Arden's theorem. Failures include constraint misinterpretation, Kleene-star errors, and global inconsistency, and a hint protocol only partially corrects shallow mistakes.
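To make the task concrete, here is a minimal sketch of what a DFA construction item and an automated check might look like. The language chosen (strings over {a, b} with an even number of a's), the dictionary layout, and the names `accepts` and `disagreements` are illustrative assumptions, not taken from the benchmark, whose problem format and grading method are not described here.

```python
import re
from itertools import product

# Hypothetical task: accept strings over {a, b} with an even number of a's.
# A DFA needs a start state, accepting states, and a total transition function.
DFA = {
    "start": "even",
    "accept": {"even"},
    "delta": {
        ("even", "a"): "odd",
        ("even", "b"): "even",
        ("odd", "a"): "even",
        ("odd", "b"): "odd",
    },
}

# Reference regular expression for the same language (an assumption of this sketch).
REFERENCE = re.compile(r"(b*ab*a)*b*")


def accepts(dfa, word):
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accept"]


def disagreements(dfa, alphabet="ab", max_len=6):
    """List every word up to max_len where the DFA and the reference regex differ."""
    bad = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if accepts(dfa, word) != bool(REFERENCE.fullmatch(word)):
                bad.append(word)
    return bad
```

Exhaustive comparison up to a bounded length is only a sanity check, not a proof of equivalence. It is, however, the kind of test that would surface a Kleene-star slip or a globally inconsistent transition table as a concrete counterexample string.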
Scoring Rationale
Moderate empirical novelty and broad relevance, limited by a single preprint source and a constrained problem scope.

