Mathematicians Launch Challenge Testing AI Problem-Solving
A team of 11 mathematicians led by Harvard and Stanford professors launched First Proof, releasing 10 research problems with encrypted solutions on Feb. 5 and revealing the solutions on Feb. 13. The problems span number theory, combinatorics, topology and numerical linear algebra, and were devised to benchmark AI systems. In preliminary tests, leading LLMs solved only two of the 10 problems. The effort aims to gauge the limits of AI in research mathematics.


