ChatGPT Outperforms Humans on Japan Entrance Exams

According to LifePrompt Inc., OpenAI's ChatGPT 5.2 Thinking model scored higher than the top successful human applicants on the 2026 entrance exams for the University of Tokyo and Kyoto University. Per LifePrompt, the model scored 503 out of 550 on the University of Tokyo Natural Sciences exam and 452 out of 550 on Humanities and Social Sciences, exceeding the top applicants' marks of 453 and 434 respectively, with a perfect score in mathematics, 90% in English, and 25% on essay-style history questions graded by teachers from the cram school Kawai Juku. For Kyoto University, LifePrompt reported 771 in the Faculty of Law and 1,176 in the Faculty of Medicine, both above the highest passing scores. Kyodo News, The Straits Times, and the Bangkok Post reported on LifePrompt's tests.
What happened
According to LifePrompt Inc., OpenAI's ChatGPT 5.2 Thinking model took the 2026 undergraduate entrance exams for the University of Tokyo and Kyoto University and scored higher than the highest successful human applicants, as reported by Kyodo News, The Straits Times, and the Bangkok Post. Per LifePrompt, the model scored 503 out of 550 on the University of Tokyo Natural Sciences exam and 452 out of 550 on Humanities and Social Sciences, against university-listed top scores of 453 and 434 respectively. LifePrompt also reported a perfect score in mathematics, 90% in English, and 25% on essay-style questions such as those in world history. For Kyoto University, LifePrompt reported 771 on the Faculty of Law exam versus a top passing score of 734, and 1,176 on the Faculty of Medicine exam versus 1,098 for the top scorer.

The tests were run by converting exam questions into image data and feeding them to the model; essay answers were graded by teachers from the cram school Kawai Juku, per the reporting. Kyodo News and other outlets noted LifePrompt's prior experiments, including a failed 2024 run with ChatGPT 4 and a subsequent passing-level result with a model named o1 in 2025. LifePrompt head Satoshi Endo was quoted in Kyodo News and Bangkok Post coverage as saying, "The AI's capabilities have been well documented. Given the rapid pace of AI evolution, companies will need to adopt AI with an eye toward how business operations will look in 10 to 20 years."
Editorial analysis: technical context
The LifePrompt tests highlight two technical points relevant to practitioners. First, evaluating a language model on image-based inputs requires a reliable image-to-text pipeline: question conversion, layout preservation, and OCR accuracy all materially affect what the model actually receives. Second, the reported split in performance (near-perfect scores on structured mathematical problems, weaker results on open-ended essays) is consistent with recent improvements in chain-of-thought reasoning and symbolic problem solving in large multimodal models, set against persistent weaknesses in long-form subjective composition and argumentation that draws on broad world knowledge. The numerical scores should be treated as the output of one specific evaluation setup rather than as an absolute measure of general intelligence; in this case the essays were human-graded, adding a subjective element to the final tallies.
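To make the first point concrete, here is a minimal sketch of one way such an image-based evaluation pipeline could be wired up with the OpenAI Python SDK. The model name, file paths, and prompt wording are placeholders; LifePrompt has not published its prompts or code, so this illustrates the general technique, not its actual method.

```python
import base64
from pathlib import Path

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def encode_page(path: Path) -> str:
    """Base64-encode a scanned exam page for the vision API."""
    return base64.b64encode(path.read_bytes()).decode("utf-8")


def answer_exam_page(image_path: Path, model: str = "gpt-4o") -> str:
    """Send one exam-page image to the model and return its answer.

    The model name, prompt wording, and one-page-at-a-time framing
    are assumptions; LifePrompt has not disclosed its exact setup.
    """
    b64 = encode_page(image_path)
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Solve the exam questions on this page. "
                         "Show your working and state final answers."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


# Example: iterate over the scanned pages of one exam section.
for page in sorted(Path("exam_scans").glob("*.png")):
    print(answer_exam_page(page))
```

Even in a sketch this small, the consequential choices are visible: scan resolution, page segmentation, and how diagrams are rendered all determine what the model actually "sees" before any reasoning happens.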
Industry context
Industry observers have used standardized tests as repeatable probes of model capabilities because they combine domain knowledge, reasoning, and time-limited problem solving. Reporting on these Japanese entrance-exam results fits that pattern: public-facing academic benchmarks are low-friction ways to show progress to nontechnical audiences. At the same time, such demonstrations do not directly translate to real-world deployment readiness for education, proctoring, or assessment tasks without addressing test security, adversarial input, and human-evaluation variability.
What to watch
Observers and practitioners should look for:
- reproducibility and methodology disclosure from LifePrompt or independent researchers, including exact prompts and OCR pipeline details
- cross-evaluator agreement on essay grading and whether rubric-based scoring changes relative rankings (see the sketch after this list)
- follow-up evaluations on adversarially framed questions and time-constrained problem solving
- official statements from the examined institutions or test authorities about permitted use of their materials, which would clarify policy and integrity implications
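On the grading-agreement point, a standard chance-corrected statistic such as Cohen's kappa is one way to quantify how much two essay graders agree beyond luck. Below is a minimal sketch using scikit-learn and invented placeholder scores; the actual Kawai Juku grades are not public.

```python
from sklearn.metrics import cohen_kappa_score  # pip install scikit-learn

# Hypothetical essay scores (0-25 scale) from two independent graders;
# these numbers are illustrative, not LifePrompt data.
grader_a = [18, 12, 20, 7, 15, 22, 9, 14]
grader_b = [16, 13, 21, 5, 15, 20, 11, 14]

# Quadratic weighting penalizes large disagreements more than small
# ones, which suits ordinal rubric scores.
kappa = cohen_kappa_score(grader_a, grader_b, weights="quadratic")
print(f"Weighted Cohen's kappa: {kappa:.2f}")
```

A kappa near 1 indicates strong agreement; values much lower would suggest the reported essay results are sensitive to the choice of grader, which is exactly the variability independent replication should measure.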
Scoring rationale
The results demonstrate meaningful progress in the multimodal and reasoning capabilities that matter to ML practitioners, but they come from a single, lab-style evaluation with methodological caveats and limited direct implications for deployment.