Large Language Models Match Logistic Regression Diagnostic Accuracy

In 2026, researchers benchmarked multiple LLMs on classifying Parkinson disease, using natural-language prompts derived from PPMI structured clinical variables. On a 122-participant test set, logistic regression (LR) achieved a macro F1 of 0.960 (accuracy 0.975), while few-shot LLM prompting reached up to 0.987 F1 (accuracy 0.992). On a 31-participant temporal validation set, LLMs achieved up to 0.968 F1 versus 0.903 for LR. The study reports prompt sensitivity, stochastic variability, and a limited temporal sample size.
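The summary reports both macro F1 and accuracy, and the two can diverge when classes are imbalanced. The following is a minimal sketch of how the two metrics are computed, not the study's evaluation code. The function name `macro_f1_and_accuracy`, the labels "PD" and "HC", and the toy predictions are all hypothetical and are not drawn from the paper.

```python
def macro_f1_and_accuracy(y_true, y_pred):
    """Return (macro F1, accuracy) for two equal-length label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class_f1 = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        per_class_f1.append(2 * tp / denom if denom else 0.0)
    # Macro averaging weights every class equally, regardless of its size
    macro_f1 = sum(per_class_f1) / len(per_class_f1)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return macro_f1, accuracy


# Hypothetical toy data: 8 "PD" cases and 2 "HC" controls, one control misclassified
y_true = ["PD"] * 8 + ["HC"] * 2
y_pred = ["PD"] * 8 + ["PD", "HC"]

macro_f1, accuracy = macro_f1_and_accuracy(y_true, y_pred)
print(f"macro F1 = {macro_f1:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case a single error in the minority class pulls macro F1 well below accuracy, which is why a macro F1 lower than the accuracy figure is plausible on a test set with unequal class sizes.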
Scoring Rationale
Relevant and peer-reviewed with moderate novelty; small temporal validation and exploratory design significantly limit generalizability.

