LLMs Extract Drug Discontinuations From Estonian EHRs

Per a JMIR preprint by Suvalov et al., researchers combined prescription records with free-text anamneses from a 10% sample of the Estonian population (2012-2019) to identify drug discontinuation events and their reasons. The study applied Llama-3.1-70B and GPT-4o to extract discontinuation phrases, map them to a clinician-developed taxonomy, and label who initiated the discontinuation; performance was evaluated on 100 randomly selected cases per drug group (statins and antidiabetic medications). The work demonstrates a practical application of LLMs to a low-resource language in pharmacoepidemiology, highlighting both the potential gains for large-scale adherence research and the need for careful validation on clinical free text.
What happened
Per a JMIR preprint by Suvalov et al., the authors merged prescription data with free-text clinical anamneses from a 10% sample of the Estonian population covering 2012-2019. The study targeted discontinuations for statins and antidiabetic medications and applied two large language models, Llama-3.1-70B and GPT-4o, to:
- extract discontinuation phrases
- classify reasons using a clinician-developed taxonomy
- identify whether the patient or clinician initiated the discontinuation

Performance was measured on 100 randomly chosen cases per drug group, as reported in the preprint.
Technical details
The preprint documents using Llama-3.1-70B and GPT-4o for information extraction and classification from Estonian-language clinical notes. The authors developed a taxonomy of discontinuation reasons with clinician input and applied the models to link free-text evidence to structured prescription records. The manuscript presents validation on a held-out sample; exact performance metrics are reported in the preprint.
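As a rough illustration of what linking free-text evidence to structured prescription records could look like, here is a minimal Python sketch. It is not the authors' pipeline: the `Prescription` and `Discontinuation` types, the `link_to_prescription` helper, and the reason and initiator labels are all hypothetical, and the model call that would populate a `Discontinuation` is left out entirely. The linking rule (latest prescription for the same patient and drug group on or before the note date) is likewise an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional

# Hypothetical labels for illustration only; the preprint's
# clinician-developed taxonomy is not reproduced here.
REASONS = {"side_effect", "inefficacy", "access_barrier", "other"}
INITIATORS = {"patient", "clinician", "unknown"}


@dataclass(frozen=True)
class Prescription:
    patient_id: str
    drug_group: str  # e.g. "statin" or "antidiabetic"
    issued: date


@dataclass(frozen=True)
class Discontinuation:
    patient_id: str
    drug_group: str
    note_date: date
    phrase: str      # text span a model extracted from the note
    reason: str      # must be one of REASONS
    initiator: str   # must be one of INITIATORS

    def __post_init__(self) -> None:
        # Reject labels outside the assumed taxonomy so malformed
        # model output fails loudly instead of entering the dataset.
        if self.reason not in REASONS:
            raise ValueError(f"unknown reason: {self.reason!r}")
        if self.initiator not in INITIATORS:
            raise ValueError(f"unknown initiator: {self.initiator!r}")


def link_to_prescription(
    event: Discontinuation, prescriptions: Iterable[Prescription]
) -> Optional[Prescription]:
    """Return the latest prescription for the same patient and drug
    group issued on or before the note date, or None if there is none."""
    candidates = [
        p for p in prescriptions
        if p.patient_id == event.patient_id
        and p.drug_group == event.drug_group
        and p.issued <= event.note_date
    ]
    return max(candidates, key=lambda p: p.issued, default=None)
```

Validating labels at construction time is one way to keep free-form model output from silently introducing categories that the taxonomy does not contain.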
Context and significance
Applying LLMs to extract clinically relevant events from free text addresses a long-standing barrier in pharmacoepidemiology: the rationale for a discontinuation is frequently recorded only in narrative notes. Systems that reliably pair prescriptions with extracted reasons could enable higher-fidelity signal detection for side effects, inefficacy, or access barriers. A concurrent Harvard / Brigham and Women's Hospital preprint (arXiv 2506.11137) addresses the same problem on English-language EHR datasets and reports that LLM-based medication status extraction scales without human annotation, which supports the broader applicability of this approach.
What to watch
Three things are worth watching: the peer-reviewed JMIR publication, for full performance metrics and error analysis; replication in other languages or EHR systems; and whether the authors release the taxonomy, annotation guidelines, or evaluation code to enable reproducibility. External replication and transparent error breakdowns (false positives versus false negatives, initiator misclassification) will determine practical utility for downstream clinical research.
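To make the idea of an error breakdown concrete, the following Python sketch tallies false positives, false negatives, and initiator confusions from manually reviewed cases. The function names and the label strings are hypothetical, nothing here reflects the preprint's own evaluation code, and any inputs would be the reader's own annotated sample rather than reported results.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def error_breakdown(gold: Sequence[bool], pred: Sequence[bool]) -> dict:
    """Count outcomes for a binary task such as 'does this note
    contain a discontinuation?' and derive precision and recall."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    tn = sum(1 for g, p in pairs if not g and not p)
    # Leave a metric undefined (None) when its denominator is zero.
    precision: Optional[float] = tp / (tp + fp) if (tp + fp) else None
    recall: Optional[float] = tp / (tp + fn) if (tp + fn) else None
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


def initiator_confusion(
    gold: Sequence[str], pred: Sequence[str]
) -> Dict[Tuple[str, str], int]:
    """Count (gold, predicted) initiator label pairs; off-diagonal
    pairs are the initiator misclassifications."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return dict(Counter(zip(gold, pred)))
```

With only 100 reviewed cases per drug group, reporting the raw counts alongside precision and recall lets readers judge how much weight each ratio can bear.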
Scoring Rationale
A solid niche preprint demonstrating LLM application to pharmacoepidemiology in a low-resource (Estonian) language, using population-scale prescription and free-text EHR data. Relevant to clinical NLP and pharmacoepidemiology practitioners but limited by single-country scope, small evaluation set (100 cases per drug group), and preprint status pending peer review.

