LLM Automates Multitier Health Content Tagging

In 2025, researchers at Fudan University and Shanghai health agencies developed and validated an automated tagging system that uses a fine-tuned Baichuan2-7B LLM to annotate Chinese health education resources. The work defined a 3-tier taxonomy (10 primary, 34 secondary, 90,562 tertiary tags) and evaluated the pipeline on 1,000 resources. AI-human agreement reached Cohen's κ = 0.54, compared with κ = 0.32 between human annotators, and tags added by the AI had 90% expert-validated precision. The system aims to enable scalable precision health communication.
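
Cohen's κ corrects raw agreement for the agreement two raters would reach by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement rate and p_e is the chance agreement implied by each rater's label frequencies. The sketch below is a minimal illustration, assuming each rater gives one label per item (for example, a yes/no decision on whether a tag applies to a resource). The brief does not say how the study framed its multi-label decisions, so the function name `cohen_kappa` and the toy data are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (one label each)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # p_o: share of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # p_e: agreement expected by chance, from each rater's label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so this sketch treats the case as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy data (illustrative only): 1 = "tag applies", 0 = "tag does not apply".
ai_labels = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
human_labels = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
kappa = cohen_kappa(ai_labels, human_labels)
```

In the toy data the raters agree on 8 of 10 items and each uses both labels half the time, so p_o = 0.8, p_e = 0.5, and κ comes out at about 0.6.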
Key Points
- Develops 3-tier taxonomy and fine-tuned Baichuan2-7B pipeline; evaluates tagging on 1,000 resources.
- Demonstrates higher consistency: AI-human Cohen κ=0.54 versus human-human κ=0.32 baseline.
- Shows AI identifies missed tags with 90% expert-validated precision, improving annotation completeness.
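
The precision figure for AI-added tags can be read as: of the tags the model assigned that the human annotator did not, what fraction did experts accept? A minimal sketch of that calculation for a single resource follows, assuming tags are plain strings and expert review yields a set of approved tags. The function name, argument names, and example tags are hypothetical and not taken from the study.

```python
def added_tag_precision(ai_tags, human_tags, expert_approved):
    """Share of AI-only tags that expert reviewers accepted.

    Returns None when the AI added no tags beyond the human annotation.
    """
    # Tags the AI assigned that the human annotator did not.
    added = set(ai_tags) - set(human_tags)
    if not added:
        return None
    # Of those added tags, keep the ones experts judged correct.
    validated = added & set(expert_approved)
    return len(validated) / len(added)


# Toy example (illustrative only).
ai = {"diabetes", "diet", "exercise", "sleep"}
human = {"diabetes"}
approved = {"diet", "exercise"}
precision = added_tag_precision(ai, human, approved)
```

Here the AI adds three tags beyond the human annotation and experts accept two of them, so the precision for this resource is 2/3. A corpus-level figure would pool the added and validated counts across all resources before dividing.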
Scoring Rationale
Strong empirical validation and expert adjudication across 1,000 samples; novelty limited to application-specific tagging deployment.


