TongueVLM Achieves Multimodal Tongue Diagnosis Accuracy

Researchers from Hefei University of Technology and collaborators developed TongueVLM, a multimodal large model for traditional Chinese medicine tongue-image diagnosis, published in JMIR Medical Informatics (2026). The LLaMA-based, 7B-parameter model uses CLIP-ViT visual encoding and modal fusion. Evaluated on three test datasets of 3,000 samples each, it achieved 79.8%, 78.6%, and 60.7% accuracy, outperforming baseline VLMs.
Scoring Rationale
Strong peer-reviewed evaluation and clear empirical gains over baseline VLMs drive the score, but the niche TCM focus limits wider impact.
Sources
- Application of a Large Visual Language Model on Tongue Image Description Generation and Physical Constitution Reasoning in Traditional Chinese Medicine (TongueVLM): Model Development and Validation St… (medinform.jmir.org)


