Research | multimodal LLM | multisensor fusion | embodied perception
HoloLLM Enables Robust Multisensory Language-Grounded Perception
Relevance Score: 8.1
In a Feb. 24, 2026 arXiv preprint, researchers introduce HoloLLM, a multimodal large language model that integrates LiDAR, infrared, mmWave radar, and WiFi sensing for language-grounded human perception. They propose a Universal Modality-Injection Projector (UMIP) to align these rarely used sensor signals with text, along with a human-VLM collaborative pipeline for curating paired data. Experiments on two new benchmarks report accuracy improvements of up to 30%, advancing embodied multisensory intelligence.
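The preprint's UMIP internals aren't reproduced here, but as a rough illustration of the general idea of a modality-injection projector, below is a minimal, hypothetical PyTorch sketch: learnable queries attend to features from a non-text sensor encoder and are mapped into the language model's token-embedding space. All class names, dimensions, and parameters are our assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn


class ModalityInjectionProjector(nn.Module):
    """Hypothetical sketch of a modality-injection projector.

    Projects features from a non-text sensor encoder (e.g. LiDAR,
    mmWave radar, WiFi) into pseudo-tokens in a language model's
    embedding space, so they can be consumed alongside text tokens.
    NOT the paper's UMIP; structure and sizes are illustrative only.
    """

    def __init__(self, sensor_dim: int, llm_dim: int, num_queries: int = 32):
        super().__init__()
        # Learnable coarse queries, refined by the sensor features.
        self.queries = nn.Parameter(torch.randn(num_queries, llm_dim) * 0.02)
        # Project raw sensor features into the LLM embedding width.
        self.sensor_proj = nn.Linear(sensor_dim, llm_dim)
        # Cross-attention: queries attend over projected sensor features.
        self.attn = nn.MultiheadAttention(llm_dim, num_heads=8, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(llm_dim, 4 * llm_dim),
            nn.GELU(),
            nn.Linear(4 * llm_dim, llm_dim),
        )
        self.norm1 = nn.LayerNorm(llm_dim)
        self.norm2 = nn.LayerNorm(llm_dim)

    def forward(self, sensor_feats: torch.Tensor) -> torch.Tensor:
        # sensor_feats: (batch, seq_len, sensor_dim) from a frozen encoder.
        batch = sensor_feats.size(0)
        kv = self.sensor_proj(sensor_feats)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        # Inject fine-grained sensor information into the coarse queries.
        attn_out, _ = self.attn(q, kv, kv)
        x = self.norm1(q + attn_out)
        x = self.norm2(x + self.mlp(x))
        # (batch, num_queries, llm_dim): pseudo-tokens for the LLM input.
        return x


# Example: map 128 mmWave feature vectors into a 1024-dim LLM space.
proj = ModalityInjectionProjector(sensor_dim=256, llm_dim=1024)
tokens = proj(torch.randn(2, 128, 256))
print(tokens.shape)  # torch.Size([2, 32, 1024])
```

The fixed query count is one common design choice in such projectors: it yields a constant number of pseudo-tokens per sensor regardless of input length, which keeps the LLM's context budget predictable when several modalities are injected at once.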