Research | qwen2.5 vl | active vision | pan tilt zoom | reinforcement learning

EyeVLA Enables Robots Active Visual Perception

December 3, 2025 | By LDS Team

Relevance Score: 7.0


Researchers at Shanghai Jiao Tong University developed EyeVLA, a pan-tilt-zoom robotic eyeball that uses Qwen2.5-VL (7B) and reinforcement learning to predict discrete camera action tokens. In indoor experiments, EyeVLA acquired clearer, more detailed observations than fixed RGB-D cameras after training on about 500 real-world samples plus pseudo-labeled data. The system could improve embodied-robot perception for inspection, warehouse automation, and household robotics.

Key Points

1. Implements EyeVLA: a pan-tilt-zoom camera controlled by Qwen2.5-VL trained via reinforcement learning.
2. Improves detail acquisition over fixed RGB-D setups by planning camera movements and zoom levels.
3. Enables sample-efficient active perception with ~500 real samples, useful for robotics and inspection.
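
To make the idea of "discrete camera action tokens" concrete, here is a minimal Python sketch of one way continuous pan, tilt, and zoom values could be binned into tokens and decoded back into camera commands. This is an illustration only, not EyeVLA's actual scheme: the axis ranges, bin counts, token format, and the names `Axis`, `encode`, and `decode` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    """One camera degree of freedom, split into equal-width bins (hypothetical)."""
    name: str
    low: float
    high: float
    bins: int

    def to_bin(self, value: float) -> int:
        # Clamp to the valid range, then map to a bin index in [0, bins - 1].
        clamped = min(max(value, self.low), self.high)
        frac = (clamped - self.low) / (self.high - self.low)
        return min(int(frac * self.bins), self.bins - 1)

    def to_value(self, index: int) -> float:
        # Decode a bin index to the center of that bin.
        width = (self.high - self.low) / self.bins
        return self.low + (index + 0.5) * width


# Assumed ranges and resolutions, chosen only for illustration.
AXES = (
    Axis("pan", -90.0, 90.0, 36),    # degrees
    Axis("tilt", -30.0, 30.0, 12),   # degrees
    Axis("zoom", 1.0, 5.0, 8),       # magnification factor
)


def encode(pan: float, tilt: float, zoom: float) -> list[str]:
    """Turn a continuous camera pose into one token per axis."""
    return [f"<{a.name}_{a.to_bin(v)}>" for a, v in zip(AXES, (pan, tilt, zoom))]


def decode(tokens: list[str]) -> dict[str, float]:
    """Turn predicted tokens back into a continuous camera command."""
    command = {}
    for axis, token in zip(AXES, tokens):
        name, index = token.strip("<>").rsplit("_", 1)
        if name != axis.name:
            raise ValueError(f"expected a {axis.name} token, got {token}")
        command[axis.name] = axis.to_value(int(index))
    return command


if __name__ == "__main__":
    tokens = encode(pan=0.0, tilt=0.0, zoom=1.0)
    print(tokens)
    print(decode(tokens))
```

With a scheme like this, a vision-language model only has to emit a few extra vocabulary items per step, and the decoded command is accurate to within half a bin width on each axis. Finer bins reduce that quantization error at the cost of a larger action vocabulary.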