EyeVLA Gives Robots Active Visual Perception
Researchers at Shanghai Jiao Tong University developed EyeVLA, a pan-tilt-zoom robotic eyeball that uses Qwen2.5-VL (7B), trained with reinforcement learning, to predict discrete camera action tokens. In indoor experiments, EyeVLA acquired clearer, more detailed observations than fixed RGB-D cameras, having learned from about 500 real-world samples plus pseudo-labeled data. The system could improve embodied-robot perception for inspection, warehouse automation, and household robotics.
Key Points
- Implements EyeVLA: a pan-tilt-zoom camera driven by Qwen2.5-VL trained via reinforcement learning.
- Improves detail acquisition over fixed RGB-D setups by planning camera actions and zooms.
- Enables efficient active perception with ~500 real samples, useful for robotics and inspection.
Scoring Rationale
Demonstrates practical active-vision advances with solid experimental evidence; limited generality beyond indoor robot prototypes so far.


