Transformers Develop Shape Perception From Embodiment
Pandey, Samantha M. W. Wood, and Justin N. Wood published a paper in PLoS Computational Biology on December 15, 2025, presenting computational evidence that shape perception develops from three ingredients: generic fitting models, embodied visual experiences, and biologically plausible retinas. The authors trained transformer models in embodied simulated environments, ran controlled-rearing experiments showing that view diversity drives shape learning, and demonstrated that retinal preprocessing can substitute for artificial augmentations. Together, these results offer a practical template for machine vision.
Key Points
- Show that transformer models trained on embodied visual streams shift from color-based to shape-based representations.
- Demonstrate that view diversity (many temporally linked views of the same object) is the causal factor producing shape-centric perception.
- Advise practitioners to use embodied data collection and retina-like preprocessing to build robust shape representations.
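To make "retina-like preprocessing" concrete, here is a minimal sketch of one common approximation: keeping the center of each frame sharp and blurring it progressively toward the periphery. This is an illustration only and is not the retina model used in the paper. The function names (`box_blur`, `foveate`) and the parameters `fovea_radius` and `blur_radius` are hypothetical choices made for this example.

```python
import numpy as np


def box_blur(img: np.ndarray, radius: int) -> np.ndarray:
    """Mean-filter an (H, W, C) float image with a (2r+1)x(2r+1) window, edge-padded."""
    if radius == 0:
        return img.astype(np.float64)
    k = 2 * radius + 1
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    out = np.zeros(img.shape, dtype=np.float64)
    # Sum every shifted copy of the image that falls inside the window.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def foveate(img: np.ndarray, fovea_radius: float = 0.25, blur_radius: int = 3) -> np.ndarray:
    """Blend a sharp image with a blurred copy, weighted by distance from the center.

    Assumption (not from the paper): eccentricity is normalized so that 1.0 is
    roughly the image edge, pixels inside `fovea_radius` stay fully sharp, and
    the blur weight ramps linearly to 1.0 at eccentricity 1.0.
    """
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ecc = np.hypot((ys - cy) / (h / 2), (xs - cx) / (w / 2))
    weight = np.clip((ecc - fovea_radius) / (1 - fovea_radius), 0.0, 1.0)[..., None]
    return (1 - weight) * img + weight * box_blur(img, blur_radius)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((64, 64, 3))  # stand-in for one frame of an embodied visual stream
    retinal = foveate(frame)
    print(retinal.shape)
```

A transform of this kind would be applied to every frame before it reaches the model, taking the place of hand-designed augmentations; whether it reproduces the reported effect depends on the actual retina model and training setup described in the paper.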
Scoring Rationale
Demonstrates a clear computational mechanism via controlled-rearing experiments and transformer models, but is limited by the absence of in vivo biological validation.