ReWiND Teaches Robot Policies Without Per-Task Demonstrations

In their CoRL 2025 paper, Jiahui Zhang et al. introduce ReWiND, a three-stage framework that learns dense language-conditioned reward functions from a handful of demonstrations, then uses those rewards to train and fine-tune robot manipulation policies without per-task demonstrations. In simulation, ReWiND attains roughly 79% interquartile-mean (IQM) success on unseen MetaWorld tasks; on a real robot, it raises average success from 12% to 68% with about one hour of RL.
Key Points
- Introduces ReWiND, a three-stage method that learns a language-conditioned dense reward function from five demos
- Enables offline pretraining and online fine-tuning so policies adapt efficiently to unseen tasks
- Delivers strong gains: ~79% IQM success in MetaWorld and real-robot success rising from 12% to 68%
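For readers unfamiliar with the IQM figure quoted above: the interquartile mean discards the lowest and highest 25% of scores and averages the rest, which makes it less sensitive to outlier runs than a plain mean. The sketch below is only an illustration of that metric, not the paper's evaluation code. The function name and the sample success rates are made up for this example, and trimming a whole number of samples per tail is a simplification for sample counts that are not a multiple of four.

```python
def interquartile_mean(scores):
    """Average the middle 50% of scores (drop the bottom and top quarter)."""
    if not scores:
        raise ValueError("scores must be non-empty")
    ordered = sorted(scores)
    n = len(ordered)
    cut = n // 4  # assumption: trim whole samples from each tail
    middle = ordered[cut:n - cut]
    return sum(middle) / len(middle)


# Hypothetical per-run success rates, NOT taken from the paper.
runs = [0.0, 0.25, 0.5, 0.75, 0.75, 1.0, 1.0, 1.0]
iqm = interquartile_mean(runs)
plain_mean = sum(runs) / len(runs)
print(f"IQM = {iqm:.3f}, mean = {plain_mean:.3f}")
```

With these sample values the single failed run pulls the plain mean down, while the IQM ignores it along with the top scores.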
Scoring Rationale
Strong methodological novelty and real-world results, but limited to the robot-manipulation niche and evaluated on specific task sets.


