Joint Reward Modeling Improves Vision-Language Evaluation

A team led by Yankai Yang introduces Joint Reward Modeling (JRM) in a preprint submitted Feb 7, 2026. JRM trains a shared vision-language backbone on preference-learning and language-modeling objectives at the same time, with the goal of evaluating image-editing outputs more efficiently. The approach internalizes the semantic reasoning of generative models into discriminative representations, achieves state-of-the-art results on MMRB2 and EditReward-Bench, and improves both stability and performance in downstream online reinforcement learning.
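To make the idea of a joint objective concrete, here is a minimal sketch of how a preference loss and a language-modeling loss could be combined into one training signal. It is an illustration under stated assumptions, not the paper's formulation: the Bradley-Terry pairwise loss, the mean token negative log-likelihood, the weighted sum, and the names `preference_loss`, `language_modeling_loss`, `joint_loss`, and `lm_weight` are all hypothetical choices made for this example.

```python
import math


def preference_loss(score_chosen, score_rejected):
    """Bradley-Terry pairwise loss: -log(sigmoid(score_chosen - score_rejected)).

    Assumed form for illustration; the source does not specify the loss.
    """
    margin = score_chosen - score_rejected
    # Numerically stable softplus(-margin).
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)


def language_modeling_loss(token_log_probs):
    """Mean negative log-likelihood over the target tokens."""
    return -sum(token_log_probs) / len(token_log_probs)


def joint_loss(score_chosen, score_rejected, token_log_probs, lm_weight=0.5):
    """Weighted sum of the two objectives.

    In a real model both terms would be computed from the same backbone;
    lm_weight is a hypothetical hyperparameter.
    """
    pref = preference_loss(score_chosen, score_rejected)
    lm = language_modeling_loss(token_log_probs)
    return pref + lm_weight * lm


if __name__ == "__main__":
    # Toy numbers: the preferred edit scores higher, and the model assigns
    # probability 0.5 to each of two target tokens.
    toy_log_probs = [math.log(0.5), math.log(0.5)]
    print(joint_loss(2.0, 0.0, toy_log_probs))
```

In this sketch, setting `lm_weight` to zero reduces the objective to pure preference learning, which is one simple way to compare a joint setup against a discriminative-only baseline.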
Scoring Rationale
Strong methodological novelty and state-of-the-art empirical results, tempered by being a single arXiv preprint without peer review.

