MemReward Improves RL Fine-Tuning With Graph Rewards

March 23, 2026 | By LDS Team

Relevance Score: 9.2

Researchers have introduced MemReward, a graph-based experience memory framework for reinforcement learning (RL) fine-tuning of large language models (LLMs). MemReward stores LLM rollouts as nodes in a heterogeneous graph and uses a graph neural network (GNN) to propagate sparse reward labels from labeled rollouts to unlabeled ones during fine-tuning. The authors submitted the paper to arXiv on March 13, 2026.

In experiments with Qwen2.5-3B and Qwen2.5-1.5B across mathematics, question answering, and code generation, MemReward reaches 97.3% of Oracle performance with 20% of reward labels on the 3B model and 96.6% on the 1.5B model. Performance scales to 99.4% of Oracle at 70% labels, and MemReward surpasses Oracle on out-of-domain tasks.
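To make the idea of a graph-based experience memory more concrete, here is a minimal Python sketch under stated assumptions. The article does not specify MemReward's node schema, edge types, embedding model, or similarity threshold. The class `RolloutGraph`, the relation names `generates`, `concludes`, and `similar`, the 0.9 cosine threshold, and the choice to attach several rollouts to one shared query node are therefore all hypothetical.

```python
# Hypothetical sketch of a heterogeneous rollout graph; not MemReward's code.
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    kind: str                # assumed node types: "query", "thinking", "answer"
    embedding: list[float]   # assumed: a text embedding computed elsewhere


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class RolloutGraph:
    nodes: list[Node] = field(default_factory=list)
    # Undirected edges stored as (smaller_id, larger_id, relation).
    edges: set[tuple[int, int, str]] = field(default_factory=set)

    def add_node(self, kind: str, embedding: list[float]) -> int:
        node = Node(len(self.nodes), kind, embedding)
        self.nodes.append(node)
        return node.node_id

    def add_edge(self, i: int, j: int, relation: str) -> None:
        a, b = sorted((i, j))
        self.edges.add((a, b, relation))

    def add_rollout(self, query_id: int, thinking_emb: list[float],
                    answer_emb: list[float]) -> tuple[int, int]:
        """Attach one rollout (thinking -> answer) to an existing query node."""
        t = self.add_node("thinking", thinking_emb)
        a = self.add_node("answer", answer_emb)
        self.add_edge(query_id, t, "generates")
        self.add_edge(t, a, "concludes")
        return t, a

    def add_similarity_edges(self, threshold: float = 0.9) -> None:
        """Link every pair of same-type nodes with cosine similarity >= threshold."""
        for i, u in enumerate(self.nodes):
            for v in self.nodes[i + 1:]:
                if u.kind == v.kind and cosine(u.embedding, v.embedding) >= threshold:
                    self.add_edge(u.node_id, v.node_id, "similar")


# Usage: one query with two rollouts, using toy 2-d embeddings.
g = RolloutGraph()
q = g.add_node("query", [1.0, 0.0])
t1, a1 = g.add_rollout(q, [0.0, 1.0], [1.0, 1.0])
t2, a2 = g.add_rollout(q, [0.1, 1.0], [1.0, 0.9])
g.add_similarity_edges(threshold=0.9)
print(sorted(g.edges))
```

The pairwise scan in `add_similarity_edges` is quadratic in the number of nodes. A larger memory would more likely rely on an approximate nearest-neighbor index to find candidate similarity edges.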

Key Points

1. Stores rollouts as heterogeneous graph nodes for queries, thinking processes, and answers, linked by similarity edges
2. Trains a GNN on labeled nodes to propagate rewards to unlabeled rollouts, improving label efficiency
3. Achieves ~97% of Oracle performance with 20% of labels and scales to 99.4% at 70% of labels
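The reward-propagation step can be illustrated with a deliberately simple stand-in for a GNN. The sketch below is not MemReward's architecture. It uses SGC-style fixed mean aggregation over each node and its neighbors, trains a logistic-regression head by gradient descent on the labeled nodes only, and then scores every node. The function names, the toy features, and the binary 0/1 reward format are assumptions. It also ignores node types, which a heterogeneous GNN would typically handle with type- or relation-specific weights.

```python
# Hypothetical sketch of reward propagation with a minimal graph model.
# Swapped-in technique: SGC-style fixed mean aggregation + logistic head,
# not MemReward's actual GNN architecture.
import math


def propagate(features, neighbors, hops=2):
    """Repeatedly average each node's features with its neighbors' (self-loop included)."""
    h = [list(x) for x in features]
    for _ in range(hops):
        new_h = []
        for v, x in enumerate(h):
            group = [v] + neighbors.get(v, [])
            new_h.append([sum(h[u][k] for u in group) / len(group) for k in range(len(x))])
        h = new_h
    return h


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_rewards(features, neighbors, labels, lr=1.0, epochs=500, hops=2):
    """Fit on the sparse labeled nodes, then score every node.

    features:  list of per-node feature vectors
    neighbors: dict node_id -> list of adjacent node_ids
    labels:    dict node_id -> 0/1 reward, for the labeled subset only
    Returns a predicted reward probability for every node.
    """
    h = propagate(features, neighbors, hops)
    dim = len(h[0])
    w, b = [0.0] * dim, 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for v, y in labels.items():
            p = sigmoid(sum(wk * hk for wk, hk in zip(w, h[v])) + b)
            err = p - y  # derivative of binary cross-entropy w.r.t. the logit
            for k in range(dim):
                grad_w[k] += err * h[v][k]
            grad_b += err
        w = [wk - lr * gk / n for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b / n
    return [sigmoid(sum(wk * hk for wk, hk in zip(w, hv)) + b) for hv in h]


# Usage: two pairs of linked rollouts; only one node per pair carries a label.
features = [[1.0], [0.0], [-1.0], [0.0]]
neighbors = {0: [1], 1: [0], 2: [3], 3: [2]}
labels = {0: 1, 2: 0}
preds = predict_rewards(features, neighbors, labels)
print([round(p, 3) for p in preds])
```

In this toy graph, the unlabeled nodes 1 and 3 carry no signal of their own (feature 0.0). Each inherits a high or low score from the labeled neighbor it is linked to, which mirrors the label-efficiency idea in the Key Points: propagated scores stand in for rewards on rollouts that were never labeled.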