MemReward Improves RL Fine-Tuning With Graph Rewards

Researchers introduce MemReward, a graph-based experience memory framework for reinforcement learning (RL) fine-tuning of large language models (LLMs). MemReward stores LLM rollouts as nodes in a heterogeneous graph and uses a graph neural network (GNN) to propagate sparse reward labels from the few labeled rollouts to the unlabeled ones. The authors submitted the paper to arXiv on March 13, 2026. In experiments on Qwen2.5-3B and Qwen2.5-1.5B covering mathematics, question answering, and code generation, MemReward reaches 97.3% of Oracle performance with only 20% of reward labels on the 3B model and 96.6% on the 1.5B model. With 70% of labels it reaches 99.4% of Oracle performance, and it surpasses Oracle on out-of-domain tasks.
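The summary does not describe MemReward's actual graph construction or GNN architecture. To illustrate the general idea of spreading a few known rewards to unlabeled rollouts through a similarity graph, here is a minimal sketch that swaps in classic iterative label propagation for the learned GNN and uses a single node type instead of a heterogeneous graph. The rollout embeddings, the neighbour count `k`, the 0.5 prior, and the function names `build_knn_graph` and `propagate_rewards` are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_knn_graph(embeddings, k):
    """Hypothetical graph builder: link each rollout to its k most similar rollouts.
    Edge weights are cosine similarities clipped at 0; edges are made undirected."""
    n = len(embeddings)
    adj = [dict() for _ in range(n)]
    for i in range(n):
        nearest = sorted(
            ((cosine(embeddings[i], embeddings[j]), j) for j in range(n) if j != i),
            reverse=True,
        )[:k]
        for sim, j in nearest:
            w = max(sim, 0.0)
            adj[i][j] = max(adj[i].get(j, 0.0), w)
            adj[j][i] = max(adj[j].get(i, 0.0), w)
    return adj

def propagate_rewards(adj, labels, iterations=50):
    """Label propagation (a stand-in for a GNN): each unlabeled node repeatedly takes the
    weighted mean of its neighbours' scores; labeled nodes stay clamped to their reward."""
    scores = [labels.get(i, 0.5) for i in range(len(adj))]  # 0.5 = uninformative prior
    for _ in range(iterations):
        new_scores = scores[:]
        for i, neighbours in enumerate(adj):
            if i in labels:
                continue
            total = sum(neighbours.values())
            if total > 0:
                new_scores[i] = sum(w * scores[j] for j, w in neighbours.items()) / total
        scores = new_scores
    return scores

# Toy example: four rollouts, only two of which carry a reward label.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = {0: 1.0, 2: 0.0}  # rollout 0 was judged correct, rollout 2 incorrect
adj = build_knn_graph(embeddings, k=1)
scores = propagate_rewards(adj, labels)
print(scores)  # [1.0, 1.0, 0.0, 0.0]
```

Each unlabeled rollout inherits the reward of the labeled rollout it most resembles, which is the intuition behind reaching near-Oracle performance with a fraction of the labels. A learned GNN can additionally use node and edge types and trainable weights, which this fixed-rule sketch does not attempt to model.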
Scoring Rationale
High novelty and broad applicability across LLM tasks; the score is tempered because the work is an arXiv preprint that has not yet been peer reviewed.
