Paper Revisits Neural Quantum States Through Reinforcement Learning
A new arXiv paper (2607.02292), submitted July 2, 2026, reframes training neural quantum states (NQS) as a reinforcement-learning problem and introduces Proximal Wavefunction Optimization (PWO), a trust-region algorithm that clips probability-ratio and phase changes during training. The authors, led by Juan Agustin Duque and Anna Dawid, report that PWO improves stability and wall-clock convergence over Adam, minSR, and SPRING on Ising and frustrated J1-J2 spin systems, and demonstrate the approach by fine-tuning a 1.5B-parameter RWKV-7 model, scaling NQS optimization more than three orders of magnitude beyond prior work. For practitioners in variational quantum Monte Carlo, the method matters because it avoids costly matrix inversion while reusing samples across updates, though independent reproduction will determine how quickly it is adopted.
For researchers and engineers building variational quantum Monte Carlo systems, optimizer choice controls both numerical stability and wall-clock cost; a method that removes expensive linear-algebra bottlenecks while preserving convergence guarantees can change which Hamiltonians and model sizes are practical to study.
What happened
The arXiv paper "One More Time: Revisiting Neural Quantum States from a Reinforcement Learning Perspective" (arXiv:2607.02292), submitted July 2, 2026 by Juan Agustin Duque, Sergio Garcia Heredia, Vinicius Hernandes, Eliska Greplova, Thomas Spriggs, Aaron Courville, and Anna Dawid, frames variational energy minimization for neural quantum states (NQS) as an advantage policy-gradient problem over the Born distribution. The authors introduce Proximal Wavefunction Optimization (PWO), a trust-region algorithm that clips probability-ratio changes in the amplitude channel and phase increments in the phase channel. PWO avoids explicit matrix inversion, reuses samples across multiple updates, and combines first-order scalability with theoretical guarantees, per the paper. Across Ising and frustrated J1-J2 one- and two-dimensional spin systems, the authors report that PWO improves stability and wall-clock convergence over Adam, minSR, and SPRING, and they fine-tune a 1.5B-parameter RWKV-7 model to demonstrate NQS optimization at a scale the paper says is over three orders of magnitude beyond prior work.
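For readers who want the identity behind the policy-gradient framing, the standard variational Monte Carlo energy gradient can be written out as below. This is the textbook result under an assumed amplitude-phase parameterisation with a normalised Born distribution p_θ; the symbols are illustrative and may not match the paper's notation.

```latex
\psi_\theta(s) = \sqrt{p_\theta(s)}\, e^{i\phi_\theta(s)}, \qquad
E_{\mathrm{loc}}(s) = \sum_{s'} H_{s s'} \frac{\psi_\theta(s')}{\psi_\theta(s)}, \qquad
E(\theta) = \mathbb{E}_{s \sim p_\theta}\big[ E_{\mathrm{loc}}(s) \big]

\nabla_\theta E
  = \mathbb{E}_{s \sim p_\theta}\Big[
      \big(\operatorname{Re} E_{\mathrm{loc}}(s) - E\big)\, \nabla_\theta \log p_\theta(s)
      \;+\; 2\, \operatorname{Im} E_{\mathrm{loc}}(s)\, \nabla_\theta \phi_\theta(s)
    \Big]
```

The first term has the form of a score-function (REINFORCE) policy gradient in which the centred local energy plays the role of an advantage with the mean energy as baseline; the second term is the separate phase channel.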
Technical context
The paper connects autoregressive NQS architectures, which permit exact independent sampling from the Born distribution, with trust-region policy optimization from reinforcement learning, which constrains large update steps. Framing variational energy minimization as a policy-gradient problem lets the authors apply clipping-based trust regions separately to amplitude probability ratios and phase increments, avoiding the explicit matrix inversions used in stochastic reconfiguration while retaining a geometry-aware constraint on updates.
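To make the two-channel clipping idea concrete, here is a minimal NumPy sketch of a PPO-style clipped surrogate, written by analogy with proximal policy optimization. It is not the paper's objective: the function name `clipped_surrogate`, the thresholds `eps_amp` and `eps_phase`, and the hard clip on the phase increment are all assumptions made for illustration.

```python
import numpy as np

def clipped_surrogate(logp_new, logp_old, phase_new, phase_old, e_loc,
                      eps_amp=0.2, eps_phase=0.1):
    """Hypothetical PPO-style loss for psi = sqrt(p) * exp(i * phi).

    Inputs are per-sample arrays: log-probabilities and phases under the
    updated and the sampling ("old") parameters, plus complex local energies.
    """
    e_loc = np.asarray(e_loc, dtype=complex)

    # Amplitude channel: centred real local energy acts as a cost advantage.
    adv = e_loc.real - e_loc.real.mean()
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    clipped = np.clip(ratio, 1.0 - eps_amp, 1.0 + eps_amp)
    # We minimise energy, so take the pessimistic (larger) of the two costs.
    amp_loss = np.maximum(ratio * adv, clipped * adv).mean()

    # Phase channel: wrap the increment into (-pi, pi], then clip it.
    dphi = np.angle(np.exp(1j * (np.asarray(phase_new) - np.asarray(phase_old))))
    dphi_clipped = np.clip(dphi, -eps_phase, eps_phase)
    # Assumed phase term: its unclipped gradient w.r.t. the phase is 2 * Im(E_loc).
    phase_loss = (2.0 * e_loc.imag * dphi_clipped).mean()

    return amp_loss + phase_loss

# Usage: the same batch of samples can be scored against several candidate
# parameter updates, because the "old" quantities are fixed at sampling time.
loss = clipped_surrogate(
    logp_new=np.log([2.0, 1.0]), logp_old=np.zeros(2),
    phase_new=np.zeros(2), phase_old=np.zeros(2),
    e_loc=np.array([-1.0 + 0j, 1.0 + 0j]),
)
```

In this sketch a low-energy sample whose probability ratio has already moved past `1 + eps_amp` contributes no further gradient, which is the mechanism that lets one batch of samples be reused for several updates without the wavefunction drifting far from the one that generated them.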
For practitioners
Teams using NQS with autoregressive samplers may see lower wall-clock cost and reduced memory pressure if the reported gains hold, since PWO removes matrix inversions and reuses samples across updates. The reported 1.5B-parameter RWKV-7 fine-tune demonstrates feasibility at a new scale but does not by itself establish generalization to other architectures or Hamiltonians, so reproduction on additional systems remains the key open question.
What to watch
Watch for open-source implementations or reproducibility notebooks validating the reported wall-clock and stability gains, for comparisons across a wider range of Hamiltonians and model families, and for whether PWO's clipping hyperparameters transfer across systems without extensive re-tuning, since that will determine practical adoption cost.
Key Points
- A new arXiv paper reframes neural quantum state training as reinforcement learning and introduces Proximal Wavefunction Optimization (PWO), a trust-region method.
- PWO reportedly improves stability and convergence over Adam, minSR, and SPRING while avoiding costly matrix inversion during training.
- The authors fine-tuned a 1.5B-parameter RWKV-7 model to show NQS optimization scaling far beyond prior work, though independent reproduction is still needed.
Scoring Rationale
A verified, methodologically significant arXiv contribution introducing a new trust-region optimizer for neural quantum states and demonstrating fine-tuning at 1.5B parameters, notable for the quantum-ML niche; broader impact still depends on community reproduction and open tooling.