Research: linear rl, regret bounds, multi agent, sample complexity
LSVI-UCB++ Achieves Gap-Dependent Regret Bounds For Linear-RL
Relevance Score: 7.1
The authors present a February 2026 arXiv preprint that proves the first gap-dependent regret bound for LSVI-UCB++, a nearly minimax-optimal algorithm for episodic reinforcement learning with linear function approximation. The analysis improves the dependence on the feature dimension d and the horizon H relative to prior results, and it matches the near-minimax worst-case rate Õ(d·sqrt(H^3 K)). They also propose a concurrent multi-agent variant that achieves linear speedup in the number of agents and a gap-dependent sample complexity bound.
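To get a feel for the worst-case rate quoted above, the short sketch below evaluates d·sqrt(H^3 K) with logarithmic factors and constants dropped. The helper name `worst_case_rate` and the example values of d, H, and K are illustrative assumptions, not taken from the preprint, and the result is an order-of-growth quantity rather than an actual regret guarantee.

```python
import math


def worst_case_rate(d: int, H: int, K: int) -> float:
    """Order of the worst-case rate d * sqrt(H^3 * K).

    Hypothetical helper: logarithmic factors and constants are omitted.
    """
    return d * math.sqrt(H**3 * K)


# Illustrative values only (assumed, not from the preprint).
base = worst_case_rate(d=10, H=20, K=10_000)
quadrupled = worst_case_rate(d=10, H=20, K=40_000)

# Quadrupling the number of episodes K doubles the rate
# (up to floating-point rounding), since it scales as sqrt(K).
ratio = quadrupled / base
print(ratio)
```

Because the rate grows as sqrt(K), the average per-episode value of this quantity shrinks as 1/sqrt(K). A gap-dependent bound of the kind the summary describes is of interest precisely because it can be tighter than this worst-case scaling.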


