Continuous-Time RL Establishes Optimal Switching Framework
Xiang Yu (submitted Dec 4, 2025) presents a continuous-time reinforcement learning framework for optimal switching across multiple regimes, in which entropy-regularized exploratory controls are expressed through the generator matrices of finite-state Markov chains. The paper proves well-posedness of the coupled HJB system, characterizes the optimal policies, establishes policy-improvement and convergence results, and shows convergence of the value function as the temperature vanishes. It also proposes a martingale-based RL algorithm, which is validated in neural-network experiments.
Key Points
- Characterizes entropy-regularized continuous-time switching through coupled HJB equations and generator-matrix controls.
- Proves policy-improvement and convergence results, providing theoretical guarantees for iterative continuous-time RL.
- Implements martingale-based policy evaluation with neural-network experiments, enabling a practically deployable RL algorithm.
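To make the notion of a generator-matrix control concrete, here is a minimal Python sketch. It is not the paper's algorithm: it only shows the standard fact that a finite-state continuous-time Markov chain is determined by a generator matrix (nonnegative off-diagonal rates, rows summing to zero), and that regime switches can be simulated from it with exponential holding times. The function names `make_generator` and `simulate_regimes` and the example rates are hypothetical and chosen purely for illustration.

```python
import numpy as np

def make_generator(rates):
    """Build a generator matrix from nonnegative off-diagonal switching rates.

    The diagonal of `rates` is ignored; each diagonal entry is set to minus
    the sum of the off-diagonal entries in its row, so every row sums to zero.
    """
    q = np.array(rates, dtype=float)
    np.fill_diagonal(q, 0.0)
    if (q < 0).any():
        raise ValueError("off-diagonal switching rates must be nonnegative")
    np.fill_diagonal(q, -q.sum(axis=1))
    return q

def simulate_regimes(q, start, horizon, rng):
    """Simulate one regime path on [0, horizon) as a list of (time, regime)."""
    t, state = 0.0, start
    path = [(t, state)]
    while True:
        exit_rate = -q[state, state]
        if exit_rate <= 0.0:
            break  # absorbing regime: no further switches
        # The holding time in the current regime is exponential with this rate.
        t += rng.exponential(1.0 / exit_rate)
        if t >= horizon:
            break
        # The next regime is drawn in proportion to the off-diagonal rates.
        probs = q[state].copy()
        probs[state] = 0.0
        probs /= exit_rate
        state = int(rng.choice(len(probs), p=probs))
        path.append((t, state))
    return path

# Illustrative rates for three regimes (made-up numbers).
Q = make_generator([[0.0, 1.0, 2.0],
                    [0.5, 0.0, 0.5],
                    [3.0, 1.0, 0.0]])
path = simulate_regimes(Q, start=0, horizon=5.0, rng=np.random.default_rng(0))
```

In an exploratory formulation of the kind summarized above, the agent would choose the off-diagonal rates rather than a single deterministic switch, which is why a randomized switching rule can be encoded as a matrix like `Q`.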
Scoring Rationale
Strong theoretical guarantees and a practical algorithm drive impact; niche focus on switching and preprint status limit universality.
