Lyapunov-Guided Self-Alignment Enables Test-Time Safety Adaptation

An arXiv paper (arXiv:2604.26516) and its OpenReview AISTATS 2026 poster entry introduce SAS (Self-Alignment for Safety), a transformer-based method that performs test-time adaptation for offline safe reinforcement learning without parameter updates. At test time, the pretrained agent generates imagined trajectories and selects segments that satisfy a Lyapunov condition; those feasible segments are recycled as in-context prompts so the agent realigns its behavior toward safety without retraining. The authors report experiments on Safety Gymnasium and MuJoCo benchmarks showing reduced costs and failure rates while maintaining or improving return. The OpenReview TL;DR emphasizes the same test-time, Lyapunov-conditioned prompting mechanism.
What happened
Per the arXiv submission (arXiv:2604.26516) and the OpenReview AISTATS 2026 poster entry, the authors present SAS (Self-Alignment for Safety), a transformer-based framework for test-time adaptation in offline safe reinforcement learning. The paper reports that, at test time, a pretrained agent imagines multiple trajectories, filters for segments that satisfy a Lyapunov safety condition, and reuses the feasible segments as in-context prompts to steer the agent without any parameter updates. The submission reports experiments on Safety Gymnasium and MuJoCo in which SAS reduces cost and failure rates while maintaining or improving return.
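The paper does not reproduce its exact Lyapunov condition in the materials summarized here. As a rough illustration of the filtering step, the sketch below uses the common discrete-time decrease condition V(s_{t+1}) - V(s_t) <= -alpha * V(s_t); the function name, the candidate Lyapunov function `V`, and the margin `alpha` are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lyapunov_feasible(segment, V, alpha=0.1):
    """Check an imagined state segment against a discrete-time Lyapunov
    decrease condition: V(s_{t+1}) - V(s_t) <= -alpha * V(s_t) at every step.

    This standard form stands in for the paper's (unspecified here) condition;
    `V` and `alpha` are assumptions for illustration only.
    """
    values = np.array([V(s) for s in segment])
    # np.diff gives V(s_{t+1}) - V(s_t); require decrease at every step
    return bool(np.all(np.diff(values) <= -alpha * values[:-1]))
```

A segment that contracts toward the origin under a quadratic `V` passes this check, while a diverging one is rejected and would not be reused as a prompt.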
Technical details
Per the paper, SAS implements self-alignment by converting Lyapunov-guided imagined rollouts into control-invariant prompts fed to a transformer world model. The authors frame the transformer's prompting behavior as admitting a hierarchical RL interpretation, where prompting acts like Bayesian inference over latent skills. The OpenReview summary echoes the same mechanism and emphasizes that the method avoids retraining by relying on in-context adaptation.
Context and significance
Editorial analysis: For practitioners, the combination of Lyapunov stability checks with in-context prompting is notable because it shifts some safety-critical adaptation from weight updates to test-time trajectory selection. Industry-pattern observations: Recent work in sequence-model-based control has trended toward leveraging model rollouts and prompting to adapt behavior online, and SAS fits this pattern while explicitly incorporating a Lyapunov constraint for safety.
What to watch
Editorial analysis: Observers should look for code releases, dataset splits, and ablation details that quantify how often the Lyapunov filter rejects imagined segments and how that tradeoff affects return versus safety. Industry context: Replication on additional benchmarks and real-world tasks, and comparisons with established offline-safe-RL baselines, will determine practical utility for deployments that demand low-failure guarantees.
Scoring Rationale
The paper proposes a concrete, replicable method for improving safety in offline RL by combining Lyapunov constraints with transformer in-context prompting. That is a notable technical contribution for practitioners deploying offline RL, but it is a research advance rather than a field-defining paradigm shift.