SWAT Enhances Sliding Window Attention Efficiency
Relevance Score: 7.1
A recent research paper introduces SWAT, a modified sliding-window attention mechanism that replaces softmax with sigmoid and augments attention with balanced ALiBi slopes and RoPE positional rotations to strengthen positional signals. SWAT retains sliding-window efficiency (cutting attention cost from O(n²) to O(n·w) for window size w), reduces competition between tokens because sigmoid scores are not normalized against one another, and claims practical inference benefits such as better KV-cache behavior for long-context models and RAG pipelines.
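To make the combination of ideas concrete, here is a minimal sketch of how such a mechanism could look. This is not the paper's implementation: the function names (`swat_attention`, `balanced_alibi_slopes`, `apply_rope`), the `window` parameter, and the exact way the slopes are "balanced" (half the heads biased backward, half forward) are illustrative assumptions based only on the summary above.

```python
# Sketch of sigmoid-based sliding-window attention with balanced ALiBi
# slopes and RoPE. Assumed names and details are noted in the comments.

import math
import torch


def balanced_alibi_slopes(num_heads: int) -> torch.Tensor:
    # Geometric ALiBi slopes; the sign is flipped on alternating heads so the
    # positional bias pushes attention both backward and forward within the
    # window ("balanced" ALiBi, as interpreted from the summary above).
    base = torch.tensor([2 ** (-8 * (i + 1) / num_heads) for i in range(num_heads)])
    signs = torch.tensor([1.0 if i % 2 == 0 else -1.0 for i in range(num_heads)])
    return base * signs


def apply_rope(x: torch.Tensor) -> torch.Tensor:
    # Rotary position embedding over the last dimension (assumed even).
    b, h, n, d = x.shape
    pos = torch.arange(n, dtype=x.dtype)
    inv_freq = 1.0 / (10000 ** (torch.arange(0, d, 2, dtype=x.dtype) / d))
    angles = pos[:, None] * inv_freq[None, :]            # (n, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


def swat_attention(q, k, v, window: int = 64):
    # q, k, v: (batch, heads, seq, head_dim). Each query attends only to its
    # `window` most recent keys (causal sliding window), giving O(n*w) work.
    b, h, n, d = q.shape
    q, k = apply_rope(q), apply_rope(k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)      # (b, h, n, n)

    # ALiBi bias: slope * (key_pos - query_pos), balanced across heads.
    slopes = balanced_alibi_slopes(h).view(1, h, 1, 1)
    rel = (torch.arange(n)[None, :] - torch.arange(n)[:, None]).to(q.dtype)
    scores = scores + slopes * rel

    # Sliding-window causal mask: key j is visible to query i iff i-w < j <= i.
    i, j = torch.arange(n)[:, None], torch.arange(n)[None, :]
    mask = (j <= i) & (j > i - window)

    # Sigmoid replaces softmax: masked positions are zeroed out afterwards
    # rather than set to -inf, and tokens no longer compete for a normalized
    # probability budget.
    weights = torch.sigmoid(scores) * mask.to(scores.dtype)
    return weights @ v


if __name__ == "__main__":
    q = torch.randn(1, 8, 128, 64)
    k = torch.randn(1, 8, 128, 64)
    v = torch.randn(1, 8, 128, 64)
    print(swat_attention(q, k, v, window=32).shape)  # torch.Size([1, 8, 128, 64])
```

For readability the sketch materializes the full n×n score matrix; a real O(n·w) implementation would compute only the windowed scores and keep just the last w keys and values in the KV cache.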



