Study Evaluates KV Cache Transfer Efficiency in Disaggregated Serving
A research paper by Jiaxi Li (submitted Nov 14, 2025) systematically benchmarks prefill-decode disaggregation for LLM serving across multiple KV cache transfer media and against a colocated baseline. Using GPU profiling and dynamic voltage and frequency scaling (DVFS), the study maps performance-energy Pareto frontiers and compares KV cache reuse and frequency-scaling optimizations. The results show that the benefits vary with request load and transfer medium, and that the stage-wise frequency scaling enabled by disaggregation increases energy use.
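To make the idea of a performance-energy Pareto frontier concrete, the sketch below filters a set of serving configurations down to those that no other configuration beats on both latency and energy. It is a minimal illustration, not the paper's method: the `Config` type, the `pareto_frontier` helper, and every number in it are hypothetical placeholders and do not come from the study.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """One measured serving configuration (all fields are hypothetical)."""
    name: str          # placeholder label, e.g. a serving mode plus a GPU clock
    latency_ms: float  # lower is better
    energy_j: float    # lower is better


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Return the configs not dominated in (latency, energy), sorted by latency."""
    frontier: List[Config] = []
    best_energy = float("inf")
    # Walk from fastest to slowest; a slower config is kept only if it
    # uses strictly less energy than every faster config seen so far.
    for c in sorted(configs, key=lambda c: (c.latency_ms, c.energy_j)):
        if c.energy_j < best_energy:
            frontier.append(c)
            best_energy = c.energy_j
    return frontier


# Placeholder measurements for illustration only; not results from the paper.
measurements = [
    Config("A", latency_ms=120.0, energy_j=50.0),
    Config("B", latency_ms=150.0, energy_j=42.0),
    Config("C", latency_ms=110.0, energy_j=65.0),
    Config("D", latency_ms=160.0, energy_j=58.0),
]

frontier = pareto_frontier(measurements)
print([c.name for c in frontier])
```

With these placeholder values, configuration D is dropped because B is both faster and cheaper in energy; the remaining points trade latency against energy, and that trade-off curve is what a frontier comparison between serving setups examines.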
Key Points
- Benchmarks prefill-decode disaggregation across KV cache transfer paths and a colocated serving baseline
- Identifies that performance gains depend on request load and KV transfer medium and are not guaranteed universally
- Recommends that practitioners evaluate transfer paths and loads because disaggregation and DVFS may raise energy costs
Scoring Rationale
Comprehensive empirical benchmarking provides high practical value, but the paper is an arXiv preprint that has not yet been peer reviewed.