Study Evaluates KV Cache Transfer Efficiency in Disaggregated Serving

A research paper by Jiaxi Li (submitted Nov 14, 2025) systematically benchmarks prefill-decode disaggregation for LLM serving, comparing multiple KV cache transfer media against a colocated baseline. Using GPU profiling and dynamic voltage and frequency scaling (DVFS), the study maps performance-energy Pareto frontiers and compares KV cache reuse and frequency-scaling optimizations. The results show that the benefits of disaggregation vary with request load and transfer medium, and that the stage-wise frequency scaling that disaggregation enables increases energy use.
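
For readers unfamiliar with the term, a performance-energy Pareto frontier is the set of configurations that no other configuration beats on both latency and energy at once. The Python sketch below shows one common way to extract such a frontier from measured points. It is an illustration only and not the paper's method: the function name `pareto_frontier`, the (latency, energy) tuple layout, and the sample numbers in the usage note are all assumptions made up for this example.

```python
from typing import List, Tuple

# Hypothetical layout: (latency in seconds, energy in joules); lower is better for both.
Point = Tuple[float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points that no other point beats on both latency and energy."""
    frontier: List[Point] = []
    best_energy = float("inf")
    # Sort by latency, then energy, so each candidate only has to beat the
    # lowest energy seen among points that are at least as fast.
    for latency, energy in sorted(set(points)):
        if energy < best_energy:
            frontier.append((latency, energy))
            best_energy = energy
    return frontier
```

With made-up sample points such as `[(1.0, 50.0), (1.5, 60.0), (2.0, 30.0), (2.5, 35.0), (3.0, 20.0)]`, the function keeps `(1.0, 50.0)`, `(2.0, 30.0)`, and `(3.0, 20.0)`; the other two are slower and use more energy than a point already on the frontier.
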
Scoring Rationale
Comprehensive empirical benchmarking gives the work high practical value, but it is an arXiv preprint that has not been peer reviewed.


