SAIL Optimizes Orpheus-TTS For Higher Throughput

SAIL evaluated the publicly available Orpheus-TTS deployment (served via Baseten) and applied system-level optimizations to characterize and improve its real-time inference performance. The baseline sustained about 24 concurrent real-time streams per NVIDIA H100 GPU; after the optimizations, the deployment sustained 216 streams per GPU (about 9×, reported as ~10×). For serving capacity equivalent to 100 baseline GPUs, that reduces annual accelerator spend from about $1.4M to about $140k.
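The capacity arithmetic behind these figures can be sketched in a few lines. This is a rough illustration, not SAIL's own cost model: it assumes accelerator spend scales linearly with GPU count, derives a per-GPU annual cost from the quoted $1.4M for 100 GPUs, and rounds up to whole GPUs. The function name `capacity_estimate` and its parameters are hypothetical.

```python
import math


def capacity_estimate(baseline_streams_per_gpu: int,
                      optimized_streams_per_gpu: int,
                      baseline_gpus: int,
                      baseline_annual_cost_usd: float) -> dict:
    """Estimate GPUs and annual spend needed to serve the same stream load.

    Assumption (not from the source): cost is linear in GPU count and
    GPUs are provisioned in whole units.
    """
    # Throughput gain per GPU implied by the two stream counts.
    speedup = optimized_streams_per_gpu / baseline_streams_per_gpu

    # Total concurrent streams the baseline fleet can serve.
    total_streams = baseline_streams_per_gpu * baseline_gpus

    # Whole GPUs needed to serve that load after optimization.
    gpus_needed = math.ceil(total_streams / optimized_streams_per_gpu)

    # Per-GPU annual cost derived from the baseline fleet's spend.
    cost_per_gpu = baseline_annual_cost_usd / baseline_gpus

    return {
        "speedup": speedup,
        "total_streams": total_streams,
        "gpus_needed": gpus_needed,
        "annual_cost_usd": gpus_needed * cost_per_gpu,
    }


if __name__ == "__main__":
    estimate = capacity_estimate(24, 216, 100, 1_400_000)
    for key, value in estimate.items():
        print(f"{key}: {value}")
```

Under these assumptions the stream counts give a 9× gain, 12 GPUs, and about $168k per year. The quoted $140k corresponds to an even 10× reduction, so the rounded "~10×" figure appears to be what drives the cost estimate.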
Scoring Rationale
The roughly 10× throughput improvement is a strong empirical result across production-like TTS pipelines, but it rests on a single-source internal evaluation and has not been externally replicated.


