Case Study · multi-armed bandits · Thompson sampling · online experiments
DoorDash Adopts Multi‑Armed Bandits For Experimentation
Relevance Score: 8.1
DoorDash engineers Caixia Huang and Alex Weinstein describe adopting a multi-armed bandit (MAB) approach to optimize product experiments, using Thompson sampling to adaptively shift traffic toward better-performing variants and reduce the opportunity cost of serving inferior ones. They report that MAB accelerates learning and lowers regret compared with fixed-split A/B tests, but complicates metric inference and can create inconsistent user experiences as allocations shift. DoorDash plans contextual bandits, Bayesian optimization, and sticky user assignment to mitigate these limitations.
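To illustrate the core mechanism the summary describes, here is a minimal sketch of Beta-Bernoulli Thompson sampling for traffic allocation. This is not DoorDash's implementation; the variant names, counts, and conversion-rate framing are hypothetical. For each user, one sample is drawn from each variant's Beta posterior over its conversion rate, and the user is routed to the variant with the highest draw, so traffic shifts toward the likely winner as evidence accumulates.

```python
import random

def thompson_allocate(successes, failures, n_users):
    """Adaptively split n_users across variants via Thompson sampling.

    successes/failures map variant name -> observed counts so far.
    Each user is routed to the variant whose Beta(s + 1, f + 1)
    posterior (uniform prior) produces the highest sampled rate.
    Returns how many users each variant received.
    """
    counts = {v: 0 for v in successes}
    for _ in range(n_users):
        draws = {
            v: random.betavariate(successes[v] + 1, failures[v] + 1)
            for v in successes
        }
        counts[max(draws, key=draws.get)] += 1
    return counts

# Hypothetical data: variant B is converting at ~30% vs. A's ~10%,
# so sampling should route most of the next batch of users to B.
successes = {"A": 10, "B": 30}
failures = {"A": 90, "B": 70}
print(thompson_allocate(successes, failures, 1000))
```

A fixed-split A/B test would keep sending half the users to the weaker variant A until the test ends; the bandit's concentration of traffic on B is what lowers regret, and also what makes classical fixed-sample significance testing harder, as the summary notes.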


