Case Study · rag · llm · enterprise search
Dropbox Uses LLMs To Improve Search
Relevance Score: 8.2
Dropbox engineers describe using large language models to scale human relevance labeling for Dash search. They calibrate LLM evaluators against a small human-labeled set, then use those evaluators to produce hundreds of thousands to millions of labels, amplifying human effort roughly 100×. They report that the method improves retrieval ranking, the bottleneck in retrieval-augmented generation, by combining automated LLM judgments with human oversight and analysis of the hardest mistakes.
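To make the calibration step concrete, here is a minimal sketch of comparing an LLM judge's grades with human grades on a small shared set and surfacing the largest disagreements for review. It is not Dropbox's implementation: the 1 to 5 grading scale, the `LabeledPair` type, the `calibration_report` function, and the sample data are all hypothetical, and the LLM grades are hard-coded rather than obtained from a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledPair:
    """One query-document pair graded by both a human and an LLM judge."""
    query: str
    doc_id: str
    human: int  # human relevance grade (assumed 1-5 scale)
    llm: int    # LLM judge grade on the same assumed scale


def calibration_report(pairs, top_k=2):
    """Summarize LLM-vs-human agreement and list the hardest mistakes."""
    if not pairs:
        raise ValueError("need at least one labeled pair")
    errors = [abs(p.human - p.llm) for p in pairs]
    return {
        # Fraction of pairs where the LLM grade matches the human grade.
        "exact_agreement": sum(e == 0 for e in errors) / len(pairs),
        # Average size of the disagreement, in grade points.
        "mean_abs_error": sum(errors) / len(pairs),
        # Pairs with the largest disagreement, for human review.
        "hardest": sorted(
            pairs, key=lambda p: abs(p.human - p.llm), reverse=True
        )[:top_k],
    }


# Hypothetical calibration set; in practice the llm grades would come
# from prompting a model with the query and document.
pairs = [
    LabeledPair("q3 budget deck", "d1", human=5, llm=5),
    LabeledPair("q3 budget deck", "d2", human=1, llm=4),
    LabeledPair("onboarding guide", "d3", human=3, llm=3),
    LabeledPair("onboarding guide", "d4", human=2, llm=1),
]

report = calibration_report(pairs)
print(report["exact_agreement"], report["mean_abs_error"])
for p in report["hardest"]:
    print(p.query, p.doc_id, p.human, p.llm)
```

In a workflow like the one described, the pairs in `hardest` would be the ones a person inspects first, and the judge would be used for large-scale labeling only once the agreement numbers are acceptable.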
