Case Study | RAG | LLM | enterprise search

Dropbox Uses LLMs To Improve Search

March 7, 2026 | By LDS Team

Relevance Score: 8.2


Dropbox engineers describe using large language models (LLMs) to scale human relevance labeling for Dash search. They calibrate LLM evaluators against a small human-labeled set, then use those evaluators to produce hundreds of thousands to millions of labels, multiplying human effort roughly 100×. They report that the method improves retrieval ranking, which they describe as the bottleneck in retrieval-augmented generation (RAG), by combining automated LLM judgments with human oversight and hardest-mistake analysis.
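As a rough illustration of what calibrating an LLM evaluator against human labels could look like, the sketch below compares LLM grades with human grades on the same query-document pairs and surfaces the largest disagreements for review. It is not Dropbox's implementation: the 0-4 grade scale, the two agreement metrics, and every name here (`LabeledPair`, `calibration_report`, the sample data) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LabeledPair:
    """One query-document pair graded by both a human and an LLM (hypothetical schema)."""
    query: str
    doc_id: str
    human: int  # human relevance grade; the 0-4 scale is an assumption
    llm: int    # grade the LLM evaluator gave the same pair


def calibration_report(pairs, top_k=3):
    """Summarize LLM-vs-human agreement and list the largest disagreements."""
    n = len(pairs)
    exact = sum(p.human == p.llm for p in pairs) / n
    mean_abs_error = sum(abs(p.human - p.llm) for p in pairs) / n
    # "Hardest mistakes" here simply means the pairs with the largest grade gap.
    hardest = sorted(pairs, key=lambda p: abs(p.human - p.llm), reverse=True)[:top_k]
    return {
        "exact_agreement": exact,
        "mean_abs_error": mean_abs_error,
        "hardest": hardest,
    }


# Made-up sample data, for illustration only.
pairs = [
    LabeledPair("q3 roadmap", "doc-a", human=4, llm=4),
    LabeledPair("q3 roadmap", "doc-b", human=0, llm=3),
    LabeledPair("expense policy", "doc-c", human=2, llm=1),
    LabeledPair("expense policy", "doc-d", human=3, llm=3),
]
report = calibration_report(pairs, top_k=2)
for p in report["hardest"]:
    print(p.query, p.doc_id, "human:", p.human, "llm:", p.llm)
```

In a workflow like the one described, the disagreements listed under `hardest` would be the cases a human reviews before the evaluator is trusted to label at scale.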

Key Points

1. Amplifies human labeling roughly 100× by letting LLMs generate hundreds of thousands or millions of labels
2. Improves RAG output because retrieval ranking quality directly impacts final generated answers
3. Enables scalable training data for ranking models, requiring human-calibrated evaluation and hardest-mistake analysis
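To make the link between relevance labels and ranking quality concrete, the sketch below scores a ranked result list with NDCG (normalized discounted cumulative gain), a standard ranking metric. The summary above does not say which metric Dropbox uses, so the choice of NDCG, the exponential gain formula, and the function names are assumptions for illustration.

```python
import math


def dcg(grades):
    """Discounted cumulative gain for relevance grades listed in ranked order."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg(grades_in_ranked_order, k=None):
    """NDCG@k: DCG of the actual ranking divided by the DCG of the ideal ranking."""
    ranked = grades_in_ranked_order[:k] if k else grades_in_ranked_order
    ideal = sorted(grades_in_ranked_order, reverse=True)[:len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Grades could come from human or LLM labels; these values are made up.
good_ranking = [3, 2, 0]  # most relevant document ranked first
bad_ranking = [0, 2, 3]   # most relevant document ranked last
print(ndcg(good_ranking), ndcg(bad_ranking))
```

A ranking model trained or evaluated on labels like these is rewarded for placing highly graded documents near the top, which is where a RAG system typically draws its context from.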