Researchers Propose DRIL Method to Automate Dataset Construction

A new working paper, "Deep Research on a Loop: Using AI Agents to Construct Economic Datasets" (NBER working paper W35188; SSRN abstract 6502499), proposes DRIL, a methodology that uses AI agents to assemble datasets from publicly available sources. Per the paper, DRIL applies a fixed research instrument across a mapped unit space, uses a two-stage architecture separating design from implementation, and enforces an evidence policy plus data-quality mechanisms. In a 2025 update of the Global Tax Expenditures Database covering eight Latin American and Caribbean countries, the authors report that the run produced 129 sources and 136 evidence records, covering 22 qualitative fields fully and 6 quantitative estimate types with documented gaps, at a cost the paper describes as comparable to a standard LLM subscription or a few hours of research-assistant work (Marginal Revolution; NBER/SSRN). Editorial analysis: This methodology demonstrates a practical agentic pipeline for routine empirical tasks, which could change how researchers budget time and verification effort when assembling primary-source datasets.
What happened
A new research paper titled "Deep Research on a Loop: Using AI Agents to Construct Economic Datasets" (listed on NBER as working paper W35188 and on SSRN as abstract 6502499) introduces DRIL, a methodology that uses AI agents to assemble datasets from publicly available primary sources. Per the paper, DRIL implements a fixed research instrument applied across a mapped unit space (for example, country-year units) and uses a two-stage architecture that separates instrument design from implementation. The instrument specifies variables and coding rules, an evidence policy governs source selection and citation, and explicit data-quality mechanisms track gaps and uncertainty. The authors applied DRIL to a 2025 update of the Global Tax Expenditures Database covering eight Latin American and Caribbean countries; the run produced 129 sources and 136 evidence records, covering 22 qualitative fields fully and 6 quantitative estimate types with documented gaps. The paper reports the run was executed at a cost comparable to a standard LLM subscription or a few hours of research-assistant work (Marginal Revolution; NBER/SSRN). The paper also states, "We argue that even partial automation of dataset construction can shift the production function of empirical economics." (Marginal Revolution).
Editorial analysis - technical context
The DRIL proposal combines three technical ideas that are already familiar in ML operations and agentic systems: a repeatable, instrument-driven workflow; modular agent roles for search, coding, and verification; and explicit provenance and uncertainty tracking. Industry and academic projects using agentic pipelines typically separate planning from execution to improve reproducibility; DRIL formalizes that separation for empirical-economics datasets. For practitioners: adopting similar architectures generally requires careful prompt/instrument engineering, robust evidence policies to avoid source drift, and tooling to represent partially complete fields and provenance alongside numeric estimates.
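To make that design/implementation split concrete, here is a minimal Python sketch of an instrument-driven, two-stage pipeline in the spirit described above. All names (Instrument, EvidenceRecord, run_unit, the search callable) are illustrative assumptions, not the paper's actual implementation; the point is only that the instrument is fixed in stage one and then applied, unit by unit, in stage two.

```python
# Minimal sketch of an instrument-driven, two-stage pipeline (illustrative only;
# names and structure are assumptions, not the paper's implementation).
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Instrument:
    """Stage 1 output: fixed variables and coding rules, designed once."""
    qualitative_fields: list[str]        # e.g. legal basis, tax type
    quantitative_estimates: list[str]    # e.g. revenue forgone, share of GDP
    evidence_policy: str                 # rules for admissible sources and citation

@dataclass
class EvidenceRecord:
    unit: str                            # e.g. a hypothetical country-year key
    field: str
    value: Optional[str]                 # None when no admissible source was found
    source_url: Optional[str]
    note: str = ""                       # documents gaps and uncertainty

def run_unit(unit: str,
             instrument: Instrument,
             search: Callable[[str, str], Optional[tuple[str, str]]]) -> list[EvidenceRecord]:
    """Stage 2: apply the fixed instrument to one unit; the instrument is never edited here."""
    records: list[EvidenceRecord] = []
    for fld in instrument.qualitative_fields + instrument.quantitative_estimates:
        hit = search(unit, fld)          # agent call constrained by the evidence policy
        if hit is None:
            records.append(EvidenceRecord(unit, fld, None, None, note="documented gap"))
        else:
            value, url = hit
            records.append(EvidenceRecord(unit, fld, value, url))
    return records
```

The design choice the sketch highlights is that stage two has no authority to change variables or coding rules; it can only fill them or record a gap, which is what makes runs across many units comparable and auditable.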
Context and significance
Editorial analysis: Automating parts of primary-source dataset construction matters because dataset assembly is often the most time-consuming, least-scalable component of empirical research. The reported efficiency of the DRIL run (a multi-country update that produced over a hundred evidence records at low marginal cost) illustrates the potential returns from encoding routine research tasks as fixed instruments executed by agents. If reproducible, such pipelines could enable more frequent updates, larger cross-unit coverage, and clearer audit trails for dataset provenance. That said, the reported performance here is a single application with documented gaps; replication and cross-validation against human-constructed datasets will determine practical reliability and acceptance in the research community (NBER/SSRN; Marginal Revolution; ResearchGate summary).
What to watch
For practitioners: several measurable indicators will determine the practical uptake of DRIL-style methods. These include replication studies comparing agent-produced records to human-curated datasets, benchmarks for source-finding recall and coding precision on heterogeneous documents, and tooling that exports machine-readable provenance and uncertainty metadata compatible with common statistical workflows. Also watch for the community guidelines or replication-based evaluation frameworks the authors sketch in the paper, which would help standardize comparisons between automated and manual construction efforts (NBER/SSRN; ResearchGate snippet).
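As an illustration of what machine-readable provenance and uncertainty metadata could look like, the following sketch exports evidence records to a flat CSV that pandas, R, or Stata can read directly. The column names and example values are assumptions for illustration, not a schema from the paper.

```python
# Hypothetical export of agent-produced records with provenance and uncertainty
# metadata in a flat layout readable by common statistical tools.
import csv

records = [
    {"unit": "BRA-2023", "field": "revenue_forgone_pct_gdp", "value": "4.1",
     "source_url": "https://example.org/report.pdf", "confidence": "medium",
     "gap_flag": 0, "retrieved_at": "2025-06-01"},
    {"unit": "PRY-2023", "field": "revenue_forgone_pct_gdp", "value": "",
     "source_url": "", "confidence": "", "gap_flag": 1,
     "retrieved_at": "2025-06-01"},
]

with open("evidence_records.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
    writer.writeheader()          # one row per evidence record
    writer.writerows(records)     # provenance and gap flags sit beside each value
```

Keeping provenance and gap flags next to each value means audits and replications can work from the exported file alone, without access to the agent's internal state.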
Practical implications for teams
Editorial analysis: Labs experimenting with agentic dataset assembly should plan to invest in instrument specification, evidence-policy design, and verification pipelines rather than treating the agent as a single-step replacement for human coders. Teams will need lightweight UIs or notebooks that surface provenance and flagged gaps so domain experts can validate edge cases efficiently. Over time, practitioner workflows that pair agentic collection with targeted human verification are likely to yield the best tradeoffs between scale and trustworthiness.
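A hedged sketch of that pairing: route only flagged gaps and low-confidence records to a human review queue rather than re-checking every field by hand. The threshold and field names are assumptions, not the workflow the authors describe.

```python
# Illustrative triage step: send only gaps and low-confidence records to experts.
def review_queue(records: list[dict], confidence_floor: str = "medium") -> list[dict]:
    """Return records a domain expert should inspect before publication."""
    order = {"low": 0, "medium": 1, "high": 2}
    needs_review = []
    for rec in records:
        if rec.get("gap_flag") == 1:
            needs_review.append(rec)          # documented gap: confirm it is real
        elif order.get(rec.get("confidence", "low"), 0) < order[confidence_floor]:
            needs_review.append(rec)          # below the chosen confidence floor
    return needs_review
```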
Scoring Rationale
This is a notable methodology paper that demonstrates a concrete agentic workflow for a high-friction research task. It has practical implications for empirical researchers and tool builders, but the result is currently a single documented application with gaps that require replication.