Researchers Benchmark Models on Bangladeshi Agri-Price Dataset

What happened
The paper introduces AgriPriceBD, a curated benchmark dataset of 1,779 daily retail mid-prices for five commodities in Bangladesh (garlic, chickpea, green chilli, cucumber, sweet pumpkin) spanning July 2020 to June 2025. The dataset was assembled from government reports using an LLM-assisted digitisation pipeline and is released alongside code and trained models. The authors benchmark seven forecasting methods—naïve persistence, SARIMA, Prophet, BiLSTM, Transformer, Transformer augmented with Time2Vec temporal encoding, and Informer—and apply Diebold–Mariano tests to evaluate statistical significance.
Technical context
Small, noisy market datasets challenge modern time-series architectures that assume abundant training data and smooth dynamics. Agricultural commodity prices in developing economies often exhibit low signal-to-noise ratios, discrete step changes, and idiosyncratic supply shocks. The study explicitly contrasts classical statistical approaches with deep sequence models under these realistic data constraints.
Key details from the paper
Forecastability is heterogeneous across commodities; naïve persistence dominates for near-random-walk series. Time2Vec temporal encoding offered no statistically significant advantage over fixed sinusoidal encoding and caused catastrophic degradation on green chilli (+146.1% MAE, p<0.001). Prophet failed systematically, which the authors attribute to its smooth seasonal/trend decomposition being incompatible with the dataset's discrete step-function dynamics. Informer produced erratic predictions with variance up to 50× that of the ground truth, supporting the authors' conclusion that sparse-attention Transformers require substantially larger training sets than these small agricultural time series provide.
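The naïve persistence baseline that dominates on near-random-walk series is trivial to implement; the sketch below uses a synthetic random-walk price series for illustration, not the AgriPriceBD data.

```python
import numpy as np

def persistence_forecast(prices):
    """One-step-ahead naive persistence: tomorrow's forecast = today's price."""
    prices = np.asarray(prices, dtype=float)
    return prices[:-1]

def mae(y_true, y_pred):
    """Mean absolute error between aligned forecast and realized values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Illustrative synthetic series: a random walk mimicking a noisy daily retail price.
rng = np.random.default_rng(7)
prices = 100 + np.cumsum(rng.normal(0, 1.5, 60))

forecast = persistence_forecast(prices)
baseline_mae = mae(prices[1:], forecast)   # any model must beat this to add value
```

On a pure random walk, no model can systematically beat this baseline in expectation, which is why it is the right yardstick for low-signal series like these.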
Why practitioners should care
The paper supplies a vetted, publicly available dataset for an underrepresented domain (South Asian food markets) and provides concrete negative results that matter when choosing forecasting pipelines for small, volatile series. Two practical takeaways: (1) baseline persistence or classical models may outperform complex deep models on limited, near-random-walk data; (2) architectural add-ons (Time2Vec, sparse attention) can harm performance if their inductive biases mismatch data characteristics. The release lets practitioners reproduce benchmarks, test custom preprocessing, and probe when modern architectures actually help.
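To make the inductive-bias point concrete, here is a minimal NumPy sketch of the Time2Vec encoding (Kazemi et al.): one linear component plus k sinusoidal components. The frequencies and phases are learned in practice; they are fixed inputs here purely for illustration.

```python
import numpy as np

def time2vec(tau, omega, phi):
    """Time2Vec encoding: t2v(tau)[0] = omega_0*tau + phi_0 (linear trend term),
    t2v(tau)[i] = sin(omega_i*tau + phi_i) for i >= 1 (periodic terms).

    tau:   (n,) array of time indices.
    omega: (k+1,) frequencies; phi: (k+1,) phases (learnable in a real model).
    Returns an (n, k+1) encoding matrix.
    """
    tau = np.asarray(tau, dtype=float)[:, None]
    z = omega[None, :] * tau + phi[None, :]
    z[:, 1:] = np.sin(z[:, 1:])   # first column stays linear, the rest oscillate
    return z
```

The paper's finding is that learning these periodic components bought nothing over fixed sinusoids on this data, and actively hurt on green chilli: when a series has little stable periodicity, the extra flexibility only adds variance.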
What to watch
Replications on other regional markets, interventions that increase signal (exogenous features, higher-frequency trade data), and methods to regularize attention-based models for small datasets. Also monitor follow-ups validating the LLM-assisted digitisation pipeline for larger historical records.
Scoring rationale
A public dataset plus careful benchmarking provides actionable guidance for practitioners working on small, noisy economic time series. Findings about model failure modes (Time2Vec, Prophet, Informer) are directly relevant for model selection, but the domain focus and modest dataset size limit industry-defining impact.


