Researchers Benchmark Models on Bangladeshi Agri-Price Dataset

What happened
The paper introduces AgriPriceBD, a curated benchmark dataset of 1,779 daily retail mid-prices for five commodities in Bangladesh (garlic, chickpea, green chilli, cucumber, sweet pumpkin) spanning July 2020 to June 2025. The dataset was assembled from government reports using an LLM-assisted digitisation pipeline and is released alongside code and trained models. The authors benchmark seven forecasting methods—naïve persistence, SARIMA, Prophet, BiLSTM, Transformer, Transformer augmented with Time2Vec temporal encoding, and Informer—and apply Diebold–Mariano tests to evaluate statistical significance.
Technical context
Small, noisy market datasets challenge modern time-series architectures that assume abundant training data and smooth dynamics. Agricultural commodity prices in developing economies often exhibit low signal-to-noise ratios, discrete step changes, and idiosyncratic supply shocks. The study explicitly contrasts classical statistical approaches with deep sequence models under these realistic data constraints.
Key details from the paper
Forecastability is heterogeneous across commodities; naïve persistence dominates for near-random-walk series. Time2Vec temporal encoding offered no statistically significant advantage over fixed sinusoidal encoding and caused catastrophic degradation on green chilli (+146.1% MAE, p<0.001). Prophet failed systematically, which the authors attribute to its smooth seasonal/trend decomposition being incompatible with the dataset's discrete step-function dynamics. Informer produced erratic predictions with variance up to 50× that of the ground truth, supporting the authors' conclusion that sparse-attention Transformers require substantially larger training sets than these small agricultural time series provide.
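The naïve persistence baseline that dominates on near-random-walk series is trivial to implement; the sketch below uses a synthetic random-walk price series for illustration, not the AgriPriceBD data.

```python
import numpy as np

def persistence_forecast(prices):
    """One-step-ahead naive persistence: tomorrow's forecast = today's price."""
    prices = np.asarray(prices, dtype=float)
    return prices[:-1]

def mae(y_true, y_pred):
    """Mean absolute error between aligned forecast and realized values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Illustrative synthetic series: a random walk mimicking a noisy daily retail price.
rng = np.random.default_rng(7)
prices = 100 + np.cumsum(rng.normal(0, 1.5, 60))

forecast = persistence_forecast(prices)
baseline_mae = mae(prices[1:], forecast)   # any model must beat this to add value
```

On a pure random walk, no model can systematically beat this baseline in expectation, which is why it is the right yardstick for low-signal series like these.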
Why practitioners should care
The paper supplies a vetted, publicly available dataset for an underrepresented domain (South Asian food markets) and provides concrete negative results that matter when choosing forecasting pipelines for small, volatile series. Two practical takeaways: (1) baseline persistence or classical models may outperform complex deep models on limited, near-random-walk data; (2) architectural add-ons (Time2Vec, sparse attention) can harm performance if their inductive biases mismatch data characteristics. The release lets practitioners reproduce benchmarks, test custom preprocessing, and probe when modern architectures actually help.
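To make the inductive-bias point concrete, here is a minimal NumPy sketch of the Time2Vec encoding (Kazemi et al.): one linear component plus k sinusoidal components. The frequencies and phases are learned in practice; they are fixed inputs here purely for illustration.

```python
import numpy as np

def time2vec(tau, omega, phi):
    """Time2Vec encoding: t2v(tau)[0] = omega_0*tau + phi_0 (linear trend term),
    t2v(tau)[i] = sin(omega_i*tau + phi_i) for i >= 1 (periodic terms).

    tau:   (n,) array of time indices.
    omega: (k+1,) frequencies; phi: (k+1,) phases (learnable in a real model).
    Returns an (n, k+1) encoding matrix.
    """
    tau = np.asarray(tau, dtype=float)[:, None]
    z = omega[None, :] * tau + phi[None, :]
    z[:, 1:] = np.sin(z[:, 1:])   # first column stays linear, the rest oscillate
    return z
```

The paper's finding is that learning these periodic components bought nothing over fixed sinusoids on this data, and actively hurt on green chilli: when a series has little stable periodicity, the extra flexibility only adds variance.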
What to watch
Replications on other regional markets, interventions that increase signal (exogenous features, higher-frequency trade data), and methods to regularize attention-based models for small datasets. Also monitor follow-ups validating the LLM-assisted digitisation pipeline for larger historical records.
Scoring rationale
A public dataset plus careful benchmarking provides actionable guidance for practitioners working on small, noisy economic time series. Findings about model failure modes (Time2Vec, Prophet, Informer) are directly relevant for model selection, but the domain focus and modest dataset size limit industry-defining impact.


