Nanjing Researchers Publish 1-km Daily Soil Moisture Dataset

A research team led by Nanjing University has produced a high-precision, daily, 1 km surface soil moisture dataset for China covering 2000-2025, according to a Springer Nature data paper published in April 2026. The models were trained on in situ observations from 2,371 automatic stations operated by the China Meteorological Administration, and four machine learning algorithms (Random Forest, XGBoost, LightGBM, and CatBoost) were evaluated, according to the paper. The authors report that the fused product (referred to as CSMX in press materials) reduces bias and lowers root-mean-square error relative to the CLDAS reanalysis, while SHAP analysis and Recursive Feature Elimination informed interpretability and feature selection (Springer Nature; EurekAlert). Prof. Huiling Yuan is quoted on the dataset's error-reduction benefits (EurekAlert). The dataset is publicly available, per the paper and press release.
What happened
A Nanjing University research team released a national-scale daily surface soil moisture dataset at 1 km spatial resolution covering 2000-2025, described in a Springer Nature data paper published April 2026. Per the paper, the authors trained and evaluated four machine learning algorithms (Random Forest, XGBoost, LightGBM, and CatBoost) using in situ measurements from 2,371 automatic soil moisture stations managed by the China Meteorological Administration. The paper reports that the fusion product shows lower root-mean-square error than the CLDAS reanalysis product and substantially reduces the long-standing "wet bias" in reanalysis data, with the fused dataset made publicly available (Springer Nature; EurekAlert). The EurekAlert press release quotes Prof. Huiling Yuan: "Our model significantly reduces soil moisture estimation errors while preserving the temporal evolution characteristics of soil humidity." (EurekAlert).
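For readers unfamiliar with the two error measures cited above, the following is a minimal sketch of how bias (mean error) and root-mean-square error are conventionally computed against station observations. The function names and all numeric values are hypothetical and invented for illustration; they are not taken from the paper or from the CLDAS or CSMX products.

```python
import math


def bias(estimates, observations):
    """Mean error; a positive value means the product is wetter than the stations."""
    diffs = [e - o for e, o in zip(estimates, observations)]
    return sum(diffs) / len(diffs)


def rmse(estimates, observations):
    """Root-mean-square error between gridded estimates and station values."""
    squared = [(e - o) ** 2 for e, o in zip(estimates, observations)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical volumetric soil moisture values (m3/m3), invented for illustration.
station = [0.20, 0.25, 0.30, 0.22]
reanalysis = [0.26, 0.30, 0.37, 0.27]  # consistently too wet
fused = [0.21, 0.24, 0.31, 0.23]

for name, product in (("reanalysis", reanalysis), ("fused", fused)):
    print(f"{name}: bias={bias(product, station):+.4f} rmse={rmse(product, station):.4f}")
```

In this toy case every reanalysis value sits above its station counterpart, so the bias is positive, which is the pattern a "wet bias" describes. A product can have low bias and still have high RMSE if its errors are large but cancel out, which is why the two are usually reported together.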
Technical details
Per the Springer Nature data paper, model-building included automated hyperparameter tuning with Optuna and feature selection via Recursive Feature Elimination, which removed 57% of candidate predictors while maintaining predictive accuracy. The authors applied SHapley Additive exPlanations (SHAP) to attribute predictor importance, finding that high-accuracy soil moisture inputs, terrain and soil static variables, and meteorological predictors were the most influential. The paper frames the fusion approach as transferable to multi-source satellite soil moisture fusion and downscaling tasks (Springer Nature).
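The general mechanism behind Recursive Feature Elimination can be sketched in a few lines: fit a model, rank the surviving predictors by importance, drop the weakest, and refit. The example below is only an illustration under stated assumptions. It swaps in a plain least-squares linear model for the tree-based learners named in the paper, uses synthetic data, and the function name `linear_rfe` and the count of seven candidate predictors are hypothetical (the article does not give the paper's actual predictor count).

```python
import numpy as np


def linear_rfe(X, y, n_keep):
    """Recursive feature elimination with a least-squares linear model.

    Each round refits on the surviving columns and drops the one whose
    standardized coefficient has the smallest magnitude.
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so coefficients are comparable
    y = y - y.mean()                          # centered target, so no intercept is needed
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        coef, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
        weakest = int(np.argmin(np.abs(coef)))
        del keep[weakest]                     # remove the least influential predictor
    return keep


rng = np.random.default_rng(0)
n_samples, n_features = 500, 7
X = rng.normal(size=(n_samples, n_features))
# Only columns 0, 1 and 2 drive the synthetic target; the rest are pure noise.
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.1, size=n_samples)

# Keeping 3 of 7 candidates drops about 57%, chosen only to echo the reported share.
selected = linear_rfe(X, y, n_keep=3)
print("surviving predictor columns:", selected)
```

Refitting after every removal is what distinguishes this from a one-shot importance ranking: a predictor that looks weak alongside a correlated neighbor can gain importance once that neighbor is gone.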
Editorial analysis - technical context
Industry-pattern observations: High-resolution, machine-learned soil moisture products typically combine in situ, satellite, and reanalysis sources to balance spatial detail against temporal continuity. The use of automated hyperparameter tuning and SHAP for interpretability follows a growing practice in environmental ML of improving reproducibility and explaining model drivers rather than treating models as black boxes.
Context and significance
Industry context
A spatially continuous, national-scale, 1 km daily soil moisture time series spanning multiple decades fills a practical gap for hydrology, drought monitoring, and agricultural applications that require both fine spatial detail and long-term continuity. The reported reduction in bias relative to CLDAS, if sustained across seasons and land-cover types, would improve antecedent moisture inputs for hydrologic and land-atmosphere coupling studies (Springer Nature; EurekAlert).
What to watch
For practitioners: validation against independent station networks and hydrologic model sensitivity tests will determine operational utility. Observers should watch for documentation of the public data access endpoint, spatial and temporal uncertainty fields, and any follow-on releases that add deeper soil layers or near-real-time production.
Scoring Rationale
A national, multidecade, daily 1-km soil moisture product is a notable resource for hydrology, drought monitoring, and model development. The work is important for practitioners who need high-resolution surface moisture inputs but is not a frontier-model release, so the impact is notable rather than industry-shaking.


