Load-Dependent Hardness Models Improve Materials Screening

Researchers present a machine learning approach that predicts Vickers hardness (Hv) as a function of indentation load using a large, curated set of experimental measurements. Models trained directly on experimental, load-dependent Hv outperform both approaches that rely on DFT-derived proxies such as bulk and shear moduli and multi-task models that mix experimental and computed targets. The work demonstrates that explicitly including the indentation load alongside compositional, electronic, and structural descriptors is necessary for accurate hardness predictions. The paper highlights the importance of high-quality experimental data and measurement metadata for reliable materials informatics and suggests practical changes for high-throughput screening workflows that currently use DFT-only proxies.
What happened
The authors present a study on predicting Vickers hardness that trains machine learning models on a large, curated corpus of load-dependent experimental measurements. They find a moderate correlation between experimental hardness and DFT-derived hardness proxies, but a single-task ML model trained only on experimental Hv values and measurement load outperforms multi-task models that combine experimental and computed targets. The result shows that explicit modeling of the indentation load, together with materials descriptors, is critical for accurate hardness prediction.
Technical details
The paper builds ML models using experimental Vickers hardness as the target and includes explicit measurement-condition metadata. Key modeling choices and inputs include:
- compositional descriptors
- structural and electronic descriptors
- explicit inclusion of the indentation load as a feature
The authors compare performance against DFT-based approaches that use bulk and shear moduli as proxies, and against multi-task learning formulations that jointly predict experimental and computed hardness. The single-task experimental model consistently yields better predictive accuracy, indicating that label quality and measurement context dominate performance for this application.
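The paper's actual descriptors and model architecture are not detailed here, but the core idea of treating indentation load as an explicit input can be sketched on synthetic data. The descriptors, the inverse-load feature, and the toy hardness function below are all illustrative assumptions, not the authors' method; the point is only that a model given the load recovers the load dependence, while the same model without it cannot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "materials": stand-in compositional/electronic descriptors
# plus the indentation load at which each Hv value was measured.
n = 200
desc = rng.normal(size=(n, 2))           # hypothetical material descriptors
load = rng.uniform(0.5, 10.0, size=n)    # indentation load (assumed units: N)

# Toy ground truth with an indentation-size effect: measured hardness
# falls as load rises (Hv ~ a + b/load), modulated by the descriptors.
hv = 15 + 3 * desc[:, 0] - 2 * desc[:, 1] + 8 / load + rng.normal(0, 0.3, n)

# Single-task model: ordinary least squares on [descriptors, 1/load, bias].
X = np.column_stack([desc, 1.0 / load, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, hv, rcond=None)

# Same model without the load feature, mirroring a load-blind proxy.
X_noload = np.column_stack([desc, np.ones(n)])
coef_nl, *_ = np.linalg.lstsq(X_noload, hv, rcond=None)

def rmse(A, c):
    return float(np.sqrt(np.mean((A @ c - hv) ** 2)))

print(f"RMSE with load feature:    {rmse(X, coef):.2f}")
print(f"RMSE without load feature: {rmse(X_noload, coef_nl):.2f}")
```

On this toy data the load-aware fit reduces the error to roughly the noise floor, while the load-blind fit absorbs the indentation-size effect into its residual, which is the qualitative pattern the paper reports for real measurements.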
Context and significance
Hardness screening often relies on DFT-accessible elastic moduli because of their availability in high-throughput databases. This work challenges that practice by showing those proxies miss strong load dependence present in real-world Hv data. For practitioners building materials informatics pipelines, the paper underscores that high-quality experimental datasets and explicit measurement metadata can beat larger but context-poor computed datasets for property prediction.
What to watch
Whether the curated dataset and model code are released will influence uptake. Next steps are validation on out-of-distribution chemistries, integration into high-throughput screening workflows, and extending the approach to other load- or condition-dependent mechanical properties.
Scoring Rationale
This is a solid materials-informatics contribution that shifts best practices for hardness prediction by emphasizing measurement context. It is primarily important to materials and ML-for-materials practitioners rather than the broader AI community, hence a mid-tier research score.

