Machine Learning Selects Optimal Thread Counts for GEMM

In an arXiv preprint submitted on Jan 14, 2026, Yufan Xia presents ADSALA, a proof-of-concept library that uses an on-the-fly machine learning model to select the optimal number of threads for GEMM (general matrix multiplication). Tests on two-socket Intel Cascade Lake and two-socket AMD Zen 3 nodes report 25–40% speedups over traditional BLAS GEMM for workloads using up to 100 MB of memory. The approach targets the complexity of tuning thread counts on multi-core shared-memory systems.
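To make the idea concrete, here is a minimal Python sketch of learned thread-count selection. It is not ADSALA's model or API: the toy cost model and its constants (`gflops_per_thread`, `overhead_s`), the nearest-neighbour regressor, and every function name are hypothetical. The cost model produces synthetic runtimes in place of real benchmarks, a regressor is trained on them, and for a given matrix shape the thread count with the lowest predicted runtime is chosen. A helper also computes the double-precision footprint of A, B and C, which is one plausible reading of the 100 MB workload bound.

```python
import math


def gemm_footprint_mb(m, k, n, bytes_per_elem=8):
    """Memory of A (m x k), B (k x n) and C (m x n) in MB (10^6 bytes).
    Defaults to 8 bytes per element (double precision)."""
    return bytes_per_elem * (m * k + k * n + m * n) / 1e6


def toy_runtime(m, k, n, threads, gflops_per_thread=20.0, overhead_s=2e-5):
    """Hypothetical cost model, NOT measured data: compute time shrinks
    with more threads while a linear per-thread overhead grows."""
    flops = 2.0 * m * k * n
    return flops / (threads * gflops_per_thread * 1e9) + overhead_s * threads


def features(m, k, n, threads):
    """Log-scaled features so that shapes are compared by relative size."""
    return (math.log2(m), math.log2(k), math.log2(n), math.log2(threads))


class NearestNeighbourRegressor:
    """Minimal k-nearest-neighbour regressor (k=1 by default)."""

    def __init__(self, k=1):
        self.k = k
        self.samples = []

    def fit(self, xs, ys):
        self.samples = list(zip(xs, ys))
        return self

    def predict(self, x):
        # Average the targets of the k training points closest to x.
        ranked = sorted((math.dist(x, xi), yi) for xi, yi in self.samples)
        nearest = ranked[: self.k]
        return sum(y for _, y in nearest) / len(nearest)


def train(thread_options, sizes):
    """Fit the regressor on synthetic (shape, threads) -> runtime samples."""
    xs, ys = [], []
    for m, k, n in sizes:
        for t in thread_options:
            xs.append(features(m, k, n, t))
            ys.append(toy_runtime(m, k, n, t))
    return NearestNeighbourRegressor().fit(xs, ys)


def pick_threads(model, m, k, n, thread_options):
    """Return the thread count with the lowest predicted runtime."""
    return min(thread_options, key=lambda t: model.predict(features(m, k, n, t)))


if __name__ == "__main__":
    THREADS = [1, 2, 4, 8, 16, 32, 64]
    SIZES = [(100, 100, 100), (200, 200, 200), (1000, 1000, 1000)]
    model = train(THREADS, SIZES)
    for shape in [(100, 100, 100), (120, 120, 120), (1000, 1000, 1000)]:
        mb = gemm_footprint_mb(*shape)
        print(shape, f"{mb:.2f} MB ->", pick_threads(model, *shape, THREADS), "threads")
```

With these toy constants, a 100×100×100 multiplication settles on 2 threads while a 1000×1000×1000 one (24 MB of operands) uses all 64, which illustrates why a single fixed thread count can leave performance on the table. A real tuner would replace `toy_runtime` with timings measured on the target machine.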
Scoring Rationale
Strong cross-architecture ML optimization with actionable speedups; limited by validation in a single arXiv preprint and by the 100 MB workload ceiling.
