Automated Research Assessment Reveals Misaligned Funding Criteria

A new arXiv study audits Brazil's Research Productivity (PQ) Grant evaluation using interpretable machine learning applied to CV and OpenAlex bibliometric data. The authors adapt the Boruta feature selection technique and multiple classifiers to treat regulatory criteria as testable hypotheses rather than assumptions. The models discriminate strongly (AUC 0.96) and reliably identify Level 1A researchers, yet predictive power concentrates in a narrow set of signals: bibliographic output, graduate supervision, and institutional leadership. Several regulation-stated criteria show no detectable statistical contribution, indicating that the evaluative signal used in practice is far more compact than the formal framework. The paper recommends evidence-based refinement and greater transparency in automated research assessment.
What happened
The paper audits the Brazilian Research Productivity (PQ) Grant evaluation run by CNPq using an interpretable machine learning pipeline applied to researcher CVs and OpenAlex bibliometrics. The authors convert policy dimensions into measurable variables and test each regulatory criterion as an empirical hypothesis. Their models achieve high discrimination, with a mean AUC of 0.96, while showing that the effective evaluative signal is concentrated in a small feature subset.
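As a rough illustration of this kind of discrimination check, the minimal sketch below trains an off-the-shelf classifier on CV-derived features and reports cross-validated ROC AUC. The file name, column names, and the binary target (Level 1A vs. other levels) are assumptions for illustration only; this is not the paper's actual pipeline or feature set.

```python
# Minimal sketch (not the paper's pipeline): how well do CV / OpenAlex-derived
# features discriminate a grant level? File and column names are assumed.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("researchers.csv")  # hypothetical researcher-level table
features = ["papers_5y", "citations_5y", "phd_supervisions", "msc_supervisions",
            "mgmt_roles", "editorial_boards"]          # assumed variable names
X = df[features]
y = (df["pq_level"] == "1A").astype(int)               # assumed label: 1A vs. rest

clf = RandomForestClassifier(n_estimators=500, random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"mean ROC AUC: {auc.mean():.2f}")  # the study reports ~0.96 for its models
```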
Technical details
The study operationalizes dimensions such as bibliographic production, human resource training, and scientific recognition from structured CV fields and OpenAlex records. It uses a block-based adaptation of the Boruta feature selection procedure to probe variable importance across several standard classifiers. The analysis emphasizes interpretability over black-box performance, isolating features that consistently improve classification of grant levels, with a focus on identifying Level 1A researchers.
- Key predictive features identified: bibliographic production, graduate-level supervision, and institutional management roles.
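To make the Boruta idea concrete, the sketch below uses the open-source BorutaPy package with a random forest: each real feature's importance is compared against randomly shuffled "shadow" copies, and only features that consistently beat the shadows are confirmed. The paper's block-based adaptation (grouping variables by regulatory dimension) is not reproduced here, and the file and column names are illustrative assumptions.

```python
# Hedged sketch of standard Boruta relevance testing with BorutaPy; the study's
# block-based variant over regulatory dimensions is NOT implemented here.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

df = pd.read_csv("researchers.csv")  # hypothetical researcher-level table
features = ["papers_5y", "citations_5y", "phd_supervisions", "msc_supervisions",
            "mgmt_roles", "editorial_boards"]  # assumed CV / OpenAlex variables
X = df[features].values
y = (df["pq_level"] == "1A").astype(int).values  # assumed label: Level 1A vs. rest

rf = RandomForestClassifier(n_jobs=-1, max_depth=5, random_state=0)
selector = BorutaPy(rf, n_estimators="auto", random_state=0)
selector.fit(X, y)  # compares each feature against its shuffled "shadow" copy

kept = [f for f, ok in zip(features, selector.support_) if ok]
dropped = [f for f, ok in zip(features, selector.support_) if not ok]
print("criteria with detectable signal:", kept)
print("no detectable contribution:", dropped)
```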
Context and significance
This work reframes evaluation criteria as testable inputs rather than normative axioms, applying interpretable ML to a public funding regime. The high AUC shows that PQ labels carry a robust statistical structure, which is useful for automation and audit. The more important finding is the mismatch: many officially emphasized criteria do not contribute measurably to classification. For practitioners, the paper is a blueprint for evidence-driven audits of research metrics and automated decision systems in science policy.
What to watch
The study raises governance questions for agencies using automated or semi-automated assessment. Follow-ups should test causal links, extend to other funding programs and countries, and evaluate how disclosure of the compact evaluative signal changes applicant behavior and policy design.
Scoring Rationale
This arXiv paper introduces a practical, interpretable ML method to audit research funding criteria, offering actionable insights for policy and assessment practice. It is notable for translating policy into testable hypotheses, but it is not a paradigm shift in ML methodology.