Lunette Launches Investigator Platform For Agent Audits
Lunette has launched a platform that uses investigator agents to audit AI agents and evaluation environments. It is available now, and the first 40 investigations are free. The investigators re-enter agent environments to run experiments and produce validated findings. The team used the system to detect broken tasks in SWE-bench and found that environment access raises issue-detection accuracy from 63% without access to 82% with access. Practitioners can explore the results interactively.
Key Points
- Launches investigator agents that re-enter agent environments to run experiments and audit agent failures
- Shows that environment access reduces confabulation, raising issue-detection accuracy from 63% without access to 82% with access
- Lets practitioners debug evals and detect ill-posed or overly strict SWE-bench tasks
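To put the reported accuracy gain in perspective, the arithmetic below converts it into absolute and relative terms. This is a sketch that assumes both figures are accuracies measured on the same set of SWE-bench tasks. The subscripts "no env" and "env" are labels introduced here, not terms from the source.

```latex
% Absolute gain in issue-detection accuracy (percentage points)
\Delta_{\text{abs}} = 82\% - 63\% = 19 \text{ points}

% Relative gain over the no-access baseline
\Delta_{\text{rel}} = \frac{82 - 63}{63} \approx 0.30 \quad (\text{about } 30\%)

% Error rates: 1 - accuracy
e_{\text{no env}} = 100\% - 63\% = 37\%, \qquad e_{\text{env}} = 100\% - 82\% = 18\%

% Fraction of errors that remain with environment access
\frac{e_{\text{env}}}{e_{\text{no env}}} = \frac{18}{37} \approx 0.49
```

On these assumptions, environment access roughly halves the error rate, which is a stronger way to state the result than the 19-point accuracy gain alone.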
Scoring Rationale
Validated product launch offering experiment-driven agent audits. Novelty is limited because similar debugging tools exist, and broader adoption is unproven.