Contract2Tool learns tool preconditions and effects

The arXiv preprint "Contract2Tool" (arXiv:2606.07904), submitted 5 Jun 2026 by Rahul Suresh Babu and one coauthor, proposes a framework for inferring symbolic tool contracts from tool metadata, schemas, documentation, and execution traces. Each contract encodes a tool's preconditions, effects, risk level, and cost. Contract2Tool produces normalized contracts that can be evaluated intrinsically and used inside downstream causal tool filtering. According to the paper, learned contracts preserve most of the reliability and efficiency benefits of gold (hand-written) contracts. Learned-contract CMTF achieves 0.980 downstream success versus 0.990 for gold-contract CMTF. Relative to exposing all tools, it also reduces visible tools from 100 to 1 and cuts average token usage from 26,172 to 2,528.
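For scale, the reported figures translate into the relative changes below. This is plain arithmetic on the numbers quoted above, not an additional result from the paper.

$$
\begin{aligned}
\text{token reduction} &= 1 - \frac{2{,}528}{26{,}172} \approx 1 - 0.0966 \approx 90.3\% \\
\text{visible-tool reduction} &= 1 - \frac{1}{100} = 99\% \\
\text{success gap (gold} - \text{learned)} &= 0.990 - 0.980 = 0.010
\end{aligned}
$$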
What happened
The arXiv paper Contract2Tool (arXiv:2606.07904), submitted 5 Jun 2026 by Rahul Suresh Babu and one coauthor, presents a method to infer symbolic tool contracts from observable evidence. The contracts capture a tool's preconditions, effects, risk level, and cost, and are designed for use in causal tool filtering (per the paper).
Technical details
Contract2Tool converts four evidence sources (metadata, tool schemas, documentation, and execution traces) into normalized symbolic contracts that can be evaluated intrinsically and deployed inside downstream filtering. The paper evaluates learned contracts against gold precondition, effect, and risk labels, and it measures downstream utility on multi-step agent tasks. According to the paper, learned-contract CMTF achieves 0.980 downstream success compared with 0.990 for gold-contract CMTF. Relative to exposing all tools, it also reduces visible tools from 100 to 1 and lowers average token usage from 26,172 to 2,528.
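The sketch below shows what a symbolic contract and a precondition-based filter could look like in Python. Several details are hypothetical illustrations: the `ToolContract` fields, the string-fact state representation, the integer risk scale, and the `filter_tools` helper. None of them is the paper's contract schema or its CMTF algorithm. The sketch only illustrates the general idea. A tool stays visible when its preconditions hold, its risk is within budget, and its effects advance an unmet goal.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


# Hypothetical contract shape for illustration only; these field names are
# assumptions, not the schema used by Contract2Tool.
@dataclass(frozen=True)
class ToolContract:
    name: str
    preconditions: FrozenSet[str]  # facts that must hold before the call
    effects: FrozenSet[str]        # facts the call makes true
    risk: int                      # assumed scale: 0 = read-only ... 2 = destructive
    cost: float                    # assumed unit: relative expected cost


def filter_tools(
    contracts: Iterable[ToolContract],
    state: FrozenSet[str],
    goal: FrozenSet[str],
    max_risk: int,
) -> List[ToolContract]:
    """Return tools that are applicable now, within the risk budget, and
    that make progress toward an unmet goal fact, cheapest first."""
    unmet = goal - state
    candidates = [
        c
        for c in contracts
        if c.preconditions <= state  # causal applicability
        and c.risk <= max_risk       # risk gate
        and c.effects & unmet        # contributes to an unmet goal fact
    ]
    return sorted(candidates, key=lambda c: c.cost)


# Usage with made-up tools: only the flight search is applicable and useful.
tools = [
    ToolContract("search_flights", frozenset({"has_dates"}),
                 frozenset({"has_flight_options"}), risk=0, cost=1.0),
    ToolContract("book_flight", frozenset({"has_flight_options", "has_payment"}),
                 frozenset({"flight_booked"}), risk=2, cost=3.0),
    ToolContract("get_weather", frozenset(), frozenset({"has_forecast"}),
                 risk=0, cost=0.5),
]
state = frozenset({"has_dates"})
goal = frozenset({"has_flight_options", "flight_booked"})
visible = filter_tools(tools, state, goal, max_risk=1)
print([c.name for c in visible])  # ['search_flights']
```

Sorting by `cost` is just one simple way to use the cost field. The paper may weigh cost and risk differently.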
Editorial analysis - technical context
Learning explicit, symbolic preconditions and effects addresses a common operational gap: tool schemas typically specify call signatures but not whether a tool is causally appropriate or how it transforms task state. Encoding causal applicability and state updates in a compact, machine-evaluable form can plausibly reduce spurious tool invocations and prompt cost. A filter can hide tools whose preconditions are unmet before the model ever sees them. This matters most in multi-step agent workflows, where tool descriptions are typically resent with every model call.
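The sketch below makes "how a tool transforms task state" concrete. It treats effects as STRIPS-style add and delete sets, a classical planning convention swapped in here for illustration; the paper's actual effect representation may differ. The `Contract`, `apply_contract`, and `check_plan` names are hypothetical. Simulating a candidate tool sequence against contracts lets an agent reject an inapplicable call before executing it.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

State = FrozenSet[str]


# Hypothetical STRIPS-style contract: preconditions plus add/delete effects.
# This representation is an assumption for illustration, not the paper's.
@dataclass(frozen=True)
class Contract:
    name: str
    pre: State
    add: State
    delete: State = frozenset()


def apply_contract(state: State, c: Contract) -> Optional[State]:
    """Return the successor state, or None if the tool is not applicable."""
    if not c.pre <= state:
        return None
    return (state - c.delete) | c.add


def check_plan(state: State, plan: List[Contract]) -> Tuple[State, Optional[str]]:
    """Simulate a tool sequence; return the final state and the name of the
    first inapplicable tool (None if every step is applicable)."""
    for c in plan:
        nxt = apply_contract(state, c)
        if nxt is None:
            return state, c.name
        state = nxt
    return state, None


# Usage with made-up tools for a shopping task.
login = Contract("login", frozenset(), frozenset({"authenticated"}))
add_to_cart = Contract("add_to_cart", frozenset({"authenticated"}),
                       frozenset({"cart_nonempty"}))
checkout = Contract("checkout", frozenset({"authenticated", "cart_nonempty"}),
                    frozenset({"order_placed"}), frozenset({"cart_nonempty"}))

final, failed = check_plan(frozenset(), [login, add_to_cart, checkout])
print(sorted(final), failed)  # ['authenticated', 'order_placed'] None
print(check_plan(frozenset(), [add_to_cart])[1])  # add_to_cart
```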
Context and significance
Automated inference of tool contracts targets two practical bottlenecks for tool-augmented agents at scale. The first is the manual cost of writing and maintaining contracts across large or evolving tool ecosystems. The second is the runtime inefficiency of exposing agents to unnecessary tools. The reported near-parity in downstream success between learned and gold contracts suggests that learning from documentation and traces may be a viable way to scale reliability improvements without manually annotating every tool.
What to watch
Observers should look for:
- replication of the results on larger, heterogeneous tool suites and in real-world agent deployments
- evaluations of learned risk labels in adversarial or safety-critical scenarios
- open-source tool-contract datasets or benchmarks that standardize comparison across approaches
Scoring Rationale
Automated inference of symbolic tool contracts targets a real reliability and efficiency gap for tool-augmented LLM agents, and the paper reports near-parity with hand-written contracts. The work is relevant to teams building multi-step agents. However, it is a single, narrowly scoped preprint that has not yet been independently verified, which places it in the solid band.
