Alvaro Videla Tests JIT Assistance for LLM Arithmetic

Developer Alvaro Videla's "Rune" project, covered by Hackaday, investigates whether a language model's internal activations can supply arithmetic tool arguments without reading prompt text. Under a strict no-parser rule, Videla built activation-derived readouts to decode operation and operands from Llama's hidden states and route them to Python. The final route achieved near-perfect exact-answer rates on a DeepMind Mathematics Dataset benchmark slice: 100% for gcd, 99.2% for division with remainder, and 98.0% for lcm, gains of roughly 50 to 97 percentage points over the frozen model's native baseline. A hard-negative suite of 10,200 non-trigger examples produced zero false fires. The original "JIT replacement" goal - writing computed answers back into the model's residual stream - did not succeed in the forms tested: residual writes offered no accuracy advantage over simpler token correction and disturbed surrounding generation. Videla concludes that a readable activation variable is not necessarily a writable register.
Background
Standard tool-use routes for arithmetic - such as PAL, ReAct, and Toolformer - parse the prompt text to extract operands and pass them to Python. Videla's Rune project asked a narrower question: can those tool arguments come from the model's own internal activations instead, under a strict no-parser boundary? The no-parser rule means the runtime route cannot see prompt text, regex spans, harness operands, or gold answers - only token IDs and activation vectors.
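One way to picture the no-parser boundary is as a data type that simply has no slot for text. The sketch below is an illustration only, not Rune's code: `RouteInput`, `make_route_input`, and the dictionary keys are hypothetical names chosen to mirror the rule as described above.

```python
from dataclasses import dataclass
from typing import Sequence, Set


@dataclass(frozen=True)
class RouteInput:
    """Everything the runtime route may see (hypothetical container)."""
    token_ids: Sequence[int]                # tokenizer output, never decoded text
    activations: Sequence[Sequence[float]]  # one hidden-state vector per token


# Fields the rule forbids; the names are illustrative, taken from the prose.
FORBIDDEN_FIELDS = {"prompt_text", "regex_spans", "harness_operands", "gold_answer"}


def make_route_input(example: dict) -> RouteInput:
    """Build the route's view of an example.

    Only two keys are ever read, so anything else stored in `example`
    (prompt text, gold answers, ...) cannot reach the route.
    """
    return RouteInput(
        token_ids=tuple(example["token_ids"]),
        activations=tuple(tuple(vec) for vec in example["activations"]),
    )


def visible_fields(route_input: RouteInput) -> Set[str]:
    """Names of the attributes the route can actually access."""
    return set(vars(route_input))
```

Enforcing the rule structurally, rather than by convention, makes it auditable: a check such as `FORBIDDEN_FIELDS.isdisjoint(visible_fields(r))` can run on every example.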
What worked
The activation-derived argument route succeeded in its core claim. Using a frozen Llama model, Videla trained readout probes on layer-22 chunk activations to decode operation type and operands without prompt access. On a filtered DeepMind Mathematics Dataset interpolation split, the route produced exact-answer rates of 100% for gcd (across 1,233 targets), 99.2% for division with remainder, and 98.0% for lcm - gains of 50.2, 81.0, and 96.8 percentage points respectively over the frozen model's unassisted baseline. The evaluation was preregistered on June 2, 2026, with thresholds and operand bounds fixed in advance. A separate hard-negative audit of 10,200 adversarial non-trigger examples produced zero false fires.
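The read-then-route idea can be sketched in a few lines. This is a toy under stated assumptions, not Videla's implementation: the probe is a plain linear readout with made-up weights, `decode_operands` is a hypothetical stand-in for whatever operand readout the project uses, and "division with remainder" is assumed to ask for the remainder.

```python
import math
from typing import Callable, Sequence, Tuple

OP_LABELS = ("gcd", "lcm", "div_remainder")

# Python-side tools for the three benchmark operations.
# Assumption: the division task expects the remainder as its answer.
# Note: div_remainder raises ZeroDivisionError when b == 0.
TOOLS = {
    "gcd": lambda a, b: str(math.gcd(a, b)),
    "lcm": lambda a, b: str(a * b // math.gcd(a, b)) if a and b else "0",
    "div_remainder": lambda a, b: str(a % b),
}


def linear_readout(activation: Sequence[float],
                   weights: Sequence[Sequence[float]],
                   labels: Sequence[str]) -> str:
    """Return the label whose weight row has the largest dot product."""
    scores = [sum(w * x for w, x in zip(row, activation)) for row in weights]
    return labels[scores.index(max(scores))]


def route(activation: Sequence[float],
          op_weights: Sequence[Sequence[float]],
          decode_operands: Callable[[Sequence[float]], Tuple[int, int]]) -> str:
    """Decode the operation and operands from activations, then call Python."""
    op = linear_readout(activation, op_weights, OP_LABELS)
    a, b = decode_operands(activation)  # hypothetical operand readout
    return TOOLS[op](a, b)
```

With a 3x3 identity matrix as toy probe weights, an activation of `[0.1, 0.9, 0.0]` reads out as `lcm`, and operands `(4, 6)` route to the answer `"12"`. Nothing in `route` touches prompt text, which is the point of the design.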
What failed
The original "JIT replacement" goal - writing the computed correct answer back into the model's residual stream so it would continue generation naturally - did not hold up. Residual writes showed no accuracy advantage over simpler logit or token correction and disturbed surrounding generation more. Videla frames this sharply: "a readable variable is not necessarily a writable register." Demonstrating that arithmetic structure is encoded in activations is a separate problem from reliably writing a corrected state back.
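The two correction styles differ in where they intervene, which a toy contrast makes concrete. The functions below are hypothetical illustrations on plain lists, not the project's code: one splices corrected tokens into the output, the other adds a vector to a hidden state that every later layer and position may then read.

```python
from typing import List, Sequence, Tuple


def token_correction(generated: Sequence[int],
                     answer_span: Tuple[int, int],
                     corrected: Sequence[int]) -> List[int]:
    """Replace the answer tokens in [start, end) with the tool's answer tokens.

    Tokens outside the span are left untouched.
    """
    start, end = answer_span
    return list(generated[:start]) + list(corrected) + list(generated[end:])


def residual_write(hidden: Sequence[float],
                   direction: Sequence[float],
                   scale: float) -> List[float]:
    """Add a scaled direction to a hidden-state vector (h' = h + scale * d).

    In a real model, downstream computation would consume the altered
    vector, so the edit is not confined to the answer.
    """
    return [h + scale * d for h, d in zip(hidden, direction)]
```

The first function has a bounded blast radius by construction; the second does not, which is consistent with the reported finding that residual writes disturbed surrounding generation more.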
Scope and limits
The route is Llama-specific. A Qwen operand-localization sweep returned zero recovery across all sampled positions. The supported operand range is integers 0-9999 with at most 12 generated answer tokens. The Hackaday write-up characterized the project as "sort-of worked" and "deemed a failure" - accurate for the JIT replacement half, but understating the activation-derived argument result, which held up under rigorous controls.
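A route with bounds like these would normally refuse to fire outside them. The guard below is a minimal sketch of that check using the limits stated above; the function and constant names are assumptions for illustration.

```python
from typing import Sequence

MIN_OPERAND = 0
MAX_OPERAND = 9999
MAX_ANSWER_TOKENS = 12


def in_supported_scope(operands: Sequence[int], answer_token_count: int) -> bool:
    """True only if every operand and the answer length fall inside the bounds."""
    operands_ok = all(
        isinstance(x, int) and not isinstance(x, bool)  # bool is an int subclass
        and MIN_OPERAND <= x <= MAX_OPERAND
        for x in operands
    )
    return operands_ok and answer_token_count <= MAX_ANSWER_TOKENS
```

For example, `in_supported_scope([12, 9999], 4)` passes, while an operand of 10000 or a 13-token answer falls outside the supported range and should fall back to the unassisted model.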
For practitioners
The core finding is that arithmetic prompts leave recoverable operation and operand structure in the residual stream, consistent with the Fourier-style helix encoding described by Kantamneni and Tegmark (2025). Practical value is currently limited by model specificity: unlike text parsers, activation routes require per-model localization. Building model-agnostic operand localizers is the meaningful next step for anyone looking to extend this approach.
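For readers unfamiliar with the helix idea, the sketch below shows the general shape of such an encoding: a linear component plus cosine and sine pairs at several periods. It is a toy built on assumptions, with illustrative periods and a brute-force nearest-neighbour decoder, and says nothing about how Llama's actual activations are laid out.

```python
import math
from typing import Iterable, List, Sequence

PERIODS = (2, 5, 10, 100)  # illustrative choice of periods, an assumption


def helix_features(n: int, periods: Sequence[int] = PERIODS) -> List[float]:
    """Encode n as [n, cos(2*pi*n/T), sin(2*pi*n/T), ...] over the periods."""
    feats = [float(n)]
    for t in periods:
        angle = 2 * math.pi * n / t
        feats.extend((math.cos(angle), math.sin(angle)))
    return feats


def decode_nearest(features: Sequence[float],
                   candidates: Iterable[int] = range(10000)) -> int:
    """Recover the integer whose helix features are closest (squared distance)."""
    def sqdist(n: int) -> float:
        return sum((x - y) ** 2 for x, y in zip(features, helix_features(n)))
    return min(candidates, key=sqdist)
```

The periodic components make digit-like structure (parity, last digit, last two digits) linearly accessible, which is one intuition for why a simple readout could recover operands from such a representation.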
Scoring Rationale
This is rigorous independent interpretability research: preregistration and hard-negative audits back its demonstration of activation-derived arithmetic argument extraction in Llama. The score was pulled from 5.8 to 5.0. The Hackaday framing overstated the failure, since the activation-derived route genuinely succeeded, but the scope is narrow (a single model family, integers 0-9999, not peer-reviewed), which places it solidly in niche-research rather than notable territory.


