Author Documents Agentic Coding on Galapagos Island
Independent software engineer and blogger Dan Luu published a long-form essay in early July 2026, "Agentic coding notes from Galapagos Island," describing how he has used AI coding agents heavily since November 2025. The essay includes cases where an agent confidently gives a wrong answer, such as naming an incorrect git commit when asked to bisect a bug, in ways that would get a human employee fired. According to Luu, the practical lesson is not that agents are reliable but that unreliable-but-fast agents can still be net useful once a workflow is built to catch and tolerate their mistakes rather than assume they are correct. This is a single-source, first-person field report rather than a benchmark or survey, but Luu is a widely read independent voice among practicing software engineers.
The value of this piece for practitioners is not a new benchmark but a working mental model for managing agent unreliability day to day, from an engineer who has used agents heavily since late 2025.
What happened
According to Dan Luu's own account, published on his blog in early July 2026, he asked an AI coding agent, Codex, running on what he identifies as roughly a GPT-5-class model, to bisect between two dates to find the commit that introduced a UI bug he could not easily write a test for. The agent confidently named a commit outside the given date range, and after being told it was wrong, named other incorrect commits before landing on one that merely looked plausible. Luu writes that this kind of confidently-wrong behavior is common enough in his agentic coding workflow that he now designs around it rather than expecting it to disappear.
For practitioners
Luu's framing is that agent unreliability is manageable if a workflow tolerates a lower hit rate and catches bad outputs cheaply, the same way engineers already handle unreliable systems elsewhere, rather than assuming an agent's output is correct or relying on a human to catch every failure.
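As one illustration of what "catching bad outputs cheaply" could look like for the bisect example, the sketch below checks whether a commit named by an agent actually falls inside the requested date range before anyone spends time inspecting it. This is not taken from Luu's essay; the function names `commits_between` and `resolve_candidate` are hypothetical, and the 7-character minimum for abbreviated hashes is an assumption chosen for the example.

```python
import subprocess
from typing import Iterable, List, Optional


def commits_between(repo: str, since: str, until: str, ref: str = "HEAD") -> List[str]:
    """Return full SHAs reachable from `ref` whose commit dates fall between
    `since` and `until`, using `git rev-list --since/--until`."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", f"--since={since}", f"--until={until}", ref],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.split()


def resolve_candidate(candidate: str, in_range: Iterable[str]) -> Optional[str]:
    """Return the full SHA if the agent's answer uniquely matches a commit
    in the allowed range; return None if it is out of range, ambiguous,
    or too short to trust (the 7-character floor is an assumption)."""
    candidate = candidate.strip().lower()
    if len(candidate) < 7:
        return None
    matches = [sha for sha in in_range if sha.lower().startswith(candidate)]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    # Hypothetical usage: the repository path, dates, and hash are placeholders.
    allowed = commits_between(".", "2026-05-01", "2026-06-01")
    answer = resolve_candidate("abc1234", allowed)
    print("plausible" if answer else "reject: not a commit in the requested range")
```

A check like this only rules out answers that cannot be right, such as a commit outside the given dates. It does not confirm that an in-range commit is the one that introduced the bug, so a plausible-looking wrong answer still needs a separate verification step.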
What to watch
This is a single first-person account, not independently corroborated or benchmarked, and Luu's own workflow tips are explicitly time-sensitive; he notes elsewhere in the post that specific tactics tend to have a short shelf life as agents and tools change.
Key Points
- Blogger Dan Luu describes using AI coding agents heavily since November 2025, cataloging cases of confidently wrong agent output.
- In one example, an agent named an incorrect git commit while bisecting a UI bug, then guessed other wrong commits before landing on a plausible one.
- Luu argues agent unreliability is manageable if workflows are built to catch mistakes cheaply, rather than assuming agent output is correct.
Scoring Rationale
A single, well-known independent engineer's first-person field notes on agent unreliability in coding workflows offer genuine practitioner value but no new data, benchmark, or industry-wide claim. Impact stays in the minor band as an anecdotal, single-source opinion piece rather than a research or product story.
