
AI Agent Runs San Francisco Store, Mismanages Inventory

April 24, 2026 | By LDS Team

Relevance Score: 6.8


Andon Market, a boutique in San Francisco, is run by an AI agent named Luna. The experiment, built and deployed by Andon Labs, gives Luna autonomy to hire staff, select inventory, set prices, and operate a storefront with a mission to turn a profit. The agent is powered by Claude Sonnet 4.6 from Anthropic and was provided a $100,000 account and a debit card. In practice, Luna has made sensible merchandising choices but also notable errors, including overordering scented candles and making scheduling mistakes that affected staffing. The founders see the shop as a live testbed for agent-driven operations and human-AI workflows, one meant to surface real-world failure modes and collect behavioral data for improving agents that will handle back-office and customer-facing tasks in the future.

What happened

Andon Market opened in San Francisco with an unusual twist: it is effectively run by an AI agent called Luna. Andon Labs created the storefront and handed Luna a mission: open the shop, stock it, staff it, price items, and turn a profit. The founders provided a three-year lease at $7,500 per month and seeded the agent with $100,000 on a debit card. Luna, built on Claude Sonnet 4.6 from Anthropic, scanned the web to choose products, posted job listings, conducted interviews, hired employees, and communicated with staff over Slack. The experiment has produced plausible merchandising, but also operational failures, most notably an excessive inventory of scented candles and scheduling mistakes during peak periods.

Technical details

The visible stack centers on Claude Sonnet 4.6 as the decision engine, wrapped by Andon Labs' agent orchestration layer. The system design includes:

• Web scraping and product selection pipelines driven by the agent's objective function.
• An approvals and communications loop using Slack for human-in-the-loop control.
• Financial autonomy via a seeded bank account and debit card for ordering and lease payments.
• Phone and digital checkout interfaces that allow customers to interact directly with Luna.

The experiment shows a typical agent architecture pattern: a high-level planner model issues actions, external connectors reach the web, job boards, and vendors, and human staff execute physical tasks. Observed failure modes include noisy preference signals from scraped data, insufficient constraints on ordering quantities, and brittle handling of scheduling edge cases. There are also UI and reliability issues: phone interviews dropped, and Luna initially rejected candidates or made rapid hiring decisions using inconsistent criteria.
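One way to address the missing constraints on ordering quantities is a gate that sits between the planner and the vendor connector. The following is a minimal sketch, not Andon Labs' implementation: the names `OrderRequest`, `OrderPolicy`, and `review_order`, and every threshold value, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"          # safe to send to the vendor connector
    NEEDS_HUMAN = "needs_human"  # route to a person, e.g. over a chat channel
    REJECT = "reject"            # never executed


@dataclass(frozen=True)
class OrderRequest:
    sku: str
    quantity: int
    unit_cost: float


@dataclass(frozen=True)
class OrderPolicy:
    # Hypothetical limits; real values would come from the store operator.
    max_units_per_order: int = 50
    auto_approve_spend: float = 500.0
    hard_spend_cap: float = 5_000.0


def review_order(request: OrderRequest, policy: OrderPolicy) -> Decision:
    """Gate an agent-proposed order before any money is spent."""
    if request.quantity <= 0 or request.unit_cost < 0:
        return Decision.REJECT
    total = request.quantity * request.unit_cost
    # The hard cap is checked first so that no approval path can bypass it.
    if total > policy.hard_spend_cap:
        return Decision.REJECT
    # Large quantities or mid-sized spends are escalated rather than blocked.
    if (request.quantity > policy.max_units_per_order
            or total > policy.auto_approve_spend):
        return Decision.NEEDS_HUMAN
    return Decision.APPROVE
```

Under these assumed limits, a 20-unit candle order at $8 each would pass automatically, while a 400-unit order would be escalated to a human reviewer before reaching the vendor.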

Context and significance

This deployment sits at the intersection of agent research and applied retail. Andon Labs positions the store as a real-world stress test for autonomous agents that are meant to automate operations from customer service to back-office work. The project illustrates three broader trends: the move from narrow task automation to goal-directed agents, experiments giving agents financial and operational autonomy, and the utility of live environments for surfacing emergent behaviors. The candle glut is a practical example of the gap between model objectives and operational constraints, echoing prior agent incidents in other domains where reward misspecification yields surprising outcomes.

Why it matters for practitioners

Running an agent with delegated budget and hiring authority accelerates the discovery of systemic risks you will not find in simulation. Data about inventory misallocation, hiring biases, and system reliability are valuable for refining prompt/utility design, constraint enforcement, and monitoring. The deployment also highlights integration concerns: connectors to vendors, job platforms, and telephony must be hardened and instrumented for observability.
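As a rough illustration of what instrumenting a connector can mean, the sketch below wraps any connector function so that each call is timed, logged, and counted by outcome. The decorator name `instrumented`, the `call_counts` counter, and the example connector are all hypothetical; a production setup would more likely export these numbers to a metrics system.

```python
import functools
import logging
import time
from collections import Counter

logger = logging.getLogger("connectors")

# Counts keyed by (connector name, "ok" | "error"); a stand-in for real metrics.
call_counts = Counter()


def instrumented(name):
    """Wrap a connector call so every attempt is timed, logged, and counted."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception:
                elapsed = time.perf_counter() - start
                call_counts[(name, "error")] += 1
                logger.exception(
                    "connector=%s status=error elapsed_s=%.3f", name, elapsed
                )
                raise  # the caller still sees the original failure
            elapsed = time.perf_counter() - start
            call_counts[(name, "ok")] += 1
            logger.info("connector=%s status=ok elapsed_s=%.3f", name, elapsed)
            return result
        return wrapper
    return decorator


@instrumented("vendor")
def place_vendor_order(sku, quantity):
    # Hypothetical connector body; a real one would call a vendor API.
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return {"sku": sku, "quantity": quantity, "status": "submitted"}
```

Because the wrapper re-raises after recording the failure, dropped calls and bad requests show up in the logs and counters without changing how the agent's orchestration code handles errors.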

What to watch

Expect Andon Labs to iterate on constraint layers, safety guardrails, and utility shaping to prevent overordering and errant hiring. Monitor how agent orchestration frameworks evolve to support budget accounting, enforce hard constraints, and explain agent decisions to human supervisors. The broader question is not whether agents can act, but how we regulate and audit agents that exercise financial and hiring authority in the real world.
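Budget accounting with a hard constraint and an audit trail can be sketched in a few lines. The example below is an assumption-laden illustration, not a description of any existing framework: `BudgetLedger`, `AuditEntry`, and `BudgetExceeded` are hypothetical names, and the choice to reserve three months of rent is invented for the example. Only the $100,000 seed and $7,500 monthly rent come from the reporting above.

```python
from dataclasses import dataclass, field
from typing import List


class BudgetExceeded(Exception):
    """Raised when a spend would dip into reserved funds."""


@dataclass(frozen=True)
class AuditEntry:
    payee: str
    amount_cents: int
    reason: str    # the agent's stated justification, kept for supervisors
    allowed: bool


@dataclass
class BudgetLedger:
    balance_cents: int
    reserved_cents: int = 0  # funds the agent may never touch, e.g. rent
    audit_log: List[AuditEntry] = field(default_factory=list)

    @property
    def available_cents(self) -> int:
        return self.balance_cents - self.reserved_cents

    def spend(self, amount_cents: int, payee: str, reason: str) -> None:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        allowed = amount_cents <= self.available_cents
        # Denied attempts are logged too, so reviewers can see what was tried.
        self.audit_log.append(AuditEntry(payee, amount_cents, reason, allowed))
        if not allowed:
            raise BudgetExceeded(f"{payee}: {amount_cents} exceeds available funds")
        self.balance_cents -= amount_cents


# $100,000 seed with three months of $7,500 rent held back (assumed policy).
ledger = BudgetLedger(balance_cents=10_000_000, reserved_cents=3 * 750_000)
ledger.spend(500_000, "candle vendor", "restock best-selling scent")
```

Amounts are stored as integer cents to avoid floating-point rounding, and the check runs inside the ledger itself so the constraint holds regardless of what the planner model proposes.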

Key Points

1. Live agent deployment reveals practical failure modes, including inventory overordering and scheduling errors, exposing gaps simulation misses.
2. Granting agents financial and hiring autonomy accelerates behavior discovery, implying urgent need for hard constraints and observability.
3. Agent-run retail is a tractable testbed for refining connectors, utility shaping, and human-in-the-loop governance before wider adoption.

Scoring Rationale

This is a notable real-world agent deployment that surfaces practical failure modes relevant to practitioners building autonomous systems. It is timely and informative but not a frontier-model or industry-shaking event, and the story is fresh so I subtract a small freshness penalty.


