OpenAI Updates Agents SDK, Adds Sandbox
OpenAI expands the Agents SDK with a model-native harness and built-in sandboxed execution to let agents inspect files, run commands, edit code, and perform long-horizon tasks safely. The initial release targets Python, with TypeScript support planned, and exposes primitives for filesystem access, code execution, and progressive disclosure. The update standardizes agent infrastructure for OpenAI models, adds sandbox clients such as UnixLocalSandboxClient, and integrates orchestration features like configurable memory, tool use via MCP, and file edits using apply_patch. The SDK is generally available through the API under standard token and tool-usage pricing. For architects and engineers building autonomous workflows, the change reduces bespoke harness work and raises the baseline for safe local execution.
What happened
OpenAI released a major update to the Agents SDK that introduces a model-native harness and native sandboxed execution so agents can inspect files, run commands, edit code, and complete long-horizon tasks within controlled environments. The new capabilities ship first for Python and expose primitives and orchestration tuned to OpenAI models; TypeScript support is planned. The SDK now supports sandbox clients such as UnixLocalSandboxClient and shows examples using SandboxAgent with gpt-5.4 in the documentation.
Technical details
The update shifts the SDK from a minimal orchestration layer toward a harness that understands model operating patterns and provides safe runtime contexts. Key technical elements include:
- SandboxAgent, Runner.run, and RunConfig for authoring and executing agent runs inside sandboxes
- Manifest and LocalDir for declaring controlled workspace mounts and file-access surfaces
- Sandbox clients like UnixLocalSandboxClient to isolate execution from host systems
- Built-in primitives for tool use (MCP), progressive disclosure via AGENTS.md, file edits via apply_patch, and shell code execution
- Configurable memory, durable execution primitives, and hooks for multi-agent orchestration and tracing
The docs include executable Python examples that create ephemeral datarooms, seed files, instantiate a SandboxAgent with explicit instructions, and run comparison tasks using Runner.run. The release notes and coverage indicate the SDK is generally available via the OpenAI API under standard token-and-tool pricing. Additional features, including a dedicated code mode and subagent patterns, are listed as in development.
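The documented pattern can be sketched roughly as follows. The class names (SandboxAgent, Runner, UnixLocalSandboxClient, Manifest, LocalDir) come from the release notes, but the `agents` import path, keyword arguments, and call signatures shown here are assumptions, so treat this as an illustration of the shape of a run rather than a verbatim copy of the SDK's API:

```python
def compare_documents_in_sandbox():
    """Hypothetical sketch of a sandboxed comparison run.

    The primitive names are taken from the SDK update; the import
    paths and parameters are illustrative assumptions.
    """
    from agents import LocalDir, Manifest, Runner, SandboxAgent
    from agents.sandbox import UnixLocalSandboxClient

    # Mount a local dataroom read-only so the agent sees only these files.
    manifest = Manifest(mounts=[LocalDir(path="./dataroom", writable=False)])

    # The sandbox client isolates shell and code execution from the host.
    agent = SandboxAgent(
        name="document-reviewer",
        instructions="Compare the two documents and summarize differences.",
        sandbox_client=UnixLocalSandboxClient(),
        manifest=manifest,
    )

    # Runner.run executes the agent's task inside the sandbox.
    return Runner.run(agent, "Compare doc_a.md and doc_b.md")
```

The key point is the separation of concerns: the Manifest declares what the agent may touch, the sandbox client decides where execution happens, and Runner.run drives the loop.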
Context and significance
Agents are evolving from experimental notebooks and ad-hoc orchestrators into platform-grade constructs that need predictable safety and data controls. Model-agnostic frameworks trade off alignment with frontier models, while managed agent APIs constrain where work runs and how sensitive data is accessed. By providing a model-native harness, OpenAI reduces the engineering lift required to exploit frontier model behavior while keeping agents closer to the model's latency and capability assumptions. Native sandboxing addresses a critical pain point for practitioners: executing code and shell commands safely without exposing the host or sensitive datasets. This matters for teams building legal, finance, and enterprise document workflows where agents must read evidence, mutate code, or run analyses under auditability and least-privilege constraints.
What to watch
Track the TypeScript release and the SDK's integrations with cloud storage and durable execution. Pay attention to how teams adopt apply_patch and sandbox clients in production, and whether additional governance controls, logging, and RBAC primitives follow quickly.
Practical takeaway
If you build autonomous workflows, adopt the new harness to reduce bespoke infrastructure, use the sandbox clients for any code execution, and treat Manifest-driven mounts and AGENTS.md instructions as primary controls for auditability and least privilege.
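In practice, treating Manifest-driven mounts as a least-privilege control might look like the sketch below: expose only the minimum paths an agent needs, read-only wherever possible. Manifest and LocalDir are named in the SDK update, but the keyword arguments here are illustrative assumptions:

```python
def least_privilege_manifest():
    """Hypothetical least-privilege workspace declaration.

    Manifest and LocalDir come from the SDK update; the field
    names shown are assumptions for illustration.
    """
    from agents import LocalDir, Manifest

    return Manifest(
        mounts=[
            LocalDir(path="./evidence", writable=False),  # read-only inputs
            LocalDir(path="./workspace", writable=True),  # scratch output dir
        ]
    )
```

Declaring access this way gives auditors a single artifact to review: the mount list is the agent's entire file-access surface.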
Scoring Rationale
This is a notable product update that meaningfully reduces engineering friction and increases safety for agent-based systems, impacting practitioners building autonomous workflows. It is not a frontier model release, so its influence is significant but not transformational.