Mira Murati Debuts Thinking Machines Interaction Model

Former OpenAI CTO Mira Murati gave her first major interview since leaving OpenAI to preview what her startup is building, according to Observer and TechCrunch. Per multiple outlets, Thinking Machines Lab has raised $2 billion and previously shipped a developer tool called Tinker (Built In). Murati described a class of "interaction models" that process continuous streams of audio, text, and video in roughly 200-millisecond intervals and collaborate with humans in near real time, per TechCrunch and Wired. Observer reports the model is named TML-Interaction-Small and that Murati said she plans to release it publicly later this year. NextWeb reported a claimed response time of about 0.40 seconds for the small model and noted recent high-profile researcher departures from the company.
What happened
Mira Murati, the former chief technology officer of OpenAI, re-emerged in a public forum to preview a new family of models from her startup, Thinking Machines Lab, according to reporting by Observer, TechCrunch, Wired, and NextWeb. Observer reports the company has a model named TML-Interaction-Small that Murati said she expects to release publicly later this year. Multiple outlets including Built In and Observer report Thinking Machines raised $2 billion in early funding, and Built In notes the company previously shipped a developer product called Tinker.
Technical details
Per TechCrunch, Wired, and NextWeb, Murati described the new systems as multimodal "interaction models" designed to process continuous streams of audio, text, and video rather than operate in a turn-based prompt-response loop. TechCrunch and NextWeb report the models are engineered to take in input and produce output in roughly 200-millisecond intervals, a design the coverage frames as aiming for conversational, near-real-time responsiveness. NextWeb reports the company showed a demo in which TML-Interaction-Small responded in about 0.40 seconds, and coverage uses the term "full duplex" to describe input and output flowing continuously in both directions at once.
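To make the reported 200-millisecond interval concrete, the following is a minimal sketch of how a continuous audio stream could be cut into fixed windows before being handed to a model. It is not Thinking Machines' implementation, which has not been published. The 16 kHz sample rate and the names samples_per_chunk and chunk_stream are assumptions chosen for illustration only.

```python
from typing import Iterable, Iterator, List

SAMPLE_RATE_HZ = 16_000  # assumed sample rate, not from the reporting
CHUNK_MS = 200           # interval length cited in the coverage


def samples_per_chunk(sample_rate_hz: int, chunk_ms: int) -> int:
    """Number of samples that fit in one window of chunk_ms milliseconds."""
    return sample_rate_hz * chunk_ms // 1000


def chunk_stream(samples: Iterable[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield consecutive windows of chunk_size samples from a stream.

    A final, shorter window is yielded if the stream ends mid-chunk.
    """
    buffer: List[int] = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == chunk_size:
            yield buffer
            buffer = []
    if buffer:
        yield buffer


if __name__ == "__main__":
    size = samples_per_chunk(SAMPLE_RATE_HZ, CHUNK_MS)
    # Half a second of placeholder audio: two full windows and one partial.
    for window in chunk_stream(range(8_000), size):
        print(len(window))
```

Under these assumptions each window holds 3,200 samples, so a model working at this cadence would see five windows per second of audio rather than one complete utterance per turn.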
Editorial analysis - technical context
Companies experimenting with low-latency, continuous multimodal interfaces work under a different set of product constraints than those building large turn-based LLMs. For practitioners, these systems put more emphasis on streaming architectures, efficient on-device or edge serving, incremental state tracking, and latency-optimized model kernels. As a general industry pattern, teams building similar interfaces typically invest heavily in real-time data pipelines, audio and video front-ends that preserve timing and prosody, and tight model-runtime integration to hit subsecond response targets.
Context and significance
Editorial analysis: The story matters for two reasons. First, the combination of a high-profile founder, a sizable early war chest, and a public demo adds another well-funded entrant to the multimodal model race, a dynamic already noted by TechCrunch and Wired. Second, the interaction-model framing shifts product emphasis from single-turn capability to sustained, context-rich human collaboration, an architectural and UX challenge that could influence how teams design conversational agents and human-in-the-loop systems going forward.
Reported operational notes
Multiple outlets, including NextWeb and TechCrunch, flagged a string of high-profile departures from Thinking Machines, naming founding researchers who left for other tech labs. TechCrunch reports Murati avoided committing to a specific release date during the interview. Wired quotes Murati emphasizing keeping humans in the loop, saying, "At some point we will have super-intelligent machines," and arguing for prolonged human collaboration.
What to watch
Editorial analysis: Observers and practitioners should track three indicators. First, whether Thinking Machines publishes benchmarks or reproducible latency measurements for TML-Interaction-Small. Second, whether the company opens APIs or SDKs to support streaming multimodal inputs, and what latency and cost profiles those offerings present. Third, hiring and retention trends among its research staff, which coverage has identified as a visible operational signal. These indicators will help clarify the technical maturity and ecosystem impact of the work.
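For readers wondering what a reproducible latency measurement might look like in practice, here is a minimal sketch. It times repeated calls to a responder and reports nearest-rank percentiles rather than a single demo figure. No public API for TML-Interaction-Small has been reported, so fake_respond is a hypothetical stand-in, and measure_latencies and percentile are illustrative names.

```python
import math
import time
from typing import Callable, List


def measure_latencies(respond: Callable[[], object], trials: int) -> List[float]:
    """Call respond() `trials` times and return each wall-clock latency in seconds."""
    latencies: List[float] = []
    for _ in range(trials):
        start = time.perf_counter()
        respond()
        latencies.append(time.perf_counter() - start)
    return latencies


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile of values for p between 0 and 100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def fake_respond() -> None:
    """Hypothetical stand-in for a model call; sleeps briefly instead."""
    time.sleep(0.01)


if __name__ == "__main__":
    results = measure_latencies(fake_respond, trials=20)
    print(f"median: {percentile(results, 50):.3f} s")
    print(f"p95:    {percentile(results, 95):.3f} s")
```

Reporting a median alongside a tail percentile, together with the hardware and network conditions used, would let outside parties compare their own measurements against a headline figure such as the roughly 0.40 seconds cited by NextWeb.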
Bottom line
Reporting across Observer, TechCrunch, Wired, NextWeb, and Built In documents a public preview of a low-latency multimodal model from Mira Murati's Thinking Machines Lab, a startup that has reportedly raised $2 billion, has shipped Tinker, and is now showcasing interaction-focused models. Editorial analysis: For ML engineers and product teams, the practical implications center on streaming data engineering, latency-optimized model serving, and UX patterns for continuous human-AI collaboration.
Scoring Rationale
A well-funded, high-profile founder previewing a new class of low-latency multimodal models is notable for practitioners. The story impacts model design, runtime engineering, and UX patterns, but it is not yet a landmark release because core claims await independent verification.


