AI Waves #9: Murati bets against autonomy

When in doubt, financialize

May 16, 2026

May 14, 2026 | Nazaré Ventures

Previous issues: #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8

Murati’s first release

Thinking Machines Lab broke its silence this week. TML-Interaction-Small, released on May 11, is a 276B-parameter MoE with 12B active, trained from scratch, running a real-time loop on 200ms chunks of audio, video, and text, with a separate background model handling slower reasoning and tool use. The model counts pushups from video while you talk to it, translates speech live, and interrupts when it has something to say. Response latency is 0.40 seconds. TML’s own line: no existing model can meaningfully perform any of these tasks.

TML is the most credible post-OpenAI lab to publicly bet against autonomy maximalism. Murati’s pitch (per Semafor), “the way we work with AI matters as much as how smart it is,” is a direct counter to the agents-running-alone framing that has dominated industry discourse for a year. TML quotes a competitor’s model card admitting that interactive use felt too slow and that “autonomous, long-running agent harnesses better elicited the model’s coding capabilities.” They treat that as an indictment of the interface.

This is the argument I made in Much Ado About Autonomy two weeks ago. Autonomy is a design choice the field keeps collapsing into a binary. TML is making the same refusal at the architectural level. Their bitter-lesson move: hand-crafted interaction harnesses (voice-activity detection, turn-boundary prediction, dialog management) are meaningfully less intelligent than the model itself, and will lose to in-weights capability. If the principle generalizes, hand-crafted scaffolding around a model is a temporary intermediate, whether the scaffolding is interaction harnesses or workflow orchestration.

Two months earlier, three AWS researchers (one an Amazon Scholar) published a framework reaching a similar critique from the UX side. Their prescription differs. Amazon proposes a monitoring agent that dials supervision up and down; TML argues that such a monitoring agent loses to a model that natively perceives the user. Same critique of autonomy maximalism, from opposite ends of the stack. The counter-narrative is forming.

Lazy maximalism is probably wrong in both directions. The apocalyptic version where autonomous agents end work or end the species, and the superabundance version where they hand us a world we never have to touch, are mirror-image fantasies. Some form of collaboration is the likely path forward, at least in the interim. And there are probably two tiers in tandem. Rote automation, the unglamorous interior of most knowledge work, runs largely autonomously and should. The harder problems benefit from more capable agents working alongside humans in real time, because the hardest problems are the ones whose specifications cannot be written down in advance.

The labs buy distribution

Anthropic launched a $1.5 billion joint venture on May 4 with Blackstone, Hellman & Friedman, and Goldman Sachs, plus General Atlantic, Leonard Green, Apollo, GIC, and Sequoia. The vehicle embeds Anthropic engineers inside mid-size businesses starting with the PE firms’ portfolio companies.

A week later OpenAI launched DeployCo with $4 billion from a 19-firm consortium, with TPG leading and Advent, Bain Capital, and Brookfield co-leading. Bain & Co, Capgemini, and McKinsey signed on as services partners. On day one OpenAI acquired Tomoro and folded in roughly 150 forward-deployed engineers.

The labs are spending billions to be present in the layer between the model and the customer. That is a revealed preference. If general-purpose intelligence captured the value, the labs would not be capitalizing consulting arms and acquiring forward-deployed engineering at this scale. Model quality is necessary but insufficient. Value accrues to whatever sits between the model and the buyer, and the labs want to be there too.

Compute as commodity

CME Group and Silicon Data announced on May 12 the first compute futures market, launching later this year pending regulatory review. Contracts reference Silicon Data’s daily GPU rental indices (SDA100RT, SDH100RT, SDB200RT), already on Bloomberg, built from a proprietary data network covering 80%-plus of the global H100 rental market across 50 countries and 50 to 100 platforms spanning hyperscalers, neoclouds, and compute exchanges. Silicon Data is backed by DRW and Jump Trading. CME’s Terry Duffy called compute “the new oil of the 21st century.” Larry Fink said last week he expects a new asset class to emerge to buy compute futures.

AI data centers have been hard to underwrite because the tenant base is small AI teams whose revenue depends on whether their model works. A futures market lets developers lock in revenue before they know who the tenants will be. Compute becomes financeable the way oil and electricity became financeable: reference pricing, then derivatives, then bank capital. Reference pricing already exists. Derivatives are next.
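To make the lock-in mechanics concrete, here is a minimal sketch of how a short futures position could fix a developer's revenue per GPU-hour regardless of where spot rental prices settle. Every number and name below is hypothetical; the actual CME contract terms are not public in this issue, and the sketch ignores basis risk, margin, and fees.

```python
def hedged_revenue(gpu_hours, locked_price, spot_price):
    """Revenue for a developer who sold futures at locked_price.

    The physical rental earns the spot price. The short futures position
    pays (locked_price - spot_price) per GPU-hour, which offsets any move
    in spot. Basis risk, margin, and fees are ignored in this sketch.
    """
    rental_income = gpu_hours * spot_price
    futures_pnl = gpu_hours * (locked_price - spot_price)
    return rental_income + futures_pnl


# Hypothetical figures, for illustration only.
GPU_HOURS = 1_000_000
LOCKED_PRICE = 2.00  # USD per GPU-hour, assumed futures price

for spot in (1.50, 2.00, 2.50):
    total = hedged_revenue(GPU_HOURS, LOCKED_PRICE, spot)
    print(f"spot={spot:.2f} -> hedged revenue={total:,.0f}")
```

Whatever the spot price turns out to be, the two legs sum to the locked price times the hours sold, which is the property a lender would underwrite against.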

Two further beats landed the same week. Anthropic and SpaceX put 220,000 GPUs on a single bilateral contract. DeepSeek is raising up to $7.35 billion at a $45-50 billion valuation, led by the China Integrated Circuit Industry Investment Fund, the state vehicle that has financed China’s semiconductor independence push since 2014. Liang Wenfeng, who owns roughly 90% of DeepSeek, is putting up to $2.94 billion of his own into the round. Tencent and Alibaba are in discussions. It would be DeepSeek’s first outside funding after years of self-funding through Liang’s hedge fund.

Compute is now a derivatives market in Chicago, a bilateral commodity in Memphis, and a state-financed asset class in Hangzhou. The plumbing is being built in three jurisdictions at once.

What We’re Watching

Microsoft is multi-aligning. Satya Nadella testified on May 11 in Musk v. Altman that Microsoft began treating OpenAI as a competitor in 2024 and has since allied with other model labs, including xAI on Azure and Anthropic for specific use cases backed by a $5 billion investment. Discovery showed Nadella worried about OpenAI supplanting Microsoft as early as April 2022, seven months before ChatGPT launched. The hyperscaler that was supposed to own one model now hedges across at least three.

GPT-5.5 Instant. Rolled out on May 5 as ChatGPT’s new default. 81.2 on AIME 2025 (up from 65.4 on GPT-5.3 Instant). 52.5% reduction in hallucinated claims on sensitive domains. Memory sources, the auditable-context feature we covered last week, ships with it.

Portfolio

Memco: new investment, shared memory for agents. Nazaré joined Memco’s pre-seed round. Memco builds the shared memory layer for AI agents. Their first product, Spark, captures real developer experience (intent, failed attempts, eventual fix) and makes it reusable across IDEs, CLIs, and CI pipelines, so the next agent doesn’t rediscover the bug your last agent fixed. Published benchmarks (arXiv 2511.08301) show 40% token reduction and 34% faster execution. CTO Valentin Tablan leads the team with Scott Taylor and Kristoffer Bernhem. Incubated by Moonsong Labs.

Intelligent Internet: Postgres-native retrieval ships. II released psql_bm25s on May 13, an open-source Postgres-native BM25 extension. The thesis: long-running agents lose to retrieval, not context. Most production agent state already lives in Postgres, and existing BM25 extensions are slow enough that harnesses ration retrieval rather than running it on every step. psql_bm25s closes the gap: 4x the Python bm25s reference at the median on BEIR, 7x TensorChord vchord_bm25, 23x ParadeDB pg_search. On MSMARCO it clears 96.7 QPS against pg_search’s 4.4. Built as the storage primitive for Common Ground and II-Commons.
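For readers who have not looked at BM25 in a while, here is a minimal pure-Python sketch of the textbook Okapi BM25 score that extensions like this one compute. It is not psql_bm25s code and says nothing about how II implements it inside Postgres; the whitespace tokenizer, the parameter defaults (k1 = 1.5, b = 0.75), and the sample documents are all assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with textbook Okapi BM25.

    Assumes a non-empty list of non-empty documents and a naive
    lowercase whitespace tokenizer.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Non-negative IDF variant (the "+1 inside the log" form).
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Term-frequency saturation with document-length normalization.
            tf = term_freq[term]
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "agent fixed the bug",
    "weather is nice today",
    "the agent found a bug in the agent loop",
]
print(bm25_scores("agent bug", docs))
```

The scoring itself is cheap arithmetic over term counts, which is why the speed of the index underneath it decides whether a harness can afford retrieval on every step.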

Provably: the verification SDK ships. Provably released the Verifiable Data AgentKit, a Python SDK for detecting hallucinations, verifying agent answers, and stopping bad data from spreading across workflows. Shipped to GitHub on May 7, it works with major agent frameworks including OpenAI, Anthropic, LangChain, LangGraph, LlamaIndex, AutoGen, CrewAI, and Google GenAI. The SDK stops agents from fetching data from unapproved sources, records every response they do fetch, and verifies that what one agent claims it found matches what the source actually returned when work is handed to another agent. That removes the need to re-run source queries just to check an answer, saving time and speeding up verification in both single-agent and multi-agent systems. Walkthrough demo | GitHub.
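The handoff check described above can be pictured with a small sketch: record a fingerprint of every fetched response, then let the receiving agent compare a claim against that fingerprint instead of re-running the query. This is not the Verifiable Data AgentKit API. The class name, method names, allowlist, and URL below are all hypothetical, and a plain SHA-256 hash stands in for whatever verification scheme the SDK actually uses.

```python
import hashlib
import json

# Hypothetical allowlist of approved data sources.
ALLOWED_SOURCES = {"https://api.example.com/prices"}


def fingerprint(payload):
    """Hash a JSON-serializable payload in a canonical key order."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FetchLog:
    """Records what each approved source actually returned."""

    def __init__(self):
        self._records = {}

    def record(self, source, payload):
        # Refuse data from sources that are not on the allowlist.
        if source not in ALLOWED_SOURCES:
            raise PermissionError(f"unapproved source: {source}")
        digest = fingerprint(payload)
        self._records[digest] = source
        return digest

    def verify(self, claimed_payload, digest):
        # True only if the digest was recorded and the claim still matches it.
        return digest in self._records and fingerprint(claimed_payload) == digest


log = FetchLog()
receipt = log.record("https://api.example.com/prices", {"h100_usd_hr": 2.10})

# The receiving agent checks the claim against the receipt, with no re-query.
print(log.verify({"h100_usd_hr": 2.10}, receipt))  # matching claim
print(log.verify({"h100_usd_hr": 9.99}, receipt))  # altered claim
```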

Dimensional: agentic multi-drone swarms. Dimensional announced agentic multi-drone swarms on DimOS this week (Pomichter on X). Agents coordinate action across multiple robots and write MAVLink-native skills (Arm, Takeoff, Search, Follow, GoTo, DropPayload). DimOS sits underneath as the physical harness and guardrail layer, bounding every action by configurable safety thresholds for altitude, command velocity, and radial velocity.
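One way to picture a guardrail layer of this kind is as a clamp applied to every agent-issued command before it reaches the vehicle. The sketch below is an illustration only, not DimOS code: the class names, field names, units, and limit values are hypothetical, chosen to mirror the three thresholds named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyLimits:
    # Hypothetical configurable thresholds.
    max_altitude_m: float
    max_command_velocity_mps: float
    max_radial_velocity_mps: float


@dataclass(frozen=True)
class Command:
    # Hypothetical command an agent-written skill might request.
    altitude_m: float
    velocity_mps: float
    radial_velocity_mps: float


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def bound_command(cmd, limits):
    """Return a copy of cmd with every field forced inside the limits."""
    return Command(
        altitude_m=clamp(cmd.altitude_m, 0.0, limits.max_altitude_m),
        velocity_mps=clamp(cmd.velocity_mps, 0.0, limits.max_command_velocity_mps),
        # Radial velocity is treated as signed (toward or away from a reference point).
        radial_velocity_mps=clamp(
            cmd.radial_velocity_mps,
            -limits.max_radial_velocity_mps,
            limits.max_radial_velocity_mps,
        ),
    )


limits = SafetyLimits(
    max_altitude_m=50.0,
    max_command_velocity_mps=5.0,
    max_radial_velocity_mps=2.0,
)
requested = Command(altitude_m=120.0, velocity_mps=9.0, radial_velocity_mps=-3.0)
print(bound_command(requested, limits))
```

The design point is that the agent never has to be trusted to respect the limits; the layer underneath enforces them on every action.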

LayerLens: instrumenting the 12M-token frontier. LayerLens partnered with SubQuadratic on May 14 to continuously evaluate SubQ, SubQuadratic’s long-context model family. SubQ claims a native 12M-token context window, 92.1% NIAH recall at 12M tokens, and a 52.2x prefill speedup over dense attention at 1M tokens via content-dependent Subquadratic Selective Attention. LayerLens Stratix will run continuous benchmarks and publish prompt-level results. SubQuadratic uses Stratix Premium as its primary evaluation platform.

Vast.ai: App Studio launches. Vast shipped the All-in-One App Studio on May 13, a single container packaging eight creative AI tools (ComfyUI, SD Forge, Wan2GP, ACE Step 1.5, Voicebox, Whisper WebUI, Ostris AI Toolkit, Unsloth Studio) alongside a full KDE Plasma desktop and Blender. Tools share model directories, so a LoRA trained in the AI Toolkit is immediately usable in ComfyUI, and a model fine-tuned in Unsloth is reachable via an OpenAI-compatible API. The move takes Vast.ai up the stack from compute exchange to packaged creator workstation. Travis Cannell’s LinkedIn essay the same week named the dual customer thesis directly: agents, not just humans, are the next wave of customers.

Closing

The most credible technical statement of the week is not about better autonomy. It is about better collaboration, from a lab that raised $2 billion to publish a counter-thesis before shipping a product. The labs are buying the layer between the model and the customer. Wall Street is securitizing the compute underneath. State capital is fusing model and silicon into one vehicle. The discipline opened earlier still holds. What is true this week is not what will be true in August. When in doubt, financialize.

