Artificial Confidence: The Spec for the Agent-Native Cloud, and Who Might Actually Ship It
I tweeted a twelve-point spec for what an agent-native cloud actually needs to look like. Vercel volunteered. Cloudflare's engineers got to work. Here's the test.
I’ve spent a decade watching companies blunder into the discovery that the cloud they built their business on is not the cloud they need anymore, usually at the moment the bill arrives or the auditor shows up. We’re heading into one of those moments now, and the existing hyperscalers are not going to be the ones who build the cloud that agents actually want to run on.
Last week, Vercel CEO Guillermo Rauch was posting about Grok CLI deploying to Vercel, and I replied that an agent-native cloud platform was coming. It might be Cloudflare, it might be Vercel, and it absolutely wasn’t going to be AWS. Rauch responded inside an hour: “It’ll be ▲. Would love your feedback. This is our primary focus!” That seemed like a sufficiently confident claim to warrant taking him up on it, so I posted a twelve-point thread laying out the spec a serious contender would have to clear.
Cloudflare’s response came from a different altitude: Principal Systems Engineer Sid Chatterjee replied “I saw your list on the thread. We’re on it. Will report back once they’re all in.” Rauch volunteered the company as the test subject; Chatterjee volunteered to do the work and produce the receipts. Both responses are legitimate, but the engineer-level “report back once they’re all in” is the one that the rest of this post is calling for.
Here’s the spec, expanded from the thread, with credit to the practitioners who supplied the parts I missed. It assumes a specific deployment shape: an agent running semi-autonomously, taking actions over minutes or hours, against real infrastructure, with real money attached. If you’re typing prompts and watching every step, you don’t need an agent-native cloud; you need a less hostile CLI. The hard but compelling work is what changes when the agent runs unattended and the platform has to be trustworthy enough that you don’t have to babysit the bill.
Robertus on Twitter put the thesis better than I did: “agent-native cloud needs boring primitives more than magic. identity, permissions, logs, rollback, and cost controls before the sci-fi layer.” The vendors who lose this race build the sci-fi layer first and the boring primitives never. The vendors who win recognize that “boring primitives” is a euphemism for “the hard infrastructure problems that took AWS twenty years to get most of the way through, which is why a clean-slate competitor has a real opening.”
Identity and blast radius
Agents need their own identity. Today every agent action is laundered through the human’s IAM role. The audit log reads “corey@duckbill did this” when the truth is “Claude’s third retry at 2am did this.” That isn’t an audit log so much as it is compliance theater. First-class agent identities have to be scoped, time-limited, and revocable, so that when a postmortem rolls around, the answer to “which agent, what session, what tools, what action” is in the log rather than reconstructed from inference and the meeting notes of whoever was on call that night.
Blast radius as a primitive. “This session may spend up to X dollars, touch up to N resources, in environment Y, expiring in 30 minutes.” Today every agent is either fully privileged or fully fenced off, and the entire interesting design space is in between. Almost nobody is building there, because it requires answering hard questions about resource-graph traversal that AWS has spent a decade pretending IAM was solving.
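To make the "in between" concrete, here is a minimal sketch of a bounded session grant. Every name in it (SessionGrant, GrantExceeded, the field names) is hypothetical; no vendor is known to expose this shape. It assumes the platform can estimate an action's cost before running it, which is its own hard problem.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from typing import Set


class GrantExceeded(Exception):
    """Raised when an action would leave the session's declared bounds."""


@dataclass
class SessionGrant:
    # Hypothetical shape: spend, resource count, environment, and expiry.
    max_spend_usd: Decimal
    max_resources: int
    environment: str
    expires_at: datetime
    spent_usd: Decimal = Decimal("0")
    touched: Set[str] = field(default_factory=set)

    def authorize(self, resource_id: str, environment: str,
                  est_cost_usd: Decimal, now: datetime) -> None:
        """Fail closed: raise unless the action fits every bound."""
        if now >= self.expires_at:
            raise GrantExceeded("session expired")
        if environment != self.environment:
            raise GrantExceeded(f"environment {environment!r} not granted")
        if self.spent_usd + est_cost_usd > self.max_spend_usd:
            raise GrantExceeded("spend cap would be exceeded")
        new_touched = self.touched | {resource_id}
        if len(new_touched) > self.max_resources:
            raise GrantExceeded("resource cap would be exceeded")
        # Record usage only after every check has passed.
        self.spent_usd += est_cost_usd
        self.touched = new_touched


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
grant = SessionGrant(
    max_spend_usd=Decimal("5.00"),
    max_resources=2,
    environment="staging",
    expires_at=start + timedelta(minutes=30),
)
grant.authorize("bucket-a", "staging", Decimal("1.25"), now=start)
```

The sketch sidesteps the hard part named above: deciding what counts as "a resource" once one call fans out across a resource graph.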
Secrets brokering. Stop making the agent fish for API keys every time it wants to light up a new service. The platform holds the secret; the agent gets a handle; calls go through the broker. A compromised agent cannot exfiltrate what it never had. This is a solved problem in OAuth flows for human users and a completely unsolved problem for agent-to-agent service calls, mostly because nobody has wanted to solve it.
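The handle pattern can be sketched in a few lines. This is an illustration under stated assumptions, not any platform's API: SecretsBroker, its methods, and fake_billing_api are all hypothetical, and the "sender" stands in for platform-owned code that makes the real outbound call.

```python
import secrets
from typing import Callable, Dict, Tuple

Sender = Callable[[str, str], str]  # (raw_secret, payload) -> response


class SecretsBroker:
    """Holds raw credentials; agents only ever see opaque handles."""

    def __init__(self) -> None:
        self._services: Dict[str, Tuple[str, Sender]] = {}
        self._handles: Dict[str, str] = {}  # handle -> service name

    def register(self, service: str, secret: str, sender: Sender) -> None:
        # Done by the platform or a human operator, never by the agent.
        self._services[service] = (secret, sender)

    def issue_handle(self, service: str) -> str:
        if service not in self._services:
            raise KeyError(f"no secret registered for {service!r}")
        handle = "h_" + secrets.token_urlsafe(16)
        self._handles[handle] = service
        return handle

    def revoke(self, handle: str) -> None:
        self._handles.pop(handle, None)

    def call(self, handle: str, payload: str) -> str:
        """The agent passes a handle and a payload; the secret stays here."""
        service = self._handles.get(handle)
        if service is None:
            raise PermissionError("unknown or revoked handle")
        secret, sender = self._services[service]
        return sender(secret, payload)


def fake_billing_api(secret: str, payload: str) -> str:
    # Stand-in for a real outbound request made on the broker side.
    return f"accepted:{payload}" if secret == "sk-demo" else "rejected"


broker = SecretsBroker()
broker.register("billing", "sk-demo", fake_billing_api)
handle = broker.issue_handle("billing")
result = broker.call(handle, "charge-42")
```

Revoking the handle cuts off a misbehaving session without rotating the underlying key, which is the property the agent-to-agent case is missing today.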
The money problem
Hard budget caps that actually halt. Not the AWS approach of “we noticed you spent $47,000 yesterday, here’s a CloudWatch email,” which is a postmortem with the dollar amount filled in, not a functioning budget control. Fail closed at the boundary. A Lambda stuck in a loop racking up data transfer charges is a real failure mode and deserves real boundary enforcement, not retroactive grief. The platform that ships caps that actually halt eliminates a category of incident I’ve spent a decade collecting war stories about: agent runs a recursive S3 list against a misconfigured bucket for fourteen hours, discovery happens at invoice time, blame routes to the most junior engineer who touched IAM that quarter.
Cost circuit breakers with human escalation. The agent session has an allotment; when it depletes faster than expected, the platform pages a human to authorize more or kill it. Finding out at the end of the month is how the surprise-bill incidents keep happening.
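The hard cap and the circuit breaker can be read as two outcomes of one pre-charge check. The sketch below assumes a hypothetical SessionBudget with an expected burn rate supplied up front; the names and the 2x tolerance are illustrative choices, not anything a vendor has published.

```python
from decimal import Decimal
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"  # pause the session and page a human
    HALT = "halt"          # allotment exhausted; fail closed


class SessionBudget:
    def __init__(self, allotment_usd: Decimal,
                 expected_usd_per_min: Decimal,
                 burn_tolerance: Decimal = Decimal("2")) -> None:
        self.allotment_usd = allotment_usd
        self.expected_usd_per_min = expected_usd_per_min
        self.burn_tolerance = burn_tolerance
        self.spent_usd = Decimal("0")

    def check(self, amount_usd: Decimal, elapsed_min: int) -> Decision:
        """Decide before the charge is incurred, never after."""
        projected = self.spent_usd + amount_usd
        if projected > self.allotment_usd:
            return Decision.HALT
        # What the session was expected to have spent by now, with headroom.
        ceiling = (self.expected_usd_per_min * elapsed_min
                   * self.burn_tolerance)
        if projected > ceiling:
            return Decision.ESCALATE
        self.spent_usd = projected
        return Decision.ALLOW


budget = SessionBudget(Decimal("5.00"), expected_usd_per_min=Decimal("0.10"))
decisions = [
    budget.check(Decimal("0.15"), elapsed_min=1),   # within the burn ceiling
    budget.check(Decimal("0.50"), elapsed_min=2),   # burning too fast
    budget.check(Decimal("4.90"), elapsed_min=60),  # would pass the hard cap
]
```

Only an ALLOW moves the spend counter, so a denied or escalated action costs nothing. That is the difference between a gate and an alert.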
Cost preview as a first-class API. Before any state-changing call: “this adds approximately $340 per month fixed, plus $0.09 per thousand requests.” Most pricing is usage-based now, so the preview has to model the workload rather than return a single number. Agents are bad at AWS pricing because AWS pricing is bad at being prices. The platform that ships a working cost preview API breaks a fifteen-year stalemate in cloud finops.
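Using the figures from the example above, a preview that models the workload might return a small pricing function rather than a single number. CostPreview and its field names are hypothetical, and the request volumes are made up for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class CostPreview:
    fixed_usd_per_month: Decimal
    usd_per_thousand_requests: Decimal

    def monthly_estimate(self, requests_per_month: int) -> Decimal:
        """Fixed charge plus usage, rounded to cents."""
        usage = (self.usd_per_thousand_requests
                 * Decimal(requests_per_month) / Decimal(1000))
        total = self.fixed_usd_per_month + usage
        return total.quantize(Decimal("0.01"))


preview = CostPreview(
    fixed_usd_per_month=Decimal("340"),
    usd_per_thousand_requests=Decimal("0.09"),
)
low = preview.monthly_estimate(100_000)     # 340 + 0.09 * 100  = 349.00
high = preview.monthly_estimate(2_000_000)  # 340 + 0.09 * 2000 = 520.00
```

Returning the model lets the agent, or the human approving the change, plug in its own traffic assumptions instead of trusting one opaque total.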
The reversibility problem
Gated changes by default. The agent does not mutate production directly. It opens a PR, kicks off an Action, proposes a change that a human or another agent reviews. The pattern is established and agents haven’t started routing around it; the platform’s job is to make it the path of least resistance.
Time travel by default. Every state change is reversible for some defined window. “Roll back the last twenty minutes” is one command, not an archaeological dig through CloudTrail that ends with restoring yesterday’s snapshot and losing four hours of customer data in the process of un-doing the agent’s mistake. This may well be impossible to engineer for anything beyond trivial levels of complexity, but by god, we need a better answer than today’s “hope your backups are ready for an impromptu test!”
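For the trivial case, the mechanism is a journal of prior values that can be replayed backwards. The sketch below assumes a toy key-value store and timestamps in seconds; ReversibleStore is a hypothetical name, and nothing here handles the side effects (deleted data, sent emails, external calls) that make the general problem so hard.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

_MISSING = object()  # sentinel: the key did not exist before the change


@dataclass
class Change:
    at: float    # when the change happened, in seconds
    key: str
    before: Any  # prior value, or _MISSING


class ReversibleStore:
    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}
        self.journal: List[Change] = []  # assumed to be in time order

    def set(self, key: str, value: Any, at: float) -> None:
        prior = self.state.get(key, _MISSING)
        self.journal.append(Change(at, key, prior))
        self.state[key] = value

    def rollback_since(self, cutoff: float) -> int:
        """Undo every change made at or after `cutoff`, newest first."""
        undone = 0
        while self.journal and self.journal[-1].at >= cutoff:
            change = self.journal.pop()
            if change.before is _MISSING:
                self.state.pop(change.key, None)
            else:
                self.state[change.key] = change.before
            undone += 1
        return undone


store = ReversibleStore()
store.set("replicas", 3, at=0)
store.set("replicas", 30, at=1500)             # the agent's mistake
store.set("debug_bucket", "public", at=1600)   # and another one

now = 2400
undone = store.rollback_since(now - 20 * 60)   # roll back the last 20 minutes
```

After the rollback the store holds only the original replica count, and the stray bucket entry is gone.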
Error messages designed for an LLM to act on. Not “AccessDenied: User arn:aws:... not authorized because no identity-based policy allows...” which is technically information but functionally a puzzle the agent will fail at solving. More like: “denied: this agent lacks dynamodb:Query on the ‘users’ table; the owner can grant it at [link].” Errors as instructions, not riddles. The industry-wide bill for inference cost burned decoding AWS error messages is already in the eight figures, distributed across millions of individual agent loops where nobody is going to notice it until someone like me writes a report explaining what they’ve been paying for.
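One way to get errors as instructions is to make the denial a structured record and render the sentence from it, so the agent can read either form. AgentDenial and its fields are hypothetical; the values come from the example above, with the link left as the same placeholder.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AgentDenial:
    missing_permission: str
    resource: str
    remedy: str
    grant_url: str
    retryable: bool = False  # tells the agent not to loop on this call

    def message(self) -> str:
        return (f"denied: this agent lacks {self.missing_permission} on "
                f"{self.resource}; {self.remedy} at {self.grant_url}")

    def to_json(self) -> str:
        return json.dumps({"error": "denied", **asdict(self)}, sort_keys=True)


denial = AgentDenial(
    missing_permission="dynamodb:Query",
    resource="the 'users' table",
    remedy="the owner can grant it",
    grant_url="[link]",
)
text = denial.message()
payload = json.loads(denial.to_json())
```

The retryable flag does as much work as the prose: an agent that knows a retry cannot succeed stops burning tokens on the third attempt at 2am.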
The interface problem
The API has to be consistent. AWS has 347 services, pending an update by AWS Corporate Comms (good job, buddy! You’re making a difference here!), depending on what counts as a service this week and whether you’re counting the ones that have been deprecated but not removed from the console. Roughly 43 of them do approximately the same thing, with bespoke verbs, inconsistent pagination, regional quirks, and conventions that exist because a single Principal Engineer in 2014 had strong opinions that accidentally became load-bearing. Agents inherit this inconsistency tax at a higher rate than humans do, paying it on every retrieval against a token budget that gets spent trying to remember whether this particular service uses NextToken, pageToken, or Marker.
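The pagination tax is small enough to show in code. Here is a sketch of one loop that hides the token-name differences, run against fake in-memory services. The fakes and the "Items" key are assumptions for illustration and do not correspond to any particular AWS API.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = Dict[str, Any]
Fetch = Callable[[Optional[str]], Page]


def paginate(fetch: Fetch, items_key: str, token_key: str) -> Iterator[Any]:
    """Yield items from any token-paginated call behind one interface."""
    token: Optional[str] = None
    while True:
        page = fetch(token)
        yield from page.get(items_key, [])
        token = page.get(token_key)
        if not token:
            return  # no continuation token means this was the last page


def fake_service(token_key: str, chunks: List[List[str]]) -> Fetch:
    """Build a fake list call whose continuation field is `token_key`."""
    def fetch(token: Optional[str]) -> Page:
        index = int(token) if token else 0
        page: Page = {"Items": chunks[index]}
        if index + 1 < len(chunks):
            page[token_key] = str(index + 1)
        return page
    return fetch


# Same loop, two different continuation conventions.
a = list(paginate(fake_service("NextToken", [["a", "b"], ["c"]]),
                  "Items", "NextToken"))
b = list(paginate(fake_service("Marker", [["x"], ["y"], ["z"]]),
                  "Items", "Marker"))
```

A consistent platform makes the two extra arguments unnecessary, and the agent stops spending tokens remembering which one applies.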
Observability that ties action to reasoning to cost. Not “Lambda X fired” but “agent invoked Lambda X while attempting task Y, prompted by request Z, costing $0.0003 against a $5 session budget.” The AI-native equivalent of dmesg for distributed systems. The vendor that ships this becomes the default observability layer for agentic infrastructure, which is to say, becomes Datadog with a four-year head start. Ideally with a more dignified mascot situation.
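As a rough sketch of what such a record might carry, the event below ties the action to the task, the request, and the cost, and renders the sentence from the example above. AgentEvent, SessionTrace, and the identifiers are hypothetical. The same record answers the "which agent, what session, what action" question from the identity section.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class AgentEvent:
    agent_id: str
    session_id: str
    action: str    # what ran
    task: str      # what the agent was attempting
    request: str   # what prompted the task
    cost_usd: Decimal


class SessionTrace:
    def __init__(self, budget_usd: Decimal) -> None:
        self.budget_usd = budget_usd
        self.events: List[AgentEvent] = []

    def record(self, event: AgentEvent) -> None:
        self.events.append(event)

    def spent_usd(self) -> Decimal:
        return sum((e.cost_usd for e in self.events), Decimal("0"))

    def describe(self, event: AgentEvent) -> str:
        return (f"{event.agent_id} invoked {event.action} while attempting "
                f"{event.task}, prompted by {event.request}, costing "
                f"${event.cost_usd} against a ${self.budget_usd} "
                f"session budget")


trace = SessionTrace(budget_usd=Decimal("5"))
event = AgentEvent("agent-7", "sess-1", "Lambda X", "task Y", "request Z",
                   Decimal("0.0003"))
trace.record(event)
line = trace.describe(event)
```

Because cost lives on the event rather than in a separate billing export, "what did this task cost" becomes a sum over a filter, not a join across three systems.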
Convention over configuration as an iron rule. AWS forces explicit decisions on a thousand things with one obviously-right answer 95% of the time. The agent-native platform should have opinionated defaults; when it does need to ask, ask the human, not flail through alone burning tokens on guesses. Vercel understands this; Framework-defined Infrastructure is exactly that thesis applied to web apps. Whether it generalizes from “deploy a Next.js app” to “operate a stateful multi-agent system” is the question on which the bet rests.
The thirteenth and fourteenth items (which I missed)
Ross Brown replied with what should have been item 13: a universal context injection system that pre-loads agents with the relevant architecture, the active alert posture, and the company policy (“agents may not modify the billing table without dual approval”) rather than relying on stuffing it into a CLAUDE.md and praying. Wire is one of the companies building this; there will be others.
Item 14, which several practitioners flagged: every spec item above is about the build phase, where the agent operates against the platform. The run phase, where what the agent built has to serve real users, is a separate set of problems the platform should solve so the agent doesn’t have to invent them, badly, from a half-remembered StackOverflow post about JWT handling. Authentication, session management, password reset flows, OAuth, MFA. The cleanest vendor example shipping on this axis is exe.dev, which puts an IAM proxy in front of every VM by default: TLS, DNS, and auth handled at the platform layer, not retrofitted into the agent-generated app. A full run-phase spec is its own post, but for now: nobody should be allowed to claim “agent-native cloud” while only solving the build-phase problems, even though the build-phase problems are the ones currently getting all the attention.
Who’s credibly in the race
Within forty-eight hours of the thread, my mentions filled up with founders explaining that they had already built three of these, were working on the next four, were 90% there, were the obvious frontrunner, were the only serious contender, and had also been doing this for years before anyone else noticed. The actual contenders, sorted by capital behind the claim rather than enthusiasm: Vercel and Cloudflare at the top, with meaningfully different architectural bets (Vercel: serverless functions with durable workflows; Cloudflare: stateful Durable Objects where the agent identity is the addressable compute unit); Railway, which raised $100 million in January explicitly for this and whose founder Jake Cooper replied that there’s “a prize on offer worth playing for”; exe.dev on the run-phase auth axis; and then agentuity, islo.dev, hostess.sh, cnap.tech, and a long tail of less-evaluable seed-stage projects.
This spec is simply the obvious set of requirements that follows from the deployment shape, legible to anyone who has spent thirty minutes operating an agent against real infrastructure. The reason a dozen vendors can simultaneously claim “we’re working on this” is that the spec is not a secret. What’s hard is shipping it.
AWS won’t win this race, and it’s not because AWS doesn’t understand the requirements. It’s because the org structure can’t ship them. Twenty years of accumulated surface area, three thousand product managers with stakes in keeping their service distinct, and a billing system designed in 2008 to make it hard to comparison-shop against itself are not fixable from inside AWS. They’re only fixable by starting from a clean slate, which is what the clean-slate vendors are doing.
The Heroku question
Even if a vendor ships the entire spec, do they still lose? The Heroku pattern is the obvious template: great for proofs of concept, fine at modest scale, but the moment the business takes off, someone dispatches Claude Code to migrate the workload to AWS because that’s where the enterprise procurement, the compliance surface area, and the volume discounts live. Replace “place where companies start” with “place where indie devs run their agents” and you have a real risk to the entire thesis.
Here’s why it might not repeat. Heroku’s value proposition was developer experience: git push deploy, automatic Postgres, the Procfile abstraction. Those are workflow primitives, and AWS replicated them well enough to drain the at-scale customer. Amplify, App Runner (deprecated at the end of April, RIP), Lightsail, and Elastic Beanstalk are all “Heroku, but it’s on AWS so your CFO is happy.” None of them excellent; all of them sufficient.
The agent-native cloud’s value proposition is operational, not workflow-oriented. The operational primitives: capability-bounded sessions, time-travel rollback, hard budget caps with first-class agent identity attached. These are architectural primitives that would require AWS to refactor IAM, CloudTrail, and the billing system simultaneously. That’s not a console UI ship but a five-year coordinated rebuild across organizations that have spent twenty years optimizing for incompatible goals. An enterprise that has built operational practice around “my agents have first-class identities and hard budget caps” is migrating away from safety to go to AWS, not toward better economics. That’s a different migration vector than Heroku faced.
The risk is that AWS doesn’t have to ship the primitives well; they only have to ship something procurement will accept as “good enough” alongside existing AWS spend. The bar for keeping an enterprise customer isn’t “match Vercel on agent safety,” it’s “give the CFO a story they can tell the board about consolidating on one vendor.” Bedrock Guardrails today doesn’t clear half the spec: it’s content filtering but not capability-bounded sessions or first-class agent identity. But “doesn’t pass the spec” and “good enough to win the renewal” are different bars, and AWS only has to clear the second.
The realistic call is that the agent-native cloud may end up serving two distinct populations: indie developers and small teams where the platform is the value, and enterprise pilots that eventually migrate to AWS once the workload matters enough that the CFO gets involved. My bet is that operational practice defined at the indie tier propagates up to where the enterprise workloads actually live, because that’s historically how new infrastructure categories have worked. But “eventually” is doing a lot of obnoxiously heavy lifting in that sentence.
What I’ll be watching for
The fourteen items above are the test. Hit them all, and I’ll concede you’ve built an agent-native cloud. Miss them, and you’ve built a marketing page with the word “agentic” in the headline; you can guess what my opinion is gonna be on that.
Vercel has a CEO willing to make a specific public bet on a thread written by a guy whose entire professional brand is calling out bullshit cloud claims. That deserves credit. Cloudflare’s response was different: the engineer who already shipped the most direct attempt at first-class agent identity committed to shipping more of it. That also deserves credit, and now it’s a horse race.
It’s a race because neither company has yet shipped a credible answer to the hardest items: blast-radius primitives, capability-bounded sessions, or the cost preview API that would break fifteen years of AWS pricing opacity. Whether they get there before Railway, exe.dev, a startup we haven’t heard of yet, or AWS shipping something passable enough to keep procurement happy is the question that determines who defines operational practice for agent infrastructure over the next several years, even if it doesn’t determine where every at-scale workload eventually runs.
That’s the bet. I’ll be watching the changelogs.




The AWS "solution" to this would be to get a few Solutions Architects to wire up 12 different Lambda functions, four CloudFormation stacks, a marketplace vendor, and wrap it up into a blog post and awslabs GitHub repository.
That said, AWS might still make out here. How many of the solutions are using AWS infrastructure to power what their customers don't see?
How does AWS AgentCore not meet the definition of the right set of primitives for Agentic Engineering?