Artificial Confidence

Artificial Confidence: You can’t repossess a download

Corey Quinn — Tue, 23 Jun 2026 15:12:14 GMT

I’m in Indianapolis this week to keynote the AWS Community Day tomorrow morning; come by if you’re in town!

Last week’s thesis was that everything in AI is rented, and rented things get repossessed. This happens via a pricing email, a status page, or apparently a Commerce Department directive that lands at 5:21pm on a Friday and pulls Fable 5 and Mythos 5 offline for everyone because someone failed to genuflect properly to their feudal lord. Eleven days later they’re still dark, and the company says that’ll get fixed “in the coming days.” I am not going to rehash this.

This week’s about what the market did while the frontier sat unplugged and Twitter sat grousing.

What Actually Changed (Adjusted For Spin)

Three open-weight coding models landed since last I (graced|darkened) your inbox. Zhipu’s (gesundheit) Z.ai shipped GLM 5.2 on June 13, live on its Coding Plan day one, offering a million-token context window, with MIT weights to follow. Moonshot shipped Kimi K2.7-Code on June 12, a trillion-parameter model under a modified MIT license at $0.95 per million input tokens. Cohere’s North Mini Code arrived June 9, Apache 2.0, 30 billion parameters. That last one matters specifically because Cohere is Canadian, which cuts against the ongoing “China bad” narrative. Zhipu, meanwhile, has been on the Commerce Entity List since January 2025: right now the industry cares a hell of a lot more that this model hasn’t been ripped away from them. Now we know that models can be turned off at the will of the US government. For some use cases that’s unacceptable, and as we know by now the internet treats censorship as breakage and routes around it; this is going to lead us to fascinating places.

We should be clear about what this trio of models represents. None of them are Fable-class; that tier is exactly what got pulled, and nothing open touches it—yet! But I ran GLM 5.2 this week, served through Baseten, against the multi-file work I’d normally hand Opus 4.8, and it’s an Opus-class contender: it did the job, it didn’t need four tries, and nobody had to approve my access once I shoved the key into my harness. Fable, for the brief window I had access, was pretty clearly going to be the big, slow, excellent frontier model you reached for occasionally. Opus has positioned itself as the model you reach for all day, and the all-day tier now has an open-weight peer that lives on hardware no angry White House letter can reach.

Follow The Money (It Went To The Landlord)

The capital people noticed before you did. Baseten, the platform I ran that model through, closed a $1.5 billion round yesterday at a valuation of up to $13 billion. “Up to,” because it’s split-priced at $11 billion for some investors and $13 billion for others, because of reasons that aren’t worth going into. That roughly triples its $5 billion mark from January, on something like $600 million of annualized revenue. Baseten doesn’t make a model. It makes the unglamorous layer that turns a free download into something that answers in production, across clouds it doesn’t own. Right now that’s looking like the most fundable pitch in AI: not the model (whose financials are, let’s say... dubious), but rather the place you run the model once you’ve decided it should be something nobody upstream can switch off.

For the record, the company that just had two models repossessed is the one I pay every month, so weigh my enthusiasm for ownable weights accordingly.

One last thing

A rented model can be turned off by someone who isn’t your vendor; we all answer to the sovereign government that controls the places we sit. But weights on your own disk are a lot more durable, and can change jurisdiction very quickly. So the number worth chasing is starting to look less like which hosted frontier model tops the leaderboard, and more like what a correct answer costs on the version you host yourself, where “they turned it off” isn’t a meaningful risk factor. We’re building toward a real figure on that. Price your exit before you need it.

See you next week.

— C

No failover for the Commerce Secretary

Corey Quinn — Mon, 15 Jun 2026 14:47:41 GMT

The problem with writing this newsletter is that very often the things that are true when I start writing are no longer true an hour or so later, when I’ve finished writing. Let’s see if I can get this out early, in-flight to the AWS Summit in NY, before the situation changes. If events once again outpace me, it’s imperative that you not email me about it. Let’s race the clock:

I’ve spent many years telling folks that various things (this week’s: AI dependency) distill down to a vendor-risk problem, with knowable failure modes: prices go up, rate limits get chokingly tight, the version or model you’ve standardized on gets Googled, the green of the status page becomes a comforting lie, etc. There are playbooks for all of these, and you can pay your way free of most of them.

On Friday evening we saw a fifth failure mode show up and the playbooks were either lacking or non-existent. The US government, or the closest thing we’ve got in this era, told Anthropic to take its two best models away from foreign nationals, Anthropic (rightly) determined it couldn’t do that selectively without risking jail time for its execs, and so it took them away from everyone. If you were building on Fable 5 at 5:20pm you were not building on Fable at 6:05pm, much to your surprise. There was no migration window, no announced timeline, just the equivalent of “how quickly can we rip the power cord out the back of these GPU rigs like we’re ripstarting some very expensive lawnmowers.”

So this week, the theme is “learning the thing you rent can not just be repriced, but can also in fact be repossessed.”

What Actually Changed (Adjusted For Spin)

Anthropic’s best models went dark by federal letter

Anthropic released Fable 5 and Mythos 5 on June 9, with Fable pitched as the first time the company put a model of that tier in front of the general public. The Commerce Department, in a directive signed by Secretary Lutnick, had some thoughts that Friday evening at 5:21pm ET, of the form “now you listen here.” It cited national security authorities and ordered access suspended for any foreign national, whether inside or outside the country, including Anthropic’s own foreign-national employees. Because there is no clean and prison-proof way to wall those people off one model at a time, Anthropic disabled both models for every customer to stay compliant. It’s been reported that this is the first time a leading lab has pulled a deployed model offline because the federal government said so, and I have not found a counterexample. Given the reaction we’re seeing, I kinda think I would have heard if it were otherwise.

The stated reason for this is a jailbreak. As per Anthropic, the demonstrated technique amounts to asking the model to read a codebase and fix the flaws in it, which surfaced a few minor vulnerabilities that other public models find too. That seems less a “jailbreak” than it is “the exact capability they’re using to market the model.” In effect, it seems that “being good at the job” is now a reason your vendor can lose the ability to offer it to you.

Anthropic is complying, because prison, while saying that it believes this to be a misunderstanding. CNN, citing Axios, reports the directive would require a license not just for export but for domestic transfer of the models, which, if this remains true, means moving the weights between two datacenters in the same country is now a paperwork event. Yay, paperwork!

Either way, the practitioner takeaway is not “the government overreached” or “Anthropic was reckless.” Pick whichever of those you like at dinner, fine, whatever, I really don’t care that much about the reason. Because the takeaway is that your disaster-recovery plan has a runbook for us-east-1 falling over due to coup or asteroid strike or whatever, and nothing whatsoever for a one-page letter from Commerce, and no amount of multi-region anything fixes that, because the thing that failed was not the infrastructure unless you believe in the 8th layer of the OSI model (politics).

Meanwhile, three coding tools metered the buffet in two weeks

While the dramatic repossession was getting the headlines and Twitter screaming, a far quieter one has been going on relatively stealthily, and it is the one that will actually move your bill. The flat all-you-can-eat seat for agentic coding is being retired across the category, because an agent that runs autonomously for an hour burns serious compute and the flat fee was subsidizing the heavy users out of the light ones’ pockets, as is always the way with subscriptions.

We talked about GitHub Copilot doing this a couple of weeks ago, but I missed that Windsurf became Devin Desktop on June 2 in an over-the-air update from Cognition, with Cascade, the local agent a lot of CI pipelines call by name, going end-of-life July 1. This wasn’t coordinated, so put your tinfoil hats away; it’s an emergent pattern we’re seeing industry-wide. What it means is: if you have automation that invokes Cascade, that is a serious deadline with your name splattered on it, and “the editor renamed itself overnight and my pipeline broke” is a sentence you would presumably prefer not to say to your team on July 2.

The honest read is that metered pricing is more correct than flat pricing. A heavy user and a light user genuinely do not cost the vendor the same, and pretending otherwise was always a subsidy with an expiration date. The catch is the one every cloud customer already knows in their soul: “you only pay for what you use” is a wonderful promise right up until an agent decides to use rather a lot of it at three in the morning.

Follow The Money (Or Watch It Follow Itself)

SpaceX went public on June 12 under SPCX, raised about $75 billion in the largest IPO ever recorded, priced at $135, and closed its first day up 19% near $161, which put the whole thing somewhere north of two trillion dollars, brushing against Amazon’s own valuation intraday. This is of course an AI story now, because the future is stupid, but also because in February SpaceX absorbed xAI and folded it into an AI division, and that division reported an operating loss of $6.36 billion last year and burned another $2.47 billion in the first quarter of this one. The market looked at a rocket company carrying an AI furnace and apparently decided it was worth more than all but a handful of companies that have ever existed. This seems fine.

Note: only about 4% of the shares are actually trading; insiders are locked up for six months and Musk holds something like 85% of the voting power. So the two-trillion-dollar number you are reading is the price the world is paying for the 4% it is allowed to touch, multiplied across the 96% it is not, which is both a fine way to generate an enormous headline and a poor way to learn what the company is worth. Morningstar’s discounted-cash-flow model lands around $780 billion, and the difference between that and two trillion is the dollar value of believing the burn turns into something. (I am pointedly not mentioning the formal investigations now open in several jurisdictions over Grok, because nothing about the substance of those is a joke and I am not going to treat it as one.)
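For anyone who wants to poke at that arithmetic, here is a back-of-the-envelope sketch. It uses only the round figures quoted above (a headline of roughly two trillion dollars, a 4% float, and the $780 billion cash-flow estimate); the function names are mine, and none of this is audited data.

```python
# Round figures quoted in the text above; approximations, not audited data.
HEADLINE_MARKET_CAP = 2.0e12   # "somewhere north of two trillion dollars"
FLOAT_FRACTION = 0.04          # "only about 4% of the shares are actually trading"
DCF_ESTIMATE = 780e9           # Morningstar's discounted-cash-flow figure


def float_value(market_cap: float, float_fraction: float) -> float:
    """Dollar value of the shares the public can actually trade."""
    return market_cap * float_fraction


def belief_premium(market_cap: float, dcf_estimate: float) -> float:
    """Gap between the headline number and the cash-flow model."""
    return market_cap - dcf_estimate


if __name__ == "__main__":
    traded = float_value(HEADLINE_MARKET_CAP, FLOAT_FRACTION)
    premium = belief_premium(HEADLINE_MARKET_CAP, DCF_ESTIMATE)
    print(f"Value of the tradable float: ${traded / 1e9:.0f}B")
    print(f"Headline minus DCF estimate: ${premium / 1e12:.2f}T")
```

The point of separating the two functions is that they answer different questions: one is what is actually changing hands, the other is how much of the headline rests on belief.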

One last thing

Go look at your architecture diagram and find the box, or be honest, boxes where the model lives. You have probably already drawn the arrows for what happens when it times out, and maybe even what happens when the price changes. Add the case where it is simply gone on a Friday because someone in Washington sent a letter, decide now whether your business survives that week, and price the answer accordingly. The vendors spent two years teaching us to pay only for what we use. This week we learned “…only for as long as we are allowed to use it.”

See you next week, unless events once again outpace me.

— C

Claude Opus / Fable / Shitpost

Corey Quinn — Tue, 09 Jun 2026 21:45:05 GMT

It figures; I whack “Send” and a new model drops. Let’s see here...

In April, Anthropic told the world that Mythos was too dangerous to release. It was so scary! But it’s great. But it’s scary! The fear-based marketing spiel was... a bit much. They built a government consortium around keeping it in a vault, handed it to a hundred and fifty cyberdefenders under the codename Project Glasswing, and said they had no plans to ship it to the public. The model was so good at finding software vulnerabilities that letting strangers use it got framed as a national-security question.

And then two months later they shipped it to anyone with a Pro subscription.

The thing they shipped is called Fable 5, following their naming convention of “types of writing.” One day I look forward to seeing them release Claude Shitpost, but that day is not today. Fable is the same underlying model as Mythos with a layer of classifiers bolted on top that watch what you ask and, roughly one session in twenty, decide you can’t have the good model and route your question to the previous one instead. So the most capable model Anthropic has ever made generally available also ships with an asterisk that occasionally hands you the model it replaces. Let’s talk about the asterisk, because it’s where all the interesting economics live.

The price is Opus, in a hurry

Fable 5 is out today on the API and every paid tier at $10 per million input tokens and $50 per million output. That’s exactly double Opus 4.8’s $5 and $25. It is also, to the dollar, the price of Opus 4.8 in Fast Mode. So the new frontier model costs precisely what the old frontier model costs when you ask it to type faster.

The capability claims are the usual launch buffet, and some of them are even real. Stripe says it compressed months of engineering into days on a fifty-million-line Ruby codebase, which is the kind of specific, customer-attributed claim worth more than ten benchmark charts. It also beat Pokémon FireRed using only screenshots, which is delightful and tells you approximately nothing about your bill.

The classifiers route you to Opus 4.8, and they tell you

Here’s the part to understand before you wire it into anything. Fable ships with classifiers covering three buckets: cybersecurity, biology and chemistry, and distillation, which is their word for people trying to clone the model’s capabilities to train a competitor. Trip one and the response comes from Opus 4.8 instead, with a note telling you it happened.

Anthropic says this fires in under 5% of sessions and that a fallback beats a refusal, which is true. But analyze the economics for a second. You’re paying Fable’s price, $10 and $50. When the classifier fires you get an Opus 4.8 answer, the same Opus you could have bought directly for $5 and $25. So one session in twenty, the safety system’s net effect is to charge you double for the cheaper model, presumably while you watch the little note explain why. The biology and chemistry net is currently cast wide enough that most requests in those areas fall back, so if your work touches a beaker you’re paying frontier rates for the floor below frontier a good deal more than one time in twenty.
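Here is a rough sketch of that math, using the list prices quoted above. One loud assumption of mine: tokens are spread evenly across sessions, so a session-level fallback rate can be applied to tokens. The 50% rate in the demo is a made-up stand-in for a beaker-heavy workload, not a number from Anthropic.

```python
# Per-million-token list prices quoted above.
FABLE = {"input": 10.00, "output": 50.00}
OPUS = {"input": 5.00, "output": 25.00}


def fallback_overpayment(fallback_rate: float, kind: str = "input") -> float:
    """Dollars per million tokens paid at the Fable rate for Opus answers.

    Assumes tokens are spread evenly across sessions (a simplification),
    so the session-level fallback rate applies to tokens as well.
    """
    if not 0.0 <= fallback_rate <= 1.0:
        raise ValueError("fallback_rate must be between 0 and 1")
    return fallback_rate * (FABLE[kind] - OPUS[kind])


def blended_fair_price(fallback_rate: float, kind: str = "input") -> float:
    """Cost per million tokens if fallbacks were billed at the Opus rate."""
    if not 0.0 <= fallback_rate <= 1.0:
        raise ValueError("fallback_rate must be between 0 and 1")
    return (1 - fallback_rate) * FABLE[kind] + fallback_rate * OPUS[kind]


if __name__ == "__main__":
    # 0.05 is the "under 5%" figure; 0.50 is a hypothetical heavy-fallback case.
    for rate in (0.05, 0.50):
        for kind in ("input", "output"):
            extra = fallback_overpayment(rate, kind)
            fair = blended_fair_price(rate, kind)
            print(f"rate={rate:.0%} {kind}: overpay ${extra:.2f}/M, "
                  f"fair blended price ${fair:.2f}/M vs ${FABLE[kind]:.2f}/M billed")
```

Swap in your own measured fallback rate before trusting any of it; the session-to-token assumption is the weakest link.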

This isn’t me objecting to the safeguards. The uplift case for a model that finds zero-days across every major OS is a concern and they’re right to be nervous about it. Rather, it’s an observation about who’s holding the meter when the safeguard does its job, because it feels somehow wrong to refuse to do something but take someone’s money for it anyway.

Thirty-day retention, now mandatory

Anthropic is also requiring 30-day data retention on all Mythos-class traffic, business customers included, across first- and third-party surfaces. They won’t train on it, they’re logging human access to it, they delete it after the window in almost all cases, and we can probably pretend this isn’t happening, right? The stated purpose is catching multi-request attacks and reducing false positives, which is plausible.

It’s also a thing that, until today, your enterprise data agreement may have promised you didn’t have to do. If you’re the person who negotiated zero-retention into a contract so you could put a frontier model in front of regulated workloads, “mandatory 30-day retention on the new model tier” is a sentence you want to read before, rather than after, the migration. Enterprise legal departments will almost certainly scream about this before inevitably signing the contract anyway.

The subscription that comes and goes

Now the part that reads like a hostage note with somehow worse legibility.

Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans through June 22. On June 23 it comes off those plans, and using it requires usage credits. After that, “when sufficient capacity allows,” they aim to put it back into the subscriptions. They’ll extend the included window if capacity allows, and restore standard access as quickly as they can.

Read in order, the offer is: it’s free, then it costs money, then it’s free again, terms and dates subject to how the GPUs are feeling. This is a thirteen-day trial that they’re calling a launch, attached to a model they’re describing as the most capable they’ve ever released. The honest version of the sentence is “we don’t have enough compute to give everyone this model, so we’re going to give it to everyone for two weeks and then take it back,” and to their credit the announcement nearly says exactly that, in the register of a company that would prefer you focus on the giving rather than the taking-back. It pointedly does not address how, if capacity is so dear, they’re able to give it to everyone for those two weeks at launch, when interest and thus demand is clearly spiking.

I read the rollout schedule three times to make sure I had the sequence right. I believe that I did. It’s in, then out, then in, and the only firm date in the whole arrangement is the one where it leaves.

What stands out is that “demand is hard to predict” and “capacity allows” are included in a launch announcement for a flagship model, which is the AI-industry version of a restaurant putting its best dish on the menu with a footnote reading “if we feel like it.”

So if you’re putting Fable into production, do the boring thing before June 23: figure out which of your workloads trip the classifiers, because those are the ones where you’re paying double for Opus 4.8, and price your fallback path at the model you actually get rather than the one on the label. The capability is, as per early reports, real. The asterisks are painfully real. Plan accordingly.

Artificial Confidence: xAI, the Neocloud

Corey Quinn — Tue, 09 Jun 2026 16:18:20 GMT

There’s an indicator that shows up in an S-1 when a company’s stopped doing the thing it raised money to do, and SpaceX’s prospectus has it. xAI raised the kind of capital you raise to win the AI model race and be the Bestest Boy, but then spent the last week disclosing that its actual business is instead renting GPUs to the companies who’re winning the model race. Meanwhile, Apple conceded it cannot run its own assistant on its own servers and is paying Google a billion dollars a year to borrow one. And Anthropic, which rents xAI’s spare compute to keep Claude running, filed to go public on revenue it won’t show you yet, then watched Claude fall over twice in four days.

So somehow we’ve arrived here, a place where everyone in AI is now renting the one part of the stack they spent three years insisting they had to own themselves.

What Actually Changed (Adjusted For Spin)

Google agreed to pay SpaceX $920 million a month for compute

In a June 5 filing, Google committed to roughly $920 million (what’s a few million here or there between friends?) a month through June 2029 to rent capacity from SpaceX. The world’s largest owner of AI compute cannot build data centers fast enough to keep up with itself, so it is renting them from a rocket company. Totally normal. Very sane. This is all fine.

Apple shipped the betas that let you replace Siri

iOS 27, iPadOS 27, and macOS 27 developer betas landed yesterday with the new Extensions framework: you can now designate Claude, ChatGPT, or Gemini as the default provider for Apple Intelligence features. Apple also deprecated SiriKit in favor of App Intents, the only framework that will talk to the rebuilt assistant because Apple is Very Special. GA in the fall. A real API and availability change, not a keynote promise—but can we really trust Apple’s keynotes after their Apple Intelligence oversteps?

Claude went down. Twice.

June 2 and June 5. Oops. #hugops to them.

xAI Is A Neocloud Now (They Just Can’t Say So)

xAI built Colossus 1, the Memphis data center fueled by gas turbines and bad faith, to train Grok. Then, per SpaceX’s S-1, it decided the smarter move was to rent the whole thing to Anthropic for $1.25 billion a month through 2029, and last week added Google at another $920 million. Combined, that’s about $26 billion a year from compute it bought to do something else. Anthropic raised Claude’s usage limits the day that deal was announced, the clearest sign the constraint was always the compute, never the demand.

That’s the part the prospectus dances around with the same care you’d bring to defusing a landmine. SpaceX calls this a “dual monetization strategy” and says it “allows us to monetize unused compute capacity,” which is the corporate-finance way of describing the spare bedroom you rent out as a “diversified hospitality vertical.” It’s unused because Grok didn’t need it, presumably because most businesses have minimal use cases in the workplace for “revenge porn.” xAI moved its real training to Colossus 2. So the model lab is now a landlord, the tenants are the companies beating it, and this timeline remains profoundly stupid.

The counterfactual everyone politely skips is “why not point all that compute at making Grok better.” The answer’s in the public record: the last time Grok made headlines for its capabilities, it was enthusiastically introducing itself as MechaHitler. Thirty billion dollars of GPUs is not, right now, the obvious value-maximizing play, so the GPUs are for rent and the S-1 found a much nicer word for it.

This matters because of what trades Friday: SPCX opens June 12 at a fixed $135, around a $1.75 trillion valuation, roughly 95 times last year’s revenue and the largest IPO ever attempted. A chunk of what’s being sold as frontier-AI upside is, upon inspection, a leasing business. Because nobody can build anything in this industry without sleeping with everybody else, Google (newly signed as a Colossus tenant) was an early SpaceX investor, holds a stake, and has a director on the board. So one of the two customers anchoring xAI’s revenue narrative also profits when that narrative prices the IPO higher. What an economist calls vertical integration, and what everyone else calls a massive conflict of interest.

Everyone Else Is Renting Too

If xAI is the supply side of the great AI sublet, Apple spent yesterday as the demand side. Tim Cook’s last keynote as CEO unveiled a rebuilt Siri that runs on a custom 1.2-trillion-parameter Google Gemini model, for which Apple is reportedly paying around a billion dollars a year (which is, to Apple, chump change). The company that designs its own silicon and built a retail religion on owning the whole stack apparently could not, as it turns out, build the one part that now matters to everyone.

It gets better. Apple originally tried to host the model on Private Cloud Compute and found, per The Information, that a trillion-parameter model ran too slowly at Siri’s scale. So the heaviest queries route to Nvidia B200s, and the contract leans on Nvidia’s on-chip encryption to keep Google from reading them. The “we own the whole stack” company is now shipping as their flagship announcement what is in effect a group project, which feels surreal.

So Claude and ChatGPT lost the “which AI lab does Apple partner with” bake-off, but the consolation prize is arguably better. Extensions let you set them as the default for the rest of Apple Intelligence.

And of course, none of this is free of history. Apple is shipping (in beta) the contextual Siri it promised at WWDC 2024 and didn’t deliver, a gap that cost it a $250 million settlement whose approval hearing lands ~nine days from now. If they got it right this time, then the features will finally arrive, as someone else’s model on rented hardware, two years and a class action later. Which, in the current AI industry, resembles a... passing grade?

Reliability: A Brief Retrospective

Anthropic filed its confidential S-1 on June 1. The next day, Claude went down. Three days later, it went down again, taking claude.ai, the API, Claude Code, and Cowork with it. Oops.

I have a professional interest in Claude staying online (because I am *NOT* going to write IAM policies myself like some kind of agrarian farmer), and watching the tool you use to do your job blink out twice in one week, while its parent is valued like “every power utility combined,” concentrates the mind on the gap between calling yourself a utility and behaving like one. You don’t wonder if water is going to come out of the tap when you turn it on, unless you’ve been vibe-plumbing again.

The customers who ran Claude through Vertex or Bedrock mostly rode it out, which is the lesson nobody selling you a trillion-dollar single point of failure wants underlined. It’s priced as critical infrastructure and yet it’s run, for now, like a startup having a week. If you’re putting it in production, architect for the Tuesday it isn’t there.

One last thing

Every deal this week is a bet that the tokens keep flowing through somebody else’s building. xAI rents out the data center it couldn’t train on, Apple rents the model it couldn’t build, Google rents capacity from a rocket company, and Anthropic rents all of the above, files to go public on the strength of it, and then trips and falls down the availability stairs. A trillion dollars of valuation is currently resting on the premise that inference is something you have to drive somewhere to buy.

A Stanford lab spent last November documenting that the laptop already in your bag handles something like nine of every ten everyday questions on its own, and got five times better at it in two years. Nobody’s neocloud is priced for the day the easy tokens stop making the trip.

So watch the intelligence-per-watt curve, not the IPO calendar. The tokens are already walking home.

See you next week.

— C

Artificial Confidence: GitHub Repriced the Habit it Built

Corey Quinn — Fri, 05 Jun 2026 00:44:00 GMT

I was overcome by events at Microsoft Build + fwd:CloudSec this week, so I’m sending this later than I normally do. I missed you, too. Now then:

GitHub moved Copilot to usage-based billing on June 1, and the folks who responded in the first day are precisely the heavy agentic users GitHub built this pricing structure to find. Meanwhile, back at the ranch, everyone else spent the week reporting run-rates: Cognition annualized a number, Anthropic presumably has one inside a confidential filing, and a small chorus of VCs sang out what I had assumed was already widely known: ARR means “a strong month, multiplied by twelve, assuming number only ever go up.” The net result of this is that the one dollar figure that actually changed got buried under a pile of imaginary ones instead.

What Actually Changed (Adjusted For Spin)

GitHub changes course

GitHub Copilot moved to usage-based billing on June 1. Premium request units are gone, replaced by token-metered “AI Credits,” which, while sounding inscrutable, aren’t THAT far removed from the ever-shifting definition of a token. The base subscription prices did not change, which GitHub would very much like you to notice given that they haven’t stopped harping on that particular detail. What did change is that those prices now describe how much you get before the meter starts, which... may not be how many customers would like this story to end.

Two details buried in the notes are significant here. The fallback model is gone, so when your credits run out you no longer downgrade to a cheaper model, you simply stop. It is the first subscription I have encountered that ends mid-sentence, much as I am tempted to do to this one. And a Copilot code review now bills against AI Credits and GitHub Actions minutes at the same time, which feels... unfortunate.

The screenshots going around show projected bills jumping from “$50” to “several thousand,” and the caveat is that those are extrapolations from people a single day into a billing period that has not yet produced a real invoice, with zero changes to their workflows. The funnier and also truer point is who is doing the extrapolating. This may be hard for some folks to hear, but you absolutely do not arrive at a terrifying projected number by accident; you encounter them after the fact, in your bill, when you’re contemplating doing something truly desperate but also cannot afford rope. You get there by being precisely the high-volume agentic user GitHub spent two years encouraging you to become, and then doing the math. The loudest complaints this week are a confession of exactly the usage profile the new pricing was built to locate. GitHub will correctly tell you your bill is atypical compared to a hypothetical spherical cow / customer. They’re learning that “correct” is not the same as “reassuring,” as it’s becoming clear that while customers value transparency, it’s as a means to the end of what they really want: predictability.
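For the record, the extrapolation behind those screenshots is about this sophisticated. The sketch below is a naive straight-line projection; the dollar amounts in the demo are hypothetical numbers I picked, not anyone’s actual bill.

```python
def project_period_spend(spend_so_far: float, days_elapsed: int,
                         days_in_period: int = 30) -> float:
    """Straight-line projection: assume every remaining day costs the same
    as the average day so far. One day of data makes this very noisy."""
    if days_elapsed <= 0 or days_elapsed > days_in_period:
        raise ValueError("days_elapsed must be between 1 and days_in_period")
    return spend_so_far / days_elapsed * days_in_period


if __name__ == "__main__":
    # Hypothetical: $120 of metered usage on day one of a 30-day period.
    print(f"Projected from day 1:  ${project_period_spend(120.0, 1):,.0f}")
    # Same $120, but spread over the first 15 days instead.
    print(f"Projected from day 15: ${project_period_spend(120.0, 15):,.0f}")
```

The same spend gives wildly different projections depending on how much of the period you have actually observed, which is why a day-one screenshot is a warning and not an invoice.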

Cursor doubled a price and let everyone watch GitHub instead

Composer 2.5’s Fast tier went from $1.50/$7.50 to $3.00/$15.00 per million tokens, a 100% increase that landed two weeks ago and that almost nobody filed as a price hike, because it was positioned as “the more important number to go up is the version number of the model.” It’s interesting, because this is directionally the same move that GitHub made, at roughly the same time, to half the outrage—because a new model number is a press release, while a new price is relegated to a footnote.

And the introductory rates are expiring in a chorus. Composer 2.5’s launch promo ended May 25, Codex Pro’s ended May 31, and the Opus 4.7 multiplier inside Copilot already doubled on April 30. The pattern is now established: ship at a subsidized rate, train the workflow, then let the meter find its level. If you have not locked in your workflow’s economics before the promo expires, surprise! You have some thinking to do.

Follow The Money (Or Watch It Follow Itself)

Cognition raised $1B, and you should read the metrics carefully

Cognition closed a Series D of more than $1 billion at a $26 billion post-money valuation on May 27, up from $10.2 billion eight months earlier. The headline statistic, repeated everywhere, is that 89% of the code committed at Cognition is now written by Devin, the company’s own AI software engineer, which is totally not just some guy in a trench coat and a fake moustache.

What this means, filtered through my snarky lens, is that a company that sells an AI software engineer is reporting that its AI software engineer writes most of its software, and offering this as proof the product works. It is the cleanest available example to date of a vendor grading its own homework, on a test it wrote, in a classroom it owns, then issuing a press release about the score. The 89% may be entirely real, but it’s also the least independent benchmark imaginable until next week, when something will no doubt surpass it somehow.

The number that underpins that is run-rate revenue that grew from $37 million to $492 million in twelve months, which puts the $26 billion valuation at roughly 53 times that figure. To Cognition’s credit, they pulled an Andy Jassy and called it run-rate, which is the honest term. The problem comes in the shape of everyone who read “run-rate” and heard “revenue.”
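The multiples here are easy to check from the figures quoted in this section. A quick sketch, with function names of my own choosing:

```python
# Figures quoted in this section.
VALUATION = 26e9          # post-money valuation
RUN_RATE_NOW = 492e6      # current run-rate revenue
RUN_RATE_PRIOR = 37e6     # run-rate twelve months earlier


def valuation_multiple(valuation: float, run_rate: float) -> float:
    """How many times the annualized run-rate the valuation represents."""
    return valuation / run_rate


def growth_multiple(current: float, prior: float) -> float:
    """How many times larger the current figure is than the prior one."""
    return current / prior


if __name__ == "__main__":
    print(f"Valuation / run-rate: {valuation_multiple(VALUATION, RUN_RATE_NOW):.1f}x")
    print(f"Run-rate growth:      {growth_multiple(RUN_RATE_NOW, RUN_RATE_PRIOR):.1f}x")
```

Note that both inputs to the first ratio are forward-looking in their own way: a valuation is a bet, and a run-rate is a month wearing a year’s clothes.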

Cognition (@cognition) on Twitter, May 27, 2026: “[…] @Lux_Capital, @generalcatalyst, and @8vc. Our enterprise usage has grown >10x since the start of this year, and our run-rate revenue grew to $492 M. We launched Devin two years ago as the first AI software engineer. Since […]”

Everyone’s revenue is a run-rate now, and a few VCs finally said so

ARR used to mean annual recurring revenue: money a customer was contractually obligated to pay you. Think “I have signed up for a one year contract with you at a fixed fee schedule, committing me to pay you $X a month.” In a TechCrunch piece last month, Spellbook CEO Scott Stevenson called the current usage of the term a “scam,” and he is not even slightly wrong about the mechanics. Usage-based billing breaks the “contracted” half of ARR, so a strong month gets annualized—much like a salesperson who will take their best ever month, multiply it by 12, and claim that was their total annual compensation.
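
To make the gap concrete, here is a minimal sketch comparing an annualized best month against what was actually billed over the trailing twelve months. The monthly figures are invented for illustration and do not come from any company named in this issue.

```python
# Hypothetical usage-based billings per month, in dollars. Invented numbers.
monthly_billings = [20, 22, 25, 24, 30, 28, 35, 33, 40, 38, 41, 60]


def annualized_run_rate(month_revenue: float) -> float:
    """Multiply one month by twelve, which is all a run-rate is."""
    return month_revenue * 12


# What customers were actually billed over the year.
trailing_twelve_months = sum(monthly_billings)

# What the press release says if you annualize the single best month.
best_month_run_rate = annualized_run_rate(max(monthly_billings))

print(f"Trailing twelve months: ${trailing_twelve_months}")
print(f"Best month x 12:        ${best_month_run_rate}")
```

Neither number is a lie. Only one of them is money that has already shown up.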

The really damning line came from an unnamed investor in the same piece: once one company in a category does it, the rest nearly have to, just to keep pace. That is a prisoner’s dilemma of revenue reporting brought to you by someone funding both prisoners.

Anthropic filed the most confident empty document of the year

Anthropic confidentially filed a draft S-1 on June 1, beating OpenAI to the announcement that carries the least checkable information of any in the AI IPO cycle. OpenAI will likely shortly file an “S-1o” with improved reasoning or whatnot. “Confidential” means, obviously, that we cannot read a word of it. The valuation and the roughly $47 billion run-rate everyone is quoting come from the last funding round and what the company told its investors, not from the audited document that is currently sealed from view. This is the filing that has real numbers of the “make these up and you may well serve prison time” variety.

The Agents Got Expensive. They Did Not Get Safer.

So let’s review the week. The bill for agentic coding went up (GitHub, Cursor). The valuation of agentic coding went up (Cognition, $26 billion). And the independently measured quality of what these agents actually commit did not move because of course it didn’t.

CodeRabbit’s analysis of 470 real-world pull requests found AI-co-authored code introduced up to 2.74x more security vulnerabilities than human-written code. Veracode tested more than 100 models and found 45% of AI-generated samples introduced an OWASP Top 10 vulnerability, a failure rate that has not improved across testing cycles despite a steady stream of vendor claims that the latest model finally fixed it. The security curve (motto: “the one nobody puts on a slide”) has stayed flat, regardless of how capability advances.

Which puts Cognition’s 89% in a stark light: if your AI writes 89% of your code, and independent measurement says AI-written code ships vulnerabilities at multiples of the human rate, then 89% is less a productivity statistic than it is a description of your attack surface—annualized.

Meanwhile, inference itself keeps getting cheaper. DeepSeek’s V4-Flash runs around $0.14 per million input tokens against GPT-5.5’s $5.00, a gap of roughly 36x on input and north of 100x on output, at comparable performance on many tasks. So the raw cost of intelligence is collapsing in public while the cost of the tools you actually code in went up this week. “Token cost is shrinking” gets offset by “so let’s immediately burn as many as we can in nondeterministic ways in the harnesses.”

One last thing

If you run AI workloads, do the boring thing this week: pull your own usage numbers before the next promo expiry (there’s always another one!), and find out which of your workflows is hanging out in a cohort waiting to be repriced. The vendors already know. Remember: the only number in this entire issue you can fully verify is the one on your own invoice.

See you next week.

— C

Artificial Confidence: Banned by Pentagon, blessed by Pope, paid for by you

Corey Quinn — Tue, 26 May 2026 16:58:53 GMT

I have spent a fairly depressing decade reading AWS bills for a living, and the dependable feature of every quarter remains constant: the vendor’s headline announcements and the customer’s resulting bill invariably have nothing to do with each other. The AI vendors are improving on that formula in realtime, and it’s really something to see.

This week alone: three trillion-dollar IPOs queued in the same fortnight, a $30 billion Series H closing, two cybersecurity products launched in the same news cycle, an SDK supplier acquihired and wound down, a 235-page papal encyclical personally presented by a pope for the first time in modern history, and a new payments protocol with a transaction class called, on purpose, “Human Not Present.”

The press is reading this as a string of capability and capital announcements. Good for them; here in the real world the customers paying these vendors’ bills just lost roughly five assumptions they didn’t know they had. Most of them are only gonna find out from their Q3 invoice, but you’re ahead; count all five below.

What Actually Changed (Adjusted For Spin)

Anthropic bought the company that generated SDKs for OpenAI and Google

Anthropic acquired Stainless on May 18 for a reported $300 million-plus, roughly double Stainless’s December valuation. Stainless powered every official Anthropic SDK, and also generated client libraries for OpenAI, Google, and Cloudflare. Anthropic is winding down all hosted Stainless products, which is how we know it’s an acquihire. You assumed the official Claude or OpenAI SDK you pip installed was a first-party product, by which I mean that you didn’t consider where the SDK came from at all, because why would you? And also the vendor never told you. Surprise, that assumption was incorrect. Isn’t it great when we get to learn things together? Instead, it was a third-party deliverable from a vendor Anthropic now owns and is sunsetting. If your stack relied on Stainless’s hosted workflow, you have an engineering project this quarter that you did not have last week, and the vendor management group at OpenAI has the same one. Maybe you can trade tips?

Project Glasswing’s first month found 10,000 critical vulnerabilities

Anthropic published the first numerical results on May 22 from a fascinating project: roughly fifty partners ran Claude Mythos Preview against critical software for a month. The headline was over 10,000 high-or-critical-severity vulnerabilities, including 2,000 at Cloudflare (400 high or critical), 271 patched in Firefox 150 (ten times what a comparable Opus 4 scan found in Firefox 148), and a 90.6% true-positive rate on the open-source subset that external firms verified. You assumed your security budget was priced against the scarcity of capable vulnerability scanners. That scarcity ended on May 22. Tenable, CrowdStrike, and Palo Alto Networks did not price their products against Opus 4.7 inference. They will need to, in the next RFP cycle.

Claude Security launched the same afternoon, helpfully

Anthropic launched Claude Security in public beta on May 22, an Opus 4.7-powered codebase scanner that proposes patches. It had already patched 2,100 enterprise vulnerabilities in the preceding three weeks. The company that announced an industry-wide discovery-rate problem in the morning was selling the cleanup contract by the afternoon, which is a sequencing move that should be appreciated for its craft, if not its decorum. Because that’s more than a bit crass.

OpenAI confidentially filed S-1 the same week its Q1 margins leaked

OpenAI confidentially filed its S-1 on Friday, May 22, with Goldman Sachs and Morgan Stanley, targeting a Q4 listing at $852B–$1T. In the same news cycle, The Information reported that OpenAI generated $5.7 billion in Q1 revenue at a non-GAAP adjusted operating margin of negative 122%, meaning the company lost $1.22 for every dollar of revenue it brought in. ChatGPT weekly actives stalled at 905 million, down from a 920 million February peak; the free-to-paid conversion rate is approximately 6%. What’s that mean for you? Simply that you assumed your OpenAI rate card was sustainable. It is apparently instead subsidized by venture capital: at that margin, revenue covers only approximately 45% of cost. The S-1 disclosure cycle is the mechanism that’s fated to end the subsidy. This is a data point for the growing thesis that the relatively cheap tokens you are using today are not going to be cheap tokens in eighteen months.
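
The 45% figure falls out of the margin with a line of arithmetic. This sketch assumes the simplest possible reading, that adjusted operating margin is (revenue minus cost) divided by revenue. The reported non-GAAP number almost certainly has more moving parts than that, so treat the function name and the formula as my shorthand, not OpenAI’s accounting.

```python
def revenue_share_of_cost(operating_margin: float) -> float:
    """Fraction of total cost covered by revenue.

    Assumes margin = (revenue - cost) / revenue, so cost per revenue
    dollar is 1 - margin. A margin of -1.22 means $2.22 of cost per $1.
    """
    cost_per_revenue_dollar = 1.0 - operating_margin
    return 1.0 / cost_per_revenue_dollar


covered = revenue_share_of_cost(-1.22)
print(f"Revenue covers {covered:.0%} of cost; someone else pays the rest")
```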

Anthropic’s Series H closing this week at $900B-plus

Bloomberg confirmed Anthropic’s $30 billion Series H closing this week at a pre-money valuation above $900 billion. This is the company’s second $30 billion round of 2026; the February Series G closed at $380 billion post-money. Anthropic’s reported annualized run rate moved from $14 billion in February to $45 billion in early May. Even Anthropic appears to be a little startled by Anthropic. You assumed Anthropic’s pricing discipline was a structural property of the company, the “we’re not OpenAI” pitch made flesh, as it were. At $900 billion-plus closing this week and a Q4 2026 listing reportedly targeted, the same public-market disclosure cycle that is about to retire OpenAI’s rate-card subsidy is now starting at Anthropic. The “disciplined alternative” pitch has an IPO clock on it, and the clock is running.

“Human Not Present” payments became a real schema in a real standards body

Google donated AP2 to the FIDO Alliance and shipped v0.2, introducing autonomous-transaction support that the protocol’s own documentation officially calls “Human Not Present” payments. Sixty partner organizations including PayPal, Mastercard, Visa, and American Express. The payments industry has used “Card Not Present” for two decades as the elevated-fraud category for online card use. “Human Not Present” is the same naming convention, except now the variable that has been removed from the loop is the buyer. You assumed “Human Present” was the only transaction class your e-commerce stack had to support. That assumption retired on the same Tuesday Google announced the Universal Cart.

The Pope Showed Up. The Pentagon Was Conspicuously Absent.

This is the week’s most colorful item and consequently its least practical, which is why I am putting it after the line items rather than ahead of them.

On Monday morning in Rome, Pope Leo XIV personally presented Magnifica Humanitas, his 235-page first encyclical, the first to be personally presented by a pope in modern history. He invited one of Anthropic’s approximately forty co-founders (Chris Olah) to sit next to him and speak as one of five named presenters alongside three cardinals (the religious figures, not the birds; I’m not Simon Willison) and two theologians. The encyclical text criticizes “concentration of power and data in the hands of so few people in the private sector” and does not name a company. The Vatican did not need to. Olah was confirmed by Reuters as the only Big Tech representative invited to the event. Cardinal Czerny, still not a bird, was asked whether Anthropic’s reputation as a safety-forward AI company had influenced the Vatican’s decision; he said “I’m sure it did,” and then added, “We dialogue with anyone. We don’t endorse.” Endorsement is not the technical term I would reach for either, but the photograph runs on every Catholic news service in five languages.

On March 3, the Pentagon designated Anthropic a national security supply chain risk, the first American company ever to receive a label historically reserved for foreign adversaries, after Anthropic refused to remove guardrails on autonomous-weapons and domestic-surveillance use of Claude. The president ordered federal agencies off Anthropic via Truth Social, his social network that answers the question “what if Twitter were somehow worse.” The legal fight is, of course, ongoing.

You assumed your AI vendor selection was a technical and economic decision. Picking your primary AI vendor is no longer like picking a cloud provider. It is more like picking a defense contractor: the political conditions under which you are permitted to use them are now part of the contract, and the political conditions change with the news cycle. Multi-vendor strategy is no longer just a redundancy hedge; it is also a political-volatility hedge. The engineering cost of building a vendor-abstraction layer that lets you swap from Anthropic to OpenAI on twelve hours’ notice just became non-optional, and you can put it in next year’s budget under “geopolitical risk.” The trouble is, the models and tooling around them are differentiating, so that twelve-hour window is growing longer by the day. Talk to your engineering teams about that.
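
For the engineering teams, the thinnest possible version of that abstraction layer looks something like the sketch below. Everything in it is hypothetical: `ModelRouter`, the vendor names, and the stub backends are placeholders, and no real vendor SDK is called. The real work lives inside each backend adapter, which is exactly where the differentiation problem bites.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A backend is any callable that turns a prompt into a completion string.
Backend = Callable[[str], str]


@dataclass
class ModelRouter:
    """Routes completions to whichever vendor is currently marked primary."""

    backends: Dict[str, Backend]
    primary: str

    def complete(self, prompt: str) -> str:
        return self.backends[self.primary](prompt)

    def switch_to(self, vendor: str) -> None:
        """Swap the primary vendor; refuse names with no registered backend."""
        if vendor not in self.backends:
            raise KeyError(f"no backend registered for {vendor!r}")
        self.primary = vendor


# Stub backends stand in for real SDK adapters, which are not shown here.
router = ModelRouter(
    backends={
        "vendor_a": lambda prompt: f"[vendor_a] {prompt}",
        "vendor_b": lambda prompt: f"[vendor_b] {prompt}",
    },
    primary="vendor_a",
)

print(router.complete("summarize this invoice"))
router.switch_to("vendor_b")
print(router.complete("summarize this invoice"))
```

The switch is one line. Making the second vendor produce acceptable output for the same prompts is the part that takes longer than twelve hours.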

One last thing

The vendors will continue announcing capabilities and the press will continue covering those capabilities, but the important part for you is that regardless of those, the bills will continue arriving. The vendors and the press will not be on the cc line when you read your bill. Read your rate card this quarter. Understand what it’s telling you, which is harder than it looks. Audit your dependency graph. Assume the political climate around your primary vendor is different next quarter than it is this one.

See you next week.

— C

Artificial Confidence: The Spec for the Agent-Native Cloud, and Who Might Actually Ship It

Corey Quinn — Wed, 20 May 2026 18:09:51 GMT

I’ve spent a decade watching companies blundering into the discovery that the cloud they built their company on is not the cloud they need anymore, usually at a moment when the bill arrives or the auditor shows up. We’re heading into one of those moments now, and the existing hyperscalers are not going to be the ones who build the cloud that agents actually want to run on.

Last week, Vercel CEO Guillermo Rauch was posting about Grok CLI deploying to Vercel, and I replied that an agent-native cloud platform was coming. It might be Cloudflare, it might be Vercel, and it absolutely wasn’t going to be AWS. Rauch responded inside an hour: “It’ll be ▲. Would love your feedback. This is our primary focus!” That seemed like a sufficiently confident claim to warrant taking him up on it, so I posted a twelve-point thread laying out the spec a serious contender would have to clear:

Corey Quinn (@QuinnyPig) on Twitter, May 16, 2026, quoting Guillermo Rauch (@rauchg), “@QuinnyPig It'll be ▲. Would love your feedback. This is our primary focus!”: “@vercel's CEO replied that it'll be them. Cool! Here's the spec they (or @Cloudflare, or some startup not yet invented) actually have to hit. It won't be @awscloud. Thread...”

Cloudflare’s response came from a different altitude: Principal Systems Engineer Sid Chatterjee replied “I saw your list on the thread. We’re on it. Will report back once they’re all in.” Rauch volunteered the company as the test subject; Chatterjee volunteered to do the work and produce the receipts. Both responses are legitimate, but the engineer-level “report back once they’re all in” is the one that the rest of this post is calling for.

Here’s the spec, expanded from the thread, with credit to the practitioners who supplied the parts I missed. It assumes a specific deployment shape: an agent running semi-autonomously, taking actions over minutes or hours, against real infrastructure, with real money attached. If you’re typing prompts and watching every step, you don’t need an agent-native cloud; you need a less hostile CLI. The hard but compelling work is what changes when the agent runs unattended and the platform has to be trustworthy enough that you don’t have to babysit the bill.

Robertus on Twitter put the thesis better than I did: “agent-native cloud needs boring primitives more than magic. identity, permissions, logs, rollback, and cost controls before the sci-fi layer.” The vendors who lose this race build the sci-fi layer first and the boring primitives never. The vendors who win recognize that “boring primitives” is a euphemism for “the hard infrastructure problems that took AWS twenty years to get most of the way through, which is why a clean-slate competitor has a real opening.”

Identity and blast radius

Agents need their own identity. Today every agent action is laundered through the human’s IAM role. The audit log reads “corey@duckbill did this” when the truth is “Claude’s third retry at 2am did this.” That isn’t an audit log so much as it is compliance theater. First-class agent identities have to be scoped, time-limited, and revocable, so that when a postmortem rolls around, the answer to “which agent, what session, what tools, what action” is in the log rather than reconstructed from inference and the meeting notes of whoever was on call that night.
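
A minimal sketch of what “scoped, time-limited, and revocable” could look like as a data structure. `AgentIdentity` and every field on it are hypothetical names of mine, not any platform’s actual API; the point is only that the audit record answers the postmortem question directly.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import FrozenSet
import uuid


@dataclass
class AgentIdentity:
    """Hypothetical first-class agent credential."""

    agent: str                     # which agent is acting
    on_behalf_of: str              # the human who delegated authority
    allowed_tools: FrozenSet[str]  # scope
    expires_at: datetime           # time limit
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    revoked: bool = False

    def revoke(self) -> None:
        self.revoked = True

    def may_use(self, tool: str, now: datetime) -> bool:
        """Allowed only if not revoked, not expired, and in scope."""
        return (not self.revoked) and now < self.expires_at and tool in self.allowed_tools

    def audit_record(self, tool: str, action: str) -> dict:
        """Which agent, what session, what tool, what action."""
        return {
            "agent": self.agent,
            "on_behalf_of": self.on_behalf_of,
            "session": self.session_id,
            "tool": tool,
            "action": action,
        }


now = datetime(2026, 5, 20, 2, 0, tzinfo=timezone.utc)
identity = AgentIdentity(
    agent="claude-retry-3",
    on_behalf_of="corey@duckbill",
    allowed_tools=frozenset({"s3:ListBucket"}),
    expires_at=now + timedelta(minutes=30),
)
print(identity.may_use("s3:ListBucket", now))
print(identity.audit_record("s3:ListBucket", "list objects"))
```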

Blast radius as a primitive. “This session may spend up to X dollars, touch up to N resources, in environment Y, expiring in 30 minutes.” Today every agent is either fully privileged or fully fenced off, and the entire interesting design space is in between. Almost nobody is building there, because it requires answering hard questions about resource-graph traversal that AWS has spent a decade pretending IAM was solving.
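
Here is a toy version of that sentence as code, to show how small the in-between design space is to describe and how much is being skipped. `SessionBounds` is a hypothetical name, the limits are checked in-process, and resources are flat strings. The hard part, traversing a real resource graph, is conspicuously absent.

```python
from dataclasses import dataclass, field
from typing import Set


class BlastRadiusExceeded(Exception):
    """Raised when a proposed action falls outside the session's bounds."""


@dataclass
class SessionBounds:
    max_spend_usd: float
    max_resources: int
    environment: str
    ttl_seconds: float
    spent_usd: float = 0.0
    touched: Set[str] = field(default_factory=set)

    def authorize(self, resource: str, environment: str,
                  cost_usd: float, elapsed_seconds: float) -> None:
        """Check every bound before the action runs; record it only if all pass."""
        if elapsed_seconds >= self.ttl_seconds:
            raise BlastRadiusExceeded("session expired")
        if environment != self.environment:
            raise BlastRadiusExceeded(f"environment {environment!r} is out of scope")
        if self.spent_usd + cost_usd > self.max_spend_usd:
            raise BlastRadiusExceeded("spend limit would be exceeded")
        if len(self.touched | {resource}) > self.max_resources:
            raise BlastRadiusExceeded("resource limit would be exceeded")
        self.spent_usd += cost_usd
        self.touched.add(resource)


# "Up to $5, up to 2 resources, in staging, expiring in 30 minutes."
bounds = SessionBounds(max_spend_usd=5.0, max_resources=2,
                       environment="staging", ttl_seconds=30 * 60)
bounds.authorize("queue-a", "staging", cost_usd=1.0, elapsed_seconds=10)
```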

Secrets brokering. Stop making the agent fish for API keys every time it wants to light up a new service. The platform holds the secret; the agent gets a handle; calls go through the broker. A compromised agent cannot exfiltrate what it never had. This is a solved problem in OAuth flows for human users and a completely unsolved problem for agent-to-agent service calls, mostly because nobody has wanted to solve it.
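
The handle pattern, sketched in a few lines. `SecretsBroker` and `fake_service` are hypothetical names. In a single Python process the agent could obviously reach into the broker’s dictionary, so a real broker needs a process or network boundary between it and the agent. This only shows the shape of the interface.

```python
import secrets
from typing import Callable, Dict


class SecretsBroker:
    """Hypothetical broker: keeps the real secrets, hands out opaque handles."""

    def __init__(self) -> None:
        self._secrets: Dict[str, str] = {}  # handle -> real secret

    def register(self, secret_value: str) -> str:
        """Store a secret and return a random handle that reveals nothing."""
        handle = "handle-" + secrets.token_hex(8)
        self._secrets[handle] = secret_value
        return handle

    def call(self, handle: str, request: Callable[[str], str]) -> str:
        """Run the request with the real secret on the agent's behalf."""
        return request(self._secrets[handle])

    def revoke(self, handle: str) -> None:
        self._secrets.pop(handle, None)


def fake_service(api_key: str) -> str:
    """Stand-in for an external API that checks a key."""
    return "authorized" if api_key == "sk-real-key" else "rejected"


broker = SecretsBroker()
handle = broker.register("sk-real-key")  # the agent only ever sees `handle`
print(broker.call(handle, fake_service))
```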

The money problem

Hard budget caps that actually halt. Not the AWS approach of “we noticed you spent $47,000 yesterday, here’s a CloudWatch email,” which is a postmortem with the dollar amount filled in, not a functioning budget control. Fail closed at the boundary. A Lambda stuck in a loop racking up data transfer charges is a real failure mode and deserves real boundary enforcement, not retroactive grief. The platform that ships caps that actually halt eliminates a category of incident I’ve spent a decade collecting war stories about: agent runs a recursive S3 list against a misconfigured bucket for fourteen hours, discovery happens at invoice time, blame routes to the most junior engineer who touched IAM that quarter.

Cost circuit breakers with human escalation. The agent session has an allotment; when it depletes faster than expected, the platform pages a human to authorize more or kill it. Finding out at the end of the month is how the surprise-bill incidents keep happening.
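
Both ideas fit in one small sketch: a cap that refuses the charge before the money is spent, and a human as the only way to raise it. `SessionBudget` and `ask_human` are hypothetical names, and the breaker here trips on the cap itself, not on burn rate, which is a simplification of “depletes faster than expected.”

```python
from typing import Callable, Optional


class BudgetHalted(Exception):
    """The session is stopped; no further spend is allowed."""


class SessionBudget:
    def __init__(self, cap_usd: float,
                 ask_human: Callable[[float, float], Optional[float]]) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0
        # ask_human(projected, current_cap) returns a new cap, or None to kill.
        self.ask_human = ask_human

    def charge(self, cost_usd: float) -> None:
        """Call before the action runs, so a refusal means nothing was spent."""
        projected = self.spent_usd + cost_usd
        if projected > self.cap_usd:
            new_cap = self.ask_human(projected, self.cap_usd)
            if new_cap is None or new_cap < projected:
                # Fail closed: no approval means no spend.
                raise BudgetHalted(f"cap of ${self.cap_usd:.2f} would be exceeded")
            self.cap_usd = new_cap
        self.spent_usd = projected


# A human who never answers the page, so the session halts at the cap.
budget = SessionBudget(cap_usd=5.0, ask_human=lambda projected, cap: None)
budget.charge(3.0)
try:
    budget.charge(3.0)
except BudgetHalted as halted:
    print(f"halted: {halted}")
```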

Cost preview as a first-class API. Before any state-changing call: “this adds approximately $340 per month fixed, plus $0.09 per thousand requests.” Most pricing is usage-based now, so the preview has to model the workload rather than return a single number. Agents are bad at AWS pricing because AWS pricing is bad at being prices. The platform that ships a working cost preview API breaks a fifteen-year stalemate in cloud finops.
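
What “model the workload” means in practice, using the example figures from the paragraph above. `CostPreview` is a hypothetical response shape, not an existing API. A real preview would need tiers, regions, and data transfer, which is where the fifteen years went.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostPreview:
    """Hypothetical preview returned before a state-changing call."""

    fixed_usd_per_month: float
    usd_per_thousand_requests: float

    def monthly_estimate(self, requests_per_month: int) -> float:
        """Fixed cost plus usage cost for an assumed request volume."""
        usage = self.usd_per_thousand_requests * requests_per_month / 1000
        return self.fixed_usd_per_month + usage


preview = CostPreview(fixed_usd_per_month=340.0, usd_per_thousand_requests=0.09)
for volume in (0, 1_000_000, 50_000_000):
    print(f"{volume:>11,} requests/month: ${preview.monthly_estimate(volume):,.2f}")
```

The agent’s job is then to supply its own volume guess and compare the estimate against the session budget before it makes the call.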

The reversibility problem

Gated changes by default. The agent does not mutate production directly. It opens a PR, kicks off an Action, proposes a change that a human or another agent reviews. The pattern is established and agents haven’t started routing around it; the platform’s job is to make it the path of least resistance.

Time travel by default. Every state change is reversible for some defined window. “Roll back the last twenty minutes” is one command, not an archaeological dig through CloudTrail that ends with restoring yesterday’s snapshot and losing four hours of customer data in the process of un-doing the agent’s mistake. This may well be impossible to engineer for anything beyond trivial levels of complexity, but by god, we need a better answer than today’s “hope your backups are ready for an impromptu test!”
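
For the trivial level of complexity, the mechanism is just a journal of previous values replayed in reverse. `ReversibleStore` is a hypothetical toy over a single dictionary. It says nothing about the genuinely hard cases, such as side effects that have already left the building.

```python
import time
from typing import Any, Dict, List, Optional, Tuple

_MISSING = object()  # marks "this key did not exist before the change"


class ReversibleStore:
    """Toy key-value store that journals prior values so changes can be undone."""

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}
        # Each entry is (timestamp, key, previous value).
        self._journal: List[Tuple[float, str, Any]] = []

    def set(self, key: str, value: Any, now: Optional[float] = None) -> None:
        timestamp = time.time() if now is None else now
        self._journal.append((timestamp, key, self.state.get(key, _MISSING)))
        self.state[key] = value

    def roll_back(self, seconds: float, now: Optional[float] = None) -> int:
        """Undo every change made in the last `seconds`, newest first."""
        cutoff = (time.time() if now is None else now) - seconds
        undone = 0
        while self._journal and self._journal[-1][0] >= cutoff:
            _, key, previous = self._journal.pop()
            if previous is _MISSING:
                self.state.pop(key, None)
            else:
                self.state[key] = previous
            undone += 1
        return undone


store = ReversibleStore()
store.set("replicas", 2, now=0)        # known-good state
store.set("replicas", 50, now=1000)    # the agent gets ambitious
store.set("debug", True, now=1100)
store.roll_back(20 * 60, now=2000)     # "roll back the last twenty minutes"
print(store.state)
```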

Error messages designed for an LLM to act on. Not “AccessDenied: User arn:aws:... not authorized because no identity-based policy allows...” which is technically information but functionally a puzzle the agent will fail at solving. More like: “denied: this agent lacks dynamodb:Query on the ‘users’ table; the owner can grant it at [link].” Errors as instructions, not riddles. The industry-wide bill for inference cost burned decoding AWS error messages is already in the eight figures, distributed across millions of individual agent loops where nobody is going to notice it until someone like me writes a report explaining what they’ve been paying for.
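
A sketch of an error built as an instruction. `ActionableDenial` and its fields are hypothetical, and the grant URL is a placeholder on a reserved domain. The design choice is that the message names the missing permission, the resource, and the next step, so the agent has nothing to decode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionableDenial:
    """Hypothetical denial that tells the agent what to do next."""

    missing_permission: str
    resource: str
    grant_url: str  # placeholder link where the owner can grant access

    def render(self) -> str:
        return (
            f"denied: this agent lacks {self.missing_permission} on "
            f"'{self.resource}'; the owner can grant it at {self.grant_url}"
        )

    def to_dict(self) -> dict:
        """Same content as structured fields, for agents that prefer not to parse prose."""
        return {
            "status": "denied",
            "missing_permission": self.missing_permission,
            "resource": self.resource,
            "next_step": f"ask the owner to grant access at {self.grant_url}",
        }


denial = ActionableDenial(
    missing_permission="dynamodb:Query",
    resource="users",
    grant_url="https://example.invalid/grants/users",
)
print(denial.render())
```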

The interface problem

The API has to be consistent. AWS has 347 services, pending an update by AWS Corporate Comms (good job, buddy! You’re making a difference here!), depending on what counts as a service this week and whether you’re counting the ones that have been deprecated but not removed from the console. Roughly 43 of them do approximately the same thing, with bespoke verbs, inconsistent pagination, regional quirks, and conventions that exist because a single Principal Engineer in 2014 had strong opinions that accidentally became load-bearing. Agents inherit this inconsistency tax at a higher rate than humans do, paying it on every retrieval against a token budget that gets spent trying to remember whether this particular service uses NextToken, pageToken, or Marker.
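
Here is the shim every agent harness ends up reinventing, as a minimal sketch. `paginate` and `fetch_page` are hypothetical. The helper assumes the caller has already hidden each service’s request parameter name behind `fetch_page`, and it only looks for the three token spellings named above. Real services have more variants than that, which is the point.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# The three continuation-token spellings named above.
TOKEN_KEYS = ("NextToken", "pageToken", "Marker")


def paginate(fetch_page: Callable[[Optional[str]], Dict[str, Any]],
             items_key: str) -> Iterator[Any]:
    """Yield items from every page, whatever the service calls its token."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get(items_key, [])
        # Take the first non-empty token field present, if any.
        token = next((page[key] for key in TOKEN_KEYS if page.get(key)), None)
        if token is None:
            return


# Fake responses that switch token conventions between pages.
fake_pages = {
    None: {"Items": [1, 2], "NextToken": "a"},
    "a": {"Items": [3], "Marker": "b"},
    "b": {"Items": [4]},
}
print(list(paginate(lambda token: fake_pages[token], "Items")))
```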

Observability that ties action to reasoning to cost. Not “Lambda X fired” but “agent invoked Lambda X while attempting task Y, prompted by request Z, costing $0.0003 against a $5 session budget.” The AI-native equivalent of dmesg for distributed systems. The vendor that ships this becomes the default observability layer for agentic infrastructure, which is to say, becomes Datadog with a four-year head start. Ideally with a more dignified mascot situation.
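
The log line described above, as a structured record. `AgentTraceEvent` and its field names are hypothetical; the values are the ones from the example sentence. The only design decision is that action, task, originating request, and cost live in the same record, not in four systems.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AgentTraceEvent:
    """Hypothetical event tying an action to its reasoning and its cost."""

    action: str
    task: str
    request_id: str
    cost_usd: float
    session_budget_usd: float
    session_spent_usd: float

    def to_log_line(self) -> str:
        record = asdict(self)
        record["budget_remaining_usd"] = round(
            self.session_budget_usd - self.session_spent_usd, 6
        )
        return json.dumps(record, sort_keys=True)


event = AgentTraceEvent(
    action="invoke Lambda X",
    task="task Y",
    request_id="request Z",
    cost_usd=0.0003,
    session_budget_usd=5.0,
    session_spent_usd=0.0003,
)
print(event.to_log_line())
```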

Convention over configuration as an iron rule. AWS forces explicit decisions on a thousand things with one obviously-right answer 95% of the time. The agent-native platform should have opinionated defaults; when it does need to ask, ask the human, not flail through alone burning tokens on guesses. Vercel understands this; Framework-defined Infrastructure is exactly that thesis applied to web apps. Whether it generalizes from “deploy a Next.js app” to “operate a stateful multi-agent system” is the question on which the bet rests.

The thirteenth and fourteenth items (which I missed)

Ross Brown replied with what should have been item 13: a universal context injection system that pre-loads agents with the relevant architecture, the active alert posture, and the company policy (“agents may not modify the billing table without dual approval”) rather than relying on stuffing it into a CLAUDE.md and praying. Wire is one of the companies building this; there will be others.

Item 14, which several practitioners flagged: every spec item above is about the build phase, where the agent operates against the platform. The run phase, where what the agent built has to serve real users, is a separate set of problems the platform should solve so the agent doesn’t have to invent them, badly, from a half-remembered StackOverflow post about JWT handling. Authentication, session management, password reset flows, OAuth, MFA. The cleanest vendor example shipping on this axis is exe.dev, which puts an IAM proxy in front of every VM by default: TLS, DNS, and auth handled at the platform layer, not retrofitted into the agent-generated app. A full run-phase spec is its own post, but for now: nobody should be allowed to claim “agent-native cloud” while only solving the build-phase problems, even though the build-phase problems are the ones currently getting all the attention.

Who’s credibly in the race

Within forty-eight hours of the thread, my mentions filled up with founders explaining that they had already built three of these, were working on the next four, were 90% there, were the obvious frontrunner, were the only serious contender, and had also been doing this for years before anyone else noticed. The actual contenders, sorted by capital behind the claim rather than enthusiasm: Vercel and Cloudflare at the top, with meaningfully different architectural bets (Vercel: serverless functions with durable workflows; Cloudflare: stateful Durable Objects where the agent identity is the addressable compute unit); Railway, which raised $100 million in January explicitly for this and whose founder Jake Cooper replied that there’s “a prize on offer worth playing for”; exe.dev on the run-phase auth axis; and then agentuity, islo.dev, hostess.sh, cnap.tech, and a long tail of less-evaluable seed-stage projects.

This spec serves as the obvious set of requirements that follows from the deployment shape, legible to anyone who has spent thirty minutes operating an agent against real infrastructure. The reason a dozen vendors can simultaneously claim “we’re working on this” is that the spec is not a secret. What’s hard is shipping it.

AWS won’t win this race, and it’s not because AWS doesn’t understand the requirements. It’s because the org structure can’t ship them. Twenty years of accumulated surface area, three thousand product managers with stakes in keeping their service distinct, and a billing system designed in 2008 to make it hard to comparison-shop against itself are not fixable from inside AWS. They’re only fixable by starting from a clean slate, which is what the clean-slate vendors are doing.

The Heroku question

Even if a vendor ships the entire spec, do they still lose? The Heroku pattern is the obvious template: great for proofs of concept, fine at modest scale, but the moment the business takes off, someone dispatches Claude Code to migrate the workload to AWS because that’s where the enterprise procurement, the compliance surface area, and the volume discounts live. Replace “place where companies start” with “place where indie devs run their agents” and you have a real risk to the entire thesis.

Here’s why it might not repeat. Heroku’s value proposition was developer experience: git push deploy, automatic Postgres, the Procfile abstraction. Those are workflow primitives, and AWS replicated them well enough to drain the at-scale customer. Amplify, App Runner (deprecated at the end of April, RIP), Lightsail, and Elastic Beanstalk are all “Heroku, but it’s on AWS so your CFO is happy.” None of them excellent; all of them sufficient.

The agent-native cloud’s value proposition is operational, not workflow-oriented. The operational primitives are capability-bounded sessions, time-travel rollback, and hard budget caps with first-class agent identity attached. These are architectural primitives that would require AWS to refactor IAM, CloudTrail, and the billing system simultaneously. That’s not a console UI ship but a five-year coordinated rebuild across organizations that have spent twenty years optimizing for incompatible goals. An enterprise that has built operational practice around “my agents have first-class identities and hard budget caps” is migrating away from safety to go to AWS, not toward better economics. That’s a different migration vector than Heroku faced.

The risk is that AWS doesn’t have to ship the primitives well; they only have to ship something procurement will accept as “good enough” alongside existing AWS spend. The bar for keeping an enterprise customer isn’t “match Vercel on agent safety,” it’s “give the CFO a story they can tell the board about consolidating on one vendor.” Bedrock Guardrails today doesn’t clear half the spec: it’s content filtering but not capability-bounded sessions or first-class agent identity. But “doesn’t pass the spec” and “good enough to win the renewal” are different bars, and AWS only has to clear the second.

The realistic call is that the agent-native cloud may end up serving two distinct populations: indie developers and small teams where the platform is the value, and enterprise pilots that eventually migrate to AWS once the workload matters enough that the CFO gets involved. My bet is that operational practice defined at the indie tier propagates up to where the enterprise workloads actually live, because that’s historically how new infrastructure categories have worked. But “eventually” is doing a lot of obnoxiously heavy lifting in that sentence.

What I’ll be watching for

The fourteen items above are the test. Hit them all, and I’ll concede you’ve built an agent-native cloud. Miss them, and you’ve built a marketing page with the word “agentic” in the headline; you can guess what my opinion is gonna be on that.

Vercel has a CEO willing to make a specific public bet on a thread written by a guy whose entire professional brand is calling out bullshit cloud claims. That deserves credit. Cloudflare’s response was different: the engineer who already shipped the most direct attempt at first-class agent identity committed to shipping more of it. That also deserves credit, and now it’s a horse race.

Neither company has yet shipped a credible answer to the hardest items: blast-radius primitives, capability-bounded sessions, or the cost preview API that would break fifteen years of AWS pricing opacity. Whether they get there before Railway, exe.dev, a startup we haven’t heard of yet, or AWS shipping something passable enough to keep procurement happy is the question that determines who defines operational practice for agent infrastructure over the next several years, even if it doesn’t determine where every at-scale workload eventually runs.

That’s the bet. I’ll be watching the changelogs.

Artificial Confidence #2: The week AI labs became Palantir

Corey Quinn — Tue, 19 May 2026 17:41:38 GMT

There’s been a shift: the model layer used to be the prize, yet this week the AI industry quietly conceded there’s a great chance that it might be the loss leader. Anthropic launched a $1.5 billion enterprise services joint venture with Blackstone, Hellman & Friedman, and Goldman Sachs on May 4. Seven days later, OpenAI countered with a $4 billion subsidiary called DeployCo, valued internally at $14 billion, anchored by 150 forward-deployed engineers (this cycle’s Hot New Job Title for “engineers who don’t embarrass themselves trying to hold a conversation”) acquired from a London consultancy you have not previously heard of. Three management consulting firms, Bain & Company, Capgemini, and McKinsey, wrote checks into the entity that seems explicitly designed to replace them. Either they’re hedging against their own obsolescence, or they would simply like to stop doing PowerPoints; both readings are supported by the press release. I too would like to stop doing PowerPoints, but that’s the job sometimes.

The week’s other money headlines line up underneath this thesis. If I can stuff them into one sentence like a flailing Labrador into an apartment bathtub: Cerebras traded, Anthropic floated $900 billion, OpenAI capped Microsoft’s revenue share at $38 billion, GitHub Copilot conceded its subscription pricing was never going to survive contact with agentic workloads, and Vercel published seven months of production data showing exactly why. That’s great, but to see most of what actually shipped, you had to read past the IPO frenzy. And that, dear reader, is why I’m writing this.

What Actually Changed (Adjusted For Spin)

Anthropic launched Claude Platform on AWS, demoting Bedrock to “legacy” in its own docs

On May 11, Anthropic announced general availability of Claude Platform on AWS, a direct Anthropic API surface accessed through an AWS account. Authentication is IAM. Billing is through AWS Marketplace. Audit goes through CloudTrail. AWS is reduced to authentication, audit, and billing, which they do well—but none have anything to do with running the actual product. The hyperscaler has been demoted to “being Stripe, if Stripe’s UX was absolute dogshit.” Note: the inference runs on Anthropic-managed infrastructure outside the AWS security boundary, which is the sort of sentence that would have made a compliance officer eat a stapler in 2024 and is now a marketing bullet that the compliance officer is debating eating instead.

The framing AWS chose is “additive.” The framing Anthropic’s own docs chose is less beneficial to AWS, and consequently more honest. The existing Bedrock integration is now labeled “legacy Amazon Bedrock integration.” Claude Opus 4.7, Anthropic’s current flagship, does not have an ARN-versioned model ID on the legacy interface at all, which is the corporate equivalent of moving someone into a smaller office next to the printer spewing toner into the air and forgetting to give them the new keycard.

I clock two details the press release did not lead with. First, migrating from Bedrock changes the SigV4 signing context, the base URL, the API format, the model IDs, the SDK client, the streaming format, the request headers, and the region availability, with an implicit customer message of “good luck, asspony.” Eight independent changes is not a “migration,” it’s a goddamned rewrite. Second, negotiated discounts and AWS Marketplace private offers do not transfer automatically between Bedrock and Claude Platform on AWS. Translation: if you spent a quarter negotiating your Bedrock pricing, you get to spend another quarter negotiating again, and your existing EDP commit is not portable. For added fun, the AWS-side pricing structure on this brings new meaning to “absurd:”

Usage is denominated in Claude Consumption Units (CCUs) at $0.01 USD per CCU. The CCU price is fixed and never discounted. Anthropic rates your token usage in USD at standard per-model, per-feature rates, applies any negotiated discount, then converts the result to CCUs at $0.01 per CCU. Discounts result in fewer CCUs metered, not a lower CCU price. CCUs are not prepaid credits; there is no CCU balance or commitment.

That’s a lot of words to say “you will not know what any of this costs until the bill shows up.”
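
For the spreadsheet-inclined, here is a minimal sketch of the conversion as the quoted docs describe it: rate the tokens in USD, apply the discount, divide by the fixed $0.01 CCU price. The function name, the token counts, and the 20% discount are assumptions for illustration, and the $5 and $25 per-million rates are borrowed only as example inputs, not as confirmed Claude Platform on AWS rates.

```python
from decimal import Decimal

CCU_PRICE_USD = Decimal("0.01")  # fixed CCU price, per the quoted docs
MILLION = Decimal(1_000_000)


def ccus_metered(input_tokens, output_tokens, input_rate, output_rate,
                 discount=Decimal("0")):
    """Return the CCUs metered for one batch of usage.

    input_rate and output_rate are USD per million tokens; discount is a
    fraction (0.20 means 20% off). Every example value here is illustrative.
    """
    usd = ((Decimal(input_tokens) / MILLION) * input_rate
           + (Decimal(output_tokens) / MILLION) * output_rate)
    discounted_usd = usd * (Decimal(1) - discount)
    # The discount shrinks the CCU count; the CCU price itself never moves.
    return discounted_usd / CCU_PRICE_USD


list_price = ccus_metered(2_000_000, 500_000, Decimal("5"), Decimal("25"))
negotiated = ccus_metered(2_000_000, 500_000, Decimal("5"), Decimal("25"),
                          discount=Decimal("0.20"))
print(f"list: {list_price:f} CCUs, negotiated: {negotiated:f} CCUs")
```

Note what the sketch cannot tell you: the actual per-model, per-feature rates and your actual discount, which are the only two inputs that matter.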

GitHub Copilot is moving to usage-based billing June 1, and the multipliers portend darkness

GitHub announced that on June 1, all Copilot plans transition from premium request units to GitHub AI Credits, where one AI Credit equals one cent. Token consumption gets billed at published API rates, base subscription prices stay the same, code completions stay unlimited, and everything agentic gets metered.

The honest version of why came from GitHub’s own chief product officer, Mario Rodriguez: “a quick chat question and a multi-hour autonomous coding session can cost the user the same amount.” It’s the kind of sentence you give a Senate committee when you have stopped trying to pretend the previous answer made sense; you bury the implementation in complexity that the senator from Pennsyltucky has no hope of answering, since they used to be a surgeon instead of a cloud economist.

It still doesn’t make a whole ton of sense as to why, so we go deeper to the most useful disclosure: the annual-plan model multiplier table. For annual subscribers who stay on premium request billing after June 1, the Claude Opus 4.7 multiplier moves from 7.5x to 27x. The GPT-5.4 multiplier moves from 1x to 6x. GPT-4.1, previously a free 0x model under paid plans, is being pulled from the free tier entirely. The “prices aren’t changing” framing is technically accurate and also wildly misleading. The math under the prices changed by between four and six times depending on which frontier model you use. This is GitHub negotiating a contract cancellation through pricing. Annual subscribers running Opus 4.7 in agentic mode are being told, in the language of multipliers, that their existing contract no longer makes economic sense for GitHub, and would they please consider switching to monthly. Anyone who has ever taken a corporate “voluntary” buyout will recognize the structure.
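
A quick sketch of what those multipliers do to a fixed pool of premium requests. The two multiplier pairs come from the table described above; the monthly allowance of 300 is a hypothetical number chosen for illustration, not a figure GitHub published here.

```python
# Multipliers quoted above for annual subscribers on premium request billing.
MULTIPLIERS = {
    "Claude Opus 4.7": (7.5, 27.0),
    "GPT-5.4": (1.0, 6.0),
}

# Hypothetical monthly allowance of premium requests (an assumption).
ALLOWANCE = 300


def increase(before, after):
    """How many times more allowance one request consumes after June 1."""
    return after / before


for model, (before, after) in MULTIPLIERS.items():
    print(f"{model}: {increase(before, after):.1f}x per request, "
          f"{ALLOWANCE / before:.0f} -> {ALLOWANCE / after:.0f} requests per month")
```

The Opus ratio works out to 3.6x and the GPT-5.4 ratio to 6x, which is where "between four and six times" comes from once you round the low end up.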

If your team was paying $10 a month for Copilot Pro and burning Opus 4.7 in agentic mode, the unit economics you have been depending on were never real. The bill that arrives in July is the first one that reflects what your usage actually costs to serve, and a whole lot of folks are very much not going to enjoy this experience.

OpenAI deleted DALL·E 2 and DALL·E 3 from the API on May 12

Not deprecated like an AWS deprecation, deprecated like a Google deprecation: completely removed. The DALL·E 2 and DALL·E 3 model snapshots are no longer available through the OpenAI API as of May 12. The Realtime API Beta got the same “Old Yeller” treatment the same day. Your migration paths are gpt-image-2, gpt-image-1, gpt-image-1-mini, and the GA Realtime API respectively. Honestly, the hardest part of modern AI is teasing meaning from the model strings. Christ, I never thought I’d be nostalgic for AWS’s crap-ass naming “strategy.” Sure, Amazon DocumentDB (with MongoDB Compatibility) is a bad name, but at least you knew what the hell it was for.

If you’re one of the developers who built on the public OpenAI Image API in 2024, and you were not paying attention to the deprecation calendar, congratulations: you shipped a broken product last Tuesday. I am old; one of the things I always appreciated about AWS is that it’s vanishingly rare for one of their deprecations to mean a thing that worked last week is broken this week. Meanwhile, that’s kinda the lived experience of being a Google customer. You get used to rapid change, invariably by surprise.

IBM announced Red Hat AI Inference on IBM Cloud, GA May 22

On the other end of the change continuum, over in IBM-land they’re launching a serverless inference API too. It’s powered by vLLM and exposes an OpenAI-compatible API; the catalog includes Granite, Mistral-Small-3.2, Llama 3.3 70B Instruct, GPT-OSS-120B, and Nemotron-3-Nano-30B-FP8... If that doesn’t sound painful enough, it’s billed through IBM Cloud IAM. This is Bedrock and Vertex and AI Foundry, only with an IBM logo on it. Every hyperscaler and also IBM now sells inference-with-IAM as the product. The audit trail, the business relationships, and a significant install base held in place by various forms of hostage-taking are the new moat.

Follow The Money (Or Watch It Follow Itself)

Cerebras traded, priced at 111x trailing revenue, then dropped 10%

Cerebras Systems priced at $185 on May 13, sold 30 million Class A shares, and raised $5.55 billion. Shares opened at $350 on May 14, peaked at $385, closed the first day at $311.07, and dropped 10% on Friday. The marketed range moved from $115–125 to $150–160 to $185 in the days before pricing, which is what happens during a roadshow when you can feel and also unfortunately smell the room. At the IPO price, the implied fully diluted valuation was $56.4 billion. At the day-one peak, it was north of $120 billion.

You’ll have to forgive me, but I’m from the 1900s, an era where it seems money meant something different than it does today. Cerebras’s FY25 revenue was $510 million, up 76% from $290 million the year before. At the IPO price, that valued the company at roughly 111 times trailing revenue. At the day-one close, more than 180 times. The Cerebras pitch is that this is not a normal chip company being valued like a chip company, it is scarce AI infrastructure being valued like scarce AI infrastructure, and the difference is the entire bull case. The Cerebras bull case is that we will, in fact, never have enough compute. The Cerebras bear case is that we will. Both cases were priced at $185 a share.
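
If you want to check the multiples yourself, here is a rough sketch using only the figures quoted above. The one assumption baked in is that the fully diluted share count stays constant, so the valuation scales linearly with the share price; the variable names are mine.

```python
SHARES_SOLD = 30_000_000
IPO_PRICE = 185.00
DAY_ONE_CLOSE = 311.07
IPO_VALUATION = 56.4e9   # implied fully diluted valuation at the IPO price
REVENUE_FY24 = 290e6
REVENUE_FY25 = 510e6

gross_raise = SHARES_SOLD * IPO_PRICE
growth = REVENUE_FY25 / REVENUE_FY24 - 1
ipo_multiple = IPO_VALUATION / REVENUE_FY25

# Assumption: same fully diluted share count, so valuation tracks price.
close_valuation = IPO_VALUATION * DAY_ONE_CLOSE / IPO_PRICE
close_multiple = close_valuation / REVENUE_FY25

print(f"gross raise: ${gross_raise / 1e9:.2f}B")
print(f"revenue growth: {growth:.0%}")
print(f"trailing revenue multiple at IPO price: {ipo_multiple:.0f}x")
print(f"trailing revenue multiple at day-one close: {close_multiple:.0f}x")
```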

Vercel published April production data, and the labs are not competing on the same axis

On May 12, Vercel published seven months of AI Gateway production data covering more than 200,000 unique teams. The headline numbers for April: Anthropic took 61% of spend on 26% of token volume. Google took 21% of spend on 38% of volume. OpenAI took 12% of spend on 13% of volume, with spend share roughly tripling between March and April after the GPT-5.4 and GPT-5.5 releases. The labs are not competing for the same call. Anthropic is winning the high-stakes layer. Google is winning the high-volume-low-cost layer, which was basically the only pitch that Amazon’s Nova models had. And OpenAI is winning whatever it just shipped last week. Vercel’s own framing is that “spend follows the cost of being wrong”, which is the cleanest one-line summary of inference economics anyone has shipped this year.
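
One way to read those pairs of percentages: divide a lab's share of spend by its share of token volume and you get its blended price per token relative to the gateway-wide average. This is a back-of-the-envelope sketch from the April shares quoted above, not a metric Vercel published; the index name is my own.

```python
# April shares from the figures above: (share of spend, share of token volume).
APRIL_SHARES = {
    "Anthropic": (0.61, 0.26),
    "Google": (0.21, 0.38),
    "OpenAI": (0.12, 0.13),
}


def relative_price(spend_share, volume_share):
    """Blended price per token relative to the gateway-wide average (1.0)."""
    return spend_share / volume_share


index = {lab: relative_price(*shares) for lab, shares in APRIL_SHARES.items()}
for lab, value in sorted(index.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lab}: {value:.2f}x the average price per token")
```

Anything above 1.0 is a lab customers pay a premium to call; anything below is a lab they call in bulk.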

The same dataset shows 22.2% of AI Gateway requests in April ended with a tool call, carrying 58.9% of total token volume. The agentic share roughly doubled from October. The cost surface of production AI is now shaped like an agent, not a chat, and at the top of the request-volume curve, the average team is routing across 35 distinct models. The standard story about lab lock-in inverts the higher you go on the curve. Lab lock-in is a sales pitch. Routing graphs are infrastructure.

Vercel has skin in this game; the AI Gateway is their pitch to be the routing layer between those workloads and the labs, and “look at the multi-model fleets at the top of the curve” is exactly the argument a company selling routing infrastructure would want to make. The data is still the cleanest production-traffic read anyone has published, in part because nobody else with the volume has been editorially willing. I’m hesitant to overindex on their list of top providers, because unless customers override it the default selection is “whatever Vercel wants to use.” One wonders if they’re making these decisions based on their own commercial terms with various inference providers.

Anthropic’s valuation cycle is now playing speed chess

In February, Anthropic closed its Series G at a $380 billion post-money valuation on a $30 billion raise led by GIC and Coatue (gesundheit). On April 29, TechCrunch reported Anthropic had received preemptive offers at $800–900 billion and was sizing a $40–50 billion round. On May 12, Bloomberg reported the talks had crystallized into “at least $30 billion” at “more than $900 billion” pre-money, closing by the end of this month.

Four numbers, three weeks, one company. What’s interesting is that each leak walked the prior number in a particular direction. The raise size started at $40–50 billion (April 29), then narrowed to $30 billion (May 12). The valuation floor went from $800 billion (April 29) to $900 billion (May 12). The pattern is what it looks like: a series of trial balloons sized to discover the elastic limit of the room. Compare to the same company at $61.5 billion in March 2025 and you have a roughly fifteen-fold private-market valuation move in fourteen months, which is statistically indistinguishable from a meme stock with a science publication.

The (Reported! We have nothing concrete!) revenue figure has done its own dance. End of 2025: $9 billion. Mid-February: $14 billion. Late February: $19 billion. April 7: $30 billion, per CFO Krishna Rao. April 29: TechCrunch sources said “closer to $40 billion.” Some mid-May reports cite $44 billion. OpenAI, which has its own reasons to argue this, maintains the $30 billion figure is overstated by approximately $8 billion on a gross-versus-net cloud revenue accounting argument, which would make the comparable number $22 billion. If you would like to know Anthropic’s annualized revenue today, please specify a date, a source, and which Magic 8-ball you consulted. They will not agree, and neither will Anthropic.

The growth itself is clearly real; I mean, eight of the Fortune 10 are paying customers (who the hell are the two holdouts, and can we talk?). One thousand of their customers spend over $1 million a year (theoretically on purpose), doubled from the February disclosure. Claude Code reached $2.5 billion in run-rate revenue within nine months of public launch. I want to be clear here: I’m not an idiot who denies reality, and I don’t have an agenda. My skepticism isn’t around whether the revenue exists, but rather which number, on which day, with which accounting treatment, ends up in the S-1.

OpenAI capped Microsoft’s revenue share at $38B, $54B below Microsoft’s planning target

The Information reported on May 11 that OpenAI and Microsoft agreed to cap total revenue-sharing payments at $38 billion, which is coincidentally how much the first publicly announced OpenAI AWS deal was worth in November, so maybe that’s the default amount in OpenAI’s QuickBooks installation or something. Microsoft had internally been targeting approximately $92 billion in returns from its OpenAI stake, per planning documents disclosed in the Musk v. Altman trial. The cap takes roughly $54 billion off Microsoft’s modeling and puts it back on OpenAI’s side of the table, which is exactly the kind of number you want to wave at IPO bookbuilders. Meanwhile Microsoft will presumably make up the shortfall and then some by putting ads into the GitHub service outage notifications.

In the same renegotiation, Microsoft’s license to OpenAI models was extended to 2032 but also made non-exclusive. OpenAI can now serve all its products across any cloud provider. Microsoft’s previous revenue share to OpenAI was eliminated, leaving the cash flow one-directional. Read together, this is the contractual end of OpenAI’s Azure-exclusive era, made just visible enough that a public-market investor reading the S-1 will not accidentally believe the words “strategic partnership” mean anything specific.

OpenAI launched DeployCo, raised $4B, and bought 150 Palantir-style engineers

OpenAI’s corporate ADHD struck again as they announced DeployCo on May 11, a majority-owned subsidiary capitalized with more than $4 billion from nineteen investors led by TPG. The implied valuation reported by Axios is $14 billion, which is the number you produce by assuming a consulting practice that has existed for one day will scale faster than every consulting practice that has ever existed in the history of the world. The investor structure reportedly includes a 17.5% guaranteed return, which is the rate at which OpenAI has chosen to borrow $4 billion while calling the borrowing equity. That’s a similar guaranteed rate of return to that of many crypto emails lurking in my spam folder from 2019.

Three management consulting firms wrote checks: Bain & Company, Capgemini, and McKinsey. My snark aside, companies are generally not run by idiots. Therefore, the polite reading is they are buying option value on the disruption of their own business. The less polite but spot-on reading is they have correctly priced the future cost of saying no.

Concurrent with the launch, OpenAI agreed to acquire Tomoro, an Edinburgh-and-London consultancy founded in 2023 in alliance with OpenAI, employing approximately 150 forward-deployed engineers. At a $14 billion unit valuation, those engineers are valued at approximately $93 million per head, which is generous even by 2026 AI hiring standards. Anthropic shipped the same play seven days earlier with Blackstone, Hellman & Friedman, and Goldman Sachs on a $1.5 billion joint venture. Both labs have now formally conceded that the company that sells the model is not necessarily the company that captures the margin on its deployment; these are likely the early days of the frontier labs devouring their own ecosystems as pressure to show revenue builds.

The deeper reason I suspect drives these moves is that token revenue has structural unit economics problems that a public market analyst will, sooner rather than later, notice. Consultant time does not. Forward-deployed engineering is a revenue line that gets booked in dollars, not in inference losses, and converts cleanly to a chart that ends with the line going up. The labs aren’t pivoting to consulting because consulting is a great business. They’re pivoting to consulting because consulting is the only revenue line on the deck that does not require a footnote.

Reliability: A Brief Retrospective

The Claude status page records at least one investigated incident on May 12, 13, 14, 15, 16, and 18. The AI fanboys will no doubt point out that taking Sunday off has biblical precedent, and we’re closer than ever to summoning God via JSON. The May 13 cluster includes two separate investigations totaling roughly two and a half hours. The May 14 investigation lasted about two hours. The May 15 incident is the most interesting one editorially: the status update specifically notes that “success rates for Opus 4.7 have returned to normal” while Opus 4.6 and Sonnet 4.6 were still degraded. The newer flagship recovered first. The older model and the smaller cheaper one stayed down longer.

The 90-day uptime numbers on the same status page tell a similar story by tier. Claude API: 98.99%. Claude Code: 99.14%. Claude Cowork: 99.45%. Claude for Government: 99.87%. The government tier gets approximately eight times less downtime than the public API, which is a useful way to think about exactly how much your federal contracting line item should cost to be worth it. OpenAI’s equivalent number is 99.82%.
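
Percentages hide the scale, so here is a small sketch converting those uptime figures into hours of downtime over a 90-day window. The Claude numbers are from the status page as quoted; treating OpenAI's figure as covering the same 90-day window is an assumption.

```python
WINDOW_HOURS = 90 * 24  # the status page reports a 90-day window

UPTIME_PCT = {
    "Claude API": 98.99,
    "Claude Code": 99.14,
    "Claude Cowork": 99.45,
    "Claude for Government": 99.87,
    "OpenAI (window assumed)": 99.82,
}


def downtime_hours(uptime_pct, window_hours=WINDOW_HOURS):
    """Hours of downtime implied by an uptime percentage over the window."""
    return (100.0 - uptime_pct) / 100.0 * window_hours


for service, pct in UPTIME_PCT.items():
    print(f"{service}: {downtime_hours(pct):.1f} hours down")

ratio = downtime_hours(98.99) / downtime_hours(99.87)
print(f"public API vs. government tier: {ratio:.1f}x the downtime")
```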

The number that offers more insight than either of those is the one Vercel published on May 12 from seven months of AI Gateway data. About 3.5% of requests on the gateway end up rescued by failover to a healthy alternative. Measured by tokens, the rescue rate runs at 5.1%. Measured by dollars, 4.9%. The expensive end of the workload, long contexts, multi-step agent runs, heavy reasoning calls, is also the end most likely to need rescuing. A provider’s SLA measures request-level uptime. A production application experiences cost-weighted uptime, and the two blow themselves apart on exactly the calls that paid for the model. If your CFO is reading the SLAs and believing them, the CFO is reading the wrong document.
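
To see why the three rescue rates diverge, here is a toy sketch that computes the same failure count three ways: by request, by tokens, and by dollars. The traffic log is entirely made up (90 cheap chat calls, 10 expensive agent runs) and is not Vercel's data; it only shows the mechanism.

```python
from dataclasses import dataclass


@dataclass
class Request:
    tokens: int
    cost_usd: float
    rescued: bool  # True if the first provider failed and failover saved the call


def rescue_rates(requests):
    """Share of traffic rescued by failover, weighted three different ways."""
    rescued = [r for r in requests if r.rescued]
    return {
        "by_request": len(rescued) / len(requests),
        "by_tokens": sum(r.tokens for r in rescued) / sum(r.tokens for r in requests),
        "by_dollars": sum(r.cost_usd for r in rescued) / sum(r.cost_usd for r in requests),
    }


# Made-up traffic: two failures among the cheap calls, two among the big ones.
log = (
    [Request(1_000, 0.01, rescued=(i < 2)) for i in range(90)]
    + [Request(50_000, 0.60, rescued=(i < 2)) for i in range(10)]
)

rates = rescue_rates(log)
for weighting, rate in rates.items():
    print(f"{weighting}: {rate:.1%}")
```

Four failed requests out of a hundred looks like a 4% problem until you weight it by what those requests cost.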

The Hype Audit Department

It’s worth saying out loud, because the prospectus does not. Cerebras shipped 30 million shares to public markets last week on the strength of a 76% revenue growth rate from $290 million to $510 million. That growth is real, or at least “real enough that if it’s not somebody will theoretically be going to jail.” The customers driving it are two government-funded UAE entities operating in the same emirate under the same sovereign sponsorship, plus a Master Relationship Agreement with OpenAI whose payments do not start materializing in the income statement until 2027 and whose existence assumes that OpenAI will be a buyer of physical inference compute four years from now in the volumes its current cap-table mathematics requires.

The phrase “diversified customer base” appears in the prospectus. The phrase “the same emirate’s two largest AI procurement vehicles” does not. The first phrase is technically accurate. The second is also technically accurate, and if we’re being direct it’s the one that should be priced in. Cerebras’s 86% two-customer concentration in 2025 is one percentage point higher than its 85% single-customer concentration in 2024. The change is a new LLC name on the second-largest line, not a new geography.

Both LLCs sit inside the same sovereign portfolio, and the prospectus knows it. On page 22, the company discloses that “G42 and MBZUAI are considered related parties with respect to each other as defined by Accounting Standards Codification 850.” For those of you who aren’t giant nerds, ASC 850 is the accounting rule that requires companies to flag related-party connections; Cerebras checked the box, used the accounting-standard citation as the entire structural acknowledgment, and stopped, hoping everyone else would too. The prospectus never specifies the relationship.

The relationship is this: G42 is chaired and controlled by Sheikh Tahnoun bin Zayed Al Nahyan, the UAE’s National Security Advisor since 2016, who oversees roughly $1.5 trillion in sovereign capital and was deputy national security advisor at the time of Project Raven, the surveillance program Reuters documented in 2019 that hired former NSA personnel to spy on American citizens, journalists, and dissidents. MBZUAI (pronounced like an Amazon seller who’s about to scam you) stands for “Mohamed bin Zayed University of Artificial Intelligence” and is named after his brother, Sheikh Mohamed bin Zayed Al Nahyan, the President of the UAE.

The closest the S-1 comes to acknowledging any of this is one passing reference, in the same risk factor, to “laws or regulations applicable to OpenAI, G42 or MBZUAI, or the United Arab Emirates.” That is the entire UAE disclosure. The investor is left to assemble the rest, which is the work AC exists to do.

The bear case for CBRS is not “the growth is fake,” it’s that “Mohamed bin Zayed University of Artificial Intelligence” and “G42” are not the names of a diversification strategy any more than “we are diversified between a guy and also his brother, both of whom have diplomatic immunity” is.

One last thing

This week the AI industry decided what kind of company it actually is. It turns out that it’s not a model vendor with a consulting practice attached (usually called “Professional Services”). Rather, it’s a consulting practice with a model vendor attached, and the model vendor’s job is to keep the consulting practice differentiated from Accenture. Anthropic and OpenAI together put roughly $5.5 billion of investor capital toward this thesis in the same fortnight. Cerebras went public on the back of a procurement relationship with two foreign-government-adjacent buyers. Microsoft accepted a $54 billion haircut on its OpenAI returns so the cap table would look right for an IPO. GitHub admitted the subscription pricing it has been running for two years was never going to survive the agentic workloads it explicitly built the product around. Vercel published the data. The model layer is still the thing investors are buying, but it is not the thing they are paying for.

If you run AI workloads and you have not renegotiated your cloud commit this quarter, your counterparty just made it harder for you. If you sell consulting and you have not noticed that the labs are now your competitors, your counterparty also just made it harder for you. And if you have a Copilot Pro subscription on auto-renew and you have not looked at the June multiplier table, you are about to be one of the case studies in a future issue of Artificial Confidence. Either way, the bill changed.

See you next week.

— C

Artificial Confidence #1: AWS gave the agents a credit card

Corey Quinn — Tue, 12 May 2026 19:13:36 GMT

Hello and surprise to many of you; welcome to the inaugural issue of “Artificial Confidence.” Here, I cover the AI news from roughly the past week that doesn’t quite fit into Last Week in AWS.

“Last Week in AWS” came from something I desperately wanted: a source to round up the stuff from AWS’s cloud ecosystem that mattered to customers. We’re seeing a similar content spew in the AI space: lots of hype, lots of noise, yet remarkably low signal. I’ve grown weary of waiting for someone else to do it, so it’s time to be the change I want to see in the world. I want to bring an overheated, overhyped space to life in a way that humans actually care about without spending hours a day drudging through the muck. I want to surface the things that may have slipped past unremarked under a deluge of “CEO said a thing” style “journalism.” And I want to write it myself; the mortal sin of so much AI-generated content nowadays is people believing that you’ll take the time to read something they couldn’t even be bothered to write.

If this isn’t for you, I understand completely; whack the unsubscribe link. Your “Last Week in AWS” subscription will remain unaffected; go ahead and cancel that too if you’re annoyed with me and were waiting for an excuse. I get it; even AWS doesn’t talk about AWS releases the way they once did. But I hope you’ll stick around.

Vendor story this week: AI agents can autonomously do everything. Research story this week: no the hell they cannot. Microsoft’s own scientists found the agents corrupt 25% of multi-step work on average, and that adding tools makes performance 6% worse. AWS, the same week, announced you can now give those agents a wallet. I have spent a decade watching AWS announce capabilities that arrived years before the safety infrastructure to support them; it’s refreshing to see the AI industry compress that timeline into a single news cycle.

What Actually Changed (Adjusted For Spin)

Claude Opus 4.7 raised prices without raising prices

Anthropic shipped Claude Opus 4.7 on April 16 with what they have, repeatedly, called “unchanged pricing.” Five dollars per million input tokens, twenty-five per million output, identical to Opus 4.6, 4.5, and 4.1. The pricing page has been the very model of consistency.

The tokenizer, however, has not. Opus 4.7’s new tokenizer is denser, which is the polite engineering phrasing for “turns the same English sentence into more tokens than the old one because we have hilariously overcommitted to buy every GPU on the planet and must pretend to be able to pay for them somehow.” Anthropic’s docs put the multiplier at 1.0x to 1.35x, with the upper end showing up on code, structured data, and non-English text. The number is filed on the “what’s new” page rather than the pricing page. That’s the kind of editorial decision you make when you would prefer the number not appear where purchasing decisions get made. The same page recommends “updating your max_tokens parameters to give additional headroom,” which is the advice you give people about to use more tokens than they were planning to, while hoping they aren’t astute enough to figure that out. A practitioner write-up on Medium benchmarked a real workload at a 27% bump on identical prompts.

This is more elegant than charging more for the same number. It is also less honest.
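
Here is the arithmetic as a short sketch: hold the list price constant, multiply by the tokenizer expansion, and you get the effective price for the same text. The $5 and $25 rates and the 1.0x to 1.35x range are from above; treating the 27% practitioner benchmark as a token multiplier is my assumption about what that write-up measured.

```python
INPUT_PRICE = 5.00    # USD per million input tokens, unchanged on the pricing page
OUTPUT_PRICE = 25.00  # USD per million output tokens, likewise


def effective_price(list_price, token_multiplier):
    """Cost of the text that used to fit in one million old-tokenizer tokens."""
    return list_price * token_multiplier


SCENARIOS = [
    ("docs lower bound", 1.00),
    ("practitioner benchmark (assumed token bump)", 1.27),
    ("docs upper bound", 1.35),
]

for label, multiplier in SCENARIOS:
    print(f"{label}: input ${effective_price(INPUT_PRICE, multiplier):.2f}, "
          f"output ${effective_price(OUTPUT_PRICE, multiplier):.2f} "
          f"per million old tokens' worth of text")
```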

AWS Bedrock AgentCore now lets agents pay for things

Announced May 7. AI agents can now autonomously pay for APIs, MCP servers, web content, and other agents, via Coinbase CDP wallets or Stripe Privy (was “Shitr” taken?) wallets, with what the announcement repeatedly describes as “session-level spend limits,” whatever the hell that’s supposed to mean.

First: At last, I finally get to solve my biggest pain point as a customer: not being able to pay for things without human supervision. Er… wat? Does anyone actually have a problem doing this?

Second: I have done some looking, and I have not yet found a clear, durable definition of what a “session” is. Single API call? Single agent invocation? Single user-facing transaction? AWS, with their characteristic forthrightness, has provided exactly enough specificity to ship the feature and exactly enough ambiguity to ship the blog post. It’s now technically possible for an agent to burn $40,000 overnight against a misconfigured spend limit, an outcome that has been moved from “theoretical concern” to “forthcoming case study.” The post-mortem write-up is already in my saved-drafts folder, dated approximately six months from today.
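
To show why the definition matters, here is a deliberately simple sketch. Everything in it is hypothetical: the `SpendGuard` class, the $50 limit, and the 800 overnight payments are my inventions, not AgentCore's API or defaults. The only thing that changes between the two runs is what counts as a "session."

```python
from collections import defaultdict


class SpendGuard:
    """Hypothetical per-session spend limit; not the AgentCore API."""

    def __init__(self, limit_cents):
        self.limit_cents = limit_cents
        self.spent = defaultdict(int)  # session id -> cents spent so far

    def authorize(self, session_id, amount_cents):
        """Approve the payment only if it fits under the session's limit."""
        if self.spent[session_id] + amount_cents > self.limit_cents:
            return False
        self.spent[session_id] += amount_cents
        return True

    def total_cents(self):
        return sum(self.spent.values())


def overnight_spend(session_id_for_call, calls=800, amount_cents=5_000,
                    limit_cents=5_000):
    """Total USD approved when an agent attempts `calls` payments overnight."""
    guard = SpendGuard(limit_cents)
    for call in range(calls):
        guard.authorize(session_id_for_call(call), amount_cents)
    return guard.total_cents() / 100


per_run = overnight_spend(lambda call: "nightly-run")    # one session all night
per_call = overnight_spend(lambda call: f"call-{call}")  # every call is a session
print(f"session = whole run: ${per_run:,.2f}")
print(f"session = single call: ${per_call:,.2f}")
```

Same limit, same agent, same night. One definition caps the damage at the limit; the other multiplies the limit by the number of calls.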

Amazon Q Developer is being deprecated on an unusually short timeline

Announced April 30. Q Developer IDE plugins and paid subscriptions, which until very recently AWS was attempting to shove down our throats with zeal and gusto, reach end-of-life on April 30, 2027. Twelve months from announcement to “gone,” which is, by AWS standards, brisk. The usual cadence is longer, but then again the usual cadence is also aligned with customers who are knowingly using the product.

May 15, 2026 (this Friday): no new signups, not that that was a problem.
May 29, 2026: Opus 4.6 disappears from Q Developer Pro.
Opus 4.7, the current Anthropic flagship, is available exclusively on Kiro, the replacement product you also have no interest in using.

If you run Q Developer Pro and you have been pretending Kiro is not a thing, AWS would like you to know that you have approximately two weeks before they begin to migrate you on their schedule.

Both Anthropic and OpenAI are now on Bedrock

Announced April 28. GPT-5.5 and GPT-5.4 on Bedrock in limited preview. Codex on Bedrock. Also “Amazon Bedrock Managed Agents, Powered by OpenAI” which is the longest AWS product name of 2026, and that is saying something.

For two years, Bedrock has been “Claude on AWS, plus a bunch of other rando models you won’t use on purpose.” It is now “either of the top two US AI labs on AWS.” Anthropic’s special-est-friend status has been quietly rezoned to “one of two preferred partners,” which is the corporate-relationship equivalent of being informed your spouse has decided to start dating again.

AgentCore in GovCloud (US-West) also went live May 5. Government workloads can now run agents. We’ll revisit in six months when something interesting happens.

Agents Got Powerful This Week. They Also Got Worse.

Microsoft’s own scientists: agents corrupt 25% of your work, and tools make it worse

Microsoft Research published a paper on Monday with the genuinely on-brand title “LLMs Corrupt Your Documents When You Delegate.” Unlike statements from the non-Research parts of Microsoft, it is exactly the paper the title suggests.

They created a benchmark called DELEGATE-52 which appropriately tests flows across fifty-two professional domains. Because the devil lives inside your corporate process, they feature twenty-interaction multi-step workflows. This brings us to their findings, ordered by how irritating each is to the marketing departments of every major AI vendor:

Frontier models (Gemini 3.1 Pro, Claude 4.6 Opus, GPT-5.4) lose, on average, 25% of document content over twenty interactions. The all-model average is 50%. You are reading those numbers correctly. If you hear screaming coming from the C-suite, so are they—and you just learned a valuable thing about the demographics of this publication.
Of fifty-two domains tested, exactly one met the bar of “ready for delegation” (≥98% accuracy after twenty rounds). That domain, completely unsurprisingly, is Python programming. Every other domain: accounting, music notation, crystallography; the actual knowledge work you would actually delegate to an actual agent? Yeah, that’s a harsher number you’re really not gonna like.
“Catastrophic corruption” (≤80% score) occurred in 80%+ of model/domain combinations.
The errors don’t accumulate gradually. They arrive in single 10-to-30-point drops. That’s the failure mode hardest to build an SLA against, and leads to questions that start with blistering profanity in the first sentence.
The somehow worse finding: adding an agentic harness with tools makes performance 6% worse on average. The entire architectural premise of the agentic movement currently being rammed down our throats (“give the model tools and it becomes more capable”) provably degrades outcomes in this benchmark.
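
For a sense of scale on those twenty-round numbers, here is a small sketch of the per-round retention each figure would imply if the loss compounded evenly. That even-compounding model is my simplifying assumption, and the findings above say the real errors arrive in sudden drops instead, so read this as a floor on how good every single round has to be, not as a description of the paper's method.

```python
def per_round_retention(final_retention, rounds=20):
    """Per-round retention that compounds to final_retention after `rounds`."""
    return final_retention ** (1 / rounds)


FIGURES = [
    ("frontier average (25% lost)", 0.75),
    ("all-model average (50% lost)", 0.50),
    ("ready-for-delegation bar (98% kept)", 0.98),
]

for label, final in FIGURES:
    kept = per_round_retention(final)
    print(f"{label}: keep {kept:.2%} per round, lose {1 - kept:.2%}")
```

Under that assumption the gap between a frontier model and the delegation bar is roughly the gap between losing about 1.4% per interaction and losing about 0.1%.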

This is not “AI is useless.” It is narrower and more devastating: the exact vendor positioning that Anthropic, OpenAI, Microsoft, Google, and AWS are all currently selling on the same Tuesday morning, consisting of “hand the agent a multi-step task and walk away,” is empirically contradicted by Microsoft’s own employees. The Register opens with “an intern who failed this much would be shown the door.” That is generous. An intern who lost 25% of a document gets a performance improvement plan. An intern who made the problem worse by adding a calculator gets introduced to a baseball bat after hours in some shops.

The same week, AWS announced autonomous-spending agents

Microsoft Research, Monday: agents corrupt a quarter of your work, and tools make that worse. AWS, the previous Thursday: now you can give those agents a wallet. I will leave these two stories next to each other and let you make the joke. Consider it a participatory newsletter.

A British mathematician handed an agent a credit card

The Register, May 5. An experimental run of an AI agent given payment authority that ended in password leaks, CAPTCHA chaos, and the kind of behavior you would expect from a sufficiently empowered toddler in a Best Buy. The headline is tabloid, but the experimental setup is approximately what AgentCore payments enables in production. The headline is also a more honest preview of where this leads than the AgentCore announcement is. It has to be.

Reliability: A Brief Retrospective

Last week was unusually rough for AI infrastructure. Five days, five user-affecting incidents, four different vendors, and we’ll skip GitHub because at this point I don’t think they come here for the hunting anymore:

May 5: Google Gemini degraded widely starting around 8:44 AM EDT, free and paid both. Multimodal hit harder than text; that’s an instructive nugget about which paths apparently share infrastructure.
May 7: IBM Cloud lost power at a datacenter. IBM “Cloud” is technically not an AI provider nor a real cloud, but enough AI runs on top of it that this counted.
May 8: Significant Claude outage. The same afternoon, OpenAI’s Responses API threw 404s for 35 minutes after a bad deploy. Two of the three major US AI providers degraded within hours of each other.
May 9: Claude Code on Web partial outage; Opus 4.1 elevated errors.

IsDown has logged 671 Anthropic incidents since June 2024, with incidents typically resolving within 246 minutes. Multiply that by 671 and you’re measuring uptime that’s comparable to your bank’s business hours. The industry’s response to this baseline is, apparently, to put autonomous spending capability on top of it on the theory that none of us is as dumb as all of us. If your foundation has four-hour outages every couple of weeks and your stated direction is “deploy agents that pay for things on top of that foundation,” I would respectfully suggest that your circus is missing one of its underperforming clowns. (Decent recap on DEV.to.)
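
If you want to do the multiplication yourself, here is a rough Python sketch. The incident count and the 246-minute figure come from the paragraph above. The window endpoints are my assumptions, since the source only says "since June 2024", and treating every incident as a flat 246 minutes is a deliberate oversimplification: incidents overlap, many are partial, and "typically resolving within" is not a mean.

```python
from datetime import date

INCIDENTS = 671             # from the text
MINUTES_PER_INCIDENT = 246  # from the text, treated here as a flat duration

# Assumed window: only "since June 2024" is given, so both dates are guesses.
WINDOW_START = date(2024, 6, 1)
WINDOW_END = date(2026, 5, 12)

incident_minutes = INCIDENTS * MINUTES_PER_INCIDENT
incident_hours = incident_minutes / 60
window_days = (WINDOW_END - WINDOW_START).days
window_hours = window_days * 24

print(f"incident time: {incident_minutes:,} minutes (~{incident_hours:,.0f} hours)")
print(f"assumed window: {window_days} days")
print(f"share of window with an open incident: {incident_hours / window_hours:.1%}")
```

Read the result as a back-of-the-envelope ceiling on incident time under those assumptions, not as measured downtime.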

Follow The Money (Or Watch It Follow Itself)

Cerebras trades Thursday

Cerebras (CBRS) hits Nasdaq Thursday morning, set to be approximately the seventh “biggest AI IPO of 2026 so far,” a title that has changed hands roughly every six weeks since January, in a year not yet half over. The price-range escalation moved through three acts in a fortnight: $115-$125, then $125-$135, then $150-$160 on 20x oversubscription. That is the bankers’ way of admitting they underpriced the offering so embarrassingly that they would, in retrospect, like a do-over with witnesses present.

At the top, Cerebras raises ~$4.8 billion at a $48.8 billion fully-diluted valuation, or approximately 96 times trailing revenue. Trailing revenue is $510 million for 2025, with a reported 47% net margin. The word “reported” is doing considerable structural work in that sentence. Headline GAAP net income is $237.8 million, and that figure includes a $363.3 million one-time non-cash gain from extinguishing a forward-contract liability tied to G42. Strip out the accounting and Cerebras posted a non-GAAP net loss of $75.7 million for the year. Cerebras is “profitable” in roughly the same way you are profitable in a year you cleaned out the storage unit.
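
The arithmetic behind that paragraph fits in a few lines of Python. Every input is a figure quoted above, in millions of dollars. The last line is my own inference rather than anything the text states: backing out the one-time gain alone does not get you to the reported non-GAAP loss, so the remainder has to be other adjustments whose makeup I am not going to guess at.

```python
# Figures quoted in the text, in millions of USD.
VALUATION = 48_800.0
REVENUE_2025 = 510.0
GAAP_NET_INCOME = 237.8
ONE_TIME_GAIN = 363.3
REPORTED_NON_GAAP_NET_LOSS = 75.7

revenue_multiple = VALUATION / REVENUE_2025
net_margin = GAAP_NET_INCOME / REVENUE_2025

# GAAP net income with only the one-time gain removed.
income_ex_gain = GAAP_NET_INCOME - ONE_TIME_GAIN

# Inferred, not stated in the text: whatever else separates that number
# from the reported non-GAAP net loss.
implied_other_adjustments = -REPORTED_NON_GAAP_NET_LOSS - income_ex_gain

print(f"valuation / trailing revenue: {revenue_multiple:.1f}x")
print(f"headline net margin: {net_margin:.1%}")
print(f"net income excluding the one-time gain: {income_ex_gain:.1f}M")
print(f"implied other adjustments: {implied_other_adjustments:.1f}M")
```

The multiple and the margin both round to the numbers in the prospectus coverage, and net income goes negative the moment the G42 gain comes out.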

The prospectus is also unusually candid about customer concentration, in the way that suggests counsel concluded the SEC was going to ask anyway and did not want to be the ones holding the bag. Two UAE-based entities historically account for roughly 86% of revenue, the kind of currently war-afflicted geographic dependence that turns “concentration risk” into “two phone calls and a passport.” They say that OpenAI’s $20+ billion 750-megawatt deal represents “a substantial portion of projected revenue over the next several years,” which is S-1 language for “if anything happens to that one phone call, the rest of this prospectus is fiction.” The other hyperscaler customer is AWS, which is, as everyone knows, deeply enthusiastic about taking hard dependencies on third-party compute and never abandons them halfway through.

Read this alongside Nvidia’s $40+ billion in 2026 equity investments (including, of course, a reported $30 billion stake in OpenAI) and the picture is this: Nvidia, the largest AI compute supplier, has placed its biggest equity bet on OpenAI; OpenAI is the largest customer of Cerebras; Cerebras is the AI compute supplier going public this week. This is what economists call “circular” and what regulators tend to call “some kind of obscene financial ouroboros we will be looking into in three years.”

Snap and Perplexity quietly buried the $400M partnership

The $400 million Snap-Perplexity deal (which was supposed to put Perplexity’s conversational search inside Snapchat for reasons the original announcement struggled to articulate even at the time) has been “amicably ended,” per Snap’s Q1 earnings disclosure last week. “Amicably ended” is the financial-PR phrasing for “one or both parties walked into a meeting in February and could no longer remember what the slide deck was for.” Whatever the testing surfaced was bad enough that nine figures of pre-committed capital wasn’t enough to paper over it. In 2026, that is quaintly reassuring.

The Hype Audit Department

Mythos vs. cURL: one low-severity CVE, after all that

Anthropic’s Mythos is the company’s flagship “too dangerous to release publicly” cybersecurity model, purportedly capable of identifying and exploiting security vulnerabilities at a level beyond what’s safe to put in general circulation. The marketing implication: Pandora’s box on legs. Project Glasswing, via the Linux Foundation, provides gated access to selected open-source maintainers so they can use it defensively. Daniel Stenberg, the cURL maintainer, was on the list.

Mythos ran against the cURL codebase and returned five “confirmed” vulnerabilities. Stenberg’s team reviewed them. Three were false positives pointing at things already documented in cURL’s own API docs. One was a non-security bug. The fifth—and only—actual vulnerability is a low-severity CVE shipping with cURL 8.21.0 in late June. In Stenberg’s words: “The flaw is not going to make anyone grasp for breath.” For context: AI tooling has contributed 200–300 bugfixes to cURL over the last 8–10 months, and Stenberg says modern AI analyzers are better than what came before. He just doesn’t think Mythos is meaningfully better than the other modern AI analyzers. He calls the hype “primarily marketing.” (The Register hits harder than Stenberg actually did.)

The caveat: Stenberg never received hands-on access to Mythos. He signed up for Glasswing, but someone else ran the scan and sent him the report. He is, as of his Monday blog post, still waiting for direct access because the model is oh-so-scawy. The gating is tight enough that even the maintainers Anthropic is ostensibly trying to help cannot independently confirm or contest the danger claim. Combine that with April’s Firefox audit (271 flaws found, zero a competent human couldn’t have spotted) and a pattern emerges: every time someone qualified gets within evaluation distance of Mythos, they conclude it is unremarkable. Capable, but unremarkable. When the safety story IS the marketing story, you cannot tell them apart. I don’t think that’s an accident.

Google: criminals already used AI-built zero-days in the wild

On the same day The Register published the cURL piece, Google’s Threat Intelligence Group reported that criminals had already operationalized an AI-built zero-day in an attempted mass exploitation campaign. The defensive AI-vulnerability-finder, you will recall, is gated as too dangerous to release publicly. The offensive use is happening regardless. “Too dangerous to release” is the kind of framing that requires the bad guys to be waiting on the release schedule. They are, alas, not. Cynically, I wonder if the real reason not to release Mythos rhymes with “mompute schmortage.”

Where I’ll be

The Duckbill team (y’know, my day job) has a busy May and June, and we’re using it as an excuse to host dinners at every stop. I’ll be at all of them, if that’s the kind of thing that influences your dinner plans.

First up: San Francisco on May 19th, a small, off-the-record dinner about negotiating with hyperscalers. Jim Moses and I will be there, but this isn’t a presentation. It’s a conversation among people who’ve actually been in those rooms, with all the wit and sarcasm you’ve come to expect.

Then, because apparently we hate ourselves, we’re doing back-to-back AWS Summits in LA on June 9th and NYC on June 16th, with dinners at both for people in cloud cost and FinOps who want to continue the conference conversation somewhere with better food.

Spots are limited and require approval. Not mine, of course; you’re all aces in my book.

One last thing

If you read one thing this week, read the Microsoft paper; do not let the agents spend your money unattended in the meantime.

See you next week.

— C