Artificial Confidence: GitHub Repriced the Habit it Built
GitHub evolved its billing model, and you’ll feel it soon. Everyone else annualized their best month and called it revenue.
I was overcome by events at Microsoft Build + fwd:CloudSec this week, so I’m sending this later than I normally do. I missed you, too. Now then:
GitHub moved Copilot to usage-based billing on June 1, and the folks who responded in the first day are precisely the heavy agentic users GitHub built this pricing structure to find. Meanwhile, back at the ranch, everyone else spent the week reporting run-rates: Cognition annualized a number, Anthropic presumably has one inside a confidential filing, and a small chorus of VCs sang out what I had assumed was already widely known: ARR means “a strong month, multiplied by twelve, assuming numbers only ever go up.” The net result is that the one dollar figure that actually changed got buried under a pile of imaginary ones instead.
What Actually Changed (Adjusted For Spin)
GitHub changes course
GitHub Copilot moved to usage-based billing on June 1. Premium request units are gone, replaced by token-metered “AI Credits,” which, while sounding inscrutable, aren’t THAT far removed from the ever-shifting definition of a token. The base subscription prices did not change, which GitHub would very much like you to notice given that they haven’t stopped harping on that particular detail. What did change is that those prices now describe how much you get before the meter starts, which... may not be how many customers would like this story to end.
Two details buried in the notes are significant here. The fallback model is gone, so when your credits run out you no longer downgrade to a cheaper model, you simply stop. It is the first subscription I have encountered that ends mid-sentence, much as I am tempted to do to this one. And a Copilot code review now bills against AI Credits and GitHub Actions minutes at the same time, which feels... unfortunate.
The screenshots going around show projected bills jumping from “$50” to “several thousand,” and the caveat is that those are extrapolations from people a single day into a billing period that has not yet produced a real invoice, with zero changes to their workflows. The funnier and also truer point is who is doing the extrapolating. This may be hard for some folks to hear, but you absolutely do not arrive at a terrifying projected number by accident; the accidental numbers are the ones you encounter after the fact, in your bill, when you’re contemplating doing something truly desperate but also cannot afford rope. You get to a projection by being precisely the high-volume agentic user GitHub spent two years encouraging you to become, and then doing the math. The loudest complaints this week are a confession of exactly the usage profile the new pricing was built to locate. GitHub will correctly tell you your bill is atypical compared to a hypothetical spherical cow / customer. They’re learning that “correct” is not the same as “reassuring,” as it’s becoming clear that while customers value transparency, it’s as a means to the end of what they really want: predictability.
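For the curious, the math behind those screenshots is about as sophisticated as it sounds. Here is a minimal sketch of a straight-line projection, assuming a 30-day billing period; the function name and the dollar figures are made up for illustration and are not anyone's real invoice.

```python
def project_monthly_bill(spend_so_far: float, days_elapsed: int,
                         days_in_period: int = 30) -> float:
    """Linearly extrapolate a partial-period spend to the full billing period."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be at least 1")
    daily_average = spend_so_far / days_elapsed
    return daily_average * days_in_period


# Hypothetical figures, not anyone's real invoice.
day_one_projection = project_monthly_bill(spend_so_far=100.0, days_elapsed=1)
day_ten_projection = project_monthly_bill(spend_so_far=400.0, days_elapsed=10)

print(f"Projected from day 1:  ${day_one_projection:,.2f}")
print(f"Projected from day 10: ${day_ten_projection:,.2f}")
```

In this made-up case, one heavy first day projects to $3,000, while ten days of data from the same user project to $1,200. A single day is a very small sample, which is the caveat, and it is also still your usage, which is the point.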
Cursor doubled a price and let everyone watch GitHub instead
Composer 2.5’s Fast tier went from $1.50/$7.50 to $3.00/$15.00 per million tokens, a 100% increase that landed two weeks ago and that almost nobody filed as a price hike, because it was positioned as “the more important number to go up is the version number of the model.” It’s interesting, because this is directionally the same move that GitHub made, at roughly the same time, to half the outrage—because a new model number is a press release, while a new price is relegated to a footnote.
And the introductory rates are expiring in a chorus. Composer 2.5’s launch promo ended May 25, Codex Pro’s ended May 31, and the Opus 4.7 multiplier inside Copilot already doubled on April 30. The pattern is now established: ship at a subsidized rate, train the workflow, then let the meter find its level. If you have not locked in your workflow’s economics before the promo expires, surprise! You have some thinking to do.
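If you want to see what the Composer 2.5 Fast change above does to a workload, a rough sketch follows. The per-million prices come from the numbers quoted above; the token volumes are hypothetical, and the class and function names are mine rather than anything from Cursor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Price in dollars per million tokens."""
    input_per_million: float
    output_per_million: float


def workload_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at the given per-million-token prices."""
    return (input_tokens / 1_000_000 * price.input_per_million
            + output_tokens / 1_000_000 * price.output_per_million)


OLD_FAST = TokenPrice(input_per_million=1.50, output_per_million=7.50)
NEW_FAST = TokenPrice(input_per_million=3.00, output_per_million=15.00)

# Hypothetical monthly volume for one team; substitute your own numbers.
monthly_input_tokens = 200_000_000
monthly_output_tokens = 40_000_000

old_cost = workload_cost(OLD_FAST, monthly_input_tokens, monthly_output_tokens)
new_cost = workload_cost(NEW_FAST, monthly_input_tokens, monthly_output_tokens)

print(f"Old price: ${old_cost:,.2f}")
print(f"New price: ${new_cost:,.2f}")
print(f"Increase:  {new_cost / old_cost - 1:.0%}")
```

Because both the input and output rates doubled, the bill doubles no matter how your tokens split between the two. The more interesting exercise is rerunning this with whatever rate replaces the promo you are currently enjoying.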
Follow The Money (Or Watch It Follow Itself)
Cognition raised $1B, and you should read the metrics carefully
Cognition closed a Series D of more than $1 billion at a $26 billion post-money valuation on May 27, up from $10.2 billion eight months earlier. The headline statistic, repeated everywhere, is that 89% of the code committed at Cognition is now written by Devin, the company’s own AI software engineer, which is totally not just some guy in a trench coat and a fake moustache.
What this means, filtered through my snarky lens, is that a company that sells an AI software engineer is reporting that its AI software engineer writes most of its software, and offering this as proof the product works. It is the cleanest available example to date of a vendor grading its own homework, on a test it wrote, in a classroom it owns, then issuing a press release about the score. The 89% may be entirely real, but it’s also the least independent benchmark imaginable until next week, when something will no doubt surpass it somehow.
The number that underpins that is revenue that grew from $37 million to $492 million in twelve months, which puts the $26 billion valuation at roughly 53x the newer figure. To Cognition’s credit, they pulled an Andy Jassy and called it run-rate, which is the honest term. The problem comes in the shape of everyone who read “run-rate” and heard “revenue.”
Everyone’s revenue is a run-rate now, and a few VCs finally said so
ARR used to mean annual recurring revenue: money a customer was contractually obligated to pay you. Think “I have signed up for a one-year contract with you at a fixed fee schedule, committing me to pay you $X a month.” In a TechCrunch piece last month, Spellbook CEO Scott Stevenson called the current usage of the term a “scam,” and he is not even slightly wrong about the mechanics. Usage-based billing breaks the “contracted” half of ARR, so a strong month gets annualized, much like a salesperson who will take their best-ever month, multiply it by 12, and claim that was their total annual compensation.
The really damning line came from an unnamed investor in the same piece: once one company in a category does it, the rest nearly have to, just to keep pace. That is a prisoner’s dilemma of revenue reporting brought to you by someone funding both prisoners.
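The mechanics of the “strong month times twelve” move are easy to sketch. The monthly figures below are invented (in millions of dollars) and belong to no real company; the function names are mine. It compares the best-month run-rate against what the same twelve months actually added up to.

```python
def best_month_run_rate(monthly_revenue: list[int]) -> int:
    """Annualize the single strongest month: the flattering number."""
    return max(monthly_revenue) * 12


def trailing_twelve_months(monthly_revenue: list[int]) -> int:
    """Sum what actually came in over the last twelve months."""
    return sum(monthly_revenue[-12:])


# Hypothetical usage-based revenue, in $M, with one standout month.
months = [10, 12, 11, 15, 14, 18, 20, 17, 25, 22, 40, 30]

run_rate = best_month_run_rate(months)
actual = trailing_twelve_months(months)

print(f"Best-month run-rate:    ${run_rate}M")
print(f"Trailing twelve months: ${actual}M")
print(f"Overstatement:          {run_rate / actual:.2f}x")
```

With these made-up numbers the run-rate comes out at $480M against $234M actually collected, a bit more than double. Neither figure is a lie, but only one of them is something a customer was ever obligated to pay.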
Anthropic filed the most confident empty document of the year
Anthropic confidentially filed a draft S-1 on June 1, beating OpenAI to the announcement that carries the least checkable information of any in the AI IPO cycle. OpenAI will likely shortly file an “S-1o” with improved reasoning or whatnot. “Confidential” means, obviously, that we cannot read a word of it. The valuation and the roughly $47 billion run-rate everyone is quoting come from the last funding round and what the company told its investors, not from the audited document that is currently sealed from view. The sealed filing is the one that has real numbers of the “make these up and you may well serve prison time” variety.
The Agents Got Expensive. They Did Not Get Safer.
So let’s review the week. The bill for agentic coding went up (GitHub, Cursor). The valuation of agentic coding went up (Cognition, $26 billion). And the independently measured quality of what these agents actually commit did not move because of course it didn’t.
CodeRabbit’s analysis of 470 real-world pull requests found AI-co-authored code introduced up to 2.74x more security vulnerabilities than human-written code. Veracode tested more than 100 models and found 45% of AI-generated samples introduced an OWASP Top 10 vulnerability, a failure rate that has not improved across testing cycles despite a steady stream of vendor claims that the latest model finally fixed it. The security curve (motto: “the one nobody puts on a slide”) is remaining flat, regardless of how the capability advances.
Which puts Cognition’s 89% in a stark light: if your AI writes 89% of your code, and independent measurement says AI-written code ships vulnerabilities at multiples of the human rate, then 89% is less a productivity statistic than it is a description of your attack surface—annualized.
Meanwhile, inference itself keeps getting cheaper. DeepSeek’s V4-Flash runs around $0.14 per million input tokens against GPT-5.5’s $5.00, a gap of roughly 36x on input and north of 100x on output, at comparable performance on many tasks. So the raw cost of intelligence is collapsing in public while the cost of the tools you actually code in went up this week. “Token cost is shrinking” is being offset by “so let’s immediately burn as many as we can, in nondeterministic ways, in the harnesses.”
One last thing
If you run AI workloads, do the boring thing this week: pull your own usage numbers before the next promo expiry (there’s always another one!), and find out which of your workflows is hanging out in a cohort waiting to be repriced. The vendors already know. Remember: the only number in this entire issue you can fully verify is the one on your own invoice.
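The boring thing can be a very short script. This is a sketch only: it assumes you can get your usage into a CSV with a `workflow` column and a `cost_usd` column, which are names I made up. No vendor's actual export format is implied, so adjust the column names to whatever yours really produces.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a usage export; the columns and rows are hypothetical.
SAMPLE_EXPORT = """workflow,cost_usd
code-review,12.50
agent-refactor,240.00
autocomplete,3.25
agent-refactor,310.00
code-review,9.75
"""


def spend_by_workflow(csv_text: str) -> list[tuple[str, float]]:
    """Total the cost per workflow and rank from most to least expensive."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["workflow"]] += float(row["cost_usd"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = spend_by_workflow(SAMPLE_EXPORT)
for workflow, total in ranked:
    print(f"{workflow:16s} ${total:,.2f}")
```

Whatever lands at the top of that list is the workflow most likely to be sitting in somebody's repricing cohort.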
See you next week.
— C



