April 22, 2026 · 9 min read

Copilot vs. Autopilot: Why Many AI Agencies Struggle to Compound Their Margins

Sequoia says services are the new software. ColdIQ hit $7M ARR selling the work, not the tool. Here's why AI agencies selling tools face margin compression, and what replaces them.

Topics: Business Strategy, AI Agents, AI, Automation

Agencies that resell tools rather than capabilities tend to face margin compression with each new model release. The agencies that sell the outcome tend to benefit, rather than suffer, from model improvements, and the numbers behind that claim are finally public. Everyone in 2026 is racing to ship the next AI copilot. Alex Vacca bootstrapped ColdIQ to $7M in annual revenue by deliberately doing the opposite.

You have probably been told the AI agency model will die the same way the 2022 productized-agency model did. That is half right. The shape that dies sells tools. The shape that survives sells the work underneath them, and Sequoia partner Julien Bek just gave it a name: *Services: The New Software*.

Below is the thesis, the 1:6-to-1:12 math behind it, Vacca's six-step playbook from ColdIQ, and the two concrete shifts the thesis forced on webvise this quarter.

Selling the tool puts you in a permanent race against the foundation model. Selling the outcome lets you benefit from model improvements instead of suffering from them.
ColdIQ is the first bootstrapped services-as-software datapoint with receipts. $7M ARR, 400 B2B clients, 2,200+ campaigns, 31 months from zero, no outside capital, 30+ people across ten countries.
Mature service verticals typically run at 1:6 to 1:12 software-to-services spend (per Sequoia partner Julien Bek's analysis). The budget lives in the work, not in the software line item.
A common hire order across AI-native services firms is: replace yourself on delivery before you hire anyone in marketing, sales, or ops.
webvise is already operating in this shape. Clearance into the Anthropic Claude Partner Network (initial review cleared 2026-04-10, approved to move forward 2026-04-18) is the credibility layer that shortens the buyer's trust ramp.

The thesis Sequoia just named, and the one ColdIQ already built

Julien Bek published *Services: The New Software* in April 2026. The argument fits in one sentence: the next trillion-dollar company sells the work, not the tool, because for every $1 of software budget there are $6 to $12 of services spend sitting next to it, and AI has finally made that services budget attackable by small teams.

Alex Vacca proved the model three years before Sequoia named it. In 2022 he quit an $80K operations job at Worldcoin (Sam Altman's iris-scanning company) and charged $3,000 a month for his first cold-outbound retainer. Thirty-one months and 2,200+ campaigns later, ColdIQ runs at $7M+ ARR with 400 B2B clients and 30+ employees across ten countries. Zero outside capital.

The shape matters more than the number. ColdIQ ran the work manually for a full year before encoding any of it. That manual period was not a detour on the way to software. It was the R&D phase for the software that now runs underneath the service.

The agency was the training set. We made the mirror-image argument, that every SaaS product collapses into this pattern from the other direction, in a recent post.

Copilot vs. autopilot: one compresses, one compounds

Bek's clearest distinction is who the model actually serves. A copilot puts AI in a professional's hands. The professional reviews the output, carries the risk, and keeps the client relationship.

An autopilot skips the professional entirely and ships the outcome to the buyer directly. Two shapes, two fates.

The pricing ceiling of a copilot is bounded by the number of professionals times the license price. The ceiling of an autopilot is the TAM of the outcome itself. One cap is a headcount. The other is a market.

	Copilot	Autopilot
Who uses it	A professional	The end buyer directly
Who carries risk	The professional	The vendor
Pricing ceiling	Seats × license price	TAM of the outcome
Named examples	Harvey (law firms), Rogo (investment banks)	Crosby (NDAs), WithCoverage (policies), ColdIQ (meetings)
Next model release	Feature risk: the model eats the tool	Margin expansion: delivery cost drops, price holds

Every AI-tool founder is asking the same question this quarter: what happens when the next model release turns my product into a feature? It is the right question.

A copilot has to stay upstream of the foundation model forever. An autopilot improves every time the foundation model does.
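The "margin expansion" row in the table is simple arithmetic once you assume the buyer pays a fixed price per outcome. Here is a minimal Python sketch under that assumption; every number in it (the price per booked meeting, today's delivery cost, and a 20% delivery-cost drop per model release) is hypothetical and not taken from ColdIQ, Sequoia, or webvise.

```python
# Minimal sketch: gross margin per outcome when the price holds and delivery cost falls.
# All constants are hypothetical, chosen only to illustrate the shape of the curve.
PRICE_PER_OUTCOME = 500.0      # assumed price the buyer pays per booked meeting
INITIAL_DELIVERY_COST = 300.0  # assumed cost to deliver one meeting today
COST_DROP_PER_RELEASE = 0.20   # assumed delivery-cost reduction per model release


def autopilot_margins(releases: int) -> list[float]:
    """Return gross margin per outcome today and after each of `releases` model releases."""
    margins = []
    cost = INITIAL_DELIVERY_COST
    for _ in range(releases + 1):
        margins.append((PRICE_PER_OUTCOME - cost) / PRICE_PER_OUTCOME)
        cost *= 1 - COST_DROP_PER_RELEASE  # the next release makes delivery cheaper
    return margins


if __name__ == "__main__":
    print([round(m, 3) for m in autopilot_margins(3)])  # [0.4, 0.52, 0.616, 0.693]
```

The curve rises only because the price is anchored to the outcome rather than to the effort of producing it.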

The math: why $1 of software sits next to $6 to $12 of services

Bek's canonical example cuts through the noise. A typical small company spends around $10,000 a year on QuickBooks and roughly $120,000 a year on the accountant who actually closes the books. That ratio is 1:12 for accounting specifically. Most categories sit closer to 1:6.
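The ratio itself is plain division of two annual budget lines. A minimal sketch using the accounting figures above; the function name is ours and purely illustrative.

```python
def services_per_software_dollar(software_spend: float, services_spend: float) -> float:
    """Dollars of services spend sitting next to each $1 of software spend."""
    if software_spend <= 0:
        raise ValueError("software_spend must be positive")
    return services_spend / software_spend


# Bek's accounting example: ~$10,000/yr on QuickBooks, ~$120,000/yr on the accountant.
accounting = services_per_software_dollar(10_000, 120_000)
print(f"accounting is 1:{accounting:g}")  # accounting is 1:12
```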

Cold outbound has exactly the same shape. A B2B company pays for a couple of sales-tool seats and one SDR salary, and when it hires an agency, it pays a meaningful multiple of both. The software line is tiny. The work is expensive.

That is why Vacca never had to invent a market. He swapped into an existing budget line that nobody was contesting with AI-native delivery.

Three filters decide whether a niche clears the test:

Is the work already outsourced today? You want to swap into an existing budget, not invent a new one.
Is it intelligence work? Pattern recognition and rule application, not genuine strategic judgment that only a human can hold.
Is the services spend meaningfully larger than the software spend? If the ratio is 1:2 or less, there is no hidden budget to attack.

Cold outbound cleared all three. So do SEO content operations, customer-support triage, invoice extraction, and lead enrichment. Plenty of work does not. Senior M&A structuring, clinical decision-making, and creative brand direction fail the intelligence-work test, and pretending they pass is how autopilot vendors in those categories lose their clients.
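The three filters form a conjunction: failing any one disqualifies the niche. A minimal Python sketch of that logic, assuming you fill in the `Niche` fields by hand; the class, its fields, and the spend figures in the example are hypothetical placeholders, not market data. The code encodes the test, not the research behind each answer.

```python
from dataclasses import dataclass


@dataclass
class Niche:
    name: str
    already_outsourced: bool     # filter 1: is there an existing budget to swap into?
    is_intelligence_work: bool   # filter 2: pattern recognition and rule application?
    software_spend: float        # annual software spend in the category
    services_spend: float        # annual services spend in the category


def clears_test(niche: Niche) -> bool:
    """True only if the niche passes all three filters."""
    ratio = niche.services_spend / niche.software_spend
    # Filter 3: a ratio of 1:2 or less means there is no hidden budget to attack.
    return niche.already_outsourced and niche.is_intelligence_work and ratio > 2


# Spend figures below are hypothetical placeholders.
niches = [
    Niche("cold outbound", True, True, software_spend=1.0, services_spend=6.0),
    Niche("senior M&A structuring", True, False, software_spend=1.0, services_spend=10.0),
]
for n in niches:
    print(n.name, clears_test(n))
```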

Vacca's six-step order, and the hire sequence most founders get wrong

Six steps, and the order matters more than any single step on its own.

1. Pick one outsourced line item inside one industry. Narrow wins because narrow accumulates proprietary data fastest, and the data is the real moat.
2. Land the first clients personally. No website, no deck, no funnel. Set a retainer floor you would still want to be anchored to in three years. ColdIQ opened at $3K/month because below that price, delivery was impossible.
3. Do the work by hand and document every step. Four artifacts from day one: a markdown SOP per repeatable task, a Loom any time the work touches a cursor, a dated decision log per client, and a failed-campaigns file. That last file becomes the most valuable artifact of year one.
4. Price like a service, report like a product. Setup fee plus monthly retainer tied to an outcome metric (meetings booked, closes delivered) plus a performance bonus when the target is beaten. Live dashboard from day one, weekly wins-and-misses update, quarterly conversation with the decision-maker, not only the operator.
5. Replace yourself on delivery before you scale anything else. Hire order: delivery operator, then technical automator, then head of delivery. Do not hire a marketer, a salesperson, or a COO before the delivery layer runs without you.
6. Compound the data moat before the software moat. Save every input (raw and cleaned), every output tagged with its outcome, every judgment call with its reasoning, and every objection plus the response that closed it.
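Step 4's pricing structure can be sketched as a small invoice function. This is an illustration under stated assumptions, not ColdIQ's actual contract: we assume the setup fee is billed once in the first month and the bonus is paid per outcome above the target. The function, its parameters, the $2,000 setup fee, the target, and the $150 bonus are all hypothetical; only the $3,000 retainer echoes ColdIQ's opening floor.

```python
def monthly_invoice(
    retainer: float,
    outcomes_delivered: int,
    outcome_target: int,
    bonus_per_extra_outcome: float,
    setup_fee: float = 0.0,
) -> float:
    """Setup fee (first month only) + monthly retainer + bonus for outcomes above target."""
    extra = max(0, outcomes_delivered - outcome_target)  # no penalty below target
    return setup_fee + retainer + extra * bonus_per_extra_outcome


# First month: 13 meetings booked against a target of 10.
print(monthly_invoice(3_000, outcomes_delivered=13, outcome_target=10,
                      bonus_per_extra_outcome=150, setup_fee=2_000))  # 5450
```

Keeping the bonus strictly above target means the retainer covers the expected outcome and the buyer only pays extra when the target is beaten.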

Vacca stepped out of daily delivery entirely in 2025, and the business kept moving forward the following month. Founder-led delivery sets the technical ceiling, which is where this model differs from the sales-led agency models of earlier cycles. Every agency founder who said "services don't scale" was really saying "I personally don't scale."

What this looks like inside a small AI-native agency

We have been running inside this shape at webvise for a while. The agency delivers landing pages, WordPress-to-Next.js migrations, MVPs, and AI-automation retainers through a Claude-native stack: Next.js 16, Drizzle ORM, Neon Postgres, Better Auth, Vercel AI SDK, Polar for billing. All production code passes through Claude Code in our pipeline.

More than 25,000 lines of client application code have already shipped through that loop on a single engagement, though line counts measure throughput, not quality. The agency is not the endpoint. It is the manual-delivery phase of whatever we productize next.

Two concrete shifts the Bek and Vacca thesis forced on us this quarter:

Anthropic Claude Partner Network. We cleared initial review on 2026-04-10 and were approved to move forward on 2026-04-18. That is not a badge we use in taglines. It is the credibility layer that shortens the buyer's trust ramp for Claude-native delivery, and it qualifies us for co-sell engagements where the customer has already been sold on Claude by Anthropic directly.
Every retainer now keeps a failed-campaigns file. Landing-page launches, AI-automation flows, and migration engagements each maintain a markdown log of what did not work and why. That file is the training set for the autopilot version of the same service. It is the artifact most agencies skip, and it is the one Vacca points to as the most valuable thing produced in year one.
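A failed-campaigns file only becomes a training set if every entry has the same shape. Here is a minimal Python sketch of an append-only markdown log, assuming one dated section per failure with "tried / result / why" fields; the field names, file name, and entry layout are hypothetical, not webvise's or ColdIQ's actual format.

```python
import tempfile
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass
class FailedAttempt:
    day: date
    engagement: str  # anonymized engagement label
    tried: str       # what we did
    result: str      # what actually happened
    why: str         # our best explanation of why it failed


def to_markdown(entry: FailedAttempt) -> str:
    """Render one entry as a dated markdown section."""
    return (
        f"## {entry.day.isoformat()} - {entry.engagement}\n"
        f"- Tried: {entry.tried}\n"
        f"- Result: {entry.result}\n"
        f"- Why: {entry.why}\n\n"
    )


def append_failure(log_path: Path, entry: FailedAttempt) -> None:
    """Append an entry; the file is created on first write and never rewritten."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(to_markdown(entry))


# Demo in a throwaway directory with placeholder text instead of real client data.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "failed-campaigns.md"
    append_failure(log, FailedAttempt(date(2026, 4, 20), "client A",
                                      "<what we tried>", "<what happened>", "<why>"))
    demo_text = log.read_text(encoding="utf-8")
print(demo_text)
```

Append-only writes keep the dated history intact, which is what makes the file usable later as labeled examples of what not to automate.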

Anonymized client engagements where this is live or on deck: a SaaS reference app, a youth-organization platform, a construction landing page with an AI chatbot, and a patient-management app. Each engagement is paying us to produce training data for a product we have not shipped yet. Every one of them compounds the dataset underneath.

When staying in copilot mode is the right call

The autopilot framing is not universal. Copilot is the correct shape when the underlying judgment is genuinely non-transferable. Senior legal advocacy, M&A deal structuring, and regulated clinical decision-making do not collapse into an outcome SKU that a vendor can underwrite.

Harvey sells a copilot to law firms because the partner, not the model, is the insured entity. Rogo sells to investment banks for the same reason. In both cases, the buyer is held legally, financially, or ethically responsible for the output of the work, so they will not hand the outcome to a vendor no matter how good the model gets.

The test is simple. If the buyer carries the regulatory or reputational liability, copilot is correct. If the buyer just wants the outcome and does not care who made it, autopilot is the only shape that survives the next model release.

Everywhere in between, the copilot vendor spends the next five years racing the foundation-model provider. That is a race that is structurally hard to win.

How to decide what you actually sell on Monday morning

Two questions answer most of the decision:

Can the buyer pay you for the outcome and never touch the software? If the answer is no, you are shipping a copilot. That is fine if the category clears the non-transferable-judgment bar. It is not fine if the category is intelligence work.
Is the services spend in your category at least 6 times the software spend? If yes, there is a hidden budget line worth attacking with AI-native delivery. If no, ship software, sell seats, and accept the copilot ceiling.
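The two questions can be combined into a rough decision rule. Below is a minimal Python sketch of one possible reading; the ordering (checking the budget ratio first), the function name, and the parameter names are our assumptions, not part of Bek's or Vacca's framework.

```python
def what_to_sell(
    buyer_can_skip_software: bool,
    judgment_is_non_transferable: bool,
    services_per_software_dollar: float,
) -> str:
    """One possible reading of the two questions as a decision rule."""
    # Question 2: without a hidden services budget, sell software and accept the copilot ceiling.
    if services_per_software_dollar < 6:
        return "software: sell seats, accept the copilot ceiling"
    # Question 1: if the buyer must still operate the tool, you are shipping a copilot.
    if not buyer_can_skip_software:
        if judgment_is_non_transferable:
            return "copilot: fine, the judgment is non-transferable"
        return "copilot: not fine, this is intelligence work"
    return "autopilot: sell the outcome"


print(what_to_sell(True, False, 6))
```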

If you are a founder plateaued between a small consultancy and a real business, here is the path, stated honestly. Work by hand until the market teaches you what to automate. Hire delivery before anything else. Keep the failed-campaigns file.

Write the dashboards before you write the product. The anti-slop content strategy we use on this blog is a smaller version of the same principle: publish only what the foundation model cannot already generate.

If you are evaluating how to reshape your agency for the next twelve months, or you want a Claude-native delivery partner that has already committed to this shape, webvise is built exactly for this thesis. Let's talk.

Webvise practices are aligned with ISO 27001 and ISO 42001 standards.
