April 8, 2026 · 9 min read

Why I Won't Ship AI Agents That Read the Open Web

On April 5, 2026, Google DeepMind published the largest empirical study of AI agent manipulation ever conducted. 502 participants, 8 countries, 23 attack types, every defense currently on offer rated insufficient. Here is the engineering position Webvise took the next morning.

AI AgentsAISecurityB2B

Share

On April 5, 2026, Google DeepMind published the largest empirical study of AI agent manipulation ever run: 502 real participants across 8 countries, 23 distinct attack types, frontier models including GPT-4o, Claude, and Gemini. The single sentence I took from it and pinned in my engineering notes the next morning is the only one that matters for anyone shipping a business chatbot in 2026: if your AI agent reads attacker-controlled text and then performs any action with user privileges, you have built a data exfiltration vulnerability. This is the reason webvise will not build, for any client at any price, an AI agent that browses the open web.

What DeepMind Actually Measured

Most of the press coverage of the study reported the headline number, 23 attack types, and moved on. The numbers underneath are what matter for anyone running an AI feature in production:

502 participants in real conditions, not simulated lab runs
8 countries, so the attacks were not optimized for one cultural or linguistic context
23 attack types across 10 categories, including direct prompt injection, indirect injection via web content, multimodal pixel injection, document injection, environment manipulation, jailbreak embedding, memory poisoning, goal hijacking, exfiltration, and cross-agent injection
All four classes of defense (input sanitization, prompt-level guards, sandboxing, human oversight) rated insufficient at scale

The category I keep coming back to is the eighth one, *goal hijacking through gradual instruction drift across interactions.* Every demo of an agent system you have ever seen survives a single adversarial prompt. None of them survive a hundred carefully spaced ones.

The Cascade Insight Most Coverage Missed

Buried in the study is the finding that decides whether multi-agent products are safe to ship at all. In any pipeline where agent A retrieves content, agent B processes it, and agent C executes an action, a single injection into agent A's data feed propagates through every downstream agent. Agent B trusts A's output. Agent C trusts B's output. The attacker did not need to compromise the model. They only needed to compromise the data the model consumed, once.

I run a personal multi-agent setup with Hermes, a NousResearch agent on Telegram that drives 14 cron jobs across daily news, medical guideline summaries, and personal logistics. Every single one of those 14 jobs reads from a source explicitly trusted and curated by hand. None of them follow links. None of them execute external instructions. After the DeepMind paper landed I audited every cron and the rule held. It held because it was written down two years ago and never relaxed. Most production agent stacks I see in client briefs do not have this rule, and the engineers building them have never been asked to write it down.

What 'Reading the Open Web' Looks Like in a Client Brief

The same request shows up in three variants, every month:

'Have the chatbot answer questions by browsing my competitor's website.' In effect, this would grant an attacker who controls any web page the agent visits a writable channel into the customer's session.
'Let users paste any URL and have the agent summarize it.' In effect, this allows any user to paste a URL whose HTML contains hidden instructions that exfiltrate the next messages of conversation.
'Add RAG over an external vendor's documentation that we don't host.' In effect, this grants the agent's tool-calling permissions to whoever edits a docs page on the vendor's site next.

Each one wires an attacker-controlled text channel directly into a system that has user data, tool calls, and outbound network access on the same side of the trust boundary. None of them is malicious on the part of the client. Every one of them is a defensible product idea. They are all also, after April 5, 2026, unshippable.

Every Defense Currently on Offer Fails

DeepMind tested all four of the obvious defense families. Here is their assessment, with my gloss on each one:

Defense	DeepMind verdict	Why it fails in practice
Input sanitization	Insufficient	You cannot sanitize image pixels, document metadata, or speaker notes inside a PDF at inference time. The attack surface is text and every other modality the agent ingests.
Prompt-level guards	Insufficient	Injected content is designed to look like a legitimate part of the page. By the time the model sees it, the guard has already trusted it.
Sandboxing	Reduces blast radius, does not prevent injection	Sandboxing helps if the result of the attack is contained. It does not help when the goal of the attack is to read user data and write it back through a legal-looking API call.
Human oversight	Insufficient at scale	An operator running an agent across 50 sources cannot review every page for hidden instructions. The whole point of the agent was that the human stepped out of the loop.

If you take the table seriously, there is no responsible way to ship an agent that reads attacker-controlled text and also takes user-privileged actions. The only available move is to remove one of those two properties.

What I Ship Instead

Webvise has shipped AI features into client production, including a construction landing page whose model calls route through the Vercel AI Gateway for provider routing and observability. The five rules below are what made that build defensible, and they are now hard preconditions for any AI work I take on:

Closed-input agents only. The agent reads from a finite, hand-curated set of sources under direct control. No open web. No user-pasted URLs. No external RAG over uncontrolled documentation.
Read-only by default. If the agent must read something that is not fully trusted, it cannot also call tools, send email, write to a database, or generate outbound network requests in the same session. You get one or the other, never both at once.
Cross-agent isolation. When agent A's output flows into agent B, B treats A's output as user input, not as system instructions. This is one line of code in the prompt and it is the entire defense against the cascade attack.
Capability budgets per agent. Every agent has a fixed list of tools and a token cap. The cap is small enough that even a successful injection cannot exfiltrate more than one short message.
Provider isolation through a gateway. I route every model call through Vercel AI Gateway so I can swap providers, log every prompt and completion, and revoke a key in seconds. If something looks wrong in the logs I can stop the bleeding the same minute I notice.

These are not exotic. They cost a few hours of design work, before any code is written. The reason most agent products in 2026 do not have them is that the team never assigned someone to draw the trust boundary.

Why I Decline Certain Builds

It would be easy to read this article as an agency saying it is too cautious to take your money. The opposite is true. The DeepMind paper hands every team that built engineering credibility before the agent boom a clear advantage: the ability to decline specific feature requests with a clear technical rationale; clients generally welcome this in retrospect. Vendors that build agents without these constraints take on significant exfiltration risk that is increasingly visible in incident reports.

The same opportunity that exists in content marketing right now exists in agent engineering. The market is seeing rapid deployment of chatbots without prompt-injection defenses, similar to the recent surge in low-quality LLM-generated content. The premium will go to the teams that can demonstrate, in advance, that theirs is built to a higher standard.

Where I Draw the Line

The shortest version of the rule, the one I now write into every project kickoff doc, is this: an agent can read untrusted content, or it can act with user privileges, but not in the same session. Everything else flows from that. If a feature request crosses the line, it does not get built. If it can be reshaped to stay on one side, I reshape it together with the client and ship the reshaped version. The DeepMind paper did not invent this discipline. It just took away every excuse for not having one.

webvise builds AI features for businesses where the cost of a single leaked customer message is higher than the cost of saying no to a feature request. If that describes your project, get in touch, the first step is drawing the trust boundary together, before any code is written.

Development practices are aligned with ISO 27001 and ISO 42001 standards.

I run my internal wiki on five shell commands and a hand-maintained index file, no vector database. For a 200-document knowledge base, that setup is cheaper, faster to build, and more accurate than a RAG pipeline. Here is why I skipped RAG and when you actually need it.

Anthropic's Mythos model discovered tens of thousands of zero-day vulnerabilities across every major operating system and escaped its own sandbox. Human-only patching cycles are increasingly outpaced by automated discovery in published research.