Why Your AI Agent Setup Is 50x Slower Than It Should Be
The gap between 2x and 100x AI productivity isn't the model. It's the architecture wrapping it. Five builders published the same thesis in one week.
The gap between a developer getting 2x value from AI tools and one getting 100x is not the model they use. It is the architecture wrapping that model. Steve Yegge claimed in early 2026 that people using AI coding agents are 10x to 100x more productive than those still using chat interfaces and autocomplete. Same models. Same underlying intelligence. The variable is structure.
In a single week in April 2026, five independent builders published frameworks for AI agent architecture. Garry Tan (Y Combinator), Andrej Karpathy, Viv Trivedy, Daniel Miessler, and a community repository that hit 19,700 GitHub stars all converged on the same core thesis: push intelligence into portable markdown files, keep orchestration infrastructure as thin as possible, let the model do the reasoning. This article breaks down what they agreed on, where they disagree, and what it means for anyone building with AI.
Key Takeaways
Intelligence belongs in markdown skill files, not in framework code. Skills are portable, versionable, and improve automatically when the model improves.
The harness should do four things and nothing else. Run the model in a loop, read and write files, manage context, enforce safety. Every feature you add beyond that eats context and slows reasoning.
Five builders independently published the same thesis within three days (April 12 to 15, 2026). Garry Tan, Andrej Karpathy, Viv Trivedy, Daniel Miessler, and a 19.7K-star community repo. That convergence is the signal.
LangChain disagrees and has benchmarks to prove it. Harrison Chase argues the harness IS the product. The answer may depend on whether you are building consumer tools or enterprise pipelines.
Prescriptive instructions expire. Context does not. Every step-by-step recipe you write for an AI degrades with the next model release. Context about who you are and what you want compounds.
The Whole Architecture Fits on an Index Card
On March 31, 2026, Anthropic accidentally shipped the entire Claude Code source code to the npm registry. 512,000 lines. Garry Tan read it. What he found confirmed a pattern he had been teaching at Y Combinator for months: the productivity gap is not about model intelligence. It is about what wraps the model.
Tan distilled the architecture into three layers:
| Layer | What It Contains | Key Property |
|---|---|---|
| Fat skills | Markdown procedures encoding judgment, process, domain knowledge | Portable. You own them. |
| Thin CLI harness | ~200 lines: JSON in, text out, context management, safety | Minimal. Vendor provides it. |
| Your application | QueryDB, ReadDoc, Search, Timeline. Deterministic operations. | Reliable. Same input, same output. |
The principle is directional. Push intelligence up into skills. Push execution down into deterministic tooling. Keep the harness thin. 90% of the value lives in the skill layer. The harness is a conductor that reads files. It does not own them.
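The four-function harness can be sketched in a few dozen lines. The sketch below is an illustration, not Claude Code's actual loop: `call_model` is a stand-in for whatever model API you use, and the action schema is invented for the example.

```python
from pathlib import Path

def run_harness(call_model, task, max_turns=10):
    """Minimal agent loop: hand the model the task plus loaded context,
    apply the file actions it returns, stop when it says it is done.
    `call_model` is any callable mapping a dict to a reply shaped like
    {"actions": [{"op": "write", "path": ..., "text": ...}], "done": bool}.
    """
    context = []
    for _ in range(max_turns):
        reply = call_model({"task": task, "context": context})
        for action in reply.get("actions", []):
            path = Path(action["path"])
            if ".." in path.parts:  # safety: no escaping the workspace
                continue
            if action["op"] == "read":
                context.append({"path": str(path), "text": path.read_text()})
            elif action["op"] == "write":
                path.write_text(action["text"])
        if reply.get("done"):
            break
    return context
```

Everything else, the judgment, the process, the domain knowledge, lives in the files this loop reads.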
Tan's own experience makes the point. His personal CLAUDE.md grew to 20,000 lines: every quirk, every convention, every lesson he had ever encountered. The result: Claude Code's attention degraded. The model literally told him to cut it back. His fix was 200 lines of pointers to documents that load on demand. The full 20,000 lines of knowledge still exist. They just load when relevant instead of polluting the context window on every turn.
If you are building AI-powered tools or workflows for your business, getting the architecture right from the start determines whether you end up with a demo that impresses or a system that ships.
Five Definitions Separate 100x Builders from Everyone Else
The architecture rests on five concepts. Skip any one and the system underperforms.
1. Skill files
A skill is a reusable markdown document that teaches the model how to do something. Not what to do. The user supplies the task. The skill supplies the process. It works like a method call: same procedure, different arguments, radically different outputs.
Tan's example: a skill called /investigate walks through a fixed procedure: scope the dataset, build a timeline, diarize every document, synthesize, argue both sides, cite sources. Point it at a safety scientist and 2.1 million discovery emails and you get a medical research analyst. Point it at a shell company and FEC filings and you get a forensic investigator. Same procedure. The invocation supplies the world.
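A skill file for that procedure might look like the sketch below. This is a hypothetical reconstruction for illustration, not Tan's actual /investigate file:

```markdown
---
name: investigate
description: Structured investigation of an arbitrary document set
---
1. Scope the dataset: what is in it, what is missing, what timeframe.
2. Build a timeline of events from the documents.
3. Diarize every document into a structured profile.
4. Synthesize findings across profiles.
5. Argue both sides: strongest case for, strongest case against.
6. Cite sources: every claim links back to a document.
```

Note what is absent: nothing about emails, founders, or FEC filings. The skill encodes process; the invocation supplies the subject.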
2. Resolvers
A resolver is a routing table for context. When task type X appears, load document Y first. Without a resolver, a developer changes a prompt and ships it. With a resolver, the model reads the eval suite documentation first, runs benchmarks, and reverts if accuracy drops more than 2%. The developer did not know the eval suite existed. The resolver loaded the right context at the right moment.
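A resolver can be as simple as a dictionary. The task types and document paths below are invented for illustration:

```python
# A resolver maps task types to the context documents that must load
# before the model touches the task. Entries here are illustrative,
# not from any real project.
RESOLVER = {
    "prompt_change": ["docs/eval-suite.md", "docs/benchmark-thresholds.md"],
    "schema_change": ["docs/migrations.md"],
}

def resolve(task_type):
    """Return the documents to load first, or nothing if the task is unrouted."""
    return RESOLVER.get(task_type, [])
```

The harness calls `resolve` before each task and feeds the listed documents into context, so the right knowledge arrives without the developer asking for it.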
3. Latent vs. deterministic
Every step in a system is one or the other. Confusing them is the most common mistake in agent design. An LLM can seat 8 people at a dinner table, accounting for personalities. Ask it to seat 800 and it will hallucinate a seating chart that looks plausible but is completely wrong. That is a deterministic problem forced into latent space. The best systems are ruthless about this line.
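The dividing line is easy to see in code. In this sketch, the taste-laden pairing judgments stay latent (upstream, in the model), while the seat assignment itself is plain deterministic code that handles 8 guests or 800 identically:

```python
def seat_guests(guests, tables, seats_per_table):
    """Deterministic seating: same input, same output, any scale.
    The latent step (which pairings clash, which delight) happens
    upstream in the model; this code only executes mechanically.
    """
    if len(guests) > tables * seats_per_table:
        raise ValueError("not enough seats")
    plan = {t: [] for t in range(tables)}
    for i, guest in enumerate(guests):
        plan[i % tables].append(guest)  # round-robin keeps tables balanced
    return plan

# 800 guests across 100 tables of 8: trivial here, impossible in latent space.
plan = seat_guests([f"guest{i}" for i in range(800)], tables=100, seats_per_table=8)
```

The model proposes constraints and preferences; the code enforces them. Neither does the other's job.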
4. Diarization
The model reads everything about a subject and writes a structured profile. No SQL query produces this. No RAG pipeline produces this. The model has to read, hold contradictions in mind, notice what changed and when, and synthesize structured intelligence.
Tan's team built a system for YC Startup School that manages 6,000 founder profiles this way. The diarization output catches things no keyword search could: a founder who says "Datadog for AI agents" but whose GitHub commits are 80% billing code. She is building a FinOps tool disguised as observability. That gap between "says" and "actually building" requires reading the commit history, the application, and the advisor transcript simultaneously. No embedding similarity search finds it.
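The output of a diarization pass is structured data, not prose. Below is a hypothetical sketch of the kind of record such a system might emit; the field names are invented, not from Tan's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class FounderProfile:
    """The kind of structured record a diarization pass produces after
    reading an application, commit history, and transcripts together."""
    name: str
    says_building: str       # the pitch, in the founder's words
    actually_building: str   # what the artifacts show
    evidence: list = field(default_factory=list)

    @property
    def pitch_gap(self):
        # The signal no keyword or embedding search surfaces directly.
        return self.says_building != self.actually_building

p = FounderProfile(
    name="example founder",
    says_building="Datadog for AI agents",
    actually_building="FinOps tool",
    evidence=["80% of commits touch billing code"],
)
```

The schema is deterministic; filling it in is latent work only the model can do.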
5. Permanent upgrades
Tan's instruction to his AI: "If I ask you to do something and it's the kind of thing that will need to happen again, codify it into a skill file. If it should run automatically, put it on a cron. If I have to ask you for something twice, you failed." Every skill written is a permanent upgrade. It never degrades. When the next model drops, every skill instantly gets better. The system compounds.
Five Frameworks Published in One Week All Say the Same Thing
The convergence is the strongest signal. These five bodies of work emerged independently between April 12 and 15, 2026. None of these builders were collaborating; they arrived at the same architecture from different starting points.
| Framework | Where Intelligence Lives | What Stays Thin |
|---|---|---|
| Tan (fat skills) | Markdown skill files, SOUL.md | The harness: conductor, not brain |
| Karpathy (CLAUDE.md) | Behavioral instruction files | No framework needed. One .md file |
| Trivedy (context fragments) | Externalized memory, retrieval layer | Harness manages context, doesn't own knowledge |
| Miessler (bitter lesson) | Context about identity, goals, taste | Instructions about how to execute |
| Community (19.7K-star repo) | Skills, slash commands, CLAUDE.md rules | Subagents replace compaction. Grep replaces RAG |
Tan arrived here from shipping 600,000 lines of production code in 60 days with gstack (23,000+ GitHub stars in its first week). Karpathy arrived from debugging the three persistent failure modes of AI coding assistants. Trivedy arrived from iterating on harness design through 30+ versions. Miessler arrived from applying Richard Sutton's bitter lesson to AI tooling.
When five independent sources converge on the same architecture within 72 hours, the architecture is probably right.
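One row of the table above deserves a concrete illustration. "Grep replaces RAG" means retrieval by plain text search instead of embedding similarity. A minimal sketch, assuming the corpus fits in files the agent can read:

```python
import re

def grep_retrieve(pattern, documents):
    """Retrieval without embeddings: return (doc_id, line) pairs whose
    lines match a plain regex. Exact recall, no index to maintain."""
    hits = []
    rx = re.compile(pattern, re.IGNORECASE)
    for doc_id, text in documents.items():
        for line in text.splitlines():
            if rx.search(line):
                hits.append((doc_id, line.strip()))
    return hits
```

No vector store, no chunking strategy, no stale index. When the model is smart enough to pick good search terms, the retrieval layer can be this thin.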
LangChain Disagrees, and They Have Benchmarks to Prove It
Harrison Chase (LangChain CEO) shipped Deep Agents the same week and argued the opposite: the harness IS the product. Built-in task planning, sub-agent spawning, middleware, hooks, full orchestration infrastructure. His evidence: changing only the harness moved LangChain's DeepAgent from outside the top 30 to the top 5 on TerminalBench 2.0.
This is not a fringe objection. LangChain processes millions of agent calls daily. Their benchmarks are public. The genuine split: Tan's position is that every piece of logic in the harness caps reasoning the model could have done. The better the model gets, the thinner the harness should be. Chase's position is that harness engineering extends model capability rather than replacing it.
Both positions may be correct for different contexts. Consumer and personal agents (where portability and longevity matter) favor thin harness. Enterprise pipelines (where reliability and auditability matter) may justify fat harness. Neither side disputes that skills should be fat. The question for your project is not which camp is right. It is which side of the line your use case falls on.
Most businesses building AI features for the first time should start thin and add infrastructure only when they hit specific reliability walls. Not sure where your project falls? Talk to our team about which architecture fits.
Your Instructions Will Expire. Your Context Won't.
Daniel Miessler published the sharpest diagnostic of the week. He calls it the bitter lesson engineering audit, after Richard Sutton's 2019 observation that general approaches scaling with computation consistently beat hand-coded approaches in the long run.
Applied to AI tools: bad harness engineering is prescriptive instructions. "First copy this file, then load this, then do this, then do that." Step-by-step micromanagement of the AI's execution. This approach degrades as models get smarter. Overly rigid steps prevent the model from applying its own reasoning.
Good harness engineering is contextual. Who you are, what you are working on, what you are trying to accomplish, what good and bad look like. Identity, taste, standards, goals. The model figures out the how.
Miessler's diagnostic is simple. If your configuration reads like a recipe (step 1, step 2, step 3), you are doing bad harness engineering. If it reads like a briefing document (here is who I am, here is what matters, here are the tools), you are doing good harness engineering. Context about who you are never expires. Prescriptive instructions become obsolete with every model improvement.
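The two styles look like this side by side. Both files are invented examples, not from Miessler's own configuration:

```markdown
<!-- Bad: a recipe. Expires with the next model release. -->
1. Open src/api/client.ts
2. Copy the fetch wrapper into the new file
3. Rename the function, then run the linter

<!-- Good: a briefing. Context that compounds. -->
- I maintain a TypeScript API client used by three internal teams.
- Backwards compatibility matters more than elegance.
- "Good" means: typed end to end, no new dependencies, tests pass.
```

The recipe encodes one execution path for one model generation. The briefing encodes intent, and every smarter model executes it better.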
The architecture is not complicated. Fat skills, thin harness, ruthless separation of latent and deterministic work. The hard part is discipline: encoding judgment into reusable skills instead of doing one-off work, keeping the harness thin when the temptation is to add features, and trusting the model to figure out the "how" when you give it the right "what" and "why."
At webvise, we build AI-powered systems following these architecture principles. Whether you need an agent workflow, an automated pipeline, or a production-grade AI integration, the architecture matters more than the model.
Webvise practices are aligned with ISO 27001 and ISO 42001 standards.