Why Your AI Agent Setup Is 50x Slower Than It Should Be
The gap between 2x and 100x AI productivity isn't the model. It's the architecture wrapping it. Five builders published the same thesis in one week.
The gap between a developer getting 2x value from AI tools and one getting 100x is not the model they use. It is the architecture wrapping that model. Steve Yegge claimed in early 2026 that people using AI coding agents are 10x to 100x more productive than those still using chat interfaces and autocomplete. Same models. Same underlying intelligence. The variable is structure.
In a single week in April 2026, five independent builders published frameworks for AI agent architecture. Garry Tan (Y Combinator), Andrej Karpathy, Viv Trivedy, Daniel Miessler, and a community repository that hit 19,700 GitHub stars all converged on the same core thesis: push intelligence into portable markdown files, keep orchestration infrastructure as thin as possible, let the model do the reasoning. This article breaks down what they agreed on, where they disagree, and what it means for anyone building with AI.
Key Takeaways
Intelligence belongs in markdown skill files, not in framework code. Skills are portable, versionable, and improve automatically when the model improves.
The harness should do four things and nothing else. Run the model in a loop, read and write files, manage context, enforce safety. Every feature you add beyond that eats context and slows reasoning.
Five builders independently published the same thesis within three days (April 12 to 15, 2026). Garry Tan, Andrej Karpathy, Viv Trivedy, Daniel Miessler, and a 19.7K-star community repo. That convergence is the signal.
LangChain disagrees and has benchmarks to prove it. Harrison Chase argues the harness IS the product. The answer may depend on whether you are building consumer tools or enterprise pipelines.
Prescriptive instructions expire. Context does not. Every step-by-step recipe you write for an AI degrades with the next model release. Context about who you are and what you want compounds.
The Whole Architecture Fits on an Index Card
On March 31, 2026, Anthropic accidentally shipped the entire Claude Code source code to the npm registry. 512,000 lines. Garry Tan read it. What he found confirmed a pattern he had been teaching at Y Combinator for months: the productivity gap is not about model intelligence. It is about what wraps the model.
Tan distilled the architecture into three layers:
| Layer | What It Contains | Key Property |
|---|---|---|
| Fat skills | Markdown procedures encoding judgment, process, domain knowledge | Portable. You own them. |
| Thin CLI harness | ~200 lines: JSON in, text out, context management, safety | Minimal. Vendor provides it. |
| Your application | QueryDB, ReadDoc, Search, Timeline. Deterministic operations. | Reliable. Same input, same output. |
The principle is directional. Push intelligence up into skills. Push execution down into deterministic tooling. Keep the harness thin. 90% of the value lives in the skill layer. The harness is a conductor that reads files. It does not own them.
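The four-function harness can be sketched in a few dozen lines. The sketch below is an illustration, not Claude Code's actual loop: `call_model` is a stand-in for whatever model API you use, and the action schema is invented for the example.

```python
from pathlib import Path

def run_harness(call_model, task, max_turns=10):
    """Minimal agent loop: hand the model the task plus loaded context,
    apply the file actions it returns, stop when it says it is done.
    `call_model` is any callable mapping a dict to a reply shaped like
    {"actions": [{"op": "write", "path": ..., "text": ...}], "done": bool}.
    """
    context = []
    for _ in range(max_turns):
        reply = call_model({"task": task, "context": context})
        for action in reply.get("actions", []):
            path = Path(action["path"])
            if ".." in path.parts:  # safety: no escaping the workspace
                continue
            if action["op"] == "read":
                context.append({"path": str(path), "text": path.read_text()})
            elif action["op"] == "write":
                path.write_text(action["text"])
        if reply.get("done"):
            break
    return context
```

Everything else, the judgment, the process, the domain knowledge, lives in the files this loop reads.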
Tan's own experience makes the point. His personal CLAUDE.md grew to 20,000 lines: every quirk, every convention, every lesson he had ever encountered. The result: Claude Code's attention degraded. The model literally told him to cut it back. His fix was 200 lines of pointers to documents that load on demand. The full 20,000 lines of knowledge still exist. They just load when relevant instead of polluting the context window on every turn.
If you are building AI-powered tools or workflows for your business, getting the architecture right from the start determines whether you end up with a demo that impresses or a system that ships.
Five Definitions Separate 100x Builders from Everyone Else
The architecture rests on five concepts. Skip any one and the system underperforms.
1. Skill files
A skill is a reusable markdown document that teaches the model how to do something. Not what to do. The user supplies the task. The skill supplies the process. It works like a method call: same procedure, different arguments, radically different outputs.
Tan's example: a skill called /investigate walks through a fixed procedure: scope the dataset, build a timeline, diarize every document, synthesize, argue both sides, cite sources. Point it at a safety scientist and 2.1 million discovery emails and you get a medical research analyst. Point it at a shell company and FEC filings and you get a forensic investigator. Same procedure. The invocation supplies the world.
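A skill file for that procedure might look like the sketch below. This is a hypothetical reconstruction for illustration, not Tan's actual /investigate file:

```markdown
---
name: investigate
description: Structured investigation of an arbitrary document set
---
1. Scope the dataset: what is in it, what is missing, what timeframe.
2. Build a timeline of events from the documents.
3. Diarize every document into a structured profile.
4. Synthesize findings across profiles.
5. Argue both sides: strongest case for, strongest case against.
6. Cite sources: every claim links back to a document.
```

Note what is absent: nothing about emails, founders, or FEC filings. The skill encodes process; the invocation supplies the subject.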
2. Resolvers
A resolver is a routing table for context. When task type X appears, load document Y first. Without a resolver, a developer changes a prompt and ships it. With a resolver, the model reads the eval suite documentation first, runs benchmarks, and reverts if accuracy drops more than 2%. The developer did not know the eval suite existed. The resolver loaded the right context at the right moment.
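A resolver can be as simple as a dictionary. The task types and document paths below are invented for illustration:

```python
# A resolver maps task types to the context documents that must load
# before the model touches the task. Entries here are illustrative,
# not from any real project.
RESOLVER = {
    "prompt_change": ["docs/eval-suite.md", "docs/benchmark-thresholds.md"],
    "schema_change": ["docs/migrations.md"],
}

def resolve(task_type):
    """Return the documents to load first, or nothing if the task is unrouted."""
    return RESOLVER.get(task_type, [])
```

The harness calls `resolve` before each task and feeds the listed documents into context, so the right knowledge arrives without the developer asking for it.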
3. Latent vs. deterministic
Every step in a system is one or the other. Confusing them is the most common mistake in agent design. An LLM can seat 8 people at a dinner table, accounting for personalities. Ask it to seat 800 and it will hallucinate a seating chart that looks plausible but is completely wrong. That is a deterministic problem forced into latent space. The best systems are ruthless about this line.
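The dividing line is easy to see in code. In this sketch, the taste-laden pairing judgments stay latent (upstream, in the model), while the seat assignment itself is plain deterministic code that handles 8 guests or 800 identically:

```python
def seat_guests(guests, tables, seats_per_table):
    """Deterministic seating: same input, same output, any scale.
    The latent step (which pairings clash, which delight) happens
    upstream in the model; this code only executes mechanically.
    """
    if len(guests) > tables * seats_per_table:
        raise ValueError("not enough seats")
    plan = {t: [] for t in range(tables)}
    for i, guest in enumerate(guests):
        plan[i % tables].append(guest)  # round-robin keeps tables balanced
    return plan

# 800 guests across 100 tables of 8: trivial here, impossible in latent space.
plan = seat_guests([f"guest{i}" for i in range(800)], tables=100, seats_per_table=8)
```

The model proposes constraints and preferences; the code enforces them. Neither does the other's job.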
4. Diarization
The model reads everything about a subject and writes a structured profile. No SQL query produces this. No RAG pipeline produces this. The model has to read, hold contradictions in mind, notice what changed and when, and synthesize structured intelligence.
Tan's team built a system for YC Startup School that manages 6,000 founder profiles this way. The diarization output catches things no keyword search could: a founder who says "Datadog for AI agents" but whose GitHub commits are 80% billing code. She is building a FinOps tool disguised as observability. That gap between "says" and "actually building" requires reading the commit history, the application, and the advisor transcript simultaneously. No embedding similarity search finds it.
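The output of a diarization pass is structured data, not prose. Below is a hypothetical sketch of the kind of record such a system might emit; the field names are invented, not from Tan's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class FounderProfile:
    """The kind of structured record a diarization pass produces after
    reading an application, commit history, and transcripts together."""
    name: str
    says_building: str       # the pitch, in the founder's words
    actually_building: str   # what the artifacts show
    evidence: list = field(default_factory=list)

    @property
    def pitch_gap(self):
        # The signal no keyword or embedding search surfaces directly.
        return self.says_building != self.actually_building

p = FounderProfile(
    name="example founder",
    says_building="Datadog for AI agents",
    actually_building="FinOps tool",
    evidence=["80% of commits touch billing code"],
)
```

The schema is deterministic; filling it in is latent work only the model can do.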
5. Permanent upgrades
Tan's instruction to his AI: "If I ask you to do something and it's the kind of thing that will need to happen again, codify it into a skill file. If it should run automatically, put it on a cron. If I have to ask you for something twice, you failed." Every skill written is a permanent upgrade. It never degrades. When the next model drops, every skill instantly gets better. The system compounds.
Five Frameworks Published in One Week All Say the Same Thing
The convergence is the strongest signal. These five bodies of work emerged independently between April 12 and 15, 2026. None of these builders were collaborating; they arrived at the same architecture from different starting points.
| Framework | Where Intelligence Lives | What Stays Thin |
|---|---|---|
| Tan (fat skills) | Markdown skill files, SOUL.md | The harness: conductor, not brain |
| Karpathy (CLAUDE.md) | Behavioral instruction files | No framework needed. One .md file |
| Trivedy (context fragments) | Externalized memory, retrieval layer | Harness manages context, doesn't own knowledge |
| Miessler (bitter lesson) | Context about identity, goals, taste | Instructions about how to execute |
| Community (19.7K-star repo) | Skills, slash commands, CLAUDE.md rules | Subagents replace compaction. Grep replaces RAG |
Tan arrived here from shipping 600,000 lines of production code in 60 days with gstack (23,000+ GitHub stars in its first week). Karpathy arrived from debugging the three persistent failure modes of AI coding assistants. Trivedy arrived from iterating on harness design through 30+ versions. Miessler arrived from applying Richard Sutton's bitter lesson to AI tooling.
When five independent sources converge on the same architecture within 72 hours, the architecture is probably right.
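One row of the table above deserves a concrete illustration. "Grep replaces RAG" means retrieval by plain text search instead of embedding similarity. A minimal sketch, assuming the corpus fits in files the agent can read:

```python
import re

def grep_retrieve(pattern, documents):
    """Retrieval without embeddings: return (doc_id, line) pairs whose
    lines match a plain regex. Exact recall, no index to maintain."""
    hits = []
    rx = re.compile(pattern, re.IGNORECASE)
    for doc_id, text in documents.items():
        for line in text.splitlines():
            if rx.search(line):
                hits.append((doc_id, line.strip()))
    return hits
```

No vector store, no chunking strategy, no stale index. When the model is smart enough to pick good search terms, the retrieval layer can be this thin.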
LangChain Disagrees, and They Have Benchmarks to Prove It
Harrison Chase (LangChain CEO) shipped Deep Agents the same week and argued the opposite: the harness IS the product. Built-in task planning, sub-agent spawning, middleware, hooks, full orchestration infrastructure. His evidence: changing only the harness moved LangChain's DeepAgent from outside the top 30 to the top 5 on TerminalBench 2.0.
This is not a fringe objection. LangChain processes millions of agent calls daily. Their benchmarks are public. The genuine split: Tan's position is that every piece of logic in the harness caps reasoning the model could have done. The better the model gets, the thinner the harness should be. Chase's position is that harness engineering extends model capability rather than replacing it.
Both positions may be correct for different contexts. Consumer and personal agents (where portability and longevity matter) favor thin harness. Enterprise pipelines (where reliability and auditability matter) may justify fat harness. Neither side disputes that skills should be fat. The question for your project is not which camp is right. It is which side of the line your use case falls on.
Most businesses building AI features for the first time should start thin and add infrastructure only when they hit specific reliability walls. Not sure where your project falls? Talk to our team about which architecture fits.
Your Instructions Will Expire. Your Context Won't.
Daniel Miessler published the sharpest diagnostic of the week. He calls it the bitter lesson engineering audit, after Richard Sutton's 2019 observation that general approaches scaling with computation consistently beat hand-coded approaches in the long run.
Applied to AI tools: bad harness engineering is prescriptive instructions. "First copy this file, then load this, then do this, then do that." Step-by-step micromanagement of the AI's execution. This approach degrades as models get smarter. Overly rigid steps prevent the model from applying its own reasoning.
Good harness engineering is contextual. Who you are, what you are working on, what you are trying to accomplish, what good and bad look like. Identity, taste, standards, goals. The model figures out the how.
Miessler's diagnostic is simple. If your configuration reads like a recipe (step 1, step 2, step 3), you are doing bad harness engineering. If it reads like a briefing document (here is who I am, here is what matters, here are the tools), you are doing good harness engineering. Context about who you are never expires. Prescriptive instructions become obsolete with every model improvement.
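The two styles look like this side by side. Both files are invented examples, not from Miessler's own configuration:

```markdown
<!-- Bad: a recipe. Expires with the next model release. -->
1. Open src/api/client.ts
2. Copy the fetch wrapper into the new file
3. Rename the function, then run the linter

<!-- Good: a briefing. Context that compounds. -->
- I maintain a TypeScript API client used by three internal teams.
- Backwards compatibility matters more than elegance.
- "Good" means: typed end to end, no new dependencies, tests pass.
```

The recipe encodes one execution path for one model generation. The briefing encodes intent, and every smarter model executes it better.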
The architecture is not complicated. Fat skills, thin harness, ruthless separation of latent and deterministic work. The hard part is discipline: encoding judgment into reusable skills instead of doing one-off work, keeping the harness thin when the temptation is to add features, and trusting the model to figure out the "how" when you give it the right "what" and "why."
At webvise, we build AI-powered systems following these architecture principles. Whether you need an agent workflow, an automated pipeline, or a production-grade AI integration, the architecture matters more than the model.
Webvise practices are aligned with ISO 27001 and ISO 42001 standards.