Most Business Knowledge Bases Do Not Need RAG
We run our internal wiki on five shell commands and a hand-maintained index file. No vector database. For a 200-document knowledge base, that setup is cheaper, faster to build, and more accurate than a RAG pipeline. Here is why we skipped RAG, and when you actually need it.
We run our internal company wiki on five shell commands and a hand-maintained index file. No embeddings, no vector database, no re-indexing job. For a knowledge base of around 200 documents, that setup is faster to build, cheaper to run, and more accurate than a typical RAG pipeline. The trade-off only becomes worth it somewhere past a thousand documents, not before.
A Brief Note on Karpathy and Obsidian
Two things made this approach obvious to us. The first is Andrej Karpathy's consistent argument that AI agents should be given tools rather than fed retrieved chunks. His AutoResearch project, released in March 2026, makes the case concretely: the agent runs code rather than querying embeddings, and progress comes from execution. We covered AutoResearch in detail in an earlier post.
The second is Obsidian. An Obsidian vault is just a folder of plain markdown files. There is no proprietary database, no schema to migrate, no SDK to learn. Combined with Obsidian's local REST API plugin, the entire knowledge base sits behind a normal HTTP endpoint that any process can read or write. That combination makes the "give the agent tools" pattern trivial to implement: a handful of shell commands and you have an LLM that can read, write, and search a structured knowledge base directly.
What We Actually Run
Our internal wiki today is 22 pages of structured knowledge: entities (people, companies, projects), concepts (frameworks and principles), sources (raw research notes), and synthesis pages that connect them. Every page links to other pages with Obsidian wikilinks, and a hand-maintained `index.md` at the root lists everything by category with one-line descriptions.
The agent does not search the vault with embeddings. It runs five commands:
- `wiki read <path>`. Fetch a single markdown page.
- `wiki write <path> -`. Create or overwrite a page from stdin.
- `wiki append <path> <text>`. Add a line to a page (used for the running activity log).
- `wiki search <query>`. Hit Obsidian's built-in full-text search.
- `wiki list <dir>`. List files in a directory.
The whole implementation is roughly 80 lines of bash and curl. There is no vector database, no embedding model, no chunking strategy, no reranker, no nightly index job. Adding a new note means writing the file. The agent picks it up on the next query without any pipeline step in between.
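A sketch of what such a wrapper can look like, assuming the Obsidian Local REST API plugin's conventional setup (HTTPS on port 27124, bearer-token auth, `/vault/` and `/search/simple/` endpoints). The exact paths and headers can differ by plugin version, so treat this as illustrative rather than a drop-in implementation:

```shell
#!/usr/bin/env bash
# Sketch of the five-command wrapper. Endpoint paths, port, and headers
# are assumptions about the Obsidian Local REST API plugin; verify them
# against the plugin's own documentation before relying on this.
VAULT_URL="${VAULT_URL:-https://127.0.0.1:27124}"
AUTH=(-H "Authorization: Bearer ${WIKI_TOKEN:-changeme}")

wiki() {
  local cmd="$1"; shift
  case "$cmd" in
    read)   # fetch a single markdown page
            curl -sk "${AUTH[@]}" "$VAULT_URL/vault/$1" ;;
    write)  # create or overwrite a page from stdin
            curl -sk "${AUTH[@]}" -X PUT -H "Content-Type: text/markdown" \
                 --data-binary @- "$VAULT_URL/vault/$1" ;;
    append) # add a line to a page (the running activity log)
            curl -sk "${AUTH[@]}" -X POST -H "Content-Type: text/markdown" \
                 --data-binary "$2" "$VAULT_URL/vault/$1" ;;
    search) # full-text search via the plugin's simple-search endpoint
            curl -sk "${AUTH[@]}" -X POST -G --data-urlencode "query=$1" \
                 "$VAULT_URL/search/simple/" ;;
    list)   # list files in a vault directory
            curl -sk "${AUTH[@]}" "$VAULT_URL/vault/$1/" ;;
    *)      echo "usage: wiki {read|write|append|search|list} ..." >&2
            return 1 ;;
  esac
}
```

A real script grows a little from here (error handling, TLS certificate pinning), but the shape stays the same: five cases, five curl calls, and swapping the backend means changing five lines.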
The Index File Is the Retrieval System
This is the part that took us a while to admit. When the agent gets a question, it does not start by retrieving anything. It starts by reading `wiki/index.md`, a curated table of contents written by a human (or by a maintainer agent run on a schedule). The index lists every page with a one-sentence description, grouped by category. From that single ~400-token read, the agent already knows which one or two pages are relevant.
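To make that concrete, here is what an excerpt of such an index might look like. The page names and descriptions are invented for illustration, not taken from our vault:

```markdown
## Entities
- [[entities/acme-corp]] - Customer since 2023; renewal owned by sales.
- [[entities/jane-doe]] - Founder contact at Acme; prefers email.

## Concepts
- [[concepts/pricing-model]] - How tiers map to seat counts.

## Sources
- [[sources/2024-q3-churn-notes]] - Raw notes from churn interviews.
```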
The next step is one or two targeted reads to pull the relevant pages in full. Each page is between 200 and 800 tokens. Most queries finish with two or three reads and roughly a thousand tokens of vault content in the context window. That is less than what a default RAG configuration injects, and the content is coherent (whole pages) rather than chopped (chunks ripped out of their surrounding context).
A maintained index does the work that a vector database does in a RAG pipeline: it maps a query to the right documents. The difference is that a human curated the mapping once, instead of an embedding model approximating it on every query.
The Token and Cost Comparison
For a small business knowledge base of around 200 documents, here is what a default RAG setup costs versus the index-led file access pattern. The token figures are based on what we observe in our own vault. The infrastructure figures are pulled from public pricing for the most common managed services.
| Item | RAG Pipeline | Index + Tools |
|---|---|---|
| Tokens injected per query | ~3,000 (5 to 10 chunks) | ~1,000 (1 index read + 1 to 2 pages) |
| Vector database (monthly) | $25 to $80 (Pinecone, Weaviate, Qdrant Cloud) | $0 |
| Embedding API (initial + updates) | $10 to $40 | $0 |
| Re-indexing on document change | Required, batch job | None, instant |
| Setup time | Days (chunking, retrieval, evaluation) | Hours (write a small CLI wrapper) |
| Answer accuracy on small corpora | Variable, sensitive to chunk boundaries | High, whole pages preserved |
Per-query token savings of roughly two-thirds are real, but they are not the headline. The headline is everything in the second column that disappears. No vector database line item on the monthly invoice. No embedding model to maintain. No "we changed our chunking strategy and the answers shifted" debugging session. For a knowledge base that fits comfortably in a single person's head, every one of those moving parts is overhead without a corresponding benefit.
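The per-query figure in the table is easy to sanity-check with the headline numbers:

```shell
# Back-of-envelope check using the table's per-query token figures.
rag_tokens=3000      # ~5 to 10 chunks injected by a default RAG setup
index_tokens=1000    # one index read plus one or two whole pages
savings=$(( (rag_tokens - index_tokens) * 100 / rag_tokens ))
echo "${savings}% fewer vault tokens per query"   # prints "66% fewer vault tokens per query"
```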
What You Stop Having to Think About
The cleanest way to argue for this pattern is to list the decisions that disappear:
- Chunking strategy. No "should we chunk by paragraph, by sentence, by token count?" debate. The page is the unit.
- Embedding model selection. No research project comparing text-embedding-3-small against fine-tuned alternatives.
- Vector database operations. No managed service to monitor, upgrade, or budget for.
- Re-indexing pipelines. No nightly batch job, no "the index is stale" Slack messages.
- Retrieval evaluation harness. No precision-and-recall test suite running alongside the knowledge base.
- Hybrid search tuning. No BM25-plus-vector-plus-rerank pipeline to keep in balance.
That is roughly the entire RAG operations playbook, removed. What replaces it is one shell script and the discipline to keep an index file accurate. The discipline is real, but it is the same discipline that makes a wiki valuable to humans in the first place.
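That discipline is also easy to support mechanically. A small check like the one below, run on a schedule or in CI, flags pages the index has missed. It assumes the vault is a local folder of `.md` files with `index.md` at the root, and the function name is ours, not part of any tool:

```shell
# Hypothetical maintenance check: list vault pages missing from index.md.
# Assumes wikilink-style index entries, i.e. paths without the .md extension.
check_index() {
  local vault="${1:-./wiki}"
  ( cd "$vault" || return 1
    find . -name '*.md' ! -name 'index.md' | sed 's|^\./||; s|\.md$||' |
    while read -r page; do
      # a page counts as indexed if its path appears anywhere in index.md
      grep -qF "$page" index.md || echo "not indexed: $page"
    done )
}
```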
When RAG Is Still the Right Call
This pattern has clear limits. A maintained index breaks down somewhere around a thousand documents, when a human can no longer hold the structure in their head and the index file becomes too long for the agent to scan efficiently on every query. Past that scale, embeddings and a real retrieval layer earn their keep.
Other cases where RAG remains the right tool:
- Multimodal corpora. PDFs with tables, scanned documents, audio transcripts, image-heavy reports. A markdown vault assumes everything reduces to text.
- High-frequency updates at scale. If you are indexing thousands of public documents that change every minute and need them queryable immediately.
- Strict metadata filtering at retrieval time. When queries need structured filters (date ranges, author, document type) baked into the retrieval step, a vector database with metadata is the cleaner fit.
- Untrusted or adversarial content. When the corpus comes from many writers with conflicting agendas and no single human can be trusted to maintain a curated index.
For most internal business knowledge bases (company wikis, product documentation, sales playbooks, onboarding guides, internal standard operating procedures) none of those conditions hold. The corpus is small, the writers are few, the structure is stable, and the people who maintain the documentation are the people who care most about it being correct. RAG is the wrong default.
What This Means for Most Businesses
If you are a small or mid-sized business looking at your existing documentation and wondering how to make it queryable by AI, the honest answer is usually that you do not need a vector database. You need an index file, a short script that reads and writes your documents, and an LLM with tool access. The components are all off-the-shelf. The work is in keeping the index honest.
The companies selling RAG-as-a-service are not wrong about the technology. They are wrong about the default. RAG is the right tool for problems at a scale most businesses do not have, on content types most businesses do not store. Reaching for it first is how internal AI projects end up with a six-month roadmap and a recurring infrastructure bill before they answer their first real question.
At webvise, we build internal AI tools on this kind of pragmatic foundation: structured knowledge, simple tools, agents that read and write directly. If you are looking at a RAG project for your team's documentation and want a second opinion on whether the complexity is justified, get in touch and we can talk through your actual corpus before you commit to the infrastructure.