Most Business Knowledge Bases Do Not Need RAG
We run our internal wiki on five shell commands and a hand-maintained index file. No vector database. For a 200-document knowledge base, that setup is cheaper, faster to build, and more accurate than a RAG pipeline. Here is why we skipped RAG, and when you actually need it.
We run our internal company wiki on five shell commands and a hand-maintained index file. No embeddings, no vector database, no re-indexing job. For a knowledge base of around 200 documents, that setup is faster to build, cheaper to run, and more accurate than a typical RAG pipeline. The trade-off only becomes worth it somewhere past a thousand documents, not before.
A Brief Note on Karpathy and Obsidian
Two things made this approach obvious to us. The first is Andrej Karpathy's consistent argument that AI agents should be given tools rather than fed retrieved chunks. His AutoResearch project, released in March 2026, makes the case concretely: the agent runs code rather than querying embeddings, and progress comes from execution. We covered AutoResearch in detail in an earlier post.
The second is Obsidian. An Obsidian vault is just a folder of plain markdown files. There is no proprietary database, no schema to migrate, no SDK to learn. Combined with Obsidian's local REST API plugin, the entire knowledge base sits behind a normal HTTP endpoint that any process can read or write. That combination makes the "give the agent tools" pattern trivial to implement: a handful of shell commands and you have an LLM that can read, write, and search a structured knowledge base directly.
What We Actually Run
Our internal wiki today is 22 pages of structured knowledge: entities (people, companies, projects), concepts (frameworks and principles), sources (raw research notes), and synthesis pages that connect them. Every page links to other pages with Obsidian wikilinks, and a hand-maintained `index.md` at the root lists everything by category with one-line descriptions.
The agent does not search the vault with embeddings. It runs five commands:
- `wiki read <path>`. Fetch a single markdown page.
- `wiki write <path> -`. Create or overwrite a page from stdin.
- `wiki append <path> <text>`. Add a line to a page (used for the running activity log).
- `wiki search <query>`. Hit Obsidian's built-in full-text search.
- `wiki list <dir>`. List files in a directory.
The whole implementation is roughly 80 lines of bash and curl. There is no vector database, no embedding model, no chunking strategy, no reranker, no nightly index job. Adding a new note means writing the file. The agent picks it up on the next query without any pipeline step in between.
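A sketch of what such a wrapper can look like, assuming the Obsidian Local REST API plugin's conventional setup (HTTPS on port 27124, bearer-token auth, `/vault/` and `/search/simple/` endpoints). The exact paths and headers can differ by plugin version, so treat this as illustrative rather than a drop-in implementation:

```shell
#!/usr/bin/env bash
# Sketch of the five-command wrapper. Endpoint paths, port, and headers
# are assumptions about the Obsidian Local REST API plugin; verify them
# against the plugin's own documentation before relying on this.
VAULT_URL="${VAULT_URL:-https://127.0.0.1:27124}"
AUTH=(-H "Authorization: Bearer ${WIKI_TOKEN:-changeme}")

wiki() {
  local cmd="$1"; shift
  case "$cmd" in
    read)   # fetch a single markdown page
            curl -sk "${AUTH[@]}" "$VAULT_URL/vault/$1" ;;
    write)  # create or overwrite a page from stdin
            curl -sk "${AUTH[@]}" -X PUT -H "Content-Type: text/markdown" \
                 --data-binary @- "$VAULT_URL/vault/$1" ;;
    append) # add a line to a page (the running activity log)
            curl -sk "${AUTH[@]}" -X POST -H "Content-Type: text/markdown" \
                 --data-binary "$2" "$VAULT_URL/vault/$1" ;;
    search) # full-text search via the plugin's simple-search endpoint
            curl -sk "${AUTH[@]}" -X POST -G --data-urlencode "query=$1" \
                 "$VAULT_URL/search/simple/" ;;
    list)   # list files in a vault directory
            curl -sk "${AUTH[@]}" "$VAULT_URL/vault/$1/" ;;
    *)      echo "usage: wiki {read|write|append|search|list} ..." >&2
            return 1 ;;
  esac
}
```

A real script grows a little from here (error handling, TLS certificate pinning), but the shape stays the same: five cases, five curl calls, and swapping the backend means changing five lines.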
The Index File Is the Retrieval System
This is the part that took us a while to admit. When the agent gets a question, it does not start by retrieving anything. It starts by reading `wiki/index.md`, a curated table of contents written by a human (or by a maintainer agent run on a schedule). The index lists every page with a one-sentence description, grouped by category. From that single ~400-token read, the agent already knows which one or two pages are relevant.
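To make that concrete, here is what an excerpt of such an index might look like. The page names and descriptions are invented for illustration, not taken from our vault:

```markdown
## Entities
- [[entities/acme-corp]] - Customer since 2023; renewal owned by sales.
- [[entities/jane-doe]] - Founder contact at Acme; prefers email.

## Concepts
- [[concepts/pricing-model]] - How tiers map to seat counts.

## Sources
- [[sources/2024-q3-churn-notes]] - Raw notes from churn interviews.
```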
The next step is one or two targeted reads to pull the relevant pages in full. Each page is between 200 and 800 tokens. Most queries finish with two or three reads and roughly a thousand tokens of vault content in the context window. That is less than what a default RAG configuration injects, and the content is coherent (whole pages) rather than chopped (chunks ripped out of their surrounding context).
A maintained index does the work that a vector database does in a RAG pipeline: it maps a query to the right documents. The difference is that a human curated the mapping once, instead of an embedding model approximating it on every query.
The Token and Cost Comparison
For a small business knowledge base of around 200 documents, here is what a default RAG setup costs versus the index-led file access pattern. The token figures are based on what we observe in our own vault. The infrastructure figures are pulled from public pricing for the most common managed services.
| Item | RAG Pipeline | Index + Tools |
|---|---|---|
| Tokens injected per query | ~3,000 (5 to 10 chunks) | ~1,000 (1 index read + 1 to 2 pages) |
| Vector database (monthly) | $25 to $80 (Pinecone, Weaviate, Qdrant Cloud) | $0 |
| Embedding API (initial + updates) | $10 to $40 | $0 |
| Re-indexing on document change | Required, batch job | None, instant |
| Setup time | Days (chunking, retrieval, evaluation) | Hours (write a small CLI wrapper) |
| Answer accuracy on small corpora | Variable, sensitive to chunk boundaries | High, whole pages preserved |
Per-query token savings of roughly two-thirds are real, but they are not the headline. The headline is everything in the second column that disappears. No vector database line item on the monthly invoice. No embedding model to maintain. No "we changed our chunking strategy and the answers shifted" debugging session. For a knowledge base that fits comfortably in a single person's head, every one of those moving parts is overhead without a corresponding benefit.
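The per-query figure in the table is easy to sanity-check with the headline numbers:

```shell
# Back-of-envelope check using the table's per-query token figures.
rag_tokens=3000      # ~5 to 10 chunks injected by a default RAG setup
index_tokens=1000    # one index read plus one or two whole pages
savings=$(( (rag_tokens - index_tokens) * 100 / rag_tokens ))
echo "${savings}% fewer vault tokens per query"   # prints "66% fewer vault tokens per query"
```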
What You Stop Having to Think About
The cleanest way to argue for this pattern is to list the decisions that disappear:
- Chunking strategy. No "should we chunk by paragraph, by sentence, by token count?" debate. The page is the unit.
- Embedding model selection. No research project comparing text-embedding-3-small against fine-tuned alternatives.
- Vector database operations. No managed service to monitor, upgrade, or budget for.
- Re-indexing pipelines. No nightly batch job, no "the index is stale" Slack messages.
- Retrieval evaluation harness. No precision-and-recall test suite running alongside the knowledge base.
- Hybrid search tuning. No BM25-plus-vector-plus-rerank pipeline to keep in balance.
That is roughly the entire RAG operations playbook, removed. What replaces it is one shell script and the discipline to keep an index file accurate. The discipline is real, but it is the same discipline that makes a wiki valuable to humans in the first place.
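That discipline is also easy to support mechanically. A small check like the one below, run on a schedule or in CI, flags pages the index has missed. It assumes the vault is a local folder of `.md` files with `index.md` at the root, and the function name is ours, not part of any tool:

```shell
# Hypothetical maintenance check: list vault pages missing from index.md.
# Assumes wikilink-style index entries, i.e. paths without the .md extension.
check_index() {
  local vault="${1:-./wiki}"
  ( cd "$vault" || return 1
    find . -name '*.md' ! -name 'index.md' | sed 's|^\./||; s|\.md$||' |
    while read -r page; do
      # a page counts as indexed if its path appears anywhere in index.md
      grep -qF "$page" index.md || echo "not indexed: $page"
    done )
}
```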
When RAG Is Still the Right Call
This pattern has clear limits. A maintained index breaks down somewhere around a thousand documents, when a human can no longer hold the structure in their head and the index file becomes too long for the agent to scan efficiently on every query. Past that scale, embeddings and a real retrieval layer earn their keep.
Other cases where RAG remains the right tool:
- Multimodal corpora. PDFs with tables, scanned documents, audio transcripts, image-heavy reports. A markdown vault assumes everything reduces to text.
- High-frequency updates at scale. If you are indexing thousands of public documents that change every minute and need them queryable immediately.
- Strict metadata filtering at retrieval time. When queries need structured filters (date ranges, author, document type) baked into the retrieval step, a vector database with metadata is the cleaner fit.
- Untrusted or adversarial content. When the corpus comes from many writers with conflicting agendas and no single human can be trusted to maintain a curated index.
For most internal business knowledge bases (company wikis, product documentation, sales playbooks, onboarding guides, internal standard operating procedures) none of those conditions hold. The corpus is small, the writers are few, the structure is stable, and the people who maintain the documentation are the people who care most about it being correct. RAG is the wrong default.
What This Means for Most Businesses
If you are a small or mid-sized business looking at your existing documentation and wondering how to make it queryable by AI, the honest answer is usually that you do not need a vector database. You need an index file, a short script that reads and writes your documents, and an LLM with tool access. The components are all off-the-shelf. The work is in keeping the index honest.
The companies selling RAG-as-a-service are not wrong about the technology. They are wrong about the default. RAG is the right tool for problems at a scale most businesses do not have, on content types most businesses do not store. Reaching for it first is how internal AI projects end up with a six-month roadmap and a recurring infrastructure bill before they answer their first real question.
At webvise, we build internal AI tools on this kind of pragmatic foundation: structured knowledge, simple tools, agents that read and write directly. If you are looking at a RAG project for your team's documentation and want a second opinion on whether the complexity is justified, get in touch and we can talk through your actual corpus before you commit to the infrastructure.