webvise
· 7 min read

Vibe Coding Is a Trap - Why AI-Built Software Still Needs Engineers

Andrej Karpathy coined "vibe coding" in February 2025. Since then, a wave of AI-generated apps has shipped that work in demos and break in production. The problem is not the AI tools - it is using them without engineering discipline.


Andrej Karpathy coined the term vibe coding in February 2025 to describe a mode of software development where you describe what you want, accept whatever the AI outputs, and do not read the code. His framing was generous - a weekend hobby mode for personal projects. What followed was not. By mid-2025, a wave of non-engineers had shipped production SaaS apps built entirely in Cursor, Replit Agent, v0, and bolt.new, without ever understanding what they had built. The apps looked good in demos. They are breaking in production right now.

What Vibe Coding Actually Is

Karpathy's original description is precise: you are "in the zone," you tell the AI what you want, it outputs code, you mostly hit accept, and you don't fully understand what is running. He acknowledged this explicitly - "I don't read the code, I just vibe with it." For a personal tool or throwaway prototype, this is fine. The vibe coder is not pretending to be an engineer. The problem is that the tooling ecosystem - Replit Agent's "ship your startup in a weekend," v0's one-click deploys, bolt.new's instant full-stack generation - has packaged this mode as a legitimate path to production software.

It is not. And the resulting technical debt is qualitatively different from ordinary bad code.

Why Vibe Code Is Worse Than Bad Hand-Written Code

When a junior developer writes bad code, they understand what they intended. You can sit with them, trace the logic, and fix it. When an AI generates bad code that the operator never read, there is no mental model to recover. The developer cannot explain why the authentication is structured the way it is, because they never read the authentication. They cannot tell you which third-party library handles payments, because they accepted the file without opening it. The code is a black box they own but cannot reason about.

The failure patterns we see consistently across AI-generated production apps:

  • Auth bypasses baked into the scaffold. AI-generated auth code frequently copies patterns from training data without understanding the security model. Row-level security disabled "temporarily" during development, left in production. JWT secrets hardcoded in environment variable examples committed to public repos. Role checks that compare string literals and break the moment a field is renamed.
  • No error handling past the happy path. The AI wrote the success case. What happens when the payment provider returns a 402? What happens when the database connection drops mid-transaction? In vibe-coded apps, the answer is usually an unhandled promise rejection that surfaces as a blank screen.
  • Vendor lock-in to AI-generated patterns. When the AI chose to structure the data model a particular way, the vibe coder accepted it. Now the entire app is built around that structure. Migrating away from it requires understanding code the developer never read.
  • No tests. Not because tests are hard, but because the vibe coder never asked for them and the AI did not volunteer them. When something breaks in production, there is no test suite to catch regressions in the fix.
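The string-literal role check from the first bullet is easy to sketch. This is an illustrative TypeScript example, not code from any specific app; the function and `Roles` names are hypothetical:

```typescript
// Hypothetical vibe-coded pattern: the role is compared as a bare string.
// Renaming the value to "administrator" in the database silently revokes
// every admin, and nothing at compile time notices.
function canDelete(user: { role: string }): boolean {
  return user.role === "admin";
}

// A sturdier pattern: centralize the role vocabulary so a rename is one
// edit, and the type system flags any comparison against a stale literal.
const Roles = { Admin: "admin", Member: "member" } as const;
type Role = (typeof Roles)[keyof typeof Roles];

function canDeleteSafe(user: { role: Role }): boolean {
  return user.role === Roles.Admin;
}
```

The second version is not more code than the first; it is the same check with the magic string pulled into one place, which is exactly the kind of judgment call an operator who never reads the file cannot make.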

The Demo-to-Production Gap

AI tools are genuinely good at generating code that works against a happy path with clean inputs, a cooperative network, and a single concurrent user. That is exactly the condition under which a demo runs. Production is the opposite: malformed inputs, dropped connections, concurrent writes, edge cases that were never specified in the prompt.

The pattern plays out predictably. A vibe-coded app launches, looks polished, gets early users. Then: a user with a non-ASCII character in their name breaks the database query. A mobile user on a slow connection triggers a race condition in the state management. A competitor signs up and discovers the API endpoints return other users' data because the authorization check was on the frontend, not the server. None of these are exotic failures. They are the basic consequence of shipping code you never read.
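The frontend-only authorization failure is worth spelling out, because it is the most common of the three. A minimal sketch, with hypothetical names and an in-memory list standing in for the database:

```typescript
// Illustrative data model; in a real app these rows live in a database.
type Session = { userId: string };
type Invoice = { id: string; ownerId: string; total: number };

const invoices: Invoice[] = [
  { id: "inv_1", ownerId: "alice", total: 120 },
  { id: "inv_2", ownerId: "bob", total: 75 },
];

// Vibe-coded version: the server filters by whatever ID the client sends.
// The frontend only ever sends the logged-in user's ID, so the demo looks
// correct - until a competitor changes the parameter in the request.
function getInvoicesUnsafe(requestedOwnerId: string): Invoice[] {
  return invoices.filter((i) => i.ownerId === requestedOwnerId);
}

// Engineered version: the filter comes from the server-side session, which
// the client cannot forge. A user can only ever see their own rows.
function getInvoices(session: Session): Invoice[] {
  return invoices.filter((i) => i.ownerId === session.userId);
}
```

Both functions return identical results in the demo. Only one of them survives a curious user with browser dev tools open.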

AI Makes Good Engineers Better - It Does Not Make Bad Engineers Unnecessary

This is the claim the vibe coding narrative inverts. The tools are real and the productivity gains are real. At webvise, we use Claude Code, Cursor, and multi-agent orchestration on every project we deliver. A senior engineer with Claude Code ships in hours what previously took days. The same tools in the hands of someone without engineering fundamentals produce a demo that cannot survive its first real user.

The difference is not the tool - it is what the engineer brings to the tool. Engineering fundamentals are not about writing code by hand. They are about understanding system boundaries, failure modes, security models, and data integrity. An engineer using Claude Code reads the generated authentication code and recognizes when it is wrong. A vibe coder hits accept and ships it.

| Capability | Engineer + AI tools | Vibe coder + AI tools |
| --- | --- | --- |
| Prototype speed | Fast | Fast |
| Reads generated code | Yes - spots errors, security issues | No - accepts and ships |
| Handles edge cases | Proactively specifies them in prompts | Discovers them in production |
| Security review | Built into the review loop | Not present |
| Can debug production failures | Yes - understands the codebase | No - black box they own |
| Scales beyond demo | Yes | Rarely |

The Specific Risk for Business Software

Consumer hobby apps can absorb vibe coding's failure modes. If a personal finance tracker loses some data, it is annoying. If a B2B SaaS handling customer records, payment flows, or internal workflows ships with the auth and error handling problems described above, the consequences are legal, contractual, and reputational. GDPR does not exempt you because you did not read your own data-handling code.

The 2025-2026 wave of AI-generated SaaS apps has produced a predictable class of startup: impressive in a demo, acquired early customers on the promise, then hit a wall when the first enterprise prospect ran a security review or the first high-traffic day exposed the missing error handling. The founders are not fraudsters - they genuinely did not know what they had built.

What to Look for in an AI-Augmented Development Partner

If you are evaluating a development partner that claims to use AI tools, ask these questions:

  • Do they run automated tests on AI-generated code? If the answer is "we trust the AI output," walk away. Test coverage is how you catch the error handling the AI omitted.
  • Do they perform security reviews on generated authentication and authorization code? AI tools copy auth patterns from training data. Those patterns include real vulnerabilities from real codebases.
  • Can they explain the architecture of what they built? If a developer cannot walk you through the data model and explain why it is structured the way it is, they did not architect it - they accepted it.
  • Do they version-control their prompts alongside the code? Engineering discipline applied to AI tools means treating the prompt as part of the codebase, not a throwaway input.
  • Do they have a process for handling AI hallucinations? AI tools confidently generate incorrect API calls, deprecated methods, and nonexistent library functions. An experienced team has a review loop for this. A vibe coder discovers it at runtime.
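What "automated tests on AI-generated code" looks like in practice: pin the edge case before shipping the fix. Here is a hedged sketch for the non-ASCII-name failure from earlier - the `slugify` function is illustrative, not from any real codebase, but the shape of the test is the point:

```typescript
// Hypothetical fix for the non-ASCII-name failure: normalize accented
// characters to their base letters instead of stripping everything
// non-ASCII, so "Zoë" does not collapse into an empty or colliding slug.
function slugify(name: string): string {
  return name
    .normalize("NFKD")                 // split "ë" into "e" + combining mark
    .replace(/[\u0300-\u036f]/g, "")   // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")       // everything else becomes a hyphen
    .replace(/^-|-$/g, "");            // trim leading/trailing hyphens
}
```

A one-line regression test asserting `slugify("Zoë")` stays stable is cheap to write and is the difference between fixing this bug once and fixing it every time the AI regenerates the file.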

The Right Frame: AI as Force Multiplier, Not Replacement

The vibe coding narrative is seductive because it is partially true. AI tools have genuinely lowered the floor for building software. A motivated non-engineer can ship a working prototype in a weekend. That is valuable for validation, for MVPs, for internal tooling with low stakes. The mistake is treating the floor as the ceiling - assuming that because you can get something running, you can get it running reliably at scale, securely, and maintainably.

The engineers who have benefited most from AI coding tools are the ones who use them to eliminate the tedious parts of engineering - boilerplate, scaffolding, repetitive refactors - while applying their judgment to the parts that matter: architecture, security, error handling, and production readiness. The AI accelerates the work. The engineer ensures it is correct.

At webvise, we use AI-augmented development on every project - Claude Code, Cursor, multi-agent pipelines - but with the engineering discipline that makes the output production-ready. If you are building software that needs to survive real users, real edge cases, and real security requirements, let's talk about how we work.

Webvise practices are aligned with ISO 27001 and ISO 42001 standards.