Vibe Coding Gets You to a Working Demo, But Engineering Is What Makes It Hold Up

Jared Lander
May 19

AI coding agents are becoming useful interns, but teams still need judgment, taste, infrastructure and review before the code is something they can actually run.

A working app is not the same thing as a production system. Everyone knows that, at least until the app looks good enough to make them forget.

You describe what you want. The agent builds the scaffolding, wires up a database, creates an interface that is good enough for the demo, adds a few buttons and writes some API calls. Suddenly there is something on the screen that feels like software. Sometimes it even looks pretty good.

That is useful. It is also usually when the harder questions start.

Andrej Karpathy recently described today’s AI coding agents as “intern entities,” with humans still in charge of the aesthetics, judgment, taste and oversight. I think that is a good way to put it. They are fast, helpful and capable of getting a surprising amount done. They also need direction, review and the occasional “why did you do it that way?” conversation.

We have seen this pattern in practice. An agent can generate a working application that looks impressive in a demo. Then you look closer and the authentication is barely there, the permissions model is basically vibes, logging is thin, deployment is unclear and nobody knows who owns the prototype after the meeting.

Sometimes the app has a login screen, but the permission check only runs in the frontend instead of being enforced by the backend API. The page looks protected; the endpoint is not. That kind of problem does not always show up in a five-minute walkthrough, but it matters pretty quickly once real data is involved.
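To make the frontend/backend distinction concrete, here is a minimal, framework-free Python sketch of a permission check enforced on the server side. The `User` type, the `"finance"` role and the `get_salary_report` handler are hypothetical stand-ins, not any particular framework's API. The point is only that the check lives in the code that returns the data, so a user who skips the page and calls the endpoint directly still hits it.

```python
from dataclasses import dataclass, field


class PermissionDenied(Exception):
    """Raised when a caller lacks the role required for an endpoint."""


@dataclass
class User:
    # Hypothetical user model: a name plus a set of role strings.
    name: str
    roles: set = field(default_factory=set)


def require_role(user: User, role: str) -> None:
    # Runs on the server, so it applies to every caller,
    # including someone who calls the API without ever loading the page.
    if role not in user.roles:
        raise PermissionDenied(f"{user.name} lacks role {role!r}")


def get_salary_report(user: User) -> list:
    # Hypothetical backend handler: enforce permissions before touching data.
    require_role(user, "finance")
    return [{"employee": "A. Example", "salary": 100_000}]
```

The frontend can still hide the report link from users without the role, which helps usability, but that becomes a convenience layered on top of the server-side check rather than the check itself.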

I do not think that makes the tool bad. It means we should treat it like a very fast intern, not an experienced engineering team.

A rough draft is still a useful starting point

One of the better ways to think about generative AI is that it gets people from the blank page to a rough draft. That applies to writing, presentations and now software. The rough draft may be wrong in places; it may be awkward; it may need a lot of review. Still, having something concrete to react to is much better than spending three weeks in meetings describing the idea everyone thinks they agree on.

That matters in a few different roles. A product manager can prototype a workflow. An analyst can automate a reporting task that has been annoying the team for months. A founder can test an idea before hiring a full engineering team. And an engineer can generate boilerplate, explore an unfamiliar library or get a first pass at a feature without spending the morning on setup.

This is the part of vibe coding I find genuinely useful. It gives more people a way to express intent through software. You do not need to start with a blank editor and years of syntax memorized just to see whether an idea has shape.

But enterprises do not run on rough drafts for very long.

The second week is where this gets interesting

The first week of AI-assisted development is often fun. The second week is usually when the prototype starts touching the rest of the organization. Someone wants to connect real data. Someone asks whether the app can be shared with another team. Someone wants it deployed somewhere other than the laptop where it was built.

That is when the questions get more specific: who can actually access the application? Where do the data live? What happens when the input is wrong? Who reviews the generated code? Who owns it when it breaks?

This is where vibe coding starts to run out of road. The prototype can still be useful, but the standard changes once customers, compliance or internal operations are involved. A tool that was fine for a demo may need audit trails, CI/CD, observability, error handling, documentation, data governance and a maintenance plan.

It may also need to fit into the systems a company already has. That part is easy to underestimate. The app needs to respect API boundaries, data contracts, security reviews and the way teams actually deploy software. It has to survive changes six months from now, when the person who prompted the first version has moved on to something else.

None of that is glamorous. It is the stuff people only notice when it is missing.

The agent is not the whole process

I think the useful distinction is between vibe coding and agentic engineering.

Vibe coding is useful because it gets you to a first version quickly. You can describe the workflow, generate the app, adjust the interface and keep moving. Tools like Claude Code, Cursor and others are very good at this kind of loop. The problem starts when we confuse that loop with the whole engineering process.

The agent can write a lot of code. What matters after that is the review, the tests, the deployment path and the people who know what they are willing to own. A generated answer can be close and still not be something you want running against real data.

The practical pieces are not fancy: repo structure, coding standards, test generation, deployment scaffolding, internal agent guidance, permission patterns, secure data access and monitoring. They also include documentation that an agent can read as well as a person can.

And then there is taste, which sounds soft but is not. Taste is knowing when an abstraction will survive. It is knowing when code is too clever, when a shortcut is about to become technical debt and when the happy path is hiding the real failure mode. The better the coding agent gets, the more this matters.

Cheap code makes review matter more

As code generation gets cheaper, judgment becomes more valuable. Experienced engineers matter because they know where systems fail. They know how data leaks happen, why permissions models are hard, why logs are useless unless they are designed around real incidents and why “it works locally” is not a deployment strategy.

They also know how to review AI-generated work, which is becoming its own skill. My friend Wes McKinney has been building in this direction with Roborev, which runs background reviews on agent-generated commits while the context is still fresh. I like that framing because the review has to stay close to the work. With agent-generated code, you are often checking assumptions. Did it create a hidden dependency? Did it duplicate logic that already exists somewhere else? Did it hardcode something that should be configuration? Did it quietly change a data contract? Did it handle the easy case and ignore the edge cases?
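One of those review questions, whether the agent hardcoded something that should be configuration, is easy to show in code. In this minimal Python sketch, the database URL is required from the environment and the row limit has an explicit default, instead of both being baked into the function. The variable names `REPORT_DB_URL` and `REPORT_MAX_ROWS` and the `Settings` class are hypothetical; a real project might use a settings library or a secrets manager instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_url: str
    max_rows: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    # Read from the process environment by default; tests can pass a dict.
    env = os.environ if env is None else env
    db_url = env.get("REPORT_DB_URL")
    if not db_url:
        # Fail loudly instead of silently falling back to a laptop database.
        raise RuntimeError("REPORT_DB_URL is not set")
    # Explicit, documented default for a non-sensitive setting.
    max_rows = int(env.get("REPORT_MAX_ROWS", "1000"))
    return Settings(db_url=db_url, max_rows=max_rows)
```

Accepting an optional mapping keeps the function easy to test without mutating the real environment.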

We have seen agents solve the same problem twice in two different parts of a codebase. Both versions work, at least for a while. Then a field name changes, one path gets updated, the other does not and suddenly the team is debugging what looks like a data issue. It is not the data. It is two slightly different versions of the same business logic that drifted apart.
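A minimal Python sketch of the usual fix, with made-up names: instead of a dashboard and an export each computing “active customer” on their own, both call one shared function, so a renamed field only has to be handled in one place. The `is_active_customer` function, the `last_purchase_date` field and the 90-day window are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical business rule: "active" means a purchase in the last 90 days.
ACTIVE_WINDOW = timedelta(days=90)


def is_active_customer(customer: dict, today: date) -> bool:
    # Single source of truth for the rule. Indexing with [] rather than .get()
    # means a renamed field raises KeyError instead of silently returning False.
    last = customer["last_purchase_date"]
    return last is not None and today - last <= ACTIVE_WINDOW


def dashboard_active_count(customers: list, today: date) -> int:
    # The dashboard reuses the shared rule instead of reimplementing it.
    return sum(is_active_customer(c, today) for c in customers)


def export_active_ids(customers: list, today: date) -> list:
    # The export uses the same rule, so the two outputs cannot drift apart.
    return [c["id"] for c in customers if is_active_customer(c, today)]
```

A cheap guard against reintroducing the drift is a test asserting that the dashboard count equals the number of exported IDs for the same input.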

That is the kind of problem a demo will not catch. The app still loads. The happy path still works. But now the team has to figure out which version is right, where else it was copied and whether the same pattern is hiding in three other places.

These are normal software problems. AI just gets us to them faster and sometimes in larger batches.

And this is not only an engineering point. Product leaders, designers, analysts, operators and domain experts also gain leverage when they know what good looks like. AI makes it easier to create a first version, but someone still has to define the optimal path, evaluate the output and understand the consequences of a bad answer.

A lot of the value is in the scaffolding

At Lander Analytics, this is where we see a lot of the useful work. Many organizations are already experimenting with AI coding tools. The more durable value comes from helping them build the scaffolding around those tools, so the work is easier to review, deploy and maintain.

That can mean internal agent guidelines, so the tool has some understanding of the organization’s architecture. It can also mean repo-level context files, reusable templates, testing patterns or deployment workflows. Sometimes the work is more basic: deciding which projects are safe to prototype quickly and which ones need a much tighter review loop.

In one project, the right answer may be to let people move quickly. In another, it may be to slow the agent down and put more review around the work. Both can be true in the same company.

A small internal dashboard used by three people does not need the same process as a customer-facing application touching sensitive data. A throwaway prototype does not need the same governance as a system that will be maintained for years. The point is not to make every AI-assisted project heavy. The point is to know which ones cannot afford to be light.

Vibe coding gives more people a way to turn ideas into working software. That is real progress. I do not want to undersell that. But once the idea leaves the demo, it has to deal with infrastructure, security reviews, real users and the ordinary messiness of production.

The demo gets you to the conversation. After that, the system has to hold up. At Lander Analytics, we help teams move quickly with AI without skipping the engineering work that makes software reliable.

Jared P. Lander

Founder and Chief Data Scientist

Lander Analytics

Subscribe to our Substack and our monthly emails for practical AI strategies for your organization: what to build, what to avoid, and how to make systems reliable in the real world.

Work with us: If you want help identifying the right first workflow, building a permissioned knowledge base, or training your team to ship responsibly, reach out at info@landeranalytics.com.

About the author: Jared P. Lander is Chief Data Scientist and founder of Lander Analytics, where he helps organizations build practical, measurable AI workflows grounded in strong data foundations.