Field study

What actually breaks in vibe-coded apps.

49 public threads where founders describe the moment their AI-built app broke. Lovable, Bolt, v0, Replit Agent, Cursor. Every thread checked against the archives, classified, and published so you can read them yourself. The tally is not what you would expect.

Threads: 49 verified, all linked
Where: Reddit, HN, official forums
Method: Archive-API verified, Jul 2026
Output: An 8-point pre-launch check
Published: Jul 4, 2026

The short version

The app falling over is not the main event

Ask a founder what they fear about shipping an AI-built app and they say scale. The database tipping over when real users arrive. That failure is real. It is also the smallest category in the data: of 49 verified threads about what actually went wrong, only 4 are mainly about performance.

What actually gets founders is money. The biggest cluster, 12 threads, is about credit burn and lock-in: fix loops that bill per attempt, and platforms that get expensive exactly when leaving gets hard. Security comes second, with 10 threads. Close behind, with 8 each: the AI breaking its own code and failing to repair it, and agents destroying data outright. The thing you were bracing for is at the bottom of the list.

I did not collect these threads to dunk on the tools. People ship real products with Lovable and Bolt every week. And the builders in these threads are, for the most part, not incompetent. They hit the same traps in the same order, and the traps stay invisible until real users arrive. A failure that repeats that predictably is one you can check for in advance. That is the point of this page, and of the checklist at the end.

Where this comes from: 51 threads across Reddit, Hacker News, and the official Replit and Vercel forums, each checked against the Reddit archive or the HN API so the titles, dates, and vote counts are exact. 36 are from the last twelve months. 22 are from 2026. Every quote in this piece is from the last six months; the older entries stay in the corpus because they are the incidents this space still names. Two threads are context rather than failure reports, which is why I count 49. The full table is published on this site, linked at the bottom. Any claim here can be checked against it.

The tally

49 failure reports, classified

Primary classification per thread; most threads span two or more categories. Full per-thread table with quotes, dates, and engagement in the published dataset.

12 threads. Lock-in and credit or token costs: fix loops that bill per attempt, and exits that get harder as the bill grows.

10 threads. Security: row-level security enabled but not actually scoped, secrets shipped in the client bundle, admin escalation.

8 + 8 threads. The AI breaking its own code and being unable to fix it, and agents deleting databases or weeks of work.

4 threads. Classic performance and scale collapse. The failure founders fear most is the one the data shows least.

The shape

Cost and security lead, not crashes

Across the classified threads, lock-in and credit burn (12) and security holes (10) are the most common ways vibe-coded apps break in production. Crashes barely register. Performance and scale, the failure founders fear most, sits near the bottom at 4.

Pattern one

The bill breaks before the app does

The most common story in the data is a slow spiral, and it goes like this. Something small breaks. You ask the AI to fix it. The fix breaks something else. And every attempt costs credits, because every attempt is metered. The bill grows while the app gets worse.

You can hear it in the threads. A long-time Bolt builder, sixteen days before I published this, listed where his projects stand: '"project too large" errors, AI losing context, inconsistent code, design breaks, infinite fix loops'. He now treats Bolt as scaffolding and exports early. The same month, on Vercel's own community forum, a v0 user described the loop from inside: 'It kept generating tasks, rebuilding files, and consuming credits without actually resolving the underlying problem.'

That is the first jaw of the trap. The second closes when you try to leave. The moment the spiral convinces you to migrate off, you discover what leaving costs. Threads asking how to export, hand the project to a developer, or get a build with no vendor dependency form a steady stream. The stated reason repeats: 'Your credit costs are skyrocketing, and you want to develop using cheaper or better dev tools.'

The experienced builders converge on a simple defense: a stop-loss rule. After two or three failed AI fix attempts, stop prompting. Export the code, reproduce the bug locally, and fix it by hand or with a developer. The rule works because of how the pricing is shaped. Every attempt is billed the same, but each failure makes the next attempt less likely to succeed. So per-attempt billing makes persistence most expensive exactly when the AI is least likely to succeed.
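If you drive fixes from a script, or just want the rule written down, here is a minimal sketch of the stop-loss as a loop guard. `attempt_fix` is a hypothetical stand-in for one metered AI fix attempt that returns True once the bug is gone. The flat per-attempt price is made up and is not any platform's real rate.

```python
from typing import Callable


def fix_with_stop_loss(
    attempt_fix: Callable[[int], bool],  # hypothetical: one billed AI fix attempt
    max_attempts: int = 3,               # the stop-loss threshold
    cost_per_attempt: float = 1.0,       # illustrative flat price, not a real rate
) -> tuple[bool, int, float]:
    """Run fix attempts until one succeeds or the stop-loss trips.

    Returns (fixed, attempts_used, credits_spent). When fixed is False,
    the next step is manual: export, reproduce locally, fix by hand.
    """
    spent = 0.0
    for attempt in range(1, max_attempts + 1):
        spent += cost_per_attempt  # every attempt is billed, success or not
        if attempt_fix(attempt):
            return True, attempt, spent
    return False, max_attempts, spent


# A bug the AI never fixes: the guard stops after three billed attempts.
fixed, used, spent = fix_with_stop_loss(lambda attempt: False)
print(fixed, used, spent)  # False 3 3.0
```

The point of the guard is the hard cap. Without it, the loop keeps billing for as long as you keep prompting.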

Pattern two

Security that looks finished

The security threads share one property: the app looked done. Polished screens, working flows, paying users. Then someone looked underneath. An agency owner, on a client's finished tutoring app: 'we were able to change out profile permissions into an admin in less than 30 minutes.' An auditor, on a financial SaaS: 'The Supabase service role key was loaded in the public JavaScript bundle.' A systematic scan: 'We scanned 20,000 indie apps; 1 in 9 leaked their database keys.'

Most of these trace back to one mechanism. Your database has a feature that decides which rows each user is allowed to read; that feature is called row-level security. The AI switches it on, but the rules it writes check only that a user is signed in, not that the row belongs to them. So any signed-in user can read any row. And it passes the demo every time, because the demo has one user. This is the same mechanism behind the May 2025 Lovable CVE that exposed users across more than 170 applications. It keeps happening: this January, a reviewer opened a 'finished' MVP from the browser console and reported 'I could see every user's email, payment status, and home address. No auth required.'
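To make the mechanism concrete, here is a toy model in plain Python. It is not real Postgres or Supabase. Rows carry an owner, and a policy function decides per row whether the signed-in user may see it. The field names `owner_id` and `email` are illustrative. One policy only checks that the caller is signed in; the other is scoped to the row's owner.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Row:
    owner_id: str
    email: str


# A policy decides, row by row, whether this user may read it.
Policy = Callable[[str, Row], bool]


def select_visible(rows: list[Row], user_id: str, policy: Policy) -> list[Row]:
    """Return only the rows the policy lets this user read (RLS is 'on')."""
    return [row for row in rows if policy(user_id, row)]


def signed_in_only(user_id: str, row: Row) -> bool:
    """Enabled but not scoped: any authenticated caller passes."""
    return bool(user_id)


def owner_only(user_id: str, row: Row) -> bool:
    """Scoped per row: a user sees only the rows they own."""
    return row.owner_id == user_id


table = [Row("user_a", "a@example.com"), Row("user_b", "b@example.com")]

# The two-user test: sign in as A and try to read B's row.
leaked = select_visible(table, "user_a", signed_in_only)
scoped = select_visible(table, "user_a", owner_only)
print(len(leaked), len(scoped))  # 2 1
```

With a single-row, single-user table the two policies return identical results, which is why the demo never catches it. In Supabase terms, the scoped version corresponds to a policy that compares the row's owner column to `auth.uid()`.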

Here is what makes this dangerous. AI-generated code reads clean. It looks like senior output while carrying junior mistakes, and that defeats the only heuristic a non-technical founder has: does this look professional? So use a mechanical check instead. Log in as user A and try to read user B's data. Then search the JavaScript your site actually serves for keys. Both take minutes. Neither requires reading code.
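The key search can be scripted. Here is a minimal sketch, assuming you have saved the JavaScript your site serves as text. It finds JWT-shaped strings, decodes each payload, and flags any whose `role` claim is `service_role`. Supabase's legacy API keys are JWTs carrying that claim. The `anon` key is meant to be public; the service role key bypasses row-level security. The `sb_secret_` prefix check assumes Supabase's newer key format. Treat both patterns as a starting point, not a complete secret scanner.

```python
import base64
import json
import re

# JWT shape: three base64url segments. Header and payload both begin with
# "eyJ", which is how the characters '{"' encode.
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")
# Assumption: Supabase's newer secret API keys start with this prefix.
SECRET_PREFIX_RE = re.compile(r"sb_secret_[A-Za-z0-9_-]+")


def jwt_claims(token: str) -> dict:
    """Decode a JWT's payload segment. No signature check: we only want the claims."""
    segment = token.split(".")[1]
    segment += "=" * (-len(segment) % 4)  # restore the padding JWTs strip
    try:
        data = json.loads(base64.urlsafe_b64decode(segment))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return {}
    return data if isinstance(data, dict) else {}


def scan_bundle(js_text: str) -> list[str]:
    """List secrets in shipped JavaScript that should never reach a browser."""
    findings = []
    for match in JWT_RE.finditer(js_text):
        # An "anon" key is meant to be public; "service_role" is not.
        if jwt_claims(match.group()).get("role") == "service_role":
            findings.append(f"service_role JWT at offset {match.start()}")
    for match in SECRET_PREFIX_RE.finditer(js_text):
        findings.append(f"sb_secret_ key at offset {match.start()}")
    return findings


def fake_jwt(claims: dict) -> str:
    """Build an unsigned, JWT-shaped string for exercising the scanner."""
    def encode(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    header = encode({"alg": "HS256", "typ": "JWT"})
    return f"{header}.{encode(claims)}.signature"


anon_key = fake_jwt({"role": "anon"})
service_key = fake_jwt({"role": "service_role"})
bundle = 'createClient(url, "' + anon_key + '"); admin("' + service_key + '");'
print(len(scan_bundle(bundle)))  # 1
```

Run it against the bundle your live domain serves, not your local build. The shipped file is the one users can read.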

Pattern three

The agent is the outage

The sharpest failures in the dataset were not caused by load or attackers. They were caused by the builder's own agent. The named incident is from July 2025. An agent deleted a company's production database during an explicit code freeze, then misreported what it had done. It spent days on the Hacker News front page. That could have been a one-off. It was not. This February, on Replit's own forum: 'My agent deleted a customers database twice without prompting.' The same month, a Manus user watched an agent roll a site back to its beginning on its own, wiping weeks of work past the restore window.

There is a slower version of the same failure. The agent does not delete your work; it degrades it, feature by feature. One Claude Code builder, also in February: 'Claude would build perfect auth. Then while adding payments, it would rewrite the auth code and everything broke.' A Cursor builder who shipped a full production app put the discipline bluntly: 'You HAVE to scope your prompts tightly or it will rewrite your codebase to fix a typo.' Now multiply that by every feature you add after launch. That is how months of work degrade release by release, until the founder is hiring a developer to rebuild.

The two defenses are boring, and they are absolute. One: real version control, plus backups of code and database that live off the platform, taken before any agent session. Every time. Two: hard separation, meaning the agent is physically unable to reach production data. Not instructed to leave it alone. Unable. The engineers in the incident threads repeat it like a chorus: an instruction is not a boundary. A missing credential is.
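One way to make "unable" literal is to start the agent's process with an environment built from an allowlist, so production credentials are never present to misuse. Here is a minimal sketch, assuming your secrets live in environment variables. The variable names are hypothetical examples. A real setup would also keep production keys off the machine the agent runs on.

```python
import os
import subprocess

# Hypothetical names: the only variables an agent session may inherit.
AGENT_ALLOWLIST = {"PATH", "HOME", "LANG", "DEV_DATABASE_URL", "DEV_SUPABASE_ANON_KEY"}


def agent_environment(
    parent_env: dict[str, str], allowlist: set[str] = AGENT_ALLOWLIST
) -> dict[str, str]:
    """Copy only allowlisted variables; everything else, prod secrets included, is absent."""
    return {name: value for name, value in parent_env.items() if name in allowlist}


def launch_agent(command: list[str]) -> subprocess.CompletedProcess:
    """Run whatever agent CLI you use with the filtered environment."""
    return subprocess.run(command, env=agent_environment(dict(os.environ)), check=False)


parent = {
    "PATH": "/usr/bin",
    "PROD_DATABASE_URL": "postgres://prod-host/app",
    "DEV_DATABASE_URL": "postgres://localhost/app",
}
print(sorted(agent_environment(parent)))  # ['DEV_DATABASE_URL', 'PATH']
```

An allowlist is used rather than a denylist because a denylist fails open. The day someone adds a new production secret with an unexpected name, it flows straight to the agent.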

The check

Eight things to check before real users

This is what 49 threads compress into. Each item traces to specific threads in the dataset. 1: Row-level security actually scoped per row, on every table, verified by trying to read another user's data. 2: No secrets in the client bundle. Grep the deployed JavaScript, the bundle that shipped. 3: Database behavior under load: indexes, N+1 queries, rate limiting, tested at realistic row counts and concurrency. 4: Real version control and off-platform backups of code and database, before any agent touches anything.

5: Hard dev/prod separation. The agent cannot hold production credentials. 6: An exit plan from the platform, GitHub export and your own database, agreed while leaving is still cheap. 7: A stop-loss rule for AI debugging: two or three failed attempts means stop prompting and fix by hand. 8: The deploy pipeline verified end to end. Custom domain, production env vars, and proof of which commit is actually being served.

None of this needs a rewrite. None of it means abandoning the tool that got you here. It is an afternoon of verification standing between a working demo and the failure modes 49 other founders documented in public.
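Item 3 is the easiest to rehearse on a laptop. Here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for your production database. It counts queries for the N+1 pattern (one query for the list, then one per row) against a single JOIN, and it creates the index that per-user lookups need. The schema is invented for illustration.

```python
import sqlite3


def build_db(n_users: int) -> sqlite3.Connection:
    """In-memory database with users and one order each (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
    # Without this index, every per-user lookup scans the whole orders table.
    conn.execute("CREATE INDEX idx_orders_user_id ON orders(user_id)")
    conn.executemany(
        "INSERT INTO users (id, email) VALUES (?, ?)",
        [(i, f"user{i}@example.com") for i in range(n_users)],
    )
    conn.executemany(
        "INSERT INTO orders (user_id, total) VALUES (?, ?)",
        [(i, 10.0) for i in range(n_users)],
    )
    return conn


def totals_n_plus_one(conn: sqlite3.Connection) -> tuple[dict[int, float], int]:
    """One query for the users, then one more per user: N + 1 round trips."""
    queries = 1
    totals = {}
    for (user_id,) in conn.execute("SELECT id FROM users").fetchall():
        queries += 1
        row = conn.execute(
            "SELECT SUM(total) FROM orders WHERE user_id = ?", (user_id,)
        ).fetchone()
        totals[user_id] = row[0]
    return totals, queries


def totals_joined(conn: sqlite3.Connection) -> tuple[dict[int, float], int]:
    """The same answer in a single query."""
    rows = conn.execute(
        "SELECT u.id, SUM(o.total) FROM users u "
        "JOIN orders o ON o.user_id = u.id GROUP BY u.id"
    ).fetchall()
    return dict(rows), 1


conn = build_db(1000)
slow, slow_queries = totals_n_plus_one(conn)
fast, fast_queries = totals_joined(conn)
assert slow == fast
print(slow_queries, fast_queries)  # 1001 1
```

Locally, SQLite answers both versions quickly. The query count is what matters. Against a hosted database, each of those 1,001 queries is a network round trip, and the count grows with every user you add.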

1: Scope row-level security per row, on every table, verified by actually trying to read another user's data.
2: Keep secrets out of the client bundle. Grep the deployed JavaScript, the bundle that actually shipped.
3: Test database behavior under load: indexes, N+1 queries, rate limiting, at realistic row counts and concurrency.
4: Take real version control and off-platform backups of code and database, before any agent touches anything.
5: Enforce hard dev/prod separation. The agent cannot hold production credentials.
6: Agree an exit plan from the platform: GitHub export and your own database, while leaving is still cheap.
7: Set a stop-loss rule for AI debugging. Two or three failed attempts means stop prompting and fix by hand.
8: Verify the deploy pipeline end to end: custom domain, production env vars, and proof of which commit is served.

The eight checks that separate a working demo from a system that real users cannot take down. Each traces to specific threads in the dataset, and together they are an afternoon of verification, cheaper than the first incident.
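The last part of item 8, proof of which commit is actually being served, needs something the deploy itself exposes. Here is a minimal sketch, assuming your build writes the current commit SHA into a static file. The `/version.json` path and its `commit` field are hypothetical conventions you would add yourself; they are not a feature of any of these platforms.

```python
import json
import urllib.request


def parse_version_payload(body: bytes) -> str:
    """Extract the 'commit' field written at build time (hypothetical format)."""
    data = json.loads(body)
    commit = data.get("commit") if isinstance(data, dict) else None
    if not isinstance(commit, str) or not commit.strip():
        raise ValueError("version payload has no commit field")
    return commit.strip().lower()


def served_commit(base_url: str, timeout: float = 10.0) -> str:
    """Fetch the commit SHA the live site reports from the hypothetical /version.json."""
    url = f"{base_url.rstrip('/')}/version.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_version_payload(resp.read())


def commits_match(served: str, expected: str, min_prefix: int = 7) -> bool:
    """True if one SHA is a prefix of the other; short SHAs below min_prefix never match."""
    served, expected = served.strip().lower(), expected.strip().lower()
    if min(len(served), len(expected)) < min_prefix:
        return False
    return served.startswith(expected) or expected.startswith(served)


# expected would come from `git rev-parse HEAD` for the commit you meant to ship
print(commits_match("3f2a9c1", "3f2a9c1d0e4b5a6978877665544332211ffeeddc"))  # True
```

Run it against the custom domain, not the platform's preview URL. A mismatch there means the domain is serving an older build or the wrong deployment.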

What this data cannot say

Read the tally with survivorship in mind

People post when things break. The thousands of builders whose AI-built apps run quietly do not write threads about it. So this dataset measures the shape of failure: which ways it breaks, in what order. It stays silent on how often failure happens. Nothing here says vibe coding fails. It says that when it fails, it fails in these six ways, in roughly this order, and every one of them is checkable in advance. That is better news than it sounds: unpredictable failures cannot be prevented, and these can.

The bleakest entry in the dataset is a founder with 1,000 pounds of week-one revenue, 30 paying users, 16 edge functions, and no way to audit any of it. Nothing in that app is known to be broken. That is the situation this checklist is for, and if you are past it and something already broke, that gap between a working demo and a system that holds is the work I do. Describe what is happening and I will tell you where I would start. A first read never costs you anything.

The full dataset: 51 threads, classified, with quotes and dates

Sources

All 49 failure threads, their categories, exact quotes, dates, and engagement counts. The published dataset (this site). Accessed 2026-07-04.
The Bolt reliability list (June 2026): project-too-large errors, context loss, infinite fix loops. r/boltnewbuilders. Accessed 2026-07-04.
The v0 credit-consuming loop (June 2026). Vercel community. Accessed 2026-07-04.
The credit-costs-skyrocketing migration motivation (March 2026). r/lovable. Accessed 2026-07-04.
The no-auth-required console read of a finished MVP (January 2026). r/vibecoding. Accessed 2026-07-04.
The scope-your-prompts postmortem from a shipped Cursor app (March 2026). r/cursor. Accessed 2026-07-04.
The 30-minute admin escalation on a client's finished app. r/lovable. Accessed 2026-07-04.
Service-role key found in a financial SaaS's public JavaScript bundle. r/vibecoding. Accessed 2026-07-04.
The 20,000-app scan and the 1-in-9 leaked-keys figure. r/Supabase. Accessed 2026-07-04.
The CVE-2025-48757 row-level-security exposure across 170+ Lovable apps. mattpalmer.io. Accessed 2026-07-04.
The production-database deletion incident and its engineering discussion. Hacker News. Accessed 2026-07-04.
An agent deleting a paying customer's database twice. Replit forum. Accessed 2026-07-04.
The auth-rewritten-while-adding-payments failure mode. r/ClaudeAI. Accessed 2026-07-04.