When AI writes the code, who cleans up after it?

I'm about to spend the next few weeks manually going through the ProductLog codebase. Frontend, backend, workers. Not adding features. Not fixing bugs. Reading every file, finding what's wrong, fixing it by hand.

This isn't because I skipped the safety nets. I have detailed agent rules, project conventions, structure documents. I have PR-reviewer agents. I have linters. I review changes carefully. All of it helped — the codebase would have been much worse without it. But the codebase still isn't where I want it to be. The code works, but it's not optimal. Not even close.

Here's what I'm actually finding.

DRY violations everywhere

A component lives in the codebase as a real, reusable thing. It's used in a few places, correctly. Then in another file, the same UI is rewritten inline. Then in a third file, it's written inline again but slightly differently.

There's no single source of truth for half the things that should have one. Which means when I want to change how that piece behaves, I have to change it in three places. If I miss one, the inconsistency stays. The codebase and the bundle both grow for no reason. The maintenance cost is permanent.

God files

Files that should have been three or four separate concerns are living as one 400-, 600-, 800-line file. Multiple responsibilities tangled together. SOLID violations on a scale that makes future changes risky — touching one part of the file means re-reading the whole thing to make sure nothing else breaks. I don't need to explain why this is bad to anyone who has worked in a codebase like this. The cost compounds every time you open the file.

Dead code

Something was tried. It didn't work. A different approach was found. The old approach stayed in the file, sometimes commented out, sometimes just unreferenced. Dead functions, dead components, dead routes, dead imports. Each one adds noise. Each one makes the next person reading the code spend a few seconds figuring out whether it matters. Multiplied across a codebase, those seconds become hours of confusion.
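A rough way to surface candidates for this kind of dead code is to look for names that are declared once and never mentioned again anywhere. The sketch below is a hypothetical helper, not something from the ProductLog codebase: `findUnreferencedNames` and the `files` map (path to source text) are illustrative assumptions, and the regex-based matching is deliberately crude. It treats mentions inside comments or strings as references and will flag entry points that a framework calls by convention, so its output is a reading list, not a delete list.

```typescript
// Hypothetical sketch: list names that are declared but never referenced
// anywhere else in the given sources. Regex-based, so only a first filter.
function findUnreferencedNames(files: Record<string, string>): string[] {
  // Join every file into one text so references in other files count too.
  const allSource = Object.keys(files)
    .map((path) => files[path])
    .join("\n");

  // Matches `function foo`, `const foo`, `let foo`, `class Foo`.
  const declaration = /\b(?:function|const|let|class)\s+([A-Za-z_]\w*)/g;
  const candidates: string[] = [];

  let match: RegExpExecArray | null;
  while ((match = declaration.exec(allSource)) !== null) {
    const name = match[1];
    // Count whole-word occurrences of the name across all sources.
    const occurrences = allSource.match(new RegExp("\\b" + name + "\\b", "g"));
    // Exactly one occurrence means the declaration is the only mention.
    if (occurrences !== null && occurrences.length === 1) {
      candidates.push(name);
    }
  }
  return candidates;
}
```

Each name it returns still needs a human decision: delete it, or find out why it looked unused.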

Comments and debug logs everywhere

Ten lines of code with two hundred lines of comments around them. Debug logs that were added during development and never removed. console.log calls that aren't even logging anything useful anymore, just placeholders that someone meant to come back to.

Some of these comments are obvious — "// fetch the user" above a line that says fetchUser(). Some are stale, describing code that's been changed since. Some are entire paragraphs of explanation that should have been a single line, or no line at all.
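To put a number on this per file, a small counter is enough. This is a minimal sketch under stated assumptions: `measureNoise` and `NoiseReport` are hypothetical names, and the heuristic is line-based, so it only counts comments that start a line and ignores trailing comments after code. A console.log call inside a commented-out line is counted as a comment line, not as a log call.

```typescript
// Hypothetical sketch: count comment lines and console.log calls in one file.
interface NoiseReport {
  totalLines: number;
  commentLines: number;
  consoleLogCalls: number;
}

function measureNoise(source: string): NoiseReport {
  const lines = source.split("\n");
  let commentLines = 0;
  let consoleLogCalls = 0;
  let inBlockComment = false;

  for (const raw of lines) {
    const line = raw.trim();

    if (inBlockComment) {
      // Still inside a /* ... */ block opened on an earlier line.
      commentLines++;
      if (/\*\//.test(line)) inBlockComment = false;
      continue;
    }
    if (/^\/\//.test(line)) {
      // Line comment that occupies the whole line.
      commentLines++;
      continue;
    }
    if (/^\/\*/.test(line)) {
      // Block comment opening; stays open unless it also closes on this line.
      commentLines++;
      if (!/\*\//.test(line)) inBlockComment = true;
      continue;
    }

    // Code line: count every console.log( call on it.
    const calls = line.match(/\bconsole\.log\s*\(/g);
    if (calls !== null) consoleLogCalls += calls.length;
  }

  return { totalLines: lines.length, commentLines, consoleLogCalls };
}
```

Comparing `commentLines` against `totalLines` gives a quick ratio for sorting files by how much noise they carry, which is a reasonable order to read them in.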

All scopes, all severities

The violations aren't confined to one layer. Some are cosmetic: annoying but harmless. Some hurt maintainability: every change costs more than it should. Some hurt performance: duplicated logic running where one call would have done, or rendering work happening multiple times because the same component is implemented three different ways.

This is what accumulated despite all the safeguards. The rules are detailed. The reviews happened. The model still found a thousand small ways to drift from the intent.

Why review didn't catch it

I want to be honest about this part.

The reviews, both AI-driven and my own, work pull request by pull request. They catch what's wrong inside a single change. What they can't catch is the pattern that only becomes visible after a month of accumulation. A new inline component looks fine in its PR. The fifth one, four weeks later, also looks fine in its PR. Nobody is comparing it against the earlier four, because nobody is looking at the codebase as a whole during a code review.
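The missing check is one that looks across files instead of inside a diff. As a minimal sketch of that idea, the function below slides a fixed-size window over the normalized lines of every file and reports any window that shows up in more than one file. Everything here is an assumption for illustration: `findCrossFileDuplicates`, the `files` map (path to source text), and the default window of three lines are hypothetical, and exact-match windows will miss the "same thing, slightly different" copies.

```typescript
// Hypothetical sketch: find runs of identical lines that appear in 2+ files.
function normalizeLine(line: string): string {
  // Ignore indentation and collapse internal whitespace.
  return line.trim().replace(/\s+/g, " ");
}

function findCrossFileDuplicates(
  files: Record<string, string>,
  windowSize: number = 3,
): Record<string, string[]> {
  // Maps each window of lines to the list of files it was seen in.
  const seen: Record<string, string[]> = Object.create(null);

  for (const path of Object.keys(files)) {
    const lines = files[path]
      .split("\n")
      .map(normalizeLine)
      .filter((line) => line.length > 0); // skip blank lines

    for (let i = 0; i + windowSize <= lines.length; i++) {
      const key = lines.slice(i, i + windowSize).join("\n");
      const paths = seen[key] || (seen[key] = []);
      if (paths.indexOf(path) === -1) paths.push(path);
    }
  }

  // Keep only the windows that occur in more than one file.
  const duplicates: Record<string, string[]> = Object.create(null);
  for (const key of Object.keys(seen)) {
    if (seen[key].length > 1) duplicates[key] = seen[key];
  }
  return duplicates;
}
```

Run over the whole source tree rather than a single change, a check like this compares the fifth inline copy against the earlier four, which is the comparison a per-PR review never makes.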

This isn't unique to AI. The same thing happens with human developers. AI just produces more code, faster, so the accumulation rate is higher and the drift compounds sooner.

By the time the pattern problems are visible, the patterns are everywhere.

I tried letting AI clean it up

The obvious move was to have AI fix what AI wrote. I tried this. It didn't work.

What happened: the model would run for a long time, claim to have made changes, and at the end I'd find that some things were genuinely cleaned up, some things were left untouched, and some things were "cleaned" by being replaced with code that looked different but had the same problems. The work appeared to be happening. The output wasn't reliable.

I think this is structural. The model is good at generating new code that matches a pattern. It's bad at holding a large existing codebase in mind and making consistent decisions about what to keep, what to merge, what to delete. The judgment that cleanup requires is exactly the judgment models are worst at across long contexts.

So now I'm doing it by hand.

What manual cleanup actually means

The plan is three passes (frontend, backend, workers) over the next few weeks. Each pass is on the ProductLog roadmap as its own item.

For each file:

  1. Read it top to bottom

  2. Note the violations (duplicates, god-file structure, dead code, excess comments and logs)

  3. Fix them, one by one

  4. Verify tests still pass

  5. Move to the next file

There's no shortcut here. I'm not going in with a single sweeping refactor. I'm cleaning one file at a time, accepting that some files will take an hour and some will take five.

The goal isn't perfection. It's getting the codebase to a state I'd be willing to defend if another maker opened it. Right now I wouldn't. After these passes, I should be.

What I'm not doing

Not throwing it out and starting over. The code works. The tests work. The product is live. A rewrite trades real, debugged code for new bugs I haven't found yet. That's the worst possible trade.

Not blaming AI. I shipped the code. I approved the pull requests. The model wrote what I let it write. The rules I had in place clearly weren't tight enough, and the safeguards weren't catching the cross-file patterns. That's a process failure on my side, not a failure of the tools.

Not promising it won't happen again. It will. The next phase of building ProductLog will involve more AI-written code, and some of that code will accumulate the same problems. The answer isn't to stop using AI. It's to schedule cleanup passes regularly, the same way you'd schedule any other maintenance.

Why I'm writing this now

Because the honest post about this doesn't really exist yet. Every other take I've seen is either a panicked rejection of AI tools or a triumphant claim that AI-assisted development is fine if you just prompt it right. Both are lies. The truth is in the middle: AI is useful, it accelerates a lot of work, and it leaves behind a mess that someone has to clean up. The mess is the part nobody talks about.

I'm starting the cleanup this week. I'll post a follow-up after each of the three passes is done, with what I actually found and what I changed. The post you're reading is the field report from before the work starts. The next ones will be after.

#AIcoding #productlog #refactoring
