Who Owns AI-Generated Code When It Breaks?
A clear guide to why developers still own the output and how to review AI-written code without fooling yourself.
If a service falls over at three in the morning because somebody merged an AI suggestion into a hot code path, the pager does not go to the model vendor. It goes to your team.
I do not mean that in a moral sense. I mean it in the boring operational sense that shows up when customers are unhappy and security starts asking questions. The team that ships the system still owns the system. A model may change how the bug got there. It does not change whose name ends up in the incident report.
The vendors tell you this themselves, just in more legalese language. GitHub’s responsible-use docs say users assume the risks associated with generated code, including security vulnerabilities, bugs, and IP infringement, and that suggestions need review and validation before acceptance. The FTC has been blunter: there is “no AI exemption from the laws on the books.” That is close enough to the whole baseline.
You can use AI to write code. You cannot use AI to outsource responsibility.
This matters because AI changes the economics of production faster than it changes the habits that keep production safe. We are generating more code, more wrappers, more tests, more YAML, more migration scaffolding, and more “good enough to merge” pull requests than most teams can really absorb. The output looks competent. That is exactly why this gets expensive.
The legal baseline is less mysterious than people pretend
A lot of teams still talk about AI coding tools as if the hard question is some future debate about authorship. That question is interesting. It is not the urgent one. The urgent one is simpler: if your company ships bad behavior, leaks data, violates a license, or opens a security hole, your company is still first in line when the bill arrives.
I am being careful here on purpose. I am not saying tool providers never carry risk. Contracts, product claims, sector rules, and the courts still matter, and that story will keep changing. I am saying the practical starting point is not in doubt. Your review process, your audit trail, your security team, and your regulator are not going to accept “the assistant suggested it” as an answer with real legal weight.
That is also why vendor disclaimers matter more than people admit. They are not decorative text. They are the product telling you where the risk really is. If the documentation says generated output may be wrong, insecure, or infringing, and that a human must review it, the meaning is simple: the tool helps write the code. It does not take the risk off your hands.
For senior engineers, the implication is plain. The professional obligation is the same one we already had with human-written code: if we merge it, we own it. AI just makes that obligation easier to forget in conversation and harder to honor in practice.
The real problem is plausible code at industrial scale
The risk is not that AI spits out obviously ridiculous Java. Most of the time it does not. The risk is that it produces code that looks normal enough to pass a rushed review.
In a Java shop, that is the dangerous middle. A Quarkus REST endpoint that seems fine until the authorization path gets weird. A Hibernate query wrapper that almost uses the safe pattern but leaves one string concatenation in the wrong place. A transaction boundary that behaves in local tests and then turns brittle under load. A JWT flow that is tidy, readable, and just wrong enough to matter.
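To make that middle concrete, here is a minimal sketch of the Hibernate case, with a hypothetical entity and method names. Almost everything in it uses the safe pattern, which is exactly why the one concatenation is easy to skim past:

```java
import jakarta.persistence.Entity;
import jakarta.persistence.EntityManager;
import jakarta.persistence.Id;
import java.util.List;

// Hypothetical minimal entity, just enough to make the shape concrete.
@Entity
class Invoice {
    @Id
    Long id;
    String status;
}

class InvoiceRepository {

    // Almost the safe pattern: the status filter is bound as a parameter,
    // which is what a reviewer's eye locks onto. The sort column is
    // concatenated straight into the JPQL string, so a caller-controlled
    // value becomes a query-injection point.
    List<Invoice> findByStatus(EntityManager em, String status, String sortColumn) {
        return em.createQuery(
                "select i from Invoice i where i.status = :status order by i." + sortColumn,
                Invoice.class)
            .setParameter("status", status) // safe: bound, not concatenated
            .getResultList();
    }
}
```

A reviewer who sees the `setParameter` call will often stop looking. That is the plausibility tax in one method.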
We have data now, and the data is not looking good. Hammond Pearce and coauthors found that roughly 40% of GitHub Copilot’s generated programs in their security study were vulnerable. Veracode’s 2025 GenAI Code Security Report found that 45% of tested AI-generated code samples failed security checks, with Java standing out as the riskiest language in that dataset at 72%. If you write enterprise Java for a living, that last number is concerning.
The second half of the problem is verification. Sonar’s 2026 developer survey found that 96% of developers do not fully trust AI-generated code, yet only 48% say they always verify it before committing. That gap is the whole story in one number pair. We know the output isn’t great. We ship it faster anyway.
This is why “plausible” is more dangerous than “wrong.” Obviously broken code triggers scrutiny. Plausible code triggers velocity. AI is very good at plausible. That is its charm and its tax.
There is a quieter problem behind the security discussion too: keeping a clear paper trail. GitHub’s docs do not just warn about bugs. They also make it clear that IP risk stays with you. That matters more than many teams want to hear. If a generated snippet carries license baggage or looks too much like code you should not have used, your model vendor is not the one explaining it in the compliance review.
Enterprise Java sits in an awkward middle
This lands especially hard in enterprise Java because the code we generate is rarely isolated. It is attached to persistence layers, auth flows, messaging, external systems, and long-lived operational promises. A small bad guess in that kind of code does not stay small for long.
That is one reason those numbers matter so much for Java. Enterprise teams are not mostly asking AI to invent novel compiler theory. They are asking it to generate controllers, mappers, query layers, migrations, integration glue, retry wrappers, and configuration-heavy service code. In other words, the boring parts. The boring parts are where many real incidents begin.
Quarkus makes this sharper, not softer. Its fast feedback loops and clean developer experience are exactly why generated code can feel trustworthy after a quick local run. But a fast local run is not a safety proof. It tells you the endpoint starts. It does not tell you the generated transaction behavior is right under load, the authorization path is safe across edge cases, or the retry logic will not turn a partial outage into a full one.
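The retry case deserves a sketch, because it is exactly the kind of thing a quick local run will never catch. Both variants below compile, start, and pass a happy-path test. The annotations are standard MicroProfile Fault Tolerance, which Quarkus ships via SmallRye; the class and method names are hypothetical:

```java
import java.time.temporal.ChronoUnit;

import jakarta.enterprise.context.ApplicationScoped;
import org.eclipse.microprofile.faulttolerance.CircuitBreaker;
import org.eclipse.microprofile.faulttolerance.Retry;

@ApplicationScoped
public class PaymentClient {

    // The shape generated code tends to have: aggressive retries, tiny
    // delay, no circuit breaker. Under a partial outage every caller
    // retries hard, and the struggling dependency gets hammered into
    // a full outage.
    @Retry(maxRetries = 10, delay = 50)
    public String chargeNaively(String orderId) {
        return callRemotePaymentService(orderId);
    }

    // A more defensive shape: few retries, jittered delay, and a circuit
    // breaker that sheds load once the dependency is clearly unhealthy.
    @Retry(maxRetries = 2, delay = 200, jitter = 100, delayUnit = ChronoUnit.MILLIS)
    @CircuitBreaker(requestVolumeThreshold = 10, failureRatio = 0.5, delay = 5000)
    public String charge(String orderId) {
        return callRemotePaymentService(orderId);
    }

    private String callRemotePaymentService(String orderId) {
        // Stand-in for a real REST client call.
        throw new IllegalStateException("remote call placeholder");
    }
}
```

In dev mode both endpoints respond instantly. Only one of them survives a bad afternoon.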
This is why I do not find “it compiled and the tests passed” especially comforting in AI-heavy Java codebases. The systems are too connected, the defaults are too consequential, and the long tail of operational behavior is where the bill shows up.
Regulation is catching up, but it is not a single switch
The Europeans have been thinking about all this from a legal perspective for a while now. And the first thing you stumble over is probably the EU AI Act.
It did not arrive all at once. Chapters I and II, including definitions and AI literacy obligations, began applying on February 2, 2025. Governance rules and general-purpose AI model obligations followed on August 2, 2025. The bigger enforcement wave starts on August 2, 2026. Some rules for high-risk systems inside regulated products do not land until August 2, 2027.
That timeline matters because a standard coding assistant is not automatically a high-risk AI system just because it writes code. Teams sometimes hear “AI Act” and imagine that every autocomplete window instantly turns into a regulated medical device. That is not the current position. The real pressure starts when generated code ends up inside a regulated product, a safety-relevant workflow, or any system with hard safety rules around it. Then documentation, oversight, and a clear paper trail stop sounding like governance boilerplate and start looking like evidence.
The separate AI Liability Directive proposal is also worth clearing up, because people still talk about it as if it is around the corner. It is not. The proposal was withdrawn by the Commission on October 6, 2025. So the near-term European story is simpler than people make it sound: read the AI Act and the new Product Liability Directive together. The new Product Liability Directive, Directive (EU) 2024/2853, explicitly covers software and AI systems, and Member States have to transpose it into national law by December 9, 2026. The language is legal and unpleasant to read, but the effect is easy to understand: if required safety rules were ignored, a court can treat that as a strong argument that the software was defective. In plain English, if AI-generated behavior lands inside a regulated product or safety obligation, sloppy process gets a lot harder to defend in court.
I am being careful with “can” on purpose. This is how the rules fit together, not a magic sentence that says “AI equals liability.” Still, it is close enough to shape engineering behavior now. And the point is not that there is a ton of paperwork. The interesting part is that the software development process itself becomes part of the paper trail. Which means the SDLC has to adapt as well: it needs to carry auditing and responsibility documentation alongside the code.
What responsible teams do differently
The first change you should make is simple. AI-generated code should get harder reviews, not easier ones.
That sounds backward until you remember the failure mode. Human reviewers are used to reading code for logic mistakes, incomplete edge handling, naming weirdness, and intent drift. AI-generated code deserves all of that plus pattern risk. You are reviewing something that may be locally polished and globally naive, statistically good at common shapes and statistically bad at the exact invariants your system cares about.
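A JWT check is the cleanest example of locally polished and globally naive that I know. The sketch below is hypothetical, but the shape turns up in generated code: every line that is present is reasonable, and the line that is missing is the one that matters:

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Base64;

import jakarta.json.Json;
import jakarta.json.JsonObject;

final class TokenCheck {

    // Looks like validation: decode the payload, read the expiry claim,
    // reject expired tokens. The expiry check is even correct, which is
    // what makes the missing signature verification easy to skim past.
    // Anyone can mint a token this method will accept.
    static boolean looksValidButIsNot(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) {
            return false;
        }
        String payload = new String(
                Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
        try (var reader = Json.createReader(new StringReader(payload))) {
            JsonObject claims = reader.readObject();
            long exp = claims.getJsonNumber("exp").longValueExact();
            return Instant.ofEpochSecond(exp).isAfter(Instant.now());
        }
    }
}
```

Reviewing that for logic mistakes finds nothing. Reviewing it for pattern risk finds everything.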
The second change is keeping a clear trail. If a significant block of code came from an assistant, say so in the pull request (or git notes) and say who owns it now. I do not mean a simple disclaimer. I mean a useful note. Which tool was used? Which parts were heavily generated? Who reviewed the transaction boundaries, auth assumptions, dependency choices, and data handling? Future you will not remember. Incident review definitely will not guess it correctly from vibes.
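One possible shape for that note, with placeholder names, is nothing fancy:

```
AI-assist: yes (tool: <assistant>, completions + chat)
Heavily generated: repository query methods, retry wrapper, test scaffolding
Human-owned review:
  transaction boundaries: <reviewer>
  auth assumptions:       <reviewer>
  dependencies/licenses:  checked <date>
Owner going forward: <name or team>
```

A few lines in a PR description. It costs a minute now and saves an hour of archaeology during the next incident review.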
The third change is that automation has to stay in the loop after generation. Static analysis, secrets scanning, dependency analysis, and license checks are not anti-AI rituals. They are the minimum viable skepticism for high-volume code generation. If AI is making authorship cheaper, your guardrails need to get cheaper and louder too.
The fourth change is that rules belong close to the tool. Custom instructions, repository rules, banned dependency patterns, template constraints, secure defaults, architectural linting, required checks before merge, and path-specific review policies all matter more now. A handbook that says “please review carefully” is a wish.
A harness that makes bad output harder to suggest and harder to merge is governance.
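Architectural linting is the concrete version of that sentence. As a sketch with hypothetical package names: an ArchUnit rule, run as a plain JUnit test, turns a layering convention into a failed build instead of a review comment:

```java
import com.tngtech.archunit.core.domain.JavaClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;
import com.tngtech.archunit.lang.ArchRule;
import org.junit.jupiter.api.Test;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

class ArchitectureRulesTest {

    @Test
    void restLayerDoesNotReachIntoPersistence() {
        JavaClasses classes = new ClassFileImporter().importPackages("com.example.shop");

        // Generated code loves shortcuts across layers. This rule does
        // not care whether a human or a model wrote the shortcut.
        ArchRule rule = noClasses()
                .that().resideInAPackage("..rest..")
                .should().dependOnClassesThat().resideInAPackage("..persistence..");

        rule.check(classes); // fails the build on violation
    }
}
```

The point is not this specific rule. The point is that the constraint fires on every merge, at zero marginal attention cost.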
Most teams do not need a grand AI policy before they can act. They need to start with a small sentence: Every AI-generated code path that reaches production must have a human who can explain what it does, why it is safe enough, and what the failure mode looks like. If nobody on the team can do that, the correct response is not “ship it and monitor.” The correct response is “rewrite it until somebody can.”
Judgment is the scarce thing
I do not think AI makes senior engineers less important. I think it makes unserious ownership much more visible. I have written about this before, in different words.
For years, authorship and accountability were close enough together that we could treat them as basically the same thing. You wrote the code, so you owned the code. AI breaks that overlap. Now code can be authored by a model, accepted by a developer, reviewed by someone skimming a diff, and deployed by a pipeline that does not care whose idea any of it was. The authorship story gets fuzzy fast.
The responsibility story does not. The engineer who signs off, the team that ships, and the company that runs or sells the system still own what it does. That is not old-fashioned. It is the only part of the deal that still works when authorship gets strange.
The developers who build real reputations in this phase will not be the ones who accepted the most suggestions or generated the most files. They will be the ones whose names mean somebody checked the hard parts. In the AI era, that is not a sentimental virtue. It is the job.


