The Hidden Cost of AI Coding for Senior Java Developers
AI can generate code faster than we can safely review it, and enterprise teams are starting to feel the strain
I write here a lot. Almost every day.
That does something to you after a while. Daily blogging sounds like a publishing habit, but it becomes more than that. It becomes a way of moving through the day. A bug report is not just a bug report anymore. A strange benchmark result is not just a number. A comment in a meeting stays with you because you know there is probably a bigger idea inside it. You start collecting fragments all day long.
That has been my default mode for some time now. Always watching a little bit. Always thinking about what something means. Always carrying one half-finished thought into the next hour.
And AI tools fit into that mindset a bit too well.
They are useful. Really useful. I use them for research, reframing, rough drafts, structure, code exploration, and all those moments where the blank page or the blank editor stares back longer than it should. They help me get moving. They help me get unstuck. They help me cover more ground.
But they also make it harder to stop.
That is the part I keep coming back to.
The old work had more friction. You got tired in obvious ways. You wrote the code yourself, line by line. You wrote the draft yourself, paragraph by paragraph. At some point your hands were done, your focus was gone, or your patience just ran out. The day had a natural edge to it.
Now the machine keeps offering one more round.
One more rewrite. One more explanation. One more refactor. One more code path to inspect. One more branch to explore. One more quick pass before you close the laptop.
So the day stretches.
Not always in hours. Sometimes it stretches in your head. You walk away from the screen, but part of your attention is still inside the loop. You are still reviewing. Still comparing. Still half-working. Denis Stetskov’s recent piece, The Human Cost of 10x: How AI Is Physically Breaking Senior Engineers, landed for me because it gave language to that feeling. He argues that AI does not remove the human bottleneck. It increases the amount of material flowing toward the same limited human attention, and the result is a very physical kind of exhaustion.
I think he is right. And for Java developers, I think the problem is even sharper than it first appears.
In our world, wrong things often look respectable.
That is one reason enterprise Java has survived so long. The ecosystem is mature. The frameworks are stable. The conventions are strong. The code usually has shape. Even when something is off, it often still compiles, still starts, still passes a surprising number of tests, and still looks like it belongs.
That is exactly what makes this new kind of work so tiring.
The code generator gives you something clean. The assistant suggests something plausible. The framework absorbs a lot of rough edges. The Quarkus service still boots. The Spring application still answers requests. The endpoint still returns JSON. Nothing looks obviously broken. But something under the surface has shifted. A transaction boundary moved. A retry now duplicates side effects. A mapper dropped a field that matters for audit. A service layer now owns logic that should have stayed somewhere else. The code is not nonsense. It is believable.
And believable wrong code is expensive.
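To make one of those failure modes concrete, here is a minimal sketch in plain Java, with no framework. `PaymentClient` and its failure behavior are invented for illustration; the point is the shape of the code, not any real API. A clean-looking retry loop wraps a call whose side effect is not idempotent: the first attempt succeeds on the server, but the response is lost, so the caller retries.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a remote call whose side effect is not idempotent.
// The first charge commits on the server, but the response is "lost", so the
// caller sees a timeout-style failure after the side effect already happened.
class PaymentClient {
    final AtomicInteger charges = new AtomicInteger();
    private int attempts = 0;

    void charge() {
        charges.incrementAndGet();               // side effect always executes
        if (attempts++ == 0) {
            throw new RuntimeException("read timeout"); // response lost after commit
        }
    }
}

public class RetrySketch {
    // A generated-looking retry loop: syntactically clean, semantically risky.
    static void chargeWithRetry(PaymentClient client, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            try {
                client.charge();
                return;                          // success on a later attempt
            } catch (RuntimeException e) {
                // retrying here re-executes the side effect
            }
        }
    }

    public static void main(String[] args) {
        PaymentClient client = new PaymentClient();
        chargeWithRetry(client, 3);
        // One logical payment, but the customer was charged twice.
        System.out.println("charges = " + client.charges.get());
    }
}
```

The loop compiles, reads well, and passes a happy-path test. Nothing about it looks broken in review unless you already know the call is not idempotent, which is exactly the kind of context that lives in people rather than in the diff.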
This is where I think the “10x productivity” language starts to fall apart. In real Java systems, the hard part was never mostly typing. The hard part is understanding what the system is allowed to do, what it must never do, and what ugly-looking code is actually protecting you from some old production lesson nobody wrote down.
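As a sketch of that last point, with names and scenario invented rather than taken from any real system: the kind of seemingly redundant check that encodes an unwritten production lesson, and that an automated cleanup pass might happily delete.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical example: InvoiceImporter and the duplicate-feed story are
// invented to illustrate "ugly" defensive code guarding an old incident.
public class InvoiceImporter {
    private final Set<String> seenIds = new HashSet<>();

    // Looks redundant: the upstream feed "guarantees" unique invoice ids.
    // In practice the feed re-sends whole batches after partial failures,
    // so without this check invoices were once booked twice. That incident
    // lives only in this if-statement, not in any document.
    public boolean importInvoice(String invoiceId) {
        if (!seenIds.add(invoiceId)) {
            return false; // duplicate delivery: skip instead of double-booking
        }
        // ... persist the invoice ...
        return true;
    }
}
```

A reviewer who knows the history keeps the check. A tool optimizing for clean code, or a reviewer trusting the tool, removes it and reintroduces a bug that took a production outage to find the first time.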
AI helps with producing code. It does not remove interpretation.
If anything, it moves more of the day into interpretation.
That shift is starting to show up in research too. METR’s randomized study with experienced open source developers found that when those developers used early-2025 AI tools, they actually took 19% longer to complete their tasks. What makes that result so interesting is not only the slowdown. It is that the developers expected the opposite, and even after finishing, many still felt faster than they really were. METR later reiterated that earlier result when announcing a follow-up experiment in 2026.
That one hit me hard.
Because it matches something I suspect a lot of us already know in our bodies before we know it in words. The machine makes you feel momentum. It keeps the screen moving. It keeps the possibilities coming. It reduces the pain of starting. But the screen moving is not the same as the work actually getting done faster.
Sometimes it means the opposite.
Sometimes it means you are now supervising more candidate solutions, more partial fixes, more plausible explanations, and more semantically risky code than you would have produced on your own. The local effort goes down. The global responsibility goes up.
And that is senior engineer work in a sentence.
What makes this harder to talk about is that the role itself is changing. Recent research from Google, Developer Productivity in the Age of Generative AI: A Psychological Perspective, frames this as a shift from coder to conductor. The developer becomes less of a direct builder and more of an orchestrator of machine-generated work. Anthropic’s internal research points in a similar direction. Engineers reported becoming broader, more full-stack, and more willing to work in unfamiliar areas, but they also raised concerns around skill development, collaboration, and what happens to deeper technical learning when more of the first draft comes from the tool.
“Conductor” sounds nice at first. Senior, strategic, elevated.
But conducting is not light work.
It means evaluating, ranking, rejecting, steering, correcting, and keeping a mental model intact while something faster than you keeps generating options. You may write fewer lines yourself, but you make more decisions. You may touch more systems in a day, but the cost is that your head is carrying more unfinished judgment.
That is the tiredness I notice now. Not just doing. Monitoring.
There is another part of the research that I think matters even more for enterprise teams, and it gets less attention than it should. MIT Sloan’s summary of recent findings showed that when developers got access to AI coding tools, coding time went up, project management time went down, and peer collaboration dropped by nearly 80%.
That number should make every engineering leader stop for a minute.
A lot of enterprise software survives because knowledge is social. Nobody completely understands the whole bank, the whole logistics backend, the whole insurance platform, or the whole IAM story. The reason those systems stay alive is not that one brilliant person holds it all together. The reason is overlap. Shared context. Repeated conversations. Code reviews that feel annoying until they save you. Architecture discussions that feel slow until they prevent six months of drift.
If AI pushes more work into private loops of prompt, accept, patch, and move on, then some of that overlap disappears. At first that can feel efficient. Fewer interruptions. Faster drafts. Less talking. But some of what disappears is not noise. Some of it is engineering memory.
That is a high price to pay for smoother local flow.
And then there is the part I still think we have not really learned how to describe well. Working with AI is mentally tiring in a different way because we keep trying to treat it like a collaborator, even though it is not a collaborator in the human sense.
Human teams are messy, but they have continuity. You know the teammate who always worries about migrations. You know the architect who will ask about failure modes. You know the reviewer who catches every security issue. Real people have habits, intentions, and patterns. You build rough mental models of them, and those models help you work together.
With AI, that instinct does not go away. We still try to model the other side. We still ask ourselves: can I trust this answer, is it guessing, is it rushing, is it overconfident, is it missing context, is it being clever in the wrong way? Human-AI interaction research around theory of mind points directly at this problem. The CHI 2024 workshop paper on Theory of Mind in Human-AI Interaction and the IBM Research summary both point to the same tension: humans naturally attribute roles, intentions, and mental states to AI systems, but those mental models do not map cleanly, and that mismatch creates friction.
That makes a lot of sense to me.
Because some of the exhaustion is not just code review volume. It is the energy spent trying to figure out what kind of partner the tool is being today. Careful or lazy. Helpful or slippery. Grounded or improvising. You are not just reviewing output. You are continuously calibrating trust.
That is work too.
At some point this stops feeling like a workflow discussion and starts feeling physical. Denis anchored his piece in the Neuron paper by Jie Zheng and Markus Meister, and the Caltech Magazine write-up is useful if you want the more readable version. The point is simple enough: deliberate human reasoning is slow, narrow, and serial. AI increases how much material can be produced. It does not increase how much material a human can deeply understand.
That is where the body enters the story.
The output gets cheaper. The judgment does not.
And if you are already the kind of person who lives in an always-on mode, that becomes very hard to manage. I feel this in writing. Daily publishing is not just a content habit. It trains your attention to remain open all day. Every release note looks like a possible post. Every benchmark looks like an argument. Every thread looks like something you should probably respond to. AI amplifies that tendency. It makes drafting easier. It makes exploring easier. It makes continuing easier.
It weakens the natural stop signs.
I think a lot of developers feel the same thing now in code. There is always one more experiment because the cost of trying is lower. There is always one more branch because the assistant can scaffold it. There is always one more test file, one more comparison, one more rewrite, one more generated explanation of why the generated code did what it did.
The old bottleneck was production speed.
The new bottleneck is discernment.
That is why I do not think the right response is either blind enthusiasm or easy cynicism. These tools are useful. Sometimes they are genuinely great. They help me. They help many people. But the cost of using them well is not where the marketing usually puts it. The cost is not just subscription price, model choice, or prompt quality.
The cost is sustained human judgment.
For Java teams, that means we need to get a lot more serious about protecting review energy, protecting shared context, and separating visible output from actual engineering throughput. It also means being honest that some of the fatigue people feel is not a personal weakness or bad time management. It is the natural result of asking one human mind to supervise far more plausible work than it used to create on its own. A broader version of that same argument also shows up in Harvard Business Review’s piece on how AI intensifies work.
That is the part I would add to Denis’s argument from where I sit.
AI did not remove the human cost. It moved it up the stack.
And once you start working that way every day, you feel it everywhere.
The original piece gave that feeling a sharp frame. I think the next step for our world, especially in Java and enterprise software, is to admit that syntactically safe and operationally plausible code can still be semantically wrong. That gap is where the pressure lives. That gap is where the review burden grows. That gap is where the always-on mindset quietly stops being a habit and starts becoming a condition.


