Why AI Makes Software Engineering Harder, Not Easier
Lessons from Waterfall, Agile, and modern AI tooling for Java developers
AI-assisted development feels new. The speed is new. The fluency is new. The confidence with which code appears in your editor is new.
The problem it exposes is not.
For more than sixty years, software engineering has wrestled with a single, unresolved challenge: translating human intent into machine execution. Every major methodology shift, from Waterfall to Agile to DevOps, has been an attempt to close that gap. AI does not eliminate it. It reopens it, at scale.
What we are experiencing today with AI coding assistants is not a revolution in software engineering. It is the return of the specification problem, disguised as productivity.
To use tools like IBM Bob well, Java developers need to understand this lineage. Otherwise, we will repeat the same mistakes. Just faster.
The Long Arc: Why This Problem Never Went Away
The software crisis formally entered the engineering vocabulary in 1968 at the NATO Conference in Garmisch. Projects were late, unreliable, and vastly more expensive than planned. The diagnosis was not “bad programmers.” It was unclear intent.
Early engineering responses were rational. If mistakes were expensive to fix later, they should be eliminated early. Barry Boehm’s Cost of Change curve formalized this belief. Waterfall, the V-Model, IEEE 830 specifications, UML, and formal methods all followed from the same premise: if intent can be frozen, correctness can be guaranteed.
That premise failed repeatedly.
The problem was never rigor. It was ambiguity. Natural language specifications were precise in appearance but vague in meaning. Formal methods were precise but unusable at scale. UML promised blueprints but collapsed under its own weight. Agile flipped the table and embraced uncertainty, replacing documents with conversation, and then ran into communication limits as systems and teams grew.
Every era traded one failure mode for another.
AI-assisted development continues this long-standing trend.
“The primary determinant of agent success is not the underlying model, but how context is constructed, maintained, and deliberately compacted over time.”
HumanLayer, Advanced Context Engineering for Coding Agents (source)
That sentence could have been written in 1975.
All of this leads to one clear conclusion: AI changes how fast we work, but not what good engineering requires.
AI can write code very quickly. But it does not understand intent unless we make that intent explicit. When intent is unclear, the AI does not stop or ask questions. It continues and makes reasonable guesses. These guesses often look correct, but they can encode the wrong assumptions. This matters even more now that we keep removing the informal safeguards, like team meetings or coffee-break conversations, where issues and findings used to surface. AI can drive in the wrong direction, with confidence, at unprecedented speed.
Developers need to be very clear about constraints, rules, and expectations before delegating work to an AI assistant.
The following ten tips give you a first level of practical guidance. They show how Java developers can work with AI assistants in a controlled and reliable way, starting today.
Start With Constraints, Not Code
Let’s make this clear again: When you prompt an AI assistant, you are performing requirements engineering. The difference is speed, not substance.
Loose prompts reproduce the oldest failure mode in our industry: underspecified intent. The model fills gaps probabilistically, just as junior developers once did when handed vague requirements. Where inexperienced developers might at least have picked the right direction, AI will mercilessly hallucinate solutions.
Java developers should recognize this immediately. Constraints are architecture. Without them, correctness is a mere accident.
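One way to make constraints concrete is to encode the non-negotiable ones as executable rules, so that AI-generated code is checked against them automatically instead of only in review. The following is a minimal sketch using ArchUnit, a widely used Java library for architecture tests; the package names are hypothetical and stand in for your own module layout.

import com.tngtech.archunit.core.domain.JavaClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;
import com.tngtech.archunit.lang.ArchRule;
import org.junit.jupiter.api.Test;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

class ArchitectureConstraintsTest {

    // Constraint as code: domain classes must not depend on the web layer.
    // Any change, human-written or AI-written, that violates this fails the build.
    @Test
    void domainDoesNotDependOnWeb() {
        JavaClasses classes = new ClassFileImporter().importPackages("com.example.orders");
        ArchRule rule = noClasses().that().resideInAPackage("..domain..")
                .should().dependOnClassesThat().resideInAPackage("..web..");
        rule.check(classes);
    }
}

A rule like this does not replace a well-written prompt. It catches the cases where the prompt was not precise enough.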
Use AI for Mechanical Work, Not Architectural Judgment
Every historical attempt to “automate design” failed for the same reason: design is contextual.
Waterfall tried to lock it upfront. UML tried to encode it visually. Model-Driven Engineering tried to generate it. None survived contact with reality.
AI is different only in speed.
Use AI-assisted coding tools mainly where the industry has always succeeded: mechanical transformations, pattern completion, and repetitive scaffolding. Keep architectural judgment where it has always belonged: with humans accountable for outcomes.
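A typical example of mechanical work in Java, assuming a hypothetical value class: collapsing boilerplate into a record. The behavior is preserved, the diff is easy to review, and no architectural judgment is involved.

// Before: a hand-written value class (hypothetical example)
public final class OrderId {
    private final String value;

    public OrderId(String value) {
        this.value = java.util.Objects.requireNonNull(value);
    }

    public String value() {
        return value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof OrderId other && value.equals(other.value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}

// After: the same contract expressed as a record. Mechanical, behavior-preserving, ideal to delegate.
public record OrderId(String value) {
    public OrderId {
        java.util.Objects.requireNonNull(value);
    }
}

The judgment calls, such as whether OrderId should exist at all, stay with humans.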
This is also why experience is the most important factor when using AI tools well. Senior developers have learned patterns and risks over many years. They know which solutions look correct but cause problems later. AI does not have this understanding. It does not know the history of the system, past failures, or why some decisions were made. Much of this knowledge is never written down. It exists as team habits, code review rules, and shared understanding built over time. Less experienced developers may trust AI output too much because it looks complete and confident. More experienced developers treat AI output as a first draft and check it carefully. This difference, more than the quality of the prompt or the tool, decides whether AI helps or creates hidden risk.
AI tools feel like superpowers. They remove effort, speed up work, and make difficult tasks look easy. This creates a dangerous illusion: that power automatically means control.
There is a well-known pattern in stories where someone gains great abilities but does not yet have the skill, discipline, or judgment to use them well. A good example is the movie Chronicle. A group of teenagers suddenly gain telekinetic powers. At first, everything feels exciting and harmless. But because they lack experience and limits, small mistakes quickly turn into serious damage. The problem is not the power itself. The problem is using it without understanding consequences.
AI-assisted development follows the same pattern. The tool can generate large amounts of code, refactor systems, or migrate APIs in minutes. But speed hides mistakes. When developers treat AI as a superpower instead of a tool, they stop checking assumptions. They trust confident output. They skip careful review. This is how subtle architectural problems enter a codebase quietly.
Senior developers avoid this trap because they do not focus on what the tool can do, but on what can go wrong. They use AI to remove mechanical work, not to replace judgment. Without that discipline, AI does not make teams faster. It makes them faster at creating future problems.
Narrow the Scope or Invite Hallucination
Large, unconstrained requests fail for the same reason large requirement documents failed: they mix intent, mechanism, and policy into a single artifact.
HumanLayer's modern remedy for this failure mode is called Frequent Intentional Compaction. Historically, we called it decomposition. (source)
Break migrations, refactors, and feature work into bounded tasks. One responsibility at a time. This is not an AI trick. It is basic systems engineering, rediscovered.
This approach is proven to work because it matches how modern software is already built and maintained. We use small pull requests instead of large code drops. We split systems into modules and services instead of one large codebase. We write focused tests that verify one behavior at a time. Each of these practices exists to reduce complexity and make problems easier to reason about. AI tools benefit from the same structure. When a task is small and well defined, the AI has less room to guess and less need to invent missing details. The result is not only better AI output, but also code that is easier for humans to review, test, and maintain.
Decomposition only works if the resulting tasks are the right size. Too large, and ambiguity returns. Too small, and the work loses meaning.
Modern software engineering already solved this problem long before AI. Agile teams learned that effective work units are user stories, not technical chores. A good user story describes one observable behavior, one user goal, and one clear outcome. It is small enough to understand fully, but large enough to matter.
The same rule applies when working with AI assistants. An optimal task has a clear “done” condition that can be verified. For example, “migrate all persistence code” is too large. “Replace this repository method to use the new API and keep existing behavior” is usually the right size. If a task cannot be reviewed in a single pull request, it is probably too big for reliable AI execution.
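To make the sizing concrete, here is what such a bounded task can look like. The names and the target API are assumptions (the sketch uses Spring's JdbcClient as the "new API"); the point is that one method changes, the signature stays stable, and the existing tests define "keep existing behavior."

// Task: "Replace this repository method to use the new API and keep existing behavior."

// Before: hypothetical legacy data access
public Optional<Customer> findByEmail(String email) {
    LegacyRow row = legacyDb.queryFirst("select * from customers where email = ?", email);
    return row == null ? Optional.empty() : Optional.of(mapper.toCustomer(row));
}

// After: same signature, same observable behavior, new API underneath
public Optional<Customer> findByEmail(String email) {
    return jdbcClient.sql("select * from customers where email = :email")
            .param("email", email)
            .query(customerRowMapper)
            .optional();
}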
A practical guideline is this: if you cannot explain the task in a few sentences without using “and then,” it is too large. If you cannot describe how to test or verify the result, it is not ready to be delegated to AI. User stories work well here because they naturally force clarity: who is affected, what changes, and what stays the same.
This is why AI works best when aligned with existing practices like small pull requests, clear acceptance criteria, and incremental delivery. These practices reduce risk for humans. They reduce hallucination for AI.
Treat AI Output as a Draft, Not a Decision
The V-Model taught us a painful lesson: verification is not validation.
AI-generated code can perfectly satisfy an implied specification and still be wrong for the system. The danger is that AI output looks finished. It carries an aura of correctness.
Java developers must consciously downgrade AI output to draft status. Read it as you would a pull request. Question its assumptions. Look for what it does not guarantee.
A useful rule is this: hold AI-generated code to the same review standards as any human-written pull request, or stricter ones. In many open-source projects and mature engineering teams, including the practices documented in Google's engineering practices guide, reviewers focus less on how code was written and more on whether it is understandable, testable, and safe to change. This includes checking that each change has a clear purpose, that the code is easy to reason about, and that it does not mix unrelated concerns. These rules exist to protect the system over time. AI output should not bypass them just because it was fast to produce.
When reviewing AI-generated code, it helps to ask simple, well-known review questions:
Can another developer understand this change without extra explanation?
Is the behavior covered by tests or clear acceptance criteria?
Does this change introduce hidden complexity or new assumptions?
Open-source communities rely on these questions because reviewers often lack full context. That makes them especially useful for AI output, which also lacks shared background and tribal knowledge. Treating AI code as a draft and reviewing it with these standard practices keeps responsibility with the team, not the tool.
Externalize Context or Lose It
IEEE 830 failed not because it was too detailed, but because it became write-only. Agile failed at scale because knowledge became tribal.
The modern answer is neither monoliths nor conversations. It is persistent, structured context.
AI assistants fail for the same reason large software systems fail: important knowledge exists only in people’s heads.
Modern software engineering learned this lesson the hard way. Teams discovered that undocumented decisions, hidden assumptions, and informal rules do not scale. When people leave, rotate teams, or revisit code months later, this “invisible context” is lost. AI tools have the same problem, but faster. If context is not written down, the AI will guess.
Externalizing context means writing down the rules that guide decisions, not repeating them in every conversation. This is not new. It follows well-known documentation practices that already work in successful software projects.
This is not user documentation and not API reference material. It is also not classic architecture documentation like UML diagrams or system overviews. What we are externalizing here is operational and decision context:
What is allowed and what is not
Which patterns are preferred
Which areas are sensitive or risky
Which trade-offs were already decided
In modern projects, this type of knowledge is usually captured in lightweight, close-to-code documents. Several documentation patterns are widely accepted today because they solve exactly this problem:
Architecture Decision Records (ADRs)
ADRs document why a decision was made, not just what the system looks like. They record context, constraints, and consequences. This prevents teams from breaking important rules without understanding the reason. Agent context files play a similar role, but focus on guiding behavior instead of human discussion.
Docs-as-Code
Modern teams keep documentation in the same repository as the code, written in simple formats like Markdown and reviewed through pull requests. This keeps documentation up to date and visible. Files like AGENTS.md or other agent rule files work best when treated the same way: versioned, reviewed, and changed intentionally.
Living Specifications
Practices like BDD, OpenAPI, and AsyncAPI succeed because they are executable or enforceable. They reduce ambiguity by making expectations testable. Agent rules follow the same idea: they are not descriptions, they are constraints that guide behavior. But we’ll talk about this later.
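To show what "executable" means in practice, a plain JUnit test already works as a living specification when it names the expected behavior rather than the implementation. A minimal sketch, with a hypothetical PricingPolicy domain type:

import java.math.BigDecimal;

import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

class ShippingFeeSpecification {

    @Test
    @DisplayName("Orders below the free-shipping threshold are charged the standard fee")
    void standardFeeBelowThreshold() {
        // Given a pricing policy with a 50.00 threshold and a 4.99 standard fee (hypothetical type)
        PricingPolicy policy = new PricingPolicy(new BigDecimal("50.00"), new BigDecimal("4.99"));

        // When pricing an order just below the threshold
        BigDecimal fee = policy.shippingFeeFor(new BigDecimal("49.99"));

        // Then the standard fee applies
        assertEquals(new BigDecimal("4.99"), fee);
    }
}

The test is the specification. If an agent changes the policy and the test fails, the mismatch between intent and implementation is visible immediately.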
Context Compression: Less, but More Important
Externalizing context does not mean writing everything down. In fact, too much detail is harmful.
One of the key lessons from large systems is that context must be compressed. Only information that influences decisions should be included. Historical background, implementation trivia, or outdated alternatives add noise. Noise makes both humans and AI less effective.
Good context answers a small set of questions:
What must never change?
What usually should be preferred?
Where is this system fragile?
If a piece of information does not help answer these questions, it likely does not belong in agent-facing context.
Another common mistake is mixing dependency details and low-level architecture into context meant to guide behavior.
Large systems depend on many libraries, services, and frameworks. Listing all of them creates the illusion of completeness, but it pollutes context. AI does not need to know every dependency. It needs to know which dependencies are sensitive, which are stable, and which are risky to change.
The same applies to architecture. High-level structure is useful. Deep internal wiring is not. When context becomes a mirror of the full system, it stops being guidance and becomes noise. Both humans and AI perform better when context highlights importance, not exhaustiveness.
This is why effective context documentation is selective by design. It protects attention. It protects judgment. And it scales better than memory or conversation ever could.
Large Migrations Fail for the Same Reason They Always Did
When developers say “the assistant couldn’t fix our migration,” they are echoing a complaint from the 1980s. Large migrations fail not because they are technically hard, but because they combine too many changes into a single act. This has been true since the earliest days of software engineering.
“Big bang” migrations are psychologically attractive. They promise a clean break from the past. No legacy constraints. No compromises. A fresh start with modern frameworks, better structure, and fewer mistakes. For teams that have lived with a difficult codebase for years, this promise is emotionally powerful. Starting over feels simpler than carefully changing what already exists.
In practice, this approach fails for predictable reasons. A big rewrite removes not only bad decisions, but also undocumented knowledge. Edge cases that were fixed years ago disappear. Performance assumptions are lost. Integration details that were never written down break silently. What looks like cleanup is often the deletion of hard-won experience.
This is why large migrations have always succeeded only when they are decomposed. Waterfall failed when it tried to freeze everything at once. Agile succeeded by accepting incremental change. The same rule applies to AI-assisted work. When a migration is treated as one large task, AI tools are forced to guess intent across many layers at the same time. That is where hallucination and subtle regressions appear.
A more stable approach blends agentic modernization with targeted rewriting. AI agents are very effective at mechanical, well-defined transformations: namespace changes, API updates, repetitive refactors, or adapting code to new frameworks while preserving behavior. Rewriting, when it is needed, should be limited to small, isolated areas where intent is clear and behavior can be verified.
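The Jakarta namespace migration is a canonical example of such a bounded, mechanical transformation: semantics stay the same, only package names move, and the compiler verifies the result. The entity below is a hypothetical illustration.

// Before: Java EE 8 imports
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Invoice {
    @Id
    private Long id;
}

// After: Jakarta EE 9+ imports. No behavioral change intended; the compiler
// and the existing tests confirm it.
import jakarta.persistence.Entity;
import jakarta.persistence.Id;

@Entity
public class Invoice {
    @Id
    private Long id;
}

This is exactly the kind of change an agent can repeat across hundreds of files without guessing intent.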
In other words, do not choose between rewriting and modernization. Use both, deliberately. Let agents handle incremental, bounded changes that preserve existing behavior. Use human-led rewrites only where the old design no longer matches the business model and can be replaced in a controlled way.
This blended approach works because it respects reality. Large systems do not change all at once. They evolve.
Agent Rules Are the New Specification
Traditional specifications describe what a system should do.
Agent rules describe how work is allowed to happen.
This difference matters. Agent rules do not replace requirements, architecture diagrams, or business specifications. They define the operational boundaries for an AI collaborator. In that sense, they form a new kind of specification layer: not for the software itself, but for the agent acting inside the software project.
Files such as AGENTS.md, Claude Skills, .bobmodes, or similar rule sets are not architecture documentation. They do not explain system structure or data flow. Instead, they are closer to:
coding guidelines,
contribution rules,
and safety or quality constraints.
Their purpose is to make implicit rules explicit.
For example:
“Do not refactor public APIs without explicit approval.”
“Prefer existing domain types over introducing new ones.”
“Avoid performance optimizations unless a benchmark exists.”
These rules rarely appear in UML diagrams or API specifications, but they strongly influence every code change. In many teams, they are enforced informally through experience, senior review, or tribal knowledge. Agent rules turn these informal constraints into something visible and enforceable. Several approaches are emerging, each with different trade-offs.
Project-Level Rule Files (AGENTS.md, .bobmodes)
These live next to the code and describe what the agent may or may not do. Their strength is clarity and proximity. They evolve with the repository and can be reviewed like any other change. Their weakness is that they require discipline. If rules are vague or too broad, they lose force.
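As an illustration only, not a prescribed format, a project-level rule file can be as small as this; the module and package names are hypothetical:

# AGENTS.md (hypothetical excerpt)
- Do not change public APIs in the billing-api module without explicit approval in the task description.
- Prefer existing domain types in com.example.billing.domain over introducing new ones.
- Every behavioral change must include a test in the same pull request.
- Never touch database migration scripts; schema changes are made by humans only.

Short, reviewable, and versioned with the code: that is what gives such a file force.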
Capability-Based Systems (Claude Skills)
Claude Skills, from Anthropic, take a more formal approach. A skill defines what an agent can do, what inputs it expects, and what outputs it produces. This resembles an API contract for agent behavior. It supports reuse and composition.
However, skills also have limits. They describe capabilities, not judgment. A skill can say how to refactor code, but not whether refactoring is appropriate in a specific situation. Skills can also move complexity into configuration. As the number of skills grows, managing them can become as difficult as managing services in a distributed system.
Modes and Guardrails
Some tools provide modes such as read-only, analysis-only, or refactor-only. These are useful safety switches, but they are coarse. They control what kind of action is allowed, not why or under which conditions. They work best as guardrails, not as full specifications.
In the past, unclear rules led to slow human mistakes. With AI, unclear rules lead to fast, confident actions that propagate immediately.
Agent rules reduce this risk by making boundaries explicit before action happens. They do not remove responsibility from developers. They make responsibility visible.
“Implicit assumptions are the enemy of reliable agent behavior. Durable agents require explicit operating rules.”
— HumanLayer
That statement is equally true for humans.
Agent rules are the new specification because they capture operational intent: what must never happen, what should happen by default, and when human judgment is required. Used well, they turn AI from a risky shortcut into a controlled collaborator. Used poorly, they become another abstraction that hides complexity instead of managing it.
Shift Effort From Writing Code to Verifying Intent
AI has changed the cost structure of software development. Writing code is now cheap. Verifying correctness is not.
This reverses a long-standing balance in software engineering. In the past, implementation was the slow part, and verification could often keep up. Today, AI can generate large amounts of code faster than a human can review or fully understand it. As a result, verification becomes the limiting factor.
This is not a new problem. The V-Model made a clear distinction decades ago: verification checks whether the system matches the specification, while validation checks whether the specification matches real needs. AI does not remove this distinction. It makes ignoring it more dangerous.
Modern teams must therefore invest more effort in tests, acceptance criteria, and explicit invariants. If intent cannot be verified, it should not be delegated. AI works best when it operates inside a framework where correctness can be checked automatically or reviewed in small, bounded steps.
The practical rule is simple: if you cannot clearly explain how to verify the result, the task is not ready for AI. Speed without verification does not create productivity. It creates delayed failure.
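One concrete way to apply the rule is to write the verification before delegating. A minimal sketch with hypothetical names; the test defines "done," and the agent's job is to make it pass without modifying it.

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertThrows;

class DiscountServiceTest {

    // Written by a human before the task is handed to the AI assistant.
    // The assistant may change DiscountService, but not this test.
    @Test
    void negativeQuantitiesAreRejected() {
        DiscountService service = new DiscountService();
        assertThrows(IllegalArgumentException.class, () -> service.discountFor(-1));
    }
}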
Use AI to Explain Code You Already Trust
Executable specifications, in the tradition of TDD, BDD, and design by contract, succeeded because they connected intent directly to verification. They reduced ambiguity by forcing expectations to be precise and testable. If the test passed, the intent was at least partially preserved. If it failed, the mismatch was visible.
Asking AI to explain code that is already known to work follows the same principle. The goal is not to generate new behavior, but to extract and externalize existing intent. When the AI explains trusted code, it makes hidden assumptions visible. It surfaces invariants that were never written down. It exposes constraints that live only in the structure of the code.
Research on specification failures shows that many bugs are not caused by wrong code, but by misunderstood intent. A practical rule follows from this: if AI cannot produce a clear and accurate explanation of trusted code, the problem is not the AI. The problem is that the intent was never made explicit. AI simply reveals that gap. Used this way, explanation becomes a diagnostic tool. It helps teams identify where documentation, tests, or constraints should exist but do not.
This approach turns AI from a code generator into a specification probe. It does not replace tests or reviews. It complements them by making intent visible before changes are made.
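A small example of the kind of hidden assumption an explanation pass tends to surface. The method below is hypothetical; nothing in its signature says the list must be sorted, yet its meaning silently depends on it.

import java.util.List;
import java.util.Optional;

final class PriceLookup {

    // Implicit invariant: callers must pass prices sorted in ascending order.
    // Only then is the first match also the cheapest one. Nothing enforces this.
    static Optional<Integer> cheapestPriceAbove(List<Integer> sortedPrices, int threshold) {
        for (int price : sortedPrices) {
            if (price > threshold) {
                return Optional.of(price);
            }
        }
        return Optional.empty();
    }
}

Asking an assistant to explain this method yields something like "returns the first price greater than the threshold," which is precisely the gap: the code means "cheapest," but only because of an unwritten sorting assumption.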
Optimize for Cognitive Load, Not Velocity
AI-assisted development optimizes for speed. Human cognition does not. From a psychological perspective, this mismatch is dangerous. The human brain has limited capacity for attention, working memory, and decision-making. When tools increase output speed without reducing mental effort, cognitive load rises quickly. This leads to fatigue, shallow reasoning, and eventually burnout.
Research in cognitive psychology shows that people perform best when task complexity and information flow stay within a manageable range. When too much information arrives too fast, the brain switches from careful reasoning to pattern matching and shortcuts. This is useful for survival, but risky for software engineering. Subtle mistakes are overlooked. Assumptions go unchallenged. Confidence replaces understanding.
AI-assisted tooling amplifies this risk. Generated code appears quickly, often complete and well-structured. This creates an illusion of progress. Developers are tempted to move on before fully understanding what changed. Over time, this leads to a growing gap between what the system does and what the team actually understands.
Burnout is a natural consequence of this pattern. Constant high-speed interaction with AI creates continuous decision pressure: accept or reject, tweak or regenerate, move on or review. There are few natural pauses. The brain stays in a state of constant evaluation. This is mentally exhausting, even if the work feels productive.
There are also longer-term psychological effects. Developers may lose confidence in their own understanding. They may feel responsible for outcomes they did not fully design. Over time, this can reduce motivation and increase anxiety, especially when failures occur in production.
The solution is not to slow down AI. It is to protect human attention.
One effective strategy is deliberate pacing. Introduce intentional pauses between generation and review. Treat AI output as something to be studied, not immediately acted upon. This gives the brain time to switch from reactive mode to analytical mode.
Another strategy is task separation. Use AI for generation, but reserve review, verification, and decision-making for focused sessions. Avoid mixing fast generation with critical judgment in the same moment. This mirrors proven practices in safety-critical domains, where execution and verification are separated.
Limiting scope also reduces cognitive load. Smaller tasks are easier to understand and verify. This aligns with decomposition, small pull requests, and incremental delivery. These practices protect not only system quality, but mental health.
Finally, make intent visible. Clear constraints, tests, and documentation reduce the need to hold everything in working memory. When intent is externalized, the brain can focus on reasoning instead of remembering.
Optimizing for cognitive load does not reduce productivity. It preserves it over time. Velocity without mental sustainability leads to faster failure. Sustainable engineering respects both the limits of machines and the limits of the human mind.
Where This Is Going: Intent-Based Programming
AI did not solve the hardest problem in software. It solved syntax.
The deeper problem has always been translation: turning human intent into something a machine can execute and verify. That problem remains. What has changed is the speed at which mistakes can now propagate.
We are moving away from “writing code” as the central activity and toward designing intent. Domain models, invariants, rules, and scenarios are becoming the primary artifacts. Code increasingly becomes an implementation detail, produced and changed by machines, but constrained by human-defined boundaries.
This shift is not complete, and many parts are still unclear.
We do not yet have a shared understanding of how existing documentation maps cleanly to agent context. Architecture diagrams, ADRs, tests, and coding guidelines all capture intent in different ways, but how these artifacts should be combined, prioritized, or translated for agents is still an open question. What should be enforced? What should be advisory? What should be invisible to the agent? These boundaries are not yet standardized.
We also need to define the human protection layer. AI-assisted development increases cognitive load, accelerates decision pressure, and can hide responsibility behind fluent output. Guardrails are needed not only to protect systems, but to protect people: their attention, judgment, and long-term ability to reason about complex software. This is an engineering concern, not a personal one.
Other questions remain open as well. How much formality is enough to guide AI without overwhelming humans? Where should verification happen, and how much can be automated safely? How do teams preserve understanding when code changes faster than people can absorb it?
The pendulum has not stopped swinging. Its arc has narrowed.
The lesson from 1968 still applies: there is no silver bullet. AI will not eliminate ambiguity. It will hide it unless intent is made explicit and verifiable.
Java developers are well positioned for this transition. We have lived through heavy specifications, agile corrections, executable tests, and contract-driven systems. The discipline that kept large systems alive before AI is the same discipline that makes AI useful now.
The software crisis did not disappear. It evolved.
Not because machines are unintelligent, but because human intent is difficult to express with precision.
And that has always been the real challenge.