AI Coding Agents in Enterprise Java: The Context Problem

How AGENTS.md, Airails skills, and managed policies define the real control surface of AI-assisted development.

Mar 17, 2026

Over the last two years, I’ve used almost every serious coding assistant I could get my hands on. GitHub Copilot, Claude Code, Antigravity, Gemini CLI, Cursor, Codex, internal prototypes. And lately, a lot of time with IBM Bob and Bobshell in real Java repositories.

At first, I used them like most developers do. Small prompts. “Generate a REST controller.” “Refactor this method.” “Add validation.” It felt like autocomplete on steroids. Fast. Impressive. Sometimes scary good.

But the real learning started when I stopped using them as chat tools and started letting them act. Let them explore the repository. Let them fix issues. Let them run builds. Let them touch multiple modules in a large Quarkus project. That’s where things got interesting.

Sometimes they were brilliant. Other times they got lost in my own codebase. They ran the wrong Maven profile. They triggered the full integration test suite for a tiny change. They modified an entity directly from the controller layer. They tried to “improve” things that were actually architectural guardrails.

And I realized something important.

The problem was not the model. The problem was the environment I dropped it into.

Working with IBM Bob and Bobshell especially made this very clear. When you give an agent full repository access, it behaves like a very eager junior developer. It reads a lot. It tries to follow every instruction it sees. It over-explores. It over-tests. It over-thinks.

Three things became clear. First, these systems are extremely sensitive to context. Not just prompts. Context.

Second, more instructions do not make them better. They often make them worse.

Third, repository-level context files are not documentation. They are control surfaces. Once I started treating AGENTS.md and CLAUDE.md like architecture instead of README files, the behavior changed dramatically. Costs dropped. Exploration became more focused. Builds became faster. And the code stayed within the boundaries I actually care about.

That’s what this article is about.

From Prompts to Persistent Context

We have moved from chat-based interaction to autonomous agents that:

Read issues
Explore repositories
Edit multiple files
Run builds
Execute tests
Open pull requests

But these agents start every run without memory. They have no built-in awareness of your:

Maven profiles
Internal libraries
Quarkus Extension capabilities
DevUI or DevMCP
Security policies
Naming conventions

Repository-level context files were created to fill that gap. The most visible example is AGENTS.md, a Markdown file stored at the root of the repository. It functions as a predictable entry point for coding agents.

To prevent fragmentation across vendors, the Agentic AI Foundation, co-founded by organizations such as OpenAI and Anthropic under the umbrella of the Linux Foundation, governs the evolution of the AGENTS.md standard. The goal is portability. One format, many tools.

This matters for enterprise Java. You do not want one context file for Copilot, another for Claude, another for an internal tool. You want a stable contract between your repository and any compliant agent.

The specification is intentionally simple. It is Markdown. No strict schema. No required sections. The complexity comes from how it is used.

The Efficacy Paradox: More Context, Worse Results

A key empirical study, published as arXiv:2602.11988, evaluated whether repository-level context files actually improve coding agent performance. The researchers built a benchmark called AGENTbench using real-world repositories and pull requests.

They tested three scenarios:

No context files
LLM-generated context files
Human-written context files

The result is counterintuitive.

LLM-generated context files reduced task success rates by up to three percent compared to no context at all. They also increased inference cost by more than 20 percent.

Human-written context performed slightly better than baseline. Around four percent improvement. But again, with increased reasoning steps and higher token usage.

The conclusion is simple. More context does not mean better performance. In many cases, it means distraction.

Why does this happen?

When an agent reads a long CLAUDE.md that describes every architectural detail, every testing rule, every formatting guideline, it tries to honor all of them. It explores more files. It runs more tests. It verifies global constraints before solving the local issue.

For a Java monorepo with thousands of classes, this becomes expensive quickly. The agent is behaving like an overly cautious junior engineer on their first day. It reads everything. It touches too much. It slows down.

In large Spring Boot systems, this translates into longer build cycles, excessive test execution, and inflated API costs.

Why Enterprise Java Is a Special Case

Enterprise Java systems amplify this problem.

You have:

Multi-module Maven or Gradle builds
Heavy dependency trees
Spring Boot auto-configuration
Annotation-driven behavior
JPA and Hibernate with lazy loading
Testcontainers-based integration tests
Custom security filters
Corporate proxy and artifact repositories

An unguided agent in this environment can easily:

Run mvn clean install repeatedly
Execute full integration test suites unnecessarily
Modify configuration classes incorrectly
Break dependency injection boundaries
Introduce N+1 query issues in JPA

Java is structured. It is typed. It compiles. That helps. But it also means build and runtime cycles are expensive.

So the role of AGENTS.md in a Java repository is not to teach Java. The models already know Java and Quarkus. The role is to constrain behavior inside your architectural boundaries.

This is a very important shift in thinking.

The Agency Efficiency Principle

Another line of research introduces what is called the Agency Efficiency Principle. The idea is simple. Autonomous reliability does not come from adding more instructions. It comes from carefully limiting them.

Large language models can reliably follow only a limited number of instructions before performance degrades. Internal system prompts already consume a portion of that capacity. Your repository-level context competes for whatever attention budget remains.

For enterprise Java teams, this leads to a practical rule:

Keep repository context files under 200 lines, ideally under 150. Preferably closer to 100.

Every line must answer this question:

If we remove this line, will the agent make a systemic, unrecoverable mistake?

If the answer is no, remove it.
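
A line budget only holds if something checks it. Here is a minimal sketch of a CI guard, written as a plain Java class you can run in source-file mode. The class name ContextFileBudget, the 150-line threshold, and the decision to ignore blank lines are my assumptions for illustration. They are not part of any AGENTS.md tooling.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical CI guard: fails the build when an AGENTS.md file exceeds a line budget. */
class ContextFileBudget {

    // Assumed budget, taken from the rule of thumb in the text.
    static final int MAX_LINES = 150;

    /** Counts non-blank lines only (an assumption: blank lines carry no instructions). */
    static long instructionLines(List<String> lines) {
        return lines.stream().filter(line -> !line.isBlank()).count();
    }

    static boolean withinBudget(List<String> lines, int maxLines) {
        return instructionLines(lines) <= maxLines;
    }

    /** Finds every file named AGENTS.md below the given root directory. */
    static List<Path> findContextFiles(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(p -> p.getFileName() != null
                            && p.getFileName().toString().equals("AGENTS.md"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        boolean failed = false;
        for (Path file : findContextFiles(root)) {
            List<String> lines = Files.readAllLines(file);
            if (!withinBudget(lines, MAX_LINES)) {
                System.err.printf("%s has %d instruction lines (budget %d)%n",
                        file, instructionLines(lines), MAX_LINES);
                failed = true;
            }
        }
        if (failed) {
            System.exit(1); // non-zero exit code fails the CI step
        }
    }
}
```

Run it from the repository root with java ContextFileBudget.java. The point is not the exact number. The point is that the budget is enforced by a deterministic tool, not by good intentions.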

Now here’s the practical problem. If we keep AGENTS.md under 150 lines, where do all the shared conventions go? We still need consistency. We still need patterns. We still need documentation rules.

This is where tools like Airails become interesting.

Airails: “Put Your AI on Rails” with Reusable Skills (and Why This Matters for Java)

One thing that helped me a lot recently is realizing that repository context files (AGENTS.md, CLAUDE.md) are only one layer of control. They are repo-scoped. But many of the rules I care about are not repo-specific at all. They are patterns I want everywhere, across multiple projects, and across multiple assistants.

That’s where airails.dev fits in nicely.

Airails positions itself as “Guidelines for Agentic development” with the simple idea: put your AI on rails. It does this with an ecosystem of downloadable “skills” and “agents” you can install into your coding agent’s skills directory. The site is very explicit about the goal: reuse conventions, enforce boundaries, and stop repeating yourself across repos.

Skills are a better place for “how we build Java” than your repo file

If you look at airails, it provides skill packages that are directly Java-relevant, for example:

A MicroProfile server skill that leans into BCE (Boundary-Control-Entity) architecture for services
Java CLI-related skills (source-file mode, scripts), and even static web skills (if you have a mixed repo)
Documentation skills focused on consistent diagrams (Mermaid, draw.io) and README writing

The important architectural point is this: skills let you keep your repository context file small.

Instead of putting “how we structure services”, “how we draw diagrams”, “how we document modules”, “how we do Java CLI packaging” into every single AGENTS.md, you keep the repo file minimal and reference your installed skills as the shared baseline.

That aligns perfectly with the empirical findings we discussed earlier: long context files increase exploration and cost, and can reduce success rates.

Agents and subagents: “roles” you can standardize

Airails also links to specialized subagent definitions (it points to agents.md and an example for Java builds using zb). The idea is to give the assistant a role with a narrow scope, for example “Java build agent”, “documentation agent”, or “diagram agent”.

In enterprise Java, this maps well to reality: builds, tests, docs, and refactors are different jobs. When one general agent tries to do all of it at once, it tends to over-explore and over-run things.

The practical pattern I recommend (especially for Maven + Quarkus)

If you adopt airails-style skills, your repo-level context file can become very small and very sharp:

State only the repo-specific constraints (module boundaries, forbidden dependencies, required profiles, and the “one true” build/test commands).
Point the agent to your installed skills for everything that is generic and reusable (BCE conventions, diagram style, documentation conventions).

This gives you the best of both worlds:

Minimal AGENTS.md (less token drag, less instruction dilution)
Strong conventions that follow you across projects and assistants (Copilot, Claude Code, Codex, Bob/Bobshell), because they are installed once and reused

What Not to Put in AGENTS.md

Once you move shared conventions into reusable skills, something interesting happens. Your repository context file becomes much smaller. And that forces a hard but healthy question: what absolutely does not belong in AGENTS.md anymore?

Do Not Put Formatting Rules

Many teams add rules like:

Use two-space indentation
Use single quotes
Sort imports alphabetically
Use Lombok annotations consistently

This is a mistake.

LLMs are probabilistic systems. They should not act as deterministic linters. Formatting and style must be enforced by tools such as:

Spotless
Checkstyle
Biome (for frontend)
Prettier (for frontend)

Instead of explaining formatting rules in AGENTS.md, write one line:

Run mvn spotless:apply before finalizing changes.

This offloads deterministic work to deterministic tools.

Do Not Document What the Model Already Knows

Do not write:

Use @Service for business logic
Use @RestController for REST endpoints
Use constructor injection in Spring

The model has seen millions of Spring Boot projects. It already knows this.

Your context file must focus on project-specific constraints. Not textbook Spring knowledge.

What to Put in AGENTS.md for Java

Now let’s focus on high-value constraints.

1. Build Coordination Rules

In Maven-based systems, agents often run build commands one at a time, each in a separate invocation. That is slow.

Your context file should explicitly define the build strategy:

Always use parallel builds: mvn clean install -T 1C -DskipTests
Batch commands into one execution step
Never run full integration tests unless explicitly required

This prevents repeated JVM startups and redundant builds.

2. Layered Architecture Boundaries

If your project uses a strict layered architecture:

web → service → repository
No direct access from web to entity
DTO mapping required at service boundary

State this explicitly.

For example:

Controllers must never access JPA entities directly. Always map entities to DTOs via the UserMapper interface before returning responses.

This prevents the agent from leaking persistence models into REST APIs.
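
To make the boundary concrete, here is a minimal sketch of what that rule protects. It uses plain Java so it runs standalone. In a real project UserEntity would be a JPA entity and UserController a REST resource. Apart from UserMapper, every name here (UserEntity, UserDto, UserService, UserController, LayeringDemo) is an assumption for illustration, and the mapper's method signature is my guess, not the one from any real codebase.

```java
import java.util.Map;
import java.util.Optional;

class LayeringDemo {
    public static void main(String[] args) {
        UserMapper mapper = entity -> new UserDto(entity.id, entity.email);
        UserService service = new UserService(
                Map.of(1L, new UserEntity(1L, "ada@example.com", "hash")), mapper);
        System.out.println(new UserController(service).getUser(1L));
    }
}

// Persistence model. Would carry @Entity in a real project.
class UserEntity {
    final long id;
    final String email;
    final String passwordHash; // must never reach the REST layer

    UserEntity(long id, String email, String passwordHash) {
        this.id = id;
        this.email = email;
        this.passwordHash = passwordHash;
    }
}

// API model exposed by the web layer. Contains only what clients may see.
record UserDto(long id, String email) {}

// The mapper named in the rule above. Its shape is hypothetical.
interface UserMapper {
    UserDto toDto(UserEntity entity);
}

// Service layer: the only layer that sees entities, and where mapping happens.
class UserService {
    private final Map<Long, UserEntity> repository; // stand-in for a real repository
    private final UserMapper mapper;

    UserService(Map<Long, UserEntity> repository, UserMapper mapper) {
        this.repository = repository;
        this.mapper = mapper;
    }

    Optional<UserDto> findUser(long id) {
        return Optional.ofNullable(repository.get(id)).map(mapper::toDto);
    }
}

// Web layer: depends on UserService and UserDto only, never on UserEntity.
class UserController {
    private final UserService service;

    UserController(UserService service) {
        this.service = service;
    }

    UserDto getUser(long id) {
        return service.findUser(id)
                .orElseThrow(() -> new IllegalArgumentException("No user with id " + id));
    }
}
```

Notice that UserController has no import or reference to UserEntity at all. That is the property you want the agent to preserve, and it is easy to verify in review.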

3. Transaction and Concurrency Constraints

If you rely on @Transactional semantics, state where transactions are allowed:

Only service layer methods may be annotated with @Transactional
Repositories must not define transactional boundaries

Also define concurrency and security rules:

Use optimistic locking with @Version for entities handling financial data
Never use java.util.Random for security-sensitive operations. Use java.security.SecureRandom instead

These are architectural guardrails. They reduce the risk of subtle data corruption.
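
Here is a small sketch of what these two guardrails mean in code. It is plain Java, so the version check is done by hand. It only imitates what a JPA @Version column gives you and is not how Hibernate implements it. The names GuardrailSketch and VersionedAccount are assumptions for illustration.

```java
import java.security.SecureRandom;
import java.util.Base64;

class GuardrailSketch {
    // SecureRandom draws from a cryptographically strong source, unlike java.util.Random.
    private static final SecureRandom SECURE_RANDOM = new SecureRandom();

    /** Generates a URL-safe token suitable for security-sensitive use. */
    static String newToken(int numBytes) {
        byte[] bytes = new byte[numBytes];
        SECURE_RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        System.out.println("token: " + newToken(32));

        VersionedAccount account = new VersionedAccount(10_000);
        long versionSeen = account.version();      // two writers read the same version
        account.updateBalance(versionSeen, 9_000); // first writer wins
        try {
            account.updateBalance(versionSeen, 8_000); // second writer is stale
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

/** Hand-rolled optimistic locking: the version field plays the role of a @Version column. */
class VersionedAccount {
    private long balanceCents;
    private long version;

    VersionedAccount(long balanceCents) {
        this.balanceCents = balanceCents;
    }

    synchronized long version() {
        return version;
    }

    synchronized long balanceCents() {
        return balanceCents;
    }

    /** Applies the write only if the caller read the current version. */
    synchronized void updateBalance(long expectedVersion, long newBalanceCents) {
        if (expectedVersion != version) {
            throw new IllegalStateException(
                    "stale write, expected version " + expectedVersion + " but found " + version);
        }
        balanceCents = newBalanceCents;
        version++;
    }
}
```

The second write fails loudly instead of silently overwriting the first. That is exactly the behavior you lose when an agent removes a version field it considers unnecessary.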

4. JPA Performance Constraints

LLMs frequently generate naive repository methods that cause N+1 query problems.

If your system requires explicit fetch strategies, document this:

When loading Order entities with associated OrderItems, always use JOIN FETCH or a defined @EntityGraph. Do not rely on default lazy loading in loops.

This single instruction can prevent major performance regressions.
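
If the N+1 problem is new to you, the following sketch shows why it matters. It is a simulation, not JPA code. FakeDatabase is a hypothetical in-memory stand-in that counts every query it serves, and the JPQL strings in the comments only indicate which real query each method imitates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class NPlusOneDemo {

    /** In-memory stand-in for a database that counts every query it serves. */
    static class FakeDatabase {
        private final Map<Long, List<String>> itemsByOrderId;
        int queryCount = 0;

        FakeDatabase(Map<Long, List<String>> itemsByOrderId) {
            this.itemsByOrderId = itemsByOrderId;
        }

        // Imitates: SELECT o FROM Order o
        List<Long> findOrderIds() {
            queryCount++;
            return new ArrayList<>(itemsByOrderId.keySet());
        }

        // Imitates the query a lazy collection fires on first access.
        List<String> findItems(long orderId) {
            queryCount++;
            return itemsByOrderId.get(orderId);
        }

        // Imitates: SELECT o FROM Order o JOIN FETCH o.items
        Map<Long, List<String>> findOrdersWithItems() {
            queryCount++;
            return itemsByOrderId;
        }
    }

    /** One query for the orders, then one more per order: the N+1 pattern. */
    static int countItemsLazily(FakeDatabase db) {
        int total = 0;
        for (long orderId : db.findOrderIds()) {
            total += db.findItems(orderId).size();
        }
        return total;
    }

    /** A single query that loads orders and items together. */
    static int countItemsWithJoinFetch(FakeDatabase db) {
        return db.findOrdersWithItems().values().stream().mapToInt(List::size).sum();
    }

    public static void main(String[] args) {
        Map<Long, List<String>> data = Map.of(
                1L, List.of("book", "pen"),
                2L, List.of("lamp"),
                3L, List.of("desk", "chair"));

        FakeDatabase lazy = new FakeDatabase(data);
        countItemsLazily(lazy);
        FakeDatabase fetched = new FakeDatabase(data);
        countItemsWithJoinFetch(fetched);

        System.out.println("lazy loop queries:  " + lazy.queryCount);    // 4
        System.out.println("join fetch queries: " + fetched.queryCount); // 1
    }
}
```

With three orders the difference is four queries versus one. With ten thousand orders it is ten thousand and one versus one.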

Progressive Disclosure in Large Monorepos

In large Java monorepos, a single root AGENTS.md is not enough. But you also cannot dump everything into it.

The solution is progressive disclosure.

Root File: The Map

The root AGENTS.md should describe:

High-level module structure
Build strategy
Non-negotiable global constraints
Where to find deeper rules

It is a map. Not the territory.

Nested Context Files

Place additional AGENTS.md files inside submodules:

/backend/AGENTS.md
/frontend/AGENTS.md
/security/AGENTS.md

Agents read the nearest file first. Closer files override broader rules.

This keeps the context window focused.

If the agent works in a React frontend package, it does not need JPA constraints. If it modifies a Spring Boot microservice, it does not need frontend ESLint rules.

This isolation aligns with the Agency Efficiency Principle. Only load what is relevant.
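
The nearest-first lookup is easy to picture in code. This is a minimal sketch of the idea, not the implementation of any particular agent. The class name ContextFileResolver and the injected exists predicate are assumptions I made so the logic can be exercised without touching the file system.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class ContextFileResolver {

    /**
     * Walks from the working directory up to the repository root and collects
     * every AGENTS.md on the way. The result is ordered nearest first, so the
     * first entry is the most specific file and the last is the root map.
     */
    static List<Path> resolve(Path repoRoot, Path workingDir, Predicate<Path> exists) {
        Path root = repoRoot.normalize();
        Path dir = workingDir.normalize();
        if (!dir.startsWith(root)) {
            throw new IllegalArgumentException(workingDir + " is outside " + repoRoot);
        }
        List<Path> found = new ArrayList<>();
        while (dir != null && dir.startsWith(root)) {
            Path candidate = dir.resolve("AGENTS.md");
            if (exists.test(candidate)) {
                found.add(candidate);
            }
            if (dir.equals(root)) {
                break; // never look above the repository root
            }
            dir = dir.getParent();
        }
        return found;
    }

    public static void main(String[] args) {
        Path root = Path.of(".").toAbsolutePath().normalize();
        Path workingDir = args.length > 0 ? root.resolve(args[0]) : root;
        resolve(root, workingDir, Files::exists).forEach(System.out::println);
    }
}
```

For an agent working somewhere below /backend, this returns /backend/AGENTS.md first and the root file second. The frontend file never shows up, which is the whole point.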

Enterprise Governance with Managed Policies

At scale, repository-level context is not enough.

Developers can edit or delete AGENTS.md. Teams can drift. Security requirements can be ignored accidentally.

Some agent platforms support managed policies. These are global context files deployed at the operating system level, for example in system directories on macOS, Linux, or Windows.

These policies:

Apply to every repository
Cannot be bypassed by local configuration
Override conflicting repository-level rules

For enterprise Java teams, managed policies are where you enforce:

Secret management rules
Prohibition of hardcoded credentials
Mandatory use of internal Maven repositories
Licensing constraints
Logging sanitization requirements

But the same rule applies. Keep it short. Under 200 lines. Preferably much less.

A bloated managed policy degrades performance across every agent interaction in the company.

Architectural Implications for Java Teams

If you are new to AI-assisted development, here is the key shift:

Context files are not documentation. They are control surfaces.

They shape how an autonomous system explores your repository. They influence cost, latency, and correctness.

In enterprise Java ecosystems, the most effective context files:

Remove style and formatting noise
Enforce architectural boundaries
Constrain build behavior
Define performance-sensitive rules
Use progressive scoping in monorepos
Separate repository rules from global enterprise mandates

Strategic Takeaways

For architects and technical leads in Java organizations, this topic is not about writing better prompts. It is about designing a stable interface between your codebase and probabilistic systems.

Here are the principles to apply:

Less context improves reliability.
Deterministic concerns belong in CI/CD tools.
Architectural boundaries must be explicit.
Performance constraints must be encoded, not assumed.
Governance must exist above the repository level.

AI coding agents can accelerate enterprise Java development. But only if we treat context engineering as an architectural discipline.

Otherwise, we recreate a familiar problem in a new form: complex systems failing silently because no one defined the boundaries clearly enough.

The difference now is that the junior engineer making mistakes is not human. It is a model with a large context window and no memory.
