The Java Agent Skills Kit
A repo-local way to give Bob repeatable workflows for architecture review, modernization, dependency risk, LSP navigation, and test-first change in large Java systems.
I have become suspicious of long lists of agent commands.
They look useful at first. /trace. /explain. /review. /find-usages. Nice names. But after the third command you usually discover the problem: the command is only a prompt with a better hat. It does not know how your team slices modules, where the scary dependencies live, which migration rules are allowed, or what “done” means in your build.
For Java work, that is too thin.
Java codebases are old in a very specific way. They have Maven parents, Gradle convention plugins, generated sources, architectural exceptions from 2017, batch jobs nobody wants to touch, and one service that still makes javax.* appear in places where Jakarta should live. An agent can help there, but only if the workflow carries the boring project knowledge with it.
That is where I now prefer skills over one-off prompts.
In Bob, a skill is a reusable instruction set with a SKILL.md file and optional supporting files. The Bob Skills docs describe them as project-level or global workflows under .bob/skills/ or ~/.bob/skills/, with a name, description, and instructions. Bob loads the instructions when the skill is activated, and supporting files can sit next to it. That small filesystem detail matters. It means the workflow can be reviewed, versioned, and improved like code.
A command list is a useful appendix. The useful article is the one step earlier: which Java workflows are worth encoding as skills in the first place.
So the shape below keeps ten things in play, but not as ten magic commands. Some are skills. Some are tools that make the skills less stupid. That distinction matters.
The Small Stack
The ten pieces are simple:
Skills describe reusable Java workflows.
Slash commands make the workflow easy to invoke.
Modes constrain what the agent can do.
JBang scripts collect Java-specific facts locally.
LSP4J-MCP gives the agent IDE-grade Java symbols, definitions, and references.
Context7 MCP brings current framework docs into the workflow.
Snyk MCP brings dependency and security findings into the workflow.
ArchUnit turns architecture review into executable rules.
OpenRewrite turns modernization into reproducible dry-run patches.
Tests and dry runs decide whether the output is real.
Bob’s slash command docs show the lightweight part: a command is a Markdown file in .bob/commands/ or ~/.bob/commands/, with optional front matter like description and argument-hint. That is good for the entry point. It is not where I would hide a 60-step migration workflow.
Bob’s custom modes are the permission side. A project mode can restrict tools and even file edits by regex. For Java architecture work, that is useful. I can give the agent read access, command execution, and permission to edit tests or docs, while blocking random production edits until the plan has survived a review.
JBang fills the gap between “LLM reasoning” and “repository reality.” The JBang docs put the core idea plainly: run Java scripts without creating a full project, and pull dependencies when needed. For a Java team, this is much nicer than teaching every developer a new scripting language just so the agent can inspect pom.xml.
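As a sketch of that idea, here is a minimal JBang script that lists the dependencies declared in a pom.xml. The class name and the regex-based extraction are illustrative; this is not a real Maven parser and will miss unusual POM layouts:

```java
///usr/bin/env jbang "$0" "$@" ; exit $?

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: prints groupId:artifactId pairs declared in a pom.xml.
class pom_deps {

    // Matches <dependency> blocks where groupId precedes artifactId,
    // which is the common layout. A sketch, not an XML parser.
    private static final Pattern DEP = Pattern.compile(
        "<dependency>\\s*<groupId>([^<]+)</groupId>\\s*<artifactId>([^<]+)</artifactId>");

    // Pure function so the extraction logic is testable without a file.
    static List<String> extract(String pomXml) {
        List<String> out = new ArrayList<>();
        Matcher m = DEP.matcher(pomXml);
        while (m.find()) {
            out.add(m.group(1) + ":" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Path pom = Path.of(args.length == 0 ? "pom.xml" : args[0]);
        for (String dep : extract(Files.readString(pom))) {
            System.out.println(dep);
        }
    }
}
```

Run it with `jbang pom_deps.java` from the module root. The agent gets a short, repeatable fact list instead of re-reading the whole POM every turn.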
MCP is the connector layer. IBM’s Bob MCP tutorial describes MCP as a way to connect Bob to external tools at global or project level, with STDIO for local servers and SSE for HTTP-based servers. I use that for things like current framework docs, security scanners, ticket systems, and internal knowledge. The important part is not that the model can “browse.” The important part is that the workflow says which external tool is allowed to answer which question.
Here is the shape I like in a repository:
.bob/
  commands/
    java-boundaries.md
    modernize-jakarta.md
    dependency-risk.md
    trace-flow.md
    java-symbols.md
  skills/
    java-boundary-review/
      SKILL.md
      archunit-template.java
    java-modernization-slice/
      SKILL.md
      rewrite-recipes.md
    java-dependency-risk/
      SKILL.md
      snyk-triage.md
    java-symbol-navigation/
      SKILL.md
      lsp4j-mcp-config.md
    java-flow-trace/
      SKILL.md
      scripts/
        flow_index.java
    java-test-gap/
      SKILL.md
      test-patterns.md
  custom_modes.yaml

This is boring. Good. Boring means I can explain it to a team lead without using the word “autonomous” more than once.
Skill 1: Java Boundary Review
This is the first skill I would add to a serious Java repo.
A weak version of this workflow is /architect-review. That sounds useful, but it leaves too much open. Review what architecture? Against which rule? With which exceptions? A better skill turns architecture review into a path toward executable architecture tests.
Put this in .bob/skills/java-boundary-review/SKILL.md:
---
name: java-boundary-review
description: Review Java package and module boundaries, then turn stable findings into ArchUnit tests
---
You review Java architecture boundaries in this repository.
Workflow:
1. Identify the build tool and module structure.
2. Read existing architecture tests, package conventions, and module names before making claims.
3. Inspect production dependencies across controllers, services, domain, persistence, messaging, and clients.
4. Separate three categories:
- confirmed boundary violations
- intentional exceptions already documented in code or tests
- suspicious dependencies that need human confirmation
5. Propose ArchUnit rules only for confirmed, stable boundaries.
6. Prefer adding or updating tests under `src/test/java`.
7. Do not change production code during this skill unless explicitly asked.
8. Finish with:
- rules added or proposed
- violations found
- exceptions to document
- commands run

The supporting file archunit-template.java can hold the house style:
package com.acme.architecture;

import com.tngtech.archunit.junit.AnalyzeClasses;
import com.tngtech.archunit.junit.ArchTest;
import com.tngtech.archunit.lang.ArchRule;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;
import static com.tngtech.archunit.library.dependencies.SlicesRuleDefinition.slices;

@AnalyzeClasses(packages = "com.acme")
class ArchitectureTest {

    @ArchTest
    static final ArchRule controllers_do_not_access_repositories =
        noClasses()
            .that().resideInAPackage("..controller..")
            .should().accessClassesThat().resideInAPackage("..repository..");

    @ArchTest
    static final ArchRule application_slices_are_cycle_free =
        slices()
            .matching("com.acme.(*)..")
            .should().beFreeOfCycles();
}

The ArchUnit user guide already has the pieces: package checks, layer checks, cycle checks, and JUnit integration. The skill should not ask the model to invent architecture out of vibes. It should ask the model to find a rule that can become a test. I have written about ArchUnit before. Go check it out.
That changes the conversation.
Instead of:
“This coupling might be risky.”
I want:
“Controllers currently access repositories in three places. Two are legacy admin endpoints. One is new. If the intended rule is controller -> service -> repository, this ArchUnit test captures it and fails on those three classes.”
That is a real review. It leaves a scar in the build.
The slash command can stay tiny:
---
description: Review Java architecture boundaries for a module or package
argument-hint: <module-or-package>
---
Use the java-boundary-review skill for $1.
Do not edit production code.
Prefer executable ArchUnit rules over prose findings.

Slash commands are the handle. Skills are the tool.
Skill 2: Modernize One Slice, Not the Whole Empire
Modernization is where agents become dangerous in a very polite way. They can rename imports, update dependencies, change tests, and produce a diff that looks coherent until your CI spends 40 minutes explaining that it is not.
For Java, a modernization skill should start with a slice:
one Maven module
one Gradle subproject
one package family
one migration class, such as javax.* to jakarta.*
one framework jump, such as a Spring Boot minor upgrade
It should also use tools that already know Java migrations. OpenRewrite is the obvious one. Its Maven plugin docs distinguish rewrite:run, which applies changes locally, from rewrite:dryRun, which previews changes and writes a patch under target/site/rewrite/rewrite.patch. That is exactly the kind of gate an agent needs.
Here is the skill:
---
name: java-modernization-slice
description: Plan and execute a bounded Java modernization using OpenRewrite dry runs, build output, and human approval
---
You modernize one bounded Java slice at a time.
Rules:
1. Never start by editing production code.
2. Identify the exact slice: module, package, dependency, or recipe.
3. Read the build files and current Java/framework versions.
4. Use Context7 or official docs for version-sensitive API behavior.
5. Prefer OpenRewrite dry runs before hand edits.
6. Show the dry-run diff summary and risk ledger before applying changes.
7. Only run `rewrite:run` or edit files after approval.
8. After changes, run the smallest relevant build or test command.
Output:
- target slice
- current version facts
- recipe or manual change plan
- dry-run result
- files expected to change
- risks and rollback path
- verification command and result

A Jakarta migration command might look like this:
mvn -U org.openrewrite.maven:rewrite-maven-plugin:dryRun \
-Drewrite.recipeArtifactCoordinates=org.openrewrite.recipe:rewrite-migrate-java:LATEST \
-Drewrite.activeRecipes=org.openrewrite.java.migrate.jakarta.JavaxMigrationToJakarta

The point is not that OpenRewrite will solve every migration. It will not. The point is that the agent starts from a reproducible mechanical pass, then reasons over the patch. That is better than letting it freehand 300 imports across a monorepo because it “understood the intent.”
The skill should also know when to stop:
Stop conditions:
- The dry-run patch touches modules outside the requested slice.
- The build uses generated sources the recipe does not parse.
- The migration changes public API types.
- Tests fail in unrelated modules.
- The required framework version is unclear from official docs.

This is the kind of sentence that makes a skill useful. It gives the agent permission to be boring and careful.
Skill 3: Dependency Risk Triage
Dependency upgrades are where “just update the version” goes to become a work item with meetings.
You need more than CVE severity. You need to know whether the vulnerable code is reachable, whether the dependency is direct or transitive, whether the fix changes runtime behavior, and whether your framework already manages the version through a BOM.
Snyk’s MCP server is useful here because it brings scanner output into the agent workflow. The Snyk MCP docs say the server is started through the Snyk CLI with supported transports like snyk mcp -t stdio or snyk mcp -t sse. In a local Java repo, STDIO is usually the clean default.
Example MCP configuration:
{
  "mcpServers": {
    "snyk": {
      "command": "snyk",
      "args": ["mcp", "-t", "stdio"]
    }
  }
}

Do not commit tokens into project config. Use environment variables or user-level settings for secrets. I know that sentence is boring. It is also the sentence that saves the next incident review 20 minutes.
Now the skill:
---
name: java-dependency-risk
description: Triage Maven or Gradle dependency risk using dependency trees, Snyk findings, and Java runtime context
---
You triage dependency risk for Java services.
Workflow:
1. Identify whether the project uses Maven, Gradle, or both.
2. Generate the dependency tree for the requested module.
3. Run the configured security scanner through MCP if available.
4. Separate direct dependencies from transitive dependencies.
5. Check whether versions are managed by a BOM, parent POM, platform, or convention plugin.
6. For each finding, explain:
- affected dependency path
- direct owner in this repo
- runtime reachability assumption
- safest fix option
- test or smoke check required
7. Do not upgrade a BOM casually. Propose it as a separate change unless the user asked for it.
8. End with a one-screen merge recommendation.

The command can run a local tree first:
mvn -q -DskipTests dependency:tree -DoutputFile=target/dependency-tree.txt

For Gradle:
./gradlew dependencies --configuration runtimeClasspath

The agent then has scanner output, dependency ownership, and build context in one place. That is where it can help. Raw vulnerability lists are often just anxiety in JSON form.
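To make the direct-versus-transitive split mechanical, a small JBang helper can tag each line of the Maven tree output by depth. This is a sketch that assumes the standard `dependency:tree` text layout; the file and class names are illustrative:

```java
///usr/bin/env jbang "$0" "$@" ; exit $?

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: classifies entries in `mvn dependency:tree` text output.
// Depth 1 under the module root means a direct dependency; deeper means transitive.
class tree_depth {

    // Maven draws the tree with 3-character segments: "+- ", "\- ", "|  ", "   ".
    static int depth(String line) {
        int i = 0;
        while (i + 3 <= line.length()) {
            String chunk = line.substring(i, i + 3);
            if (chunk.equals("+- ") || chunk.equals("\\- ")) {
                return i / 3 + 1;
            }
            if (chunk.equals("|  ") || chunk.equals("   ")) {
                i += 3;
                continue;
            }
            break; // the module root line itself, or non-tree text
        }
        return 0;
    }

    public static void main(String[] args) throws Exception {
        Path file = Path.of(args.length == 0 ? "target/dependency-tree.txt" : args[0]);
        List<String> lines = Files.readAllLines(file);
        for (String line : lines) {
            int d = depth(line);
            if (d == 1) System.out.println("DIRECT     " + line.trim());
            else if (d > 1) System.out.println("TRANSITIVE " + line.trim());
        }
    }
}
```

Run it after the `dependency:tree` command above and the skill can reason over an explicit ownership list instead of eyeballing ASCII art.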
Skill 4: Java Symbols Through LSP4J-MCP
Search is useful. Java symbol search is better.
grep can find text. It cannot reliably tell you which overloaded method is called, where an interface method is implemented, or whether a reference comes from production code, generated code, or a test fixture. Large Java repositories are full of names that repeat because Java developers are not paid by the unique noun.
LSP4J-MCP is a Java MCP server that wraps Eclipse JDTLS through LSP4J and exposes Java IDE features as MCP tools. Its README lists tools for find_symbols, find_references, find_definition, document_symbols, and find_interfaces_with_method. It needs Java 21, Maven, and JDTLS, then runs as a local MCP server against a workspace path.
That makes it a strong fit for an agent skill:
---
name: java-symbol-navigation
description: Use LSP-backed Java symbol navigation before reasoning about references, definitions, and implementations
---
You use Java language-server facts before making claims about symbols.
Use this skill when work involves:
- finding usages of a class, method, field, or constructor
- understanding interface implementations
- locating definitions across a multi-module workspace
- preparing a refactor
- checking whether a method is dead, overloaded, or framework-called
Workflow:
1. Confirm LSP4J-MCP is configured for the current workspace.
2. Use symbol tools before falling back to text search.
3. For a method-level question, resolve the definition first, then references.
4. Separate production references, test references, generated code, and framework entry points.
5. Treat reflection, annotations, serialization, configuration keys, and dependency injection as possible non-LSP references.
6. End with exact files and symbols, not only package-level prose.

Example MCP configuration, adapted from the LSP4J-MCP README:
{
  "mcpServers": {
    "java-lsp": {
      "command": "java",
      "args": [
        "-jar",
        "/path/to/LSP4J-MCP/target/lsp4j-mcp-1.0.0-SNAPSHOT.jar",
        "/path/to/your/java/project",
        "jdtls"
      ],
      "env": {
        "LOG_FILE": "/tmp/lsp4j-mcp.log"
      }
    }
  }
}

The skill should still be humble. LSP sees Java symbols; it does not see every runtime edge. CDI injection, Spring wiring, Jackson properties, JPA entity names, Kafka topic strings, and reflection can still matter. So the rule is: use LSP before grep, then use grep for the things the language server cannot know.
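Even that fallback step can stay in Java. A hypothetical JBang sketch that looks for a symbol name in properties, YAML, and XML files, where JDTLS has no visibility (the script name and file-type list are illustrative):

```java
///usr/bin/env jbang "$0" "$@" ; exit $?

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical fallback scanner: finds a symbol name in configuration files
// where the language server cannot see references.
class config_refs {

    // Files that commonly carry string-based references to Java symbols.
    static boolean configFile(String name) {
        return name.endsWith(".properties") || name.endsWith(".yaml")
            || name.endsWith(".yml") || name.endsWith(".xml");
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: config_refs <symbol> [root]");
            System.exit(1);
        }
        String symbol = args[0];
        Path root = Path.of(args.length > 1 ? args[1] : "src");
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> configFile(p.toString()))
                 .forEach(p -> {
                     try {
                         if (Files.readString(p).contains(symbol)) {
                             System.out.println(p);
                         }
                     } catch (Exception e) {
                         // Binary or unreadable file: skip quietly.
                     }
                 });
        }
    }
}
```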
The slash command stays small:
---
description: Navigate Java symbols using the configured LSP MCP server
argument-hint: <symbol-or-file-location>
---
Use the java-symbol-navigation skill for $1.
Prefer LSP4J-MCP symbol tools before text search.
When reporting references, separate production, test, generated, and framework-driven usages.

This is one of the few MCP servers that feels directly Java-native. It gives the agent a piece of the IDE, not another pile of text.
Skill 5: Flow Trace from Code, Not Diagrams
Every large Java system has at least one diagram that is technically still a diagram and emotionally a fossil.
When I ask an agent to trace a flow, I do not want it to wander the repository narrating filenames. I want an index of entry points and handoffs:
REST controllers
message listeners
scheduled jobs
transactional boundaries
repository calls
HTTP clients
domain events
This is a good job for a small JBang script. The script does not understand the whole program. It just extracts the boring evidence so the agent can reason from something repeatable.
Put this under .bob/skills/java-flow-trace/scripts/flow_index.java:
///usr/bin/env jbang "$0" "$@" ; exit $?

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class flow_index {

    private static final Pattern INTERESTING = Pattern.compile(
        "@(Path|GET|POST|PUT|DELETE|PATCH|GetMapping|PostMapping|PutMapping|DeleteMapping|KafkaListener|JmsListener|Scheduled|Transactional)\\b([^\\n]*)");

    private static final Pattern TYPE = Pattern.compile(
        "(class|interface|record|enum)\\s+([A-Za-z0-9_]+)");

    public static void main(String[] args) throws Exception {
        Path root = args.length == 0 ? Path.of("src/main/java") : Path.of(args[0]);
        if (!Files.exists(root)) {
            System.err.println("Path does not exist: " + root);
            System.exit(1);
        }
        try (var files = Files.walk(root)) {
            files.filter(path -> path.toString().endsWith(".java"))
                 .forEach(flow_index::printSignals);
        }
    }

    private static void printSignals(Path file) {
        try {
            String source = Files.readString(file);
            Matcher signal = INTERESTING.matcher(source);
            if (!signal.find()) {
                return;
            }
            String typeName = "unknown";
            Matcher type = TYPE.matcher(source);
            if (type.find()) {
                typeName = type.group(2);
            }
            System.out.println();
            System.out.println("## " + file);
            System.out.println("type: " + typeName);
            signal.reset();
            while (signal.find()) {
                System.out.println("- " + signal.group().trim());
            }
        } catch (Exception e) {
            System.err.println("Skipping " + file + ": " + e.getMessage());
        }
    }
}

The skill wraps it:
---
name: java-flow-trace
description: Trace a Java business flow across REST, messaging, scheduling, transactions, and persistence
---
You trace one named business flow through Java code.
Workflow:
1. Ask for the flow name if it is missing.
2. Run `jbang .bob/skills/java-flow-trace/scripts/flow_index.java` for the relevant source root.
3. Use `java-symbol-navigation` when the flow depends on overridden methods, interface implementations, or ambiguous call sites.
4. Search code for flow vocabulary, endpoint paths, event names, DTO names, and database terms.
5. Build a narrative from entry point to state change.
6. Mark inferred steps clearly.
7. Separate runtime facts from static-code guesses.
8. List files that matter for a debugger session.
9. End with missing observability: logs, metrics, spans, or correlation IDs that would make this flow easier to prove.

That last step is important. A static trace should make the next production trace less terrible. Otherwise you only created nicer archaeology.
Skill 6: Test Gap Before Implementation
Agents are very good at producing code that passes the tests you forgot to write.
This skill runs before implementation. Its job is to translate a change request into the smallest useful test surface. It should not write a giant test plan. It should identify the tests that would catch the most likely wrong implementation.
---
name: java-test-gap
description: Turn a Java change request into focused tests before implementation starts
---
You identify test gaps before Java implementation.
Workflow:
1. Read the change request and affected code.
2. Identify the observable behavior, not only the classes to edit.
3. List existing tests that already cover the behavior.
4. Propose the smallest missing tests:
- one happy path if missing
- one failure path
- one boundary or retry/idempotency case if relevant
5. Prefer tests that fail before the implementation.
6. Do not mock across boundaries that the production code does not own.
7. If the system uses Quarkus, Spring Boot, Testcontainers, WireMock, REST Assured, Mockito, or AssertJ already, follow the existing style.
8. Stop after tests unless the user explicitly asks for implementation.

Example request:
POST /orders should be idempotent when the client retries with the same Idempotency-Key.

A weak agent starts editing OrderService.
This skill should first ask for or create a test that proves the contract:
@Test
void retryWithSameIdempotencyKeyDoesNotCreateSecondOrder() {
    String key = UUID.randomUUID().toString();

    given()
        .header("Idempotency-Key", key)
        .contentType("application/json")
        .body("""
            {"sku":"coffee-beans","quantity":2}
            """)
    .when()
        .post("/orders")
    .then()
        .statusCode(201)
        .body("status", equalTo("ACCEPTED"));

    given()
        .header("Idempotency-Key", key)
        .contentType("application/json")
        .body("""
            {"sku":"coffee-beans","quantity":2}
            """)
    .when()
        .post("/orders")
    .then()
        .statusCode(200)
        .body("status", equalTo("ACCEPTED"));

    given()
    .when()
        .get("/orders")
    .then()
        .body("findAll { it.sku == 'coffee-beans' }.size()", equalTo(1));
}

Maybe your project uses repository assertions instead of HTTP assertions. Fine. The skill should follow the project. The important part is the sequence: contract first, implementation second.
Skill 7: Framework Docs Without Hallucinated APIs
Java frameworks move slower than frontend frameworks, but they still move. Quarkus extensions change names. Spring Boot upgrades change defaults. Jakarta migrations remove old comfort imports. LangChain4j APIs evolve. The agent’s memory is not a dependency management strategy.
For framework-specific work, I use a docs-grounded skill:
---
name: java-framework-docs-check
description: Check current framework documentation before changing Java framework APIs, extension names, or configuration
---
You verify version-sensitive Java framework facts before implementation.
Use this skill when work touches:
- Quarkus extensions or configuration
- Spring Boot starters or auto-configuration
- Jakarta EE namespace migrations
- Hibernate behavior
- LangChain4j APIs
- MCP server/client setup
- build plugin versions
Workflow:
1. Identify the framework and version from the repository.
2. Use configured documentation MCP tools or official docs for current behavior.
3. Quote only short facts; link to the source.
4. Compare docs against existing code.
5. Mark anything inferred from code separately from documented behavior.
6. Only then propose code or configuration changes.

Context7 is one useful MCP option here. Its client docs show both remote and local MCP configuration, including a remote endpoint at https://mcp.context7.com/mcp and local npx setup. I would not make the whole team depend on one docs server blindly, but I do want the skill to say: “For version-sensitive framework work, fetch current docs first.”
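For the local option, the configuration follows the same shape as the other MCP entries in this article. A sketch, assuming the Context7 npm package name from its docs; verify it against the current README before committing:

```json
{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    }
  }
}
```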
This is especially useful for examples that look right because they were right two releases ago. Those are the worst examples. They have excellent camouflage.
A Mode That Keeps the Agent Honest
Skills describe the job. Modes control the blast radius.
For architecture and modernization, I like a mode that can read everything, run commands, use MCP, and edit only tests, docs, and Bob workflow files until explicitly told otherwise.
Example .bob/custom_modes.yaml:
customModes:
  - slug: java-architect
    name: Java Architect
    roleDefinition: >
      You are a senior Java architect working on large Maven and Gradle systems.
      You prefer executable tests, small migration slices, and clear rollback paths.
    whenToUse: >
      Use for architecture review, modernization planning, dependency triage,
      symbol navigation, flow tracing, and test-gap analysis.
    customInstructions: >
      Do not edit production Java code until a plan, risk list, and verification
      command have been shown. Prefer tests and reproducible tool output over
      prose-only advice.
    groups:
      - read
      - command
      - mcp
      - - edit
        - fileRegex: ^(\.bob/|docs/|src/test/|.*ArchitectureTest\.java$).*
          description: Workflow files, docs, and tests only

You can loosen this later. Start tight. Agents are easier to trust when they cannot turn a code review into an unplanned migration.
The Commands Become Small
Once the skills exist, slash commands do not need to be clever.
.bob/commands/modernize-jakarta.md:
---
description: Modernize one Java slice from javax to Jakarta
argument-hint: <module-or-package>
---
Switch to java-architect mode if available.
Use the java-modernization-slice skill for $1.
Target migration: javax to Jakarta.
Start with OpenRewrite dry run.
Do not apply changes until the dry-run summary and risk ledger are shown.

.bob/commands/dependency-risk.md:
---
description: Triage dependency risk for one Java module
argument-hint: <module>
---
Use the java-dependency-risk skill for $1.
Generate the dependency tree first.
Use Snyk MCP if configured.
Separate direct, transitive, BOM-managed, and plugin-managed dependencies.

.bob/commands/trace-flow.md:
---
description: Trace a Java business flow through code
argument-hint: <flow-name>
---
Use the java-flow-trace skill for $1.
Run the flow index script if present.
Mark static-code inferences clearly.
End with missing observability.

.bob/commands/java-symbols.md:
---
description: Resolve Java symbols, definitions, and references through LSP4J-MCP
argument-hint: <symbol-or-file-location>
---
Use the java-symbol-navigation skill for $1.
Use LSP4J-MCP first, then text search for annotation, reflection, config, and serialization edges.
Report exact files and symbols.

This is much less impressive than a giant prompt. It is also much easier to maintain.
How I Would Roll This Out
I would not add 20 skills.
Start with four:
java-boundary-review
java-modernization-slice
java-test-gap
java-symbol-navigation
Run them on real work for two weeks. Every time the agent makes a bad assumption, improve the skill. Every time the agent needs the same file or checklist, add it as a supporting file. Every time a workflow becomes too broad, split it.
Then add java-dependency-risk, java-flow-trace, and java-framework-docs-check.
The skill should stay small enough that a developer can read it in a code review. If the skill becomes a 900-line manifesto, it is now a second application with worse tooling. Congratulations, you invented enterprise promptware.
What Makes a Skill Good
A good Java agent skill has these properties:
It names one job
It starts from repository evidence
It has stop conditions
It uses existing Java tools before model guesses
It produces tests, patches, or commands, not only prose
It separates facts from inference
It keeps secrets and credentials out of project files
It can be reviewed by the team
That is the difference between a prompt and an engineering workflow.
A prompt says:
Review this service for architecture issues.
A skill says:
Read module structure, inspect dependencies, compare against existing tests, propose ArchUnit rules, edit only tests, run the relevant Maven command, and stop if the rule would encode an unclear exception.
The second one is less magical. That is why I trust it more.
Where Bob Shell Still Fits
Bob Shell is still useful. I like terminal agents for repository work because the terminal is where Java projects tell the truth. mvn test. gradle build. jbang scripts/flow_index.java. git diff. No UI romance. Just output.
The mistake is making the shell command the unit of reuse. For serious Java work, the unit of reuse should be the skill. The command starts it. The shell gathers evidence. MCP fills in external facts. Tests decide whether the change survived contact with reality.
That stack is not glamorous. It is practical.
And practical is what I want near a 400-module Java repository.
The Main Point
Agent skills are not a productivity hack for typing faster. They are a way to put senior engineering habits into a repeatable shape:
trace from evidence
change one slice
dry-run migrations
turn architecture into tests
check security in context
verify framework facts before editing
write the test before the implementation when behavior matters
That is the part Java teams need.
Not another list of commands.
A small set of reviewed workflows that make the agent behave like it has worked in this codebase before.



