Managing AI Tool Sprawl with LangChain4j, Quarkus, and Ollama
A practical walkthrough of how tool search reduces prompt overhead and keeps larger internal assistants easier to reason about as catalogs grow.
Five tools is easy. You hand the model the full list, it picks one, and you move on. Fifty tools is where it gets annoying. At that point you are not really debugging prompts anymore. You are debugging context geometry: how much tool metadata fits in the window, whether the model still pays attention to the user question, and how much latency you burn shipping the same catalog on every hop.
LangChain4j calls the escape hatch tool search. Keep the first request smaller, let the model ask for matching tools, and then continue as usual. In library terms this goes through ToolSearchStrategy (still marked @Experimental in the LangChain4j line that quarkus-langchain4j-bom pins for your platform version). Quarkus LangChain4j gives you declarative assistants with @RegisterAiService, while programmatic AiServices.builder() still unlocks toolSearchStrategy(...). In this tutorial we build NovaDeck, a fictional ops surface with fifty deterministic tools, wire one Ollama chat model twice (full catalog vs tool search), and add a trace ring buffer plus a tiny benchmark endpoint so you can show the difference with numbers instead of vibes.
Prerequisites
You should already be comfortable running Quarkus in dev mode and calling JSON endpoints with curl or something similar. I keep this sample on a local Ollama daemon so the whole article stays reproducible without cloud API keys.
- JDK 21 installed (the project uses `maven.compiler.release` 21)
- Quarkus CLI installed, or Maven with the Quarkus plugin as in the generated project
- Ollama installed and running locally (`ollama serve`), with a tool-capable chat model pulled
- Podman or Docker is not required for this module unless you lean on Dev Services; the guide assumes a host Ollama on port 11434
- Git to clone the repository with all the sources. I haven't included them here; this is just the high-level walkthrough.
Project setup
Create the Quarkus skeleton from the repo root. If you already nested the module somewhere else, adjust the path and keep going:
```shell
quarkus create app dev.novadeck:novadeck-tool-search \
    --package-name=dev.novadeck \
    --extensions='rest-jackson,io.quarkiverse.langchain4j:quarkus-langchain4j-ollama' \
    --java=21 \
    --no-code \
    --batch-mode \
    -DplatformVersion=3.35.1
```

The `--no-code` flag skips the greeting resource, which keeps the package tree clean. Passing `io.quarkiverse.langchain4j:quarkus-langchain4j-ollama` on the same command does two useful things at once: it adds the extension, and it adds the `quarkus-langchain4j-bom` import to `dependencyManagement`. That means `quarkus-langchain4j-ollama` lands in `<dependencies>` without a manual `<version>`, which is the same result you get from code.quarkus.io when you tick the LangChain4j Ollama extension. When you bump the Quarkus platform (`quarkus.platform.version`), the LangChain4j BOM moves with it.
Implementation
Deterministic tool beans
NovaDeck keeps the fake ops data boring on purpose. I want deterministic tool calls here, not fake realism with side effects. Each domain lives in its own package under dev.novadeck.tools.*, every tool method returns a short string, and IDs flow through NovaDeckIds so traces line up across runs. The catalog lands at fifty @Tool methods total (NovaDeckToolCounts.TOTAL_TOOLS), which is enough to feel big without dragging in a database.
Example shape (IncidentTools):
```java
import dev.langchain4j.agent.tool.Tool;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class IncidentTools {

    @Tool("List active incidents filtered by severity (SEV1..SEV4). Returns a short summary line.")
    public String listActiveIncidents(String severity) {
        return "incidents[severity=" + severity + "]: SEV2-db-cache-exhaustion(id="
                + NovaDeckIds.incidentId("sev2") + "), SEV3-api-latency-spike(id="
                + NovaDeckIds.incidentId("sev3") + ")";
    }

    // ... additional incident tools ...
}
```

Write the descriptions like terse release notes: nouns, verbs, constraints. `SimpleToolSearchStrategy` (the keyword implementation we wire up shortly) scores matches from those strings, so they need to earn their keep.
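To build intuition for why the wording matters, here is a toy keyword scorer. This is not LangChain4j's actual algorithm, just an illustration of how a keyword-style matcher might rank tool descriptions against a query; all names in it are hypothetical:

```java
import java.util.*;
import java.util.stream.*;

// Toy illustration: rank tool descriptions by how many query keywords they contain.
// NOT the library's implementation -- just a sketch of the idea behind keyword search.
public class ToyToolScorer {

    public static List<String> topMatches(Map<String, String> descriptionsByTool,
                                          String query, int maxResults) {
        Set<String> queryWords = Arrays.stream(query.toLowerCase().split("\\W+"))
                .filter(w -> w.length() > 2)          // skip very short, stop-word-ish tokens
                .collect(Collectors.toSet());
        return descriptionsByTool.entrySet().stream()
                .map(e -> Map.entry(e.getKey(), score(e.getValue(), queryWords)))
                .filter(e -> e.getValue() > 0)        // drop tools with no overlap at all
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(maxResults)                    // analogous to maxResults(12) above
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    private static long score(String description, Set<String> queryWords) {
        return Arrays.stream(description.toLowerCase().split("\\W+"))
                .filter(queryWords::contains)
                .count();
    }

    public static void main(String[] args) {
        Map<String, String> tools = Map.of(
                "listActiveIncidents", "List active incidents filtered by severity",
                "compareDeployments", "Compare staging deployments across clusters",
                "getInvoice", "Fetch a billing invoice by id");
        // Only the incident tool overlaps with the query keywords.
        System.out.println(topMatches(tools, "active incidents with severity SEV2", 12));
    }
}
```

The takeaway is mechanical: a description with no overlap with the user's vocabulary simply never surfaces, no matter how good the tool is.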
Fixed-catalog assistant (@RegisterAiService)
The control path does the simple thing: expose every tool through Quarkus LangChain4j’s declarative wiring. NovaDeckChatMemoryProviderSupplier hands each request a MessageWindowChatMemory so multi-hop tool loops behave, and @ToolBox enumerates the six tool beans.
```java
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import io.quarkiverse.langchain4j.ToolBox;
import jakarta.enterprise.context.ApplicationScoped;

@RegisterAiService(
        modelName = "fixed",
        chatMemoryProviderSupplier = NovaDeckChatMemoryProviderSupplier.class)
@ApplicationScoped
public interface FixedOpsAssistant {

    @SystemMessage("""
            You are NovaDeck, an internal operations copilot. Answer with grounded tool calls.
            Prefer tools over guessing. Keep final answers short unless asked for detail.
            """)
    @UserMessage("{prompt}")
    @ToolBox({
            IncidentTools.class,
            DeployTools.class,
            BillingTools.class,
            FleetTools.class,
            AuditTools.class,
            UtilityTools.class
    })
    String ask(String prompt);
}
```

This interface is the baseline. The search variant only matters if it beats this on something real.
Tool-search assistant (AiServices.builder)
The search path is where the library still makes you drop to code. Annotation-only wiring is not enough today, so we build an AiServices proxy ourselves. SearchAssistantClient injects the same six tool beans, the @ModelName("search") chat model, and a SimpleToolSearchStrategy capped with maxResults(12). Everything sits behind TracingToolSearchStrategy, which logs each search round and pushes structured entries into ToolSearchTraceRegistry.
```java
// Wrap the keyword strategy in a tracing decorator so every search round
// lands in the ring buffer exposed at /api/trace/recent.
SimpleToolSearchStrategy inner = SimpleToolSearchStrategy.builder()
        .maxResults(12)
        .build();
TracingToolSearchStrategy tracing = new TracingToolSearchStrategy(inner, traceRegistry);

this.delegate = AiServices.builder(SearchOpsAssistant.class)
        .chatModel(chatModel)
        .chatMemoryProvider(memoryProvider)
        .tools(incidentTools, deployTools, billingTools, fleetTools, auditTools, utilityTools)
        .toolSearchStrategy(tracing)
        .maxSequentialToolsInvocations(24)
        .build();
```

Embedding-heavy alternative: swap in `VectorToolSearchStrategy` with an injected `EmbeddingModel` when keyword search stops tracking your vocabulary. The trade-off is less determinism and one more dependency to tune.
REST surface and benchmark harness
The HTTP layer stays small on purpose. FixedChatResource and SearchChatResource expose POST /api/fixed/chat and POST /api/search/chat with JSON { "prompt": "..." }. BenchmarkResource (POST /api/bench) runs both assistants back-to-back for runs iterations (default 1, capped at 5) and returns millisecond arrays plus the catalog count. TraceResource (GET /api/trace/recent) exposes the ring buffer so you can grab screenshots or inspect what the search path actually did.
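The trace registry behind `GET /api/trace/recent` is described as a ring buffer, which is a simple structure: keep only the newest N entries and evict the oldest. As a mental model, a minimal sketch (hypothetical; the repo's `ToolSearchTraceRegistry` may well differ):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a trace ring buffer: keeps only the newest N entries,
// newest first, so the /api/trace/recent endpoint stays cheap and bounded.
public class TraceRingBuffer<T> {

    private final int capacity;
    private final Deque<T> entries = new ArrayDeque<>();

    public TraceRingBuffer(int capacity) {
        this.capacity = capacity;
    }

    public synchronized void push(T entry) {
        entries.addFirst(entry);          // newest entry goes to the front
        if (entries.size() > capacity) {
            entries.removeLast();         // evict the oldest once over capacity
        }
    }

    public synchronized List<T> recent() {
        return List.copyOf(entries);      // immutable newest-first snapshot
    }
}
```

The bounded size matters in dev mode: a long-running session with many search rounds never grows the buffer past its cap, so the endpoint is safe to poll.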
Configuration
The interesting properties live in src/main/resources/application.properties:
```properties
quarkus.application.name=novadeck-tool-search
quarkus.langchain4j.timeout=120s
quarkus.langchain4j.fixed.chat-model.provider=ollama
quarkus.langchain4j.search.chat-model.provider=ollama
quarkus.langchain4j.ollama.fixed.base-url=${novadeck.ollama.base-url:http://localhost:11434}
quarkus.langchain4j.ollama.search.base-url=${novadeck.ollama.base-url:http://localhost:11434}
quarkus.langchain4j.ollama.fixed.chat-model.model-name=${novadeck.ollama.model:llama3.2}
quarkus.langchain4j.ollama.search.chat-model.model-name=${novadeck.ollama.model:llama3.2}
quarkus.langchain4j.ollama.fixed.chat-model.temperature=0.2
quarkus.langchain4j.ollama.search.chat-model.temperature=0.2
```

Failure modes worth naming explicitly:
- `novadeck.ollama.base-url` wrong: every assistant call fails fast through the LangChain4j HTTP client. The stack trace points at the Ollama JAX-RS client, not JDBC.
- `quarkus.langchain4j.timeout` too low: large prompts plus fifty-tool serialization can blow through optimistic defaults. `120s` is deliberately forgiving for local dev while still bounded.
Keep both named models (fixed, search) on identical weights. Otherwise you are benchmarking model choice, not tool wiring.
Verification
At this point we are checking the claim from the opening, not chasing a benchmark trophy. The fixed assistant always carries the full tool catalog. The search assistant adds a discovery step so it can work with a smaller set of tools after that first round. The commands below make that visible.
From novadeck-tool-search/, run the tests first:
```shell
./mvnw test
```

You should get `BUILD SUCCESS`. That tells you the REST layer, the assistant wiring, and the synthetic tool catalog all compile before you start looking at model behavior.
Then start dev mode once Ollama is listening:
```shell
./mvnw quarkus:dev
```

Start with the control case
Before you touch the search path, the trace buffer should be empty:
```shell
curl -s http://localhost:8080/api/trace/recent
```

Expected output:

```json
[]
```

That is the control check. `TracingToolSearchStrategy` only records work done through `/api/search/chat`, so the fixed assistant should leave nothing behind here.
Run the same prompt through both paths
Now send the same incident question to both assistants:
```shell
curl -s -X POST http://localhost:8080/api/fixed/chat \
  -H 'Content-Type: application/json' \
  -d '{"prompt":"List active incidents with severity SEV2"}'

curl -s -X POST http://localhost:8080/api/search/chat \
  -H 'Content-Type: application/json' \
  -d '{"prompt":"List active incidents with severity SEV2"}'
```

The wording of the replies can vary because the model is still a model. What should not vary is the grounding. Both answers should stay in the incident domain, and the fixed path may mention both SEV2 and SEV3 because `listActiveIncidents("SEV2")` returns a synthetic string that contains both incidents.
If you call /api/trace/recent again after /api/fixed/chat, it should still return []. After /api/search/chat, the newest trace entry should show that a search round happened and that the matched working set is smaller than the full catalog:
```shell
curl -s http://localhost:8080/api/trace/recent | jq '.[0] | {searchableToolCountAtSearch, matchedToolNames}'
```

One run looks roughly like this:
```json
{
  "searchableToolCountAtSearch": 50,
  "matchedToolNames": [
    "listActiveIncidents",
    "getIncident"
  ]
}
```

The exact tool names can vary a bit with model behavior, but two things should hold: the search round started from the full catalog of 50 tools, and the matched set is much smaller than that full list. In this sample it is capped at 12 by `maxResults(12)`.
Read the benchmark for the right signal
The harness is still useful, but not as a universal speed claim. It measures wall clock. It does not measure prompt-token savings directly.
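To be precise about what "wall clock" means here, the core of such a harness is nothing more than a timed loop around each assistant call. A minimal sketch under that assumption (hypothetical names; the repo's `BenchmarkResource` may be wired differently):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the benchmark core: time each call in milliseconds.
// Wall clock only -- this says nothing about prompt-token usage.
public class WallClockBench {

    public static List<Long> time(Supplier<String> assistantCall, int runs) {
        List<Long> millis = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            assistantCall.get();                          // stand-in for assistant.ask(prompt)
            millis.add((System.nanoTime() - start) / 1_000_000);
        }
        return millis;
    }

    public static void main(String[] args) {
        // A stub supplier stands in for a real model call here.
        List<Long> fixed = time(() -> "stub answer", 2);
        System.out.println("fixedMillis=" + fixed);
    }
}
```

Everything the model spends on an extra search round lands in these numbers, while everything saved in prompt tokens does not show up directly. That asymmetry is why the arrays below need careful reading.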
```shell
curl -s -X POST http://localhost:8080/api/bench \
  -H 'Content-Type: application/json' \
  -d '{"prompt":"Compare staging deployments","runs":2}'
```

One run on my M4 Pro with local Ollama looked like this:
```json
{
  "catalogToolCount": 50,
  "prompt": "Compare staging deployments",
  "runs": 2,
  "fixedMillis": [2039, 2407],
  "searchMillis": [3790, 6078]
}
```

Repeated runs on the same setup told the same basic story: the search path was slower on wall clock. That is not a contradiction. The search assistant pays for a search round before it gets to the work tools, and on a local model that extra hop can cost more than the smaller first request saves.
That is why I read this harness as a sanity check, not a verdict. The durable win here is prompt budget. The fixed assistant starts every request with all 50 tool descriptions in play. The search path can narrow the matched set and leave more room for the user prompt and the rest of the conversation. That does not guarantee lower latency on your laptop, but it is still the more useful fix once the catalog gets large enough.
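A back-of-envelope calculation makes the budget effect concrete. The 40-tokens-per-tool figure below is purely an assumption for illustration; the real number depends on your descriptions and the model's tokenizer:

```java
// Illustrative prompt-budget arithmetic. The 40-tokens-per-tool average is an
// assumed value for the sake of the example, not a measured one.
public class PromptBudget {

    public static int overhead(int toolCount, int tokensPerTool) {
        return toolCount * tokensPerTool;
    }

    public static void main(String[] args) {
        int tokensPerTool = 40;  // assumption: measure your own catalog
        System.out.println("fixed catalog: ~" + overhead(50, tokensPerTool) + " tokens per turn"); // ~2000
        System.out.println("matched set:   ~" + overhead(12, tokensPerTool) + " tokens per turn"); // ~480
    }
}
```

Roughly 2,000 tokens of tool metadata on every turn versus under 500 after the search round: that difference is what the search path buys, whether or not your laptop benchmark rewards it.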
Conclusion
NovaDeck solves the problem we started with by separating tool discovery from tool execution. The fixed assistant is easy to wire, but once the catalog grows it forces every turn to carry too much metadata. ToolSearchStrategy adds one more step and does not promise a faster local benchmark, but it gives the model a smaller working set, which is the real escape hatch when context budget, tool sprawl, and answer quality start fighting each other.
The full source for this walkthrough lives in my GitHub repository.


