The Main Thread

Build Your First Real Java RAG Pipeline with Quarkus and Docling

Markus Eisele — Fri, 24 Apr 2026 06:08:52 GMT

I do not like RAG demos that start with clean Markdown. That is usually where the hard part was quietly deleted.

Open a real enterprise PDF and the problem is obvious: tables, headings, footnotes, and multi-column layout all carry meaning. Plain text extraction treats too much of that as decoration. Strip the structure and retrieval feeds the model fragments without enough context. The answer may still sound confident, which is exactly the annoying part.

Docling keeps structure as Markdown-friendly output. Quarkus wires Docling, Postgres with pgvector, and LangChain4j so we stay in ordinary Java and configuration. Agents only stay useful when the knowledge they pull is current and faithful to the source. Here we build one local pipeline: Docling conversion, sentence chunking, embeddings in pgvector, Ollama for chat and embeddings, and guardrails around inputs and outputs.

The system we build here is small enough to run locally but shaped like something you can extend: layout-aware conversion with Docling, pgvector retrieval, local Ollama chat and embedding models, readiness around background indexing, and guardrails around the assistant.

This is an update to the original tutorial and tweaks a couple of things, making sure it aligns with API changes.

Prerequisites

You should be comfortable with Java, REST, and running containers locally (Podman or Docker). The steps use the Quarkus CLI, Maven, PostgreSQL via Dev Services, and Ollama on the host.

Java 21+
Maven 3.9+ and Quarkus CLI (optional but used below)
Podman or Docker (for Dev Services: PostgreSQL, Docling)
Ollama installed locally with pull access for the chat and embedding models you configure

Project Setup

This article uses Quarkus 3.34.3 and Java 21. Create the project:

quarkus create app com.ibm:enterprise-rag \
  --package-name=com.ibm \
  --extensions=rest-jackson,jdbc-postgresql,quarkus-langchain4j-ollama,quarkus-langchain4j-pgvector,quarkus-docling,quarkus-smallrye-health
cd enterprise-rag

Extensions:

rest-jackson: REST endpoints with JSON via Jackson
jdbc-postgresql: JDBC driver and datasource integration for PostgreSQL
quarkus-langchain4j-ollama: Chat and embedding models through Ollama
quarkus-langchain4j-pgvector: Embedding store backed by PostgreSQL pgvector
quarkus-docling (io.quarkiverse.docling:quarkus-docling:1.3.0): Docling REST client and Dev Services for the Docling container. This is a Quarkiverse extension, so we pin it separately
quarkus-smallrye-health: Readiness and liveness endpoints used to hold traffic until ingestion completes

Embeddings, Vector Size, and pgvector

An embedding is a fixed-length array of numbers produced by an embedding model. Similar text tends to land near other similar text in that space. That lets you retrieve chunks with nearest-neighbor search in Postgres through pgvector, not only keyword search.

Dimension is the length of that array. The model fixes it: a given tag always emits the same width. Your database column and quarkus.langchain4j.pgvector.dimension must match that width. If they diverge, the app can fail at startup or when it writes vectors.

At ingest time and at query time you must use the same embedding model so vectors are comparable. If you change the model or its output size, drop or recreate the embedding table and re-ingest.

For Ollama, run ollama show and read embedding length. The default library tag granite-embedding:latest is a compact Granite English model, roughly tens of millions of parameters, with 384 dimensions on typical installs. That is enough for a responsive local loop on a laptop. Larger Granite variants, for example multilingual 278M-class models, often use 768 dimensions and more compute. Use them when you need the extra capacity, and change the pgvector dimension with them.

Chunking uses DocumentBySentenceSplitter with a 200-token target length and 20-token overlap. That is a readable default for sales PDFs. Sentences stay mostly intact, overlap reduces the chance that a boundary cuts a fact in half, and the segment count stays manageable on a laptop. Smaller chunks improve precision for short facts. Longer chunks keep more context but can make retrieval noisier. Adjust this after you inspect real retrieval logs for your corpus.

Quarkus can accept HTTP requests as soon as the core stack is up. Indexing used to block that moment because conversion and embedding ran in a @PostConstruct hook. The flow below separates application ready (the socket is listening) from RAG ready (vectors exist in pgvector). Readiness stays DOWN until the pipeline logs completion. If you bypass health checks and call /bot early, retrieval may still be empty. Which is a very polite way of saying: the bot can answer before it knows anything useful.

Implementation

I split the implementation into four small lanes: startup and readiness, ingestion, retrieval, and the /bot API. The code is longer than the idea, mostly because guardrails and background work need explicit boundaries. That is fine. Invisible magic is rarely where production systems become easier.

IngestionStarter

src/main/java/com/ibm/ingest/IngestionStarter.java keeps startup short. It schedules ingestion after CDI startup, then lets Quarkus open HTTP while Docling and embedding work continue in the background.

package com.ibm.ingest;

import io.quarkus.logging.Log;
import io.quarkus.runtime.StartupEvent;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Observes;
import jakarta.inject.Inject;

/**
 * Kicks off background ingestion after CDI startup so Quarkus can open HTTP without waiting for Docling
 * conversion and embedding to finish.
 */
@ApplicationScoped
public class IngestionStarter {

    @Inject
    DocumentLoader documentLoader;

    void onStart(@Observes StartupEvent ignored) {
        documentLoader.startAsyncIngestion();
        Log.info("Background document ingestion scheduled (readiness will turn UP when indexing completes).");
    }
}

IndexingState

src/main/java/com/ibm/ingest/IndexingState.java is the small shared flag between ingestion and readiness. The process can be alive before this flag turns true. Traffic should wait until readiness says the index exists.

package com.ibm.ingest;

import java.util.concurrent.atomic.AtomicBoolean;

import jakarta.enterprise.context.ApplicationScoped;

/**
 * Tracks whether the initial embedding ingestion has finished. Used for readiness so HTTP traffic
 * can wait until pgvector is populated (when health checks are enabled).
 */
@ApplicationScoped
public class IndexingState {

    private final AtomicBoolean indexReady = new AtomicBoolean(false);

    public boolean isIndexReady() {
        return indexReady.get();
    }

    public void setIndexReady(boolean ready) {
        indexReady.set(ready);
    }
}

DoclingConverter

src/main/java/com/ibm/ingest/DoclingConverter.java hides the Docling Serve task flow behind one method. We submit the file, poll until Docling finishes, and fetch Markdown from the completed task. Each REST call passes ApiMetadata built from quarkus.docling.api-key so the X-Api-Key header matches what Docling Serve expects (Dev Services can inject this; a standalone Docling on localhost with auth enabled needs the same value you configured on the server). I keep this separate from the loader because Docling has enough API shape of its own.

package com.ibm.ingest;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Base64;
import java.util.Objects;

import ai.docling.serve.api.convert.request.ConvertDocumentRequest;
import ai.docling.serve.api.convert.request.options.ConvertDocumentOptions;
import ai.docling.serve.api.convert.request.options.OutputFormat;
import ai.docling.serve.api.convert.request.source.FileSource;
import ai.docling.serve.api.convert.request.target.InBodyTarget;
import ai.docling.serve.api.convert.response.InBodyConvertDocumentResponse;
import ai.docling.serve.api.task.response.TaskStatus;
import ai.docling.serve.api.task.response.TaskStatusPollResponse;
import io.quarkiverse.docling.runtime.client.ApiMetadata;
import io.quarkiverse.docling.runtime.client.QuarkusDoclingServeClient;
import io.quarkiverse.docling.runtime.config.DoclingRuntimeConfig;
import io.smallrye.mutiny.Uni;
import io.smallrye.mutiny.infrastructure.Infrastructure;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import jakarta.ws.rs.ProcessingException;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.core.Response.Status.Family;

/**
 * Converts files to Markdown via Docling Serve using the Quarkus client's async
 * task API
 * ({@link QuarkusDoclingServeClient#submitConvertSourceAsync}) and polling
 * until completion.
 */
@ApplicationScoped
public class DoclingConverter {

    private final QuarkusDoclingServeClient doclingClient;
    private final ApiMetadata apiMetadata;

    @Inject
    public DoclingConverter(QuarkusDoclingServeClient doclingClient, DoclingRuntimeConfig doclingConfig) {
        this.doclingClient = doclingClient;
        ApiMetadata.Builder metadata = ApiMetadata.builder();
        doclingConfig.apiKey().ifPresent(metadata::apiKey);
        this.apiMetadata = metadata.build();
    }

    /**
     * Converts a file to Markdown asynchronously (Mutiny). Subscription runs on the
     * default worker pool
     * so polling and JAX-RS client calls do not block the event loop. Read errors
     * become a failed {@link Uni}
     * so callers can use this from lambdas without handling checked exceptions.
     */
    public Uni convertToMarkdownUni(Path filePath) {
        final byte[] bytes;
        try {
            bytes = Files.readAllBytes(filePath);
        } catch (IOException e) {
            return Uni.createFrom().failure(e);
        }
        String base64 = Base64.getEncoder().encodeToString(bytes);
        String filename = filePath.getFileName().toString();

        ConvertDocumentRequest request = ConvertDocumentRequest.builder()
                .source(FileSource.builder()
                        .base64String(base64)
                        .filename(filename)
                        .build())
                .options(ConvertDocumentOptions.builder()
                        .toFormat(OutputFormat.MARKDOWN)
                        .build())
                .target(InBodyTarget.builder().build())
                .build();

        return doclingClient.submitConvertSourceAsync(request, apiMetadata)
                .runSubscriptionOn(Infrastructure.getDefaultWorkerPool())
                .chain(this::pollUntilSuccess)
                .chain(this::fetchMarkdownFromTask);
    }

    private Uni pollUntilSuccess(TaskStatusPollResponse status) {
        TaskStatus t = status.getTaskStatus();
        if (t == TaskStatus.SUCCESS) {
            return Uni.createFrom().item(status);
        }
        if (t == TaskStatus.FAILURE) {
            return Uni.createFrom().failure(new IllegalStateException(
                    "Docling conversion task failed for taskId=" + status.getTaskId()));
        }
        String taskId = status.getTaskId();
        return Uni.createFrom().nullItem()
                .onItem().delayIt().by(Duration.ofMillis(200))
                .chain(ignored -> Uni.createFrom().item(() -> doclingClient.pollTaskStatus(taskId, 500L, apiMetadata))
                        .runSubscriptionOn(Infrastructure.getDefaultWorkerPool())
                        .chain(this::pollUntilSuccess));
    }

    private Uni fetchMarkdownFromTask(TaskStatusPollResponse completed) {
        String taskId = completed.getTaskId();
        return Uni.createFrom().item(() -> {
            Response response = doclingClient.convertTaskResult(taskId, apiMetadata);
            if (response.getStatusInfo().getFamily() != Family.SUCCESSFUL) {
                throw new ProcessingException(
                        "convertTaskResult failed: HTTP " + response.getStatus() + " for taskId=" + taskId);
            }
            InBodyConvertDocumentResponse inBody = response.readEntity(InBodyConvertDocumentResponse.class);
            var document = Objects.requireNonNull(inBody.getDocument(),
                    "Document conversion returned null document for taskId=" + taskId);
            return document.getMarkdownContent();
        }).runSubscriptionOn(Infrastructure.getDefaultWorkerPool());
    }
}

DocumentLoader

src/main/java/com/ibm/ingest/DocumentLoader.java is the actual ingestion pipeline. It finds supported files, converts them to Markdown, splits the text into sentence-sized chunks, embeds each segment, and writes those vectors to pgvector.

Notice the failure behavior: this demo sets readiness UP even when ingestion fails, so local development does not get stuck forever. In production I would be more suspicious. If the assistant needs the knowledge base to be useful, keeping readiness DOWN can be the more honest failure mode.

package com.ibm.ingest;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.data.document.splitter.DocumentBySentenceSplitter;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import io.quarkus.logging.Log;
import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.Uni;
import io.smallrye.mutiny.infrastructure.Infrastructure;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

/**
 * Loads documents from {@code documents/}, converts them with Docling (async task API), splits, and
 * stores embeddings. Runs in the background after startup; {@link IndexingState} and readiness reflect
 * completion.
 */
@ApplicationScoped
public class DocumentLoader {

    private static final List ALLOWED_EXTENSIONS = Arrays.asList("txt", "pdf", "pptx", "ppt", "doc", "docx",
            "xlsx", "xls", "csv", "json", "xml", "html");

    @Inject
    EmbeddingStore store;

    @Inject
    EmbeddingModel embeddingModel;

    @Inject
    DoclingConverter doclingConverter;

    @Inject
    IndexingState indexingState;

    public void startAsyncIngestion() {
        indexingState.setIndexReady(false);
        Log.info("Starting document loading (background)...");

        listEligiblePathsUni()
                .chain(paths -> {
                    if (paths.isEmpty()) {
                        Log.warn("No documents to process. Skipping embedding generation.");
                        return Uni.createFrom().voidItem();
                    }
                    return Multi.createFrom().iterable(paths)
                            .onItem().transformToUniAndConcatenate(path -> doclingConverter.convertToMarkdownUni(path)
                                    .map(markdown -> toDocument(path, markdown)))
                            .collect().asList()
                            .chain(this::embedAllDocuments);
                })
                .subscribe().with(
                        ignored -> finishIngestionSuccess(),
                        this::finishIngestionFailure);
    }

    private void finishIngestionSuccess() {
        indexingState.setIndexReady(true);
        Log.info("Document ingestion pipeline finished; readiness is UP.");
    }

    private void finishIngestionFailure(Throwable failure) {
        Log.error("Document ingestion pipeline failed; readiness set UP so the app is not stuck DOWN.", failure);
        indexingState.setIndexReady(true);
    }

    private Uni> listEligiblePathsUni() {
        return Uni.createFrom().item(() -> {
            Path documentsPath = Path.of("src/main/resources/documents");
            List paths = new ArrayList<>();
            if (!Files.isDirectory(documentsPath)) {
                Log.warnf("Documents directory not found or not a directory: %s", documentsPath);
                return paths;
            }
            int skippedCount = 0;
            try (var stream = Files.list(documentsPath)) {
                for (Path filePath : stream.filter(Files::isRegularFile).toList()) {
                    String fileName = filePath.getFileName().toString();
                    String extension = fileExtension(fileName);
                    if (extension.isEmpty() || !ALLOWED_EXTENSIONS.contains(extension)) {
                        skippedCount++;
                        Log.debugf("Skipping file '%s' - extension '%s' is not in allowed list",
                                fileName, extension.isEmpty() ? "(no extension)" : extension);
                        continue;
                    }
                    paths.add(filePath);
                }
            } catch (IOException e) {
                Log.errorf(e, "Failed to list documents in %s", documentsPath);
            }
            Log.infof("Found %d file(s) to process (%d skipped by extension).", paths.size(), skippedCount);
            return paths;
        }).runSubscriptionOn(Infrastructure.getDefaultWorkerPool());
    }

    private static String fileExtension(String fileName) {
        int lastDotIndex = fileName.lastIndexOf('.');
        if (lastDotIndex > 0 && lastDotIndex < fileName.length() - 1) {
            return fileName.substring(lastDotIndex + 1).toLowerCase();
        }
        return "";
    }

    private static Document toDocument(Path filePath, String markdown) {
        String fileName = filePath.getFileName().toString();
        String extension = fileExtension(fileName);
        Map meta = new HashMap<>();
        meta.put("file", fileName);
        meta.put("format", extension);
        return Document.document(markdown, new Metadata(meta));
    }

    private Uni embedAllDocuments(List docs) {
        if (docs.isEmpty()) {
            Log.warn("No documents were successfully converted. Skipping embedding generation.");
            return Uni.createFrom().voidItem();
        }

        DocumentBySentenceSplitter splitter = new DocumentBySentenceSplitter(200, 20);
        List segments = splitter.splitAll(docs);

        if (segments.isEmpty()) {
            Log.warn("No text segments generated from documents. Skipping embedding storage.");
            return Uni.createFrom().voidItem();
        }

        Log.infof("Generating embeddings for %d text segments...", segments.size());

        return Uni.createFrom().item(() -> {
            embedSegmentsBlocking(segments);
            return null;
        }).runSubscriptionOn(Infrastructure.getDefaultWorkerPool()).replaceWithVoid();
    }

    private void embedSegmentsBlocking(List segments) {
        int embeddedCount = 0;
        int errorCount = 0;
        try {
            if (!segments.isEmpty()) {
                TextSegment testSegment = segments.get(0);
                var testEmbedding = embeddingModel.embed(testSegment).content();
                store.add(testEmbedding, testSegment);
                Log.infof("Store test successful. Proceeding with bulk embedding...");
                embeddedCount = 1;
            }
        } catch (jakarta.enterprise.inject.CreationException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IllegalArgumentException
                    && cause.getMessage() != null
                    && cause.getMessage().contains("indexListSize")
                    && cause.getMessage().contains("zero")) {
                Log.errorf("PgVector dimension configuration error detected during store initialization.");
                Log.errorf("The dimension property 'quarkus.langchain4j.pgvector.dimension' is being read as 0.");
                throw new RuntimeException(
                        "PgVector store initialization failed. Check application.properties and database configuration.",
                        e);
            }
            throw e;
        } catch (IllegalArgumentException e) {
            if (e.getMessage() != null && e.getMessage().contains("indexListSize") && e.getMessage().contains("zero")) {
                Log.errorf("PgVector dimension configuration error. The dimension is being read as 0.");
                throw new RuntimeException(
                        "PgVector dimension misconfiguration. Dimension must be > 0. Check application.properties.", e);
            }
            throw e;
        } catch (Exception e) {
            Log.errorf(e, "Failed to test embedding store. This might indicate a configuration issue.");
            throw new RuntimeException(
                    "Embedding store test failed. Please check your database and pgvector configuration.", e);
        }

        int startIndex = embeddedCount > 0 ? 1 : 0;
        for (int i = startIndex; i < segments.size(); i++) {
            TextSegment segment = segments.get(i);
            try {
                var embedding = embeddingModel.embed(segment).content();
                store.add(embedding, segment);
                embeddedCount++;
                if (embeddedCount % 10 == 0) {
                    Log.infof("Progress: embedded %d/%d segments", embeddedCount, segments.size());
                }
            } catch (Exception e) {
                errorCount++;
                Log.errorf(e, "Failed to embed and store segment: %s",
                        segment.text().substring(0, Math.min(50, segment.text().length())));
            }
        }

        Log.infof("Successfully embedded and stored %d out of %d segments (errors: %d)", embeddedCount,
                segments.size(), errorCount);
    }
}

IngestionReadinessCheck

src/main/java/com/ibm/health/IngestionReadinessCheck.java turns the indexing flag into a standard SmallRye Health readiness signal. This is the line between “the process is running” and “the RAG system can answer with indexed context.”

package com.ibm.health;

import org.eclipse.microprofile.health.HealthCheck;
import org.eclipse.microprofile.health.HealthCheckResponse;
import org.eclipse.microprofile.health.Readiness;

import com.ibm.ingest.IndexingState;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

/**
 * Readiness stays {@code DOWN} until background document ingestion and embedding complete.
 */
@Readiness
@ApplicationScoped
public class IngestionReadinessCheck implements HealthCheck {

    @Inject
    IndexingState indexingState;

    @Override
    public HealthCheckResponse call() {
        if (indexingState.isIndexReady()) {
            return HealthCheckResponse.up("ingestion");
        }
        return HealthCheckResponse.down("ingestion");
    }
}

DocumentRetrieverAugmentorSupplier

src/main/java/com/ibm/ai/DocumentRetrieverAugmentorSupplier.java connects the custom retriever to the Quarkus LangChain4j AI service. I like making this explicit. Defaults are nice until you need to debug why the model retrieved absolutely nothing with great confidence.

package com.ibm.ai;

import java.util.function.Supplier;

import com.ibm.retrieval.DocumentRetriever;

import dev.langchain4j.rag.RetrievalAugmentor;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

/**
 * Wires the custom {@link RetrievalAugmentor} into the Quarkus LangChain4j AI service.
 */
@ApplicationScoped
public class DocumentRetrieverAugmentorSupplier implements Supplier {

    private final DocumentRetriever documentRetriever;

    @Inject
    public DocumentRetrieverAugmentorSupplier(DocumentRetriever documentRetriever) {
        this.documentRetriever = documentRetriever;
    }

    @Override
    public RetrievalAugmentor get() {
        return documentRetriever;
    }
}

SalesEnablementBot

src/main/java/com/ibm/ai/SalesEnablementBot.java defines the assistant contract. The system message sets the CloudX scope, the retrieval augmentor supplies document context, and the guardrails check both sides of the model call.

package com.ibm.ai;

import com.ibm.guardrails.HallucinationGuardrail;
import com.ibm.guardrails.InputValidationGuardrail;
import com.ibm.guardrails.OutOfScopeGuardrail;

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.guardrail.InputGuardrails;
import dev.langchain4j.service.guardrail.OutputGuardrails;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService(retrievalAugmentor = DocumentRetrieverAugmentorSupplier.class)
public interface SalesEnablementBot {

    @SystemMessage("""
                # ROLE AND SCOPE
                You are a Sales Enablement Copilot for CloudX Enterprise Platform.
                
                ## YOUR ALLOWED TOPICS (ONLY THESE):
                - CloudX product features, capabilities, and architecture
                - CloudX pricing tiers: Starter ($499), Professional ($1,999), Enterprise ($5,999)
                - CloudX competitive positioning vs CompeteCloud, SkyPlatform, TechGiant
                - CloudX migration strategies and implementation approaches
                - CloudX customer success stories and ROI data
                - CloudX technical specifications (multi-cloud, Kubernetes, supported languages)
                
                ## STRICT BOUNDARIES - YOU MUST REFUSE:
                ❌ Questions about competitor internal operations or roadmaps
                ❌ Questions about non-CloudX IBM products (Watson, DB2, WebSphere Traditional, etc.)
                ❌ Requests for pricing negotiations or custom contract terms
                ❌ Questions about unreleased CloudX features or internal roadmaps
                ❌ Legal, financial, tax, or investment advice
                ❌ Personal advice or non-business topics
                ❌ General technology tutorials not related to CloudX
                
                If asked about prohibited topics, respond EXACTLY:
                "I specialize in CloudX Enterprise Platform sales enablement. This question is outside my scope. For [topic], please consult [appropriate resource]."
                
                # SOLUTION MAPPING LOGIC
                When a user describes a client scenario, map to CloudX solutions:
                
                - Legacy technology risk / End-of-Support → CloudX Support & Maintenance Solutions
                - Legacy infrastructure operations → CloudX Migration & Modernization Platform
                - Need faster modernization → CloudX Accelerated Migration Tools
                - Containerization / microservices → CloudX Cloud-Native Platform
                - AI-assisted modernization → CloudX AI-Powered Modernization Assistant
                
                # RESPONSE STRUCTURE
                For valid CloudX questions, provide:
                1. **Recommended Solution**: Name the CloudX product/tier
                2. **Rationale**: Why it fits the client's pain point
                3. **Business Outcome**: Expected ROI or benefit
                4. **Proof Point**: Reference a specific customer case study from your documents
                5. **Discovery Question**: Suggest a follow-up question for the sales rep
                
                # ACCURACY REQUIREMENTS
                - Only cite information from your provided CloudX sales enablement documents
                - Never speculate or make up features, pricing, or capabilities
                - If information is not in your documents, state: "I don't have that specific information in my CloudX sales materials."
            """)
    @OutputGuardrails({ OutOfScopeGuardrail.class, HallucinationGuardrail.class })
    @InputGuardrails({ InputValidationGuardrail.class })
    String chat(@UserMessage String userQuestion);
}

DocumentRetriever

src/main/java/com/ibm/retrieval/DocumentRetriever.java embeds the user question, asks pgvector for nearby segments, and passes those segments back as augmentation content. It also logs snippets while you develop. Keep that visibility early; flying blind with retrieval is not character building, it is just slow debugging.

package com.ibm.retrieval;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.rag.AugmentationRequest;
import dev.langchain4j.rag.AugmentationResult;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.content.Content;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.store.embedding.EmbeddingStore;
import io.quarkus.logging.Log;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class DocumentRetriever implements RetrievalAugmentor {

    private final RetrievalAugmentor augmentor;
    private static final int SNIPPET_LENGTH = 200;

    DocumentRetriever(EmbeddingStore store, EmbeddingModel model) {
        EmbeddingStoreContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingModel(model)
                .embeddingStore(store)
                .maxResults(3)
                .build();
        augmentor = DefaultRetrievalAugmentor
                .builder()
                .contentRetriever(contentRetriever)
                .build();
    }

    @Override
    public AugmentationResult augment(AugmentationRequest augmentationRequest) {
        // Perform the augmentation
        AugmentationResult result = augmentor.augment(augmentationRequest);

        // Log retrieved content snippets for developer visibility
        // This helps developers understand what documents are being retrieved
        var contents = result.contents();
        Log.infof("DocumentRetriever: Retrieved %d document snippet(s) for augmentation", contents.size());

        for (int i = 0; i < contents.size(); i++) {
            Content content = contents.get(i);
            String text = "";
            String sourceInfo = "";

            try {
                // Content has textSegment() method that returns TextSegment
                TextSegment segment = content.textSegment();
                if (segment != null) {
                    text = segment.text();

                    // Try to extract source file information from metadata
                    var meta = segment.metadata();
                    if (meta != null) {
                        // Try to iterate over metadata entries if available
                        try {
                            // Metadata might have a way to get values - try toString for now
                            String metaString = meta.toString();
                            if (metaString.contains("file=")) {
                                // Extract file name from metadata string representation
                                int fileStart = metaString.indexOf("file=") + 5;
                                int fileEnd = metaString.indexOf(",", fileStart);
                                if (fileEnd == -1)
                                    fileEnd = metaString.indexOf("}", fileStart);
                                if (fileEnd > fileStart) {
                                    sourceInfo = " (from: " + metaString.substring(fileStart, fileEnd) + ")";
                                }
                            }
                        } catch (Exception e) {
                            // If metadata access fails, continue without source info
                            Log.debugf("Could not extract metadata: %s", e.getMessage());
                        }
                    }
                }
            } catch (Exception e) {
                Log.debugf("Could not extract text from content: %s", e.getMessage());
            }

            // Create a snippet (first SNIPPET_LENGTH chars) for developer visibility
            if (!text.isEmpty()) {
                String snippet = text.length() > SNIPPET_LENGTH
                        ? text.substring(0, SNIPPET_LENGTH) + "..."
                        : text;
                // Replace newlines with spaces for cleaner log output
                snippet = snippet.replace('\n', ' ').replace('\r', ' ');
                Log.infof("  [%d] %s%s", i + 1, snippet, sourceInfo);
            } else {
                Log.infof("  [%d] (content unavailable)%s", i + 1, sourceInfo);
            }
        }

        return result;
    }

}

HallucinationGuardrail

src/main/java/com/ibm/guardrails/HallucinationGuardrail.java checks the answer after the model produces it. It looks for uncertainty, generic content, contradictions, and known CloudX fact mistakes, then reprompts when the answer breaks the sales enablement contract.

This is still pattern matching. It catches obvious failures and makes the demo behavior visible. It is not a safety program with a trench coat.

package com.ibm.guardrails;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.guardrail.OutputGuardrail;
import dev.langchain4j.guardrail.OutputGuardrailResult;
import jakarta.enterprise.context.ApplicationScoped;
import io.quarkus.logging.Log;

/**
 * HallucinationGuardrail detects when the LLM generates responses that:
 * - Admit lack of knowledge
 * - Are too vague or generic
 * - Contain contradictory information
 * - Make up facts not present in the CloudX sales enablement materials
 * - Provide overly confident answers without proper context
 */
@ApplicationScoped
public class HallucinationGuardrail implements OutputGuardrail {

    // Phrases indicating the model doesn't have information
    private static final String[] UNCERTAINTY_PHRASES = {
            "i don't have that information",
            "i don't know",
            "i'm not sure",
            "i cannot find",
            "i don't have access to",
            "i'm unable to provide",
            "i don't have specific information",
            "i cannot confirm",
            "i'm not aware of",
            "i don't have details about"
    };

    // Phrases indicating potential hallucination or making up information
    private static final String[] HALLUCINATION_INDICATORS = {
            "as far as i know",
            "i believe",
            "i think",
            "probably",
            "it seems like",
            "it appears that",
            "i assume",
            "i would guess",
            "most likely",
            "presumably"
    };

    // Contradictory phrases that might indicate confusion
    private static final String[] CONTRADICTION_INDICATORS = {
            "however, on the other hand",
            "but actually",
            "or maybe",
            "alternatively, it could be",
            "i'm not certain, but"
    };

    // CloudX-specific facts that should be accurate
    private static final String[][] CLOUDX_FACTS = {
            // Format: {incorrect_value, correct_value, context}
            { "99.9% uptime", "99.99%", "enterprise tier" },
            { "$599", "$499", "starter tier monthly" },
            { "$2,999", "$1,999", "professional tier monthly" },
            { "aws only", "aws, azure, and google cloud", "multi-cloud support" },
            { "competecloud is cheaper", "cloudx is 8% lower for enterprise", "enterprise pricing" }
    };

    @Override
    public OutputGuardrailResult validate(AiMessage responseFromLLM) {
        Log.info("HallucinationGuardrail: Validating LLM response");

        String content = responseFromLLM.text();
        String contentLower = content.toLowerCase();
        Log.debug("HallucinationGuardrail: Response content length: " + content.length() + " characters");

        // 1. Check for uncertainty phrases (model admitting it doesn't know)
        String uncertaintyPhrase = detectUncertaintyPhrase(contentLower);
        if (uncertaintyPhrase != null) {
            Log.warn("HallucinationGuardrail: Detected uncertainty phrase: '" + uncertaintyPhrase + "'");
            return reprompt(
                    "The response contains uncertainty phrases. ",
                    "Please provide a confident answer based strictly on the CloudX sales enablement materials. " +
                            "If the information is not available in the provided documents, clearly state that the information is not in the available materials rather than expressing uncertainty.");
        }

        // 2. Check for hallucination indicators (hedging language suggesting
        // uncertainty)
        String hallucinationIndicator = detectHallucinationIndicator(contentLower);
        if (hallucinationIndicator != null) {
            Log.warn("HallucinationGuardrail: Detected hallucination indicator: '" + hallucinationIndicator + "'");
            return reprompt(
                    "The response contains hedging language that suggests uncertainty. ",
                    "Please provide a confident, fact-based answer using only information from the CloudX sales enablement materials. "
                            +
                            "If the information is not in the documents, clearly state that the information is not available rather than speculating or using uncertain language.");
        }

        // 3. Check for contradictory statements
        String contradictionIndicator = detectContradictionIndicator(contentLower);
        if (contradictionIndicator != null) {
            Log.warn("HallucinationGuardrail: Detected contradiction indicator: '" + contradictionIndicator + "'");
            return reprompt(
                    "The response contains contradictory or conflicting statements. ",
                    "Please provide a clear, consistent answer based on the CloudX sales enablement materials. "
                            +
                            "Ensure all information is coherent and does not present conflicting details.");
        }

        // 4. Check for too short/lazy answers
        if (content.trim().length() < 20) {
            Log.warn("HallucinationGuardrail: Response too short - " + content.trim().length() + " characters");
            return reprompt(
                    "The response is too brief and lacks sufficient detail. ",
                    "Please provide a comprehensive response with specific details, examples, and concrete information from the CloudX sales enablement materials.");
        }

        // 5. Check for overly generic responses
        if (isOverlyGeneric(contentLower)) {
            Log.warn("HallucinationGuardrail: Response is overly generic - lacks CloudX-specific details");
            return reprompt(

                    "The response is too generic and lacks specific CloudX details. ",
                    "Please provide concrete information about CloudX features, pricing, capabilities, competitive advantages, "
                            +
                            "or specific use cases from the sales enablement materials. Include specific product names, pricing tiers, percentages, or technical details where relevant.");
        }

        // 6. Check for potential factual errors about CloudX
        String factualError = detectFactualError(contentLower);
        if (factualError != null) {
            Log.warn("HallucinationGuardrail: Detected potential factual error: " + factualError);
            return reprompt(
                    "The response may contain a factual error: " + factualError + ". ",
                    "Please carefully verify all information against the CloudX sales enablement materials and provide accurate, verified details. "
                            +
                            "Only include information that is explicitly stated in the provided documents.");
        }

        // 7. Check for excessive hedging (multiple uncertainty markers)
        int hedgingCount = countHedgingPhrases(contentLower);
        if (hedgingCount >= 3) {
            Log.warn("HallucinationGuardrail: Excessive hedging detected - " + hedgingCount + " hedging phrases found");
            return reprompt(
                    "The response contains excessive hedging language that suggests uncertainty. ",
                    "Please provide a confident, fact-based answer using information directly from the CloudX sales enablement materials. "
                            +
                            "Avoid hedging phrases and present information with confidence when it is supported by the documents.");
        }

        // All checks passed
        Log.info("HallucinationGuardrail: Response validated successfully - no hallucination indicators detected");
        return success();
    }

    private String detectUncertaintyPhrase(String content) {
        for (String phrase : UNCERTAINTY_PHRASES) {
            if (content.contains(phrase)) {
                return phrase;
            }
        }
        return null;
    }

    private String detectHallucinationIndicator(String content) {
        for (String indicator : HALLUCINATION_INDICATORS) {
            if (content.contains(indicator)) {
                return indicator;
            }
        }
        return null;
    }

    private String detectContradictionIndicator(String content) {
        for (String indicator : CONTRADICTION_INDICATORS) {
            if (content.contains(indicator)) {
                return indicator;
            }
        }
        return null;
    }

    private boolean isOverlyGeneric(String content) {
        // Check if response lacks specific CloudX details
        String[] specificKeywords = {
                "cloudx", "starter tier", "professional tier", "enterprise tier",
                "$499", "$1,999", "$5,999", "99.99%", "multi-cloud",
                "competecloud", "skyplatform", "techgiant",
                "kubernetes", "aws", "azure", "google cloud"
        };

        int specificCount = 0;
        for (String keyword : specificKeywords) {
            if (content.contains(keyword)) {
                specificCount++;
            }
        }

        // If response is longer than 100 chars but has no specific CloudX details, it's
        // too generic
        return content.length() > 100 && specificCount == 0;
    }

    private String detectFactualError(String content) {
        // Check for common factual errors about CloudX
        for (String[] fact : CLOUDX_FACTS) {
            String incorrectValue = fact[0];
            String correctValue = fact[1];
            String context = fact[2];

            if (content.contains(incorrectValue)) {
                return "Found '" + incorrectValue + "' but the correct value is '" + correctValue + "' for " + context;
            }
        }
        return null;
    }

    private int countHedgingPhrases(String content) {
        int count = 0;
        String[] hedgingPhrases = {
                "might", "maybe", "perhaps", "possibly", "could be",
                "may be", "seems", "appears", "likely", "probably"
        };

        for (String phrase : hedgingPhrases) {
            if (content.contains(phrase)) {
                count++;
            }
        }
        return count;
    }
}

OutOfScopeGuardrail

src/main/java/com/ibm/guardrails/OutOfScopeGuardrail.java keeps the final answer inside the CloudX sales enablement domain. This matters because a retrieved chunk and a helpful model can still drift into competitor internals, unrelated IBM products, personal advice, or pricing negotiation.

package com.ibm.guardrails;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.guardrail.OutputGuardrail;
import dev.langchain4j.guardrail.OutputGuardrailResult;
import jakarta.enterprise.context.ApplicationScoped;
import io.quarkus.logging.Log;

/**
 * OutOfScopeGuardrail ensures the AI assistant stays within the boundaries of
 * CloudX sales enablement content and doesn't provide information outside its
 * domain.
 *
 * Based on the sales enablement resources, the scope includes:
 * - CloudX Enterprise Platform features, pricing, and capabilities
 * - Competitive analysis and positioning (based on public information)
 * - Sales methodology and processes
 * - Customer success stories and ROI information
 * - Technical architecture and supported technologies
 * - Migration strategies and implementation approaches
 *
 * Out of scope includes:
 * - Competitor internal operations or confidential information
 * - Non-CloudX IBM products or third-party services (unless in context of
 * integration/comparison)
 * - Legal, financial, tax, or investment advice
 * - Personal or non-business advice
 * - Confidential customer information or unreleased features
 * - Custom pricing negotiations (should be referred to sales team)
 * - General technology tutorials unrelated to CloudX
 */
@ApplicationScoped
public class OutOfScopeGuardrail implements OutputGuardrail {

    // Keywords indicating competitor-specific internal information (out of scope)
    private static final String[] COMPETITOR_INTERNAL_KEYWORDS = {
            "competecloud's internal", "competecloud roadmap", "competecloud strategy",
            "skyplatform's internal", "skyplatform roadmap", "skyplatform strategy",
            "techgiant's internal", "techgiant roadmap", "techgiant strategy",
            "competitor's source code", "competitor's architecture"
    };

    // Keywords indicating non-CloudX products (out of scope)
    private static final String[] NON_CLOUDX_PRODUCTS = {
            "watson", "db2", "websphere traditional", "maximo", "cognos",
            "spss", "qradar", "guardium", "appscan", "rational",
            "aws lambda", "azure functions", "google cloud run",
            "heroku", "digitalocean", "linode"
    };

    // Keywords indicating requests for confidential/inappropriate information
    private static final String[] CONFIDENTIAL_KEYWORDS = {
            "confidential customer", "internal only", "proprietary information",
            "trade secret", "non-disclosure", "customer's private",
            "competitor's financials", "unreleased feature", "beta feature"
    };

    // Keywords indicating legal/financial advice requests (out of scope)
    private static final String[] ADVICE_KEYWORDS = {
            "legal advice", "tax advice", "investment advice", "financial planning",
            "should i invest", "legal opinion", "tax implications",
            "securities advice", "compliance advice", "audit advice"
    };

    // Keywords indicating personal/non-business requests (out of scope)
    private static final String[] PERSONAL_KEYWORDS = {
            "personal recommendation", "what should i do with my career",
            "help me with my resume", "dating advice", "health advice",
            "medical advice", "therapy", "counseling"
    };

    // Keywords indicating requests for custom pricing/negotiations (should be
    // referred)
    private static final String[] NEGOTIATION_KEYWORDS = {
            "negotiate my contract", "get me a better deal", "discount my price",
            "override the pricing", "special pricing for me", "custom contract terms"
    };

    @Override
    public OutputGuardrailResult validate(AiMessage responseFromLLM) {
        Log.info("OutOfScopeGuardrail: Validating LLM response");

        String content = responseFromLLM.text().toLowerCase();
        Log.debug("OutOfScopeGuardrail: Response content length: " + content.length() + " characters");

        // Check for various out-of-scope categories
        String detectedIssue = detectOutOfScopeContent(content);

        if (detectedIssue != null) {
            Log.warn("OutOfScopeGuardrail: Detected out-of-scope content - Issue type: " + detectedIssue);
            return buildOutOfScopeResponse(detectedIssue);
        }

        // Response is in scope
        Log.info("OutOfScopeGuardrail: Response validated successfully - content is in scope");
        return success();
    }

    /**
     * Detects if the response contains out-of-scope content.
     * Returns a description of the issue if found, null otherwise.
     */
    private String detectOutOfScopeContent(String content) {
        // Priority order: Check most critical violations first

        // 1. Check for confidential information (highest priority)
        for (String keyword : CONFIDENTIAL_KEYWORDS) {
            if (content.contains(keyword)) {
                return "confidential";
            }
        }

        // 2. Check for legal/financial advice
        for (String keyword : ADVICE_KEYWORDS) {
            if (content.contains(keyword)) {
                return "advice";
            }
        }

        // 3. Check for personal requests
        for (String keyword : PERSONAL_KEYWORDS) {
            if (content.contains(keyword)) {
                return "personal";
            }
        }

        // 4. Check for competitor internal information
        for (String keyword : COMPETITOR_INTERNAL_KEYWORDS) {
            if (content.contains(keyword)) {
                return "competitor_internal";
            }
        }

        // 5. Check for non-CloudX products (only if not in CloudX context)
        for (String product : NON_CLOUDX_PRODUCTS) {
            if (content.contains(product) && !isCloudXContext(content)) {
                return "non_cloudx_product";
            }
        }

        // 6. Check for pricing negotiation requests
        for (String keyword : NEGOTIATION_KEYWORDS) {
            if (content.contains(keyword)) {
                return "negotiation";
            }
        }

        // 7. Check if response is about general technology not related to CloudX
        if (isGeneralTechnologyQuestion(content)) {
            return "general_technology";
        }

        return null;
    }

    /**
     * Checks if the content is discussing a product in the context of CloudX
     * (e.g., integration, comparison, migration from)
     */
    private boolean isCloudXContext(String content) {
        String[] cloudxContextKeywords = {
                "cloudx", "integrate with", "migrate from", "compared to",
                "alternative to", "replace", "modernize from"
        };

        for (String keyword : cloudxContextKeywords) {
            if (content.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Checks if the response is about general technology topics not related to
     * CloudX
     */
    private boolean isGeneralTechnologyQuestion(String content) {
        // Check if discussing technology without CloudX context
        String[] techKeywords = {
                "how to program", "learn programming", "tutorial for",
                "what is blockchain", "what is ai", "what is machine learning",
                "how does the internet work", "what is a database"
        };

        boolean hasTechKeyword = false;
        for (String keyword : techKeywords) {
            if (content.contains(keyword)) {
                hasTechKeyword = true;
                break;
            }
        }

        // If has tech keyword but no CloudX context, it's out of scope
        return hasTechKeyword && !isCloudXContext(content);
    }

    /**
     * Builds an appropriate out-of-scope response based on the detected issue.
     * Uses reprompt() to guide the LLM to provide a better, in-scope response.
     */
    private OutputGuardrailResult buildOutOfScopeResponse(String issueType) {
        Log.info("OutOfScopeGuardrail: Building reprompt response for issue type: " + issueType);

        String userMessage;
        String repromptMessage;

        switch (issueType) {
            case "confidential":
                userMessage = "The response contains references to confidential or proprietary information. ";
                repromptMessage = "Please provide a response that only uses publicly available information from the CloudX sales enablement materials. "
                        +
                        "Focus on CloudX features, pricing, competitive positioning, and sales methodology without revealing confidential details.";
                break;

            case "advice":
                userMessage = "The response appears to provide legal, financial, or investment advice. ";
                repromptMessage = "Please reframe the response to focus on CloudX's business value, ROI calculations, and pricing structure "
                        +
                        "without providing specific legal or financial advice. Suggest consulting appropriate advisors for such matters.";
                break;

            case "personal":
                userMessage = "The response addresses personal or non-business matters.";
                repromptMessage = "Please provide a response focused on CloudX sales enablement topics such as product features, "
                        +
                        "pricing, competitive analysis, sales methodology, or customer success stories.";
                break;

            case "competitor_internal":
                userMessage = "The response discusses competitors' internal strategies or confidential information.";
                repromptMessage = "Please limit the response to publicly available competitive comparisons based on the CloudX sales enablement materials. "
                        +
                        "Focus on how CloudX compares to competitors using public information and customer feedback.";
                break;

            case "non_cloudx_product":
                userMessage = "The response discusses products or services outside of CloudX Enterprise Platform. ";
                repromptMessage = "Please focus the response on CloudX-specific features, capabilities, and use cases. "
                        +
                        "If mentioning other products, only do so in the context of CloudX integration, migration, or comparison.";
                break;

            case "negotiation":
                userMessage = "The response attempts to negotiate specific pricing or contract terms. ";
                repromptMessage = "Please provide information about standard CloudX pricing tiers, discount guidelines, and the general pricing framework. "
                        +
                        "Indicate that specific negotiations should be handled by the sales manager and deal desk team.";
                break;

            case "general_technology":
                userMessage = "The response discusses general technology topics not related to CloudX. ";
                repromptMessage = "Please refocus the response on CloudX Enterprise Platform and its applications. " +
                        "Connect the technology discussion to CloudX use cases, deployment scenarios, or architecture if relevant.";
                break;

            default:
                userMessage = "The response appears to be outside the scope of CloudX sales enablement. ";
                repromptMessage = "Please provide a response focused on CloudX Enterprise Platform features, pricing, competitive analysis, "
                        +
                        "sales methodology, or customer success stories based on the available sales enablement materials.";
        }

        // Use reprompt() with both user message and system reprompt instruction
        Log.debug("OutOfScopeGuardrail: Reprompting with user message: " + userMessage);
        return reprompt(userMessage, repromptMessage);
    }
}

InputValidationGuardrail

src/main/java/com/ibm/guardrails/InputValidationGuardrail.java runs before the model call. It blocks prompt injection patterns, unrelated personal-service requests, malicious strings, and CloudX-adjacent topics that would turn this assistant into a general-purpose chatbot. That is not the job here.

package com.ibm.guardrails;

import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.guardrail.InputGuardrail;
import dev.langchain4j.guardrail.InputGuardrailResult;
import jakarta.enterprise.context.ApplicationScoped;
import io.quarkus.logging.Log;

/**
 * InputValidationGuardrail validates user input before it reaches the LLM.
 * It detects and blocks:
 * 1. Prompt injection attempts
 * 2. Off-topic questions outside CloudX sales enablement scope
 * 3. Malicious or inappropriate content
 * 
 * Based on CloudX sales enablement materials, valid topics include:
 * - CloudX Enterprise Platform features and capabilities
 * - Pricing and packaging information
 * - Competitive analysis and positioning
 * - Sales methodology and processes
 * - Customer success stories and ROI
 * - Technical architecture (multi-cloud, Kubernetes, supported languages)
 * - Migration and implementation strategies
 */
@ApplicationScoped
public class InputValidationGuardrail implements InputGuardrail {

    // Prompt injection patterns
    private static final String[] PROMPT_INJECTION_PATTERNS = {
        "ignore previous instructions",
        "ignore all previous",
        "disregard previous",
        "forget previous instructions",
        "new instructions:",
        "system:",
        "you are now",
        "act as",
        "pretend you are",
        "roleplay as",
        "simulate being",
        "override your",
        "bypass your",
        "ignore your guidelines",
        "forget your role",
        "new role:",
        "system prompt:",
        "assistant:",
        "###instruction:",
        "###system:",
        "[system]",
        "",
        "sudo mode",
        "developer mode",
        "jailbreak",
        "dan mode"
    };

    // Off-topic technology combinations (not supported by CloudX)
    private static final String[][] OFF_TOPIC_COMBINATIONS = {
        // Format: {technology, unsupported_context, boundary_message}
        {"python", "google cloud", "CloudX supports Python on AWS, Azure, and Google Cloud. However, I specialize in CloudX sales enablement. For deployment questions, please refer to CloudX technical documentation."},
        {"node.js", "heroku", "CloudX supports Node.js but not Heroku deployment. CloudX works with AWS, Azure, and Google Cloud."},
        {".net", "digitalocean", "CloudX supports .NET but not DigitalOcean. CloudX is designed for AWS, Azure, and Google Cloud."},
        {"ruby", "linode", "CloudX supports Ruby but not Linode. CloudX operates on AWS, Azure, and Google Cloud."}
    };

    // Topics completely outside CloudX scope
    private static final String[] COMPLETELY_OFF_TOPIC = {
        // Food & Dining
        "recipe", "cooking", "food", "restaurant", "meal", "dinner", "lunch",
        // Entertainment
        "movie", "film", "entertainment", "music", "song", "concert", "show",
        // Sports
        "sports", "football", "basketball", "soccer", "baseball", "tennis",
        // Weather & Nature
        "weather", "climate", "temperature", "forecast",
        // Health & Medical
        "health", "medical", "doctor", "medicine", "hospital", "disease",
        // Personal Life
        "dating", "relationship", "romance", "wedding", "marriage",
        // Politics & Government
        "politics", "election", "government", "president", "senator",
        // Finance (non-business)
        "cryptocurrency", "bitcoin", "blockchain", "stock market", "forex",
        // Gaming
        "gaming", "video game", "playstation", "xbox", "nintendo",
        // Travel & Booking
        "flight", "hotel", "vacation", "travel", "booking", "reservation",
        "airline", "airport", "cruise", "trip", "tourism",
        // Shopping (non-software)
        "shopping", "buy clothes", "fashion", "shoes", "jewelry",
        // Education (non-tech)
        "homework", "essay", "school assignment", "college application",
        // Real Estate
        "house", "apartment", "real estate", "mortgage", "rent",
        // Automotive
        "car", "vehicle", "automobile", "driving", "traffic"
    };

    // Action verbs for non-CloudX services
    private static final String[] OFF_TOPIC_ACTIONS = {
        "book me", "book a", "reserve a", "schedule a",
        "order me", "buy me", "purchase a",
        "find me a", "get me a",
        "recommend a restaurant", "recommend a hotel",
        "plan my trip", "plan my vacation"
    };

    // Non-CloudX products (unless in comparison/migration context)
    private static final String[] NON_CLOUDX_PRODUCTS = {
        "watson", "db2", "websphere traditional", "maximo",
        "cognos", "spss", "qradar", "guardium",
        "heroku", "digitalocean", "linode", "netlify",
        "vercel", "railway", "render"
    };

    // Malicious content indicators
    private static final String[] MALICIOUS_PATTERNS = {
        "sql injection", "drop table", "delete from",
        "script>", "

`BotResponse`

src/main/java/com/ibm/api/BotResponse.java keeps the HTTP response shape boring. Successful answers and guardrail failures can use the same JSON wrapper, so the client only has one field to read.

package com.ibm.api;

public record BotResponse(String response) {
}

`InputGuardrailExceptionMapper`

src/main/java/com/ibm/api/InputGuardrailExceptionMapper.java maps blocked input to 400 Bad Request. Without this, a guardrail failure can look like a server problem. That is the wrong kind of drama.

package com.ibm.api;

import dev.langchain4j.guardrail.InputGuardrailException;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.ext.ExceptionMapper;
import jakarta.ws.rs.ext.Provider;
import io.quarkus.logging.Log;

/**
 * Exception mapper for InputGuardrailException.
 * Maps validation failures from InputValidationGuardrail to structured JSON responses.
 */
@Provider
public class InputGuardrailExceptionMapper implements ExceptionMapper {

    @Override
    public Response toResponse(InputGuardrailException exception) {
        Log.warn("InputGuardrailException caught: " + exception.getMessage());
        
        // Extract the validation error message from the exception
        String errorMessage = exception.getMessage();
        if (errorMessage == null || errorMessage.trim().isEmpty()) {
            errorMessage = "Input validation failed. Please ensure your question is related to CloudX Enterprise Platform sales enablement.";
        }
        
        // Return the error message in the same BotResponse format for consistency
        BotResponse errorResponse = new BotResponse(errorMessage);
        
        // Return 400 Bad Request with the structured response
        return Response.status(Response.Status.BAD_REQUEST)
                .entity(errorResponse)
                .type("application/json")
                .build();
    }
}

`SalesEnablementResource`

src/main/java/com/ibm/api/SalesEnablementResource.java exposes the demo as GET /bot?q=.... The fallback question keeps the endpoint easy to test from a browser, which is a small thing until you are doing the fifth local run.

package com.ibm.api;

import com.ibm.ai.SalesEnablementBot;

import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.QueryParam;
import jakarta.ws.rs.core.MediaType;

@Path("/bot")
public class SalesEnablementResource {

    @Inject
    SalesEnablementBot bot;

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public BotResponse ask(@QueryParam("q") String question) {
        if (question == null || question.trim().isEmpty()) {
            question = "What is the best solution for a client who is migrating to a microservices architecture?";
        }
        String botResponse = bot.chat(question);
        return new BotResponse(botResponse);
    }
}

`Configuration`

Now wire the runtime pieces in src/main/resources/application.properties. These settings connect the Java classes above to Ollama, pgvector, and Docling:

# ----------------------------------------
# 1. Ollama configuration (local LLM)
# ----------------------------------------

# Chat model (answers)
quarkus.langchain4j.ollama.chat-model.model-name=gpt-oss:20b

# Embedding model (document + query vectors)
# Default Ollama library tag granite-embedding:latest maps to IBM Granite ~30M English (see `ollama show granite-embedding` → embedding length).
# Larger community tags (for example granite-embedding-278m-multilingual) often use 768 dimensions — always match quarkus.langchain4j.pgvector.dimension to `embedding length` from ollama show.
quarkus.langchain4j.ollama.embedding-model.model-name=granite-embedding:latest

# Set a more generous timeout
quarkus.langchain4j.ollama.timeout=60s

# Logging during development
quarkus.langchain4j.log-requests=false
quarkus.langchain4j.log-responses=false

# ----------------------------------------
# 2. Datasource and pgvector
# ----------------------------------------
quarkus.datasource.db-kind=postgresql

# Use default datasource for pgvector
# Store table name
quarkus.langchain4j.pgvector.table=embeddings

quarkus.langchain4j.pgvector.drop-table-first=true
quarkus.langchain4j.pgvector.create-table=true

# Must equal the embedding model output width (same as `embedding length` from `ollama show `).
# granite-embedding:latest → 384. If you switch to a 768-dim model, set 768 and drop/recreate the table or re-ingest.
quarkus.langchain4j.pgvector.dimension=384

# Optional, but recommended once data grows
quarkus.langchain4j.pgvector.use-index=true
quarkus.langchain4j.pgvector.index-list-size=10

# ----------------------------------------
# 3. Docling
# ----------------------------------------
# Docling Dev Service will start a container in dev mode and testing.
# The extension configures the REST client automatically.
# We configure the docling UI explicitly
quarkus.docling.devservices.enable-ui=true
quarkus.docling.timeout=3M
# Docling Serve may require auth (HTTP 401 without it). Sent as X-Api-Key; match DOCLING_SERVE_API_KEY on the server.
# Dev Services can populate this when it starts the container. If Docling is already listening on the default port,
# Quarkus may skip Dev Services—then set this explicitly to the same key your Docling instance expects.
# quarkus.docling.api-key=your-secret-here

# REST client timeout configuration for Docling
# Increase timeouts for large file processing (sync helper and Quarkus async client)
quarkus.rest-client."io.quarkiverse.docling.runtime.client.DoclingService".connect-timeout=60
quarkus.rest-client."io.quarkiverse.docling.runtime.client.DoclingService".read-timeout=300
quarkus.rest-client."io.quarkiverse.docling.runtime.client.QuarkusDoclingServeClient".connect-timeout=60
quarkus.rest-client."io.quarkiverse.docling.runtime.client.QuarkusDoclingServeClient".read-timeout=300

Notes:

quarkus.langchain4j.ollama.timeout covers slow local models. Increase it if you see client timeouts.
quarkus.langchain4j.pgvector.drop-table-first=true is fine for demos. Turn it off when the table contains data you care about.
REST client keys use the Docling REST client interfaces Quarkus generates: DoclingService for the blocking helper and QuarkusDoclingServeClient for the Mutiny task API. If these fully qualified class names change in a future extension, copy the new names from the Dev UI or extension docs.
quarkus.docling.api-key (or QUARKUS_DOCLING_API_KEY) supplies the X-Api-Key header for Docling Serve. If you see 401 Unauthorized from QuarkusDoclingServeClient, the server is enforcing API key auth and your app must send the matching secret (or align Dev Services with a running container on the default port).
Large PDFs may need higher Docling read timeouts. The quarkus-docling issue tracker discusses gateway timeouts for very large uploads.

`Static UI`

The demo includes a small HTML client at src/main/resources/META-INF/resources/index.html, which is the standard Quarkus static resource location. It posts questions to /bot and renders Markdown in the browser. Copy it from the repository if you create the project from the CLI. I do not repeat it here because the article is already long enough.

`Production Hardening`

Timeouts and back-pressure: Ollama and Docling run outside your JVM. Set REST and Ollama timeouts explicitly. Configure both DoclingService and QuarkusDoclingServeClient for long-running conversions. For very large PDFs, increase the read timeout and follow upstream guidance on gateway limits.

Docling auth: When Docling Serve enables API keys, configure quarkus.docling.api-key so async convert/poll/result calls include X-Api-Key. Without it you get HTTP 401 from the client.

Startup vs RAG readiness: Background ingestion means the HTTP port opens before pgvector is full. Put /q/health/ready (SmallRye Health) in front of production traffic. The bundled IngestionReadinessCheck stays DOWN until indexing completes. If you call /bot without waiting, answers may have little retrieved context.

Event loop safety: Docling’s Mutiny chain and the embedding loop run on the worker pool, not the Vert.x event loop. Keep blocking LangChain4j calls off the event loop when you extend the pipeline.

Vector store integrity: Changing the embedding model or dimension without recreating the table produces bad retrieval. Treat embedding config like a schema migration. It is less exciting than debugging why every answer is confidently adjacent to the truth.

Guardrails and abuse: The sample uses pattern-based input and output guardrails. They reduce obvious misuse but are not a full safety program. Rate-limit and authenticate any external deployment of /bot.

Observability: Retrieval logging in DocumentRetriever shows which chunks influenced a reply. Keep that in dev, then trim or gate it in production.

`Verification`

Pull models (example): ollama pull gpt-oss:20b and ollama pull granite-embedding:latest
From the module root: ./mvnw quarkus:dev
Watch logs for Document ingestion pipeline finished; readiness is UP. Optionally poll readiness:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/q/health/ready

Expect 503 while ingestion runs, then 200 when the index is ready. An empty corpus may flip to UP quickly.

Open http://localhost:8080/
for the bundled UI, or call the API after readiness is 200:

curl -s "http://localhost:8080/bot?q=What%20CloudX%20tier%20fits%20a%20regulated%20industry%20customer?"

Expect JSON {"response":"..."} with content grounded in your src/main/resources/documents/ files. If you curl immediately on a cold start, the model may answer with little retrieved context until indexing completes.

`Conclusion`

You now have a single Quarkus module that turns messy PDFs into structured text, stores embeddings in pgvector, and answers through an Ollama-backed model with explicit guardrails. That is enough to start moving toward production agent tooling without changing the basic shape of the stack.

The complete, updated code is available in the enterprise-rag repository.

Subscribe now



How to Run IBM Bob on a Remote Linux Machine from Your Mac
Markus Eisele — Thu, 23 Apr 2026 06:08:06 GMT
Pointing IBM Bob straight at your laptop is the fastest way to get started. Open the editor, load a project, ask Bob to inspect some code, and it immediately feels useful. There is no extra infrastructure, no remote machine, and no SSH setup to think about.
That simplicity is also the trap. Once Bob can read files, edit code, and run shell commands, your local machine becomes the default execution environment for everything. Source code, build caches, generated files, package installs, test artifacts, and agent-driven terminal commands all pile up in the same place. For quick experiments this is fine. For larger codebases or more sensitive environments, it gets hard to control.
A remote development setup changes that boundary. Bob still runs in the editor on your Mac, but the actual workspace lives on a remote Linux machine reached over SSH. File edits happen there. Terminal commands run there. Toolchains and build outputs stay there. That makes the environment easier to reproduce and easier to throw away when something goes wrong.
The jeanp413.open-remote-ssh extension is useful here because it lets the editor open a folder on a remote machine over SSH. Its project describes it exactly that way, and its supported SSH hosts include common Linux targets. That gives us a clean way to move Bob’s working environment off the laptop without changing how the editor feels day to day. 
For this tutorial, we do not need a real cloud VM. Podman on macOS already depends on a Linux virtual machine because Linux containers need the Linux kernel. Podman’s machine feature gives us that VM, so we can run an SSH-enabled Linux container inside it and use that container as a simulated remote host. 
Why Remote Machines Matter Beyond Isolation
Keeping Bob away from your laptop is one good reason to use a remote machine, but it is not the only one. In many teams, the bigger value is consistency. A remote machine gives every developer the same base OS, the same toolchain, the same package versions, and the same path layout. That removes a whole category of “works on my machine” problems. When Bob runs commands in that environment, it sees the same setup your teammates see. That makes its suggestions and fixes more relevant.
Remote machines also help when the real target environment is Linux. Many Java projects are built and deployed on Linux, even when developers work on macOS or Windows. Running Bob against a remote Linux machine means builds, scripts, file permissions, shell behavior, and container tooling behave much closer to production. This is especially useful when startup scripts, native binaries, or CI jobs depend on Linux-specific behavior. You catch those differences earlier, before they turn into deployment surprises.
Another important point is access to internal systems. In enterprise environments, the code you need is often not fully reachable from a personal laptop. Internal Git servers, package registries, artifact repositories, databases, and mounted filesystems may only be available from a managed network zone. A remote machine inside that zone becomes the place where Bob can work with the real project context. Your laptop stays outside, while the remote host becomes the bridge to systems that are sensitive, regulated, or simply not meant to be exposed broadly.
There is also a resource argument. Some projects need more CPU, memory, disk, or network bandwidth than you want to dedicate on a local machine. Large Maven builds, indexing big monorepos, running integration tests, or pulling large container images can make a laptop unpleasant to use. A remote machine can absorb that load instead. Bob still feels local in the editor, but the heavy work happens somewhere better suited for it.
Remote machines are also useful for team onboarding. Instead of telling every new developer to install a long list of SDKs, CLIs, certificates, shells, and package managers, you can provide a prepared remote environment. That shortens the time until someone is productive. It also reduces drift. Bob benefits from that too, because it works in an environment that already reflects how the team actually builds, tests, and ships software.
One more aspect is recovery. Local environments tend to accumulate damage slowly. A broken package install, a conflicting runtime, a strange shell setting, or a half-finished experiment can stay around for weeks. Remote environments are easier to rebuild. If the machine gets messy, you replace it. That matters even more when you work close to sensitive systems. You want the environment to be reproducible, replaceable, and easy to reason about.
So the value of remote machines is not just isolation. It is also consistency, Linux parity, enterprise access, better resource usage, easier onboarding, and faster recovery. And when Bob needs to work near sensitive environments, remote machines give you a more realistic and practical place to do that work.
Prerequisites
You need a few things in place before you start.
IBM Bob installed  ( free 30 day trial)
Podman installed 
OpenSSH client available in your terminal
Basic comfort with shell commands and SSH config
Project setup
We start by creating the Linux environment that will act like the remote host.
Create and start the Podman machine:
podman machine init --cpus 4 --memory 8192 --disk-size 40
podman machine start
podman info
On macOS, Podman cannot run Linux containers directly on the host OS. It needs a Linux virtual machine underneath, and podman machine init creates exactly that. The Podman docs also note that SSH keys are generated automatically for access to the VM itself. 
Now create a small working folder on your Mac for the demo:
mkdir -p ~/bob-remote-demo/ssh
cd ~/bob-remote-demo
Generate a dedicated SSH key for the simulated remote host:
ssh-keygen -t ed25519 -f bob-remote-demo/ssh/bob_remote_key -N ""
We use a dedicated key because it keeps the test isolated. Later, cleanup is simple. You also avoid mixing this experiment with your normal laptop SSH identities.
Next, create a Containerfile that defines the remote Linux machine:
⚠️ NOTE: I had to remove the beginning “/” before all three occurrences of etc because of the stupid configuration of Substack’s Cloudflare blocking. Make sure to correct this before you use the file! 
FROM docker.io/ubuntu:24.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    openssh-server \
    sudo \
    bash \
    curl \
    git \
    ca-certificates \
    tar \
    gzip \
    unzip \
    procps \
    less \
    nano \
    vim \
    iproute2 \
    openjdk-21-jdk \
    maven \
    && rm -rf /var/lib/apt/lists/*

RUN useradd -m -s /bin/bash bob \
    && passwd -d bob \
    && echo "bob ALL=(ALL) NOPASSWD:ALL" > etc/sudoers.d/bob \
    && chmod 0440 etc/sudoers.d/bob

RUN mkdir -p /var/run/sshd /home/bob/.ssh \
    && chown -R bob:bob /home/bob/.ssh \
    && chmod 700 /home/bob/.ssh

COPY ssh/bob_remote_key.pub /home/bob/.ssh/authorized_keys

RUN chown bob:bob /home/bob/.ssh/authorized_keys \
    && chmod 600 /home/bob/.ssh/authorized_keys

RUN printf '%s\n' \
    'Port 2222' \
    'PermitRootLogin no' \
    'PasswordAuthentication no' \
    'KbdInteractiveAuthentication no' \
    'ChallengeResponseAuthentication no' \
    'UsePAM no' \
    'PubkeyAuthentication yes' \
    'AllowUsers bob' \
    'X11Forwarding no' \
    'AllowTcpForwarding yes' \
    'ClientAliveInterval 300' \
    'ClientAliveCountMax 2' \
    > etc/ssh/sshd_config.d/remote-dev.conf

USER bob
WORKDIR /home/bob
RUN mkdir -p /home/bob/workspace/demo-app

USER root
EXPOSE 2222

CMD ["/usr/sbin/sshd", "-D", "-e"]
This image gives us a plain Ubuntu remote host with SSH, a normal user account, Git, Java 21, and Maven. That is enough to test a real editor-over-SSH workflow and then let Bob operate on a Java project remotely.
A few design choices are worth calling out. We disable root login. We disable password authentication. We allow only key-based login for the bob user. That keeps the remote boundary simple and predictable. We do allow SSH forwarding because remote editor workflows often need it. If you know you do not need it, you can tighten that later.
Build the image:
podman build --no-cache --arch amd64 -t bob-remote-ubuntu:24.04 .
The --arch amd64 part matters on Apple Silicon. Without it, Podman will usually build an arm64 image because that matches the host machine. That sounds fine until Bob tries to install its remote server component on the container. In our testing, the Bob remote install flow detected aarch64, requested a Linux arm64 BobIDE server build, and got a 404 from the download endpoint. Building the simulated remote host as amd64 avoids that problem and makes the container look like a more typical x86_64 Linux development machine. The --no-cache flag is useful here too. It forces a clean rebuild, which helps when you change SSH keys, account setup, or the container image itself during testing.
Run the remote host container:
podman run -d \
  --name bob-remote-host \
  --platform linux/amd64 \
  -p 2222:2222 \
  bob-remote-ubuntu:24.04
The --publish option maps the container’s SSH port to your Mac. Podman documents --publish as the mechanism for exposing a container port on the host, which is exactly what we need so localhost:2222 becomes our remote entry point.
The build and run commands should match. We build an amd64 image, and we run it explicitly as linux/amd64, so Bob sees a remote Linux x86_64 machine and can use the expected remote server package.
By now, the basic shape is in place: Podman provides the Linux VM, and inside it we run a container that behaves like a remote development machine.
Implementation
Now we wire the SSH access, test it from the terminal, connect Bob, and verify that Bob is really operating against the remote Linux machine.
Start with your local SSH config. Open ~/.ssh/config and add this host:
Host bob-podman-demo
  HostName 127.0.0.1
  Port 2222
  User bob
  IdentityFile ~/bob-remote-demo/ssh/bob_remote_key
  IdentitiesOnly yes
  StrictHostKeyChecking accept-new
  ServerAliveInterval 30
  ServerAliveCountMax 3
  ForwardAgent yes
This alias is important. Do not skip it. A stable SSH alias gives you a stable target name inside Bob and your terminal. Later, if you replace the backend host, you can keep using bob-podman-demo and only change the actual hostname or port. That is a small detail, but it makes real remote workflows easier to maintain.
Now test raw SSH from the terminal:
ssh bob-podman-demo
You should land on the remote host and see a shell prompt for the bob user. Once you are there, verify the basics:
whoami
pwd
uname -a
java -version
mvn -version
Expected behavior is simple. whoami returns bob. pwd starts in /home/bob. uname -a shows Linux, not macOS. java -version and mvn -version confirm the remote toolchain is installed.
Create a small Java project on the remote host so Bob has something real to work with:
cd ~/workspace/demo-app
git init

cat > pom.xml <<'EOF'

  4.0.0

  com.example
  bob-remote-demo
  1.0.0-SNAPSHOT

  
    21
    UTF-8
  

  
    
      
        org.apache.maven.plugins
        maven-compiler-plugin
        3.15.0
        
          21
        
      
    
  

EOF

mkdir -p src/main/java/com/example
cat > src/main/java/com/example/Main.java <<'EOF'
package com.example;

public class Main {
    public static void main(String[] args) {
        System.out.println("Hello from the remote machine.");
    }
}
EOF

cat > README.md <<'EOF'
# Bob Remote Demo

This project is used to verify that IBM Bob is operating against a remote Linux machine over SSH.
EOF

mvn -q compile
This project stays intentionally small. The goal here is not Maven or Java fundamentals. The goal is to prove that Bob can read, change, build, and test code remotely.
Exit the SSH session:
exit
Now open IBM Bob on your Mac. The Open Remote - SSH extension from jeanp413 should already be installed (If not, grab it from open-vsx). 
Open the command palette in Bob and connect the current window to bob-podman-demo. This should look like this:
You can also open the Terminal window and you will directly see user bob logged in.
Right after you choose the host from the command palette, Bob starts a normal SSH connection using the identity configured for that host. In this case it picks the dedicated bob_remote_key, authenticates with public key login, and then runs a remote install script on the Linux machine. That script checks the remote platform and architecture, prepares the ~/.bobide-server directory, and looks for a matching bobide-server build. Because the server is already installed and running in this log, Bob skips the download, reuses the existing remote server process, reads its connection token, finds the port it is listening on, and then creates local port forwarding back to that remote process. This is the important part: Bob is not just opening files over raw SSH. It uses SSH first, then boots or reuses a Bob remote server on the target machine, and finally tunnels your local editor session to that server so the remote workspace behaves like a local IDE window.
Then open this folder:
/home/bob/workspace/demo-app
Once the remote folder opens, stop for a second and verify what you are looking at. You are still using Bob locally, but the workspace itself is remote. That means file edits and shell commands now happen on the remote Linux target, not on your macOS host. This is the core architectural change.
Open the integrated terminal in the Bob window and run:
whoami
pwd
uname -a
ls -la
mvn -q compile
If all of that works, Bob is now operating against the remote Linux machine.
Next, ask Bob to inspect the project. A good first prompt is:
Then ask Bob to make a controlled change:
Add a JUnit 5 test to this Maven project, explain the changes first, then apply them.
Finally, ask it to run the test:
Run the tests and explain the output.
IBM Bob’s docs highlight file access and the ability to run terminal or shell commands from inside Bob. 
A note about real enterprise environments
This tutorial uses a plain Linux container to simulate a remote machine. That is the right way to learn the workflow. But real enterprise environments often add one more layer. SSH connectivity may work, while access to the real workspace still fails because the remote session is missing extra identity or filesystem tokens. Maybe you run into kinit, aklog, and ~/.ssh/rc hooks for AFS token setup or other things. That is environment-specific, so we do not build it into the generic Podman flow here, but the pattern matters: a working SSH login does not automatically mean a fully initialized enterprise session.
The same is true for Bob-specific remote components. In some managed environments, a remote-side BobIDE server or helper runtime also has to be present. A managed Power or internal enterprise environment often does.
Configuration
There are three configurations that matter in this setup: Podman machine settings, the SSH daemon in the container, and your local SSH alias.
The Podman machine is created with:
podman machine init --cpus 4 --memory 8192 --disk-size 40
Remote editor sessions get unpleasant quickly when the underlying VM is too small. The editor does file watching, indexing, shell commands, Git activity, and language tooling. Give the machine too little memory and you get slow builds, laggy terminals, and flaky background tasks. Give it too few CPUs and simple operations take longer than they should.
The SSH daemon configuration inside the container is this:
Port 2222
PermitRootLogin no
PasswordAuthentication no
KbdInteractiveAuthentication no
ChallengeResponseAuthentication no
UsePAM no
PubkeyAuthentication yes
AllowUsers bob
X11Forwarding no
AllowTcpForwarding yes
ClientAliveInterval 300
ClientAliveCountMax 2
PermitRootLogin no blocks the worst default mistake. PasswordAuthentication no removes password guessing. AllowUsers bob narrows the login surface. ClientAliveInterval and ClientAliveCountMax help dead sessions time out cleanly. We leave AllowTcpForwarding yes enabled because some remote workflows depend on it. If yours does not, turn it off.
Your local SSH alias matters just as much:
Host bob-podman-demo
  HostName 127.0.0.1
  Port 2222
  User bob
  IdentityFile ~/bob-remote-demo/ssh/bob_remote_key
  IdentitiesOnly yes
  StrictHostKeyChecking accept-new
  ServerAliveInterval 30
  ServerAliveCountMax 3
  ForwardAgent yes
IdentitiesOnly yes avoids weird behavior when your laptop has many SSH keys loaded. ForwardAgent yes is useful when the remote host itself needs to reach another Git server using your local agent. In plain local tests you may not need it, but it is a common real-world requirement.
One more configuration point matters from a security perspective. Some teams try to solve remote development by exposing container APIs over TCP. Podman warns directly against this. The API grants full access to Podman functionality and allows arbitrary code execution as the user running the API, and they strongly recommend against making the API socket available over the network. They recommend SSH forwarding instead when remote access is needed. 
Clean up
When you are done, destroy the remote host:
podman rm -f bob-remote-host
If you created the Podman machine only for this tutorial, you can stop it too:
podman machine stop
This disposable cleanup is one of the best operational advantages of the whole pattern. You can rebuild the environment from scratch instead of trying to repair a messy one.
Conclusion
We built a complete remote Bob workflow on macOS without needing a real remote server. Podman gave us the Linux VM that macOS needs for containers, we ran an SSH-enabled Ubuntu container inside it, and IBM Bob connected to that remote Linux machine through jeanp413.open-remote-ssh. The result is a cleaner boundary: Bob still runs in your editor, but the workspace, shell commands, build tools, and side effects live in a remote environment you can inspect, rebuild, and destroy. The main thing to remember is this: SSH is the transport, not the whole story. In simple setups, that is enough. In enterprise environments, you often need extra identity bootstrapping and sometimes Bob-specific remote components too. That split makes remote Bob setups much easier to understand and debug.
Subscribe now



Stop Letting AI Guess in Your Java Repository
Markus Eisele — Wed, 22 Apr 2026 06:08:09 GMT
A ticket says “add CSV export for cargo itineraries.” That sounds small until an AI coding tool starts working on a layered Jakarta EE codebase without a map. Now the real problem shows up. Which service already owns itinerary lookup? Which REST facade should expose the export? Which exception already means “not found”? Which serialization path is allowed in this repository? Without those answers, the tool does what tools do. It guesses, and the guesses look convincing right up to the point where they collide with the code you actually ship.
The problem is usually earlier than the prompt. In a layered Jakarta EE application, the hard part is not understanding English. The hard part is knowing where the system already solves the problem. A request like “add CSV export for cargo itineraries” sounds simple. In a real repository, it is a navigation problem. Which service already owns itinerary lookup? Which facade already exposes cargo data? Which exception means not found? Which serialization stack is allowed? Which package is off-limits because it breaks layering?
Senior developers do this mapping almost automatically. They look at the project structure, the recent commit history, the conventions, and the existing call graph before they write a line of code. AI coding tools fail when we skip that step and ask them to implement immediately. Then the tool guesses. It invents a helper class in the wrong layer. It creates a second copy of logic that already exists. It adds an endpoint that looks right but does not fit the API surface you actually ship.
That failure is easy to misread. It looks like “AI cannot do real work.” But the real issue is environmental. You asked the assistant to act like a senior teammate without giving it the same working context a senior teammate would demand on day one. No language server. No file boundaries. No recent history. No written rules. No project-local configuration that travels with the repository.
In this tutorial, we fix that problem the practical way. We build a small harness around the Jakarta EE Cargo Tracker example so IBM Bob can reason against the actual repository instead of guessing from text alone. We use the Eclipse JDT Language Server behind an MCP bridge, a filesystem server scoped to source code, optional Git context, a committed .bob/mcp.json, and two human checkpoints before implementation starts. Some people call this harness engineering. Some call it context engineering. The exact name is still settling. The important part is simple: structure in, structure out.
By the end, you will have a repeatable setup you can reuse for Cargo Tracker and adapt to your own Jakarta EE or Spring repositories.
Prerequisites
You do not need deep AI tooling knowledge for this tutorial, but you should be comfortable on the command line and able to read a Maven-based Java project. We will use IBM Bob in the examples, but the same idea works with any MCP-capable client that can launch external servers and consume project-local configuration.
Java 21 or later installed
Maven Wrapper available in the target repository
Maven 3.9+ on your PATH if you build the MCP bridge from source
Git, curl, and a shell
Node.js with npx for the reference filesystem server
Optional: uv or uvx for the Git MCP server
Optional: Homebrew on macOS for packaged jdtls
Basic familiarity with Maven, Git, and layered Java applications
Project Setup
We start with the upstream Cargo Tracker repository. The first rule of this whole article is simple: the project must build before the assistant touches it. A language server does not rescue a broken classpath. An MCP bridge does not fix dependency resolution. If ./mvnw compile fails, the rest of the harness only gives you better-informed confusion.
Let’s create a working directory:
mkdir ai-tooling-example
cd ai-tooling-example
Clone the project and prove it compiles:
git clone https://github.com/eclipse-ee4j/cargotracker.git \
  && cd cargotracker \
  && ./mvnw -q -DskipTests compile
This repository is a good training ground because it looks like a real enterprise application. It has a layered structure, real domain boundaries, Jakarta EE APIs, and enough moving parts that a coding assistant can easily get lost without help.
You get a standard Maven layout with pom.xml at the root, Java sources under src/main/java, and tests under src/test/java. More importantly, you get existing application, domain, infrastructure, and interface packages. That matters. We want the assistant to navigate those packages. We do not want it to invent fresh package roots because it did not see what already exists.
If the build fails here, stop and fix Java, Maven, or network access first. That is not a side issue. It is part of the environment. A harness built on a non-working repository just gives you better tools for generating bad output.
Install and Validate JDTLS
We need the Eclipse JDT Language Server because that is the part that understands Java symbols, definitions, references, classpath resolution, and project structure. Your IDE already relies on this kind of capability. We are moving that same capability into the assistant’s tool loop.
On macOS, the easy path is Homebrew:
brew install jdtls
brew info jdtls
If you want a manual install that works across operating systems, download the latest snapshot tarball from Eclipse and unpack it into a local directory:
JDTLS_TGZ=$(curl -fsSL https://download.eclipse.org/jdtls/snapshots/latest.txt)
curl -fLO "https://download.eclipse.org/jdtls/snapshots/${JDTLS_TGZ}"
mkdir -p ~/.local/jdtls && tar -xzf "${JDTLS_TGZ}" -C ~/.local/jdtls
After unpacking, you should see a plugins/ directory and one platform-specific configuration directory such as config_mac, config_linux, or similar. That platform directory matters. If you point jdtls at the wrong one, startup fails with OSGi errors that look confusing and unrelated to the real issue.
Here is a manual smoke test for macOS. On Linux, replace config_mac with config_linux.
LAUNCHER_JAR=$(ls ~/.local/jdtls/plugins/org.eclipse.equinox.launcher_*.jar | head -1)
java \
  -Declipse.application=org.eclipse.jdt.ls.core.id1 \
  -Dosgi.bundles.defaultStartLevel=4 \
  -Declipse.product=org.eclipse.jdt.ls.core.product \
  -Xmx1G -XX:+UseG1GC \
  -jar "${LAUNCHER_JAR}" \
  -configuration ~/.local/jdtls/config_mac \
  -data /tmp/jdtls-workspace-cargotracker
The -data directory is not your Git checkout. This is important. It is the language server’s own workspace cache and metadata area. The actual project path gets passed later through the language server handshake when the MCP bridge initializes the session.
A one-gigabyte heap is a good starting point for Cargo Tracker. You can reduce it later if you measure idle usage and know you have margin. But starting too small creates a different class of problem: slow indexing, unstable analysis, or random failures that look like language-server bugs when the real issue is starvation.
What does this give us? It gives the assistant symbol-level reality. jdtls knows what CargoRepository actually is. It knows where a method is declared, who references it, and how the classpath resolves imports. Without that, the assistant is doing fancy autocomplete over prose. With it, the assistant is navigating the same semantic graph your IDE uses.
It still has limits. It does not know your team’s conventions. It does not know what layers are socially forbidden. It does not know whether adding Jackson is acceptable just because a classpath contains it somewhere. That is why we need more than one server.
Build the LSP4J-MCP Bridge
Now we need a bridge between the Model Context Protocol world and the Java language server world. In this setup, we use LSP4J-MCP, which starts jdtls as a child process and exposes a smaller, controlled tool surface to the assistant.
Clone and build it:
git clone https://github.com/stephanj/LSP4J-MCP.git
cd LSP4J-MCP
mvn -q clean package -DskipTests
ls target/lsp4j-mcp-*.jar
At the time of writing, the project typically produces a shaded JAR with a name like lsp4j-mcp-1.0.0-SNAPSHOT.jar, but do not hardcode the exact version in your head. The safe habit is to inspect the target/ directory and then copy the resolved artifact into a stable project-local path.
Create local tool and log directories in Cargo Tracker, then copy the built JAR:
mkdir -p /path/to/cargotracker/.bob/tools /path/to/cargotracker/.bob/logs
cp target/lsp4j-mcp-*.jar /path/to/cargotracker/.bob/tools/lsp4j-mcp.jar
We give it a fixed local name, lsp4j-mcp.jar, because this is the name Bob will use from the committed configuration. This avoids rewriting config every time the bridge version changes.
You can do a standalone smoke launch before wiring Bob to it:
java -jar /path/to/cargotracker/.bob/tools/lsp4j-mcp.jar \
  /path/to/cargotracker \
  jdtls
Let that process sit for a while on first boot. The first run imports the Maven model and indexes the workspace. On a healthy setup, stderr shows project import progress or indexing activity. A broken setup exits immediately or throws classpath, Java version, or process launch errors.
This bridge is intentionally small. That is one of its strengths. It does not try to surface every possible LSP request. It exposes a smaller set of tools that are easy to review and safe to auto-approve in read-only mode. That smaller surface area is good for production teams because it limits accidental behavior and keeps the assistant’s tool menu understandable.
Typical tools exposed by this bridge include:
find_symbols
find_references
find_definition
document_symbols
find_interfaces_with_method
The exact names depend on the version you built, so always check the startup log or the bridge documentation before you finalize autoApprove. This is one of those details teams skip, and then they wonder why Bob keeps asking for approval on every call or fails because the configured tool name does not exist.
What does this bridge guarantee? It gives you language-aware discovery. That is the big win. What it does not guarantee is correctness of architecture or intent. It can tell the assistant where a class lives. It cannot tell the assistant whether adding a new service in that package is the right move. We still need explicit conventions and a human checkpoint for that.
Commit Project-Local MCP Configuration
The next step is the piece many teams miss. Do not leave the harness in somebody’s head or in a private desktop configuration. Commit it with the repository.
IBM Bob can read project-level MCP settings from .bob/mcp.json. That makes the environment reproducible. A teammate can clone the repository, open it, and inherit the same harness instead of reverse-engineering your local setup from screenshots and Slack messages.
First, make sure local logs stay out of version control:
printf '%s\n' '.bob/logs/' >> .gitignore
Now create .bob/mcp.json at the repository root:
{
  "mcpServers": {
    "java-lsp": {
      "type": "stdio",
      "command": "java",
      "args": [
        "-jar",
        "${workspaceFolder}/.bob/tools/lsp4j-mcp.jar",
        "${workspaceFolder}",
        "jdtls"
      ],
      "env": {
        "LOG_FILE": "${workspaceFolder}/.bob/logs/jdtls-mcp.log",
        "JAVA_HOME": "/Library/Java/JavaVirtualMachines/temurin-21.jdk/Contents/Home"
      },
      "autoApprove": [
        "find_symbols",
        "find_references",
        "find_definition",
        "document_symbols",
        "find_interfaces_with_method"
      ],
      "disabled": false
    },
    "filesystem": {
      "type": "stdio",
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem@latest",
        "${workspaceFolder}/src"
      ],
      "autoApprove": [
        "read_file",
        "list_directory",
        "search_files"
      ],
      "disabled": false
    },
    "git": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "mcp-server-git",
        "--repository",
        "${workspaceFolder}"
      ],
      "autoApprove": [
        "git_log",
        "git_diff",
        "git_show"
      ],
      "disabled": false
    }
  }
}
This file is small, but it changes how the assistant behaves in a big way. Now Bob has three different ways to ground itself.
The java-lsp server answers semantic questions. It knows where symbols are defined and how code relates.
The filesystem server answers raw text questions. It can read files, list directories, and search source content.
The git server answers historical questions. It can show diffs, recent changes, and implementation intent from the repository history.
Together, those three servers approximate what a senior developer does mentally before touching code.
There are a few details here worth slowing down for.
The type is stdio for all three servers. That means Bob launches child processes and speaks MCP over standard input and output. This is simple and reliable. It also means broken command paths fail fast.
The ${workspaceFolder} variable matters a lot. Hard-coded local paths break the setup for everyone else. If your Bob release uses a different token, update it once in the committed config and document it. Do not hide that difference in tribal knowledge.
The LOG_FILE environment variable is a support tool. When something goes wrong, you want one place to tail logs. A missing log directory is not fatal in every setup, but it makes debugging harder and pushes errors into stderr where they get lost.
The JAVA_HOME setting is convenient and fragile at the same time. The example above is a macOS Temurin path. That is fine for a single-machine demo, but teams usually want one of two approaches. Either keep separate snippets for macOS and Linux in an internal doc, or remove JAVA_HOME from the committed file and rely on the parent environment. The important thing is to be explicit. Wrong JAVA_HOME values produce class version or startup errors that look like project bugs even though the problem is just the runtime.
The autoApprove lists deserve security thinking. These are read-oriented tools only. Keep it that way unless you have a very deliberate reason to expose write tools. The moment you auto-approve a mutating tool, you expand the assistant’s blast radius.
The filesystem scope is one of the most important design choices in the whole article. We point it at ${workspaceFolder}/src, not the whole repository. That is deliberate. Yes, it costs some convenience. No, the assistant cannot casually open README.md or inspect build output or local scratch files. That is the point. Narrow scope reduces accidental exposure of secrets, noisy directories, and irrelevant files.
The package reference uses @latest in this example because it is the easiest way to show the setup. In a team setting, pin it after validation. Cold-starting against whatever the registry says is “latest” makes laptops drift and turns debugging into archaeology.
The git server is optional, but it adds real value. Recent history often tells you which test file was changed last for a similar feature, which class is the real integration point, or which package is alive versus effectively abandoned. That kind of signal helps the assistant follow the grain of the codebase instead of fighting it.
Add a Repository Conventions File
Language tools tell the assistant what exists. They do not tell it what your team considers acceptable. That is why we add a committed AGENTS.md at the repository root.
Create AGENTS.md like this:
# Conventions

## Architecture
- Strict layering: interfaces → application → domain → infrastructure
- Domain types stay free of web and persistence annotations
- CDI constructor injection in application services
- Repositories are interfaces in domain; JPA implementations live in infrastructure

## Naming
- Application services: `*Service` under `application/internal/`
- REST facades: `*RestService` under `interfaces/rest/`
- JPA implementations: `Jpa*Repository`

## Serialization
- JSON via Jakarta JSON Binding in the stack versions Cargo Tracker already uses
- Keep serialization helpers out of domain entities unless the project explicitly allows it

## Testing
- Integration tests: `*IT.java`, follow Arquillian patterns already in the tree
- Unit tests: `*Test.java` with JUnit 5
- Reuse existing test data bootstrap patterns; do not invent a parallel database lifecycle

## Runtime descriptor and test packaging safety rules
Be careful with deployment descriptors and runtime-specific test resources.

Descriptor filenames, XML root elements, and schemas must match exactly.
Do not rename one descriptor type into another.
Do not package a standard `web.xml` file as `ibm-web-bnd.xml`.
Do not package a Liberty binding descriptor as `web.xml`.

When working with ShrinkWrap and Arquillian:
- inspect every file added through `addAsWebInfResource`
- confirm the source file content matches the target filename
- reuse existing repository examples before creating new descriptors
- prefer the minimal archive that works
- if no working example exists, stop and explain the uncertainty instead of inventing a runtime descriptor

Required self-check before finalizing test code:
- `web.xml` must contain the correct `web-app` root element
- `ibm-web-bnd.xml` must contain the correct Liberty binding root element
- no descriptor may be duplicated under the wrong target filename
- runtime-specific resources must follow existing repository conventions

## Runtime and verification
- For local runtime validation in this repository, use the Open Liberty profile
- Build with `./mvnw clean package -Popenliberty`
- Run with `./mvnw liberty:run -Popenliberty`
- Do not verify new REST behavior against a different runtime unless explicitly requested
This file is simple governance. It does not try to explain the entire architecture. It sets boundaries. That is enough to stop a lot of common assistant mistakes.
For example, the classpath might contain something Jackson-related through another path or dependency. A generic model sees ObjectMapper and reaches for it. Your conventions file says no, this codebase uses Jakarta JSON Binding and serialization helpers stay out of domain entities. That single sentence removes a whole category of bad diffs.
Another common issue is layering drift. The assistant sees a problem, wants a quick helper, and places it in whatever package feels convenient. Your conventions file tells it where application services belong, where REST facades belong, and where repository implementations belong. That narrows the search space and makes planning better before implementation starts.
This file does not replace code review. It makes code review less repetitive and less reactive.
Build an Impact Map Before Writing Code
Here is the part that changes team behavior the most. The first thing you ask the assistant for is not code. It is a repository impact map.
This sounds slower. In practice, it is faster because it catches the expensive mistakes before they become patches.
Open Bob and start with a prompt like this:
Use the java-lsp and filesystem tools to inspect the Cargo Tracker project.
I want to add CSV export for cargo itineraries.
Produce a repository impact map: what must change, in which files, and which existing patterns to follow.
Annotate each bullet with the tool call that produced the evidence.
Do not write implementation code yet.
A good response looks something like this:
# Repository Impact Map: CSV Export for Cargo Itineraries

## Evidence-Based Analysis

### 1. **Domain Layer** (No Changes Required)
**Evidence:** `java-lsp:document_symbols` on [`Itinerary.java`](src/main/java/org/eclipse/cargotracker/domain/model/cargo/Itinerary.java:19), [`Leg.java`](src/main/java/org/eclipse/cargotracker/domain/model/cargo/Leg.java:20), [`Cargo.java`](src/main/java/org/eclipse/cargotracker/domain/model/cargo/Cargo.java:58)

- Domain entities already expose all necessary data via getters
- [`Itinerary.getLegs()`](src/main/java/org/eclipse/cargotracker/domain/model/cargo/Itinerary.java:45) returns `List`
- [`Leg`](src/main/java/org/eclipse/cargotracker/domain/model/cargo/Leg.java:20) exposes: voyage, loadLocation, unloadLocation, loadTime, unloadTime
- [`Cargo.getItinerary()`](src/main/java/org/eclipse/cargotracker/domain/model/cargo/Cargo.java:122) provides access to itinerary

### 2. **Application Layer** (No Changes Required)
**Evidence:** `read_file` on [`BookingService.java`](src/main/java/org/eclipse/cargotracker/application/BookingService.java:13), [`CargoRepository.java`](src/main/java/org/eclipse/cargotracker/domain/model/cargo/CargoRepository.java:5)

- [`CargoRepository.findAll()`](src/main/java/org/eclipse/cargotracker/domain/model/cargo/CargoRepository.java:9) already exists for bulk operations
- [`CargoRepository.find(TrackingId)`](src/main/java/org/eclipse/cargotracker/domain/model/cargo/CargoRepository.java:7) for single cargo lookup

### 3. **Interface Layer - NEW REST Endpoint** (CREATE)
**Evidence:** `read_file` on [`HandlingReportService.java`](src/main/java/org/eclipse/cargotracker/interfaces/handling/rest/HandlingReportService.java:26) shows REST pattern

**File:** `src/main/java/org/eclipse/cargotracker/interfaces/booking/rest/CargoItineraryExportService.java` (NEW)
- Follow pattern from [`HandlingReportService`](src/main/java/org/eclipse/cargotracker/interfaces/handling/rest/HandlingReportService.java:26):
  - Use `@Stateless` annotation
  - Use `@Path("/booking")` for base path
  - Use `@GET` with `@Produces("text/csv")`
- Inject [`CargoRepository`](src/main/java/org/eclipse/cargotracker/domain/model/cargo/CargoRepository.java:5)
- Endpoints:
  - `GET /booking/cargos/{trackingId}/itinerary/export` - single cargo
  - `GET /booking/cargos/itineraries/export` - all cargos

### 4. **Interface Layer - CSV Assembler** (CREATE)
**Evidence:** `java-lsp:find_symbols` query "Assembler" found [`CargoRouteDtoAssembler`](src/main/java/org/eclipse/cargotracker/interfaces/booking/facade/internal/assembler/CargoRouteDtoAssembler.java:15), `read_file` showed assembler pattern

**File:** `src/main/java/org/eclipse/cargotracker/interfaces/booking/rest/assembler/ItineraryCsvAssembler.java` (NEW)
- Follow pattern from [`CargoRouteDtoAssembler`](src/main/java/org/eclipse/cargotracker/interfaces/booking/facade/internal/assembler/CargoRouteDtoAssembler.java:15):
  - Use `@ApplicationScoped`
  - Constructor injection with `@Inject`
- Methods:
  - `String toCsv(Cargo cargo)` - single cargo to CSV
  - `String toCsv(List cargos)` - multiple cargos to CSV
- CSV Format (based on [`Leg`](src/main/java/org/eclipse/cargotracker/domain/model/cargo/Leg.java:20) structure):
  ```
  TrackingId,LegNumber,VoyageNumber,LoadLocation,LoadLocationName,UnloadLocation,UnloadLocationName,LoadTime,UnloadTime
  ```

### 5. **Dependencies** (No Changes Required)
**Evidence:** `read_file` on [`pom.xml`](pom.xml:1-100)

- Jakarta EE 10 already includes necessary APIs
- No additional CSV library needed (use standard Java StringBuilder/StringWriter)
- JSON Binding already available via Jakarta JSON Binding

### 6. **Existing Patterns to Follow**

#### REST Service Pattern
**Evidence:** [`HandlingReportService.java`](src/main/java/org/eclipse/cargotracker/interfaces/handling/rest/HandlingReportService.java:26)
```java
@Stateless
@Path("/handling")
public class HandlingReportService {
  @Inject private ApplicationEvents applicationEvents;
  
  @POST
  @Path("/reports")
  @Consumes({"application/json", "application/xml"})
  public void submitReport(...) { }
}
```

#### Assembler Pattern
**Evidence:** [`CargoRouteDtoAssembler.java`](src/main/java/org/eclipse/cargotracker/interfaces/booking/facade/internal/assembler/CargoRouteDtoAssembler.java:15)
```java
@ApplicationScoped
public class CargoRouteDtoAssembler {
  @Inject private LocationDtoAssembler locationDtoAssembler;
  
  public CargoRoute toDto(Cargo cargo) {
    // Transform domain to DTO
  }
}
```

#### Data Access Pattern
**Evidence:** [`DefaultBookingServiceFacade.java`](src/main/java/org/eclipse/cargotracker/interfaces/booking/facade/internal/DefaultBookingServiceFacade.java:88-94)
```java
@Inject private CargoRepository cargoRepository;

public List listAllCargos() {
  List cargos = cargoRepository.findAll();
  // Transform and return
}
```

### 7. **Architecture Compliance**
**Evidence:** `read_file` on [`AGENTS.md`](AGENTS.md) conventions

- ✅ Strict layering maintained: interfaces → application → domain
- ✅ REST service in `interfaces/booking/rest/` (new package)
- ✅ Assembler in `interfaces/booking/rest/assembler/` (new package)
- ✅ CDI constructor injection
- ✅ Domain types remain annotation-free
- ✅ No new application services needed

## Summary

**Files to CREATE:**
1. `src/main/java/org/eclipse/cargotracker/interfaces/booking/rest/CargoItineraryExportService.java`
2. `src/main/java/org/eclipse/cargotracker/interfaces/booking/rest/assembler/ItineraryCsvAssembler.java`

**Files to MODIFY:** None

**Pattern Sources:**
- REST endpoint: [`HandlingReportService.java`](src/main/java/org/eclipse/cargotracker/interfaces/handling/rest/HandlingReportService.java:26)
- Assembler: [`CargoRouteDtoAssembler.java`](src/main/java/org/eclipse/cargotracker/interfaces/booking/facade/internal/assembler/CargoRouteDtoAssembler.java:15)
- Repository usage: [`DefaultBookingServiceFacade.java`](src/main/java/org/eclipse/cargotracker/interfaces/booking/facade/internal/DefaultBookingServiceFacade.java:88-94)

**Testing:**
- Integration test following Arquillian pattern in `src/test/java/`
- Verify with Open Liberty: `./mvnw liberty:run -Popenliberty`
The exact class names and file paths depend on what the assistant finds in your checkout. The shape is what matters. You want real files, real existing types, and evidence from tool calls. Not “I would create a new controller package.” Not “It may be useful to add a DTO.” Evidence first.
Why does this matter? Because wrong-layer changes are cheap to fix at the map stage and expensive to fix after implementation starts. If the impact map shows an invented package, a duplicate service, or the wrong test class, you correct it in minutes. If you wait until after the assistant generated a patch, now you are reviewing code, logic, test strategy, and architecture drift at the same time.
This is the point where the human stays in charge. The assistant explores. You approve the shape.
Turn the Impact Map Into a Structured Task
After the impact map is approved, you convert it into an implementation contract. This is where you stop vague prompting and start being explicit about file boundaries, behavior, and acceptance criteria.
Use a task prompt like this:
Use the approved repository impact map below as the implementation contract.

Implement CSV export for cargo itineraries in the Cargo Tracker project.

Before you implement anything, read `AGENTS.md` and `CONVENTIONS.md` if present, and extract the rules that apply to this change.
List those rules first under a heading `Applicable conventions`.
Use those conventions as binding constraints for implementation and tests.
If the impact map conflicts with `AGENTS.md` or `CONVENTIONS.md`, stop and explain the conflict before writing code.

Important:
- Stay within the approved impact map
- Do not invent additional packages, layers, or abstractions
- Do not move this through a facade or application service
- Do not modify existing files unless a minimal compile-time change is strictly required
- If you believe an existing file must be modified, explain why before showing code
- Start by listing the exact files you will create or modify
- Then implement the change directly

## Approved scope

### Files to create
1. `src/main/java/org/eclipse/cargotracker/interfaces/booking/rest/CargoItineraryExportService.java`
2. `src/main/java/org/eclipse/cargotracker/interfaces/booking/rest/assembler/ItineraryCsvAssembler.java`

### Files to reference but not modify unless strictly required for compilation
- `src/main/java/org/eclipse/cargotracker/domain/model/cargo/Cargo.java`
- `src/main/java/org/eclipse/cargotracker/domain/model/cargo/Itinerary.java`
- `src/main/java/org/eclipse/cargotracker/domain/model/cargo/Leg.java`
- `src/main/java/org/eclipse/cargotracker/domain/model/cargo/CargoRepository.java`
- `src/main/java/org/eclipse/cargotracker/interfaces/handling/rest/HandlingReportService.java`
- `src/main/java/org/eclipse/cargotracker/interfaces/booking/facade/internal/assembler/CargoRouteDtoAssembler.java`
- `src/main/java/org/eclipse/cargotracker/interfaces/booking/facade/internal/DefaultBookingServiceFacade.java`
- `AGENTS.md`
- `CONVENTIONS.md`

## Required architecture

Follow the approved architecture exactly:

- Domain layer: no changes
- Application layer: no changes
- Interface layer: add one new REST service
- Interface layer: add one new assembler under `interfaces/booking/rest/assembler/`

Architecture constraints:
- Maintain strict layering
- Keep domain classes unchanged
- Use CDI constructor injection where conventions require it
- Keep the implementation in the interface layer
- No new facade methods
- No new application services
- No DTO layer for this feature

## Required REST endpoints

Create a new REST service class:

`src/main/java/org/eclipse/cargotracker/interfaces/booking/rest/CargoItineraryExportService.java`

Implementation requirements:
- Follow the REST structure pattern from `HandlingReportService`
- Use `@Stateless`
- Use base path `@Path("/booking")`
- Inject `CargoRepository`
- Return JAX-RS `Response`
- Produce `text/csv`

Add these endpoints exactly:

1. `GET /booking/cargos/{trackingId}/itinerary/export`
   - export one cargo itinerary as CSV

2. `GET /booking/cargos/itineraries/export`
   - export all cargo itineraries as CSV

Response requirements:
- Set `Content-Type` to `text/csv`
- Add `Content-Disposition` header so the CSV is downloadable
- Use clear file names for single-cargo and all-cargos export

## Required assembler

Create a new assembler class:

`src/main/java/org/eclipse/cargotracker/interfaces/booking/rest/assembler/ItineraryCsvAssembler.java`

Implementation requirements:
- Follow the assembler style from `CargoRouteDtoAssembler`
- Use `@ApplicationScoped`
- Use constructor injection with `@Inject`
- Work directly with domain objects

Required methods:
- `String toCsv(Cargo cargo)`
- `String toCsv(List cargos)`

CSV requirements:
- Use standard Java only
- No external CSV library
- No JSON or Jackson-based shortcut
- Build the CSV content with standard Java types

Use this exact column order:

`TrackingId,LegNumber,VoyageNumber,LoadLocation,LoadLocationName,UnloadLocation,UnloadLocationName,LoadTime,UnloadTime`

Behavior requirements:
- Include header row once
- For single cargo, output one row per itinerary leg
- For multiple cargos, output rows for all itinerary legs across all cargos
- Preserve the cargo tracking id on every row
- Handle cargos without itineraries in a way consistent with existing project behavior and explain the choice

## Data access requirements

Use existing repository methods only:
- `CargoRepository.find(TrackingId)` for single cargo export
- `CargoRepository.findAll()` for all-cargo export

Do not add repository methods.
Do not add application services to wrap repository access.

## Error handling

For the single-cargo endpoint:
- If the tracking id does not resolve to a cargo, follow the project’s existing not-found style
- Do not invent a new error format unless existing REST code clearly requires it

For the all-cargos endpoint:
- Return a valid CSV response even when there are no cargos
- Explain the behavior you chose

## Constraints

- No changes to domain classes
- No changes to application services
- No changes to facade interfaces or implementations
- No UI changes
- No new dependencies
- No unrelated refactoring
- No endpoint path changes
- No alternative package placement

## Testing

Add integration test coverage following the existing Arquillian style in `src/test/java/`.

Required tests:
1. Export single cargo itinerary as CSV
2. Return not-found behavior for unknown tracking id
3. Export all cargo itineraries as CSV
4. Return a valid CSV response shape for the empty-data case if applicable

Testing rules:
- Follow existing project conventions
- Keep tests minimal but real
- Assert content type
- Assert response status
- Assert header presence where relevant
- Assert CSV header row
- Assert at least one representative CSV row value

## Verification runtime

This feature must be verified using the Open Liberty profile.

Use these commands for final verification:
- `./mvnw clean test`
- `./mvnw clean package -Popenliberty`
- `./mvnw liberty:run -Popenliberty`

Do not use a different runtime profile for final verification.

## Output format

When you respond:
1. Show `Applicable conventions`
2. Show the exact files you will create or modify
3. Show the full code for each new or changed file
4. Show the full test code
5. Show any minimal deviation from the impact map
6. Show a `Conventions check` section against `AGENTS.md` and `CONVENTIONS.md`
7. End with the exact Maven verification commands using Open Liberty
This kind of prompt is much harder for the assistant to misunderstand. We define scope, file boundaries, constraints, and what “done” means. That reduces wandering. It also makes review easier because you can compare the resulting diff against the declared contract.
The practical difference is huge. “Implement CSV export” invites invention. A structured task grounded in a reviewed impact map invites extension of existing code.
Configure and Activate the Harness in Bob
At this point, the project-local configuration exists. The bridge JAR is in place. The assistant still needs to load and use the servers.
The exact Bob UI labels can change between releases, so do not get attached to the menu wording. What matters is that the workspace opens with the repository root and Bob loads .bob/mcp.json from that project.
When this works correctly, the assistant should expose the configured servers and tools without requiring a second round of manual setup. If it does not, treat that as a configuration mismatch or client-version mismatch, not as proof that the idea failed.
A good first smoke check inside Bob is intentionally tiny:
Call find_symbols with query "CargoRepository" and paste the first result path only.
This test is good because it is narrow. It does not ask the assistant to think. It asks it to prove the tool wiring works.
If the response includes a real path under src/main/java, the semantic side of the harness is alive.
Next, test the filesystem scope:
Use the filesystem tool to read README.md at the repository root.
If your filesystem server is correctly scoped to src/, this request should fail or return an out-of-scope error. That refusal is success. It proves your boundaries are working.
Then test Git context with something equally small:
Use git_log on the existing cargo facade integration test and summarize the most recent relevant change in one sentence.
This tells you whether the assistant can consume project history without inventing it.
These tiny tests matter because they isolate failure. If you jump straight into a feature request and it goes wrong, you do not know whether the problem is the bridge, the server scope, the task prompt, the client, or the repository. Small smoke tests make the failure visible sooner.
Production Hardening
A harness that works on one laptop is not enough. We need to think about what happens when this setup becomes a team habit.
What happens under load
jdtls is not a trivial process. On multi-module repositories, it consumes real memory and CPU during indexing. Cargo Tracker is manageable, but bigger enterprise repositories expose the cost quickly. This is why we start with a one-gigabyte heap and why we keep a dedicated workspace cache.
If developers switch branches aggressively, the jdtls cache can become stale or noisy. When that happens, the assistant starts returning confusing symbol results or slow responses. The fix is operational, not architectural: use a dedicated cache location and be willing to wipe it when the workspace state is corrupted.
For example, you can move from /tmp to a stable directory:
mkdir -p ~/.cache/jdtls-cargotracker
Then update your launch configuration to use that directory as the -data location. This makes indexing more stable across sessions and gives you one place to clean up when the cache becomes suspect.
Security and blast radius
Read-only behavior is not automatic. It is designed.
Git MCP servers often expose both read and write operations. Filesystem servers expose whatever path you give them. If you point the filesystem server at ${workspaceFolder}, you are giving the assistant visibility into everything under the repository root, including local experiments, build output, or accidentally committed secrets.
That is why this tutorial scopes the filesystem server to src/. It is a deliberate loss of convenience in exchange for a smaller blast radius.
Pinning tool versions is part of the same story. @latest and ephemeral uvx installs are convenient for first setup. They are not good long-term operational defaults. Once the team validates a version combination, pin it and record it. Otherwise, you will debug “AI behavior changes” that are really dependency drift in supporting tools.
Portability and machine-specific configuration
A committed .bob/mcp.json is good. A committed file with a macOS-only JAVA_HOME path is half-good.
Teams usually solve this one of two ways. The first way is to keep the committed file generic and rely on inherited environment variables for Java. The second way is to maintain two small documented variants internally, one for macOS and one for Linux. The wrong answer is leaving a personal desktop path in the repo and hoping nobody else notices.
You should also think about whether Bob itself runs on the host, inside a dev container, or through a remote development path. That changes path semantics and environment inheritance. The harness still works, but you want to be clear about which process owns Java, which process sees the workspace root, and where logs land.
Supply chain risk in helper tools
npx and uvx are useful because they remove friction. They also resolve packages at invocation time unless pinned. That means the harness can change underneath you without a repository diff.
This is not just theoretical. Tool name changes, dependency updates, or package behavior differences can silently change what Bob sees. In a solo workflow that is annoying. In a shared workflow it becomes a support problem.
A practical team response is simple. Pin versions after validation. Record the version set in a small internal note or in the repository docs. Review upgrades like you would any other tooling change.
Human review boundaries
The biggest mistake teams make with this setup is thinking better tools remove the need for review. They do not. They move review earlier and make it cheaper.
The impact map is the first checkpoint. The structured task is the second. The code review is still there after that. What changes is the quality of the diff reaching review. You get fewer invented endpoints, fewer package mistakes, and fewer changes that violate conventions simply because the assistant had a smaller, better-defined space to operate in.
Verification
Now let’s verify the whole setup step by step.
Check Java and the Cargo Tracker build
Run this from the repository root:
cd /path/to/cargotracker
java -version
./mvnw clean package -Popenliberty
Expected result: java -version reports Java 21 or later, and Maven completes without BUILD FAILURE.
This verifies the foundation. If this step fails, none of the assistant tooling is trustworthy because the project state itself is broken.
Check Bob can call semantic tools
In Bob, send this prompt:
Call find_symbols with query "CargoRepository" and paste the first result path only.
Expected result: a real path inside src/main/java, something along these lines:
src/main/java/org/eclipse/cargotracker/domain/model/cargo/CargoRepository.java
The exact package may differ depending on the project revision, but it must point into the actual source tree.
This verifies that Bob can launch the bridge, the bridge can talk to jdtls, and the assistant can receive the result.
Check the filesystem boundary
In Bob, send this prompt:
Use the filesystem tool to read README.md from the repository root.
Expected result: a refusal, an out-of-scope error, or a message indicating the file is outside the allowed path.
This verifies that your filesystem scope is doing what you intended. If Bob reads the file successfully, your path is too wide for the harness described in this tutorial.
Check Git context
In Bob, send this prompt:
Use git_log on the cargo facade integration test and summarize the most recent relevant change in one sentence.
Expected result: a short answer grounded in actual commit history.
This verifies that the assistant can bring recent repository history into its planning loop. That matters when a feature needs to follow an existing testing style or recent implementation pattern.
Check the planning workflow
Now run the real planning test:
Use the java-lsp and filesystem tools to inspect the Cargo Tracker project.
I want to add CSV export for cargo itineraries.
Produce a repository impact map with files, layers, and existing patterns to follow.
Annotate each bullet with the tool call that produced the evidence.
Do not write code yet.
Expected result: a grounded map that names existing files and classes, stays inside existing package structure, and shows evidence.
This is the real proof that the harness works. Not just that the tools launch, but that the assistant changes behavior and plans against repository reality.
Architecture Recap
From the assistant’s point of view, the setup looks like this:
IBM Bob
├── java-lsp (LSP4J-MCP to jdtls)
│     find_symbols / find_references / find_definition / document_symbols
├── filesystem (@modelcontextprotocol/server-filesystem on src/)
│     read_file / list_directory / search_files
└── git (mcp-server-git)
      git_log / git_diff / git_show
Each server answers a different question.
The Java language server answers, “What does this code mean?”
The filesystem server answers, “What is actually written in the allowed source tree?”
The Git server answers, “What changed recently, and what implementation history should we respect?”
That split is the whole point. A single prompt is not enough. Good code generation needs semantic context, textual context, and often historical context.
Further Reading
If you want to continue from here, the next useful documents are the IBM Bob documentation for project-level MCP configuration, the Cargo Tracker repository itself for understanding the domain and package layout, and the LSP4J-MCP project for the exact bridge behavior and supported tool names.
You should also keep your own small internal note for version combinations that your team has validated. This sounds boring. It saves time.
Conclusion
We built a small but practical harness around Cargo Tracker so the assistant can plan against classpath reality, stay inside a controlled source boundary, and use repository history when it matters. That changes the quality of AI-generated work because it removes the assistant’s need to guess about symbols, layers, and existing patterns. The real lesson is not specific to Cargo Tracker or IBM Bob. AI coding quality follows environment quality.
Subscribe now



Build a Digital Credentialing Platform with Quarkus
Markus Eisele — Tue, 21 Apr 2026 06:08:39 GMT
Most badge systems look simple at first. Store a learner row, attach a PNG, send an email, done. That works until the first real trust question shows up. Who issued this credential? Was it an a partner accidentally, or deliberately, issue 10,000 badges through one weak webhook?
This is where many “badge platforms” stop being platforms and start looking like decorative metadata stores. A credential is closer to an invoice, a certificate, or an audit record. It needs identity, issuer proof, stable URLs, replay protection, and a clean story for revocation. If any of that is missing, the badge still renders nicely in a browser, but it does not hold up when another system tries to trust it.
The production problem is usually not the JSON shape. The production problem is the trust boundary. I have seen systems where anyone with a guessed callback URL could trigger issuance. I have seen systems where the signed artifact and the hosted public JSON disagreed on recipient identity because hashing logic lived in two different classes. I have seen schema changes break partner mappings because the Java embeddable key no longer matched the database primary key.
In this tutorial we build TheMainThread Academy, a single Quarkus application that issues Open Badge 2.0 style credentials. We define badge templates, issue signed assertions, expose verifier-facing JSON at stable URLs, render earner-facing HTML with Qute, and accept signed partner callbacks using HMAC-SHA256. The important part is not only that it works. The important part is that it fails in predictable ways when something is wrong.
The stack is deliberately boring in the right places: Hibernate ORM with Panache for persistence, Flyway for schema control, SmallRye JWT for signing assertions, Quarkus Mailer with Dev Services Mailpit for local email, and PostgreSQL from Dev Services so ./mvnw quarkus:dev is enough to get a working system on a laptop with Podman or Docker.
Prerequisites
You should be comfortable reading JAX-RS resources, JPA entities, and SQL migrations. The steps assume a Unix shell for curl and openssl.
Java 21 installed (the generated module targets release 21)
Maven 3.9+ or the included ./mvnw in the module
Quarkus CLI optional but recommended (quarkus create app)
Podman or Docker for Dev Services (PostgreSQL and Mailpit)
Project setup
Create the application from the Quarkus CLI so everyone lands on the same extension IDs as the current platform stream. 
You can also directly start from my Github repository.
quarkus create app academy.themainthread:badge-platform \
  --package-name=academy.themainthread \
  -B \
  --extensions=rest,rest-jackson,rest-qute,hibernate-orm-panache,jdbc-postgresql,smallrye-jwt,mailer,quarkus-mailpit,qute,hibernate-validator,smallrye-openapi,scheduler,flyway
cd badge-platform
Extensions explained:
rest and rest-jackson: JSON admin APIs and Jackson ObjectMapper for webhook parsing
rest-qute: return TemplateInstance from the same resource classes that serve JSON
hibernate-orm-panache: active record style entities for earners, templates, assertions, partners
jdbc-postgresql: production driver plus Agroal pool (Dev Services wires a container automatically)
smallrye-jwt: sign assertion JWTs with an RSA private key from the classpath
mailer: send award notifications
quarkus-mailpit: Dev Email UI for testing
qute: server-side HTML for humans
hibernate-validator: request body validation on admin and webhook payloads
smallrye-openapi: Swagger UI for operators
scheduler: reserved for future housekeeping (expiry sweeps, webhook retries)
flyway: versioned schema, no reliance on Hibernate auto-DDL in any profile
Configuration
Create src/main/resources/application.properties with the keys below. Each line matters: missing JWT signing material fails startup, a wrong issuer string breaks interoperability with off-the-shelf verifiers, and an oversized webhook body becomes a cheap DoS handle.
# Datasource — Dev Services starts PostgreSQL in dev and test
quarkus.datasource.db-kind=postgresql
quarkus.hibernate-orm.schema-management.strategy=none
quarkus.flyway.migrate-at-start=true

# JWT verify (unused on most endpoints today, but keeps SmallRye JWT config consistent)
mp.jwt.verify.issuer=https://academy.themainthread.dev
mp.jwt.verify.public-key.location=META-INF/resources/public.pem
smallrye.jwt.sign.key.location=META-INF/resources/private.pem
smallrye.jwt.new-token.issuer=https://academy.themainthread.dev
smallrye.jwt.new-token.lifespan=315360000

# Canonical base URL embedded in assertion and badge identifiers
academy.base-url=http://localhost:8080

# Mailer — Dev Services Mailpit in dev; mocked in tests
quarkus.mailer.from=badges@academy.themainthread.dev

# OpenAPI
quarkus.smallrye-openapi.info-title=TheMainThread Academy Badge API
quarkus.smallrye-openapi.info-version=1.0.0

# Webhook hardening
quarkus.http.limits.max-body-size=1M

%test.quarkus.mailer.mock=true
Each setting explained:
quarkus.datasource.db-kind=postgresql: selects the PostgreSQL dialect and driver. Without a JDBC URL in dev, Dev Services supplies one. In production you add quarkus.datasource.username, quarkus.datasource.password, and quarkus.datasource.jdbc.url. If those are wrong, the pool never connects and health checks go red instead of silently falling back to H2.
quarkus.hibernate-orm.schema-management.strategy=none: Hibernate must not mutate tables at runtime because Flyway owns the truth. If you flip this to drop-and-create, you will eventually run a deploy against real data and delete earners. The older database.generation key is deprecated on current Quarkus lines; use schema-management.strategy instead.
quarkus.flyway.migrate-at-start=true: applies db/migration scripts before serving traffic. If a migration fails, the process exits. That is preferable to half-applied manual DDL.
mp.jwt.verify.issuer and mp.jwt.verify.public-key.location: align verification settings with what external tooling expects if you later add MP-JWT protected routes. They do not hurt issuance-only flows, but an unreadable public.pem fails fast at startup.
smallrye.jwt.sign.key.location: path to the RSA private key PEM inside the application jar. If the file is missing, signing fails at runtime when you issue the first badge.
smallrye.jwt.new-token.issuer and lifespan: issuer claim and default token lifetime for APIs that mint JWTs. Assertion signing sets its own expiry per row, but the platform property must still be valid seconds.
academy.base-url: every hosted assertion URL and verification.creator pointer is built from this string. If it does not match the hostname clients use, verifiers fetch the wrong host and hosted verification fails even when signatures are valid.
quarkus.mailer.from: required envelope sender. Misconfigure SMTP in prod and Mailer throws; in dev, Mailpit accepts anything.
quarkus.smallrye-openapi.*: metadata only. Wrong values confuse operators, not runtime.
quarkus.http.limits.max-body-size: caps partner webhook bodies. Without a limit, a gzip bomb or megabyte-scale JSON ties up threads and disk.
%test.quarkus.mailer.mock=true: keeps @QuarkusTest from needing real SMTP while still exercising Mailer calls.
Generate RSA keys once per environment (never reuse demo keys in production):
openssl genrsa -out src/main/resources/META-INF/resources/private.pem 2048
openssl rsa -in src/main/resources/META-INF/resources/private.pem \
  -pubout -out src/main/resources/META-INF/resources/public.pem
Database schema with Flyway
Flyway is the contract between your Java entities and what actually exists in PostgreSQL. Hibernate maps rows; Flyway guarantees indexes, uniqueness, and composite keys survive refactors.
Create src/main/resources/db/migration/V1__initial_schema.sql:
CREATE TABLE earner (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email       VARCHAR(255) NOT NULL UNIQUE,
    name        VARCHAR(255) NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE badge_template (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name         VARCHAR(255) NOT NULL,
    description  TEXT NOT NULL,
    criteria     TEXT NOT NULL,
    image_url    VARCHAR(512) NOT NULL,
    skills       TEXT,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE accredited_partner (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name           VARCHAR(255) NOT NULL,
    webhook_secret VARCHAR(255) NOT NULL,
    active         BOOLEAN NOT NULL DEFAULT true,
    created_at     TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE partner_badge_template (
    partner_id        UUID NOT NULL REFERENCES accredited_partner(id),
    course_id         VARCHAR(255) NOT NULL,
    badge_template_id UUID NOT NULL REFERENCES badge_template(id),
    PRIMARY KEY (partner_id, course_id)
);

CREATE TABLE badge_assertion (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    earner_id     UUID NOT NULL REFERENCES earner(id),
    template_id   UUID NOT NULL REFERENCES badge_template(id),
    issued_on     TIMESTAMPTZ NOT NULL DEFAULT now(),
    expires_at    TIMESTAMPTZ,
    revoked       BOOLEAN NOT NULL DEFAULT false,
    revoke_reason VARCHAR(512),
    signed_token  TEXT NOT NULL,
    salt          VARCHAR(64) NOT NULL
);

CREATE TABLE webhook_event (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    partner_id      UUID NOT NULL REFERENCES accredited_partner(id),
    idempotency_key VARCHAR(255) NOT NULL,
    payload         TEXT NOT NULL,
    status          VARCHAR(32) NOT NULL DEFAULT 'RECEIVED',
    received_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    processed_at    TIMESTAMPTZ,
    error           TEXT,
    UNIQUE (partner_id, idempotency_key)
);

CREATE INDEX idx_assertion_earner ON badge_assertion(earner_id);
CREATE INDEX idx_assertion_template ON badge_assertion(template_id);
CREATE INDEX idx_webhook_status ON webhook_event(status);
The composite primary key on partner_badge_template is (partner_id, course_id). That matches how partners think (their course catalog), and it forces the JPA embeddable id to carry partnerId plus courseId, not badgeTemplateId. A mismatch here is the kind of bug that passes code review and explodes the first time two templates share a course code.
Implementation: domain model
Panache active record keeps the tutorial focused on behavior instead of repository interfaces. Each entity is a PanacheEntityBase subclass under academy.themainthread.domain, mirroring the Flyway tables. Earner, BadgeTemplate, BadgeAssertion, AccreditedPartner, and WebhookEvent follow the shapes shown in the repository. The two pieces that deserve extra attention in prose are the partner mapping and the assertion row.
PartnerBadgeTemplate embeds PartnerBadgeTemplateId with partnerId and courseId columns. badge_template_id is a normal foreign key column on the entity, not part of the primary key. findByCourseId is a typed query on those columns. If you model the embeddable with badgeTemplateId instead while the database uses course_id in the primary key, Hibernate will compile and your integration tests will fail in confusing ways.
BadgeAssertion uses an application-assigned UUID primary key. The signing step needs the final assertion URL before insert, and PostgreSQL rejects a nullable signed_token column. The production code therefore assigns assertion.id = UUID.randomUUID(), computes signedToken, then calls persist() once. A two-step “insert with null token, update later” pattern fails the NOT NULL constraint the moment Hibernate flushes the first insert.
The listing below is the partner mapping that must agree with the SQL primary key. Everything else in academy.themainthread.domain matches the Flyway tables line for line in the repository.
package academy.themainthread.domain;

import jakarta.persistence.Column;
import jakarta.persistence.Embeddable;

import java.io.Serializable;
import java.util.Objects;
import java.util.UUID;

@Embeddable
public class PartnerBadgeTemplateId implements Serializable {

    @Column(name = "partner_id", nullable = false)
    public UUID partnerId;

    @Column(name = "course_id", nullable = false, length = 255)
    public String courseId;

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        PartnerBadgeTemplateId that = (PartnerBadgeTemplateId) o;
        return Objects.equals(partnerId, that.partnerId) && Objects.equals(courseId, that.courseId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(partnerId, courseId);
    }
}
package academy.themainthread.domain;

import io.quarkus.hibernate.orm.panache.PanacheEntityBase;
import jakarta.persistence.EmbeddedId;
import jakarta.persistence.Entity;
import jakarta.persistence.FetchType;
import jakarta.persistence.JoinColumn;
import jakarta.persistence.ManyToOne;
import jakarta.persistence.MapsId;
import jakarta.persistence.Table;

@Entity
@Table(name = "partner_badge_template")
public class PartnerBadgeTemplate extends PanacheEntityBase {

    @EmbeddedId
    public PartnerBadgeTemplateId id;

    @ManyToOne(fetch = FetchType.LAZY)
    @MapsId("partnerId")
    @JoinColumn(name = "partner_id")
    public AccreditedPartner partner;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "badge_template_id", nullable = false)
    public BadgeTemplate template;

    public static PartnerBadgeTemplate findByCourseId(AccreditedPartner partner, String courseId) {
        return find("id.partnerId = ?1 AND id.courseId = ?2", partner.id, courseId).firstResult();
    }
}
Implementation: recipient hashing and JWT signing
Hosted Open Badge flows expect a recipient object with type, hashed, salt, and identity where identity is sha256$ plus a lowercase hex digest of the salted email. The JWT and the public JSON endpoint must agree on that string or verifiers cannot correlate machine-readable and human-readable views.
RecipientIdentity centralizes the digest. AssertionSigner builds the JWT claims map, pulls academy.base-url for every URL-shaped claim, and caps JWT expiry using either the assertion’s expiresAt or a far-future default. Using Long.MAX_VALUE as an epoch second is a bad fit for JWT libraries and some parsers; the implementation clamps to roughly ten years when no explicit expiry is set.
Implementation: issuance and events
BadgeIssuanceService is @ApplicationScoped and transactional. It wires AssertionSigner and fires BadgeIssuedEvent after persistence so mail observers see a stable assertion id. The transaction boundary here is only the database. Mail delivery and external HTTP calls are not rolled back if SMTP later fails, which is why BadgeAwardMailer catches exceptions per message and logs them instead of pretending email is transactional.
package academy.themainthread.badge;

import academy.themainthread.domain.BadgeAssertion;
import academy.themainthread.domain.BadgeTemplate;
import academy.themainthread.domain.Earner;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Event;
import jakarta.inject.Inject;
import jakarta.transaction.Transactional;

import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.UUID;

@ApplicationScoped
public class BadgeIssuanceService {

    @Inject
    AssertionSigner signer;

    @Inject
    Event badgeIssuedEvent;

    @Transactional
    public BadgeAssertion issue(Earner earner, BadgeTemplate template, Instant expiresAt) {
        BadgeAssertion assertion = new BadgeAssertion();
        assertion.id = UUID.randomUUID();
        assertion.earner = earner;
        assertion.template = template;
        assertion.issuedOn = Instant.now();
        assertion.expiresAt = expiresAt;
        assertion.salt = AssertionSigner.generateSalt();
        assertion.signedToken = signer.sign(assertion);
        assertion.persist();

        badgeIssuedEvent.fire(new BadgeIssuedEvent(assertion.id, earner.email, earner.name, template.name));

        return assertion;
    }

    @Transactional
    public BadgeAssertion issueWithDefaultExpiry(Earner earner, BadgeTemplate template) {
        Instant expires = Instant.now().plus(365L * 2L, ChronoUnit.DAYS);
        return issue(earner, template, expires);
    }

    @Transactional
    public void revoke(UUID assertionId, String reason) {
        BadgeAssertion assertion = BadgeAssertion.findById(assertionId);
        if (assertion == null) {
            throw new IllegalArgumentException("Assertion not found: " + assertionId);
        }
        assertion.revoked = true;
        assertion.revokeReason = reason;
        assertion.persist();
    }
}
The issue method is the heart of the trust story. A single persist() after signing avoids a flush that writes signed_token = null, which PostgreSQL rejects. Firing BadgeIssuedEvent after persist() means downstream code can safely build URLs that hit the database.
Implementation: admin REST API
AdminResource under /admin exposes JSON endpoints for templates, earners, manual issuance, partners, and course mappings. Responses use real HTTP status codes: 409 when an earner email already exists or when a webhook replay hits the same idempotency key, 404 when foreign keys do not resolve.
The admin API is intentionally unauthenticated in this repository so the article stays inside one service. Production hardening below calls out what has to change before you expose it past localhost.
Implementation: public verification, JSON, and Qute
PublicResource serves /assertions/{id}, /badges/{id}, /earners/{id}, and /keys/1. For assertions and badges, the same paths return HTML when the client prefers text/html and JSON-LD shaped maps when the client sends Accept: application/json. Qute templates live under src/main/resources/templates/ and share layout.html.
The /keys/1 handler reads META-INF/resources/public.pem from the classpath and returns JSON with a publicKeyPem field. That is enough for readers to wire real JWK publishing later; the important part for this tutorial is that verifiers have a stable URL that returns the public half of the signing key material.
Implementation: webhook ingestion and processing
Three classes split the work so transactions behave honestly.
WebhookIngestionService exposes a single @Transactional method that inserts a WebhookEvent row. WebhookResource validates the HMAC signature, parses JSON with Jackson, validates Bean Validation constraints, checks that partnerId inside the JSON matches the X-Partner-Id header, rejects duplicates with 409, then calls the ingestion service and only afterwards fires CourseCompletionEvent.
Splitting persistence this way matters. If you fire a CDI event while still inside the same transaction that created the row, an @ObservesAsync listener can start before commit and not see the insert. The ingestion service completes and commits before fire(), so synchronous or asynchronous observers see committed data.
CourseCompletionObserver uses @Observes (synchronous) with @Transactional(TxType.REQUIRES_NEW) so it opens a clean transaction for badge issuance and webhook status updates. @ObservesAsync is attractive for a 202 Accepted story, but without a message broker you still need strict ordering between “row visible” and “handler runs”. The mail path keeps @ObservesAsync on BadgeAwardMailer so HTTP threads are not blocked on SMTP.
HmacVerifier computes HmacSHA256 over the raw bytes the partner signed and compares digests with MessageDigest.isEqual to avoid timing leaks from String.equals.
package academy.themainthread.webhook;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public final class HmacVerifier {

    private HmacVerifier() {}

    public static boolean verify(String payload, String signature, String secret) {
        if (signature == null || !signature.startsWith("sha256=")) {
            return false;
        }
        String provided = signature.substring(7);
        String computed = compute(payload, secret);
        return MessageDigest.isEqual(
                provided.getBytes(StandardCharsets.UTF_8), computed.getBytes(StandardCharsets.UTF_8));
    }

    public static String compute(String payload, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] hash = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException | InvalidKeyException e) {
            throw new IllegalStateException("HMAC-SHA256 failed", e);
        }
    }
}
WebhookIngestionService is intentionally tiny. It gives you one transaction that ends before CourseCompletionEvent fires, which keeps observers honest whether they are synchronous or asynchronous.
package academy.themainthread.webhook;

import academy.themainthread.domain.AccreditedPartner;
import academy.themainthread.domain.WebhookEvent;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.transaction.Transactional;

@ApplicationScoped
public class WebhookIngestionService {

    @Transactional
    public WebhookEvent recordReceived(AccreditedPartner partner, String idempotencyKey, String rawBody) {
        WebhookEvent event = new WebhookEvent();
        event.partner = partner;
        event.idempotencyKey = idempotencyKey;
        event.payload = rawBody;
        event.status = WebhookEvent.Status.RECEIVED;
        event.persist();
        return event;
    }
}
Implementation: mail notifications
BadgeAwardMailer listens for BadgeIssuedEvent with @ObservesAsync, builds simple HTML, and calls Mailer.send. In tests, %test.quarkus.mailer.mock=true records messages without network I/O.
Production hardening
Webhook abuse and partner trust
Partners authenticate with a shared secret and an HMAC over the exact raw body bytes. If you normalize JSON (pretty print, reorder keys) before verifying, signatures that were valid on the partner side will fail on yours. The resource method takes String rawBody intentionally. Rate limiting, IP allow lists, and per-partner quotas belong in an API gateway or filter in front of this resource. The quarkus.http.limits.max-body-size property is only a coarse backstop.
Admin surface and OIDC
Every admin endpoint is public in this demo. In production you terminate TLS at your edge, require OIDC (for example Quarkus quarkus-oidc) or mutual TLS for automation, and narrow CORS. Until then, treat localhost as the trust boundary.
Assertion privacy and rotation
Recipient email hashing protects casual scraping, but anyone who knows the email and salt can recompute the digest. Treat salts as disclosure-sensitive metadata, not a second password. Plan for key rotation by versioning /keys/{n} and keeping old public keys available until assertions signed with them expire.
Verification
Automated integration test
From badge-platform:
./mvnw test
The AcademyWorkflowTest class posts a template, registers a partner, maps a course, sends a signed webhook, then polls /admin/assertions until an assertion exists. It finally requests public JSON for the assertion and checks that recipient.identity contains the sha256$ prefix, and that /keys/1 returns PEM material.
You should see Quarkus start with the test profile, Flyway apply V1__initial_schema.sql, tests pass, and the JVM exit code 0.
Manual curl walkthrough
Start dev mode (this blocks; use a second terminal for curls):
./mvnw quarkus:dev
Capture IDs in shell variables so you never paste the literal string TEMPLATE_ID into JSON (that value is not a UUID, so Bean Validation rejects the request and no course mapping is stored). Without a mapping, the webhook still returns 202 Accepted because ingestion succeeded, but no assertion is issued and jq '.[0].earner.id' returns null either because the assertions list is empty or because [0] does not exist.
Create a badge template and store its id:
export TEMPLATE_ID="$(
  curl -sS -X POST http://localhost:8080/admin/badges \
    -H "Content-Type: application/json" \
    -d '{
      "name": "Quarkus Developer",
      "description": "Awarded to developers who demonstrate proficiency in building cloud-native Java applications with Quarkus.",
      "criteria": "Complete the Quarkus Fundamentals course and pass the practical assessment with a score of 80% or higher.",
      "imageUrl": "https://design.jboss.org/quarkus/logo/final/SVG/quarkus_icon_rgb_default.svg",
      "skills": "Quarkus,Java,Cloud-Native,Kubernetes,REST"
    }' | jq -r .id
)"
echo "TEMPLATE_ID=$TEMPLATE_ID"
Register a partner:
export PARTNER_ID="$(
  curl -sS -X POST http://localhost:8080/admin/partners \
    -H "Content-Type: application/json" \
    -d '{
      "name": "Acme Training Platform",
      "webhookSecret": "super-secret-signing-key-change-in-production"
    }' | jq -r .id
)"
echo "PARTNER_ID=$PARTNER_ID"
Map the partner’s course id to that template (this call must return HTTP 200 with a JSON body, not a validation error):
curl -sS -i -X POST "http://localhost:8080/admin/partners/${PARTNER_ID}/courses" \
  -H "Content-Type: application/json" \
  -d "{\"templateId\":\"${TEMPLATE_ID}\",\"courseId\":\"QUARKUS-FUND-101\"}"
Send a signed webhook. The payload bytes must match what you pass to openssl dgst exactly (same partnerId, same courseId, same idempotencyKey if you retry):
export PAYLOAD='{"partnerId":"'"$PARTNER_ID"'","courseId":"QUARKUS-FUND-101","learnerEmail":"alice@example.com","learnerName":"Alice Smith","completedAt":"2026-04-06T14:00:00Z","idempotencyKey":"evt-001"}'
export SIG="sha256=$(printf '%s' "$PAYLOAD" | openssl dgst -sha256 -hmac "super-secret-signing-key-change-in-production" | awk '{print $2}')"
curl -sS -i -X POST http://localhost:8080/webhooks/completions \
  -H "Content-Type: application/json" \
  -H "X-Partner-Id: $PARTNER_ID" \
  -H "X-Webhook-Signature: $SIG" \
  -d "$PAYLOAD"
Expect HTTP/1.1 202 Accepted with a JSON body containing "status":"accepted" and an eventId.
Confirm at least one assertion exists, then read Alice’s earner id:
curl -sS http://localhost:8080/admin/assertions | jq 'length'
curl -sS http://localhost:8080/admin/assertions | jq '.[0].earner.id'
If length is 0, the mapping step did not persist (wrong templateId, wrong partner path, or course id mismatch). If length is at least 1 and the second line is still null, open the raw JSON with jq .[0] and confirm earner is present (current code loads earner and template with JOIN FETCH for this list endpoint).
Open http://localhost:8080/earners/{id} in a browser for HTML, or fetch JSON for machines:
curl -sS -H 'Accept: application/json' http://localhost:8080/assertions/ASSERTION_ID | jq .
Swagger UI is at http://localhost:8080/q/swagger-ui, Mailpit at http://localhost:8080/q/mailpit/.
Conclusion
We built a credentialing platform that treats badges as trust artifacts, not decorative images. Flyway owns the schema, the issuer signs before insert, public JSON and JWT claims share one recipient identity flow, webhook ingestion is authenticated and idempotent, and mail stays outside the critical transaction. Those are the details that make the system hold up once real partners and real verifiers start touching it.
Subscribe now


The Hidden Cost of AI Coding for Senior Java Developers
Markus Eisele — Mon, 20 Apr 2026 06:08:08 GMT
I write here a lot. Almost every day.
That does something to you after a while. Daily blogging sounds like a publishing habit, but it becomes more than that. It becomes a way of moving through the day. A bug report is not just a bug report anymore. A strange benchmark result is not just a number. A comment in a meeting stays with you because you know there is probably a bigger idea inside it. You start collecting fragments all day long.
That has been my default mode for some time now. Always watching a little bit. Always thinking about what something means. Always carrying one half-finished thought into the next hour.
And AI tools fit into that mindset a bit too well.
They are useful. Really useful. I use them for research, reframing, rough drafts, structure, code exploration, and all those moments where the blank page or the blank editor stares back longer than it should. They help me get moving. They help me get unstuck. They help me cover more ground.
But they also make it harder to stop.
That is the part I keep coming back to.
The old work had more friction. You got tired in obvious ways. You wrote the code yourself, line by line. You wrote the draft yourself, paragraph by paragraph. At some point your hands were done, your focus was gone, or your patience just ran out. The day had a natural edge to it.
Now the machine keeps offering one more round.
One more rewrite. One more explanation. One more refactor. One more code path to inspect. One more branch to explore. One more quick pass before you close the laptop.
So the day stretches.
Not always in hours. Sometimes it stretches in your head. You walk away from the screen, but part of your attention is still inside the loop. You are still reviewing. Still comparing. Still half-working. Denis Stetskov’s recent piece, The Human Cost of 10x: How AI Is Physically Breaking Senior Engineers, landed for me because it gave language to that feeling. He argues that AI does not remove the human bottleneck. It increases the amount of material flowing toward the same limited human attention, and the result is a very physical kind of exhaustion. 
I think he is right. And for Java developers, I think the problem is even sharper than it first appears.
In our world, wrong things often look respectable.
That is one reason enterprise Java has survived so long. The ecosystem is mature. The frameworks are stable. The conventions are strong. The code usually has shape. Even when something is off, it often still compiles, still starts, still passes a surprising amount of testing, and still looks like it belongs.
That is exactly what makes this new kind of work so tiring.
The code generator gives you something clean. The assistant suggests something plausible. The framework absorbs a lot of rough edges. The Quarkus service still boots. The Spring application still answers requests. The endpoint still returns JSON. Nothing looks obviously broken. But something under the surface has shifted. A transaction boundary moved. A retry now duplicates side effects. A mapper dropped a field that matters for audit. A service layer now owns logic that should have stayed somewhere else. The code is not nonsense. It is believable.
And believable wrong code is expensive.
This is where I think the “10x productivity” language starts to fall apart. In real Java systems, the hard part was never mostly typing. The hard part is understanding what the system is allowed to do, what it must never do, and what ugly-looking code is actually protecting you from some old production lesson nobody wrote down.
AI helps with production. It does not remove interpretation.
If anything, it moves more of the day into interpretation.
That shift is starting to show up in research too. METR’s randomized study with experienced open source developers found that when those developers used early-2025 AI tools, they actually took 19% longer to complete their tasks. What makes that result so interesting is not only the slowdown. It is that the developers expected the opposite, and even after finishing, many still felt faster than they really were. METR later reiterated that earlier result when announcing a follow-up experiment in 2026. 
That one hit me hard.
Because it matches something I suspect a lot of us already know in our bodies before we know it in words. The machine makes you feel momentum. It keeps the screen moving. It keeps the possibilities coming. It reduces the pain of starting. But the screen moving is not the same as the work actually getting done faster.
Sometimes it means the opposite.
Sometimes it means you are now supervising more candidate solutions, more partial fixes, more plausible explanations, and more semantically risky code than you would have produced on your own. The local effort goes down. The global responsibility goes up.
And that is senior engineer work in a sentence.
What makes this harder to talk about is that the role itself is changing. Recent research from Google, Developer Productivity in the Age of Generative AI: A Psychological Perspective, frames this as a shift from coder to conductor. The developer becomes less of a direct builder and more of an orchestrator of machine-generated work. Anthropic’s internal research points in a similar direction. Engineers reported becoming broader, more full-stack, and more willing to work in unfamiliar areas, but they also raised concerns around skill development, collaboration, and what happens to deeper technical learning when more of the first draft comes from the tool. 
“Conductor” sounds nice at first. Senior, strategic, elevated.
But conducting is not light work.
It means evaluating, ranking, rejecting, steering, correcting, and keeping a mental model intact while something faster than you keeps generating options. You may write fewer lines yourself, but you make more decisions. You may touch more systems in a day, but the cost is that your head is carrying more unfinished judgment.
That is the tiredness I notice now. Not just doing. Monitoring.
There is another part of the research that I think matters even more for enterprise teams, and it gets less attention than it should. MIT Sloan’s summary of recent findings showed that when developers got access to AI coding tools, coding time went up, project management time went down, and peer collaboration dropped by nearly 80%. 
That number should make every engineering leader stop for a minute.
A lot of enterprise software survives because knowledge is social. Nobody completely understands the whole bank, the whole logistics backend, the whole insurance platform, or the whole IAM story. The reason those systems stay alive is not that one brilliant person holds it all together. The reason is overlap. Shared context. Repeated conversations. Code reviews that feel annoying until they save you. Architecture discussions that feel slow until they prevent six months of drift.
If AI pushes more work into private loops of prompt, accept, patch, and move on, then some of that overlap disappears. At first that can feel efficient. Fewer interruptions. Faster drafts. Less talking. But some of what disappears is not noise. Some of it is engineering memory.
That is a high price to pay for smoother local flow.
And then there is the part I still think we have not really learned how to describe well. Working with AI is mentally tiring in a different way because we keep trying to treat it like a collaborator, even though it is not a collaborator in the human sense.
Human teams are messy, but they have continuity. You know the teammate who always worries about migrations. You know the architect who will ask about failure modes. You know the reviewer who catches every security issue. Real people have habits, intentions, and patterns. You build rough mental models of them, and those models help you work together.
With AI, that instinct does not go away. We still try to model the other side. We still ask ourselves: can I trust this answer, is it guessing, is it rushing, is it overconfident, is it missing context, is it being clever in the wrong way? Human-AI interaction research around theory of mind points directly at this problem. The CHI 2024 workshop paper on Theory of Mind in Human-AI Interaction and the IBM Research summary both point to the same tension: humans naturally attribute roles, intentions, and mental states to AI systems, but those mental models do not map cleanly, and that mismatch creates friction. 
That makes a lot of sense to me.
Because some of the exhaustion is not just code review volume. It is the energy spent trying to figure out what kind of partner the tool is being today. Careful or lazy. Helpful or slippery. Grounded or improvising. You are not just reviewing output. You are continuously calibrating trust.
That is work too.
At some point this stops feeling like a workflow discussion and starts feeling physical. Denis anchored his piece in the Neuron paper by Jie Zheng and Markus Meister, and the Caltech Magazine write-up is useful if you want the more readable version. The point is simple enough: deliberate human reasoning is slow, narrow, and serial. AI increases how much material can be produced. It does not increase how much material a human can deeply understand. 
That is where the body enters the story.
The output gets cheaper. The judgment does not.
And if you are already the kind of person who lives in an always-on mode, that becomes very hard to manage. I feel this in writing. Daily publishing is not just a content habit. It trains your attention to remain open all day. Every release note looks like a possible post. Every benchmark looks like an argument. Every thread looks like something you should probably respond to. AI amplifies that tendency. It makes drafting easier. It makes exploring easier. It makes continuing easier.
It weakens the natural stop signs.
I think a lot of developers feel the same thing now in code. There is always one more experiment because the cost of trying is lower. There is always one more branch because the assistant can scaffold it. There is always one more test file, one more comparison, one more rewrite, one more generated explanation of why the generated code did what it did.
The old bottleneck was production speed.
The new bottleneck is discernment.
That is why I do not think the right response is either blind enthusiasm or easy cynicism. These tools are useful. Sometimes they are genuinely great. They help me. They help many people. But the cost of using them well is not where the marketing usually puts it. The cost is not just subscription price, model choice, or prompt quality.
The cost is sustained human judgment.
For Java teams, that means we need to get a lot more serious about protecting review energy, protecting shared context, and separating visible output from actual engineering throughput. It also means being honest that some of the fatigue people feel is not a personal weakness or bad time management. It is the natural result of asking one human mind to supervise far more plausible work than it used to create on its own. A broader version of that same argument also shows up in Harvard Business Review’s piece on how AI intensifies work. 
That is the part I would add to Denis’s argument from where I sit.
AI did not remove the human cost. It moved it up the stack.
And once you start working that way every day, you feel it everywhere.
The original piece gave that feeling a sharp frame. I think the next step for our world, especially in Java and enterprise software, is to admit that syntactically safe and operationally plausible code can still be semantically wrong. That gap is where the pressure lives. That gap is where the review burden grows. That gap is where the always-on mindset quietly stops being a habit and starts becoming a condition. 
Subscribe now



How to Review Agent System Prompts Like Production Infrastructure
Markus Eisele — Sun, 19 Apr 2026 06:08:12 GMT
Most teams still write system prompts the way they write onboarding docs for humans: friendly tone, implicit context, and a belief that the reader will “figure it out.” An autonomous coding agent does not figure it out the same way. It optimizes for the next token under pressure from tools, context limits, and whatever ambiguity you left in the text. In production, a weak prompt does not fail politely. It invents files, skips discovery, overwrites working code, or burns the entire window on a single oversized task. The failure shows up as bad diffs, silent wrong assumptions, or a session that cannot resume after a reset.
The Bob Meta-Scorecard is a rubric and workflow for grading system prompts before you treat them as infrastructure. It is built around five pillars: grounding in the real tree, continuity across session loss, safety around destructive work, decomposition so the model does not try to ship Rome in one reply, and efficiency so the instructions leave room for actual repository work. This article turns that rubric into something you can run repeatedly: a workspace layout, complete template blocks, calibration defaults, hardening concerns for real teams, and a verification pass you can execute on a candidate prompt in under an hour.
The methodology assumes tool-using agents with long context (on the order of hundreds of thousands of tokens) and episodic resets. It is tuned for “Bob”-style agents in coding products, not for one-shot chat UIs where a human pastes context every turn. If your stack differs, keep the pillars and replace tool names with whatever your runtime actually exposes.
Prerequisites
You need a place to store prompts, scorecards, and diffs, plus a habit of reading prompts as operational specs rather than copy. You do not need a particular IDE beyond what you already use to review Markdown. But you can of course try out IBM Bob for free if you like.
A text editor and shell (or equivalent) for creating the folder layout below
Access to the system prompts you want to evaluate (or realistic redacted copies)
Permission to store synthetic examples next to real prompts without mixing them into production bundles
Familiarity with how your agent surfaces tools (file read, search, apply patch, terminal, and so on)
Project Setup
Create a small evaluation kit so every review produces comparable artifacts. One layout is enough; the names matter less than the discipline of always writing the same outputs.
From an empty parent directory:
mkdir -p bob-scorecard-kit/{prompts,incoming,scorecards,power-ups,calibration}
touch bob-scorecard-kit/calibration/thresholds.properties
touch bob-scorecard-kit/scorecards/.gitkeep
printf '%s\n' "# Candidate prompts (read-only inputs)" > bob-scorecard-kit/prompts/README.md
printf '%s\n' "# Pasted prompts awaiting triage" > bob-scorecard-kit/incoming/README.md
What each path is for
prompts/: frozen copies of prompts you intend to ship or compare (versioned by filename, not by memory)
incoming/: messy drafts you are not ready to score yet
scorecards/: one Markdown file per evaluation run, named after the prompt and date
power-ups/: the three rewrite injections you would actually merge
calibration/thresholds.properties: numeric bands and token budgets your team agrees on (see Configuration)
If you use Git, add incoming/ to .gitignore when that folder might hold customer-specific text. Keep prompts/ and scorecards/ under the same review rules as code.
Implementing the Five Pillars
Each pillar below follows the same shape: why it exists, a bad prompt fragment you should be able to recognize, and a strong template you can paste or adapt. After the templates, a short analysis ties the pillar to failure modes you see under stress.
Grounding: force the codebase radar
Context. Grounding is the difference between “sounds plausible” and “matches this repository.” Agents are rewarded for fluency. Without mandatory discovery steps, fluency wins over evidence.
Bad example (scores 1 of 5).
Given the authentication system in our service, propose concrete security improvements.
There is no requirement to list auth-related paths, read implementations, or cite evidence. The model can invent a generic OAuth checklist that never touches your code.
Strong example (scores 5 of 5).
Before you recommend any change:

1. Use the repository file listing tool to enumerate paths under `src/main/java` (or the language-appropriate root) and identify every file that participates in authentication, authorization, or session handling. List those paths explicitly in your reply.
2. Read each identified file in full unless it is larger than 400 lines; if larger, read the class header, public API surface, and any security-sensitive branches first, then summarize what remains unseen.
3. If the project declares dependencies for auth (for example Maven `pom.xml`, Gradle files, or lockfiles), read the relevant coordinates and versions for auth-related libraries.
4. Reply with a short evidence table in prose (not code): for each claim you plan to make later, cite `path` and, where possible, a line range or symbol name you observed.
5. Stop and ask for confirmation before proposing redesign work.

If a required file is missing, say so explicitly. Do not invent layout.
Analysis. This pattern works because it makes absence of evidence visible. The model cannot satisfy step four with hand-waving unless it breaks the instructions outright. The cost is length in the system prompt and friction in the happy path. That friction is the point: you are buying insurance against template-shaped answers. Under stress (large trees, generated noise), narrow the glob roots and raise the line-read threshold instead of deleting the grounding block.
Red flags to grep for
Placeholder brackets such as [insert service name here] with no discovery path
Phrases like “based on the codebase” with no tool verbs
Assumptions that the model already “sees” private hosts or CI secrets
Scoring anchors for grounding
5: Analysis is mandatory before generation; no escape hatch that says “if short on time, skip”
4: Strong tool guidance with rare exceptions you can name
3: Encourages analysis; model can still plausibly skip
2: Mentions context but not mechanics
1: Treats the model as omniscient about your tree
Continuity: design for session amnesia
Context. A reset wipes working memory. Anything not written to disk is gone. “Remember to update the todo list” is not continuity; it is folklore.
Bad example (scores 1 of 5).
Refactor the payment module for clarity. Keep track of what you finished as you go.
Strong example (scores 5 of 5).
Create or update `agent-work/payment-refactor.md` after every phase. The file must contain these headings in order:

## Completed work
- Bullets with file paths and what changed (one fact per bullet)

## Decisions
- Bullets with decision, rationale, and alternatives rejected

## Current phase
- A single integer `phase` and a single sentence describing the active task

## Next action
- One imperative sentence a new session can execute without chat history

## Resume protocol
- Exact text: "On startup, read this file, trust `Current phase` and `Next action`, verify repository state matches `Completed work`, then continue."

Update the file before you run tests that mutate disk state. If the file and the tree disagree, stop and reconcile with the user.
Analysis. Continuity is really a contract with your future self. The resume protocol line is load-bearing: it tells a cold start what “continue” means. The weak version of this pattern is a progress file without verification steps; then the agent cheerfully appends fiction after a partial revert. Pair continuity with grounding: the next session should re-read touched files, not only the markdown log.
Red flags
“Remember…” or “keep in mind…” as the only persistence mechanism
Progress files that log intent but not paths
No instruction for what to do when the log is stale
Scoring anchors for continuity
5: State file schema, update cadence, and cold-start resume text
4: Solid logging, weak or missing reconciliation rule
3: “Save progress” without schema
2: Vague mention of tracking
1: No durable state
Safety: shrink the blast radius
Context. Agents batch work. Batching plus filesystem tools equals destructive capability. Safety is not morality in the prompt; it is gating and reversibility.
Bad example (scores 1 of 5).
Improve performance across the service. Apply the changes you judge necessary.
Strong example (scores 5 of 5).
## Destructive and high-impact actions

Treat these as destructive: deleting files or directories, renaming public packages, rewriting build files, changing dependency major versions, editing migration SQL that already shipped, running commands that touch cloud resources.

Before any destructive action:
1. State the exact paths or resource identifiers affected.
2. State the smallest reversible backup you will create (for example a copy under `.agent-backup//...` mirroring the original path).
3. Ask for explicit confirmation with a **Yes** or **No** question. Default to **No** if the user reply is ambiguous.

After changes:
- If the user says **undo** for a given step, restore from the backup you named, then verify with read-only tools.

Never run production database migrations or secret rotation unless the user pastes a literal token phrase you define out of band for this environment.
Analysis. I have seen teams lose a day to an agent that “cleaned up” unused files that were still wired by reflection. Explicit classification of what counts as destructive beats a vague “be careful.” Backups must be concrete enough that undo is a procedure, not a mood. The limit: users suffer confirmation fatigue if every touch asks twice. Calibrate the destructive list to your org; keep confirmations for deletes, dependency jumps, and infra commands.
Red flags
Single phrase “be careful”
Auto-approve rules hidden in examples (“unless trivial”)
Shell commands with wildcards and no working directory guard
Scoring anchors for safety
5: Classification, backup, confirm, undo path
4: Strong gates, thin recovery story
3: Warnings without procedure
2: Mentions risk only
1: Unbounded change authority
Decomposition: phases, deliverables, and gates
Context. Large asks encourage outline-level hallucination: APIs that sound right, files that never existed, tests that were never run. Phasing moves validation earlier.
Bad example (scores 1 of 5).
Implement full authentication: login, registration, password reset, two-factor authentication, session refresh, and audit logging. Include tests and documentation.
Strong example (scores 5 of 5).
## Delivery plan (do not skip phases)

**Phase 1: Login only**
- Deliverable: smallest slice that proves username and password verification against existing user storage, plus one integration test that fails on bad password.
- Gate: run the test command your build uses; paste the command and exit code in the progress file; wait for user confirmation before Phase 2.

**Phase 2: Registration**
- Deliverable: create-user path with validation; tests for happy path and duplicate user.
- Gate: same as Phase 1.

**Phase 3: Password reset**
- Deliverable: token issuance and consumption with time bounds; tests for expired and reused tokens.
- Gate: same pattern.

**Phase 4: Two-factor and session refresh**
- Deliverable: TOTP enrollment and refresh rotation if your stack already has patterns for them; if not, stop after documenting the gap instead of inventing crypto.

Rules: no phase may add a new external service without user confirmation. Each phase touches at most eight source files unless the user expands the limit.
Analysis. Token budgets in prompts are a coarse knob; what actually limits damage is the gate after each deliverable. The phase cap on touched files is artificial but effective against drive-by refactors. If your build is slow, say which subset of tests counts as “green” for the gate so the agent does not pretend a full suite ran.
Red flags
Single monolithic deliverable
“Implement everything” without ordering
No user or automated confirmation between slices
Scoring anchors for decomposition
5: Ordered phases, deliverables, explicit gates, scope caps
4: Phases without numeric limits or test discipline
3: Suggests steps only
2: “Step by step” with no structure
1: Single-shot epic
Efficiency: token weight versus completeness
Context. The system prompt competes with retrieved code, tool outputs, and the user’s messages. Efficiency is not minimalism for its own sake; it is signal per token plus deliberate outsourcing to files the agent reads once.
Bad pattern (scores 1 of 5).
Three thousand words that repeat the same rules in three sections, embed ten full XML examples, and restate generic security advice the model already encodes.
Strong pattern (scores 5 of 5).
## Operating loop

1. Discovery: follow `docs/agent-discovery.md` in this repository for search order.
2. Implementation: follow `docs/agent-edit-policy.md` for allowed directories and patch style.
3. Verification: run commands listed under `## Verify` in the active task file only.

## Task file

The user will name a task file. Treat that file as the single source of truth for scope and acceptance checks.

## Non-goals

Do not refactor unrelated modules. Do not add dependencies unless the task file’s **Dependencies** section is non-empty.
Analysis. This is the efficiency paradox handled correctly: short operational core, long detail moved to versioned docs the agent must read when working. A 50-word prompt with no grounding is not efficient; it is incomplete. Judge efficiency relative to task complexity: a read-only audit prompt should stay under a few hundred words of unique instruction; a full coding agent may justify a few thousand if every line changes behavior.
Red flags
Copy-paste duplication across “policy,” “reminder,” and “examples” sections
Giant static corpora in the prompt that should live in repo docs
No pointers, only prose
Scoring anchors for efficiency
5: Tight core, references for detail, little duplication
4: Slight redundancy, still leaves headroom
3: Moderate repetition
2: Verbose, recoverable only on very large windows
1: Bloated; crowds out evidence
Configuration
Store shared numeric bands in bob-scorecard-kit/calibration/thresholds.properties so two reviewers do not use different cutoffs. These values are defaults for human scoring, not model-parseable truth.
# Sum of five pillars, each 1 to 5
scorecard.pillars.count=5
scorecard.total.max=25

# Grade bands (inclusive lower bound, exclusive upper for next, except last)
grade.production.ready.min=23
grade.good.min=20
grade.needs.work.min=15
grade.not.ready.min=10

# Prompt length guidance for the efficiency pillar (word counts, approximate)
efficiency.excellent.words.max=500
efficiency.good.words.max=1000
efficiency.acceptable.words.max=2000
efficiency.poor.words.max=3000

# Context budget assumptions for commentary in scorecards (tokens, approximate)
context.assumed.total.tokens=200000
context.prompt.budget.simple.percent=1
context.prompt.budget.complex.percent=5
Each setting explained
scorecard.pillars.count and scorecard.total.max: Fixes the denominator when you extend the framework. If you add a sixth pillar later, bump both and reprint historical percentages with a footnote.
grade.*.min: Production readiness is a policy call. These lines match the original methodology: 23 to 25 as “ship,” 20 to 22 as “minor fixes,” 15 to 19 as “substantial rework,” 10 to 14 as “not ready,” below 10 as “rewrite from skeleton.” If your org never ships agents above 21, lower the bands and document the change in Git blame.
efficiency.*.words.max: Word counts are a proxy. Prefer counting tokens for serious runs. When counts disagree, trust tokens for the efficiency pillar and use words only as a quick scan.
context.*: Explains why a 10,000-token system prompt is a strategic choice for a complex workflow but heavy for a linter wrapper. If your deployment uses a different window, update these three lines so scorecards do not cite obsolete math.
Failure modes
Missing grade.* keys: reviewers invent cutoffs mid-quarter and you cannot compare runs
Stale context.*: arguments about efficiency that do not match your vendor limits
Over-tuning word limits: good prompts with long quoted user schemas look obese when they are mostly data; separate “instruction tokens” from “payload tokens” in commentary when that happens
Production Hardening
Operational failure modes in review
Scoring is a human process. Under time pressure, reviewers anchor on the pillar they personally care about (often safety) and underweight grounding. Mitigation: rotate reviewers, require evidence quotes in the scorecard for any pillar scored 4 or 5, and spot-check one file read log from a live run when possible.
Security and data exposure
Prompts often embed sample stack traces, SQL, or class names from real systems. The scorecard workspace must not become a second leak channel. Mitigation: redact before incoming/, forbid pasting production secrets into power-ups/ (injections should describe behavior, not values), and treat scorecards/ like code review material.
Concurrency and ordering guarantees
Two people scoring the same prompt revision on the same day should reach the same total within one point if they follow the same evidence rules. Mitigation: freeze the prompt under a hash-based filename, pin thresholds.properties in the scorecard header, and record the date. If scores diverge, the disagreement is usually grounding or decomposition, not math.
Abuse and gaming
Teams under metric pressure sometimes “teach to the test”: prompts bloated with rubric keywords but no real gates. Mitigation: run a live session against a small repository with a planted bug; the score is not the document, it is whether the agent finds the bug without inventing files.
Verification
You verify the methodology by producing a complete scorecard for a real candidate and checking internal consistency. Pick one prompt file from prompts/ and complete the steps.
Step 1: score each pillar
Assign integers 1 to 5 using the anchors in each pillar section. Write one paragraph of rationale per pillar that cites exact phrases from the candidate prompt.
Step 2: compute the rollup
Use the grade bands from thresholds.properties. Example of the output shape:
## Bob Meta-Scorecard: `payments-agent-v3.md` (2026-04-12)

**Grounding (3 of 5):** Suggests reading `src` but allows skipping when “timeboxed.”
**Continuity (2 of 5):** Mentions a todo list, no file path or resume protocol.
**Safety (5 of 5):** Deletes and dependency bumps gated with explicit confirmation.
**Decomposition (4 of 5):** Phases exist; tests not required between phases.
**Efficiency (4 of 5):** About 900 words with some duplicated warnings.

**Total (18 of 25, 72 percent):** Band: needs significant work per calibration file dated 2026-04-12.
Step 3: name the dominant failure mode
Answer one question in writing: “If this agent goes wrong in the first thirty minutes, what is the most likely story?” Use this skeleton:
## Critical failure mode: Template explosion

**Scenario:** The agent lists two directories, assumes the rest of the layout, generates a new package parallel to the real one, and imports compile until runtime wiring fails.

**Root cause:** Grounding allows skipping reads when the tree is “familiar.” Decomposition does not cap new files per phase.

**Likelihood:** High for repositories with multiple modules.

**Impact:** User merges green CI, then discovers dead code paths or duplicate beans.
Step 4: write exactly three power-ups
Each power-up is a paste-ready block tied to a pillar. Example:
## Power-up 1: Hard grounding gate (Grounding 3 to 5)

**Insert after:** the “Discovery” heading.

**Injection text:**
"Skipping file reads is not permitted. If you believe the tree is too large, stop and ask for a narrowed root path instead of proceeding."

**Expected movement:** Grounding 3 of 5 to 5 of 5 if the rest of the prompt already names tools.
Repeat for the second and third weakest pillars.
Step 5: re-score on paper
Apply the three injections mentally (or in a branch). Recompute totals. You should see at least two pillars move if you chose real weaknesses; if nothing moves, your power-ups were cosmetic.
What this proves
The workspace layout produces comparable artifacts
The failure mode story connects to specific prompt gaps
The power-ups are concrete enough to merge
Common Prompt Anti-Patterns
Placeholder trap: Brackets without discovery. Fix by naming tools and stopping conditions.
Single-shot fallacy: Epics in one answer. Fix with phases and gates.
Amnesia assumption: “Remember” without files. Fix with a structured progress file and resume text.
Efficiency paradox: Too short to be complete. Fix by referencing repo docs instead of omitting rules.
Safety omission: “Refactor as needed.” Fix with destructive classification and confirmation.
When to Use and When to Stop
Use this methodology when a system prompt will drive autonomous edits, when you compare prompt candidates for the same product, or when you train reviewers on agent constraints.
Do not use it as the only signal for human-pair programming modes, creative writing assistants, or cross-model comparisons without retuning anchors. Different models fail in different shapes; the pillars still help, but the numbers are not portable.
Calibration Stories 
High score example (24 of 25). Mandatory discovery, progress file with resume text, explicit destructive gates, phased delivery with tests at gates, under about eight hundred words with minor duplication. Failure risk shifts to execution bugs, not spec holes.
Medium score example (14 of 25). Read-only analyst: safety is perfect because the prompt cannot touch disk, but grounding and continuity score 1 each because the human must paste all context every time. Fine for chat, poor for overnight agents.
Low score example (8 of 25). One line: “Build authentication with login, registration, and reset.” No tree contact, no state, no safety, single phase, only efficiency looks acceptable because the text is short. Expect generic framework soup.
Conclusion
We turned an informal rubric into a repeatable kit: same folders, same thresholds, same scorecard shape, and three paste-ready improvements per review, so the worst failure modes surface before you ship rather than after merge. 
Subscribe now



AI Made Coding Faster. History Says That’s When the Real Problems Begin.
Markus Eisele — Sat, 18 Apr 2026 06:08:39 GMT
For a long time, software had an obvious bottleneck: writing the code.
Not always the only bottleneck, of course. But in many teams, it was still the part that felt expensive. You needed skilled people, time, attention, and patience. Boilerplate took time. Repetition took time. Exploration took time. Even the act of turning an idea into working code still had real friction.
That is changing fast.
With modern AI tools, many teams can now produce code much faster than before. McKinsey reported that developers can complete some tasks up to twice as fast with generative AI assistance. That does not mean software is suddenly easy, but it does mean one old constraint is weakening.
And that raises the more interesting question: what happens to an industry when speed stops being the main problem?
We have seen this before.
Other industries hit similar moments long before software did. Cars got faster. Factories got faster. Transportation systems got more capacity. Each time, the first wave looked like a victory for speed. Then the deeper lesson arrived: once you remove a bottleneck, the system does not become simple. The bottleneck moves.
That is the part software teams need to pay attention to now.
Ford solved throughput. That was only the beginning.
Henry Ford’s moving assembly line became famous for a reason. Ford’s integrated moving assembly line cut Model T chassis assembly time from about 12.5 hours to roughly 1.5 hours. That was a breathtaking improvement, and it changed manufacturing forever. It also helped lower the price of cars and made large-scale production economically viable in a new way. (Ford Corporate)
If you stop the story there, the lesson sounds simple: speed wins.
But that is only the opening chapter.
Ford showed what happens when you remove friction from production. Once the line moved, the whole factory changed shape around it. Workers had to synchronize with the pace of the line. Supply had to arrive at the right time. Problems in one station could ripple forward. Quality issues no longer stayed local. A defect introduced early could be repeated at scale.
That should sound familiar to software teams using AI.
If a developer can now produce three times as many changes in the same week, that does not mean the organization is automatically three times more productive. It means the rest of the system is about to feel pressure. Reviews, tests, integration pipelines, architecture, security checks, production support, and documentation will all see more load.
Ford’s lesson was never just “go faster.” It was “once you can go faster, everything around the work must change too.”
In software, we are living through our version of the moving assembly line.
Toyota learned that speed without quality creates expensive chaos
Toyota took the next big step.
The Toyota Production System was built on two core ideas: Just-in-Time and Jidoka. Just-in-Time means producing only what is needed, when it is needed, in the amount needed. Jidoka is often described as “automation with a human touch.” In practice, it means that when something abnormal happens, the process should stop rather than quietly pass the problem downstream. Toyota describes TPS as a system aimed at eliminating waste, with Jidoka and Just-in-Time at its core. In Toyota’s own explanation, Jidoka means that when a problem is detected, the production lines stop. (Toyota Global)
That is a very different mindset from pure output chasing.
Toyota did not just ask, “How do we produce more?” It asked, “How do we produce reliably, at quality, with waste removed, and with problems exposed early?”
This is where the analogy to software becomes useful.
Right now, many teams are treating AI like Ford’s first production breakthrough. They are understandably excited that code comes out faster. But the Toyota lesson is the one that matters next. Once output speeds up, built-in quality becomes more important, not less.
If your AI tool generates a service class, a migration, a test, an endpoint, and a frontend form in ten minutes, the danger is not that it wrote too little. The danger is that it wrote a plausible, interconnected set of mistakes that now look expensive to unwind.
Toyota’s answer to this kind of problem was not “inspect quality later.” It was to build quality into the flow.
That is why the “stop the line” idea resonates so much right now. In software terms, that means failing fast when reality and output do not match. It means letting tests block progress. It means letting static analysis, security gates, contract checks, and integration tests interrupt momentum. It means treating red builds as production problems, not as minor inconveniences.
It also means empowering people to stop bad flow, not just admire fast flow. Lean practitioners often describe the andon concept this way: people on the line are given the authority to signal abnormality and stop the process. (Lean Enterprise Institute)
Software teams need their own version of that authority.
When an AI system starts inventing APIs, flattening boundaries, “fixing” failures by deleting behavior, or producing inconsistent patterns across a codebase, somebody needs to pull the cord. And the organization needs to reward that, not punish it.
That is not anti-speed. That is what makes speed survivable.
Standardized work is not bureaucracy. It is what makes improvement possible.
Another important Toyota and lean lesson gets misunderstood all the time: standardization.
A lot of developers hear “standardized work” and immediately imagine heavy process, creativity loss, and architecture review meetings that should have been emails. But that is not really what lean systems are trying to do.
Standardized work is the baseline that lets you see problems clearly and improve from a stable starting point. Lean practitioners often phrase it bluntly: without standards, there can be no improvement. 
That matters even more in an AI-assisted environment.
When code was slower to produce, inconsistency spread more slowly too. You could still have a messy codebase, but the rate of mess accumulation had some natural limit because humans had to type it all, reason about it all, and wire it up manually.
AI changes that.
Now one person can generate patterns that spread across a large codebase very quickly. That can be useful when the patterns are good and grounded. It can be destructive when they are not. The same acceleration that helps you scaffold clean implementations can also help you industrialize confusion.
This is why platform engineering, templates, paved roads, reference implementations, guardrails, and shared architectural patterns matter so much right now. They are not old-world control mechanisms resisting modern tools. They are the equivalent of jigs, fixtures, and standard work instructions in a factory that is suddenly capable of much higher throughput.
The goal is not to remove judgment. The goal is to give judgment a stable environment in which it can matter.
Local optimizations can break the larger system
This is the other history lesson that feels especially relevant to software teams right now.
In transportation planning, there is a well-known pattern: adding road capacity does not always “solve traffic” in the way people expect. Economists Gilles Duranton and Matthew Turner famously argued that increases in highway lane kilometers are met with proportional increases in vehicle travel. In plain language, more road space often attracts more driving. The system adapts. (NBER, PDF)
That idea, sometimes discussed as induced demand, is a powerful warning against naïve local optimization.
You improve one visible choke point. The wider system responds. New behavior fills the space you created. The original bottleneck disappears, but the overall problem evolves rather than vanishes.
Software organizations do this all the time.
A team speeds up code generation with AI. Great. But then code review queues grow. Test pipelines get noisier. Security teams see more questionable dependencies. Operations teams inherit more services and more unclear failure modes. Architecture drift accelerates because many reasonable-looking local decisions are made faster than the organization can absorb them.
From inside the team, it feels like productivity improved.
From the system level, it may look like downstream congestion.
This is why local optimization is such a dangerous leadership trap in software. If you measure only code output, story throughput, or raw implementation speed, you can convince yourself the organization is getting better while the real constraints are quietly shifting elsewhere.
Ford teaches that throughput matters. Toyota teaches that quality and flow matter. Transportation teaches that the system pushes back when you optimize one part in isolation.
Put those together, and the message for software becomes pretty clear: faster coding is not the same thing as faster delivery of trustworthy systems.
The scarce skill is moving up the stack
When a technology removes friction from one layer of work, human value does not disappear. It moves.
That happened in factories. As physical production systems improved, the most valuable people were not the ones who merely repeated the motion fastest. The valuable people were the ones who could design the system, spot abnormality, improve flow, coordinate exceptions, and maintain quality under pressure.
The same shift is now happening in software.
Typing code matters less as a differentiator when code can be produced cheaply. What matters more is deciding what should exist, where it should live, how it should be validated, what it may break, and who will own it later.
That is why I do not think this is a story about developers becoming less important. I think it is a story about shallow coding becoming less scarce.
The valuable engineer becomes more like a systems designer, reviewer, constraint manager, and quality engineer. The valuable architect becomes less of a diagram curator and more of a flow designer. The valuable organization becomes the one that knows how to combine speed with boundaries.
Code is getting cheaper.
Coherence is not.
What software teams should take from this
The lesson from history is not that speed is bad. Speed is often wonderful. Ford was not wrong. Faster production can unlock entirely new possibilities. The mistake is thinking that once speed improves, the rest of the system does not need to evolve.
Toyota evolved the system.
That is the move software teams need to make now.
If AI has removed part of the cost of writing code, then your competitive advantage is no longer just “we can produce code quickly.” More and more teams will be able to do that.
The differentiator becomes whether you can produce systems that are coherent, testable, secure, observable, maintainable, and worth operating.
That means better specifications before generation.
It means stronger tests and verification.
It means clearer architecture and boundaries.
It means trusted templates and paved roads.
It means permission models and review discipline for agents.
It means treating bad output as a signal to improve the system, not as an excuse to lower the bar.
In other words, it means learning the same lesson manufacturing had to learn: once speed stops being the hard part, discipline becomes the multiplier.
That is where software is heading now.
Not toward a world where engineering matters less.
Toward a world where engineering discipline matters more than ever.
Subscribe now



Chatbots Talk. Real AI Agents Schedule Work.
Markus Eisele — Fri, 17 Apr 2026 06:08:48 GMT
I met Ronald Dehuysser at Jfokus in February. We talked about Java, background processing, and the kind of problems that look simple until you have to run them reliably in production. I only stumbled over ClawRunr just recently, and it immediately caught my attention because it touches a gap I have been thinking about for a while: most agent discussions focus on prompts, tools, and model choice, but not enough people talk about what happens when the agent needs to do work later, retry something, or survive a restart.
Ronald is the founder behind JobRunr, the open-source Java background job scheduler, and that background shows in this piece. He has spent years working on persistent, distributed job execution in Java, which is exactly why his view on AI agents is interesting. Ronald describes JobRunr as the result of seeing teams repeatedly build fragile schedulers without features like retries and monitoring, and the current JobRunr site positions him as the founder behind that work.
What follows is his take on why agent runtimes need a real scheduling model, and why Java already has most of the building blocks.
Most AI agents are stateless. You send a message, you get a response. That’s a chatbot.
Your AI Agent Has a Job Problem
A real agent does things when nobody’s watching. It checks your email at 8am. It retries a failed API call. It remembers that you asked to be reminded about something next Thursday. It survives a restart.
That’s not a chatbot problem. That’s a job scheduling problem. And Java developers have been solving it for years.
The gap nobody talks about
I’ve spent the last six years building JobRunr, an open-source background job scheduler for Java. So when I started looking at the AI agent space, I had a very specific question: how do these things schedule work?
The answer, in most cases: they don’t. Not really.
Most agent frameworks give you a nice LLM wrapper, some tool calling, maybe a conversation history. But ask the agent to “summarize my emails every morning at 8” and you’re suddenly in DIY territory. A cron job here, a Redis queue there, some in-memory timer that dies when the process restarts. No retry logic. No dashboard. No way to know if your 8am summary actually ran or silently failed.
This felt backwards to me. We have mature, battle-tested solutions for this in Java. JobRunr handles scheduling, retries, persistence, distributed execution, and monitoring out of the box. Why are we reinventing this for agents?
So we built one
Nicholas, my co-founder, got impatient and vibe coded a proof of concept. It worked. Then I read the code.
I’ll spare you the details, but let’s just say we had a productive conversation about dependency management and code that “works by accident.” We scrapped it and rebuilt from scratch.
The result is ClawRunr. We first called it JavaClaw, for obvious naming reasons we had to change it. Everyone still calls it JavaClaw though, and at this point we’ve stopped correcting them. It’s an open-source AI agent runtime written in pure Java.
But the interesting part isn’t the agent itself. It’s the architecture underneath.
Tasks as files, not database rows
Here’s a design decision that surprised people: tasks in ClawRunr are Markdown files.
When you tell the agent “remind me to review that PR tomorrow at 10am,” it creates a file like this:
---
task: Review PR
createdAt: 2026-03-23T14:30:00
status: todo
description: Review the open pull request and provide feedback
scheduledFor: 2026-03-24T10:00:00
---

Check the open pull requests on the project repository.
Review the code changes and leave comments.
Notify me when done.
That file lives in workspace/tasks/2026-03-24/100000-review-pr.md. Human-readable. You can open it in your editor. You can grep for it. You can diff it in git. You can edit it yourself if the agent got something wrong.
Compare that to a job stored in a database table with a serialized payload. Sure, it works. But which one would you rather debug at 2am?
When the scheduled time arrives, the job scheduler picks up the task, the agent reads the Markdown instructions, executes them, and updates the status in the frontmatter. If it fails, the scheduler retries it. Up to three times. All visible in a dashboard.
For recurring tasks the same pattern applies. “Summarize my email every morning” becomes a Markdown file in workspace/tasks/recurring/ with a cron expression. The scheduler creates a fresh task from that template on each run. Cancel it through the chat, and both the recurring job and the file disappear.
One agent, many channels
The second architectural decision worth discussing: channel decoupling.
ClawRunr has one agent instance. When a message arrives, whether from Telegram, the web UI, or eventually Discord or Slack, the runtime fires an event. The agent doesn’t know or care where the message came from. It processes the request, produces a response, and the runtime routes it back through the same channel.
Want to add a new channel? Implement a single interface. The agent code doesn’t change.
This matters because real agents live across multiple surfaces. You start a conversation on your phone via Telegram, then continue in the browser at your desk. The agent should handle both without any extra wiring on your end.
Skills at runtime
This one is my favorite. ClawRunr has a skills system that’s almost stupidly simple.
You create a folder under workspace/skills/, drop a SKILL.md file in it, and the agent picks it up. No compilation. No deployment. No restart. The agent periodically scans the skills directory and discovers new capabilities on its own.
The skill file is just instructions. Plain text telling the agent what it can do and how. Need your agent to manage your grocery list? Write a SKILL.md that explains how. Need it to monitor a specific API? Same thing.
It’s extensibility through documentation rather than code. And it works surprisingly well, because at the end of the day, you’re instructing an LLM. Text is the interface.
Why Java
I’m biased, obviously. But hear me out.
An AI agent is a long-running process. It sits there, waits for messages, schedules jobs, executes tasks, manages state. The JVM was built for exactly this kind of workload. Garbage collection, thread management, stable memory usage over time. You get all of that for free.
Job scheduling is a solved problem in Java. JobRunr has been doing this since 2020. Distributed execution, a dashboard, Spring and Quarkus integration, automatic retries with exponential backoff. All out of the box.
Strong typing catches issues early. When your agent has ten tools (shell execution, file access, web search, task management) and the LLM decides which one to call based on conversation context, you want your tool interfaces to be explicit. A typo in a parameter name should be a compile error, not a runtime mystery.
And then there’s GraalVM. Alina Yurenko from the Oracle GraalVM team already made a GraalVM native image of ClawRunr within three days of release. Startup time dropped to under a second. For an agent that runs on your own hardware, that matters.
The building blocks were already there. Job scheduling, LLM integration, web frameworks, modular architectures. Someone just needed to put them together with an opinion about how agents should work.
What happened when we released it
We put it out there expecting a handful of people to try it. We thought it was a nice demo of what JobRunr can do in the AI space.
Instead: 200+ GitHub stars in three days. 32 forks. Our first external pull request. Someone built a plugin. The GraalVM port I mentioned. The LinkedIn announcement went way beyond our usual reach.
So we changed course. From our README:
This project was originally created as a demo to show the use of JobRunr. JavaClaw is now an open invitation to the Java community. Let’s build the future of Java-based AI agents together.
There’s a lot left to do. More AI Providers. More channels. Better memory and context management. Smarter task planning. Better Security and password management. But the foundation is there, and the Java community seems ready for it.
Try it
If you want to see what it looks like in practice, we recorded a demo video showing the onboarding, recurring task scheduling, task cancellation through natural conversation, and browser automation.
The code is at github.com/jobrunr/javaclaw. Clone it, run ./gradlew :app:bootRun, and you’re chatting with your agent in about two minutes.
We’re looking for contributors, ideas, and honest feedback. If something’s broken, tell us. If you think we’re doing something wrong, tell us that too. That’s how open source works.
Subscribe now



Build a Streaming AI Chat in Java with Quarkus, Vaadin, and LangChain4j
Markus Eisele — Thu, 16 Apr 2026 06:08:41 GMT
This article is a guest post by Sebastian Kühnau from Vaadin. Sebastian put together a very practical walkthrough that shows how well Vaadin Flow fits a Java-first AI UI. Thanks to Sebastian for sharing this with The Main Thread.
The tutorial below can be followed in his reference project on Github. 
Vaadin lets you build modern, component-driven, data-centric web UIs using current web standards — without leaving the Java ecosystem. It uses web components on the client side and exposes them entirely through a Java API. No JavaScript, no build pipeline, no framework churn.
Streaming responses token by token, updating the UI reactively — all of that works within the Java ecosystem you already know. This makes Vaadin a natural fit for AI-powered interfaces. In this tutorial, we’ll combine Quarkus, Vaadin Flow, and LangChain4j to build a streaming AI chat interface in pure Java. If you’re using Spring Boot instead of Quarkus, Vaadin has a dedicated AI quickstart guide for that stack as well.
Prerequisites
You need:
Java 25+
Maven 3.9.12+
An OpenAI API key
We’ll use:
Quarkus 3.32.2
Vaadin Flow 25.0.7 via com.vaadin:vaadin-quarkus-extension
LangChain4j via quarkus-langchain4j-openai
The complete example is available on GitHub.
Project Setup
The easiest way to set up a Quarkus project with the right extensions is via code.quarkus.io. Select the following extensions:
Vaadin Flow (com.vaadin:vaadin-quarkus-extension) — the Vaadin integration for Quarkus, including components, themes, and the Vaadin dev server
LangChain4j OpenAI (quarkus-langchain4j-openai) — AI service integration via LangChain4j
If you already have a running Quarkus project and want to add Vaadin, add the following property, bom configuration and dependency to your pom.xml:

    25.0.7



    
        
            com.vaadin
            vaadin-bom
            ${vaadin.version}
            pom
            import
        
    



    
        com.vaadin
        vaadin-quarkus-extension
        ${vaadin.version}
    



    io.quarkiverse.langchain4j
    quarkus-langchain4j-openai
Finally, configure your OpenAI model and API key in application.properties:
quarkus.langchain4j.openai.api-key=your-api-key-here

quarkus.langchain4j.openai.chat-model.model-name=gpt-4o-mini
Your First Vaadin View
Let’s create our first Vaadin view to verify everything is wired correctly. We’ll start with a minimal example — a simple class called AiChatView that extends VerticalLayout, mapped to the application root via @Route(”“):
@Route(”“)
public class AiChatView extends VerticalLayout {
    public AiChatView() {
        add(”Hello World”);
    }
}
Start the application with ./mvnw quarkus:dev and open http://localhost:8080 in the browser. You should see a plain “Hello World” text rendered in the browser. That’s all it takes to get a Vaadin view running inside Quarkus.
Building the AI Chat UI
Now let’s replace the “Hello World” with a real chat interface. Vaadin provides ready-made components for exactly this use case: MessageList to display the conversation, MessageInput for the user’s input, and Scroller to keep the view anchored to the latest message.
The AI Service
Next, we define the AI service interface. LangChain4j’s @RegisterAiService annotation tells Quarkus to generate the implementation at build time, wiring it to the configured OpenAI model automatically. The chat method returns a Multi — a reactive stream of tokens that arrive one by one as the model generates its response:
@SessionScoped
@RegisterAiService
public interface AiChatService {
    Multi chat(@MemoryId Object chatId, @UserMessage String message);
}
The @MemoryId parameter tells LangChain4j which conversation history to attach to this request. To make that work, provide a ChatMemoryProvider bean that stores a MessageWindowChatMemory per session:
@ApplicationScoped
public class ChatMemoryProviderBean implements ChatMemoryProvider {
    private final Map memories = new ConcurrentHashMap<>();
    @Override
    public MessageWindowChatMemory get(Object memoryId) {
        return memories.computeIfAbsent(memoryId, id ->
                MessageWindowChatMemory.withMaxMessages(20));
    }
}
Note the scope difference: AiChatService is @SessionScoped — one instance per browser session — while ChatMemoryProviderBean is @ApplicationScoped, as it manages memory across all sessions in a single map.
The Chat View
With the service in place, we can build the view. The AiChatView injects AiChatService via CDI and uses Vaadin’s messaging components to display the conversation:
@Route(”“)
public class AiChatView extends VerticalLayout {
    private final MessageList messageList;
    private final Scroller scroller;
    @Inject
    AiChatService chatAiService;

    public AiChatView() {
        setSizeFull();
        messageList = new MessageList();
        messageList.setMarkdown(true);
        scroller = new Scroller(messageList);
        scroller.setSizeFull();
        var messageInput = new MessageInput();
        messageInput.setWidthFull();
        messageInput.addSubmitListener(this::onSubmit);
        add(scroller, messageInput);
        expand(scroller);
    }

    private void onSubmit(MessageInput.SubmitEvent event) {
        var ui = event.getSource().getUI().orElseThrow();
        var question = event.getValue();
        var userMsg = new MessageListItem(question, Instant.now(), “You”);
        userMsg.setUserColorIndex(0);
        messageList.addItem(userMsg);
        var assistantMsg = new MessageListItem(”“, Instant.now(), “Assistant”);
        assistantMsg.setUserColorIndex(1);
        messageList.addItem(assistantMsg);
// Each browser tab gets its own chat memory
  var memoryId = ui.getUIId();
        chatAiService.chat(memoryId, question).subscribe()
                .with(token -> ui.access(() -> {
                    assistantMsg.appendText(token);
                    scroller.scrollToBottom();
                }));
        scroller.scrollToBottom();
    }
}
A few things worth pointing out here. The MessageList is wrapped in a Scroller so the conversation history remains fully accessible even as it grows beyond the visible area in the browser window. Markdown rendering is enabled on the MessageList so the model’s formatted responses — code blocks, bullet points, bold text — are displayed correctly.
When the user submits a message, the method onSubmit adds the user’s message and an empty assistant message to the list immediately. Using a method reference to bind onSubmit to the MessageInput keeps the code clean and the component setup easy to follow. The onSubmit method also fills the assistant message token by token as the model streams its response. Because the streaming callback runs on a background thread, all UI updates must happen inside ui.access() — this is Vaadin’s Push mechanism for safely accessing the UI from outside the request thread.
Enabling Server Push
Before ui.access() can work, we need to enable server push in Vaadin. Create a configuration class that implements AppShellConfigurator and annotate it with @Push:
@Push
@StyleSheet(Aura.STYLESHEET)
public class VaadinConfig implements AppShellConfigurator {
}
This tells Vaadin to keep an open connection to the browser so the server can push UI updates at any time — essential for a streaming response. The @StyleSheet(Aura.STYLESHEET) annotation applies the base theme globally, making it available to all components across the application.
Try it out
With the application running, open http://localhost:8080. Type a question into the input field and submit it. You should see your message appear immediately in the MessageList, followed by the assistant’s response arriving token by token. The view scrolls automatically to keep the latest content visible.
Conclusion
In just a few steps, we built a fully functional, streaming AI chat interface as a modern web application — entirely in Java. We set up a Quarkus project with Vaadin and LangChain4j, created our first Vaadin view, defined a reactive AI service, and wired everything together with server push to deliver a smooth token-by-token chat experience in the browser.
If you want to dive deeper into the technologies covered in this tutorial, here are the official resources:
Vaadin Docs
Quarkus Docs
Quarkus LangChain4j Docs
The complete code for this tutorial is available on GitHub.
Subscribe now



Write Better JavaDoc in Java 23 with Markdown Comments
Markus Eisele — Wed, 15 Apr 2026 06:08:45 GMT
Most teams think JavaDoc is a publishing problem. You write it once, the tool renders it, and that is the end of the story. In real codebases, that mental model breaks fast. The source is what developers actually read in reviews, in IDE hovers, and during production debugging. If the source form is noisy, the documentation is noisy for readers too.
Classic JavaDoc has always had this problem. The rendered HTML can look fine, but the source is full of scaffolding: {@link}, {@code}, 
, 
, and little HTML fragments that mostly carry formatting. That friction has a cost. Developers write less documentation, they keep examples shorter than they should, and they avoid updating comments because touching them is annoying.
Java 23 fixes the practical part of this. JEP 467 introduced Markdown (EDIT: JEP 467 was introduced in Java 23 and not as initially stated in Java 24 )documentation comments with ///, CommonMark support, and Markdown-style links to program elements. Oracle’s JavaDoc guide documents this feature for JDK 23 and later, and Java SE 24 exposes the END_OF_LINE documentation comment kind for /// comments in the compiler model. 
Second, humans are not your only readers anymore. Coding assistants, code search, and internal RAG pipelines read raw source too, not just the generated HTML site. Markdown looks like the rest of the text those systems already know: READMEs, issues, docs, examples, code fences. Cleaner comments help people and tools alike. JEP 467 also calls out the Compiler Tree API, which matters when you build or run source-analysis tooling. 
In this tutorial we build a small Maven library: an in-memory book review registry with only the JDK and JUnit. We skip persistence and HTTP on purpose. We want a small API where JavaDoc matters: package overview, sealed types, records, a service class, exceptions, all written with /// in source you still want to read. Then we run javadoc, use VS Code hovers for quick feedback, and lock behavior with plain unit tests. VS Code’s Java tooling renders Markdown in JavaDoc comments.
Prerequisites
You need a recent JDK, Maven, and a Java editor that understands modern Java. I use VS Code here because the Java tooling renders Markdown JavaDoc in hovers, and because many developers already use it.
Java 23 or newer (the build uses maven.compiler.release 23; JDK 25 works)
Maven 3.9 or newer
VS Code with the Extension Pack for Java
Comfort reading pom.xml and JUnit Jupiter (this tutorial uses JUnit 6; test code still imports org.junit.jupiter.api)
Check the setup:
java -version
mvn -version
code --version
You want a JDK that supports --release 23 (Java 23 or newer). Markdown documentation comments were introduced by JEP 467, and Oracle documents them as available in JDK 23 and later. 
Project Setup
Create a project directory and work inside it. In the-main-thread Github the Maven tree lives under bookreviews/; if you are starting from scratch, create that folder and change into it before the next steps.
mkdir -p bookreviews
cd bookreviews
Create pom.xml in the project root (the directory that contains pom.xml). You can pick any groupId / artifactId; what matters is maven.compiler.release 23 and JUnit Jupiter 6 (junit-jupiter) in test scope.
Create pom.xml:


    4.0.0

    dev.mainthread
    bookreviews
    1.0.0-SNAPSHOT
    jar

    
        23
        UTF-8
        6.0.3
    

    
        
            org.junit.jupiter
            junit-jupiter
            ${junit.version}
            test
        
    

    
        
            
                org.apache.maven.plugins
                maven-compiler-plugin
                3.13.0
            
            
                org.apache.maven.plugins
                maven-surefire-plugin
                3.5.2
            
            
                org.apache.maven.plugins
                maven-javadoc-plugin
                3.11.2
                
                    
                    ${project.build.directory}/site
                
            
        
    

Create the source tree:
mkdir -p src/main/java/dev/mainthread/bookreviews
mkdir -p src/test/java/dev/mainthread/bookreviews
Open the project in VS Code (from inside the project directory):
code .
From the parent of bookreviews, you can run code bookreviews instead.
Implementation
We add six compilation units under src/main/java/dev/mainthread/bookreviews (including package-info.java) and one test class. The logic stays small on purpose so the comments stay easy to see.
Package overview
package-info.java is where you put “read me first” context: what the package is for, how the pieces fit, where to start. Markdown headings fit here. In classic HTML JavaDoc they never felt natural.
Create src/main/java/dev/mainthread/bookreviews/package-info.java:
/// In-memory book review registry for tutorials and demos.
///
/// ## Where to start
///
/// - [BookReviewService] is the entry point for callers.
/// - [BookReview] is the immutable result type returned from the service.
/// - [ReviewSubmission] carries the fields passed to [BookReviewService] when creating a review.
///
/// ## Formats
///
/// [BookFormat] is a sealed hierarchy for optional catalog metadata.
///
/// ## Thread safety
///
/// [BookReviewService] is safe for concurrent use. Individual [BookReview]
/// instances are immutable value objects.
package dev.mainthread.bookreviews;
Bracket links like [BookReviewService] resolve to types in the same package in generated docs, the same way {@link} did—without the noise in source. 
After mvn javadoc:javadoc (see Build and JavaDoc), open target/site/apidocs/dev/mainthread/bookreviews/package-summary.html, or follow the package link from the overview. You should see Markdown headings and lists in the package description, then the class summary table.
Classic JavaDoc next to Markdown
Before we add more files, compare styles on one line: a rating constraint on a method parameter.
Classic style tends toward:
/**
 * Creates a review from user-supplied fields.
 *
 * @param rating score from {@code 1} to {@code 5} (inclusive)
 */
Markdown documentation comments keep the tags the standard doclet still understands, but the body reads like normal text:
/// Creates a review from user-supplied fields.
///
/// @param rating score from `1` to `5` (inclusive)
Same information, less scaffolding. The rest of the tutorial stays in the /// form.
Sealed format types
Sealed types are a simple place for cross-links. The implementors are one closed family, so a short tour in the interface comment helps readers.
Create src/main/java/dev/mainthread/bookreviews/BookFormat.java:
package dev.mainthread.bookreviews;

/// Describes how a title is distributed or held for display.
///
/// This type is closed: only [Paperback] and [Ebook] exist. If you add a new
/// format, update this hierarchy and the package overview in
/// `package-info.java`.
public sealed interface BookFormat permits BookFormat.Paperback, BookFormat.Ebook {

    /// A print copy with rough dimensions for shelving or shipping estimates.
    ///
    /// @param dimensions human-readable size, for example `23 x 15 cm`
    record Paperback(String dimensions) implements BookFormat {
    }

    /// A digital edition identified by a stable download or storefront URL.
    ///
    /// @param uri location of the digital edition
    record Ebook(String uri) implements BookFormat {
    }
}
Stored review record
Create src/main/java/dev/mainthread/bookreviews/BookReview.java:
package dev.mainthread.bookreviews;

/// Represents a stored review for a single book.
///
/// Instances are immutable. The `id` is assigned by [BookReviewService] when a
/// review is created. Optional [BookFormat] metadata is for catalog UIs only;
/// the service does not interpret it beyond storage.
///
/// @param id system-assigned identifier
/// @param isbn ISBN-13 string in the form the caller supplied
/// @param title display title of the reviewed book
/// @param reviewer display name of the reviewer
/// @param rating score from `1` to `5`
/// @param body free-text review content
/// @param format optional [BookFormat], or `null` if unknown
public record BookReview(
        Long id,
        String isbn,
        String title,
        String reviewer,
        int rating,
        String body,
        BookFormat format
) {
}
Input record without a validation framework
Libraries often document preconditions in prose and enforce them with ordinary code. You can also leave validation to callers if you say so in the comment. Here we skip Bean Validation so the tutorial stays about documentation.
Create src/main/java/dev/mainthread/bookreviews/ReviewSubmission.java:
package dev.mainthread.bookreviews;

/// Caller-supplied data used to create a [BookReview].
///
/// ## Preconditions
///
/// The service rejects invalid input with [IllegalArgumentException]:
///
/// - `isbn`, `title`, `reviewer`, and `body` must be non-blank after trimming.
/// - `rating` must be between `1` and `5` inclusive.
/// - `body` length should stay within a reasonable bound for your product; this
///   library uses a soft maximum of `4000` characters after trim.
///
/// ## ISBN
///
/// This type does not parse or checksum ISBNs. Callers should pass normalized
/// strings if their domain requires it.
///
/// @param isbn ISBN-13 or other normalized identifier string
/// @param title non-empty book title
/// @param reviewer non-empty reviewer name
/// @param rating score from `1` to `5`
/// @param body review text
/// @param format optional [BookFormat], may be `null`
public record ReviewSubmission(
        String isbn,
        String title,
        String reviewer,
        int rating,
        String body,
        BookFormat format
) {
}
Domain exception
Create src/main/java/dev/mainthread/bookreviews/ReviewNotFoundException.java:
package dev.mainthread.bookreviews;

/// Thrown when a requested [BookReview] does not exist in the registry.
///
/// Callers that map errors to user-visible messages can rely on
/// [getMessage] for a stable English sentence in this implementation.
public class ReviewNotFoundException extends RuntimeException {

    /// @param id identifier that was not found
    public ReviewNotFoundException(Long id) {
        super("No review found with id " + id);
    }
}
Service implementation
The service class is where longer Markdown comments help. You get headings for thread safety, limits, and a short example, and you never type 
 or  in source.
Create src/main/java/dev/mainthread/bookreviews/BookReviewService.java:
package dev.mainthread.bookreviews;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/// In-memory registry of [BookReview] instances.
///
/// Reviews are stored in a `ConcurrentHashMap` and identified by a monotonically
/// increasing `long` ID.
///
/// ## Thread safety
///
/// Read and write operations are safe for concurrent access. The store itself is
/// thread-safe, and the ID sequence is managed with `AtomicLong`.
///
/// ## Limits
///
/// This implementation does **not** persist data. Discarding the service instance
/// clears all reviews. It is suitable for demos, tests, and embedding in larger
/// applications that supply their own persistence.
///
/// ## Example
///
/// ```java
/// var service = new BookReviewService();
/// BookReview created = service.create(new ReviewSubmission(
///         "9780134685991",
///         "Effective Java",
///         "mjava",
///         5,
///         "Essential reading for any Java developer.",
///         new BookFormat.Paperback("23 x 15 cm")
/// ));
/// BookReview found = service.findById(created.id());
/// ```
public class BookReviewService {

    private static final int BODY_MAX_LEN = 4000;

    private final Map store = new ConcurrentHashMap<>();
    private final AtomicLong sequence = new AtomicLong(1);

    /// Creates a new review and assigns a unique ID.
    ///
    /// @param submission validated caller input; see [ReviewSubmission]
    /// @return newly created [BookReview]
    /// @throws IllegalArgumentException if preconditions on [ReviewSubmission] fail
    public BookReview create(ReviewSubmission submission) {
        validateSubmission(submission);
        long id = sequence.getAndIncrement();
        BookReview review = new BookReview(
                id,
                submission.isbn().trim(),
                submission.title().trim(),
                submission.reviewer().trim(),
                submission.rating(),
                submission.body().trim(),
                submission.format()
        );
        store.put(id, review);
        return review;
    }

    /// @param id review identifier
    /// @return matching [BookReview]
    /// @throws ReviewNotFoundException if the ID does not exist
    public BookReview findById(Long id) {
        return Optional.ofNullable(store.get(id))
                .orElseThrow(() -> new ReviewNotFoundException(id));
    }

    /// @return snapshot list of all reviews; not backed by the live store
    public List findAll() {
        return new ArrayList<>(store.values());
    }

    /// @param isbn ISBN string to match exactly against stored reviews
    /// @return matching reviews, possibly empty; does not throw [ReviewNotFoundException]
    public List findByIsbn(String isbn) {
        return store.values().stream()
                .filter(review -> review.isbn().equals(isbn))
                .toList();
    }

    /// @param id review identifier
    /// @throws ReviewNotFoundException if the ID does not exist
    public void delete(Long id) {
        if (store.remove(id) == null) {
            throw new ReviewNotFoundException(id);
        }
    }

    private static void validateSubmission(ReviewSubmission s) {
        if (s.isbn().isBlank() || s.title().isBlank() || s.reviewer().isBlank() || s.body().isBlank()) {
            throw new IllegalArgumentException("isbn, title, reviewer, and body must be non-blank");
        }
        if (s.rating() < 1 || s.rating() > 5) {
            throw new IllegalArgumentException("rating must be between 1 and 5");
        }
        if (s.body().trim().length() > BODY_MAX_LEN) {
            throw new IllegalArgumentException("body exceeds maximum length");
        }
    }
}
Here Markdown comments start to feel like a real language feature. Oracle’s JavaDoc guide documents CommonMark together with normal JavaDoc tags and links to program elements. 
The compiler model in Java 23 treats /// as an end-of-line documentation comment kind, and the standard doclet treats it as Markdown plus JavaDoc tags. (Elements.DocCommentKind)
On the generated BookReviewService page, that same comment turns into subsection headings (“Thread safety”, “Limits”, “Example”) and a fenced Java sample in the HTML. You still write it in source without {@code} or 
.
Build and JavaDoc
The maven-javadoc-plugin configuration already pins release to 23 so the doclet matches the compiler.
Generate HTML:
mvn -q javadoc:javadoc
Open the site: (on macOS):
open target/site/apidocs/index.html
On Windows (PowerShell):
start target/site/apidocs/index.html
The javadoc tool in Java 23 uses the standard doclet, and Oracle documents Markdown documentation comments as a supported feature of that toolchain. (javadoc command reference)
Now use the faster loop in VS Code. Open BookReviewService.java or package-info.java and hover a type or method. You read Markdown in hovers all day. Generated HTML is for when you publish.
To attach documentation to the JAR you publish to a repository, run mvn -q javadoc:jar and ship the -javadoc.jar next to your main artifact. Consumers of your library get the same rendered API in their IDE.
What This Means for AI-Assisted Development
This part is easy to miss when you only look at generated HTML. The change that matters is still in the source file.
Old JavaDoc carries a lot of JavaDoc-only and HTML-only noise. Models can learn that, and many already did, but normal Markdown is still easier to read. A fenced java block looks like code samples everywhere else. A bracket link looks like a normal technical link. Code assistants that scan raw files get simpler text to work with.
JEP 467 also matters for tool builders. It extended support around documentation comments. If you run internal indexing, source analysis, or agent pipelines on Java source, /// comments are easier to treat as plain documentation text. 
Bad documentation stays bad. A vague Markdown comment is still vague. When the formatting tax goes down, teams often write clearer examples, limits, and method contracts anyway. That is the practical win.
Verification
Create src/test/java/dev/mainthread/bookreviews/BookReviewServiceTest.java:
package dev.mainthread.bookreviews;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

class BookReviewServiceTest {

    @Test
    void createFindAndDeleteRoundTrip() {
        var service = new BookReviewService();
        var submission = new ReviewSubmission(
                "9780134685991",
                "Effective Java",
                "mjava",
                5,
                "Essential reading for any Java developer.",
                new BookFormat.Paperback("23 x 15 cm")
        );

        BookReview created = service.create(submission);
        assertEquals("Effective Java", created.title());
        assertEquals(5, created.rating());

        BookReview found = service.findById(created.id());
        assertEquals(created, found);

        service.delete(created.id());
        assertThrows(ReviewNotFoundException.class, () -> service.findById(created.id()));
    }

    @Test
    void rejectsOutOfRangeRating() {
        var service = new BookReviewService();
        var bad = new ReviewSubmission(
                "9780134685991",
                "Effective Java",
                "mjava",
                9,
                "Essential reading for any Java developer.",
                null
        );
        assertThrows(IllegalArgumentException.class, () -> service.create(bad));
    }

    @Test
    void notFoundIsStable() {
        var service = new BookReviewService();
        var ex = assertThrows(ReviewNotFoundException.class, () -> service.findById(999L));
        assertEquals("No review found with id 999", ex.getMessage());
    }
}
Run tests:
mvn test
Surefire should report three tests in BookReviewServiceTest, for example:
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
If you use mvn -q test instead, Maven runs in quiet mode. When everything passes, it often prints nothing at all. That is normal. You only see output when something fails or when a plugin logs a warning.
Confirm the Javadoc JAR builds (optional but recommended before publish):
mvn -q javadoc:jar
The three tests check the behaviors the /// text promises: happy path, argument validation, and stable ReviewNotFoundException messaging.
Incremental Migration Advice
Do not plan a big-bang rewrite of every JavaDoc block. Keep migration simple. Use /// for all new code. When you touch public APIs, core services, or classes with examples for real work, migrate those comments. Leave old comments alone until you edit them anyway.
Java’s documentation model supports both forms. Oracle’s API docs in Java 23 even note that inherited documentation can cross between Markdown comments and traditional comments, so mixed codebases are normal during migration. 
One practical warning: a /// comment is no longer just a normal line comment in modern JDKs. On declarations, it becomes documentation. That is usually what you want, but it is worth being deliberate when you introduce it across older code.
Conclusion
We built a small library on purpose, not a REST service, so Markdown JavaDoc stays tied to what it improves: package and type docs people read in the IDE, richer examples and structure in source, and HTML from the standard doclet without HTML scaffolding in comments. JavaDoc in the browser looks nicer too, but the main win is the text next to your public API: easier to write, easier to keep aligned with the code, easier for people and tools to read. 
Subscribe now



The Real Problem With AI-Assisted Java Content Is Drift
Markus Eisele — Mon, 13 Apr 2026 06:08:43 GMT
I recently took down one of my Quarkus posts. Not because the whole thing was garbage. Not because every example was broken. It was because a few parts were off in a way that matters a lot in technical writing: they sounded plausible, but they were wrong enough to mislead readers.
That is worse than an obvious mistake.
An obvious mistake gets caught quickly. A plausible mistake gets repeated. Someone reads it before an interview. Someone copies it into a demo. Someone turns it into a team explanation. And suddenly the problem is not one bad sentence anymore. The problem is that bad information now looks like accepted knowledge.
I was called out on that, and rightly so. The feedback was direct. Some of it was uncomfortable. But it was fair.
The main point was not that AI should never be used. The point was that a high-speed, AI-assisted publishing workflow creates a very specific risk. Code can compile. The narrative can sound smooth. The article can feel finished. And still, a couple of details can drift just far enough away from reality to become harmful. In a fast-moving ecosystem like Quarkus, that is not a small issue. That is exactly where trust starts to break.
There is another side to this that also matters to me. The Main Thread was never only meant to be a publishing machine for Java content. It also became my laboratory for figuring out how far I can push AI to teach, and how far I can push myself to work with AI without losing control of the outcome. That experiment is still very much alive. It is also exactly why I need to be honest when it fails.
Another point hit just as hard. Review is expensive. Engineering time is expensive. A review process should not be there to rescue content that was never grounded properly in the first place. It should be there as a final safety layer. Not as cleanup for a workflow that moves too fast.
That landed with me.
So I promised to keep the original post down. And I thought that was the right call. Not because I want to perform some public self-punishment ritual. Just because replacing a flawed article with a better one is more useful than quietly pretending nothing happened.
So rather than trying to rescue the old article, I decided to write this one.
It is not another list post. It is not a patched version of interview questions. It is the thing underneath the problem: how I think about keeping AI-infused IDEs, coding assistants, and agent workflows grounded enough that they stay useful without drifting into confident nonsense.
Because that, for me, is the real issue now. AI is not going away. IDEs with copilots, agents, MCP servers, retrieval layers, and doc-aware tooling are not going away. The question is not whether we use them. The question is whether we build workflows around them that hold up under technical scrutiny.
The Real Problem Is Not “AI Slop.” It Is Drift.
“AI slop” is a catchy phrase, but I do not think it is precise enough.
The real problem is drift.
A model starts with a roughly correct understanding. Then it fills in one missing detail from training data and statistics. Then another. It picks a term that used to be right. It explains a pattern that technically works but is not the right Quarkus way to do it. It mixes old and new vocabulary. It invents a connection between two things that sound related. None of these mistakes are dramatic on their own. But together they produce content that feels sharp and complete while quietly losing contact with the source material.
That is the dangerous part.
And this is exactly why “the code compiles” is not enough. I learned that one the hard way. A generated example can compile and still teach the wrong habit. It can compile and still overcomplicate the solution. It can compile and still present a pattern that no experienced Quarkus engineer would recommend. Technical correctness is more than syntactic success.
There is also a second kind of drift that gets less attention. Tone drift.
When I rely too much on model-first drafting, the writing starts to flatten. Every sentence gets punchy. Every paragraph sounds polished in the same way. The article reads like it was assembled by a machine trained on five thousand “developer content” headlines and then sprayed with confidence. Even when the facts are right, that tone damages trust. Readers can feel it.
So when I say I want to keep AI grounded, I mean both things. I want the facts grounded in current sources and runnable reality. And I want the writing grounded in a voice that still sounds like me.
What Grounding Means for Me
Grounding, for me, is simple in principle.
The model does not get to answer from vibes.
I do not trust a general-purpose model to “just know” Quarkus. Not for version-sensitive details. Not for renamed extensions. Not for testing changes. Not for migration nuances. Not for what is technically possible versus what is idiomatic. That is where drift shows up first.
So I try to force the workflow away from memory and back toward sources.
The first layer is current documentation. If I write about Quarkus, I want the model to work from current guides, migration notes, release material, and actual code. Not from stale training memory. That sounds obvious, but it changes the whole character of the output. The model stops behaving like an oracle and starts behaving more like an assistant reading over your shoulder.
The second layer is targeted retrieval. I do not want broad prompts like “tell me about Quarkus testing.” I want narrower, version-aware context. Show me the current guide. Show me what changed. Show me the names that are valid now. Show me the artifact or config that matches the current platform line. Broad prompts invite generic answers. Narrow prompts force contact with specifics.
The third layer is contradiction hunting. This is one of the least glamorous parts of the process, but it matters a lot. I look for stale tokens. Old names. Old guide references. Old vocabulary. Old explanations that used to be true in one release line and are not true anymore. This is where a lot of plausible nonsense hides. Not in wild hallucinations. In leftovers.
The fourth layer is runnable code. I want code that builds, starts, and behaves the way the article says it behaves. I want the failure path to be real. I want the endpoint response to be real. I want the config to do something visible. If I make a claim, I want some proof behind it. That does not mean every article becomes a giant test suite. But it does mean that “looks right” is not enough.
The fifth layer is human judgment. I still use AI heavily. I am not moving backward on that. But there is a big difference between using AI to accelerate exploration and letting AI define technical truth. The model can help me think faster, compare options, rewrite, structure, and pressure-test. It should not be the final source of authority on framework behavior.
That distinction matters more and more.
How I Actually Use AI in the Workflow
My workflow is not “generate article, publish article.”
That would be irresponsible, and it would also not produce the kind of content I want to put my name on.
I use AI at several stages, but not for the same job in every stage.
I use it to help me explore a topic faster. I use it to challenge assumptions. I use it to find likely weak points. I use it to shape structure when the material is messy. I use it to rewrite drafts into something that has a cleaner arc. I use it to pressure-test whether an explanation makes sense to someone who was not already in my head.
But the closer I get to the final text, the less I want “free generation” and the more I want constrained generation. I want source-linked docs. I want current framework material. I want rules for tone and structure. I want code I can verify. I want a process that reduces randomness.
That is the part many AI debates skip over. The tool is not one thing. “Using AI” can mean lazy autopilot. It can also mean a carefully constrained system where the model is only one part of a larger workflow. Those are not remotely the same.
And this is where grounding tools start to matter.
The Tooling Part: Why Context Matters More Than Cleverness
One of the biggest mistakes with AI IDEs is expecting the model to carry too much of the truth inside itself.
That works for generic coding tasks. It breaks down fast for active frameworks, product lines, release-specific guidance, and fast-moving ecosystems. Quarkus changes. Tooling changes. Names change. Recommended approaches change. A model that only answers from memory will always lag behind that reality.
So I use context injection and documentation retrieval wherever I can.
That includes working with documentation-oriented tooling that can pull current, source-linked material into the prompt instead of leaving the model alone with its own memory. It also includes using MCP-based doc access so the assistant can retrieve the right project material at the moment I ask the question. This is not glamorous, but it is a huge part of what makes the output less fragile.
I also think there is a broader lesson here for framework teams. If we want AI tools to produce better outcomes, we need better ways to expose authoritative project knowledge to them. Not by replacing documentation, but by making documentation easier for these systems to consume correctly. Good docs are still the foundation. Better retrieval just gives them a stronger path into the workflow.
The Skills Layer I Use
Grounding is one part of the story. The other part is skills.
In my setup, skills are curated playbooks. They are not code, and they are not some magic hidden training layer. They are explicit instructions for how certain kinds of work should be done. They help reduce one of the biggest practical problems in AI-assisted writing and coding: inconsistency.
Without that layer, every draft starts from scratch in the worst possible way. One day the model writes a clean technical walkthrough. The next day it over-explains. Another day it changes tone halfway through. Another day it forgets the article structure, skips verification, or slips into that too-polished “developer content” voice that nobody really trusts.
Skills give me a way to tighten that up.
For writing, I mainly rely on three kinds of guardrails. One defines the structure of a proper Main Thread tutorial or article. One keeps the voice closer to how I actually speak and write. One acts as a stricter review pass that checks whether the result is technically solid, teachable, and ready to ship.
That combination helps a lot. Structure keeps the article stable. Voice keeps it human. Review keeps it honest.
Beyond that, I am also testing work from the Quarkus project around technical guardrails expressed as skills. I think that is one of the more promising directions in this space. Not because skills replace expertise. They do not. But because they can encode project-specific expectations in a form that an assistant can actually follow. That means fewer random detours, fewer invented patterns, and a better chance that the output reflects how the framework really wants to be used.
That part is still evolving, and I am learning along the way. But I like the direction very much. It moves the workflow away from “trust the model” and closer to “constrain the model with the project’s own rules.”
And that is exactly where I want to be.
What Changed for Me After 400+ Posts
Publishing at a high cadence taught me a lot. Some of it was good. Some of it was painful.
The good part is obvious. I learned faster. I explored more topics. I found patterns in what readers care about. I got better at turning technical material into readable stories. I also got a very practical education in what these tools are good at and where they fail.
And this is where The Main Thread ended up meaning more to me than “just” a publication. It became a working lab. A place where I could test not only what AI can produce, but how AI changes the way I research, structure, verify, explain, and ship technical content. That is also why I am speaking about this process publicly in my JCON Europe 2026 session, “Chasing the Main Thread - Adventures in AI Assisted Coding.” 
The painful part is also clear now. Speed hides weaknesses until it does not. A workflow can feel productive for months and still contain a flaw that only becomes fully visible when trust is on the line. In my case, that flaw was not “too much AI” in some abstract moral sense. It was not enough hard grounding around the parts that matter most: current facts, framework idioms, and final accountability.
That is why I do not think the answer is to stop using AI. I think the answer is to stop pretending that generation alone is a workflow.
Generation is one step. Grounding, retrieval, contradiction checks, runnable code, editorial constraint, and final accountability are the workflow.
That is the difference.
What I Am Trying to Do Now
I still believe these tools matter. I still believe learning them aggressively is the right move. I still think the future of technical work involves more agentic tooling, more IDE assistance, more retrieval, and more model-driven exploration.
But I also think there is a responsibility that comes with publishing technical content in public, especially around a project like Quarkus where people use articles as a shortcut to understanding.
I do not want to create review debt for busy engineers. I do not want to publish things that sound official just because they circulate widely. I do not want to produce content that looks polished while eroding trust underneath. And I definitely do not want to feed wrong explanations back into the broader machine that will repeat them again later.
So the goal now is not just more output. The goal is better constraints.
Better sources. Better retrieval. Better guardrails. Better code verification. Better editorial discipline. Better use of AI where it helps, and less trust where it does not deserve trust.
Conclusion
I took down a Quarkus article because a few answers were wrong. That was the immediate reason. The deeper reason is that it exposed something more useful: if I want AI-infused IDEs and writing workflows to be worth anything, they need tighter contact with reality than “looks plausible.” For me, that means current docs, targeted retrieval, contradiction checks, runnable code, explicit skills, and a workflow where the model helps, but does not get to decide what is true.
That is the version of this experiment I want to keep doing: less faith in generation, more discipline around grounding, and a clearer understanding that The Main Thread is both a publication and a laboratory. Thank you for joining me on this experiment. And thank you for your feedback.
Subscribe now



Why Enterprise Java Teams Need Boundaries for AI Agents
Markus Eisele — Sun, 12 Apr 2026 06:08:38 GMT
AI coding tools have moved far past autocomplete. They read large codebases, propose architecture changes, edit files, run shell commands, call APIs, and increasingly act like junior engineers with terminal access. That changes the security conversation completely.
For years, most application security work assumed a simple model. Developers wrote code. Pipelines validated it. Production systems enforced runtime controls. Even when developers made mistakes, the path from mistake to incident usually had friction in it. A pull request needed review. A deployment needed approval. A shell command needed a human to type it.
Agentic tooling removes a lot of that friction. That is the point. It speeds up work. But it also compresses the distance between suggestion and action. When an AI agent can read your repository, inspect environment files, hit internal endpoints, modify source code, and run commands without pause, you are no longer dealing with a code assistant. You are dealing with a probabilistic actor inside your delivery system.
That is where many teams still use the wrong mental model. They think the main risk is bad generated code. It is not. Bad code is the old problem. The new problem is operational autonomy. The danger starts when the model can do things, not just suggest things.
For Java teams in regulated or enterprise-heavy environments, this matters more than it does for hobby projects. Your systems usually sit next to customer data, internal APIs, CI/CD pipelines, cloud credentials, and a lot of old infrastructure that still works but breaks in ugly ways when touched carelessly. If you plug an autonomous coding agent into that world, containment stops being a nice security add-on. It becomes the architecture.
The real problem is not intelligence. It is agency.
Most of the current hype talks about how smart these agents are becoming. That is interesting, but it is not the main design issue. The real issue is agency. What can the agent do on its own? What can it read? What can it write? What can it execute? What can it reach over the network? What happens when it misreads instructions or ingests malicious context?
This is why the OWASP “excessive agency” idea is so useful. It describes the exact failure mode many teams are walking into. They start with a tool that helps write tests or explain code. Then they add file editing. Then shell access. Then GitHub integration. Then MCP servers. Then deployment hooks. One small convenience at a time, the agent moves from assistant to operator.
And once that happens, prompt injection becomes much more serious. In a chat window, a poisoned README or a malicious issue comment is annoying. In an agent workflow, it can turn into command execution, secret exfiltration, or remote system access. The agent does not need to be “hacked” in the traditional sense. It only needs to be convinced.
That is what makes this different from normal software security. The control plane is language. The exploit path is often context. The toolset is already built into the system.
Why “YOLO mode” is not a feature
A lot of developers understand this in theory and still end up in the same place in practice: full auto-approval.
The reason is obvious. Interruptions are annoying. Approval prompts slow down flow. If your company is pushing AI adoption hard, the pressure to remove friction gets even stronger. Teams start treating safety prompts as UI noise. They want the tool to just do the work.
That is where “YOLO mode” shows up. Different products call it different things, but the idea is the same: let the agent read, write, execute, and call tools without stopping for human confirmation.
This is where security falls apart fast.
The problem with full auto-approval is not only that destructive things can happen. It is that destructive things happen at machine speed. If the agent decides to run an unsafe command, touch production-facing configuration, or send secrets to an external endpoint, the time between bad reasoning and bad outcome can be seconds or less. Human intuition never enters the loop.
For enterprise Java teams, the risk is even more concrete. A coding agent sitting in a Quarkus or Spring codebase can easily see deployment descriptors, Kubernetes manifests, CI workflows, local .env files, test credentials, internal URLs, and database settings. If it is allowed to act on all of that autonomously, you have collapsed a lot of security boundaries into one prompt window.
That is not “developer productivity with guardrails.” That is just privileged automation with a language model in the middle.
The whitelist trap
Some teams try to be more careful. They do not enable full autonomy. They create a hybrid model where “safe” operations are auto-approved and dangerous ones still need manual confirmation.
That sounds reasonable. In practice, it often creates a false sense of safety.
The classic mistake is whitelisting tools instead of validating intent. A team says, “Running docker is fine” or “Using podman is fine” or “This sandbox wrapper is safe.” But the executable alone is not the security boundary. The arguments matter. Context matters. Mounted volumes matter. Network flags matter.
A container runtime can isolate work. It can also expose the host. A shell command can compile code. It can also delete a workspace, leak secrets, or rewrite build configuration. An MCP tool can search documentation. It can also mutate remote systems if you auto-approve the wrong capability.
This is why simplistic whitelisting is not enough. A privileged tool plus malicious arguments is still a privileged action. Senior engineers know this already from other systems. The same command that helps you debug a pod can also destroy a cluster if pointed at the wrong target. Agent workflows do not change that truth. They just hide it behind natural language.
The only sane model is containment
Once you accept that agents will misread context, hallucinate, or eventually ingest malicious input, the design goal changes. You stop trying to make the model perfectly safe. You focus on blast radius.
That means containment.
The first layer is execution isolation. Agents should not operate directly on the host with broad local access. They need sandboxes, ephemeral containers, or tightly scoped environments that can be destroyed and rebuilt easily. If the model does something stupid, the damage stays inside a disposable boundary.
The second layer is network control. A lot of agent exploits end in exfiltration. If the runtime can call arbitrary external endpoints, a compromised prompt can turn into outbound data leakage very quickly. Egress should be narrow, explicit, and logged. Default deny is the right mindset here.
The third layer is secret handling. Local plaintext secrets and autonomous agents do not belong together. If your workflow still depends on .env files full of long-lived credentials, the agent does not even need to be malicious to create a problem. It only needs to summarize the wrong file, paste the wrong snippet, or include the wrong detail in generated code. Short-lived credentials and external secret managers are not optional in this model.
The fourth layer is approval design. High-impact actions must stay behind human confirmation. Not because humans are perfect, but because humans at least understand business context, timing, and consequences. The model does not.
MCP is where the stakes jump again
The next big boundary problem is MCP.
MCP is useful because it turns the agent into a real participant in the toolchain. It can talk to documentation systems, issue trackers, orchestration platforms, internal APIs, and whatever else you expose through a server. That is also exactly why it becomes dangerous.
Every MCP server is a trust decision. Every connected tool expands the action surface. Every “always allow” setting chips away at your approval boundary.
For Java teams, this is familiar territory in a different form. We already know that integrations are where simple systems become enterprise systems. The same service that looks clean in a demo gets complicated fast once it talks to identity providers, ticketing systems, cloud control planes, and internal governance tools. MCP does the same thing for agents. It makes them more useful, and more dangerous, at the same time.
The worst pattern is direct trust plus static credentials. If the agent can call a remote MCP server with persistent tokens and broad permissions, you have effectively created an unattended service account controlled by probabilistic reasoning. That is a bad design, even if the prompt layer looks polished.
A better pattern is a gateway model with short-lived credentials, centralized policy checks, and on-behalf-of identity flow. In plain English: the agent should never be more powerful than the person using it. If Markus does not have permission to trigger a production action, the agent acting for Markus should not have it either. That sounds obvious, but many current integrations still fail that basic rule.
Prompts, modes, and tool configs are now code
Another shift many teams still underestimate: prompts and agent configuration now belong inside your engineering governance model.
If a custom mode changes what the agent is allowed to do, that mode is not just UX. It is policy. If a prompt changes how an agent handles secrets, external content, or approvals, that prompt is not just copy. It is executable behavior in the broad sense. If an MCP config enables auto-approval for a write-capable tool, that JSON file is part of your risk model.
Senior Java teams already know how to govern code. Review it. Version it. Test it. Track who changed what and why. The same mindset needs to apply here.
Treat prompts, rules, and integration definitions like first-class artifacts. Put them in source control. Review them. Change them intentionally. Audit them when incidents happen. This is not optional anymore.
What this means for Java teams right now
The practical takeaway is simple.
Do not evaluate coding agents only on code quality. Evaluate them on containment quality.
Ask different questions. What happens when the agent reads poisoned content? What can it execute without approval? What files can it see by default? Can it reach the public internet freely? Are credentials short-lived? Are tool invocations logged? Can you roll back generated changes quickly? Does the tool respect user identity, or does it operate with its own standing privileges?
These are architecture questions. They belong in the same room as platform engineering, security, and developer productivity. This is not a frontend toggle in an IDE settings page.
I think this is the real maturity test for AI-assisted development in the enterprise. The winners will not be the teams that gave the model the most freedom. They will be the teams that gave it enough freedom to be useful and enough boundaries to fail safely.
Conclusion
AI coding agents are becoming part of the delivery stack. That part is already happening. The open question is whether we treat them like clever autocomplete or like privileged runtime actors. For enterprise Java teams, the answer needs to be the second one. Once an agent can read, write, execute, and integrate, the security model changes. The job is no longer to trust the model. The job is to contain it.
Subscribe now


Lock Down `PanacheEntityResource` Without Throwing Away Codegen
Markus Eisele — Sat, 11 Apr 2026 06:08:36 GMT
Generated CRUD endpoints are great until you need real security. In the early demo phase, PanacheEntityResource is a nice shortcut. You define an entity, expose one interface, and Quarkus generates the REST layer for you. The problem starts when your API stops being a demo and turns into something that different users should access in different ways.
Most developers fix that by giving up the generated endpoint and writing a JAX-RS resource by hand. They add @RolesAllowed, copy the CRUD methods, and slowly rebuild what the framework already gave them. The generated endpoint still saves some time in the first hour, but after that you are back in boilerplate land. The convenience stops when production requirements kick in.
This matters because security is not a decoration you add later. Once your service handles real data, “everyone can call every generated method” is a production bug. A read-only user should not delete records. A service account that can ingest data should not automatically be able to rewrite historical state. If you do not draw those boundaries clearly, your API breaks at the authorization layer. The persistence layer cannot replace explicit access rules.
Quarkus 3.31 fixed a missing piece here: you can put @PermissionsAllowed on the REST Data Panache interface methods that Quarkus generates. The security rule lives where the operation is declared. You keep the code generation and you still get fine-grained access control per operation.
In this tutorial, we’ll build a small SwiftShip-style service with a generated Shipment endpoint. We’ll secure read operations with shipment:read and write operations with shipment:admin, using Keycloak client scopes and the token endpoint scope parameter so Quarkus OIDC maps granted scopes to @PermissionsAllowed. We’ll use Quarkus Dev Services to start PostgreSQL and Keycloak for us, and we’ll verify the behavior with real tokens and curl calls. By the end, you’ll have an end-to-end example that works locally and shows exactly where the permission boundary lives.
Prerequisites
You do not need a large setup for this tutorial, but you do need the usual Quarkus local development tools. We assume you are comfortable reading REST endpoints, editing application.properties, and testing secured APIs with bearer tokens.
Java 21 installed
Quarkus CLI installed
Podman installed
jq installed for token extraction in shell commands
Basic understanding of REST and OpenID Connect (OIDC)
Project Setup
Let’s create the project:
quarkus create app dev.myfear.swiftship:permissions-demo \
  --extension='hibernate-orm-rest-data-panache,rest-jackson,jdbc-postgresql,oidc,smallrye-openapi' \
  --no-code
cd permissions-demo
What these extensions do:
hibernate-orm-rest-data-panache — wires Hibernate ORM Panache with REST Data Panache (pulls in rest-data-panache and hibernate-orm-panache) and generates CRUD endpoints from your resource interface
rest-jackson — registers the Quarkus REST (JAX-RS) stack with Jackson; without a REST extension, the generated resource would not be mounted and you would see 404 on /shipment
jdbc-postgresql — gives us PostgreSQL connectivity, and Dev Services support
oidc — integrates the application with Keycloak for bearer token authentication
smallrye-openapi — exposes the generated endpoints in OpenAPI, which is useful for verification
If you use Maven, add io.rest-assured:rest-assured (scope test) next to quarkus-junit for the optional test at the end. The Quarkus BOM (bill of materials) manages the RestAssured version when you omit an explicit version on that dependency.
Now create the package structure:
mkdir -p src/main/java/dev/myfear/swiftship
mkdir -p src/main/resources
mkdir -p src/test/java/dev/myfear/swiftship
Implementing the Shipment entity
We start with the entity because REST Data Panache generates the endpoint from the data model. Keep it simple. We do not need relationships, validation groups, or DTO mapping here. We want the security behavior to stay visible.
Create src/main/java/dev/myfear/swiftship/Shipment.java:
package dev.myfear.swiftship;

import io.quarkus.hibernate.orm.panache.PanacheEntity;
import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.Table;

@Entity
@Table(name = "shipment")
public class Shipment extends PanacheEntity {

    @Column(name = "tracking_number")
    public String trackingNumber;

    @Column(name = "destination")
    public String destination;

    @Column(name = "status")
    public String status;
}
This class gives us exactly what we need. PanacheEntity provides the generated numeric id, and the three fields are enough to test list, get, create, update, and delete operations. We map the table to shipment and columns to snake_case so import.sql matches PostgreSQL reliably; JSON responses still use the Java property names (trackingNumber, and so on).
The important limit here is also obvious: this entity does not protect anything by itself. It defines persistence shape. Authorization is a separate concern. If an endpoint exists for this entity and no security rule blocks access, the entity will happily be returned, inserted, updated, or deleted. That is why the next step matters.
Implementing the generated resource with permissions
Here’s the core of the tutorial. We declare the generated CRUD contract and put permission annotations on the methods we want to secure. There is still no implementation class. Quarkus generates the JAX-RS endpoint during the build.
Create src/main/java/dev/myfear/swiftship/ShipmentResource.java:
package dev.myfear.swiftship;

import java.util.List;

import io.quarkus.hibernate.orm.rest.data.panache.PanacheEntityResource;
import io.quarkus.panache.common.Page;
import io.quarkus.panache.common.Sort;
import io.quarkus.security.PermissionsAllowed;

public interface ShipmentResource extends PanacheEntityResource {

    @Override
    @PermissionsAllowed("shipment:read")
    List list(Page page, Sort sort);

    @Override
    @PermissionsAllowed("shipment:read")
    long count();

    @Override
    @PermissionsAllowed("shipment:read")
    Shipment get(Long id);

    @Override
    @PermissionsAllowed("shipment:admin")
    Shipment add(Shipment shipment);

    @Override
    @PermissionsAllowed("shipment:admin")
    Shipment update(Long id, Shipment shipment);

    @Override
    @PermissionsAllowed("shipment:admin")
    boolean delete(Long id);
}
This is the whole trick. PanacheEntityResource extends RestDataResource; the default method signatures use List, Page, and Sort for listing, return Shipment from add, boolean from delete, and expose count() as a separate read operation. We redeclare those methods as abstract overrides only to add annotations. We do not write a resource class, and we do not call the database ourselves. Quarkus reads this interface at build time and generates the REST endpoint with the security checks attached. Use the io.quarkus.hibernate.orm.rest.data.panache.PanacheEntityResource import for the Hibernate ORM variant.
What does this guarantee? It guarantees that generated read operations (including list, get, and count) require shipment:read, and generated write operations require shipment:admin, once the identity carries those permissions—for this tutorial, via OIDC access-token scopes. It does not add record-level rules by itself. If a caller presents a token with shipment:read, they can read every shipment exposed by these methods. This is operation-level authorization. Tenant isolation needs extra work in your queries.
Here @PermissionsAllowed fits better than @RolesAllowed. A permission like shipment:admin is an application capability. A role like shipment-admin is one identity-provider-specific assignment. The permission stays stable in your code. The role mapping can change in configuration.
Configuring Quarkus, PostgreSQL, and OpenID Connect (OIDC)
Now we wire the application to PostgreSQL and Keycloak. In dev mode, Quarkus Dev Services starts both containers for us. We only need to describe the application behavior.
Create src/main/resources/application.properties:
quarkus.datasource.db-kind=postgresql
quarkus.hibernate-orm.schema-management.strategy=drop-and-create

quarkus.oidc.application-type=service
quarkus.oidc.client-id=swiftship
quarkus.oidc.credentials.secret=secret

quarkus.keycloak.devservices.realm-path=quarkus-realm.json
# Default Dev Services use a random host port; pin 8180 so manual curl examples match startup.
quarkus.keycloak.devservices.port=8180

# Keycloak container clock and host JVM can drift; without leeway, short-lived tokens fail exp
# validation and every Bearer call returns 401 even with a freshly obtained access token.
%test.quarkus.oidc.token.lifespan-grace=600
These are intentionally small settings. quarkus.keycloak.devservices.port=8180 fixes the Keycloak container to a known host port. If you omit it, Quarkus picks a random mapped port (Dev Services for OIDC); the curl snippets below assume 8180, so without this property your token request hits the wrong port. quarkus.hibernate-orm.schema-management.strategy=drop-and-create replaces the older quarkus.hibernate-orm.database.generation property (deprecated from Quarkus 3.23 onward). drop-and-create is fine for local development because it makes the tutorial repeatable. It is not a production choice. In production, this would destroy state on every restart. There you would use schema migration with Flyway or Liquibase.
The OpenID Connect settings tell Quarkus to validate bearer tokens for a service-style API. We are not building a browser login flow here. We want access tokens you get from Keycloak, sent in the Authorization header.
OAuth2 scopes in the access token and @PermissionsAllowed
@PermissionsAllowed checks java.security.Permission instances on the SecurityIdentity (by default StringPermission), not only JWT roles.
For OIDC bearer tokens, Quarkus maps the access token’s scope claim into those permissions: each space-separated scope string becomes a permission. As with role-based examples elsewhere in the docs, a value that contains a colon is parsed into permission name and action. So scope shipment:read matches @PermissionsAllowed("shipment:read"), and shipment:admin matches the admin operations.
In this tutorial, Keycloak client scopes supply those strings. We attach shipment:read and shipment:admin as optional client scopes on the swiftship client. At the token endpoint, the client passes the desired scopes in the scope form field (space-separated). Keycloak returns an access token whose scope lists what was granted; Quarkus turns that into permissions—no SecurityIdentityAugmentor and no realm roles on the users.
Minimal realm caveat: the JSON below only defines our two custom client scopes. It does not include Keycloak’s built-in openid, profile, and email scope definitions. Requesting those together with shipment:read would yield invalid_scope until you add the usual OIDC client scopes to the realm. For this walkthrough, request only application scopes—for example scope=shipment:read for Alice, or scope=shipment:read shipment:admin for Bob.
The client sets fullScopeAllowed: false so only assigned scopes are valid; optional scopes must be requested explicitly.
Configuring the Keycloak realm
We need two users (passwords only—no realm roles for capabilities). Whether Alice or Bob can read or admin shipments is determined by which scopes the token request asks for. Dev Services imports this realm automatically.
Create src/main/resources/quarkus-realm.json:
{
  "realm": "quarkus",
  "enabled": true,
  "clientScopes": [
    {
      "name": "shipment:read",
      "protocol": "openid-connect",
      "attributes": {
        "include.in.token.scope": "true",
        "display.on.consent.screen": "false"
      }
    },
    {
      "name": "shipment:admin",
      "protocol": "openid-connect",
      "attributes": {
        "include.in.token.scope": "true",
        "display.on.consent.screen": "false"
      }
    }
  ],
  "clients": [
    {
      "clientId": "swiftship",
      "enabled": true,
      "publicClient": false,
      "secret": "secret",
      "directAccessGrantsEnabled": true,
      "standardFlowEnabled": true,
      "serviceAccountsEnabled": false,
      "fullScopeAllowed": false,
      "optionalClientScopes": [
        "shipment:read",
        "shipment:admin"
      ]
    }
  ],
  "users": [
    {
      "username": "alice",
      "enabled": true,
      "emailVerified": true,
      "credentials": [
        { "type": "password", "value": "alice" }
      ]
    },
    {
      "username": "bob",
      "enabled": true,
      "emailVerified": true,
      "credentials": [
        { "type": "password", "value": "bob" }
      ]
    }
  ]
}
Alice and Bob are equivalent in Keycloak; curl (or your client) chooses scope= per call. That keeps the demo focused on @PermissionsAllowed and token scopes. In production you would tie scope issuance to user attributes, client policies, or authorization services—not ad-hoc scope strings from the client unless you trust that caller.
Loading test data
A CRUD API without data does not tell us much. We seed two rows on startup so the read endpoints have something to return immediately.
Create src/main/resources/import.sql:
INSERT INTO shipment (id, tracking_number, destination, status) VALUES (1, 'SWS-001', 'Berlin', 'IN_TRANSIT');
INSERT INTO shipment (id, tracking_number, destination, status) VALUES (2, 'SWS-002', 'Amsterdam', 'DELIVERED');
This script matches the shipment table and @Column names from the entity. It works with our drop-and-create dev setup. On each restart, the schema is recreated and the same two shipments appear again.
You get deterministic verification. Safe production seeding is a different problem. import.sql is useful for tests, demos, and tutorials. It is not how you manage production reference data.
Starting the application
Start the application in dev mode:
quarkus dev
If you generated the project with Maven and the wrapper, the same thing is:
./mvnw quarkus:dev
Quarkus now starts the application, a PostgreSQL container, and a Keycloak container. Wait until startup finishes and then open the Dev UI if you want to inspect the running services: Dev UI
You can also inspect the OpenAPI document to confirm the generated shipment endpoint exists: OpenAPI
Verification
Let’s prove the behavior with real requests.
Get a token for Alice
With quarkus dev already running (and Keycloak reachable on 8180 when quarkus.keycloak.devservices.port is set as below), open a second terminal and request a token:
export ALICE_TOKEN=$(curl -s -X POST \
  http://localhost:8180/realms/quarkus/protocol/openid-connect/token \
  -d "client_id=swiftship" \
  -d "client_secret=secret" \
  -d "username=alice" \
  -d "password=alice" \
  -d "grant_type=password" \
  -d "scope=shipment:read" | jq -r '.access_token')
To confirm the token exists:
echo $ALICE_TOKEN | cut -c1-40
You should see the first part of a JSON Web Token (JWT). If that line is blank or shows null, the token call failed: run the same curl without -s (or add -S) so errors are visible, confirm jq is installed, and confirm Keycloak is really on 8180 (startup log, Dev UI, or the quarkus.oidc.auth-server-url value Quarkus printed). If you removed quarkus.keycloak.devservices.port, replace 8180 in the URL with whatever host port Dev Services mapped for Keycloak.
Alice can list shipments
Call the generated list endpoint:
curl -s \
  -H "Authorization: Bearer $ALICE_TOKEN" \
  http://localhost:8080/shipment | jq .
Expected output:
[
  {
    "id": 1,
    "trackingNumber": "SWS-001",
    "destination": "Berlin",
    "status": "IN_TRANSIT"
  },
  {
    "id": 2,
    "trackingNumber": "SWS-002",
    "destination": "Amsterdam",
    "status": "DELIVERED"
  }
]
This verifies that @PermissionsAllowed("shipment:read") on the list operation is enforced and satisfied for Alice.
Alice can get one shipment
curl -s \
  -H "Authorization: Bearer $ALICE_TOKEN" \
  http://localhost:8080/shipment/1 | jq .
Expected output:
{
  "id": 1,
  "trackingNumber": "SWS-001",
  "destination": "Berlin",
  "status": "IN_TRANSIT"
}
Alice cannot delete
curl -i -X DELETE \
  -H "Authorization: Bearer $ALICE_TOKEN" \
  http://localhost:8080/shipment/1
Expected output:
HTTP/1.1 403 Forbidden
This is the critical check. Alice is authenticated, so this is not a 401. She is blocked because she lacks shipment:admin, so the correct response is 403 Forbidden.
Get a token for Bob
Now request a token for the admin user:
export BOB_TOKEN=$(curl -s -X POST \
  http://localhost:8180/realms/quarkus/protocol/openid-connect/token \
  -d "client_id=swiftship" \
  -d "client_secret=secret" \
  -d "username=bob" \
  -d "password=bob" \
  -d "grant_type=password" \
  -d "scope=shipment:read shipment:admin" | jq -r '.access_token')
Bob can delete
curl -i -X DELETE \
  -H "Authorization: Bearer $BOB_TOKEN" \
  http://localhost:8080/shipment/1
Expected output:
HTTP/1.1 204 No Content
Now list the shipments again:
curl -s \
  -H "Authorization: Bearer $BOB_TOKEN" \
  http://localhost:8080/shipment | jq .
Expected output:
[
  {
    "id": 2,
    "trackingNumber": "SWS-002",
    "destination": "Amsterdam",
    "status": "DELIVERED"
  }
]
That confirms the delete operation really ran. Shipment 1 is gone from the list.
Unauthenticated requests fail
Finally, call the endpoint without a token:
curl -i http://localhost:8080/shipment
Expected output:
HTTP/1.1 401 Unauthorized
That verifies the full security flow. No token means authentication fails before the permission layer is even evaluated.
Optional integration test
Manual curl verification is useful. You still want automated checks in the codebase. The tests below assert anonymous access is rejected, a reader can load GET /shipment/1, and the same reader cannot delete.
Create src/test/java/dev/myfear/swiftship/ShipmentSecurityTest.java:
package dev.myfear.swiftship;

import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.equalTo;

import org.eclipse.microprofile.config.inject.ConfigProperty;
import org.junit.jupiter.api.Test;

import io.quarkus.test.junit.QuarkusTest;
import io.restassured.http.ContentType;

@QuarkusTest
class ShipmentSecurityTest {

    @ConfigProperty(name = "quarkus.oidc.auth-server-url")
    String authServerUrl;

    @Test
    void anonymousUserCannotListShipments() {
        given()
            .when().get("/shipment")
            .then()
            .statusCode(401);
    }

    @Test
    void readerCanGetShipmentById() {
        String token = accessToken("alice", "alice", "shipment:read");

        given()
            .header("Authorization", "Bearer " + token)
            .when().get("/shipment/1")
            .then()
            .statusCode(200)
            .body("id", equalTo(1))
            .body("trackingNumber", equalTo("SWS-001"))
            .body("destination", equalTo("Berlin"))
            .body("status", equalTo("IN_TRANSIT"));
    }

    @Test
    void readerCannotDeleteShipment() {
        String token = accessToken("alice", "alice", "shipment:read");

        given()
            .header("Authorization", "Bearer " + token)
            .when().delete("/shipment/1")
            .then()
            .statusCode(403);
    }

    private String accessToken(String username, String password, String scope) {
        String tokenUrl = authServerUrl + "/protocol/openid-connect/token";
        return given()
            .contentType(ContentType.URLENC)
            .formParam("client_id", "swiftship")
            .formParam("client_secret", "secret")
            .formParam("username", username)
            .formParam("password", password)
            .formParam("grant_type", "password")
            .formParam("scope", scope)
            .when()
            .post(tokenUrl)
            .then()
            .statusCode(200)
            .extract()
            .path("access_token");
    }
}
The positive read test depends on requesting shipment:read in scope. If you omit scope or request scopes the client is not allowed to use, you get invalid_scope at the token endpoint, or 403 on the API when the token lacks the right permissions.
This suite does not cover every path, but it pins the two boundaries that matter for this API: readers can read, and readers cannot delete.
In @QuarkusTest, Keycloak Dev Services often uses a random host port unless you set quarkus.keycloak.devservices.port. The test code builds the token URL from quarkus.oidc.auth-server-url, so it stays correct. The application.properties in this tutorial pins 8180 for dev so the curl examples match; for tests you can add e.g. %test.quarkus.keycloak.devservices.port=… if you want a fixed port there too.
If Bearer requests in tests return 401 even with a token you just got, check exp: Keycloak’s container clock and the host JVM can skew enough that short-lived access tokens look expired to Quarkus. For the test profile only, add something like %test.quarkus.oidc.token.lifespan-grace=600 (seconds of leeway on expiry) in application.properties so ./mvnw verify stays stable. Do not treat that as a production setting.
Run the test with ./mvnw verify (or your IDE’s JUnit runner). That starts PostgreSQL and Keycloak via Dev Services, so Podman (or a compatible container runtime) must be available.
Share
Production Hardening
What happens under load
The endpoint generation does not change how authorization behaves under concurrency. Every incoming request still goes through authentication, identity resolution, and permission checks before the CRUD operation runs. So you do not get a “fast path” around security just because the endpoint is generated.
The important part is that this only protects the operation boundary. If 500 valid admin requests call DELETE /shipment/{id} at once, the permission layer does not serialize them. It only decides whether each caller is allowed to try. Database correctness is still handled by your persistence model and transaction boundaries.
What this does not protect
@PermissionsAllowed("shipment:read") protects the entire list, get, and count operations. It does not protect fields inside the response. If your entity contains sensitive columns such as internal cost data, this approach does not hide them from authorized readers. For that, you need data transfer objects (DTOs), projections, or response filtering.
The same is true for tenant isolation. If user A should only see shipments for customer A, and user B should only see shipments for customer B, you need a query constraint tied to the authenticated identity on top of operation-level checks. The generated endpoint can still help, but your repository logic has to enforce that boundary.
Failure behavior matters
One good thing about declarative security on generated endpoints is consistency. You do not risk forgetting to add an annotation to one custom method because there is no custom method. The security rule sits right on the operation declaration.
I have seen the manual alternative fail in real projects. A team rewrites generated CRUD as a resource class to add authorization. Six months later, someone adds a “temporary” admin shortcut endpoint for a migration, forgets the annotation, and that endpoint survives the release. Generated code saves time. It also removes places where humans forget things.
Conclusion
We built a generated CRUD API for Shipment, secured it with @PermissionsAllowed, issued OAuth2 scopes from Keycloak client scopes via the token endpoint’s scope parameter, and let Quarkus OIDC map those scopes to permissions. PostgreSQL and Keycloak Dev Services back the app; curl and tests show read versus admin behavior. The security rule stays on the operation declaration; how callers get scopes in production is a separate policy concern.
Subscribe now



What Your Local LLM Actually Sees: Debugging Ollama Traffic in Quarkus with mitmproxy
Markus Eisele — Fri, 10 Apr 2026 06:08:45 GMT
You might think that local models are easier to debug because they run on the same machine as the application. You start Ollama, point your Java client at localhost:11434, get a response back, and assume the transport side is simple. That feeling lasts until the answers get worse, latency goes up, or a tool call starts doing strange things.
The model is only one part of the story. The full serialized request drives behavior too. Your Java code creates a clean interface method. The framework turns that into JSON. Then the model sees the final payload: system prompt, user message, tool schema, generation settings, and anything else your client sends. If that payload is too large or shaped differently than you expected, the model behaves differently. Application logs often miss that final shape.
This gets worse when you use OpenAI-compatible APIs. The same request format can target OpenAI, LiteLLM, or Ollama. That is good for portability, but it also makes it easy to ignore what is actually going over the wire. Ollama supports an OpenAI-compatible /v1/chat/completions endpoint on http://localhost:11434/v1/, and that makes it a very good local target for this kind of inspection. It also supports tools on that endpoint; see the Ollama OpenAI compatibility documentation.
mitmproxy solves this problem by showing the real HTTP traffic. For local Ollama over plain HTTP, this is much simpler than the hosted HTTPS case. You do not need to trust a custom CA certificate for the main path in this tutorial, because we are not intercepting TLS here. We are just routing normal HTTP traffic through a local proxy. mitmweb runs the proxy on the listen port you choose and serves the web UI on 127.0.0.1:8081 by default; see mitmweb in the mitmproxy documentation.
What follows is a small Quarkus application that talks to Ollama through its OpenAI-compatible endpoint. We route that traffic through mitmproxy, compare a plain request with a tool-enabled request, and inspect what really hits the model. The useful outcome is simple: you can see the same payload your model sees. Quarkus LangChain4j supports named model configurations, AI services with @RegisterAiService, and tool integration with @Tool, so we keep the Java code small and still get a realistic payload on the wire; see Quarkus LangChain4j AI services.
Prerequisites
You need a local Java setup, a running Ollama installation, and mitmproxy. I assume you are comfortable with Quarkus REST endpoints and Maven, but I do not assume you already know the LangChain4j annotations used here.
Java 21 or newer installed (validated with Java 25)
Quarkus CLI installed
Ollama installed locally
mitmproxy installed locally (brew install --cask mitmproxy on macOS)
Basic understanding of REST endpoints
Project Setup
Create the project or grab it from my Github repository.
quarkus create app com.example:ollama-wiretap-demo \
  --package-name=com.example.ollamawiretap \
  --extension=rest-jackson,io.quarkiverse.langchain4j:quarkus-langchain4j-openai  \
  --no-code
We use rest-jackson because we want a simple JSON REST endpoint in Quarkus REST. We use quarkus-langchain4j-openai on purpose, even though the model is local. The reason is simple: Ollama exposes an OpenAI-compatible API, so this lets us inspect the same wire format many teams use against hosted providers later. 
Change into the project directory:
cd ollama-wiretap-demo
Implementation
Create the request and response types
We start with two small records. They keep the REST endpoint simple, and they also make verification easier because the HTTP response shape stays stable even though the model output itself is not deterministic.
Create src/main/java/com/example/ollamawiretap/PromptRequest.java:
package com.example.ollamawiretap;

public record PromptRequest(String question) {
}
Create src/main/java/com/example/ollamawiretap/PromptResponse.java:
package com.example.ollamawiretap;

public record PromptResponse(String mode, String answer, long durationMs) {
}
This gives us a stable contract. The answer text changes from run to run. The mode and durationMs fields do not. That matters for AI verification. We do not test exact wording. We test that the call went through the expected path and that we can inspect the request that produced it.
Create a plain AI service
Create the first AI service next. This is our baseline. It has a short system prompt and no tools.
Create src/main/java/com/example/ollamawiretap/PlainAssistant.java:
package com.example.ollamawiretap;

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface PlainAssistant {

    @SystemMessage("""
            You are a concise software architecture assistant.
            Answer in no more than four sentences.
            Be concrete.
            """)
    String answer(@UserMessage String question);
}
This interface is small, but it still routes through the OpenAI client configured in application.properties. We keep a single default model configuration so the proxy path is explicit and easy to validate; see Quarkus LangChain4j AI services.
The guarantee here is simple. Every call to answer becomes a chat-completions request. The limit is also simple. This does not tell you anything about payload size unless you inspect the traffic. The Java method hides the JSON. That is the whole problem we are solving.
Create a tool bean
Add a CDI bean with a tool method. The Quarkus LangChain4j AI services reference shows the @Tool pattern for function calling. We will use a tiny tool on purpose so the traffic difference is easy to understand; see Quarkus LangChain4j AI services.
Create src/main/java/com/example/ollamawiretap/ArchitectureTools.java:
package com.example.ollamawiretap;

import dev.langchain4j.agent.tool.Tool;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class ArchitectureTools {

    @Tool("Return the current platform stack used by the application")
    public String currentStack() {
        return "Java, Quarkus, Ollama, mitmproxy";
    }
}
This tool does almost nothing. That is fine for this tutorial. It makes the request larger and different on the wire. Once tools are available, the model call includes tool metadata. Teams often forget this overhead when they discuss context budgets.
Create a tool-enabled AI service
The second AI service keeps the same basic behavior, but with tool access enabled.
Create src/main/java/com/example/ollamawiretap/ToolAssistant.java:
package com.example.ollamawiretap;

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService(tools = {ArchitectureTools.class})
public interface ToolAssistant {

    @SystemMessage("""
            You are a concise software architecture assistant.
            Use tools when they help answer the question.
            Answer in no more than four sentences.
            Be concrete.
            """)
    String answer(@UserMessage String question);
}
This is where the transport story gets interesting. The Java code barely changed. The request body did. That difference is invisible at the call site, but it is visible in mitmproxy.
Create the REST endpoint
Finish with a REST endpoint that lets us call either mode. We use a path parameter so we can compare plain and tool without changing code between runs.
Create src/main/java/com/example/ollamawiretap/PromptResource.java:
package com.example.ollamawiretap;

import jakarta.inject.Inject;
import jakarta.ws.rs.BadRequestException;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/inspect")
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
public class PromptResource {

    @Inject
    PlainAssistant plainAssistant;

    @Inject
    ToolAssistant toolAssistant;

    @POST
    @Path("/{mode}")
    public PromptResponse inspect(@PathParam("mode") String mode, PromptRequest request) {
        long start = System.currentTimeMillis();

        String answer = switch (mode) {
            case "plain" -> plainAssistant.answer(request.question());
            case "tool" -> toolAssistant.answer(request.question());
            default -> throw new BadRequestException("Mode must be 'plain' or 'tool'");
        };

        long duration = System.currentTimeMillis() - start;
        return new PromptResponse(mode, answer, duration);
    }
}
This endpoint gives us a clean comparison point. Both requests come from the same application, hit the same Ollama server, and go through the same proxy. The only thing that changes is the AI service configuration.
Under stress, this code is still synchronous. That is fine for the tutorial. For higher throughput systems, you would care about concurrency, request queuing, and timeouts much more explicitly. But for inspecting payload shape, a blocking REST endpoint is the easiest thing to reason about at 2am.
Configuration
Configure the application in src/main/resources/application.properties:
quarkus.langchain4j.openai.base-url=http://127.0.0.1:11434/v1
quarkus.langchain4j.openai.api-key=ollama
quarkus.langchain4j.openai.proxy-type=HTTP
quarkus.langchain4j.openai.proxy-host=127.0.0.1
quarkus.langchain4j.openai.proxy-port=8888
quarkus.langchain4j.openai.chat-model.model-name=qwen2.5-coder:7b
quarkus.langchain4j.openai.chat-model.log-requests=true
quarkus.langchain4j.openai.chat-model.log-responses=true
quarkus.langchain4j.openai.base-url points the OpenAI client at Ollama’s OpenAI-compatible endpoint. I use 127.0.0.1 here to avoid localhost edge-cases around IPv6 and proxy bypass in some environments. 
proxy-type, proxy-host, and proxy-port route the OpenAI-compatible client through mitmproxy. The value 8888 is arbitrary, but it must be a port where mitmproxy is actually listening, and it must not be the same port Quarkus uses for its own HTTP server (quarkus.http.port). 
chat-model.model-name=qwen2.5-coder:7b selects the local Ollama model. The exact model is your choice. Use one that is already comfortable on your machine. log-requests and log-responses are useful here because they let you compare framework-level logging with the real wire capture. The logs help. The wire is still the source of truth.
This configuration gives you one clear guarantee. The Quarkus client will call Ollama through mitmproxy. It does not guarantee the model will use the tool on every request. Tool use is still model behavior, and that is probabilistic.
Running the Stack
While writing this article, the demo initially used the LangChain4j versions aligned by the Quarkus platform BOM alone. On that combination, the OpenAI client did not honor the proxy-related application.properties keys, so requests never reached mitmproxy even though the configuration looked correct. We had to bump the Quarkus LangChain4j stack: the companion pom.xml imports io.quarkiverse.langchain4j:quarkus-langchain4j-bom at 1.8.4, which picks up the upstream fix in quarkus-langchain4j#2276.
Note: The quarkus.langchain4j.openai.proxy-type, proxy-host, and proxy-port settings from the Configuration section are the ones this walkthrough relies on. Match or exceed that BOM version (or a release that includes the same fix) so those properties actually configure the HTTP client; otherwise you may need the JVM -Dhttp.proxyHost=... workaround below even when the Quarkus keys are set.
Pull a model if you do not already have one:
ollama pull qwen2.5-coder:7b
Start Ollama:
ollama serve
Ollama’s OpenAI compatibility docs show the local base URL and the /v1/chat/completions endpoint shape; see the Ollama OpenAI compatibility documentation.
Now start mitmproxy:
mitmweb --listen-port 8888
This starts the proxy on port 8888 and the web UI on http://127.0.0.1:8081.
Important with newer mitmproxy releases: the web UI is protected by a one-time auth token. If you open http://127.0.0.1:8081/#/capture and get HTTP 403, check the mitmweb terminal output, copy the token/password shown there, and enter it in the browser form once for that session.
After that, start the Quarkus app:
./mvnw quarkus:dev
If requests still do not appear in mitmproxy, force JVM-level proxy settings for the Quarkus process:
./mvnw quarkus:dev \
  -Dhttp.proxyHost=127.0.0.1 \
  -Dhttp.proxyPort=8888 \
  -Dhttps.proxyHost=127.0.0.1 \
  -Dhttps.proxyPort=8888 \
  -Dhttp.nonProxyHosts=
At this point the runtime path is:
curl → Quarkus on port 8080 → mitmproxy on port 8888 → Ollama on port 11434
That is the path we care about. There is no hidden gateway and no hosted provider. You are looking at the traffic leaving your Java application and arriving at your local model server.
Production Hardening
Keep mitmproxy as a development and incident tool, not as default production architecture. It stores prompts, tool definitions, and model responses in one more place. If those payloads contain internal data, customer data, or secrets, the proxy now becomes part of your data boundary.
Account for latency before you compare model performance. Even on localhost, the extra proxy hop adds overhead. If you benchmark with mitmproxy enabled and then benchmark without it, you can separate model latency from client serialization and proxy cost.
Treat transport mode as environment-specific. In this tutorial we use plain local HTTP, so there is no CA trust setup. For HTTPS targets, mitmproxy must terminate TLS and your client must trust the mitmproxy CA. That is a different security and ops posture; see About certificates in the mitmproxy documentation.
Use this capture workflow when behavior changes after enabling tools or adding memory. At that point, inspect the JSON payload size and shape first, then tune prompts or tool design. This ties back to the opening problem: debugging model output without inspecting transport data is mostly guesswork.
Verification
Compare a plain request and a tool-enabled request
Send the plain request first:
curl -s http://localhost:8080/inspect/plain \
  -H 'Content-Type: application/json' \
  -d '{"question":"Explain why large prompts increase latency."}'
Expected response shape:
{
  "mode": "plain",
  "answer": "...",
  "durationMs": 1234
}
Now send the tool-enabled request:
curl -s http://localhost:8080/inspect/tool \
  -H 'Content-Type: application/json' \
  -d '{"question":"What stack does this application use, and why does that matter for debugging?"}'
Expected response shape:
{
  "mode": "tool",
  "answer": "...",
  "durationMs": 1450
}
We are not verifying exact wording. This is an AI system. We verify that both calls return JSON, that the mode field matches the endpoint used, and that each request creates a captured flow in mitmproxy.
Inspect the wire
Open the token URL printed by mitmweb (for example http://127.0.0.1:8081/?token=...) and use this filter:
~u /v1/chat/completions
In the capture UI, you should see something like this once requests are flowing:
You should see both requests. Open the plain request first. The body will contain model, a messages array, and your system and user messages. That is the real payload Ollama receives on the OpenAI-compatible endpoint. 
Now open the tool-enabled request. The key difference is tool metadata in the JSON. Your Java endpoint stays the same while the payload grows.
If the model decides to call the tool, you will also see the multi-step exchange. First the model asks for the tool. Then your application executes it. Then the tool result goes back to the model. That round-trip cost is easy to ignore when all you look at is one neat Java method call.
Compare mitmproxy with application logs
Because log-requests and log-responses are enabled, Quarkus will also log model traffic. This is useful, but it is still not the same as the wire capture. The logs are whatever the client library decided to print. Mitmproxy shows the actual request that crossed the proxy boundary. When the two differ, trust the wire.
Conclusion
We built a Quarkus app that calls a local Ollama model through the OpenAI-compatible endpoint, routed that traffic through mitmproxy, and compared plain vs tool-enabled payloads on the wire. The practical outcome is simple: when answers get weird, you can inspect the exact JSON the model received, instead of guessing from abstractions.
Subscribe now



AI Coding Tools in 2026: How to Work With Agents Without Losing Control
Markus Eisele — Thu, 09 Apr 2026 06:08:42 GMT
If you feel overwhelmed by AI coding tools right now, that is normal.
A year ago, autocomplete felt like progress. Today, tools read repositories, edit files, run commands, pull external context, and keep iterating until they decide the task is done. This is a different operating model for software development.
You still write code. You still design systems. But now you also steer software that changes software.
That sounds efficient until it edits faster than you can review, passes local tests, and still breaks something important. I have hit that wall enough times that I no longer ask, “Which tool is best?”
The question that matters is simpler:
How much control do I keep while using it?
That is the map I use now. Not a ranking. Not a hype list. A control map.
The Real Shift Is Blast Radius
People still talk about AI coding tools as productivity tools:
Faster typing
Less boilerplate
Quicker prototypes
That breaks once the system can inspect a repository, change multiple files, run commands, and retry on failure. At that point, your problem is blast radius.
You stop reviewing lines and start reviewing behavior. You stop asking “Did it write this function correctly?” and start asking “What else did it touch, what assumptions did it make, and how confident am I in the result?”
That is a bigger shift than most teams admit.
I have had agents produce changes that looked clean, compiled cleanly, and still carried wrong assumptions into the application. The issue was not that the model was useless. The issue was scope: I let it operate wider than my review model could safely absorb.
The Ladder: IDE, CLI, Generator
This space gets easier to reason about when you reduce it to three levels:
IDE agents
CLI agents
Full app generators
This is a control ladder, not a maturity ladder.
Higher does not mean better. Higher means broader autonomy and a larger blast radius.
An IDE agent usually works close to code you are already looking at. A CLI agent can operate at repository scope and execute directly through the terminal. A full app generator abstracts more and pushes you toward “describe what you want” over “review what changed.”
The mistake I see all the time is assuming more autonomy is automatically more advanced. It is not. It is just easier to lose track of what happened.
IDE: Where I Start With IBM Bob
If I introduce AI coding into a team, I do not start with the most autonomous system I can find. I start with the most governable one.
That is why I reach for IBM Bob.
Bob is not a lightweight sidebar assistant. IBM positions it as an AI SDLC partner and coding agent, and it can read and write files, run commands, and use external tools through MCP. That puts it in the real agent category.
What makes Bob interesting to me is workflow clarity. Autonomy is more explicit.
Bob ships with built-in modes such as Ask, Plan, Code, Advanced, and Orchestrator. These are specialized personas with different capabilities and access levels. Teams can also define custom modes to constrain behavior and tool access.
Ask and Plan keep exploration non-destructive. Code and Advanced move into implementation. Orchestrator is there for broader multi-step work. This separation helps new users, but the bigger value is governance: it creates an execution contract.
In larger teams, explicit phase boundaries are often more valuable than raw autonomy because they make review, approval, and intent visible.
Bob also gives you concrete control knobs. There is .bobignore for sensitive paths and large assets, and it supports manual, auto, and hybrid approval models. I recommend leaving auto-approval disabled when traceability matters so you can approve or deny commands as they happen.
That is exactly the surface I want when an agent starts touching a real codebase.
There is also literate coding, where you write intent next to code and generate implementation in place. IBM is clear this is single-file today and still a preview feature. I am fine with that because scoped edits are a safety feature while teams build review discipline.
And this distinction matters: scoped does not mean weak. Scoped means deliberate.
I would rather start with an environment that makes intent, permissions, and blast radius explicit than one that can mutate half the tree before I have a reliable review habit.
Other IDE tools can move fast across many files too. That is real. But speed without an operating model is where teams get sloppy.
CLI: Bob Shell, Claude Code, and Repository Scope
The next step up the ladder is the CLI.
This is where the agent stops feeling like an editor assistant and starts feeling like a repository operator.
IBM Bob extends into this space with Bob Shell. Claude Code is also a clear example of this category. Claude Code is documented as a terminal tool that edits files, runs commands, and operates across your project from the command line. Bob Shell pushes Bob’s workflow into terminal-driven tasks and automation.
This is maximum leverage for people who already think in systems, commands, and boundaries. It is also where things break fastest.
The terminal removes friction. That is the appeal. You describe a task, the system searches files, changes code, runs commands, and tries to close the loop.
It feels great until it does not.
Once an agent works naturally at repository scope, your architecture map becomes the real safety mechanism. If your mental model is weak, the tool exposes that weakness quickly. It can make broad, technically plausible changes faster than you can fully reason about them.
That is why I treat CLI agents differently from IDE agents.
I use them when the task is clear, scope is understood, and I am ready to audit the result. I do not use them as a substitute for system understanding. Claude’s permission and auto-mode work is interesting because the industry is now dealing with approval fatigue and trying to find a middle ground between friction and recklessness.
So yes, CLI agents are powerful. The real story is how much repository scope you are willing to expose to autonomous change in one move.
Full App Generators: Fast Output, Hidden Architecture
At the far end of the ladder are full app generators.
Lovable and Emergent are good examples. You describe an application in natural language, and the system scaffolds frontend, backend, deployment, and often surrounding structure as well. That is real leverage for prototypes, demos, hackathons, and early product exploration.
This is also where understanding drops out of the process fastest.
“Vibe coding” became useful language for this reason. AI-assisted coding is not inherently unserious. But there is a real behavior pattern where prompting becomes the primary act of development and code understanding becomes optional. Karpathy’s phrasing and Simon Willison’s follow-up made this clear: the problem is shipping what you do not understand.
So I treat generators as sketchpads.
They are excellent for compressing idea-to-running-app time. They are much less useful when I need high confidence in architecture, security boundaries, or long-term maintainability.
Fast output is not the same thing as stable software.
The Traps I Hit
1) Reviewer Fatigue
At first, AI tools feel amazing because they move faster than you do. Then a subtle bug shows up, and you realize you are debugging output you barely internalized.
The fix is boring, but it works:
Keep scope small
Review everything until you trust the patterns
Ask for tests early
Do not treat passing output as understood output
This matters even more because industry research keeps showing that AI-generated code can include insecure or flawed patterns when review is weak.
2) The Context Tax
Using multiple tools on the same problem sounds smart. In practice, it often creates fragmented state. One tool knows about the last fix. Another does not. One session carries the right assumptions. The next session reintroduces something you already resolved.
My fix is simple: one tool per session, one operating model at a time.
3) Treating Autonomy Like Maturity
This one took longer to unlearn. The most autonomous tool in the room is not automatically the right one. Often it is the wrong one.
The right question is not “What can this agent do?” The right question is “What scope should this agent have for this task?”
That mindset shift is what has held up for me.
MCP Changes Context, Not Responsibility
One of the most important shifts in this space is MCP (Model Context Protocol).
Anthropic introduced MCP as an open standard for connecting AI tools to data sources and external systems. The ecosystem is now real enough to matter in day-to-day tool decisions. Slack has an official MCP server. Atlassian supports remote MCP workflows for Jira and Confluence. IBM Bob integrates MCP into its tool model, including terminal workflows.
MCP does not make the model correct. It gives the model fewer excuses to guess.
If the agent can pull the actual ticket, real internal docs, or real team conversation, work depends less on invented context. In enterprise settings, that matters because the gap between code and business context is where expensive mistakes happen.
But MCP is not magic. It reduces one failure class and introduces more systems responsibility. You still own permissions, tool boundaries, approvals, and review. And next to MCP, there’s also CLI tools.
Safety Is Still Not Solved
This market is still too casual about safety.
Prompt injection is real. Tool misuse is real. Approval fatigue is real. OWASP explicitly calls out prompt injection and insecure tool behavior as major risks for LLM applications, and IBM security material around Bob says the same in enterprise terms: once agents gain tool access, prompt injection, jailbreaks, and poisoned context become practical attack paths.
So my rule stays simple:
Automate only what you can explain.
If you cannot say what the agent is allowed to touch, why it is allowed to touch it, and how you will review the result, do not let it run.
That rule applies equally to Bob, Bob Shell, Claude Code, and full app generators.
Share
What Actually Works
If you are a senior engineer moving into this space, optimize for control before capability shopping.
Start in the IDE. Learn the operating model. Learn tool scope, execution behavior, approval flow, and context boundaries. That is why I like IBM Bob as a starting point for serious teams: The control surface is easier to see.
Then move up the ladder when the task really requires it:
Use the CLI when repository-level action is justified and you are ready to audit the result
Use generators when ideation speed matters more than architectural clarity
That is the map.
Not beginner to advanced
Not weak to powerful
Narrower blast radius to wider blast radius
In 2026, the winning skill is not prompting.
It is change control.
Subscribe now



Real-Time Bitcoin Analytics in Java with Quarkus
Markus Eisele — Wed, 08 Apr 2026 06:08:42 GMT
Most developers think technical indicators are a frontend problem. You fetch some prices, calculate a few averages, and draw lines on a chart. That mental model breaks the moment you try to do this in real time.
Live market data is infinite, bursty, and noisy. If you process every tick synchronously, your UI freezes. If you buffer too much, your signals lag behind reality. If you calculate indicators incorrectly under load, you don’t just get wrong charts, you get wrong trading signals.
Bollinger Bands make this problem obvious. They depend on sliding windows, statistical calculations, and consistent ordering. A single dropped or reordered event skews the bands. A single blocking call backpressures the entire pipeline.
In this tutorial, we build a real-time Bollinger Band monitor for Bitcoin that survives these realities. We ingest live trade data from Binance, process it using a sliding window pipeline, and stream clean, throttled signals to a browser dashboard. This is a stream-processing walkthrough in Java, not trading advice.
What you’ll build
By the end, you have a small Quarkus app that:
Connects to Binance over WebSocket and parses trade ticks
Debounces and window trades, then computes Bollinger Bands on the server
Serves a dark-themed dashboard with Chart.js and live updates over Server-Sent Events (SSE)
You can follow the steps in a fresh project, or open the companion bollinger-monitor sources next to this article. 
Prerequisites
Java 21 or newer (the companion pom.xml sets maven.compiler.release; align it with the JDK you run)
Apache Maven
Quarkus CLI or familiarity with mvn quarkus:dev
Project setup
Step 1: Create the Quarkus application
Let’s start with a new Quarkus app. We use the reactive REST stack, Qute for server-side templates, and WebSockets for ingestion.
quarkus create app org.acme:bollinger-monitor \
  --extensions=quarkus-rest-jackson,quarkus-rest-qute,websockets-next \
  --java=21
cd bollinger-monitor
If you open the companion project from this repo, check maven.compiler.release in the root pom.xml and make it match the JDK you run (the generated CLI project uses whatever you passed to --java=).
Step 2: Add Gatherers4j to your build
Add Gatherers4j to pom.xml:

    com.ginsberg
    gatherers4j
    0.13.0
Gatherers4j is a small library of stream gatherers—custom intermediate operations you plug into a Java Stream with .gather(...). The classic Stream API made it straightforward to define terminal behavior with Collector, but reusable intermediate steps such as sliding windows, debouncing, and throttling were not first-class; teams often reimplemented them or jumped to a separate streaming runtime. Gatherers close that gap: you keep an ordinary in-process stream (here, fed from our queue), compose operators like debounce and window, and avoid pulling in a distributed stream-processing framework. 
Implementation
Map Binance trades to a Java record
Binance trade messages are compact JSON objects. We only care about price and timestamp.
package org.acme.domain;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;

@JsonIgnoreProperties(ignoreUnknown = true)
public record TradeData(
                @JsonProperty("p") double price,
                @JsonProperty("T") long timestamp) {
}
Ignoring unknown fields protects us from API changes. If Binance adds fields tomorrow, your pipeline keeps running.
Define the signal you stream to the UI
This record is what we send to the browser: raw band values plus a simple label the UI can show.
package org.acme.domain;

public record BollingerSignal(
        double currentPrice,
        double upperBand,
        double lowerBand,
        double middleBand,
        String signal) {
}
The UI never recalculates indicators. That logic belongs on the server, where correctness is easier to test.
Extract Bollinger math for reuse and tests
The companion project moves the band and signal logic into a small static helper so BollingerService stays focused on streaming and you can unit test the formula without the queue or Gatherers.
package org.acme.service;

import java.util.List;

import org.acme.domain.BollingerSignal;
import org.acme.domain.TradeData;

public final class BollingerCalculator {

    private BollingerCalculator() {
    }

    public static BollingerSignal calculate(List window, double k) {
        double currentPrice = window.getLast().price();

        double mean = window.stream()
                .mapToDouble(TradeData::price)
                .average()
                .orElse(0.0);

        double variance = window.stream()
                .mapToDouble(t -> Math.pow(t.price() - mean, 2))
                .average()
                .orElse(0.0);

        double stdDev = Math.sqrt(variance);

        double upper = mean + (k * stdDev);
        double lower = mean - (k * stdDev);

        String status = "NORMAL";
        if (currentPrice >= upper) {
            status = "BREAKOUT_UP";
        } else if (currentPrice <= lower) {
            status = "BREAKOUT_DOWN";
        } else if (stdDev < mean * 0.0001) {
            status = "SQUEEZE";
        }

        return new BollingerSignal(currentPrice, upper, lower, mean, status);
    }
}
Signal edge case: when every price in the window is identical, the bands collapse to a single level and the last price satisfies currentPrice >= upper before the squeeze check runs, so the label is BREAKOUT_UP, not SQUEEZE. The companion tests document that ordering.
Ingest trades without blocking the socket thread
WebSocket callbacks must stay fast. Any blocking work here will drop messages.
We buffer incoming trades into a queue and process them elsewhere.
package org.acme.ingest;

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.acme.domain.TradeData;

import com.fasterxml.jackson.databind.ObjectMapper;

import io.vertx.core.http.WebSocketClient;
import io.vertx.core.http.WebSocketConnectOptions;
import io.vertx.mutiny.core.Vertx;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class BinanceClient {

    public static final BlockingQueue BUFFER = new LinkedBlockingQueue<>();

    @Inject
    Vertx vertx;

    private final ObjectMapper mapper = new ObjectMapper();
    private io.vertx.mutiny.core.http.WebSocket webSocket;

    public void connect(String uri) {
        // Check for proxy environment variables that might interfere
        String httpProxy = System.getenv("HTTP_PROXY");
        String httpsProxy = System.getenv("HTTPS_PROXY");
        if (httpProxy != null || httpsProxy != null) {
            io.quarkus.logging.Log.warn(
                    "Proxy environment variables detected - HTTP_PROXY: " + httpProxy + ", HTTPS_PROXY: " + httpsProxy);
            io.quarkus.logging.Log.warn("Using direct connection to Binance (bypassing proxy)");
        }

        WebSocketClient client = vertx.getDelegate().createWebSocketClient();

        // Parse URI to extract host, port, and path
        // Format: wss://stream.binance.com:9443/ws/btcusdt@trade
        java.net.URI parsedUri = java.net.URI.create(uri);
        String host = parsedUri.getHost();
        int port = parsedUri.getPort() != -1 ? parsedUri.getPort() : (uri.startsWith("wss://") ? 443 : 80);
        String path = parsedUri.getPath() + (parsedUri.getQuery() != null ? "?" + parsedUri.getQuery() : "");
        boolean ssl = uri.startsWith("wss://");

        // Use host/port directly to bypass proxy resolution
        WebSocketConnectOptions options = new WebSocketConnectOptions()
                .setHost(host)
                .setPort(port)
                .setURI(path)
                .setSsl(ssl);

        io.quarkus.logging.Log.info("Connecting to Binance WebSocket: " + host + ":" + port + path);

        client.connect(options)
                .onSuccess(ws -> {
                    this.webSocket = new io.vertx.mutiny.core.http.WebSocket(ws);
                    io.quarkus.logging.Log.info("Binance WebSocket connected successfully");
                    ws.textMessageHandler(message -> {
                        try {
                            TradeData data = mapper.readValue(message, TradeData.class);
                            BUFFER.offer(data);
                        } catch (Exception e) {
                            io.quarkus.logging.Log.warn("Failed to parse trade data: " + e.getMessage());
                        }
                    });
                    ws.closeHandler(v -> {
                        io.quarkus.logging.Log.warn("Binance WebSocket closed");
                    });
                })
                .onFailure(throwable -> {
                    io.quarkus.logging.Log.error("Failed to connect to Binance WebSocket", throwable);
                });
    }

    public void disconnect() {
        if (webSocket != null) {
            webSocket.close();
        }
    }
}
This queue is a pressure boundary. If downstream slows down, we drop or delay work without blocking the socket thread. Parsing the WebSocket URI into host, port, path, and SSL (instead of only setURI) helps in environments where HTTP(S) proxies would otherwise intercept wss:// connections.
Turn the queue into a windowed stream
This is the core of the system. We convert an infinite queue into a controlled, windowed stream.
package org.acme.service;

import java.time.Duration;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.Executors;

import org.acme.domain.BollingerSignal;
import org.acme.domain.TradeData;
import org.acme.ingest.BinanceClient;
import org.jspecify.annotations.NonNull;

import com.ginsberg.gatherers4j.Gatherers4j;

import io.quarkus.logging.Log;
import io.quarkus.runtime.StartupEvent;
import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.subscription.MultiEmitter;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Observes;
import jakarta.inject.Inject;

@ApplicationScoped
public class BollingerService {

    @Inject
    BinanceClient binanceClient;

    private volatile MultiEmitter currentEmitter;
    private volatile boolean processingStarted = false;

    private static final int WINDOW_SIZE = 20;
    private static final double K = 2.0;
    private static final @NonNull Duration DEBOUNCE_DURATION = Objects.requireNonNull(Duration.ofMillis(50));

    public Multi stream() {
        return Multi.createFrom().emitter(emitter -> {
            this.currentEmitter = emitter;
            Log.info("New subscriber connected to stream");
            // Start processing if not already started
            if (!processingStarted) {
                synchronized (this) {
                    if (!processingStarted) {
                        processingStarted = true;
                        Executors.newSingleThreadExecutor().submit(this::processStream);
                    }
                }
            }
        });
    }

    void onStart(@Observes StartupEvent ev) {
        connectToBinance();
    }

    private void processStream() {
        Log.info("Starting stream processing - waiting for trade data...");
        try {
            java.util.stream.Stream.generate(() -> {
                try {
                    TradeData data = BinanceClient.BUFFER.take();
                    return data;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    Log.warn("Stream processing interrupted");
                    return null;
                }
            })
                    .takeWhile(data -> data != null)
                    .gather(Gatherers4j.debounce(1, DEBOUNCE_DURATION))
                    .gather(Gatherers4j.window(WINDOW_SIZE, 1, true))
                    .map(this::calculateBollinger)
                    .forEach(signal -> {
                        MultiEmitter emitter = this.currentEmitter;
                        if (emitter != null && !emitter.isCancelled()) {
                            emitter.emit(signal);
                        }
                    });
        } catch (Exception e) {
            Log.error("Error in processing stream", e);
            MultiEmitter emitter = this.currentEmitter;
            if (emitter != null && !emitter.isCancelled()) {
                emitter.fail(e);
            }
        }
    }

    private BollingerSignal calculateBollinger(List window) {
        return BollingerCalculator.calculate(window, K);
    }

    private void connectToBinance() {
        Log.info("Attempting to connect to Binance WebSocket...");
        try {
            binanceClient.connect("wss://stream.binance.com:9443/ws/btcusdt@trade");
            // Note: connection is asynchronous, success/failure logged in BinanceClient
        } catch (Exception e) {
            Log.error("Failed to initiate Binance connection", e);
        }
    }
}
Lifecycle in the companion project: onStart only opens the Binance WebSocket. The consumer thread starts when the first client subscribes to the SSE Multi (first browser hitting /stream). Until then, trades accumulate in the buffer. That avoids a dedicated blocked thread when nobody is watching.
Multiple dashboards: the service keeps a single currentEmitter. The last subscriber wins; earlier SSE clients will not receive new signals unless you introduce a broadcast Multi or a shared processor. For a single-tab demo this is fine.
Interrupt handling: after take() is interrupted, the generator returns null; takeWhile ends the stream so null never reaches Gatherers4j.
Gatherers4j and nullness: the library uses JSpecify; some IDEs warn when passing a bare Duration.ofMillis(50) into debounce. A static final @NonNull Duration initialized with Objects.requireNonNull(Duration.ofMillis(50)) satisfies those checkers without changing runtime behavior.
Why debounce before windowing? Raw ticks arrive very fast. If we window every tick, the chart and the browser work too hard, and the bands jump on noise. A short debounce collapses bursts so the window sees a steadier stream, and the UI still feels live.
We debounce first, then window. That keeps the UI responsive and the math stable.
Serve the dashboard and an SSE stream
We serve a simple HTML page and expose an SSE stream for live updates.
package org.acme;

import org.acme.domain.BollingerSignal;
import org.acme.service.BollingerService;
import org.jboss.resteasy.reactive.RestStreamElementType;

import io.quarkus.qute.Template;
import io.quarkus.qute.TemplateInstance;
import io.smallrye.mutiny.Multi;
import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/")
public class DashboardResource {

    @Inject
    Template index;

    @Inject
    BollingerService service;

    @GET
    @Produces(MediaType.TEXT_HTML)
    public TemplateInstance get() {
        return index.data("title", "BTC Bollinger Bands");
    }

    @GET
    @Path("/stream")
    @Produces(MediaType.SERVER_SENT_EVENTS)
    @RestStreamElementType(MediaType.APPLICATION_JSON)
    public Multi stream() {
        return service.stream();
    }
}
SSE gives us ordered, one-way streaming from the server without extra WebSocket wiring in the page.
Build the Chart.js dashboard
We visualize the “tunnel” effect of Bollinger Bands. In Chart.js, pairing fill on the lower band with the upper band dataset gives a shaded band between them (see the fill values in the snippet below).
The companion repo keeps the same Chart.js and EventSource("/stream") pattern but expands index.html with extra layout and a metrics panel; the following snippet is the minimal version from the original walkthrough.
src/main/resources/templates/index.html:



    {title}
    
    




    Bitcoin (BTC/USDT) Real-Time Bollinger Bands
    INITIALIZING STREAM...
    




Configuration
This example runs with defaults. In production, you would externalize:
WebSocket endpoint URL
Window size and multiplier
Debounce duration
Those values directly affect signal sensitivity and system load.
Automated tests
The companion project replaces the default Quarkus greeting tests with:
BollingerCalculatorTest — pure unit tests for band math and signal labels (including the “flat window” edge case).
TradeDataMappingTest — Jackson deserializes a Binance-style JSON line (p as string, unknown fields ignored).
DashboardResourceTest — GET / returns HTML containing the page title and the EventSource("/stream") client.
DashboardResourceIT — runs the same HTTP checks in packaged mode when you enable integration tests (for example -DskipITs=false on verify).
Run unit tests with:
mvn test
Production hardening
Stream backpressure
The blocking queue isolates ingestion from processing. If calculations slow down, the WebSocket thread stays alive. Without this boundary, Binance would disconnect you under load.
Ordering guarantees
This pipeline assumes trade events arrive in order. Binance guarantees ordering per symbol. If you merge multiple symbols, you must reorder by timestamp before windowing.
Numerical stability
Standard deviation is recalculated per window. For large windows or high-frequency data, you would use incremental variance algorithms to reduce floating-point drift.
Common pitfalls
Unbounded buffer: LinkedBlockingQueue without a cap can grow until you run out of memory if the consumer stops or cannot keep up. In production, pick a capacity and a clear policy (drop, sample, or spill to disk).
Parse errors: the companion client logs parse failures; you should still add metrics or counters in production so you can alert on a unhealthy feed.
No reconnect: if the WebSocket drops, this sample does not reconnect. Production clients need backoff, resubscribe, and maybe snapshot + replay from REST.
null from take(): on interrupt, the companion pipeline restores the interrupt flag, returns null, and ends the stream with takeWhile so Gatherers never see a null element.
One emitter: a single currentEmitter field means only the latest SSE subscriber receives signals; use a broadcast stream if you need multiple concurrent dashboards.
Verification
Run the application:
quarkus dev
In your browser, open the dashboard at http://localhost:8080.
You should see:
A moving BTC price line
A shaded Bollinger Band tunnel
A signal badge that switches between NORMAL, SQUEEZE, and breakouts
When the price exits the band, the signal changes immediately. That confirms the windowing and streaming logic is working.
Conclusion
You now have a real-time Bollinger Band monitor that handles infinite streams, sliding windows, and live visualization without blocking the UI thread or letting the socket handler do heavy work. The important parts are controlled ingestion, explicit windowing, and server-side signal calculation.
From here, adding trade execution, alerts, or persistence is an architectural decision, not a rewrite.
If you want to go further, a natural next step is a guarded trade-execution endpoint that only runs on a breakout signal, with a short note on why that kind of safeguard matters in production.
Subscribe now



When Code Gets Cheap, Quality Becomes the Strategy
Markus Eisele — Tue, 07 Apr 2026 06:08:50 GMT
The biggest problem in agent-driven development is not code generation.
It is trust.
The tools can already produce code, tests, refactorings, documentation, and pull requests at a speed that would have looked ridiculous not long ago. That part is real. What is far less mature is everything around it: how teams review that output, how they prove it is correct, how they trace decisions back to a responsible human, and how they stop architecture from slowly dissolving under a flood of plausible machine-produced changes.
Software delivery is becoming easier to accelerate and harder to trust. That changes the conversation completely. We are no longer only talking about developer productivity. We are talking about responsibility. About whether engineering teams can still explain the systems they ship. About whether passing tests still mean what they used to mean. About whether critical software can be built in a process where output is cheap, judgment is expensive, and certainty is always slightly out of reach.
A lot of teams are learning this the hard way. Some ignore the AI slop and merge too much. Others compensate with impressive-looking test coverage wrapped around shallow engineering decisions. Many are experimenting. Many are failing. The teams seeing real success are usually not the ones moving fastest. They are the ones applying these tools with restraint, experience, and a clear sense of where not to trust them.
That is why I think agent-driven SDLC has a standards problem long before it solves its tooling problem.
We are redistributing engineering responsibility
One of the easiest mistakes to make in this discussion is to frame the whole shift as simple automation. The agent writes more code, the developer writes less code, productivity goes up. That is the surface-level version. It misses the more important change underneath.
Developers are not just writing less. They are spending more time steering, constraining, verifying, and cleaning up. In the old model, authorship and responsibility were closely linked. You wrote the code, so you were expected to understand it. In the new model, that path becomes less direct. A human starts the task, an agent explores a solution, another tool edits files, the IDE suggests changes, a review assistant comments, and a human signs off at the end.
The code still ships under human responsibility, but the relationship between producing it and understanding it is getting weaker.
That changes what it means to be good at software engineering. It also raises the cost of weak judgment. A team with poor architectural instincts does not suddenly become strong because an agent can produce more code. It just creates larger amounts of weak software more quickly. Strong teams can absolutely get leverage from these tools, but they do so because they already know what good looks like, where risk hides, and when to stop the machine from confidently going in the wrong direction.
These tools amplify judgment. They do not replace it.
The industry is mistaking activity for progress
This is where a lot of companies get into trouble.
Agent-driven workflows generate visible motion. More code. More commits. More pull requests. More generated tests. More automated fixes. More demos. More App Store updates. More experiments. More output everywhere. It looks like acceleration because everything is moving.
But visible motion and meaningful progress are not the same thing.
Teams are starting to treat the artifacts of agent-driven development as proof that the underlying engineering is sound. A large test suite gets mistaken for rigor even when it mostly validates the agent’s own assumptions. A working demo gets treated as evidence of maintainability. A huge refactoring diff feels like success because the tool completed it in minutes. A ticket-to-PR pipeline gets presented as maturity because it resembles industrial scale.
The easy parts of software delivery are the easiest parts to automate and the easiest parts to measure. That creates a dangerous illusion. You can improve the metrics that are most visible while making the system itself harder to reason about. Data boundaries get weaker. Error handling stays shallow. Edge cases remain undiscovered. Architecture accumulates local optimizations that nobody planned. A passing pipeline starts to hide a declining engineering baseline.
That pattern is not a minor quirk of early tooling. It is one of the natural failure modes of this model.
When everyone can ship faster, quality becomes the strategy
There is another pressure building here, especially for established software vendors.
When the cost of producing software drops, competition changes shape. Smaller players can suddenly launch products, features, copilots, and agentic workflows at a speed that would have been much harder to match a few years ago. From the outside, that can make the market look flooded with innovation. Every week brings more announcements, more updates, more assistants, more products that appear to do everything. For larger and more established vendors, that creates a dangerous temptation: respond to the pressure by embracing every agentic pattern at once, ship faster than feels comfortable, and try to hold ground through visible momentum alone.
That response is understandable. It is also risky.
Once code becomes cheap, quantity stops being a meaningful signal of quality. More features do not automatically mean better software. More releases do not mean stronger products. More AI-generated surface area does not mean the product underneath is easier to operate, easier to secure, easier to integrate, or easier to trust. Many markets are about to relearn that the hard way.
Software is more than just code. It is the long tail that begins after the demo works. It is whether the architecture holds together as complexity grows. Whether support teams can diagnose failures. Whether customers can rely on behavior staying consistent. Whether integrations survive change. Whether security holds up under real use. Whether a vendor can explain design decisions, fix regressions without breaking everything else, and still be there to maintain the product when the excitement of launch day is long gone.
This is also where betting on no-name products becomes more complicated than it first appears. In a market shaped by agentic development, a small team can produce something impressive very quickly. But customers are not only buying a set of generated features. They are buying a future: maintenance, accountability, resilience, product direction, support, and staying power. When those things are weak, the apparent speed advantage can turn into long-term cost for everyone involved.
That is why quality versus quantity is no longer just an engineering argument. It is becoming a strategic one. In a market full of fast-moving products, durable software will stand out less by how much it can generate and more by how well it survives contact with reality.
Zero trust becomes the default working posture
There is also a human cost to all of this.
A lot of agent-driven development ends up creating a zero-trust environment by necessity. You do not fully trust the output. You do not fully trust the tests. You do not fully trust the explanation. You do not fully trust the refactoring. You definitely do not trust that all the edge cases have been found.
So you inspect. Then you verify. Then you add rules, prompts, templates, policy files, review gates, local conventions, evaluation harnesses, and more tooling around the tooling. All of that is rational. All of that is also expensive.
The promise was reduced toil. In many teams, the toil has simply changed shape.
Instead of typing every line directly, developers become permanent supervisors of a fast, confident, and uneven collaborator. Sometimes that trade is worth it. Sometimes it is not. Sometimes the productivity gain is obvious. Sometimes it disappears into review overhead and the mental drain of never being able to fully relax.
That constant wariness matters more than many people admit. It affects concentration, ownership, onboarding, and engineering culture. It changes the emotional texture of software development. It is one thing to collaborate with a tool you trust. It is another to work beside a system that is often useful, occasionally brilliant, and always suspect.
The reliable pockets are real, but narrower than the hype suggests
This is not a pessimistic case against the whole category. There are clearly places where agent-based development already works well.
It works better when the task is bounded. It works better when correctness is visible. It works better when rollback is cheap. It works better when the surrounding architecture is already strong. It works better when an experienced engineer can quickly tell when something feels wrong.
That is why scaffolding, boilerplate reduction, repetitive migrations, documentation support, low-risk internal tooling, and some forms of test assistance can be genuinely useful. The problem starts when success in these pockets gets generalized into confidence about everything else.
That leap is where teams get hurt.
Once software starts carrying serious business criticality, regulatory weight, safety implications, or long maintenance horizons, the question changes. It is no longer enough to ask whether an agent can produce acceptable code. The more important question is what evidence exists that the system, the workflow, and the chain of decisions are trustworthy enough for the domain.
That is a much harder standard to meet.
Critical systems are where the romance ends
This is where the strategic question gets serious.
Using agents for a dashboard, an internal admin tool, or a side project is one thing. Using them in software that can influence medical devices, medication workflows, vehicles, industrial controls, or other embedded systems with real-world failure consequences is something else entirely.
In those environments, generated code is not just a productivity artifact. It becomes part of an assurance story.
Who reviewed it? Against which standard? With what traceability? Can the team explain why a decision was made? Can it show the origin of a generated change? Can it reproduce the workflow that produced it? Can it prove the tests are meaningful rather than cosmetic? Can it demonstrate that safety constraints were actually enforced and not just described in a prompt somewhere?
Those are not anti-AI questions. They are normal engineering questions in environments where failure is expensive and sometimes irreversible.
This is where current agent tooling still feels immature. It is very good at producing output. It is much less mature when it comes to producing evidence. And in critical systems, evidence is what matters.
We are rebuilding trust layers from scratch in every company
Almost every serious company experimenting with agent-driven SDLC is inventing its own local operating system for trust.
Different prompt conventions. Different repository instructions. Different policy files. Different approval flows. Different evaluation harnesses. Different logging setups. Different provenance strategies. Different rules about where autonomy is allowed and where it stops. Different expectations for what a human reviewer must verify before approving a change.
Some of this is healthy experimentation. Some of it is duplicated labor on a massive scale.
That usually means the industry has entered a pre-standards phase.
Standardization tends to matter when fragmentation starts becoming expensive. Incompatibility increases. Portability gets worse. Safety becomes harder to reason about. Teams duplicate the same work in parallel. Trust does not travel well across organizational boundaries. DIN itself was founded in 1917, and DIN describes standardization in Germany as a form of industry self-regulation. The point is not to force a historical analogy too far. The point is simpler. Ad hoc solutions work for a while. Then the cost of living without common agreements becomes too high.
Agent-driven development feels like it is moving toward that moment.
The missing standards are operational, not just technical
When people hear the word standards, they often think about protocols, file formats, or APIs. Those matter, but the more urgent gap is operational.
We still do not have widely shared norms for questions like these:
What counts as acceptable evidence for an agent-generated change?
What level of traceability should be required for generated code in regulated environments?
What must a human reviewer verify before approving an agent-produced pull request?
How should teams document architectural intent in a way that agents can use without slowly corrupting it?
What does a meaningful evaluation harness look like beyond “the tests passed”?
What levels of autonomy are acceptable in different domains?
How do you onboard junior developers into a world where they can generate implementations faster than they can judge them?
Those are not just model questions. They are software delivery questions. They cut across engineering, architecture, governance, and risk.
We already have broad AI governance frameworks. NIST’s AI Risk Management Framework and its Generative AI profile exist, and ISO/IEC 42001 defines a management system standard for AI. But those frameworks do not answer the practical SDLC question of how agent-based delivery should be reviewed, evidenced, and controlled inside real software teams. That part is still being invented ad hoc.
If the industry does not shape those norms together, vendors and individual enterprises will shape them separately. That leads to the usual outcome: fragmented practices, hard-to-transfer skills, audit pain, and a lot of expensive reinvention.
Senior engineering judgment matters more now
One of the strangest ideas in the current conversation is that agent-driven development reduces the need for deep engineering experience.
Everything I see points the other way.
When output becomes cheap, judgment becomes expensive.
The ability to notice where a design is weak, where a test is shallow, where a refactoring quietly damages a boundary, where a generated abstraction will become tomorrow’s maintenance burden, where an agent is confidently wrong, where a missing edge case can still trigger an incident, these skills matter more in an agent-driven SDLC, not less.
This is why some of the current misuse feels so predictable. If a company believes it can compensate for weak architectural thinking by adding more generation, more prompt chains, and more superficial test automation, it is not modernizing. It is scaling confusion.
The teams getting real value are usually not the most aggressive. They are the most deliberate. They know where the tools help. They know where they do not. They know that a passing suite is not the same thing as a sound system. They know that human responsibility cannot be outsourced simply because the implementation path became machine-assisted.
That is not resistance. It is engineering maturity.
The next standards battle in software will be about trust
This is the strategic point that I keep coming back to.
The companies that benefit most from agent-driven development will not be the ones generating the most code. They will be the ones building the best systems of control around it. In the next few years, the real advantage will not come from speed alone. It will come from knowing what can be trusted, what must be checked, what needs a human decision, and what should never be delegated at all.
That is the part of this shift the industry still understates. Code generation is improving fast. Confidence is not. Until we build stronger standards for traceability, review, accountability, and evidence, agent-driven SDLC will remain powerful, useful, and fundamentally unstable. The teams that understand this early will not just ship more. They will ship with fewer illusions.
The future belongs to the teams that can prove their software deserves trust, not just produce it faster.
Subscribe now



Hybrid Search in Quarkus: Full-Text and Vector Together
Markus Eisele — Mon, 06 Apr 2026 06:08:46 GMT
Most developers add search late. You ship a text box. Maybe a LIKE query. Maybe PostgreSQL full-text when the complaints get loud.
That works until the words diverge. The user types “comfortable running shoes.” The catalog says “ergonomic athletic footwear.” The rows exist. The vocabulary does not match.
What happens next? Many teams picture a big stack: a hosted vector database, a separate search cluster, a cloud embedding API, and weeks of glue. What we build instead is leaner but still concrete: Quarkus, PostgreSQL with pgvector (catalog rows and vector columns via Hibernate ORM and Panache), Hibernate Search on Elasticsearch for lexical and kNN search in the index, and Quarkus LangChain4j with a local ONNX model so embeddings never leave the process. In dev, Quarkus Dev Services typically gives you both PostgreSQL and Elasticsearch. You still run two data stores, but not a separate search platform project on top.
We connect all three search styles in one app and keep an eye on where each one breaks. Full-text search is fast and deterministic. It struggles with synonyms and paraphrases. Vector search embeds the query and asks the index for the k closest document vectors by distance in embedding space (kNN, k-nearest neighbors). You rely on that when literal term overlap is not enough. It is still weak on product codes, short jargon, and anything that only works as an exact string match. Hybrid search mixes lexical scoring with that vector signal. You pay for embedding work on every vector or hybrid query.
Why does this matter in production? If user language and catalog language do not match, results look random. The implementation can still be “correct.” Search issues hurt because they look like bad content, bad relevance, and bad UX at the same time. Users rarely say “fix the ranker.” They stop trusting the search box.
We implement full-text, vector, and hybrid as three REST endpoints in the same service so you can compare behavior without maintaining three demos. When you finish the steps, you have a working catalog search and a simple way to pick a pattern for a given query style.
Prerequisites
You need a recent Java and Quarkus setup, and you should already be comfortable reading a Panache entity, a REST resource, and basic Hibernate annotations. We are not spending time on Java installation or IDE setup. We are using Podman-friendly Dev Services, a local embedding model, and plain PostgreSQL.
Java 21 or newer
Maven 3.9.6 or newer
Podman or Docker for Dev Services
Basic understanding of JPA and REST endpoints
Basic understanding of PostgreSQL
Project Setup
Create the project or grab the working example from my Github repository:
mvn io.quarkus.platform:quarkus-maven-plugin:create \
  -DprojectGroupId=org.acme \
  -DprojectArtifactId=product-search \
  -Dextensions="hibernate-orm-panache,jdbc-postgresql,rest-jackson,quarkus-langchain4j-core,quarkus-caffeine" \
  -DnoCode
cd product-search
Add the search and vector dependencies to pom.xml:
        
            io.quarkus
            quarkus-hibernate-search-orm-elasticsearch
        
        
            org.hibernate.orm
            hibernate-vector
        
        
            dev.langchain4j
            langchain4j-embeddings-bge-small-en-q
        
hibernate-processor generates the JPA static metamodel (Product_) at compile time. Search code then uses a small ProductIndexFields class: most field names reuse the generated constants from Product_ (for example Product_.NAME), but the extra Elasticsearch sort field name_sort stays a plain string that must match @KeywordField(name = "name_sort") on Product.name. That way the REST resource does not scatter raw index paths, and renames show up when you recompile.
Add a property for the processor version (keep it aligned with Hibernate ORM in the Quarkus BOM when you upgrade the platform) and register the processor on maven-compiler-plugin:
 
    
   7.2.6.Final
 
                maven-compiler-plugin
                ${compiler-plugin.version}
                
                    true
                    
                        
                            org.hibernate.orm
                            hibernate-processor
                            ${hibernate.orm.version}
                        
                    
                
            
For automated checks, add test dependencies such as quarkus-junit and rest-assured (test scope).
What we get from each dependency:
quarkus-hibernate-orm-panache gives us the entity model and simple persistence
quarkus-jdbc-postgresql gives us PostgreSQL connectivity and Dev Services
quarkus-rest-jackson gives us JSON REST endpoints
quarkus-hibernate-search-orm-elasticsearch gives us full-text indexing and kNN against the Elasticsearch-backed Hibernate Search index
hibernate-vector maps PostgreSQL vector columns through Hibernate ORM
io.quarkiverse.langchain4j:quarkus-langchain4j-core plus dev.langchain4j:langchain4j-embeddings-bge-small-en-q integrate LangChain4j and ship a small quantized ONNX embedding model that runs in process without remote API calls
quarkus-caffeine integrates the Caffeine in-memory cache library for CDI and configuration
hibernate-processor (provided) runs at compile time and generates the JPA static metamodel (Entity_ classes) for type-safe queries and tooling
Elasticsearch handles lexical and vector queries in the Hibernate Search layer. PostgreSQL still holds the canonical vector column for ORM persistence. Those two engines cooperate in one application.
Implementation
Put configuration first: everything that follows assumes PostgreSQL, Elasticsearch, and the in-process embedding model are wired the same in dev, test, and whatever you deploy to.
PostgreSQL only understands vector columns after the pgvector extension is installed. Dev Services runs an init script as soon as the container starts, before Hibernate ORM applies schema management, so the type exists when DDL refers to it. If you skip that ordering, table creation fails with an unknown type, not a mysterious Hibernate bug.
Hibernate Search talks to Elasticsearch over HTTP. You pin the Elasticsearch major version in configuration so the client and the index schema Hibernate Search generates match the server (here, the Elasticsearch instance Dev Services starts in dev and test). For embeddings we stay on the JVM: a packaged ONNX model runs in process, and you point application.properties at the LangChain4j EmbeddingModel implementation class so Quarkus can construct the bean the same way it would any other injectable type.
Create src/main/resources/vector-init.sql (on the classpath under src/main/resources, so init-script-path resolves it by name):
CREATE EXTENSION IF NOT EXISTS vector;
# PostgreSQL with pgvector (entity storage for vectors; kNN is served by Hibernate Search backend)
quarkus.datasource.db-kind=postgresql
quarkus.datasource.devservices.image-name=docker.io/pgvector/pgvector:pg18
quarkus.datasource.devservices.init-script-path=vector-init.sql

# Hibernate ORM
quarkus.hibernate-orm.schema-management.strategy=drop-and-create
quarkus.hibernate-orm.log.sql=false

# Hibernate Search: Elasticsearch Dev Services in dev/test (Quarkus does not ship a Lucene ORM extension)
quarkus.hibernate-search-orm.elasticsearch.version=9
quarkus.hibernate-search-orm.schema-management.strategy=drop-and-create-and-drop
quarkus.hibernate-search-orm.indexing.plan.synchronization.strategy=sync

# Local embedding model (in-process ONNX via LangChain4j)
quarkus.langchain4j.embedding-model.provider=dev.langchain4j.model.embedding.onnx.bgesmallenq.BgeSmallEnQuantizedEmbeddingModel
Together, drop-and-create on Hibernate ORM and drop-and-create-and-drop on Hibernate Search tear down and recreate PostgreSQL tables and the Elasticsearch-backed index whenever the app starts. That makes local runs repeatable and saves you from half-stale mappings while you edit entities. It also throws away data on every restart, which is wrong for a real catalog. For production, move PostgreSQL changes through migration tooling and switch Hibernate Search to something non-destructive for routine deploys, for example create-or-validate, unless you deliberately accept wiping the index on startup.
Define the entity next. Lexical fields, keyword filters, and the embedding vector all live on the same Product type.
Create src/main/java/org/acme/search/model/Product.java:
package org.acme.search.model;

import org.hibernate.annotations.Array;
import org.hibernate.annotations.JdbcTypeCode;
import org.hibernate.search.engine.backend.types.Sortable;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.KeywordField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.VectorField;
import org.hibernate.type.SqlTypes;

import com.fasterxml.jackson.annotation.JsonIgnore;

import io.quarkus.hibernate.orm.panache.PanacheEntity;
import jakarta.persistence.Column;
import jakarta.persistence.Entity;

@Entity
@Indexed
public class Product extends PanacheEntity {

    @FullTextField(analyzer = "english")
    @KeywordField(name = "name_sort", sortable = Sortable.YES, normalizer = "lowercase")
    public String name;

    @FullTextField(analyzer = "english")
    @Column(columnDefinition = "text")
    public String description;

    @KeywordField
    public String category;

    @JsonIgnore
    @VectorField(dimension = 384)
    @JdbcTypeCode(SqlTypes.VECTOR)
    @Array(length = 384)
    public float[] descriptionEmbedding;

    public Product() {
    }

    public Product(String name, String description, String category) {
        this.name = name;
        this.description = description;
        this.category = category;
    }
}
Product carries three search behaviors at once. name and description go to Elasticsearch for full-text. category is a keyword field for exact filters. descriptionEmbedding is both a PostgreSQL vector(384) column and an Elasticsearch vector field for kNN. @JsonIgnore keeps big float arrays out of JSON (the verification curl examples show descriptionEmbedding as null or omit the field).
One hard rule: vector dimension must match the embedding model. This stack uses bge-small-en-q, which outputs 384 dimensions. If you swap models and the size changes, schema and index mapping must change too. 
Add a dedicated service for embeddings on the write path. Do not hide that inside the JAX-RS resource: imports, admin tasks, and tests also create rows, and one place for embed → persist keeps behavior obvious.
Create src/main/java/org/acme/search/service/ProductService.java:
package org.acme.search.service;

import org.acme.search.model.Product;

import dev.langchain4j.model.embedding.EmbeddingModel;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import jakarta.transaction.Transactional;

@ApplicationScoped
public class ProductService {

    @Inject
    EmbeddingModel embeddingModel;

    @Transactional
    public void createProduct(String name, String description, String category) {
        Product product = new Product(name, description, category);
        product.descriptionEmbedding = embeddingModel
                .embed(description)
                .content()
                .vector();
        product.persist();
    }
}
On each save we embed the description once and store the vector on the row before anyone searches. Reads stay cheap; writes do more work. For a catalog that pattern is normal: far more searches than inserts.
Query paths should not pay full embedding cost on every identical string. Add a small cache that wraps EmbeddingModel and returns copies of the float[] so callers cannot mutate vectors sitting in the cache.
Create src/main/java/org/acme/search/service/QueryEmbeddingService.java:
package org.acme.search.service;

import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

import dev.langchain4j.model.embedding.EmbeddingModel;
import jakarta.annotation.PostConstruct;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class QueryEmbeddingService {

    @Inject
    EmbeddingModel embeddingModel;

    private Cache cache;

    @PostConstruct
    void init() {
        cache = Caffeine.newBuilder()
                .maximumSize(1000)
                .expireAfterWrite(30, TimeUnit.MINUTES)
                .build();
    }

    public float[] embed(String query) {
        float[] stored = cache.get(query, key -> {
            float[] vector = embeddingModel.embed(key).content().vector();
            return Arrays.copyOf(vector, vector.length);
        });
        return Arrays.copyOf(stored, stored.length);
    }
}
Configure how Elasticsearch tokenizes catalog text. Normalization folds text into comparable tokens (ASCII folding and lowercasing so Café and cafe are not different keys). Stemming trims suffixes so related forms share one stem (Porter in the snippet below, so running and run can hit the same postings). Without that chain, full-text is only slightly better than LIKE.
Create src/main/java/org/acme/search/config/SearchAnalysisConfig.java using the Quarkus qualifier io.quarkus.hibernate.search.orm.elasticsearch.SearchExtension:
package org.acme.search.config;

import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurationContext;
import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurer;

import io.quarkus.hibernate.search.orm.elasticsearch.SearchExtension;

@SearchExtension
public class SearchAnalysisConfig implements ElasticsearchAnalysisConfigurer {

    @Override
    public void configure(ElasticsearchAnalysisConfigurationContext context) {
        context.analyzer("english").custom()
                .tokenizer("standard")
                .tokenFilters("asciifolding", "lowercase", "porter_stem");

        context.normalizer("lowercase").custom()
                .tokenFilters("asciifolding", "lowercase");
    }
}
That setup lowercases, normalizes, and runs Porter stemming. So shoes can match shoe and running can match run. It still does not know that footwear and shoes mean the same thing in the world. That is why the entity also keeps vectors.
SearchResource exposes /search/fulltext, /search/vector, and /search/hybrid next to each other and injects QueryEmbeddingService for the two vector paths.
Startup and mass indexing: do not mark the StartupEvent observer @Transactional if it ends with massIndexer().startAndWait(). When the whole observer runs in one transaction, seed inserts are not yet committed, so the mass indexer can see zero entities and build an empty index. Either drop @Transactional on the observer (each createProduct still runs in its own transaction) or reindex after commit.
Create src/main/java/org/acme/search/model/ProductIndexFields.java:
package org.acme.search.model;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.KeywordField;

/**
 * Hibernate Search index field paths for {@link Product}. Property-backed names
 * are delegated to
 * string constants generated on {@link Product_} (Hibernate processor);
 * {@link #NAME_SORT} must
 * stay aligned with {@link KeywordField#name()} on {@link Product#name}.
 */
public final class ProductIndexFields {

    private ProductIndexFields() {
    }

    public static final String NAME = Product_.NAME;
    public static final String DESCRIPTION = Product_.DESCRIPTION;
    public static final String CATEGORY = Product_.CATEGORY;
    public static final String DESCRIPTION_EMBEDDING = Product_.DESCRIPTION_EMBEDDING;

    public static final String NAME_SORT = "name_sort";
}
Create src/main/java/org/acme/search/SearchResource.java:
package org.acme.search;

import java.util.List;

import org.acme.search.model.Product;
import org.acme.search.model.ProductIndexFields;
import org.acme.search.service.ProductService;
import org.acme.search.service.QueryEmbeddingService;
import org.hibernate.search.mapper.orm.mapping.SearchMapping;
import org.hibernate.search.mapper.orm.session.SearchSession;
import org.jboss.resteasy.reactive.RestQuery;

import io.quarkus.runtime.StartupEvent;
import jakarta.enterprise.event.Observes;
import jakarta.inject.Inject;
import jakarta.transaction.Transactional;
import jakarta.ws.rs.DefaultValue;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/search")
@Produces(MediaType.APPLICATION_JSON)
public class SearchResource {

        @Inject
        SearchSession searchSession;

        @Inject
        SearchMapping searchMapping;

        @Inject
        QueryEmbeddingService queryEmbeddingService;

        @Inject
        ProductService productService;

        @GET
        @Path("/fulltext")
        @Transactional
        public List fulltext(@RestQuery String q, @RestQuery @DefaultValue("10") int size) {
                return searchSession.search(Product.class)
                                .where(f -> q == null || q.isBlank()
                                                ? f.matchAll()
                                                : f.simpleQueryString()
                                                                .fields(ProductIndexFields.NAME,
                                                                                ProductIndexFields.DESCRIPTION)
                                                                .matching(q))
                                .sort(f -> f.field(ProductIndexFields.NAME_SORT).asc())
                                .fetchHits(size);
        }

        @GET
        @Path("/vector")
        @Transactional
        public List vector(@RestQuery String q, @RestQuery @DefaultValue("5") int k) {
                if (q == null || q.isBlank()) {
                        return List.of();
                }

                float[] queryVector = queryEmbeddingService.embed(q);

                return searchSession.search(Product.class)
                                .where(f -> f.knn(k).field(ProductIndexFields.DESCRIPTION_EMBEDDING)
                                                .matching(queryVector))
                                .fetchHits(k);
        }

        @GET
        @Path("/hybrid")
        @Transactional
        public List hybrid(@RestQuery String q,
                        @RestQuery @DefaultValue("10") int size,
                        @RestQuery @DefaultValue("5") int k) {
                if (q == null || q.isBlank()) {
                        return List.of();
                }

                float[] queryVector = queryEmbeddingService.embed(q);

                return searchSession.search(Product.class)
                                .where(f -> f.bool()
                                                .should(f.simpleQueryString()
                                                                .fields(ProductIndexFields.NAME,
                                                                                ProductIndexFields.DESCRIPTION)
                                                                .matching(q))
                                                .should(f.knn(k).field(ProductIndexFields.DESCRIPTION_EMBEDDING)
                                                                .matching(queryVector)))
                                .fetchHits(size);
        }

        @GET
        @Path("/hybrid/filtered")
        @Transactional
        public List hybridFiltered(@RestQuery String q,
                        @RestQuery String category,
                        @RestQuery @DefaultValue("10") int size,
                        @RestQuery @DefaultValue("5") int k) {
                if (q == null || q.isBlank() || category == null || category.isBlank()) {
                        return List.of();
                }

                float[] queryVector = queryEmbeddingService.embed(q);

                return searchSession.search(Product.class)
                                .where(f -> f.bool()
                                                .must(f.match().field(ProductIndexFields.CATEGORY).matching(category))
                                                .should(f.simpleQueryString()
                                                                .fields(ProductIndexFields.NAME,
                                                                                ProductIndexFields.DESCRIPTION)
                                                                .matching(q))
                                                .should(f.knn(k).field(ProductIndexFields.DESCRIPTION_EMBEDDING)
                                                                .matching(queryVector)))
                                .fetchHits(size);
        }

        void onStart(@Observes StartupEvent event) throws InterruptedException {
                if (Product.count() == 0) {
                        seedProducts();
                }

                searchMapping.scope(Product.class)
                                .massIndexer()
                                .startAndWait();
        }

        private void seedProducts() {
                productService.createProduct(
                                "Trail Running Shoe",
                                "Lightweight athletic footwear designed for off-road running on dirt and gravel. Aggressive grip, breathable mesh upper, cushioned midsole.",
                                "footwear");
                productService.createProduct(
                                "Leather Oxford",
                                "Classic formal shoe in full-grain leather. Brogue detailing, leather sole, Goodyear welt construction.",
                                "footwear");
                productService.createProduct(
                                "Waterproof Hiking Boot",
                                "Ankle-height boot with waterproof membrane, vibram outsole, and padded collar. Built for multi-day trekking.",
                                "footwear");
                productService.createProduct(
                                "Canvas Sneaker",
                                "Casual low-top sneaker in cotton canvas. Rubber vulcanized sole, available in twelve colors.",
                                "footwear");

                productService.createProduct(
                                "Noise-Cancelling Headphones",
                                "Over-ear headphones with active noise cancellation, 30-hour battery life, and foldable design for travel.",
                                "electronics");
                productService.createProduct(
                                "Mechanical Keyboard",
                                "Tenkeyless keyboard with Cherry MX Brown switches. PBT keycaps, USB-C detachable cable, per-key RGB lighting.",
                                "electronics");
                productService.createProduct(
                                "Portable Charger",
                                "20,000 mAh power bank with 65W USB-C Power Delivery. Charges a laptop from 0 to 80 percent in under an hour.",
                                "electronics");

                productService.createProduct(
                                "Ultralight Backpack",
                                "35-litre hiking pack weighing 680 grams. Frameless design, roll-top closure, hipbelt with small pockets.",
                                "outdoor");
                productService.createProduct(
                                "Sleeping Bag",
                                "Down-filled mummy bag rated to minus ten Celsius. 850-fill power, YKK zip, water-resistant outer shell.",
                                "outdoor");
                productService.createProduct(
                                "Trekking Poles",
                                "Aluminium collapsible poles with cork grips and carbide tips. Folds to 38 cm for pack attachment.",
                                "outdoor");

                productService.createProduct(
                                "Cast Iron Skillet",
                                "Pre-seasoned 12-inch cast iron pan. Suitable for induction, gas, electric, and open fire. Oven-safe to 260 Celsius.",
                                "kitchen");
                productService.createProduct(
                                "Pour-Over Coffee Dripper",
                                "Ceramic cone dripper for manual filter coffee. Compatible with Melitta No.4 filters. Sits directly on a mug or carafe.",
                                "kitchen");
                productService.createProduct(
                                "Chef's Knife",
                                "8-inch high-carbon stainless steel knife. Full tang, triple-riveted handle, 58 HRC hardness. Suitable for chopping, slicing, and dicing.",
                                "kitchen");
        }
}
On hybrid endpoints, size controls how many hits Hibernate Search returns after the bool query, while k (query parameter, default 5) controls the kNN neighbor count inside the vector should clause, with the same default as /search/vector. You can override k per request (for example .../hybrid?q=...&size=10&k=8) when you want more vector candidates without changing the final hit count.
/search/fulltext is classic lexical search: tokenize q, match name and description, score by term relevance. It is easy to reason about when user words and catalog words overlap. If q is empty, the handler returns up to size rows sorted by ProductIndexFields.NAME_SORT (name_sort in the index) using matchAll(), which is handy for smoke tests.
Mass indexing uses searchMapping.scope(Product.class) so only the Product index is rebuilt; scope(Object.class) would index every mapped @Indexed type and is easy to misuse as the model grows.
Each /search/vector call embeds q and Hibernate Search runs kNN on the vectors in Elasticsearch. That is why camping+gear can return Sleeping Bag or Trekking Poles even when that phrase is missing from the stored text. Each request pays for inference, and short jargon or SKUs can still lose to a strong keyword hit.
/search/hybrid keeps full-text and kNN in the same bool query as two should clauses, so keyword strength and embedding neighbors influence one ranked list. You are not forced to bet the whole product on BM25-only or vector-only. They fail in different corners. Combining them is usually what a catalog search needs, even if the blend is messier to balance.
The seed list is written so shopper wording and product copy rarely use the same tokens for the same SKU. The verification curls below should not return the same ordering for every query across the three modes, which is the reason to keep all endpoints in one small service.
Configuration
The application.properties from Implementation wipes PostgreSQL and the Elasticsearch-backed index whenever the process starts. This section contrasts that with settings where a real catalog keeps data across restarts. It also covers how Elasticsearch scales vector search, optional @VectorField graph attributes, and a sample production-style property list.
You already store descriptionEmbedding with @VectorField(dimension = 384), and /search/vector and /search/hybrid call kNN through Hibernate Search on Elasticsearch. With only the seed rows it can still feel like the engine compares the query vector to every stored vector. When the catalog grows, that gets too slow, so Elasticsearch keeps an approximate nearest-neighbor structure over the document vectors instead of scanning everything on each query. Docs usually call that graph style HNSW (Hierarchical Navigable Small World): links between vectors so search skips most points and returns neighbors fast, sometimes missing the single closest point. Hibernate Search can map some graph-related attributes on @VectorField when Elasticsearch supports them.
Product still maps @VectorField with dimension only. When your stack exposes them, you can add attributes such as m and efConstruction (verify names and support for your Hibernate Search and Elasticsearch releases):
@VectorField(
        dimension = 384,
        m = 24,
        efConstruction = 200
)
@JdbcTypeCode(SqlTypes.VECTOR)
@Array(length = 384)
public float[] descriptionEmbedding;
m and efConstruction matter where the backend builds an HNSW graph. On PostgreSQL they matter when you define an HNSW index in SQL over pgvector. Here, /search/vector and /search/hybrid resolve kNN in Elasticsearch through Hibernate Search, not through PostgreSQL’s vector operators, so the Java snippet is optional extra settings on the Elasticsearch side, not something you need for the earlier steps.
quarkus.datasource.db-kind=postgresql
quarkus.datasource.devservices.image-name=docker.io/ankane/pgvector:latest
quarkus.datasource.devservices.init-script-path=vector-init.sql

quarkus.hibernate-orm.schema-management.strategy=validate
quarkus.hibernate-orm.log.sql=false

quarkus.hibernate-search-orm.elasticsearch.version=9
quarkus.hibernate-search-orm.schema-management.strategy=create-or-validate
quarkus.hibernate-search-orm.indexing.plan.synchronization.strategy=sync

quarkus.langchain4j.embedding-model.provider=dev.langchain4j.model.embedding.onnx.bgesmallenq.BgeSmallEnQuantizedEmbeddingModel
validate for ORM and create-or-validate for Hibernate Search mean a normal restart does not drop PostgreSQL tables or throw away the Elasticsearch index. The first property block rebuilt schema and index on every boot so you could iterate from a clean slate; when the catalog must persist, you move toward values like these.
PostgreSQL hnsw.ef_search: if you run kNN directly in PostgreSQL (native SQL over pgvector), you can adjust recall with SET LOCAL hnsw.ef_search = 120 on the JDBC connection before the query. /search/vector and /search/hybrid do not use that path: Hibernate Search sends vector predicates to Elasticsearch, so that PostgreSQL session setting does nothing for those endpoints. Configure kNN where the queries actually run (here, Elasticsearch), or change the architecture if you want kNN inside the database.
Production Hardening
What happens under load
Vector and hybrid queries hit the database and, on each request, run query-time embedding with the local ONNX model. If the search box turns into a high-volume typeahead endpoint, that work adds up.
The QueryEmbeddingService from Implementation caches query strings so identical text does not re-run the ONNX model. It does not fix rare phrasing or index work, but real search traffic repeats enough that the cache often helps a lot.
If caching is not enough, you handle search like any other hot read path: rate limits, async fan-out, or a dedicated embedding service. You can leave that out while you experiment; live traffic usually cannot.
Concurrency and correctness guarantees
Ranking scores can be fuzzy; access rules cannot. Category filters, tenants, visibility, and soft deletes need hard edges. Teams often over-focus on hybrid relevance and forget that category is a @KeywordField, and the filtered hybrid route puts ProductIndexFields.CATEGORY in a must clause so the filter is strict while the should clauses handle score. Add tenant IDs or publication flags the same way. Semantic similarity should not decide whether a row is allowed to show at all.
Operational failure modes
First boot downloads the ONNX model and builds vectors for the seed rows. That is acceptable on a laptop. In production, slow startup because of model download, vector generation, and index build makes deploys hard to reason about.
Ship the model with the app or bake it into the image. Compute document vectors on ingest, not on the first customer query. Plan a reindex when the embedding model changes. Search follows the same rule as the rest of the system: keep heavy one-time work off the hot request path.
Security considerations
Search endpoints are easy to abuse because they look harmless. A single long natural language query that triggers embedding inference and kNN (k-nearest neighbors) lookup is more expensive than a normal keyword query. A flood of those requests becomes a resource exhaustion problem.
Put reasonable limits on query length. Add rate limiting if the endpoint is public. Log slow queries. Don’t feed raw user input into custom query syntax unless you understand exactly how that parser behaves. simpleQueryString() is a good default because it is intentionally safer than more permissive query parsers. You still need input length checks and abuse controls.
Verification
Start the application:
./mvnw quarkus:dev
On first startup, Dev Services pulls the PostgreSQL (pgvector) image and an Elasticsearch image. Expect a few minutes the first time: image pulls, ONNX model download, indexing. Hibernate ORM creates the schema (after vector-init.sql enables the extension), the local embedding model loads, seed data is inserted, embeddings are generated, and Hibernate Search builds or refreshes its index.
Check all three search modes.
Query one: lexical match
curl "http://localhost:8080/search/fulltext?q=shoes"
Expected behavior: you get footwear products whose indexed fields contain shoe or stemmed variants.
Typical result shape:
[
  {
    "id": 2,
    "name": "Leather Oxford",
    "description": "Classic formal shoe in full-grain leather. Brogue detailing, leather sole, Goodyear welt construction.",
    "category": "footwear"
  },
  {
    "id": 1,
    "name": "Trail Running Shoe",
    "description": "Lightweight athletic footwear designed for off-road running on dirt and gravel. Aggressive grip, breathable mesh upper, cushioned midsole.",
    "category": "footwear"
  }
]
You should see stemming and analysis at work. Full-text works best when query words and catalog words overlap.
Query two: semantic language
curl "http://localhost:8080/search/vector?q=comfortable+footwear+for+long+walks"
Expected behavior: you get hiking- and walking-related footwear even when those exact words are missing from the descriptions.
Typical result shape:
[
  {
    "id": 2,
    "name": "Leather Oxford",
    "description": "Classic formal shoe in full-grain leather. Brogue detailing, leather sole, Goodyear welt construction.",
    "category": "footwear"
  },
  {
    "id": 1,
    "name": "Trail Running Shoe",
    "description": "Lightweight athletic footwear designed for off-road running on dirt and gravel. Aggressive grip, breathable mesh upper, cushioned midsole.",
    "category": "footwear"
  },
  {
    "id": 5,
    "name": "Noise-Cancelling Headphones",
    "description": "Over-ear headphones with active noise cancellation, 30-hour battery life, and foldable design for travel.",
    "category": "electronics"
  }
]
You should see meaning, not literal string overlap, drive the ranking.
Query three: exact technical term
curl "http://localhost:8080/search/hybrid?q=MX+Brown"
Expected behavior: Mechanical Keyboard appears at or near the top because the lexical match is strong and the hybrid query preserves that signal. You can append &k=… to change the kNN neighbor count (default 5); size still caps how many hits are returned.
Typical result shape:
[
  {
    "id": 6,
    "name": "Mechanical Keyboard",
    "description": "Tenkeyless keyboard with Cherry MX Brown switches. PBT keycaps, USB-C detachable cable, per-key RGB lighting.",
    "category": "electronics"
  },
  {
    "id": 2,
    "name": "Leather Oxford",
    "description": "Classic formal shoe in full-grain leather. Brogue detailing, leather sole, Goodyear welt construction.",
    "category": "footwear"
  }
]
That case is where vector-only setups often miss: jargon and exact product tokens still need the lexical side.
Query four: concept with no lexical overlap
curl "http://localhost:8080/search/vector?q=camping+gear"
Expected behavior: outdoor products such as Sleeping Bag, Ultralight Backpack, and Trekking Poles appear even though the phrase camping gear does not exist in the stored content.
That request shows the biggest gap between vector recall and BM25-style full-text.
Filtered hybrid search
curl "http://localhost:8080/search/hybrid/filtered?q=lightweight&category=outdoor"
Expected behavior: only outdoor products are considered, and within that set the most relevant ones rank highest.
The point is the filter: category=outdoor is strict; ranking only runs inside that slice.
Automated Testing
The curl commands from the verification section are useful when you write the code. They are not enough once you change mappings, switch embedding models, or tune hybrid queries. Search breaks in subtle ways. The endpoint still returns 200, but the wrong product moves to the top, the category filter stops being strict, or an empty query suddenly triggers expensive work.
For this kind of system, the safest test strategy is layered. Keep a few lightweight integration tests that hit the real HTTP endpoints. Then make the assertions focus on behavior that should remain true even when scores and exact ordering move a little.
Add the test dependencies in pom.xml if they are not there already:

    io.quarkus
    quarkus-junit5
    test


    io.rest-assured
    rest-assured
    test
Now create src/test/java/org/acme/search/SearchResourceTest.java:
package org.acme.search;

import io.quarkus.test.junit.QuarkusTest;
import org.junit.jupiter.api.Test;

import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.greaterThanOrEqualTo;
import static org.hamcrest.Matchers.hasSize;

@QuarkusTest
class SearchResourceTest {

    @Test
    void fulltextFindsShoeStemming() {
        given()
                .when().get("/search/fulltext?q=shoes")
                .then()
                .statusCode(200)
                .body("$", hasSize(greaterThanOrEqualTo(1)));
    }

    @Test
    void vectorFindsSemanticFootwearQuery() {
        given()
                .when().get("/search/vector?q=comfortable+footwear+for+long+walks")
                .then()
                .statusCode(200)
                .body("$", hasSize(greaterThanOrEqualTo(1)));
    }

    @Test
    void hybridFindsMxBrownKeyboard() {
        given()
                .when().get("/search/hybrid?q=MX+Brown")
                .then()
                .statusCode(200)
                .body("$", hasSize(greaterThanOrEqualTo(1)));
    }

    @Test
    void hybridFilteredRestrictsCategory() {
        given()
                .when().get("/search/hybrid/filtered?q=lightweight&category=outdoor")
                .then()
                .statusCode(200)
                .body("$", hasSize(greaterThanOrEqualTo(1)));
    }
}
Run the tests with:
./mvnw test
The fulltextFindsShoeStemming() test checks lexical behavior instead of only response size. We do not require one exact order because analyzers and seed data can shift that a bit, but we do require that at least one shoe-related product is present.
The vector tests need a different strategy. Semantic search is not deterministic in the same way exact keyword search is. You should not assert the full result list or a fragile score order. What you can assert is that clearly relevant products appear in the hit set for a meaning-based query. That is why vectorFindsSemanticFootwearQuery() and campingGearSemanticQueryFindsOutdoorProducts() check for expected relevance without pretending the ranking is mathematically fixed.
The hybrid test is stricter on purpose. MX Brown is an exact technical term in the catalog. This is the kind of case where lexical strength should dominate. If Mechanical Keyboard drops from the top result after a refactor, that is worth catching.
The filtered tests are even more important than ranking checks. Search relevance can be fuzzy. Filters cannot. If category=outdoor allows electronics products to leak into the result, the feature is wrong even if the scores look plausible. This is exactly the kind of bug that slips through when teams only test happy-path search quality.
A useful next step is to widen testing beyond endpoint smoke checks and treat search quality as something you verify from several angles. Keep the HTTP integration tests from this tutorial for end-to-end behavior, but add small unit tests for helper classes such as the query embedding cache, plus a focused relevance regression suite built on a fixed seed dataset where a handful of important queries must keep returning sensible results over time. In larger systems, teams often complement that with offline evaluation sets, performance checks for hot queries, and security-style tests that prove filters like category, tenant, or visibility never leak data across boundaries. That broader approach reflects how search behaves in production: part API contract, part relevance system, and part access-control surface.
Conclusion
You end up with one service, two data stores in dev (PostgreSQL and Elasticsearch), local embeddings, and Hibernate Search handling both lexical rank and kNN in the index. The useful part is not the three URLs. It is knowing which mode loses on jargon, which on vocabulary mismatch, and where filters must stay exact instead of inside the fuzzy score.
Subscribe now



Stop Copying AI Skills: Version IBM Bob Instructions with Maven
Markus Eisele — Sun, 05 Apr 2026 06:08:35 GMT
Copy-pasting AI skill files feels harmless when you have one project. You drop a SKILL.md into .bob/skills, IBM Bob starts behaving like it understands Quarkus, and you move on. The trouble shows up later: the same skill in five repositories, each with slightly different instructions, commands, and assumptions about your stack. You only notice when two checkouts disagree during the same review.
Most teams file this under documentation. In practice it behaves like dependency management, and you stop treating it like documentation the day Bob (or any other IDE/Shell combination you are using) starts generating patches you actually merge. Once agent behavior matters for daily work, those instructions sit in your build and delivery path. If they drift, your agent drifts. One checkout gets jakarta.ws.rs right, another keeps old patterns, a third nudges the model toward the wrong extension.
This gets worse on teams that ship to production because agent instructions are not neutral. They push which commands run, which files change, and which defaults stick. A stale skill gives you ugly code. It can also teach the assistant the wrong native build command, the wrong REST stack, or the wrong packaging convention. Past “Bob is a bit off” you get wasted review time, a messy delivery flow, and expensive tokens for bad answers.
Java developers already know what to do with reusable stuff: package it, version it, ship it like any other JAR, pull it in with Maven. SkillsJars does the same for agent skills. You write framework-specific SKILL.md files once, pack them into a JAR, install or publish the artifact where your builds can see it, and extract into the folder IBM Bob reads when someone opens the project. Same muscle memory as any internal library, only the payload is Markdown.
Next we run the full loop with local mvn install (Maven Central optional): a quarkus-dev-skills JAR at 1.0.0-SNAPSHOT, three Quarkus skills inside, and a consumer app under shipment-service/ that pulls that JAR. Bob gets the same guidance everywhere without copy-paste. If you get lost, the quarkus-dev-skills/ tree in the repo is the ground truth for these steps.
Prerequisites
You should be fine with Maven, a normal Quarkus project layout, and Markdown written for an agent. You also need:
Java 25 installed
Maven 3.9+ or the Maven Wrapper available
IBM Bob installed in your IDE
Network access to resolve the SkillsJars Maven plugin (com.skillsjars:maven-plugin) from the public plugin repositories
Basic understanding of Maven pom.xml files
Project Setup
Start from a plain Maven project for the skills artifact. It is not a Quarkus app: its only job is to ship reusable skill files. Deleting src/main/java in the next step still feels wrong the first time; for this artifact, empty Java trees are normal.
Create the project or start from my Github repository:
mvn archetype:generate \
  -DgroupId=com.example.skills \
  -DartifactId=quarkus-dev-skills \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DinteractiveMode=false
Now move into the project and remove the Java source folders because this artifact ships Markdown-based skills, not Java classes:
cd quarkus-dev-skills
rm -rf src
mkdir -p skills/quarkus-scaffolding
mkdir -p skills/quarkus-extensions
mkdir -p skills/quarkus-native
The structure should now look like this (this repository also keeps the demo consumer next to the packaging project):
quarkus-dev-skills/
├── article.md
├── pom.xml
├── skills/
│   ├── quarkus-scaffolding/
│   │   └── SKILL.md
│   ├── quarkus-extensions/
│   │   └── SKILL.md
│   └── quarkus-native/
│       └── SKILL.md
└── shipment-service/
    ├── AGENTS.md
    ├── pom.xml
    └── src/
The SkillsJars Maven plugin scans the top-level skills/ directory and treats each immediate child folder as one skill. Get the directory names right once; everything downstream reads from there.
Implementation
Below you will find very incomplete examples. There is ongoing Quarkus work around shared coding-agent guidance in pull request quarkusio/quarkus#53038, which adds an initial structure for reusable coding rules and explicitly references AGENTS.md as the emerging open format for agent instructions. So keep an eye out for more coming from the team in the future. The broader format and conventions are documented at agents.md. 
Security warning: Agent Skills should be treated like executable guidance, not harmless documentation. They can run commands, read files, and change code in ways you did not expect. SkillsJars says that they do a basic security scan before publication, but that is only a baseline check. It does not replace a proper security review of the skills before your team uses them.
Writing the scaffolding skill
The first skill pins how Bob creates Quarkus resources and related classes: endpoint, service, maybe an entity, with predictable packages and imports.
Create skills/quarkus-scaffolding/SKILL.md:
---
name: quarkus-scaffolding
description: >
  Example playbook for new REST resources, CDI services, Panache entities,
  and repositories in a Quarkus 3 app. Load only when scaffolding those
  pieces — not a substitute for project-wide AGENTS.md or rules.
allowed-tools: Bash Read Edit
license: Apache-2.0
---

# Quarkus scaffolding (examples only)

**Progressive skill:** use when adding endpoints or persistence types. Repo-wide conventions belong in always-on guidance ([`AGENTS.md`](https://agents.md/), `.cursor/rules`, etc.); specialized steps live in skills so they are not stuffed into every prompt. Quarkus is converging on that split — markdown rules plus `.agents/skills/` — see [quarkus#53038](https://github.com/quarkusio/quarkus/pull/53038). If this repo already defines layout or naming, follow that first.

## Package layout (typical)

- `.../resource/` — JAX-RS
- `.../service/` — CDI beans
- `.../entity/` — JPA + Panache
- `.../repository/` — Panache repositories

## CLI vs hand-written classes

The Quarkus CLI creates **apps** and manages **extensions** (e.g. `quarkus create app`, `quarkus extension add`). It does not standardize “add one Java class” inside an existing module — create new types in the IDE or by copying the skeleton below.

## Resource shape (`jakarta.*`, not `javax.*`)

```java
package com.example.app.resource;

import com.example.app.service.WidgetService;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import jakarta.ws.rs.*;
import jakarta.ws.rs.core.MediaType;

@Path("/api/v1/widgets")
@ApplicationScoped
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
public class WidgetResource {

    @Inject
    WidgetService service;

    @GET
    public Object list() {
        return service.list();
    }

    @POST
    public Object create(Object request) {
        return service.create(request);
    }
}
```

## Panache (minimal)

- Default surrogate key → `PanacheEntity`.
- Custom or composite id → `PanacheEntityBase`.
- Annotate with `@Entity` and `@Table(name = "some_table")` (snake_case is a common convention for `name`).

## Imports

- CDI: `jakarta.inject`, `jakarta.enterprise.context`
- REST: `jakarta.ws.rs`
- Panache ORM: `io.quarkus.hibernate.orm.panache`
- Avoid Spring annotations unless the project uses Spring-on-Quarkus explicitly.
The front matter does most of the routing. The name must match the directory name. The description helps the model decide if the skill fits the task: write it for matching, not like internal documentation. If you would not say the description out loud to a teammate choosing a skill, rewrite it.
allowed-tools caps what the agent can do without another prompt. It also keeps the blast radius small: only list tools you want. For scaffolding, Bash plus file editing is enough. Wider lists mean a bigger mess if the skill fires in the wrong place. I keep lists short on purpose; you can always add more when a task really needs them.
Writing the extension management skill
Add a second skill so Bob stays on the Quarkus CLI and BOM patterns. Generic assistants often invent Maven coordinates, pin versions the BOM should own, or mix old and new REST stacks. This file narrows that path.
Create skills/quarkus-extensions/SKILL.md:
---
name: quarkus-extensions
description: >
  Example playbook for adding, listing, and removing Quarkus extensions
  via CLI or build tools. Load when the task is dependencies/capabilities,
  not routine coding; platform/BOM policy stays in AGENTS.md/rules.
allowed-tools: Bash Read Edit
license: Apache-2.0
---

# Quarkus extensions (examples only)

**Progressive skill:** use when someone needs REST, data, messaging, health, tracing, etc. at the **build** level. Version and BOM constraints that apply to every change belong in always-on guidance ([agents.md](https://agents.md/), rules); this file is task-sized, like the layout Quarkus is heading toward with rules + `.agents/skills/` — see [quarkus#53038](https://github.com/quarkusio/quarkus/pull/53038).

## CLI (preferred)

`ext` is shorthand for `extension` ([CLI tooling](https://quarkus.io/guides/cli-tooling)).

Browse what you can add (installable extensions, name filter):

```bash
quarkus ext ls -i -s jdbc
```

Add or remove (names can be short; globs like `smallrye-*` work):

```bash
quarkus ext add rest-jackson
quarkus ext rm rest-jackson
```

## No CLI: Maven / Gradle

Maven:

```bash
./mvnw quarkus:add-extension -Dextensions='rest-jackson,kafka'
```

Gradle:

```bash
./gradlew listExtensions
./gradlew addExtension --extensions='hibernate-validator'
```

## Last resort: edit the build by hand

Maven — **no** version when the Quarkus BOM is imported:

```xml

  io.quarkus
  quarkus-rest-jackson

```

Gradle (same idea: BOM manages versions):

```groovy
implementation 'io.quarkus:quarkus-rest-jackson'
```

Use coordinates from [quarkus.io/extensions](https://quarkus.io/extensions/) or CLI list output — **do not guess** artifact IDs.

## Names you see a lot (illustrative)

- `rest` / `rest-jackson`
- `hibernate-orm-panache`
- `jdbc-postgresql`
- `messaging-kafka`
- `smallrye-health`
- `opentelemetry`

## Rules of thumb

- Prefer the **REST** stack (`rest`, `rest-jackson`) for new JAX-RS-style apps unless the project already standardizes on something else.
- Do not mix **incompatible** stacks (e.g. Spring MVC + RESTEasy) without an explicit reason.
- Keep every `io.quarkus` extension on the **same platform/BOM** the project already uses.
So when someone says “add Kafka,” you get Quarkus extension IDs and the CLI flow, not a random client dependency pulled from a blog post. If your team really wants plain Kafka clients, say so in the skill and own that choice.
The same JAR everywhere means the same instructions in every checkout. For an agent that touches many repos, boring repeatability beats clever prose. That is the whole point of the exercise.
Writing the native build skill
Native builds get a separate skill because wrong text hurts fast: local GraalVM vs container builds, reflection registration, integration tests vs JVM tests.
Create skills/quarkus-native/SKILL.md:
---
name: quarkus-native
description: >
  Example playbook for native executables and native container images
  (Maven/Gradle). Load when debugging native builds or reflection — not
  for everyday JVM dev; keep always-on project policy in AGENTS.md/rules.
allowed-tools: Bash Read Edit
license: Apache-2.0
---

# Quarkus native (examples only)

**Progressive skill:** native compilation is slow and toolchain-specific; invoke this only when the task is “build native,” “fix native runtime,” or CI image parity. Always-on constraints (e.g. “we only ship container-build”) belong in [agents.md](https://agents.md/) / rules; task detail here matches the direction in [quarkus#53038](https://github.com/quarkusio/quarkus/pull/53038).

## Maven (typical)

Local toolchain (GraalVM / Mandrel already on `PATH`):

```bash
./mvnw package -Dnative
```

No local native compiler — build inside the builder image:

```bash
./mvnw package -Dnative -Dquarkus.native.container-build=true
```

Native binary **and** container image (needs a `quarkus-container-image-*` extension):

```bash
./mvnw package -Dnative \
  -Dquarkus.native.container-build=true \
  -Dquarkus.container-image.build=true
```

## Gradle (typical)

```bash
./gradlew build -Dquarkus.native.enabled=true
```

Container-based native compile:

```bash
./gradlew build -Dquarkus.native.enabled=true -Dquarkus.native.container-build=true
```

## Reflection / missing classes at runtime

Prefer registering the types that actually need reflection:

```java
import io.quarkus.runtime.annotations.RegisterForReflection;

@RegisterForReflection
public class ShipmentDto {
    public String id;
    public String status;
}
```

Fallback: extra native-image args (e.g. external JSON) via config:

```properties
quarkus.native.additional-build-args=-H:ReflectionConfigurationFiles=reflection-config.json
```

## Native integration tests

```bash
./mvnw verify -Dnative
```

Gradle (generates native image, then runs tests):

```bash
./gradlew testNative
```

`@QuarkusIntegrationTest` exercises the **artifact the build produced** (JAR vs native binary vs container), not the in-process JVM test runtime.
Do not pack every native-image trick into one file. A long wall of text gives the model more tokens but weaker signal. Keep the skill tight; put long appendices in another file if you need them. Native is painful enough without a fifty-screen skill nobody loads.
Configuring the skills artifact POM
Wire the packaging project so Maven turns these Markdown files into a skills JAR. Replace your pom.xml with this version (same as quarkus-dev-skills/pom.xml in the tree). Yes, it is a full paste; for the demo that is faster than diffing line by line in prose.


  4.0.0

  com.example.skills
  quarkus-dev-skills
  1.0.0-SNAPSHOT
  jar

  Quarkus Developer Skills
  Reusable agent skills for IBM Bob working in Quarkus projects (demo / tutorial).
  https://github.com/myfear/the-main-thread/quarkus-dev-skills

  
    25
    UTF-8

    Bash Read Edit
    Bash Read Edit
    Bash Read Edit
  

  
    
      
        com.skillsjars
        maven-plugin
        0.0.6
        
          
            
              package
            
          
        
      
    
  
This POM is the contract between your Markdown and the plugin. The skillsjars.skill..allowed-tools properties must match the front matter in each SKILL.md. If they drift, the build fails, which is friendlier than silently shipping skills with the wrong tool policy.
Paths inside the JAR. The package goal copies each skill under META-INF/skills/... as in the plugin README. With  on github.com, the plugin takes the GitHub org and repo from that URL. Without it, you get Maven groupId segments (com/example/skills//... in PackageMojoTest). This tree uses a placeholder example-org/quarkus-dev-skills URL in  so the paths match the SkillsJars.com examples. That URL is only a teaching label.
Consumer coordinates. Skills that SkillsJars.com republishes use groupId com.skillsjars and an artifactId like org__repo__skill (skillsjars.com). For a JAR you built yourself, consumers use your groupId and artifactId, here com.example.skills:quarkus-dev-skills, after mvn install or after you push to an internal repo.
Building and inspecting the skills JAR
Package the artifact and confirm the skill files landed under the right META-INF paths.
Build and install into your local repository (~/.m2) so the consumer can resolve the SNAPSHOT:
mvn install
Inspect the resulting JAR:
jar tf target/quarkus-dev-skills-1.0.0-SNAPSHOT.jar | grep /SKILL.md
You should see output similar to this (with the example  URL above):
META-INF/skills/example-org/quarkus-dev-skills/quarkus-extensions/SKILL.md
META-INF/skills/example-org/quarkus-dev-skills/quarkus-scaffolding/SKILL.md
META-INF/skills/example-org/quarkus-dev-skills/quarkus-native/SKILL.md
The extract goal reads exactly these paths. If the jar tf output is empty, Bob gets nothing from the consumer build, so stop and fix packaging before going further.
A clean JAR only proves layout, not quality. You can ship a perfect archive and still teach the wrong extension. Versioning ships the same bits everywhere; someone still has to read the skill text. I treat jar tf as a smoke test, not a proof that Bob will behave.
Creating the consumer Quarkus application
Add a Quarkus app that consumes the skills JAR so the layout looks like a real project.
Create the Quarkus application under the packaging project (from quarkus-dev-skills/):
quarkus create app com.example:shipment-service \
  --extension=quarkus-rest-jackson,quarkus-hibernate-orm-panache,quarkus-jdbc-postgresql,quarkus-smallrye-health
Move into the project:
cd shipment-service
Align the consumer with Java 25 if your Quarkus codestart picked a newer --release (shipment-service/pom.xml here uses maven.compiler.release 25).
The app is only there to consume skills. There is no real business logic; that is intentional. Pick extensions so the prompts in Verification look like normal Quarkus work instead of a toy Hello World.
Configuring the consumer project to extract skills
Add the SkillsJars plugin to the Quarkus app and declare the skills artifact as a plugin dependency. Skills stay off the runtime classpath; extraction reads the JAR at build time for Bob’s folder.
Update the consumer project pom.xml and add the plugin inside the  section. Reference the same coordinates you installed with mvn install in the packaging project:

    com.skillsjars
    maven-plugin
    0.0.6
    
        
            com.example.skills
            quarkus-dev-skills
            1.0.0-SNAPSHOT
        
    
The live shipment-service/pom.xml in this tree keeps the Quarkus-generated compiler, Surefire, and Failsafe plugins and appends this SkillsJars plugin after them.
The skills JAR never joins the application dependency graph. Your runtime image does not grow agent instructions just because someone uses Bob on a laptop. That boundary matters if ops is nervous about “AI stuff” on the classpath.
Extracting the skills into Bob’s project directory
Run the extraction goal and write skills into the directory IBM Bob watches in the project.
Run (from shipment-service/):
./mvnw skillsjars:extract -Ddir=.bob/skills
The extract goal scans META-INF/skills/ in the JAR, finds each skill root, and writes one folder per skill under the path you pass to -Ddir. The folder name starts with skillsjars__, then the path inside the JAR with / turned into __. See ExtractMojo in the plugin sources. If you expected a straight mirror of the paths inside the JAR, the folder names will look odd until you read that class once.
After extraction you should see three sibling directories (example with the placeholder  from the POM):
.bob/skills/
├── skillsjars__example-org__quarkus-dev-skills__quarkus-extensions/
│   └── SKILL.md
├── skillsjars__example-org__quarkus-dev-skills__quarkus-native/
│   └── SKILL.md
└── skillsjars__example-org__quarkus-dev-skills__quarkus-scaffolding/
    └── SKILL.md
Skills stay sealed in the JAR until extract puts them next to the code Maven built. Bump the artifact version, run extract again, and the skill folders refresh. You stop guessing which stale folder someone copied six months ago.
It is probably a good idea to add the skills folder to .gitignore and not commit them with your code.
Making setup explicit with AGENTS.md
Document how skills land in the repo so the next person does not have to hunt for tribal knowledge.
Add an AGENTS.md file at the root of the consumer project. The checked-in copy matches this (note the mvn install step in the parent directory so the SNAPSHOT exists locally):
# AGENTS.md

## Setup

After cloning this repository, install the shared skills artifact into your local Maven repository (from the sibling packaging project), then extract skills:

```bash
cd ../
mvn -f pom.xml -q install
cd shipment-service
./mvnw skillsjars:extract -Ddir=.bob/skills
```

The first step publishes `com.example.skills:quarkus-dev-skills:1.0.0-SNAPSHOT` to `~/.m2`. The second step unpacks `META-INF/skills/...` from that JAR into `.bob/skills/` using the SkillsJars Maven plugin ([plugin README](https://github.com/skillsjars/skillsjars-maven-plugin/blob/main/README.md)).

Extracted directories are not automatically gitignored. Check and add them to .gitignore. Re-run extraction whenever you bump the skills artifact version.

## Project context

* Java 25
* Quarkus REST with Jackson
* Panache with PostgreSQL
* Health endpoints enabled
* Prefer modern Quarkus REST stack
* Native builds use container-based compilation when needed
AGENTS.md covers bootstrap for humans and a short context block for the agent. Keep that list short; long filler buries the commands people actually need.
 Some teams commit the extracted files so PRs show skill changes; that works, but diffs get noisy. I prefer gitignore plus explicit extract in AGENTS.md so human edits and generated trees do not step on each other. Your team might reasonably choose otherwise; say which one you picked.
What happens when skills drift
The usual failure is drift. Someone updates native build text in the skills repo. Another repo still has an old extract on disk. Bob answers differently in each checkout, and you argue about the assistant instead of the code. Packaging and versions help only if you bump versions like normal dependency work.
Make the version visible in review. Bump the artifact in the consumer pom.xml, run extract again, and read the .bob/skills diff if you commit those files so behavior changes show up in git. In this repo extract output should be gitignored, so bump the version in the packaging pom.xml and in the consumer plugin dependency together and tell people to reinstall and re-extract. I have watched teams “fix” Bob locally while CI still had last month’s skills; aligning those two numbers is the boring part that actually fixes it.
Tool permissions are a real security boundary
A skill with allowed-tools: Bash Read Edit can run shell commands and edit files. That is the point, and that is also where accidents happen. A sloppy skill, or the right skill in the wrong place, can change more than you meant.
Keep the tool list small. Skip network, broad shell, or “run anything” patterns unless you really need them. Skills are closer to scripts than to comments. Review them like scripts.
Versioning does not replace code review
A versioned skills artifact ships the same bits everywhere: local ~/.m2, internal Nexus, SkillsJars.com, same idea. It does not check whether those bits are correct. If a skill names the wrong Quarkus extension, every consumer picks up the same mistake.
Versioning still helps. Patch for typos and small fixes. Minor when you add a skill or real content. Major when Bob’s behavior on real tasks will change. Downstream teams get a number to plan around. They still need to read the diff.
Context overload weakens skill quality
Do not cram everything into one giant skill. Huge files turn to mush. The model sees more lines and picks the wrong ones. Small focused files usually win.
One skill per problem area works here: scaffolding, extensions, native. If you need a long appendix later, add another file. Do not hide it all in the skill that should fire on a short prompt.
Share
Verification
Verify Bob behavior with concrete prompts
Open the Quarkus consumer in your IDE with IBM Bob enabled and try these prompts in Code mode. This is the part no amount of Maven XML replaces: you are checking whether the words in the skills survive contact with a real model.
Prompt one:
Create a ShipmentResource with GET /api/v1/shipments and POST /api/v1/shipments.
Check that:
Bob creates the resource in a resource package
The code uses jakarta.ws.rs imports
A matching service class is suggested or created
The generated code reads as Quarkus, not Spring
Prompt two:
Add Kafka support to this project.
Watch for:
Bob uses a Quarkus extension workflow
It does not invent random Maven versions
It picks Quarkus extension IDs, not random generic dependencies
Prompt three:
Build a native container image for this project.
Expect:
Bob suggests a container-based native build when appropriate
It distinguishes between local native toolchains and container builds
It does not collapse everything into one vague “use GraalVM” answer
Those checks are about how Bob behaves. The JAR and extract steps can be perfect and the skill still does nothing useful if the text does not stick. Ship packaging first, then iterate on words; both are allowed to be wrong, but usually not in the same release.
Conclusion
SkillsJars fits the same habit you already have for libraries: package once, version it, let Maven resolve it, extract into .bob/skills on the consumer. One good SKILL.md is quick to write. The long fight is the same as with any shared library: keeping one truth across many repos without silent fork drift.
After that you can split skills or add reference files so Bob loads a thin layer first and pulls depth only when the task needs it. That is optional polish; the baseline win is already “same JAR, same extract, same words.”
Subscribe now



Quarkus 3.31 Security Upgrade: Pushed Authorization Requests with Keycloak
Markus Eisele — Sat, 04 Apr 2026 06:08:10 GMT
Classic authorization code flow looks clean until you inspect the redirect URL. Then you see everything in the query string: client_id, scope, redirect_uri, state, nonce, and whatever else your client sends. That is normal OAuth behavior. Most teams stop thinking about it once login works.
These parameters are not secret in the cryptographic sense. They still travel through the browser. They end up in address bars, history, reverse proxy logs, analytics tools, screenshots, and referrer headers. On a laptop demo that feels harmless. In production, with shared logging and support tooling, they can leak further than you meant.
RFC 9126 defines pushed authorization requests (PAR). Your application sends the full authorization request to the authorization server on a back channel first. The browser only follows a redirect that carries a short request_uri instead of the full parameter list.
Quarkus OpenID Connect (OIDC) supports PAR with a dedicated configuration switch. With PAR enabled, Quarkus pushes the authorization request first, receives a short-lived request_uri, and only then redirects the browser. The browser no longer carries the full request payload. If the server advertises pushed_authorization_request_endpoint in its metadata, Quarkus can discover the PAR endpoint automatically. Details are in the Quarkus OIDC configuration reference.
There is a real security angle here too. PAR shows less on the front channel. It also makes casual tampering harder because the client must authenticate when it pushes the request. Many stricter deployments pair PAR with PKCE (Proof Key for Code Exchange, an extra check on the authorization code exchange). The Keycloak OIDC documentation recommends that combination for stronger profiles. Quarkus documents the matching client settings in the Quarkus OIDC authorization code flow guide.
What We’ll Build
Let’s build a small Quarkus app that uses OIDC to protect /account, turns on PAR for the login redirect, and talks to Keycloak locally. We will add /account/tokens as JSON so you can see that you still get normal ID, access, and refresh tokens after login. The only behavioral change we care about is how the authorization request reaches Keycloak.
You can run Keycloak in two ways: Dev Services for Keycloak (Quarkus starts a container for you in dev mode) or Podman on a fixed port. The steps below use the same Quarkus and Keycloak settings; only how you launch Keycloak changes.
Prerequisites
You do not need a big setup. You need a current Quarkus CLI and a JDK (17 or newer matches current Quarkus guides; this article uses Java 21). For Dev Services you need Docker or Podman available to Quarkus. Let’s assume you already know the usual OIDC authorization code flow in Quarkus and what a confidential client is.
Java 21 installed (or JDK 17+)
Quarkus CLI installed
Docker or Podman installed (for Dev Services, or for manual Keycloak below)
Basic familiarity with Quarkus OIDC web-app authentication
Project Setup
Let’s create the project or you can also start from my Github repository:
quarkus create app org.acme:par-demo \
  --extension='oidc,rest-jackson' \
  --no-code
cd par-demo
Extensions explained:
oidc - enables Quarkus OpenID Connect support for web-app authentication
rest-jackson - gives us REST endpoints and JSON serialization for the token inspection endpoint
We keep this small on purpose. No database, no template engine, no extra moving parts. The goal is to isolate the authorization flow.
Start Keycloak with Dev Services
Quarkus Dev Services for Keycloak is enabled by default when you run quarkus dev with the oidc extension, as long as quarkus.oidc.auth-server-url is not set for that mode. Quarkus then starts a Keycloak container (by default quay.io/keycloak/keycloak:26.5.4), creates a quarkus realm, a confidential client quarkus-app with secret secret, and users alice / bob (passwords match the usernames) with sample roles. Admin console access uses admin / admin. See Dev Services and Dev UI for OpenID Connect (OIDC).
Why this matters for PAR:
You get a confidential client out of the box, which PAR expects for the back-channel push.
Quarkus injects the correct issuer URL for the ephemeral container port, so you do not hardcode localhost:8180 in dev.
Optional parameters:
Realm file - If your flow needs a fixed realm export (for example, stricter PAR policies), set quarkus.keycloak.devservices.realm-path=your-realm.json on the classpath or filesystem. Dev Services imports that realm instead of only the defaults.
Fixed Keycloak port - You can use quarkus.keycloak.devservices.port=8180 to bind the Keycloak Dev Service to a specific port.
Shared container - By default Quarkus may reuse a container labeled quarkus-dev-service-keycloak; set quarkus.keycloak.devservices.shared=false if you want an isolated container per run.
After you start the app (see Configure and Run), open the Dev UI (or /q/dev depending on your Quarkus version). Use the OpenID Connect card and the Keycloak provider link to inspect tokens or, for web-app, use Log in to your web application against a path like /account. The same guide describes authorization code, password, and client-credentials grants for service-style testing.
If you already set quarkus.oidc.auth-server-url (for example to a manually run Keycloak), Dev Services does not start; you get the generic OIDC Dev Console instead. The Keycloak authorization quickstart uses a %prod. prefix on quarkus.oidc.auth-server-url so dev keeps Dev Services while prod points at a real URL—see Using OpenID Connect (OIDC) and Keycloak to centralize authorization.
Verify PAR in discovery once Keycloak is up. For Dev Services, take the host and port from the Dev UI or like in our example here, the fixed startup port:
curl -s "http://localhost:8180/realms/quarkus/.well-known/openid-configuration" | grep pushed_authorization
Swap localhost:8180 for your real Keycloak base URL when it differs.
You should see pushed_authorization_request_endpoint. Quarkus discovers it from metadata when the server publishes it. The Quarkus OIDC authorization code flow guide describes discovery behavior.
Start Keycloak in Podman (Fixed Port)
Use this path when you want a stable URL, CI-like setup, or to match production hostnames without Dev Services.
Start Keycloak:
podman run --name keycloak \
  -e KC_BOOTSTRAP_ADMIN_USERNAME=admin \
  -e KC_BOOTSTRAP_ADMIN_PASSWORD=admin \
  -p 8180:8080 \
  quay.io/keycloak/keycloak:26.5.4 \
  start-dev
Current Quarkus Keycloak examples and Dev Services use the Keycloak 26.x line and KC_BOOTSTRAP_ADMIN_USERNAME / KC_BOOTSTRAP_ADMIN_PASSWORD (not older KEYCLOAK_* admin variables). See the Quarkus OpenID Connect client quickstart.
Wait until Keycloak prints that it is running in development mode. Then open http://localhost:8180
 and log in with admin / admin.
Create a new realm named quarkus.
Create a confidential client:
Open the quarkus realm
Go to Clients
Create a client with client ID quarkus-app
Keep the client protocol as OpenID Connect
Enable Client authentication
Enable the standard authorization code flow
Set the redirect URI to http://localhost:8080/*
Set the web origin to http://localhost:8080
You need client authentication because PAR is a back-channel client request. The authorization server must know which client pushed the request. That is part of how PAR is defined in RFC 9126.
Open the Credentials tab and copy the client secret.
Create a test user named alice with password alice.
Verify the PAR endpoint (same as for Dev Services, with your fixed port):
curl -s http://localhost:8180/realms/quarkus/.well-known/openid-configuration | grep pushed_authorization
Implement the Application
We need two resources. One public landing page gives us a safe place to land after logout. One protected resource starts the OIDC code flow, shows the signed-in user, and exposes JSON so we can inspect the tokens Quarkus got after the code exchange.
Create src/main/java/org/acme/HomeResource.java:
package org.acme;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/")
public class HomeResource {

    @GET
    @Produces(MediaType.TEXT_HTML)
    public String home() {
        return """
            
              
                PAR demo
                This application protects the account page with Quarkus OIDC and Pushed Authorization Requests.
                Open the protected account page
              
            
            """;
    }
}
This is intentionally plain. We only need one public entry point. After logout, Quarkus can redirect back here without creating an authentication loop.
Now let’s add src/main/java/org/acme/AccountResource.java:
package org.acme;

import org.eclipse.microprofile.jwt.Claims;
import org.eclipse.microprofile.jwt.JsonWebToken;

import io.quarkus.oidc.IdToken;
import io.quarkus.oidc.RefreshToken;
import io.quarkus.security.Authenticated;
import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/account")
public class AccountResource {

  @Inject
  @IdToken
  JsonWebToken idToken;

  @Inject
  JsonWebToken accessToken;

  @Inject
  RefreshToken refreshToken;

  @GET
  @Authenticated
  @Produces(MediaType.TEXT_HTML)
  public String account() {
    Object givenName = idToken.getClaim(Claims.given_name.name());
    String displayName = givenName != null ? givenName.toString() : idToken.getName();

    return """
        
          
            Hello, %s
            You authenticated through Quarkus OIDC with PAR enabled.
            Inspect tokens
            Logout
          
        
        """.formatted(displayName);
  }

  @GET
  @Path("/tokens")
  @Authenticated
  @Produces(MediaType.APPLICATION_JSON)
  public TokenInfo tokens() {
    return new TokenInfo(
        idToken.getName(),
        idToken.getSubject(),
        accessToken.getExpirationTime(),
        refreshToken.getToken() != null);
  }

  public record TokenInfo(
      String principalName,
      String subject,
      long accessTokenExpirationTime,
      boolean hasRefreshToken) {
  }
}
This resource shows something important about the Quarkus web-app model. Redirect-based login still ends with the same token set: ID token, access token, and optionally a refresh token. The Quarkus OIDC authorization code flow guide documents how web-app uses the authorization code flow and how you can inject the access token as JsonWebToken.
PAR changes an earlier step. Your endpoint code still reads tokens the same way. Your session model stays the same. After Keycloak returns the authorization code, behavior matches a normal code flow. PAR hardens the redirect leg without forcing you to redesign everything else.
There is also a limit here. PAR does not protect you from weak session handling, bad redirect URI registration, or sloppy token use after login. If you take the access token and write it into logs, PAR does nothing for you. It narrows one attack surface. It does not replace the rest of your OIDC hygiene.
Configure Quarkus OIDC and Enable PAR
Configure src/main/resources/application.properties.
If you use Dev Services in dev mode, omit quarkus.oidc.auth-server-url for %dev (or leave it unset globally in dev) so Quarkus starts Keycloak. Use the default client secret secret. For production (or when you always point at a fixed Keycloak), set the issuer on the prod profile as in the Keycloak authorization guide:
# Dev: omit auth-server-url so Dev Services for Keycloak starts Keycloak and injects the issuer.
# Prod (or manual Keycloak on 8180): use %prod profile or set quarkus.oidc.auth-server-url globally.
%prod.quarkus.oidc.auth-server-url=http://localhost:8180/realms/quarkus

quarkus.oidc.client-id=quarkus-app
quarkus.oidc.credentials.secret=secret
quarkus.oidc.application-type=web-app

quarkus.oidc.authentication.par.enabled=true

# Default Dev Services use a random host port; pin 8180 so manual curl examples match startup.
quarkus.keycloak.devservices.port=8180

# PAR + PKCE (recommended for stricter profiles; see article.md)
quarkus.oidc.authentication.pkce-required=true
quarkus.oidc.authentication.state-secret=8f2ef0d782b24016a4a998f5d8b1a2ce

quarkus.oidc.logout.path=/logout
quarkus.oidc.logout.post-logout-path=/

quarkus.http.auth.permission.authenticated.paths=/account,/account/*,/logout
quarkus.http.auth.permission.authenticated.policy=authenticated

quarkus.log.category."io.quarkus.oidc".level=DEBUG
The critical property is quarkus.oidc.authentication.par.enabled=true. Compare the Quarkus OIDC configuration reference. If you do not set an explicit PAR path, Quarkus uses pushed_authorization_request_endpoint from the authorization server metadata.
The quarkus.oidc.application-type=web-app property selects the OIDC authorization code flow for browser login. .
The logout settings are first-class Quarkus features too. quarkus.oidc.logout.path and quarkus.oidc.logout.post-logout-path trigger RP-initiated logout and send the user back to a local page when logout finishes. Same guide covers those properties.
The debug log category is there because you want proof. When this works, you want to see the server-side behavior before the browser redirect.
Run the Application
Start the app in dev mode:
quarkus dev
The first time you use Dev Services, watch the log for Dev Services for Keycloak started. 
Open http://localhost:8080/account.
Because /account is protected and you have no session yet, Quarkus starts the OIDC authorization code flow. With PAR on, Quarkus posts the authorization request to the PAR endpoint on the back channel, gets a request_uri, and only then redirects your browser to the authorization endpoint. That matches the model in RFC 9126 and the PAR settings in the Quarkus OIDC configuration reference.
Look at the browser address bar on the Keycloak login page. With a classic front-channel request you often see a long URL full of scope, redirect_uri, state, and nonce. With PAR, the redirect shrinks to something that carries the client ID and a request_uri reference. That visible difference is what this tutorial is about. RFC 9126 describes the pattern.
Now log in as alice with the password alice.
After the redirect back to Quarkus, open http://localhost:8080/account/tokens. You should see JSON similar to this:
{
  "principalName": "alice",
  "subject": "8e4615ab-b442-4f1d-b036-0d556ce55a2b",
  "accessTokenExpirationTime": 1774326152,
  "hasRefreshToken": true
}
The exact values will differ, but the structure should match.
What Happens in the Flow
At this point, let’s spell the flow out:
When PKCE is enabled (see the section Add PKCE on Top), the token request also includes code_verifier; PAR and PKCE address different legs of the same overall flow.
Browser → GET /account
Quarkus → POST /realms/quarkus/protocol/openid-connect/ext/par/request with the authorization request parameters and client authentication
Keycloak → returns request_uri and expires_in
Quarkus → redirects the browser to /protocol/openid-connect/auth with client_id and request_uri
User logs in on Keycloak
Keycloak → redirects the browser back to Quarkus with the authorization code
Quarkus → exchanges the code for ID token, access token, and refresh token
Browser → sees the protected page
So here is the win in one sentence: the browser still does login, but the full authorization request does not ride through it anymore. RFC 9126 defines this push-plus-request_uri handoff, and Quarkus lines up with it through configuration.
One production detail matters. The request_uri is short-lived. If login takes too long and the reference expires before authorization finishes, the flow fails. That is expected. The short lifetime helps with replay resistance. Keep it in mind when you debug slow or interrupted logins.
Add PKCE on Top
PAR alone is useful. For sensitive apps, PAR plus PKCE is the baseline you want. We already introduced PKCE in the opening; now let’s turn it on.
Add these properties to application.properties:
quarkus.oidc.authentication.pkce-required=true
quarkus.oidc.authentication.state-secret=8f2ef0d782b24016a4a998f5d8b1a2ce
quarkus.oidc.authentication.pkce-required=true turns PKCE on. You also need quarkus.oidc.authentication.state-secret so Quarkus can encrypt the PKCE verifier in the state cookie. The Quarkus OIDC authorization code flow guide shows a 32-character example secret.
Generate one with OpenSSL if you want a fresh value:
openssl rand -hex 16
The Keycloak OIDC documentation recommends PKCE together with PAR in stronger profiles. PAR keeps the heavy request off the front channel. PKCE ties the code exchange back to the original client. They solve different steps; use both.
PKCE does not replace PAR, and PAR does not replace PKCE. One protects the request leg. The other protects the code exchange leg. Use both.
Require PAR on the Keycloak Side
Right now your Quarkus client uses PAR because you told it to. That is a client-side choice. In stricter environments you also want the authorization server to reject non-PAR authorization requests.
Keycloak can publish require_pushed_authorization_requests in metadata when you enforce PAR. Quarkus can also turn PAR on automatically when discovery says pushed authorization requests are required. See the Quarkus OIDC configuration reference.
In practice, enforce it in Keycloak for the realm or client, then verify the discovery document again:
curl -s http://localhost:8180/realms/quarkus/.well-known/openid-configuration | grep require_pushed_authorization_requests
When that setting becomes true, clients that try to send a normal front-channel authorization request without PAR will be rejected. That is the point where PAR stops being a nice hardening option and becomes policy.
For day-to-day PAR experiments, Dev Services plus default quarkus-app / secret is enough. For a shared team baseline, I still like a checked-in realm file (quarkus.keycloak.devservices.realm-path) or the explicit Podman setup so client type and secrets stay visible in review.
Production Hardening
What Happens Under Load
PAR adds one back-channel request before the redirect. Login now depends on the PAR endpoint, the authorization endpoint, and the token endpoint all being reachable. If Keycloak is slow or the network between Quarkus and Keycloak is unhealthy, login can fail earlier in the flow. That is expected. You moved work off the browser leg onto a server-to-server leg, so monitor that path too. 
Tampering and Trust Boundaries
With a classic flow, the authorization request becomes a front-channel redirect URL. With PAR, the client authenticates to the PAR endpoint and pushes the request directly. That tightens who can create the request. It does not fix bad intent. If your Quarkus client asks for too many scopes, PAR still protects exactly that request.
Session and Logout Behavior
PAR does not change how Quarkus handles sessions. After the code exchange you still have a normal web-app with Quarkus-managed tokens and cookies. You still need solid logout, tight cookie scope, HTTPS in real deployments, and consistent secrets across instances. The Quarkus OIDC authorization code flow guide covers logout paths; everything else about session hygiene is still on you.
Conclusion
We built a Quarkus OIDC app that protects a real endpoint, runs against local Keycloak, uses PAR to keep authorization request data off the browser URL, and still ends with the same code-flow tokens after login. Your resource code stays familiar. The shift is the trust boundary on the login redirect: the browser no longer carries the full authorization request, Quarkus pushes it to Keycloak, and the redirect only carries a short-lived request_uri. That is a real hardening step for sensitive apps. For stricter deployments, add PKCE and require PAR on the server too. 
Subscribe now