<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Main Thread]]></title><description><![CDATA[Deep dives into Quarkus, AI tooling, and the architecture decisions that actually matter for senior Java engineers.]]></description><link>https://www.the-main-thread.com</link><image><url>https://substackcdn.com/image/fetch/$s_!8sdd!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81643b8a-6240-4cd1-9f3a-8fd19cc3a455_254x254.png</url><title>The Main Thread</title><link>https://www.the-main-thread.com</link></image><generator>Substack</generator><lastBuildDate>Tue, 09 Jun 2026 16:57:57 GMT</lastBuildDate><atom:link href="https://www.the-main-thread.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Markus Eisele]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[myfear@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[myfear@substack.com]]></itunes:email><itunes:name><![CDATA[Markus Eisele]]></itunes:name></itunes:owner><itunes:author><![CDATA[Markus Eisele]]></itunes:author><googleplay:owner><![CDATA[myfear@substack.com]]></googleplay:owner><googleplay:email><![CDATA[myfear@substack.com]]></googleplay:email><googleplay:author><![CDATA[Markus Eisele]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Quarkus REST Client: Timeouts, Retries, and Redaction]]></title><description><![CDATA[Build an outbound HTTP template with explicit time budgets, one safe retry, useful API errors, and tests for slow or flaky 
dependencies.]]></description><link>https://www.the-main-thread.com/p/quarkus-rest-client-timeouts-retries-redaction</link><guid isPermaLink="false">https://www.the-main-thread.com/p/quarkus-rest-client-timeouts-retries-redaction</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Tue, 09 Jun 2026 06:08:31 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/035f64d9-0e45-4f15-b836-1d44cdf2a21c_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I do not trust outbound HTTP code that only ran against a fast local stub. That is enough for a demo. It is not enough when your API has a real latency budget and a dependency that likes to go slow at exactly the wrong time.</p><p>Quarkus REST Client still defaults to a <strong>15 second connect timeout</strong> and a <strong>30 second read timeout</strong>. I would not call that resilience. Your request can sit there burning somebody else&#8217;s budget until the failure finally shows up in the wrong place. Turn on request-response logging with no masking and the bearer token you needed for debugging is now in the log stream. Add retries on the wrong call and one flaky dependency can turn into duplicate work.</p><p>This tutorial builds a small service with explicit per-client timeouts, one bounded retry on a safe read path, masked outbound logs, useful downstream error translation, and tests that cover the bad days as well as the happy path.</p><h2><strong>What we build</strong></h2><p>We will build <code>carrier-bridge</code>, a Quarkus service that exposes <code>GET /tracking/{trackingId}</code> and calls a downstream carrier status API through a declarative REST client. 
By the end you will have:</p><ul><li><p>a declarative REST client using <code>rest-client-jackson</code></p></li><li><p>explicit <code>connect-timeout</code> and <code>read-timeout</code> on that client</p></li><li><p>one bounded retry for a transient <code>503</code> on an idempotent read</p></li><li><p>request-response logging with redacted auth headers</p></li><li><p>client-side and server-side error mapping into clean JSON</p></li><li><p>WireMock Dev Service tests through the Quarkus WireMock extension for slow, flaky, and permanently broken downstream responses</p></li></ul><p>The retry example is a <strong>read</strong> operation on purpose. Automatic retry is only safe when the call is safe to repeat. If you retry writes, charges, or submit flows, you need idempotency keys or another guardrail. That rule comes from HTTP semantics, not from Quarkus magic.</p><h2><strong>What you need</strong></h2><p>You need Java 21, the Quarkus CLI, and basic familiarity with declarative REST clients. The walkthrough takes about two &#9749;&#65039;&#9749;&#65039;.</p><ul><li><p>Java 21</p></li><li><p>Quarkus CLI (<code>quarkus create app</code>)</p></li><li><p>Basic Quarkus REST Client knowledge</p></li></ul><h2><strong>Build the base</strong></h2><p>This article uses Quarkus <strong>3.36.1</strong> and Java 21. Create the project:</p><pre><code><code>quarkus create app org.acme:carrier-bridge \
  --extension='rest-jackson,rest-client-jackson,smallrye-fault-tolerance,smallrye-openapi' \
  --java=21 \
  --no-code</code></code></pre><p>Extensions:</p><ul><li><p><code>rest-jackson</code> for the inbound API</p></li><li><p><code>rest-client-jackson</code> for the declarative outbound client</p></li><li><p><code>smallrye-fault-tolerance</code> for retry policy</p></li><li><p><code>smallrye-openapi</code> so the service still looks like something a team would ship</p></li></ul><p>Add the <a href="https://docs.quarkiverse.io/quarkus-wiremock/dev/index.html">Quarkus WireMock extension</a>. It starts WireMock as a Dev Service in <code>dev</code> and <code>test</code> mode, so you do not manage server lifecycle yourself:</p><pre><code><code>./mvnw quarkus:add-extension -Dextensions="io.quarkiverse.wiremock:quarkus-wiremock"</code></code></pre><p>And yes, I am using the mvn command here instead of the CLI. I keep mixing both, depending on which command I remember first, so please bear with me. Both integrations are great and I do not want anybody to feel like they have to use the Quarkus CLI.</p><p>The command adds <code>quarkus-wiremock</code> to the build. Pin the Quarkiverse version and add the test helper module in <code>pom.xml</code>:</p><pre><code><code>&lt;properties&gt;
    &lt;quarkus-wiremock.version&gt;1.6.3&lt;/quarkus-wiremock.version&gt;
&lt;/properties&gt;

&lt;dependencies&gt;
    &lt;dependency&gt;
        &lt;groupId&gt;io.quarkiverse.wiremock&lt;/groupId&gt;
        &lt;artifactId&gt;quarkus-wiremock&lt;/artifactId&gt;
        &lt;version&gt;${quarkus-wiremock.version}&lt;/version&gt;
        &lt;scope&gt;provided&lt;/scope&gt;
    &lt;/dependency&gt;
    &lt;dependency&gt;
        &lt;groupId&gt;io.quarkiverse.wiremock&lt;/groupId&gt;
        &lt;artifactId&gt;quarkus-wiremock-test&lt;/artifactId&gt;
        &lt;version&gt;${quarkus-wiremock.version}&lt;/version&gt;
        &lt;scope&gt;test&lt;/scope&gt;
    &lt;/dependency&gt;
&lt;/dependencies&gt;</code></code></pre><p><code>quarkus-wiremock-test</code> brings in <code>@ConnectWireMock</code>, which injects a <code>WireMock</code> client into your tests. The extension publishes <code>quarkus.wiremock.devservices.port</code> so the REST client URL can point at the running stub without hard-coding a port.</p><h2><strong>Make it work</strong></h2><p>Let&#8217;s follow one boring happy path first, then widen it. Start with the payload types and failure exceptions, then the client, service, and resource.</p><h3><strong>Payload and error types</strong></h3><p>The downstream carrier returns a full tracking document. Our public API returns a trimmed response. Failures become typed exceptions with a stable <code>downstreamStatus</code> field for the caller.</p><p>Create <code>src/main/java/org/acme/carrier/bridge/CarrierTrackingPayload.java</code>:</p><pre><code><code>package org.acme.carrier.bridge;

import java.time.Instant;

record CarrierTrackingPayload(String trackingId, String carrier, String status, Instant lastUpdated) {
}</code></code></pre><p>Create <code>src/main/java/org/acme/carrier/bridge/TrackingResponse.java</code>:</p><pre><code><code>package org.acme.carrier.bridge;

import java.time.Instant;

public record TrackingResponse(String trackingId, String carrier, String status, Instant lastUpdated) {
}
</code></code></pre><p>Create <code>src/main/java/org/acme/carrier/bridge/ApiError.java</code>:</p><pre><code><code>package org.acme.carrier.bridge;

public record ApiError(String code, String message, Integer downstreamStatus) {
}</code></code></pre><p>Create <code>src/main/java/org/acme/carrier/bridge/CarrierFailures.java</code>:</p><pre><code><code>package org.acme.carrier.bridge;

abstract class CarrierFailure extends RuntimeException {

    private final Integer downstreamStatus;

    CarrierFailure(String message, Integer downstreamStatus) {
        super(message);
        this.downstreamStatus = downstreamStatus;
    }

    CarrierFailure(String message, Integer downstreamStatus, Throwable cause) {
        super(message, cause);
        this.downstreamStatus = downstreamStatus;
    }

    Integer downstreamStatus() {
        return downstreamStatus;
    }
}

final class TrackingNotFoundException extends CarrierFailure {

    TrackingNotFoundException(String trackingId) {
        super("Carrier API could not find tracking ID '%s'.".formatted(trackingId), 404);
    }
}

final class CarrierUnavailableException extends CarrierFailure {

    CarrierUnavailableException() {
        super("Carrier API is temporarily unavailable.", 503);
    }
}

final class CarrierTimeoutException extends CarrierFailure {

    CarrierTimeoutException(Throwable cause) {
        super("Carrier API did not respond before the outbound read timeout.", null, cause);
    }
}

final class CarrierInvocationException extends CarrierFailure {

    CarrierInvocationException(Throwable cause) {
        super("Carrier API call failed before a usable response was returned.", null, cause);
    }
}</code></code></pre><p>Each exception maps to one caller-facing HTTP status later. A timeout is not the same shape as a missing tracking ID, and neither should look like a generic transport failure.</p><h3><strong>Outbound auth filter</strong></h3><p>Real carrier APIs expect credentials on every call. A <code>ClientRequestFilter</code> is the right place to attach them so the rest of the code stays focused on business logic.</p><p>Create <code>src/main/java/org/acme/carrier/bridge/CarrierAuthFilter.java</code>:</p><pre><code><code>package org.acme.carrier.bridge;

import io.quarkus.arc.Unremovable;
import jakarta.ws.rs.client.ClientRequestContext;
import jakarta.ws.rs.client.ClientRequestFilter;
import jakarta.ws.rs.core.HttpHeaders;

@Unremovable
public class CarrierAuthFilter implements ClientRequestFilter {

    static final String DEMO_BEARER_TOKEN = "carrier-demo-bearer-token";
    static final String DEMO_API_KEY = "carrier-demo-api-key";
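    // Sketch only: in production, read the secrets from configuration instead of constants.
    // Assumption: a property named carrier.api.token exists (hypothetical name).
    //   String token = org.eclipse.microprofile.config.ConfigProvider.getConfig()
    //           .getValue("carrier.api.token", String.class);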

    @Override
    public void filter(ClientRequestContext requestContext) {
        requestContext.getHeaders().putSingle(HttpHeaders.AUTHORIZATION, "Bearer " + DEMO_BEARER_TOKEN);
        requestContext.getHeaders().putSingle("X-Carrier-Key", DEMO_API_KEY);
    }
}</code></code></pre><p><code>@Unremovable</code> keeps Arc from dropping the filter when nothing else injects it directly. In production these values come from configuration or a secrets store, not constants. The test suite uses the constants to prove redaction works.</p><h3><strong>Declarative REST client</strong></h3><p>The client interface is the outbound contract. Register it with <code>configKey = "carrier-api"</code> so timeouts, URL, and logging stay in one configuration namespace.</p><p>Create <code>src/main/java/org/acme/carrier/bridge/CarrierStatusClient.java</code>:</p><pre><code><code>package org.acme.carrier.bridge;

import java.net.URI;

import io.quarkus.rest.client.reactive.ClientExceptionMapper;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.core.Response;
import org.eclipse.microprofile.rest.client.annotation.RegisterProvider;
import org.eclipse.microprofile.rest.client.inject.RegisterRestClient;

@Path("/carrier-api")
@RegisterRestClient(configKey = "carrier-api")
@RegisterProvider(CarrierAuthFilter.class)
public interface CarrierStatusClient {

    @GET
    @Path("/tracking/{trackingId}")
    CarrierTrackingPayload getTracking(@PathParam("trackingId") String trackingId);

    @ClientExceptionMapper
    static RuntimeException toException(Response response, URI uri) {
        return switch (response.getStatus()) {
            case 404 -&gt; new TrackingNotFoundException(uri.getPath().substring(uri.getPath().lastIndexOf('/') + 1));
            case 503 -&gt; new CarrierUnavailableException();
            default -&gt; null;
        };
    }
}</code></code></pre><p><code>@ClientExceptionMapper</code> runs before the response body is unmarshalled into <code>CarrierTrackingPayload</code>. That is why a <code>404</code> with an error JSON body does not explode into a Jackson mapping exception. Returning <code>null</code> leaves other status codes to the default client error handling.</p><p><code>@RegisterProvider(CarrierAuthFilter.class)</code> wires the auth filter without touching the interface method signatures.</p><h3><strong>Service with retry and timeout handling</strong></h3><p>Retries belong on the service method, not on the client interface. That keeps the retry policy tied to one business operation and one failure type.</p><p>Create <code>src/main/java/org/acme/carrier/bridge/TrackingService.java</code>:</p><pre><code><code>package org.acme.carrier.bridge;

import java.net.SocketTimeoutException;
import java.util.concurrent.TimeoutException;

import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.rest.client.inject.RestClient;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.ws.rs.ProcessingException;

@ApplicationScoped
class TrackingService {

    private final CarrierStatusClient carrierStatusClient;

    TrackingService(@RestClient CarrierStatusClient carrierStatusClient) {
        this.carrierStatusClient = carrierStatusClient;
    }

    @Retry(retryOn = CarrierUnavailableException.class)
    TrackingResponse fetchTracking(String trackingId) {
        try {
            CarrierTrackingPayload payload = carrierStatusClient.getTracking(trackingId);
            return new TrackingResponse(
                    payload.trackingId(),
                    payload.carrier(),
                    payload.status(),
                    payload.lastUpdated());
        } catch (ProcessingException e) {
            if (hasTimeoutCause(e)) {
                throw new CarrierTimeoutException(e);
            }
            throw new CarrierInvocationException(e);
        }
    }

    private boolean hasTimeoutCause(Throwable throwable) {
        Throwable current = throwable;
        while (current != null) {
            if ((current instanceof SocketTimeoutException) || (current instanceof TimeoutException)) {
                return true;
            }
            String simpleName = current.getClass().getSimpleName();
            if (simpleName.contains("Timeout")) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }
}</code></code></pre><p><code>@Retry(retryOn = CarrierUnavailableException.class)</code> retries only transient carrier outages. A <code>404</code> does not retry. A read timeout does not retry either, because <code>CarrierTimeoutException</code> is outside <code>retryOn</code>. That is the behavior you want: one safe retry for a blip, not a second slow wait on an already late call.</p><p><code>ProcessingException</code> wraps transport-level failures from the REST client. The <code>hasTimeoutCause</code> walk is ugly but practical. Vert.x timeout types do not always surface as <code>SocketTimeoutException</code> at the top of the stack.</p><h3><strong>Inbound resource and API error mapping</strong></h3><p>The resource stays thin. Exception mapping turns typed failures into stable JSON for callers.</p><p>Create <code>src/main/java/org/acme/carrier/bridge/TrackingResource.java</code>:</p><pre><code><code>package org.acme.carrier.bridge;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;
import org.jboss.resteasy.reactive.RestResponse;
import org.jboss.resteasy.reactive.server.ServerExceptionMapper;

@Path("/tracking")
@Produces(MediaType.APPLICATION_JSON)
public class TrackingResource {

    private final TrackingService trackingService;

    TrackingResource(TrackingService trackingService) {
        this.trackingService = trackingService;
    }

    @GET
    @Path("/{trackingId}")
    public RestResponse&lt;TrackingResponse&gt; tracking(@PathParam("trackingId") String trackingId) {
        return RestResponse.ok(trackingService.fetchTracking(trackingId));
    }

    @ServerExceptionMapper
    RestResponse&lt;ApiError&gt; mapTrackingNotFound(TrackingNotFoundException exception) {
        return RestResponse.status(
                Response.Status.NOT_FOUND,
                new ApiError("tracking_not_found", exception.getMessage(), exception.downstreamStatus()));
    }

    @ServerExceptionMapper
    RestResponse&lt;ApiError&gt; mapCarrierUnavailable(CarrierUnavailableException exception) {
        return RestResponse.status(
                Response.Status.SERVICE_UNAVAILABLE,
                new ApiError("carrier_unavailable", exception.getMessage(), exception.downstreamStatus()));
    }

    @ServerExceptionMapper
    RestResponse&lt;ApiError&gt; mapCarrierTimeout(CarrierTimeoutException exception) {
        return RestResponse.status(
                Response.Status.GATEWAY_TIMEOUT,
                new ApiError("carrier_timeout", exception.getMessage(), exception.downstreamStatus()));
    }

    @ServerExceptionMapper
    RestResponse&lt;ApiError&gt; mapCarrierInvocation(CarrierInvocationException exception) {
        return RestResponse.status(
                Response.Status.BAD_GATEWAY,
                new ApiError("carrier_invocation_failed", exception.getMessage(), exception.downstreamStatus()));
    }
}</code></code></pre><p>Each mapper picks a deliberate HTTP status. <code>504</code> for timeout, <code>503</code> for downstream outage, <code>404</code> for unknown tracking ID, <code>502</code> for everything else that failed before a usable response arrived. Callers and monitors can tell these apart without reading stack traces.</p><h2><strong>Configure it</strong></h2><p>Add <code>src/main/resources/application.properties</code>:</p><pre><code><code>carrier.api.url=http://localhost:8089

quarkus.rest-client."carrier-api".url=${carrier.api.url}
quarkus.rest-client."carrier-api".connect-timeout=100
quarkus.rest-client."carrier-api".read-timeout=200
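# Rough budget sketch (an upper-bound estimate, not a guarantee):
#   per attempt  <= connect-timeout + read-timeout = 100 ms + 200 ms = 300 ms
#   with 1 retry <= 2 x 300 ms + (50 ms delay + 25 ms jitter) = 675 ms
# The retry values come from the fault-tolerance properties further down in this file.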

quarkus.rest-client.logging.scope=request-response
quarkus.rest-client.logging.body-limit=120
quarkus.rest-client.logging.masked-headers=Authorization,Cookie,X-Carrier-Key

quarkus.log.category."org.jboss.resteasy.reactive.client.logging".min-level=DEBUG
quarkus.log.category."org.jboss.resteasy.reactive.client.logging".level=DEBUG

quarkus.fault-tolerance."org.acme.carrier.bridge.TrackingService/fetchTracking".retry.max-retries=1
quarkus.fault-tolerance."org.acme.carrier.bridge.TrackingService/fetchTracking".retry.delay=50
quarkus.fault-tolerance."org.acme.carrier.bridge.TrackingService/fetchTracking".retry.delay-unit=millis
quarkus.fault-tolerance."org.acme.carrier.bridge.TrackingService/fetchTracking".retry.jitter=25
quarkus.fault-tolerance."org.acme.carrier.bridge.TrackingService/fetchTracking".retry.jitter-unit=millis</code></code></pre><p><code>carrier.api.url</code> - Base URL for the downstream carrier. In tests, the <code>%test.</code> profile override below points this at the WireMock Dev Service port.</p><p><code>connect-timeout=100</code><strong> and </strong><code>read-timeout=200</code> - One attempt gets about 300 ms before the client gives up. Without these, Quarkus falls back to 15 s connect and 30 s read. That is a long time to block a caller-facing thread.</p><p><code>logging.scope=request-response</code> - Logs outbound requests and responses. Useful for debugging, dangerous without header masking.</p><p><code>logging.body-limit=120</code> - Caps each logged body at 120 characters. Downstream JSON may still appear in logs at this limit. This setting does not change what callers receive; it only caps what operators see in log lines.</p><p><code>logging.masked-headers</code> - Replaces matching header values with <code>&lt;hidden&gt;</code>. <strong>Setting this property replaces the default list.</strong> If you want <code>Authorization</code> and <code>Cookie</code> masked, keep them in your explicit list alongside any custom headers like <code>X-Carrier-Key</code>.</p><p><strong>REST client log category at DEBUG</strong> - Request-response logging does not show up in tests unless this category is enabled. The redaction test depends on it.</p><p><strong>Fault tolerance properties</strong> - One retry, 50 ms base delay, 25 ms jitter. Worst case for a permanent <code>503</code> is two outbound attempts plus roughly 75 ms of backoff, still inside a sub-second caller budget.</p><p>Point the REST client at the WireMock Dev Service during tests. 
Create <code>src/test/resources/application.properties</code>:</p><pre><code><code>%test.carrier.api.url=http://localhost:${quarkus.wiremock.devservices.port}</code></code></pre><p>The <code>${quarkus.wiremock.devservices.port}</code> expression is published by the <a href="https://docs.quarkiverse.io/quarkus-wiremock/dev/index.html">Quarkus WireMock extension</a> when the Dev Service starts. Main <code>application.properties</code> keeps <code>carrier.api.url=http://localhost:8089</code> for manual dev runs against a real or standalone stub.</p><h2><strong>Make it survive</strong></h2><h3><strong>Retry only where repetition is safe</strong></h3><p>The <code>@Retry</code> on <code>fetchTracking</code> is scoped to <code>CarrierUnavailableException</code>. The call is a read, and the failure is a transient outage. Do not copy this pattern onto charge, create, or submit endpoints without idempotency keys. One duplicate tracking lookup is annoying. One duplicate charge is an incident.</p><h3><strong>Keep the retry budget small</strong></h3><p><code>max-retries=1</code> means one extra attempt after the first failure, not an open-ended loop. Under load, generous retry policies turn one slow dependency into a traffic multiplier. If you need more than one retry, you probably need a circuit breaker or async recovery path instead of another blind repeat.</p><h3><strong>Separate failure shapes for callers and operators</strong></h3><p>A timeout (<code>504</code>), a downstream outage (<code>503</code>), a missing ID (<code>404</code>), and a broken transport call (<code>502</code>) should not collapse into one generic error. The mappers above give callers that separation. Quarkus REST Client also exposes <code>http.clients</code> metrics for declarative clients when Micrometer is on the classpath. 
That is worth wiring in production, but metrics deserve their own article once the failure mapping is correct.</p><p>Response body logging can still leak operational detail even when headers are masked. Treat log access with the same care as credential storage. Do not forward raw downstream error JSON to your public API; the <code>ApiError</code> record is the contract.</p><h2><strong>Prove it</strong></h2><h3><strong>Stub helper</strong></h3><p>The Quarkus WireMock extension injects a <code>WireMock</code> client when the test class carries <code>@ConnectWireMock</code>. Keep the stub recipes in a small helper so the test class stays readable.</p><p>Create <code>src/test/java/org/acme/carrier/bridge/CarrierStubs.java</code>:</p><pre><code><code>package org.acme.carrier.bridge;

import static com.github.tomakehurst.wiremock.client.WireMock.aResponse;
import static com.github.tomakehurst.wiremock.client.WireMock.get;
import static com.github.tomakehurst.wiremock.client.WireMock.getRequestedFor;
import static com.github.tomakehurst.wiremock.client.WireMock.okJson;
import static com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo;

import com.github.tomakehurst.wiremock.client.WireMock;
import com.github.tomakehurst.wiremock.stubbing.Scenario;

final class CarrierStubs {

    private CarrierStubs() {
    }

    static void reset(WireMock wireMock) {
        wireMock.resetMappings();
        wireMock.resetRequests();
        wireMock.resetScenarios();
    }

    static void stubSuccess(WireMock wireMock, String trackingId) {
        wireMock.register(get(urlEqualTo(path(trackingId)))
                .willReturn(okJson(successBody(trackingId, "IN_TRANSIT", "2026-06-05T12:30:00Z"))));
    }

    static void stubSlow(WireMock wireMock, String trackingId) {
        wireMock.register(get(urlEqualTo(path(trackingId)))
                .willReturn(aResponse()
                        .withStatus(200)
                        .withHeader("Content-Type", "application/json")
                        .withFixedDelay(450)
                        .withBody(successBody(trackingId, "IN_TRANSIT", "2026-06-05T12:30:00Z"))));
    }

    static void stubUnavailableThenSuccess(WireMock wireMock, String trackingId) {
        String scenarioName = "carrier-retry-" + trackingId;
        wireMock.register(get(urlEqualTo(path(trackingId)))
                .inScenario(scenarioName)
                .whenScenarioStateIs(Scenario.STARTED)
                .willReturn(aResponse()
                        .withStatus(503)
                        .withHeader("Content-Type", "application/json")
                        .withBody(errorBody("carrier unavailable")))
                .willSetStateTo("recovered"));

        wireMock.register(get(urlEqualTo(path(trackingId)))
                .inScenario(scenarioName)
                .whenScenarioStateIs("recovered")
                .willReturn(okJson(successBody(trackingId, "DELIVERED", "2026-06-05T12:31:00Z"))));
    }

    static void stubUnavailable(WireMock wireMock, String trackingId) {
        wireMock.register(get(urlEqualTo(path(trackingId)))
                .willReturn(aResponse()
                        .withStatus(503)
                        .withHeader("Content-Type", "application/json")
                        .withBody(errorBody("carrier unavailable"))));
    }

    static void stubNotFound(WireMock wireMock, String trackingId) {
        wireMock.register(get(urlEqualTo(path(trackingId)))
                .willReturn(aResponse()
                        .withStatus(404)
                        .withHeader("Content-Type", "application/json")
                        .withBody(errorBody("unknown tracking"))));
    }

    static int requestCount(WireMock wireMock, String trackingId) {
        return wireMock.findAll(getRequestedFor(urlEqualTo(path(trackingId)))).size();
    }

    private static String path(String trackingId) {
        return "/carrier-api/tracking/" + trackingId;
    }

    private static String successBody(String trackingId, String status, String lastUpdated) {
        return """
                {
                  "trackingId": "%s",
                  "carrier": "Parcel Rocket",
                  "status": "%s",
                  "lastUpdated": "%s"
                }
                """.formatted(trackingId, status, lastUpdated);
    }

    private static String errorBody(String message) {
        return """
                {
                  "error": "%s"
                }
                """.formatted(message);
    }
}</code></code></pre><p><code>stubSlow</code> uses a 450 ms fixed delay against a 200 ms read timeout, so the request in the timeout test fails because of the read timeout and nothing else. <code>stubUnavailableThenSuccess</code> uses WireMock scenarios to return <code>503</code> once, then <code>200</code> on the second call.</p><h3><strong>Log capture helper</strong></h3><p>Create <code>src/test/java/org/acme/carrier/bridge/InMemoryLogHandler.java</code>:</p><pre><code><code>package org.acme.carrier.bridge;

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Handler;
import java.util.logging.LogRecord;

final class InMemoryLogHandler extends Handler {

    private final List&lt;String&gt; messages = new CopyOnWriteArrayList&lt;&gt;();

    @Override
    public void publish(LogRecord record) {
        if (record != null) {
            messages.add(record.getMessage());
        }
    }

    @Override
    public void flush() {
        // nothing to flush
    }

    @Override
    public void close() {
        messages.clear();
    }

    void clear() {
        messages.clear();
    }

    String joinedMessages() {
        return String.join("\n", messages);
    }
}</code></code></pre><h3><strong>Failure tests</strong></h3><p>Create <code>src/test/java/org/acme/carrier/bridge/TrackingResourceTest.java</code>:</p><pre><code><code>package org.acme.carrier.bridge;

import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.nullValue;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.logging.Level;
import java.util.logging.Logger;

import com.github.tomakehurst.wiremock.client.WireMock;

import io.quarkiverse.wiremock.devservice.ConnectWireMock;
import io.quarkus.test.common.http.TestHTTPEndpoint;
import io.quarkus.test.junit.QuarkusTest;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

@QuarkusTest
@ConnectWireMock
@TestHTTPEndpoint(TrackingResource.class)
class TrackingResourceTest {

    private static final String REST_CLIENT_LOG_CATEGORY = "org.jboss.resteasy.reactive.client.logging";

    private static Logger restClientLogger;
    private static InMemoryLogHandler logHandler;

    WireMock wiremock;

    @BeforeAll
    static void installLogHandler() {
        restClientLogger = Logger.getLogger(REST_CLIENT_LOG_CATEGORY);
        logHandler = new InMemoryLogHandler();
        restClientLogger.addHandler(logHandler);
        restClientLogger.setLevel(Level.FINE);
    }

    @AfterAll
    static void removeLogHandler() {
        if (restClientLogger != null &amp;&amp; logHandler != null) {
            restClientLogger.removeHandler(logHandler);
        }
    }

    @BeforeEach
    void resetState() {
        CarrierStubs.reset(wiremock);
        logHandler.clear();
    }

    @Test
    void returnsTrackingStatus() {
        CarrierStubs.stubSuccess(wiremock, "TRACK-123");

        given()
                .when().get("/TRACK-123")
                .then()
                .statusCode(200)
                .body("trackingId", equalTo("TRACK-123"))
                .body("carrier", equalTo("Parcel Rocket"))
                .body("status", equalTo("IN_TRANSIT"))
                .body("lastUpdated", equalTo("2026-06-05T12:30:00Z"));
    }

    @Test
    void returnsGatewayTimeoutWhenCarrierIsSlow() {
        CarrierStubs.stubSlow(wiremock, "TRACK-SLOW");

        given()
                .when().get("/TRACK-SLOW")
                .then()
                .statusCode(504)
                .body("code", equalTo("carrier_timeout"))
                .body("message", equalTo("Carrier API did not respond before the outbound read timeout."))
                .body("downstreamStatus", nullValue());
    }
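
    // Hedged sketch, not part of the original suite: it checks the claim that a read
    // timeout is not retried. Assumption: WireMock journals the request on receipt,
    // before the 450 ms fixed delay elapses, so the count is already visible here.
    @Test
    void doesNotRetryAfterReadTimeout() {
        CarrierStubs.stubSlow(wiremock, "TRACK-SLOW-ONCE");

        given()
                .when().get("/TRACK-SLOW-ONCE")
                .then()
                .statusCode(504);

        assertEquals(1, CarrierStubs.requestCount(wiremock, "TRACK-SLOW-ONCE"));
    }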

    @Test
    void retriesOnceAndSucceedsAfterTransientFailure() {
        CarrierStubs.stubUnavailableThenSuccess(wiremock, "TRACK-RETRY");

        given()
                .when().get("/TRACK-RETRY")
                .then()
                .statusCode(200)
                .body("trackingId", equalTo("TRACK-RETRY"))
                .body("status", equalTo("DELIVERED"));

        assertEquals(2, CarrierStubs.requestCount(wiremock, "TRACK-RETRY"));
    }

    @Test
    void returnsServiceUnavailableAfterPermanentCarrierFailure() {
        CarrierStubs.stubUnavailable(wiremock, "TRACK-DOWN");

        given()
                .when().get("/TRACK-DOWN")
                .then()
                .statusCode(503)
                .body("code", equalTo("carrier_unavailable"))
                .body("message", equalTo("Carrier API is temporarily unavailable."))
                .body("downstreamStatus", equalTo(503));

        assertEquals(2, CarrierStubs.requestCount(wiremock, "TRACK-DOWN"));
    }

    @Test
    void returnsNotFoundWhenCarrierDoesNotKnowTrackingId() {
        CarrierStubs.stubNotFound(wiremock, "TRACK-MISSING");

        given()
                .when().get("/TRACK-MISSING")
                .then()
                .statusCode(404)
                .body("code", equalTo("tracking_not_found"))
                .body("message", equalTo("Carrier API could not find tracking ID 'TRACK-MISSING'."))
                .body("downstreamStatus", equalTo(404));

        assertEquals(1, CarrierStubs.requestCount(wiremock, "TRACK-MISSING"));
    }

    @Test
    void masksSensitiveHeadersInRestClientLogs() {
        CarrierStubs.stubSuccess(wiremock, "TRACK-LOGS");

        given()
                .when().get("/TRACK-LOGS")
                .then()
                .statusCode(200);

        String logs = logHandler.joinedMessages();
        assertFalse(logs.contains(CarrierAuthFilter.DEMO_BEARER_TOKEN));
        assertFalse(logs.contains(CarrierAuthFilter.DEMO_API_KEY));
        assertTrue(logs.contains("Authorization"));
        assertTrue(logs.contains("X-Carrier-Key"));
        assertTrue(logs.contains("&lt;hidden&gt;"));
    }
}</code></code></pre><p>Before you run the retry test, predict how many calls WireMock should see for <code>TRACK-RETRY</code>. The answer is two: first call gets <code>503</code>, retry gets <code>200</code>. For <code>TRACK-DOWN</code> it is also two, then the API returns <code>503</code> to the caller. For <code>TRACK-MISSING</code> it is one, because <code>404</code> is not in <code>retryOn</code>.</p><p>Run the suite:</p><pre><code><code>./mvnw test</code></code></pre><p>All six tests should pass. The slow-downstream test proves the read timeout surfaces as <code>504</code>. The permanent failure test proves retries stop after one extra attempt. The redaction test proves fake secrets never appear in captured log lines.</p><h2><strong>Conclusion</strong></h2><p>We built an outbound HTTP path that fails on purpose instead of by accident. Explicit timeouts stop slow dependencies from eating the caller&#8217;s budget, one bounded retry handles a transient <code>503</code> on a safe read, masked headers keep credentials out of logs, and separate error codes give callers something useful when the carrier misbehaves.</p><p>The complete code is on my <a href="https://github.com/myfear/the-main-thread/tree/main/carrier-bridge">GitHub</a> for you to check out.</p>]]></content:encoded></item><item><title><![CDATA[AI Coding Governance for Real Software Teams]]></title><description><![CDATA[A practical JCon recap on context engineering, bounded tasks, MCP-style tooling, review fatigue, and why Java teams still give agents better 
rails.]]></description><link>https://www.the-main-thread.com/p/ai-coding-real-systems</link><guid isPermaLink="false">https://www.the-main-thread.com/p/ai-coding-real-systems</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Mon, 08 Jun 2026 06:08:11 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2b3e167e-a07f-4f8c-92db-b4c1f741b060_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The slide I care about most from my recent <a href="https://jcon.one/">JCon</a> keynote only had six words on it: code is cheap, software isn&#8217;t.</p><p>If you want the full talk, <a href="https://www.youtube.com/watch?v=Di4ii6Xsb9A">the recording is here</a> and <a href="https://speakerdeck.com/myfear/code-is-cheap-software-isnt">you can find the slides here</a>. This article is the shorter version I would hand to a team before they let an agent loose in a real repository.</p><p>The exciting part of AI coding is obvious. You ask for a REST endpoint, a refactor, a migration sketch, or a first-pass UI, and the blank page disappears. That part is real. I use these tools too. They save time, remove some boring work, and make it easier to try things quickly.</p><p>The problem starts when people confuse easier code generation with easier software engineering.</p><p>Real systems are full of hidden contracts. Authorization rules, deployment assumptions, stale docs, weird integration edges, and business logic nobody wrote down but everybody now depends on. A model can infer some of that from nearby code. It cannot infer all of it just because the prompt sounded confident.</p><p>That is still the main lesson for me after the last 18 months. AI can draft a lot of syntax. It does not automatically understand the software around it.</p><h2><strong>Prompting is not the hard part</strong></h2><p>I keep seeing teams treat AI coding as a prompting problem. Write better prompts. Add more detail. Find the right magic spell. 
That helps a bit, but it is not the part I trust.</p><p>The bigger lever is context.</p><p>If the agent does not know your architecture constraints, your business boundary, your non-negotiable tests, or the parts of the repository it should not touch, it will make something up. That is not a moral failure. That is how the system works. Statistical tools fill gaps with likely-looking answers.</p><p>So I would spend less time chasing clever wording and more time building context the model can actually use.</p><p>That means repository rules. It means ADRs. It means decision logs. It means tests that define behavior clearly enough that the agent has something better than vibes. It also means being selective. Dumping a pile of stale internal docs into the context window is not context engineering. It is just a different way to confuse the model.</p><p>The real question is not &#8220;how much context can I fit?&#8221; It is &#8220;which context changes the quality of the decision?&#8221;</p><h2><strong>Big tasks still fail for familiar reasons</strong></h2><p>AI has not rescued us from decomposition.</p><p>If a change is too messy to explain in two or three clear sentences, it is probably too messy to hand to an agent as one job. The tool may still produce a lot of output. That is not the same as producing a controlled result.</p><p>This is one place where the AI discourse sometimes sounds weirdly ahistorical. We already know big-bang rewrites fail. We already know broad &#8220;clean this whole area up while you are there&#8221; work spreads risk faster than teams can review it. 
Why would an agent be the magical exception?</p><p>The pattern that keeps holding up is still the old one:</p><ul><li><p>Smaller tasks</p></li><li><p>Tight boundaries</p></li><li><p>Fast verification</p></li><li><p>Easy rollback</p></li><li><p>Explicit ownership</p></li></ul><p>That is less glamorous than &#8220;fully autonomous development,&#8221; but it is much closer to how production teams stay sane.</p><p>I also think local git matters more than people admit. When an agent goes off the rails, the difference between a useful experiment and an annoying afternoon is often whether you can inspect the diff quickly, throw it away, and try again with a narrower request.</p><h2><strong>The useful tooling exposes systems, not just text</strong></h2><p>One reason I care about MCP and similar tool surfaces is that they shift the interaction away from pure guessing.</p><p>A model is much more useful when it can inspect runtime state, read the logs that matter, query the system you are actually changing, or reach structured docs instead of paraphrasing what it half-remembers. That does not make the model magical. It just gives it better ground to stand on.</p><p>For me, that is the real promise of this layer. Not &#8220;the model can use tools&#8221; as a demo trick. The better promise is that the model can stop pretending text alone is enough to understand a live system.</p><p>The same rule still applies, though: more tools are not automatically better. Tool sprawl creates its own tax. A giant tool catalog, a huge context payload, and 10 overlapping ways to do the same thing can make the session worse, not better. Good AI workflows need a shaped surface just as much as they need a capable model.</p><h2><strong>The bill comes back during review</strong></h2><p>This is the part I think teams still underestimate.</p><p>AI output is fast to produce and often expensive to verify. That cost does not show up in the demo. 
It shows up when a senior engineer has to read a clean-looking diff and decide whether the system still deserves trust.</p><p>That review load is real work. You are checking hidden assumptions, edge cases, failure paths, auth behavior, operational risk, naming drift, and whether the tests prove anything useful or just mirror the implementation. If the change touches infrastructure, security, or business rules, the cognitive bill gets even higher.</p><p>This is why I do not trust raw productivity claims that stop at code generation speed. A fast draft plus a slow, exhausting review loop is not automatically a win. Sometimes it is. Sometimes it is just a different queue.</p><p>The failure mode is subtle because tired reviewers still look productive from the outside. Files changed. The pull request is large. CI passed. Everybody feels movement. But fatigue is not the same as confidence, and momentum is not the same as understanding.</p><p>If I had to keep one professional rule from all of this, it would be simple: if you do not understand an AI-generated change well enough to explain its failure mode, do not merge it.</p><h2><strong>Java is in a stronger position than the hype suggests</strong></h2><p>A lot of AI coding conversation still defaults to Python because the AI ecosystem grew up around Python-first tooling. That does not mean Python teams are automatically in a better engineering position.</p><p>Java has a very practical advantage here. The code is explicit. Types are visible. Contracts are clearer. Framework conventions are stronger. Build and runtime boundaries are easier to follow than in many loosely structured stacks. That kind of shape helps humans review faster, and it helps models stay closer to the rails.</p><p>I do not mean Java makes AI safe. It does not. 
I mean Java gives both the model and the reviewer more structure to work with, which is one reason I think enterprise Java teams are better positioned for this transition than the public AI narrative suggests.</p><p>If anything, Java teams should lean into that advantage on purpose. Strong tests, typed config, explicit boundaries, and boring conventions are not old habits the AI era made irrelevant. They are what make the AI era survivable.</p><h2><strong>What I would actually tell a team</strong></h2><p>If I had to compress the whole keynote into one short working agreement, it would look like this:</p><ol><li><p>Give the model better context, not just longer prompts.</p></li><li><p>Keep tasks small enough that verification stays cheap.</p></li><li><p>Use tools that expose real system state when the task depends on real system state.</p></li><li><p>Treat review load as part of the cost, not as free cleanup after the model is done.</p></li><li><p>Keep a human owner attached to every production change.</p></li></ol><p>That is not a revolutionary message. It is mostly software engineering refusing to disappear just because the draft got cheaper.</p><p>AI is changing how we build. I do not think that part is controversial anymore. The part people still keep relearning is that faster code generation does not remove the expensive parts of software. Intent is still expensive. Architecture is still expensive. Verification is still expensive. Ownership is still expensive.</p><p>That is why the sentence on the slide still holds up.</p><p>Code is cheap. 
Software isn&#8217;t.</p><div id="youtube2-Di4ii6Xsb9A" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Di4ii6Xsb9A&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Di4ii6Xsb9A?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div>]]></content:encoded></item><item><title><![CDATA[Quarkus Signals: Build In-Process Messaging Without a Broker]]></title><description><![CDATA[Use experimental Quarkus Signals for publish, send, and request-reply inside one app, and see when CDI events, Reactive Messaging, or the Vert.x EventBus still fit better.]]></description><link>https://www.the-main-thread.com/p/quarkus-signals-vs-cdi-events</link><guid isPermaLink="false">https://www.the-main-thread.com/p/quarkus-signals-vs-cdi-events</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Sun, 07 Jun 2026 06:08:27 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c7fdafdc-f8bb-4022-a243-29fe56ff6673_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Quarkus has <a href="https://quarkus.io/guides/signals">Signals</a> now. That is easy to say, but it does not tell you why you would use them.</p><p>I was curious about what Signals replaces. 
Y&#8217;all probably ask the same question right away: why not CDI events?</p><p>If the answer were &#8220;CDI events, but async,&#8221; this would be a short article and not a very interesting feature. CDI already gives us multicast events and async observers. Signals matter because they pull three in-process messaging patterns into one API:</p><ul><li><p><code>publish()</code> for multicast notification</p></li><li><p><code>send()</code> for unicast work dispatch</p></li><li><p><code>request()</code> for typed request-reply</p></li></ul><p>That is the gap. CDI events are still great for observer-style fan-out. Reactive Messaging is still the right tool when a broker, connectors, or backpressure are part of the problem. Vert.x EventBus can cover the same ground too, but with string addresses and a lower-level model. Signals sit in the middle: in-process, type-safe, async by default, and explicit about whether you are broadcasting, dispatching work, or asking another component for an answer.</p><p><a href="https://quarkus.io/guides/signals">Signals</a> ship as a new <strong>experimental</strong> extension in <a href="https://quarkus.io/blog/quarkus-3-36-released/">Quarkus 3.36.0</a>.</p><h2><strong>What we build</strong></h2><p>We build <strong>NebulaTrack</strong>, a small cloud cost monitor in a command-mode Quarkus app (no REST). It detects cost anomalies, fans out alerts, dispatches remediation work to one worker, and asks a pricing component for estimates.</p><p>When you finish, you have:</p><ul><li><p>three signal patterns wired with <code>@Receives</code> receivers</p></li><li><p>qualifier lanes (<code>@Default</code>, <code>@Critical</code>, <code>@Any</code>)</p></li><li><p>metadata on emissions</p></li><li><p>programmatic receiver registration</p></li><li><p><code>@QuarkusTest</code> proof for every behavior</p></li></ul><p>Signals are not an HTTP feature. 
A command-mode app keeps that obvious.</p><h2><strong>Prerequisites</strong></h2><p>You need a current JDK, Maven or the Quarkus CLI, and about one &#9749;&#65039;.</p><ul><li><p><strong>JDK 25</strong> (this project targets Java 25)</p></li><li><p><strong>Maven 3.9+</strong> or the <a href="https://quarkus.io/guides/cli">Quarkus CLI</a></p></li><li><p>Familiarity with CDI injection</p></li></ul><p>You can grab the <a href="https://github.com/myfear/the-main-thread/tree/main/nebulatrack-signals">full source code from my repository</a> if you don&#8217;t want to follow along.</p><h2><strong>Project setup</strong></h2><p>Create the application without codestarts so we stay out of the web stack:</p><pre><code><code>quarkus create app dev.quarkex:nebulatrack-signals \
  --extension='quarkus-signals' \
  --java=25 \
  --no-code</code></code></pre><p>Under <code>src/main/java</code>, create the package <code>dev.quarkex.nebulatrack</code> and the subpackages used below (<code>model</code>, <code>qualifier</code>, <code>service</code>, <code>support</code>).</p><p>Add two test dependencies for AssertJ and Awaitility:</p><pre><code><code>&lt;dependency&gt;
  &lt;groupId&gt;org.assertj&lt;/groupId&gt;
  &lt;artifactId&gt;assertj-core&lt;/artifactId&gt;
  &lt;version&gt;3.27.3&lt;/version&gt;
  &lt;scope&gt;test&lt;/scope&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
  &lt;groupId&gt;org.awaitility&lt;/groupId&gt;
  &lt;artifactId&gt;awaitility&lt;/artifactId&gt;
  &lt;scope&gt;test&lt;/scope&gt;
&lt;/dependency&gt;</code></code></pre><p>The only production extension we need is <code>quarkus-signals</code>. You do not need Vert.x on the classpath for blocking receivers. Vert.x only enters the story when you move into non-blocking execution or <code>Uni</code>-based receivers.</p><p><strong>Experimental:</strong> treat API and semantics as subject to change until the extension graduates.</p><h3><strong>Verify</strong></h3><p>From the project root:</p><pre><code><code>./mvnw test</code></code></pre><p>An empty or placeholder test should compile and pass before we add behavior.</p><h2><strong>Domain records</strong></h2><p>Create <code>src/main/java/dev/quarkex/nebulatrack/model/Severity.java</code>:</p><pre><code><code>package dev.quarkex.nebulatrack.model;

public enum Severity {
    NORMAL,
    CRITICAL
}</code></code></pre><p>Create <code>CostAnomaly.java</code>, <code>RemediationRequest.java</code>, <code>EstimateRequest.java</code>, and <code>CostEstimate.java</code>:</p><pre><code><code>package dev.quarkex.nebulatrack.model;

public record CostAnomaly(String region, double hourlyDelta, Severity severity) {
}</code></code></pre><pre><code><code>package dev.quarkex.nebulatrack.model;

public record RemediationRequest(String region, String action) {
}</code></code></pre><pre><code><code>package dev.quarkex.nebulatrack.model;

public record EstimateRequest(String service, int units) {
}</code></code></pre><pre><code><code>package dev.quarkex.nebulatrack.model;

import java.math.BigDecimal;

public record CostEstimate(String service, int units, BigDecimal monthlyCost) {
}</code></code></pre><p>For the &#8220;no matching receiver&#8221; test later, add <code>UnmatchedEstimateRequest.java</code> with no receivers:</p><pre><code><code>package dev.quarkex.nebulatrack.model;

/**
 * Signal type with no registered receivers &#8212; used to prove {@code request()} returns {@code null}.
 */
public record UnmatchedEstimateRequest(String service, int units) {
}</code></code></pre><h2><strong>Test ledger</strong></h2><p>Receivers run asynchronously, so tests need a place they can poll without guessing about timing. This bean is plain on purpose. Each receiver just records what happened:</p><pre><code><code>package dev.quarkex.nebulatrack.support;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

import jakarta.enterprise.context.ApplicationScoped;

import dev.quarkex.nebulatrack.model.CostAnomaly;

@ApplicationScoped
public class InMemoryLedger {

    private final CopyOnWriteArrayList&lt;CostAnomaly&gt; anomalyEvents = new CopyOnWriteArrayList&lt;&gt;();
    private final AtomicInteger alertCount = new AtomicInteger();
    private final AtomicInteger auditCount = new AtomicInteger();
    private final AtomicInteger dashboardCount = new AtomicInteger();
    private final AtomicInteger workerACount = new AtomicInteger();
    private final AtomicInteger workerBCount = new AtomicInteger();
    private final AtomicInteger defaultLaneCount = new AtomicInteger();
    private final AtomicInteger criticalLaneCount = new AtomicInteger();
    private final AtomicInteger catchAllCount = new AtomicInteger();
    private final AtomicInteger pluginCount = new AtomicInteger();
    private final CopyOnWriteArrayList&lt;Map&lt;String, Object&gt;&gt; metadataSnapshots = new CopyOnWriteArrayList&lt;&gt;();
    private final CopyOnWriteArrayList&lt;UUID&gt; requestScopeIds = new CopyOnWriteArrayList&lt;&gt;();

    public void recordAnomaly(CostAnomaly anomaly) {
        anomalyEvents.add(anomaly);
    }

    public void recordAlert() {
        alertCount.incrementAndGet();
    }

    public void recordAudit() {
        auditCount.incrementAndGet();
    }

    public void recordDashboardRefresh() {
        dashboardCount.incrementAndGet();
    }

    public void recordWorkerA() {
        workerACount.incrementAndGet();
    }

    public void recordWorkerB() {
        workerBCount.incrementAndGet();
    }

    public void recordDefaultLane() {
        defaultLaneCount.incrementAndGet();
    }

    public void recordCriticalLane() {
        criticalLaneCount.incrementAndGet();
    }

    public void recordCatchAll() {
        catchAllCount.incrementAndGet();
    }

    public void recordPlugin() {
        pluginCount.incrementAndGet();
    }

    public void recordMetadata(Map&lt;String, Object&gt; metadata) {
        metadataSnapshots.add(Map.copyOf(metadata));
    }

    public void recordRequestScopeId(UUID id) {
        requestScopeIds.add(id);
    }

    public List&lt;CostAnomaly&gt; anomalyEvents() {
        return Collections.unmodifiableList(new ArrayList&lt;&gt;(anomalyEvents));
    }

    public int alertCount() {
        return alertCount.get();
    }

    public int auditCount() {
        return auditCount.get();
    }

    public int dashboardCount() {
        return dashboardCount.get();
    }

    public int workerACount() {
        return workerACount.get();
    }

    public int workerBCount() {
        return workerBCount.get();
    }

    public int defaultLaneCount() {
        return defaultLaneCount.get();
    }

    public int criticalLaneCount() {
        return criticalLaneCount.get();
    }

    public int catchAllCount() {
        return catchAllCount.get();
    }

    public int pluginCount() {
        return pluginCount.get();
    }

    public List&lt;Map&lt;String, Object&gt;&gt; metadataSnapshots() {
        return Collections.unmodifiableList(new ArrayList&lt;&gt;(metadataSnapshots));
    }

    public List&lt;UUID&gt; requestScopeIds() {
        return Collections.unmodifiableList(new ArrayList&lt;&gt;(requestScopeIds));
    }

    public void reset() {
        anomalyEvents.clear();
        alertCount.set(0);
        auditCount.set(0);
        dashboardCount.set(0);
        workerACount.set(0);
        workerBCount.set(0);
        defaultLaneCount.set(0);
        criticalLaneCount.set(0);
        catchAllCount.set(0);
        pluginCount.set(0);
        metadataSnapshots.clear();
        requestScopeIds.clear();
    }
}</code></code></pre><h2><strong>Pattern 1: Publish for multicast</strong></h2><p><code>CostMonitor</code> emits an anomaly on the default lane:</p><pre><code><code>package dev.quarkex.nebulatrack.service;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.quarkex.nebulatrack.model.CostAnomaly;
import dev.quarkex.nebulatrack.model.Severity;
import io.quarkus.signals.Signal;

@ApplicationScoped
public class CostMonitor {

    private final Signal&lt;CostAnomaly&gt; anomalySignal;

    @Inject
    public CostMonitor(Signal&lt;CostAnomaly&gt; anomalySignal) {
        this.anomalySignal = anomalySignal;
    }

    public void detect() {
        anomalySignal.publish(new CostAnomaly("us-east-1", 340.0, Severity.NORMAL));
    }
}</code></code></pre><p>Add three receivers. <code>AlertService</code> and <code>DashboardRefresher</code> take the <code>CostAnomaly</code> payload directly; <code>AuditTrail</code> receives a <code>SignalContext</code> instead, which pays off when we attach metadata later:</p><pre><code><code>package dev.quarkex.nebulatrack.service;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.quarkex.nebulatrack.model.CostAnomaly;
import dev.quarkex.nebulatrack.support.InMemoryLedger;
import io.quarkus.signals.Receives;

@ApplicationScoped
public class AlertService {

    private final InMemoryLedger ledger;

    @Inject
    public AlertService(InMemoryLedger ledger) {
        this.ledger = ledger;
    }

    void onAnomaly(@Receives CostAnomaly anomaly) {
        ledger.recordAlert();
    }
}</code></code></pre><pre><code><code>package dev.quarkex.nebulatrack.service;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.quarkex.nebulatrack.model.CostAnomaly;
import dev.quarkex.nebulatrack.support.InMemoryLedger;
import io.quarkus.signals.Receives;
import io.quarkus.signals.SignalContext;

@ApplicationScoped
public class AuditTrail {

    private final InMemoryLedger ledger;

    @Inject
    public AuditTrail(InMemoryLedger ledger) {
        this.ledger = ledger;
    }

    void onAnomaly(@Receives SignalContext&lt;CostAnomaly&gt; ctx) {
        ledger.recordAudit();
        ledger.recordMetadata(ctx.metadata());
    }
}</code></code></pre><pre><code><code>package dev.quarkex.nebulatrack.service;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.quarkex.nebulatrack.model.CostAnomaly;
import dev.quarkex.nebulatrack.support.InMemoryLedger;
import io.quarkus.signals.Receives;

@ApplicationScoped
public class DashboardRefresher {

    private final InMemoryLedger ledger;

    @Inject
    public DashboardRefresher(InMemoryLedger ledger) {
        this.ledger = ledger;
    }

    void onAnomaly(@Receives CostAnomaly anomaly) {
        ledger.recordDashboardRefresh();
    }
}</code></code></pre><p><code>publish()</code> delivers to <strong>every</strong> matching receiver, asynchronously. Each receiver gets its own CDI request context. That is a bigger shift than &#8220;observer on another thread.&#8221;</p><h3><strong>Verify</strong></h3><p>Add <code>src/test/java/dev/quarkex/nebulatrack/NebulaTrackSignalsTest.java</code> with <code>@QuarkusTest</code>, inject <code>CostMonitor</code> and <code>InMemoryLedger</code>, and implement <code>publishNotifiesAllReceivers()</code>:</p><pre><code><code>@Test
void publishNotifiesAllReceivers() {
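    costMonitor.detect();

    // Receivers run asynchronously, so poll until every counter has moved.
    // await() is a static import from org.awaitility.Awaitility,
    // assertThat() from org.assertj.core.api.Assertions.
    await().atMost(10, TimeUnit.SECONDS).untilAsserted(() -&gt; {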
        assertThat(ledger.alertCount()).isGreaterThanOrEqualTo(1);
        assertThat(ledger.auditCount()).isGreaterThanOrEqualTo(1);
        assertThat(ledger.dashboardCount()).isGreaterThanOrEqualTo(1);
    });
}</code></code></pre><p>Before you run the test, predict the shape of the result. If <code>publish()</code> behaved like queue dispatch, only one counter would move. Here all three should move.</p><p>Run <code>./mvnw test -Dtest=NebulaTrackSignalsTest#publishNotifiesAllReceivers</code>. The test should pass once every counter has reached at least one.</p><h2><strong>Qualifier primer: </strong><code>@Default</code><strong> is not catch-all</strong></h2><p>Before we add more patterns, watch out for one trap if you come from CDI events. A CDI observer with no qualifier sees every event of its type. A Signals receiver with no qualifier listens on the <code>@Default</code> lane only, not on &#8220;any signal of this type.&#8221;</p><pre><code><code>void onAnomaly(@Receives CostAnomaly anomaly) {
    // default lane only
}</code></code></pre><p>To receive on every lane, add <code>@Any</code>:</p><pre><code><code>void onAnyAnomaly(@Receives @Any CostAnomaly anomaly) {
    // all qualifier lanes
}</code></code></pre><p>That difference matters enough to model directly:</p><pre><code><code>package dev.quarkex.nebulatrack.qualifier;

import java.lang.annotation.Retention;
import java.lang.annotation.Target;

import jakarta.enterprise.util.AnnotationLiteral;
import jakarta.inject.Qualifier;

import static java.lang.annotation.ElementType.FIELD;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.ElementType.PARAMETER;
import static java.lang.annotation.ElementType.TYPE;
import static java.lang.annotation.RetentionPolicy.RUNTIME;

@Qualifier
@Retention(RUNTIME)
@Target({ FIELD, METHOD, PARAMETER, TYPE })
public @interface Critical {

    final class Literal extends AnnotationLiteral&lt;Critical&gt; implements Critical {
        public static final Literal INSTANCE = new Literal();
    }
}</code></code></pre><pre><code><code>package dev.quarkex.nebulatrack.service;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.quarkex.nebulatrack.model.CostAnomaly;
import dev.quarkex.nebulatrack.support.InMemoryLedger;
import io.quarkus.signals.Receives;

@ApplicationScoped
public class DefaultLaneReceiver {

    private final InMemoryLedger ledger;

    @Inject
    public DefaultLaneReceiver(InMemoryLedger ledger) {
        this.ledger = ledger;
    }

    void general(@Receives CostAnomaly anomaly) {
        ledger.recordDefaultLane();
    }
}</code></code></pre><pre><code><code>package dev.quarkex.nebulatrack.service;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.quarkex.nebulatrack.model.CostAnomaly;
import dev.quarkex.nebulatrack.qualifier.Critical;
import dev.quarkex.nebulatrack.support.InMemoryLedger;
import io.quarkus.signals.Receives;

@ApplicationScoped
public class CriticalLaneReceiver {

    private final InMemoryLedger ledger;

    @Inject
    public CriticalLaneReceiver(InMemoryLedger ledger) {
        this.ledger = ledger;
    }

    void critical(@Receives @Critical CostAnomaly anomaly) {
        ledger.recordCriticalLane();
    }
}</code></code></pre><pre><code><code>package dev.quarkex.nebulatrack.service;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Any;
import jakarta.inject.Inject;

import dev.quarkex.nebulatrack.model.CostAnomaly;
import dev.quarkex.nebulatrack.support.InMemoryLedger;
import io.quarkus.signals.Receives;

@ApplicationScoped
public class CatchAllAnomalyReceiver {

    private final InMemoryLedger ledger;

    @Inject
    public CatchAllAnomalyReceiver(InMemoryLedger ledger) {
        this.ledger = ledger;
    }

    void catchAll(@Receives @Any CostAnomaly anomaly) {
        ledger.recordCatchAll();
    }
}</code></code></pre><p>Emit critical anomalies from an emitter that injects <strong>both</strong> <code>Signal&lt;CostAnomaly&gt;</code> (default lane) and <code>@Any Signal&lt;CostAnomaly&gt;</code> (for <code>select(Critical.Literal.INSTANCE)</code>):</p><pre><code><code>package dev.quarkex.nebulatrack.service;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Any;
import jakarta.inject.Inject;

import dev.quarkex.nebulatrack.model.CostAnomaly;
import dev.quarkex.nebulatrack.model.Severity;
import dev.quarkex.nebulatrack.qualifier.Critical;
import io.quarkus.signals.Signal;

@ApplicationScoped
public class CriticalAnomalyEmitter {

    private final Signal&lt;CostAnomaly&gt; defaultAnomalySignal;
    private final Signal&lt;CostAnomaly&gt; anyAnomalySignal;

    @Inject
    public CriticalAnomalyEmitter(
            Signal&lt;CostAnomaly&gt; defaultAnomalySignal,
            @Any Signal&lt;CostAnomaly&gt; anyAnomalySignal) {
        this.defaultAnomalySignal = defaultAnomalySignal;
        this.anyAnomalySignal = anyAnomalySignal;
    }

    public void publishCritical() {
        anyAnomalySignal.select(Critical.Literal.INSTANCE)
                .publish(new CostAnomaly("eu-west-1", 900.0, Severity.CRITICAL));
    }

    public void publishDefault() {
        defaultAnomalySignal.publish(new CostAnomaly("us-west-2", 120.0, Severity.NORMAL));
    }
}</code></code></pre><p>Publishing only on the <code>@Any</code> bean does <strong>not</strong> hit <code>@Default</code> receivers. That is the behavior we test next.</p><h3><strong>Verify</strong></h3><p>Before you run <code>defaultReceiverIgnoresCriticalLane()</code>, decide which counter should stay flat. If the answer is not <code>defaultLaneCount</code>, the CDI mental model is still in the driver&#8217;s seat.</p><p><code>defaultReceiverIgnoresCriticalLane()</code> publishes only on the critical lane and asserts the default-lane counter stays at zero while the critical counter moves.</p><h2><strong>Pattern 2: Send for unicast work</strong></h2><p>Sometimes you want one worker, not fan-out. <code>RemediationDispatcher</code> uses <code>send()</code>:</p><pre><code><code>package dev.quarkex.nebulatrack.service;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.quarkex.nebulatrack.model.RemediationRequest;
import io.quarkus.signals.Signal;

@ApplicationScoped
public class RemediationDispatcher {

    private final Signal&lt;RemediationRequest&gt; remediationSignal;

    @Inject
    public RemediationDispatcher(Signal&lt;RemediationRequest&gt; remediationSignal) {
        this.remediationSignal = remediationSignal;
    }

    public void dispatch(String region, String action) {
        remediationSignal.send(new RemediationRequest(region, action));
    }
}</code></code></pre><p>The workers are deliberately boring:</p><pre><code><code>package dev.quarkex.nebulatrack.service;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.quarkex.nebulatrack.model.RemediationRequest;
import dev.quarkex.nebulatrack.support.InMemoryLedger;
import io.quarkus.signals.Receives;

@ApplicationScoped
public class RemediationWorkerA {

    private final InMemoryLedger ledger;

    @Inject
    public RemediationWorkerA(InMemoryLedger ledger) {
        this.ledger = ledger;
    }

    void handle(@Receives RemediationRequest request) {
        ledger.recordWorkerA();
    }
}</code></code></pre><pre><code><code>package dev.quarkex.nebulatrack.service;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.quarkex.nebulatrack.model.RemediationRequest;
import dev.quarkex.nebulatrack.support.InMemoryLedger;
import io.quarkus.signals.Receives;

@ApplicationScoped
public class RemediationWorkerB {

    private final InMemoryLedger ledger;

    @Inject
    public RemediationWorkerB(InMemoryLedger ledger) {
        this.ledger = ledger;
    }

    void handle(@Receives RemediationRequest request) {
        ledger.recordWorkerB();
    }
}</code></code></pre><p><code>send()</code> picks one receiver in <strong>round-robin</strong> order. This is the first pattern CDI events do not model cleanly.</p><p>Receivers default to <strong>blocking</strong> execution when they return a plain value. That keeps the &#8220;no Vert.x required&#8221; story honest for this walkthrough.</p><h3><strong>Verify</strong></h3><p>Before you run <code>sendRoundRobinsBetweenWorkers()</code>, predict the split. Six sends should not wake both workers six times. It should land close to 3/3.</p><p><code>sendRoundRobinsBetweenWorkers()</code> sends six remediations and asserts worker A and B counts differ by at most one.</p><h2><strong>Pattern 3: Request for typed replies</strong></h2><p><code>BudgetService</code> asks <code>PricingEngine</code> for a <code>CostEstimate</code>:</p><pre><code><code>package dev.quarkex.nebulatrack.service;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.quarkex.nebulatrack.model.CostEstimate;
import dev.quarkex.nebulatrack.model.EstimateRequest;
import io.quarkus.signals.Signal;
import io.smallrye.mutiny.Uni;

@ApplicationScoped
public class BudgetService {

    private final Signal&lt;EstimateRequest&gt; estimateSignal;

    @Inject
    public BudgetService(Signal&lt;EstimateRequest&gt; estimateSignal) {
        this.estimateSignal = estimateSignal;
    }

    public CostEstimate estimateBlocking(String service, int units) {
        return estimateSignal.request(new EstimateRequest(service, units), CostEstimate.class);
    }

    public Uni&lt;CostEstimate&gt; estimateReactive(String service, int units) {
        return estimateSignal.reactive()
                .request(new EstimateRequest(service, units), CostEstimate.class);
    }
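
    // Hypothetical addition, not part of the original listing: a minimal sketch that
    // assumes the behavior described in the prose below, where request() returns null
    // when no receiver matches. Wrapping the blocking call in an Optional makes the
    // "no receiver" case explicit for callers instead of leaking a bare null.
    // The method name estimateOptional is an illustration only.
    public java.util.Optional&lt;CostEstimate&gt; estimateOptional(String service, int units) {
        return java.util.Optional.ofNullable(estimateBlocking(service, units));
    }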
}</code></code></pre><p>Receiver:</p><pre><code><code>package dev.quarkex.nebulatrack.service;

import java.math.BigDecimal;

import jakarta.enterprise.context.ApplicationScoped;

import dev.quarkex.nebulatrack.model.CostEstimate;
import dev.quarkex.nebulatrack.model.EstimateRequest;
import io.quarkus.signals.Receives;

@ApplicationScoped
public class PricingEngine {

    CostEstimate onEstimate(@Receives EstimateRequest request) {
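        // Flat demo rate of 0.12 per unit. BigDecimal.valueOf(0.12) has scale 2, so 500 units gives 60.00.
        BigDecimal monthlyCost = BigDecimal.valueOf(request.units()).multiply(BigDecimal.valueOf(0.12));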
        return new CostEstimate(request.service(), request.units(), monthlyCost);
    }
}</code></code></pre><p>I use a blocking return type here so the app runs without Vert.x. A <code>Uni</code>-returning receiver defaults to non-blocking execution. When Vert.x is part of the runtime, that usually means the event loop.</p><p>Resolution also considers the <strong>response type</strong>. If nothing matches, <code>request()</code> returns <code>null</code> &#8212; worth testing explicitly, because <code>publish()</code> and <code>send()</code> stay silent in the same situation.</p><h3><strong>Verify</strong></h3><ul><li><p><code>requestReturnsTypedEstimate()</code> &#8212; blocking path, 500 units at 0.12 &#8594; <code>60.00</code></p></li><li><p><code>requestReturnsTypedEstimateReactive()</code> &#8212; <code>reactive().request(...)</code></p></li><li><p><code>requestReturnsNullWhenNoReceiver()</code> &#8212; inject <code>Signal&lt;UnmatchedEstimateRequest&gt;</code> with no receivers</p></li></ul><h2><strong>Qualifiers in depth</strong></h2><p>The three lane receivers already show the rule. The last missing piece is the emitter side. Inject <code>@Any Signal&lt;CostAnomaly&gt;</code> when you need <code>select()</code> without carrying <code>@Default</code>:</p><pre><code><code>anyAnomalySignal.select(Critical.Literal.INSTANCE)
        .publish(new CostAnomaly("eu-west-1", 900.0, Severity.CRITICAL));</code></code></pre><p>Publish on the plain <code>Signal&lt;CostAnomaly&gt;</code> injection for the default lane only.</p><h3><strong>Verify</strong></h3><p>Before you run <code>criticalLaneAndCatchAllReceiver()</code>, count the expected catch-all hits first. It should see both emissions, not just the critical one.</p><p><code>criticalLaneAndCatchAllReceiver()</code> publishes once on default and once on critical; default, critical, and catch-all counters all move.</p><h2><strong>Metadata with </strong><code>SignalContext</code></h2><p>Attach metadata at emission time:</p><pre><code><code>anomalySignal.withMetadata("traceId", traceId)
        .withMetadata("tenant", tenant)
        .publish(new CostAnomaly("us-east-1", 340.0, Severity.NORMAL));</code></code></pre><p>Read it in a receiver that takes <code>SignalContext&lt;CostAnomaly&gt;</code>:</p><pre><code><code>void onAnomaly(@Receives SignalContext&lt;CostAnomaly&gt; ctx) {
    String traceId = (String) ctx.metadata().get("traceId");
    CostAnomaly anomaly = ctx.signal();
}</code></code></pre><p>Metadata is the honest built-in context story today. If you want automatic enrichment or interception, the extension already exposes SPI hooks such as <code>SignalMetadataEnricher</code> and <code>ReceiverInterceptor</code>. What it does <strong>not</strong> give you out of the box is automatic tracing or security propagation.</p><h3><strong>Verify</strong></h3><p><code>metadataVisibleInReceiver()</code> after <code>costMonitor.detectWithMetadata("abc-123", "acme")</code>.</p><h2><strong>Programmatic receivers</strong></h2><p>Runtime registration via <code>Receivers</code>:</p><pre><code><code>package dev.quarkex.nebulatrack.service;

import java.util.function.Consumer;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.quarkex.nebulatrack.model.CostAnomaly;
import dev.quarkex.nebulatrack.qualifier.Critical;
import dev.quarkex.nebulatrack.support.CostPlugin;
import io.quarkus.signals.Receivers;
import io.quarkus.signals.SignalContext;

@ApplicationScoped
public class PluginReceiverRegistrar {

    private final Receivers receivers;

    @Inject
    public PluginReceiverRegistrar(Receivers receivers) {
        this.receivers = receivers;
    }

    public Receivers.Registration register(CostPlugin plugin) {
        return receivers.newReceiver(CostAnomaly.class)
                .setQualifiers(Critical.Literal.INSTANCE)
                .setExecutionModel(Receivers.ExecutionModel.BLOCKING)
                .notify((Consumer&lt;SignalContext&lt;CostAnomaly&gt;&gt;) ctx -&gt; plugin.process(ctx.signal()));
    }
}</code></code></pre><p>The explicit <code>Consumer</code> cast avoids ambiguity between <code>notify(Consumer)</code> and <code>notify(Function)</code> overloads.</p><p>The guide documents runtime registration and unregistration, but it does not promise stronger concurrency semantics than that. For plugin-style infrastructure, treat registration as something you should test under load instead of assuming atomic visibility during concurrent emissions.</p><h3><strong>Verify</strong></h3><p><code>programmaticRegisterAndUnregister()</code> &#8212; register, publish critical, assert delivery; <code>unregister()</code>, publish again, count unchanged.</p><h2><strong>Request context per receiver</strong></h2><p>Each receiver invocation activates a <strong>new</strong> CDI request context. That is easy to prove with two tiny classes:</p><pre><code><code>package dev.quarkex.nebulatrack.support;

import java.util.UUID;

import jakarta.enterprise.context.RequestScoped;

@RequestScoped
public class InvocationTrace {

    private final UUID id = UUID.randomUUID();

    public UUID id() {
        return id;
    }
}</code></code></pre><pre><code><code>package dev.quarkex.nebulatrack.service;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.quarkex.nebulatrack.model.CostAnomaly;
import dev.quarkex.nebulatrack.support.InMemoryLedger;
import dev.quarkex.nebulatrack.support.InvocationTrace;
import io.quarkus.signals.Receives;

@ApplicationScoped
public class RequestScopeProbe {

    private final InMemoryLedger ledger;
    private final InvocationTrace trace;

    @Inject
    public RequestScopeProbe(InMemoryLedger ledger, InvocationTrace trace) {
        this.ledger = ledger;
        this.trace = trace;
    }

    void onAnomaly(@Receives CostAnomaly anomaly) {
        ledger.recordRequestScopeId(trace.id());
    }
}</code></code></pre><h3><strong>Verify</strong></h3><p><code>receiversGetIsolatedRequestScope()</code> &#8212; two <code>publish()</code> calls yield two different UUIDs in the ledger.</p><h2><strong>Optional command-mode entry point</strong></h2><p><code>NebulaTrackMain</code> implements <code>QuarkusApplication</code> with <code>@QuarkusMain</code> and runs a short demo sequence:</p><pre><code><code>package dev.quarkex.nebulatrack;

import jakarta.inject.Inject;

import dev.quarkex.nebulatrack.service.BudgetService;
import dev.quarkex.nebulatrack.service.CostMonitor;
import dev.quarkex.nebulatrack.service.RemediationDispatcher;
import io.quarkus.runtime.QuarkusApplication;
import io.quarkus.runtime.annotations.QuarkusMain;

@QuarkusMain
public class NebulaTrackMain implements QuarkusApplication {

    private final CostMonitor costMonitor;
    private final RemediationDispatcher remediationDispatcher;
    private final BudgetService budgetService;

    @Inject
    public NebulaTrackMain(
            CostMonitor costMonitor,
            RemediationDispatcher remediationDispatcher,
            BudgetService budgetService) {
        this.costMonitor = costMonitor;
        this.remediationDispatcher = remediationDispatcher;
        this.budgetService = budgetService;
    }

    @Override
    public int run(String... args) throws Exception {
        costMonitor.detect();
        remediationDispatcher.dispatch("us-east-1", "scale-down-idle-nodes");
        var estimate = budgetService.estimateBlocking("s3", 500);
        System.out.printf("NebulaTrack demo finished; sample estimate for %s: %s%n",
                estimate.service(), estimate.monthlyCost());
        return 0;
    }
}</code></code></pre><p>Quarkus command-mode testing usually works best as a mix: <code>@QuarkusMainTest</code> for CLI behavior and <code>@QuarkusTest</code> for internals. This article stays on the internal side, so we inject beans directly.</p><h2><strong>Make it survive</strong></h2><p><strong>Experimental extension</strong> &#8212; pin the Quarkus version in the article and in CI; expect API tweaks.</p><p><strong>Silent fan-out</strong> &#8212; <code>publish()</code> and <code>send()</code> succeed when no receiver matches. Fine for notifications; dangerous if you assume someone handled the work.</p><p><strong>Receiver failures</strong> &#8212; blocking <code>publish()</code> and <code>send()</code> log receiver failures instead of throwing them back to the caller. Blocking <code>request()</code> is different: it throws the receiver failure on the calling thread. Reactive emissions fail the returned <code>Uni</code>.</p><p><code>request()</code><strong> returns null</strong> &#8212; treat as part of the contract; test it.</p><p><strong>Qualifier resolution</strong> &#8212; <code>@Default</code> vs <code>@Any</code> trips people coming from CDI observers. Emit from the injection point that matches the lane you intend.</p><p><strong>Execution models</strong> &#8212; blocking receivers work without Vert.x. <code>Uni</code> receivers and <code>NON_BLOCKING</code> pull Vert.x into the execution story. Virtual-thread execution models need the right runtime support; in a minimal app, prefer blocking for receivers and programmatic hooks.</p><p><strong>Programmatic registration</strong> &#8212; runtime registration is useful for plugins, but the public guide does not promise stronger visibility guarantees than register/unregister itself. 
If this matters under load, test it under load.</p><p><strong>Metadata is not tracing</strong> &#8212; correlation IDs yes; automatic OpenTelemetry or security propagation no.</p><h2><strong>Prove it</strong></h2><p>From the module root:</p><pre><code><code>./mvnw test</code></code></pre><p>All tests in <code>NebulaTrackSignalsTest</code> should pass. That covers publish fan-out, send round-robin, blocking and reactive request, null request, qualifiers, metadata, programmatic register/unregister, and request-scope isolation.</p><h2><strong>When to reach for Signals</strong></h2><p>Use <strong>CDI events</strong> for classic observer multicast, especially when synchronous delivery or transactional observers matter.</p><p>Use <strong>Signals</strong> for in-process async coordination when you need publish, send, or typed request-reply between decoupled components.</p><p>Use <strong>Reactive Messaging</strong> when Kafka, AMQP, Pulsar, backpressure, or connectors are in scope.</p><p>Use <strong>Vert.x EventBus</strong> when you want address-based routing and the Vert.x programming model.</p><p>Signals are not &#8220;another event bus.&#8221; They are the layer between CDI convenience and broker-backed messaging.</p><p>That is the gap from the opening. CDI events still own classic observer fan-out. 
Signals give you one small in-process API for fan-out, unicast work, and typed request-reply without dragging in broker concerns.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.the-main-thread.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.the-main-thread.com/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[IBM Bob Needs a Context Budget Before It Needs More Tools]]></title><description><![CDATA[How GitHub MCP tool definitions, git diff, and gh output compete for IBM Bob's 200k context window, and how to keep the budget under control.]]></description><link>https://www.the-main-thread.com/p/bob-mcp-context-tax</link><guid isPermaLink="false">https://www.the-main-thread.com/p/bob-mcp-context-tax</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Sat, 06 Jun 2026 06:08:25 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3e20475f-5008-4457-884a-85794d31c6f6_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Open <a href="https://bob.ibm.com/">IBM Bob</a> or your coding assistant of choice, look at the token counter in the top-right corner, then connect a large MCP server. You can spend a meaningful part of the context window before you ask Bob to do real work.</p><p>I like MCP. I do not like acting as if it has no cost.</p><p>The current <a href="https://bob.ibm.com/docs/ide/core-concepts/context-window-management">Bob context window docs</a> are unusually direct about this. Bob gets a <code>200,000</code>-token context window, starts condensing the conversation at <code>140,000</code>, and includes MCP tool definitions in that same budget. 
The current <a href="https://bob.ibm.com/docs/ide/configuration/mcp/mcp-in-bob">Bob MCP docs</a> are just as blunt: disable unused tools because their definitions consume context.</p><p>For me, that changes the useful question. That Bob can reach GitHub is a given. The more useful question is &#8220;what did I put into the model before it even looked at my code?&#8221;</p><p>I wanted a number I could check, so I measured the current <a href="https://github.com/github/github-mcp-server">GitHub MCP Server</a> surface and compared it with a different kind of context waste: raw <code>git</code> output from a real dirty repository that I have locally. It is my development repository for The Main Thread. It has a lot of local changes that only get pushed to GitHub once in a while. That makes it a good test bed for the question &#8220;is MCP bad?&#8221; A broad MCP server spends budget up front. A <code>git</code> or <code>gh</code> workflow spends budget later, when you push large outputs into the same chat. This article is about where that context budget goes.</p><h2><strong>What you need</strong></h2><p>You do not need a benchmark setup for this. 
You need one real repository, one Bob window, and a willingness to look at the token counter before trusting your first impression.</p><ul><li><p>IBM Bob (<a href="https://bob.ibm.com/trial">sign up for a free trial if you like</a>), with access to the current docs and the token counter in the chat panel</p></li><li><p>A repository with real local changes, not a staged toy example</p></li><li><p>Plain <code>git</code></p></li><li><p>Optional: GitHub CLI if you want to apply the same narrowing pattern to pull request work</p></li><li><p>About 20 minutes</p></li></ul><p>As of the date of writing, the current <a href="https://github.com/github/github-mcp-server/blob/main/docs/server-configuration.md">GitHub MCP Server configuration guide</a> says the default toolsets are <code>context</code>, <code>issues</code>, <code>pull_requests</code>, <code>repos</code>, and <code>users</code>. The current <a href="https://github.com/github/github-mcp-server/blob/main/docs/feature-flags.md">feature flag docs</a> also show how that surface can expand when you opt into more granular issue and pull request tools.</p><p>My token estimates here use <code>o200k_base</code> tokenization against the official GitHub MCP tool snapshots and the current local git payloads in said repository. <code>o200k_base</code> is the BPE (Byte Pair Encoding) tokenizer encoding that OpenAI uses for GPT-4o and later models.</p><p>Bob&#8217;s exact live tokenizer surely differs a little. The underlying problem does not.</p><h2><strong>The quiet cost shows up before the task starts</strong></h2><p>The first cost is the catalog tax. I mean the tokens you spend on tool definitions before the task starts.</p><p>Bob&#8217;s docs say MCP tool definitions live inside the context window. The GitHub MCP docs say the default server surface includes five toolsets. 
So I pulled the official GitHub MCP tool snapshots and counted them.</p><p>The default GitHub MCP toolsets came out to about <code>17,201</code> tokens.</p><p>That is already a non-trivial slice of Bob&#8217;s budget:</p><ul><li><p>About <code>8.6 percent</code> of the full <code>200,000</code>-token window</p></li><li><p>About <code>12.3 percent</code> of the <code>140,000</code>-token condensation threshold</p></li></ul><p>The broader official snapshot surface I measured landed at about <code>37,787</code> tokens. That is roughly <code>18.9 percent</code> of the full window and <code>27.0 percent</code> of the condensation threshold. A few always-on servers are enough to reduce the free space quite a lot.</p><p>The default GitHub buckets were not evenly sized, either:</p><ul><li><p><code>repos</code> cost about <code>7,841</code> tokens</p></li><li><p><code>pull_requests</code> cost about <code>5,093</code> tokens</p></li><li><p><code>issues</code> cost about <code>3,541</code> tokens</p></li><li><p><code>context</code> and <code>users</code> were cheap by comparison at about <code>403</code> and <code>323</code></p></li></ul><p>That breakdown matters because it gives you a practical way to be more careful. If the task is code review, I would rather load a narrow GitHub review surface. I do not want repository, issue, discussion, action, and write-heavy tools in the same task just because &#8220;GitHub&#8221; sounds like one thing.</p><p>The current <a href="https://github.com/github/github-mcp-server/blob/main/docs/server-configuration.md">GitHub MCP Server configuration guide</a> supports exactly that kind of narrowing with <code>X-MCP-Toolsets</code>, <code>X-MCP-Tools</code>, <code>X-MCP-Exclude-Tools</code>, and read-only mode. 
The GitHub docs are making the same point as the Bob docs: shape the surface on purpose.</p><p>If I were setting up a review-oriented GitHub MCP connection in Bob, I would start closer to this than to &#8220;just enable everything&#8221;:</p><pre><code><code>{
  "mcpServers": {
    "github": {
      "url": "https://api.githubcopilot.com/mcp/",
      "headers": {
        "X-MCP-Toolsets": "issues,pull_requests",
        "X-MCP-Readonly": "true",
        "X-MCP-Exclude-Tools": "create_pull_request,merge_pull_request"
      }
    }
  }
}</code></code></pre><p>I am only showing the headers that shape the surface. Add authentication the same way you already do for your GitHub MCP setup. My point is the surface shape, not your secret-management style.</p><p>This is also where Bob&#8217;s <a href="https://bob.ibm.com/docs/ide/configuration/mcp/mcp-in-bob">project-level </a><code>.bob/mcp.json</code><a href="https://bob.ibm.com/docs/ide/configuration/mcp/mcp-in-bob"> support</a> matters more than teams admit. A global MCP setup gets crowded very quickly. A project-level setup is more likely to reflect what that repository is actually doing.</p><h2><strong>Git and gh also cost tokens</strong></h2><p>Skipping MCP and using <code>git</code> sounds simpler. The real picture is more mixed than that.</p><p>The second cost is the payload tax. I mean the tokens you spend when commands return large outputs.</p><p><code>git</code> and <code>gh</code> feel lighter because Bob does not need to carry a large tool catalog before the task starts. That is true. The problem comes later, when people spend that saving immediately by asking the agent to read a very large patch all at once.</p><p>I measured my current dirty working tree in my test repository because it has the kind of mess that shows bad habits clearly: edits, deletions, untracked publishing files, and generated assets.</p><p>I did not force a separate <code>gh</code> benchmark for the local-change case because that is not where <code>gh</code> is most useful. For &#8220;what changed in my working tree?&#8221; the useful comparison is broad MCP surface versus ordinary <code>git</code>. For pull request work, <code>gh</code> behaves much more like <code>git</code> than like MCP. 
It has almost no upfront catalog cost, and then a payload cost only when you ask for output.</p><p>The small commands were genuinely small:</p><ul><li><p><code>git status --short</code> came out to about <code>817</code> tokens</p></li><li><p><code>git diff --name-only</code> came out to about <code>619</code> tokens</p></li><li><p><code>git diff --stat</code> came out to about <code>655</code> tokens</p></li><li><p><code>git status --porcelain=v2</code> was fatter, but still small at about <code>3,261</code> tokens</p></li></ul><p>The full patch was much larger:</p><ul><li><p><code>git diff</code> came out to about <code>82,747</code> tokens</p></li><li><p><code>git diff --unified=0</code> was no better here, slightly larger in fact, at about <code>83,206</code> tokens</p></li><li><p>Reading the current untracked text files as raw content came out to about <code>83,568</code> tokens</p></li></ul><p>That means a local-change prompt can use more than 40 percent of the window, and close to 60 percent of the condensation threshold, even without MCP if you choose the broadest possible output.</p><p>The combined numbers are more serious:</p><ul><li><p>Default GitHub MCP surface plus the current tracked diff came out to about <code>99,948</code> tokens</p></li><li><p>Broader GitHub MCP surface plus the current tracked diff came out to about <code>120,534</code> tokens</p></li><li><p>Default GitHub MCP surface plus tracked diff plus current untracked text content came out to about <code>183,516</code> tokens</p></li><li><p>Broader GitHub MCP surface plus tracked diff plus current untracked text content came out to about <code>204,102</code> tokens</p></li></ul><p>That last number is already beyond Bob&#8217;s documented <code>200,000</code>-token window.</p><p>One result is useful and a little absurd. One deleted SVG in my repository accounted for about <code>37,787</code> tokens by itself. 
That single asset was roughly the same size as the broader GitHub MCP snapshot surface I measured.</p><p>This is why I do not like the simple &#8220;use the CLI instead&#8221; advice. The CLI does not solve the problem by itself. It is only a different way to waste the budget if you choose broad outputs too early.</p><h2><strong>Start local-change analysis with the cheap questions</strong></h2><p>If the prompt is &#8220;analyze my local changes,&#8221; I would not start with a full patch unless the repository is tiny or I already know the diff is clean text.</p><p>I would start like this:</p><pre><code><code>git status --short
git diff --stat
git diff --name-only</code></code></pre><p>That gives Bob three useful things at almost no cost:</p><ul><li><p>Which files changed</p></li><li><p>Rough size by file</p></li><li><p>Whether the mess is concentrated or spread out</p></li></ul><p>After that, narrow on purpose:</p><pre><code><code>git diff -- path/to/file
git diff -- path/to/second-file</code></code></pre><p>That pattern is boring, and that is good here. You are asking the model to tell you where deeper attention belongs before you give it the expensive context.</p><p>The same principle applies to GitHub CLI.</p><p>If the work is really about a pull request, I would rather start with small remote views such as <code>gh pr view</code> metadata or <code>gh pr diff --name-only</code>. I do not want to dump a full PR diff into the chat on turn one. The exact command matters less than the sequence:</p><ol><li><p>Ask what changed</p></li><li><p>Ask where the risky areas are</p></li><li><p>Read only the files that matter</p></li><li><p>Escalate to the full diff only if the first three steps justify it</p></li></ol><p>Here is the main operating rule in one sentence: <strong>narrow the catalog first, then narrow the payload.</strong></p><h2><strong>This is not an argument against MCP</strong></h2><p>I would still use GitHub MCP for workflows where it fits well.</p><p>It fits well when the work is mainly about GitHub:</p><ul><li><p>Reading issue state and comments</p></li><li><p>Reviewing a pull request with structured operations</p></li><li><p>Writing a review comment or updating GitHub state directly</p></li><li><p>Repeating the same repository workflow often enough that the upfront catalog cost is worth it</p></li></ul><p>I would lean on <code>git</code> or <code>gh</code> first when the work is structurally local:</p><ul><li><p>&#8220;Analyze my working tree&#8221;</p></li><li><p>&#8220;Tell me what changed before I commit&#8221;</p></li><li><p>&#8220;Which files deserve review first?&#8221;</p></li><li><p>&#8220;Did I accidentally mix three tasks into one diff?&#8221;</p></li></ul><p>This is the part I think teams mix together. 
They install a broad GitHub server globally, then ask a local-change question, then feel surprised that the agent is carrying a lot of GitHub machinery into a task that mostly needed <code>git status</code>, <code>git diff --stat</code>, and some restraint.</p><h2><strong>The experiment I would actually run</strong></h2><p>If you want to make this concrete in your own setup, run the same three tasks through three different Bob surfaces.</p><p>Use these setups:</p><ol><li><p>No GitHub MCP server at all. Let Bob use normal file and command tools with <code>git</code>.</p></li><li><p>A lean GitHub MCP setup with only the toolsets needed for review work.</p></li><li><p>A broad GitHub MCP setup that looks convenient but loads a lot of extra surface.</p></li></ol><p>Use tasks shaped like these:</p><ol><li><p>Analyze my current local changes and tell me which three files deserve deeper review first.</p></li><li><p>Summarize the risk in the deleted or moved areas.</p></li><li><p>Tell me what I need to read before I touch the GitHub workflow in this repository.</p></li></ol><p>Watch three things:</p><ul><li><p>The token counter before the first real action</p></li><li><p>The first tool or command Bob reaches for</p></li><li><p>Whether Bob narrows the problem or tries to swallow the whole repository at once</p></li></ul><p>I do not expect one setup to win every task. The right setup changes with the job. Broad always-on configurations fail more often than teams want to admit.</p><h2><strong>The default I would keep</strong></h2><p>If I were writing one rule for a team, it would be simple.</p><p>Keep GitHub MCP project-specific whenever you can. Keep the toolsets narrow. Prefer read-only for review-style work. Do not feed full diffs into the first turn. Treat generated assets as suspiciously expensive until proven otherwise. Ask the agent to rank files before it reads them in full.</p><p>That is also why I like Bob&#8217;s current docs on this point. 
They do not pretend the model can solve bad context hygiene by itself. The docs say the window is finite, tool definitions consume tokens, and disabling tools helps. Good. That is the more useful way to talk about this.</p><p>MCP is not the enemy. <code>git</code> is not the savior. Unbudgeted context is the problem. Sometimes that waste arrives as a large set of tool definitions. Sometimes it arrives as an 80k-token diff blob. Usually it arrives because nobody decided what the agent actually needed for this task.</p><p>This is also why I do not find &#8220;we just need a bigger context window&#8221; very convincing. A larger window can help in some cases, but it also lets bad context stay alive for longer. That includes stale instructions, irrelevant tool definitions, oversized diffs, and weak intermediate summaries that keep pushing the model in the wrong direction. I think of that as context window poisoning: the window is full, but too much of it is low-value, misleading, or simply old. A bigger window does not fix that. It often hides the problem for a while, increases cost, and delays the moment when the team learns to narrow the tool surface and the payload. <strong>Most of the time, better selection beats more storage.</strong></p><h2><strong>Conclusion</strong></h2><p>The useful mental model here is &#8220;catalog tax versus payload tax.&#8221; A broad GitHub MCP server can use tens of thousands of tokens before Bob touches your code, and a broad local <code>git diff</code> workflow can use just as much a minute later. 
The fix is simpler than the demos: narrower tool surfaces, narrower outputs, and a little more honesty about what should compete for attention in the same <code>200,000</code>-token window.</p>]]></content:encoded></item><item><title><![CDATA[Build Zero-Trust Quarkus Services Without Guessing the Boundaries]]></title><description><![CDATA[Build three Quarkus services with app-managed mTLS, OIDC service tokens, and edge-only branch policy so you can prove whether failures come from transport, service identity, or business authorization.]]></description><link>https://www.the-main-thread.com/p/quarkus-zero-trust-microservices</link><guid isPermaLink="false">https://www.the-main-thread.com/p/quarkus-zero-trust-microservices</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Fri, 05 Jun 2026 06:08:26 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/fc7dd73a-cc88-4c06-8768-dfa5020d0d82_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The ugly part of zero-trust is rarely the first <code>application.properties</code> file. It is the moment an internal service starts accepting calls it should never have seen, and nobody can tell whether the caller was a real service, a stolen token, or a policy bug in the edge API.</p><p>I do not find security tutorials very useful when they jump from TLS to Keycloak to a couple of annotations and call that a design. 
The part that matters is whether you can answer three separate questions cleanly:</p><ul><li><p>Is this channel trustworthy?</p></li><li><p>Which service is calling me?</p></li><li><p>Where do business rules actually live?</p></li></ul><p>So that is what we do here. </p><h2><strong>What we build</strong></h2><p>We build <strong>LoanFlow</strong>: three <a href="https://quarkus.io/">Quarkus</a> services on JDK 25 with app-managed mTLS, OIDC service tokens on outbound REST clients, and branch-level policy at the public edge.</p><ul><li><p><strong>loan-service</strong> &#8212; public edge API. Accepts bearer tokens for human users, enforces branch ownership, orchestrates downstream calls.</p></li><li><p><strong>credit-service</strong> &#8212; internal only. Requires HTTPS with a client certificate and a bearer token with permission <code>credit_check_run</code>.</p></li><li><p><strong>document-service</strong> &#8212; internal only. Same transport rules; requires permission <code>document_write</code>.</p></li></ul><p>By the end you can prove:</p><ul><li><p>A Berlin loan officer reads and submits their own loan</p></li><li><p>A Hamburg officer gets <code>403</code> on a Berlin loan</p></li><li><p>A direct call to <code>credit-service</code> without a client certificate fails at TLS</p></li><li><p>A call with mTLS but no bearer token gets <code>403</code></p></li><li><p>A second submit of the same loan returns <code>409</code></p></li></ul><h2><strong>Prerequisites</strong></h2><p>You need a normal local Java setup plus the usual tools for certificates and test traffic. Use whatever Quarkus CLI version you already have installed. 
It pins the platform BOM in each generated <code>pom.xml</code>, so this tutorial does not depend on you matching one exact CLI release.</p><ul><li><p>JDK 25</p></li><li><p>Quarkus CLI</p></li><li><p>Podman (Dev Services starts Keycloak in a Podman container)</p></li><li><p>OpenSSL, <code>keytool</code>, <code>curl</code>, <code>jq</code></p></li><li><p>Familiarity with JAX-RS and OIDC bearer tokens</p></li><li><p>About &#9749;&#65039;&#9749;&#65039;&#9749;&#65039;</p></li></ul><p>Create the workspace root and the shared support directories first:</p><pre><code><code>mkdir -p loanflow-zero-trust/{infrastructure,scripts}
cd loanflow-zero-trust</code></code></pre><p>The three service directories are created by the Quarkus commands in the next step. <a href="https://github.com/myfear/the-main-thread/tree/main/loanflow-zero-trust">Copy the <code>infrastructure</code> and <code>scripts</code> directories from my GitHub repository</a>.</p><h2><strong>Project setup</strong></h2><p>Create the three applications from the repo root. Run each command once:</p><pre><code><code>quarkus create app com.mainthread.loanflow:loan-service \
  --extension='rest-jackson,rest-client-jackson,oidc,rest-client-oidc-filter,tls-registry,smallrye-health' \
  --java=25 --no-code

quarkus create app com.mainthread.loanflow:credit-service \
  --extension='rest-jackson,oidc,tls-registry,smallrye-health' \
  --java=25 --no-code

quarkus create app com.mainthread.loanflow:document-service \
  --extension='rest-jackson,oidc,tls-registry,smallrye-health' \
  --java=25 --no-code</code></code></pre><p>Extensions and why they matter:</p><ul><li><p><strong>rest-jackson</strong> &#8212; Quarkus REST with JSON for edge and internal APIs</p></li><li><p><strong>rest-client-jackson</strong> + <strong>rest-client-oidc-filter</strong> &#8212; typed outbound clients that attach service tokens automatically (<code>loan-service</code> only)</p></li><li><p><strong>oidc</strong> &#8212; bearer token validation on every secured endpoint</p></li><li><p><strong>tls-registry</strong> &#8212; named TLS configurations for edge HTTPS, internal mTLS servers, and internal mTLS clients</p></li><li><p><strong>smallrye-health</strong> &#8212; liveness probes when you deploy this later</p></li></ul><p>Add test dependencies in each module POM: <code>rest-assured</code> and <code>quarkus-test-security-oidc</code> (test scope). Internal services also need <code>quarkus-test-oidc-server</code> for <code>OidcWiremockTestResource</code>.</p><h2><strong>Architecture on purpose</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zmwG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d005d5d-4c4d-4d26-8dd9-fa02055de04f_784x210.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zmwG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d005d5d-4c4d-4d26-8dd9-fa02055de04f_784x210.png 424w, https://substackcdn.com/image/fetch/$s_!zmwG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d005d5d-4c4d-4d26-8dd9-fa02055de04f_784x210.png 848w, 
https://substackcdn.com/image/fetch/$s_!zmwG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d005d5d-4c4d-4d26-8dd9-fa02055de04f_784x210.png 1272w, https://substackcdn.com/image/fetch/$s_!zmwG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d005d5d-4c4d-4d26-8dd9-fa02055de04f_784x210.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zmwG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d005d5d-4c4d-4d26-8dd9-fa02055de04f_784x210.png" width="784" height="210" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d005d5d-4c4d-4d26-8dd9-fa02055de04f_784x210.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:210,&quot;width&quot;:784,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17853,&quot;alt&quot;:&quot;Architecture Overview&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.the-main-thread.com/i/199030919?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d005d5d-4c4d-4d26-8dd9-fa02055de04f_784x210.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Architecture Overview" title="Architecture Overview" srcset="https://substackcdn.com/image/fetch/$s_!zmwG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d005d5d-4c4d-4d26-8dd9-fa02055de04f_784x210.png 424w, 
https://substackcdn.com/image/fetch/$s_!zmwG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d005d5d-4c4d-4d26-8dd9-fa02055de04f_784x210.png 848w, https://substackcdn.com/image/fetch/$s_!zmwG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d005d5d-4c4d-4d26-8dd9-fa02055de04f_784x210.png 1272w, https://substackcdn.com/image/fetch/$s_!zmwG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d005d5d-4c4d-4d26-8dd9-fa02055de04f_784x210.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong>Transport</strong> &#8212; Internal hops use app-managed mTLS through the <a href="https://quarkus.io/guides/tls-registry-reference">TLS registry</a>. We are not using a service mesh here because I want the transport layer to stay visible in config you can grep.</p><p><strong>Service identity</strong> &#8212; <code>loan-service</code> acquires its own token with the <a href="https://quarkus.io/guides/security-openid-connect-client-reference">OIDC client and REST client filter</a>. I would rather let Quarkus do that work than hand-roll <code>Authorization</code> headers and rediscover the annoying parts ourselves.</p><p><strong>Business authorization</strong> &#8212; User-specific rules stay in <code>loan-service</code>. <code>credit-service</code> and <code>document-service</code> never decide whether Alice from Berlin can touch loan <code>LN-100</code>. They decide whether the caller is an allowed internal service with the right permission. 
A <code>client_credentials</code> token does not represent a human loan officer, and pretending otherwise is how these demos get fuzzy fast.</p><p><strong>Out of scope for this demo</strong> &#8212; service mesh, OPA, Keycloak authorization services, and propagating the end-user token downstream. If <code>credit-service</code> later needs user-level policy, you are in token propagation or token exchange territory. That is a different tutorial.</p><h2><strong>Generate certificates</strong></h2><p>Run <code>./scripts/generate-certs.sh</code> from the repo root. It creates a local CA, one certificate per service, and a shared truststore under <code>infrastructure/certs/</code>.</p><p>Each service directory gets <code>tls.key</code>, <code>tls.crt</code>, and <code>keystore.p12</code>. The shared truststore contains only the CA. PKCS12 password is <code>changeit</code> for local dev.</p><h2><strong>Keycloak with Dev Services</strong></h2><p>Copy <code>infrastructure/keycloak/loanflow-realm.json</code> into each module as <code>src/main/resources/loanflow-realm.json</code>. Quarkus <a href="https://quarkus.io/guides/security-openid-connect-dev-services">Dev Services for Keycloak</a> starts Keycloak in a Podman container when you run <code>quarkus:dev</code>, imports that realm, and wires <code>quarkus.oidc.auth-server-url</code> for you. That saves us from turning local setup into a side quest.</p><p>Leave <code>quarkus.oidc.auth-server-url</code> unset in dev mode, because setting it disables Dev Services. Pin the host port so the <code>curl</code> examples stay stable:</p><pre><code><code>quarkus.keycloak.devservices.realm-name=loanflow
quarkus.keycloak.devservices.realm-path=loanflow-realm.json
quarkus.keycloak.devservices.port=8180

quarkus.oidc.application-type=service</code></code></pre><p>All three services carry the same Dev Services block. Quarkus shares one Keycloak container between dev-mode processes on the same machine. Start <code>loan-service</code> first if you want a predictable boot order, but any of the three can bring Keycloak up.</p><p>The realm defines:</p><ul><li><p><strong>Users</strong> &#8212; <code>alice</code>/<code>alice</code> (Berlin, <code>loan_officer</code>), <code>bob</code>/<code>bob</code> (Hamburg, <code>loan_officer</code>), <code>admin</code>/<code>admin</code> (<code>loan_admin</code>)</p></li><li><p><strong>loanflow-cli</strong> &#8212; confidential client with direct access grants for local password-grant token retrieval (<code>loanflow-cli-secret</code>)</p></li><li><p><strong>loan-service</strong> &#8212; confidential client with service account; default scopes <code>credit_check_run</code> and <code>document_write</code></p></li></ul><p>Realm <strong>roles</strong> (not Keycloak Groups) map to <code>@RolesAllowed</code>. The <code>loanflow-cli</code> client uses the same pattern as the <a href="https://github.com/myfear/the-main-thread">dpop-demo</a> Keycloak realm: <code>roles</code><strong> in </strong><code>defaultClientScopes</code>, with a full <code>roles</code><strong> client scope</strong> definition that adds <code>realm_access.roles</code> to the access token. The separate <code>branch</code><strong> client scope</strong> adds the branch claim. Service token scopes map to <code>@PermissionsAllowed</code> on internal endpoints.</p><p>In the Keycloak admin UI, an empty <strong>Groups</strong> list is expected. Check <strong>Clients &#8594; loanflow-cli &#8594; Client scopes &#8594; Default</strong> for <code>roles</code> and <code>branch</code>, and <strong>Users &#8594; alice &#8594; Role mapping &#8594; Realm roles</strong> for <code>loan_officer</code>.</p><p>The password grant is here as a local test shortcut, nothing more. 
I would not turn that into the real browser story.</p><h2><strong>Implement loan-service</strong></h2><p><code>loan-service</code> is the policy enforcement point. It knows users, branches, and loan state.</p><p>Seed data in an in-memory repository:</p><ul><li><p><code>LN-100</code>, branch <code>berlin</code>, status <code>DRAFT</code></p></li><li><p><code>LN-200</code>, branch <code>hamburg</code>, status <code>DRAFT</code></p></li><li><p><code>LN-300</code>, branch <code>berlin</code>, status <code>SUBMITTED</code></p></li></ul><p>Expose:</p><ul><li><p><code>GET /api/loans/{loanId}</code></p></li><li><p><code>POST /api/loans/{loanId}/submit</code></p></li></ul><p>Coarse access uses <code>@RolesAllowed({"loan_officer", "loan_admin"})</code>. Keycloak puts realm roles in <code>realm_access.roles</code>; tell Quarkus where to find them:</p><pre><code><code>quarkus.oidc.roles.role-claim-path=realm_access/roles</code></code></pre><p>Fine-grained branch rules live in <code>LoanAccessPolicy</code>. Read <code>branch</code> from the JWT &#8212; it does not land on <code>SecurityIdentity</code> automatically:</p><pre><code><code>@ApplicationScoped
public class LoanAccessPolicy {

    private final CallerContext callerContext;

    public LoanAccessPolicy(CallerContext callerContext) {
        this.callerContext = callerContext;
    }

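    public void checkCanRead(LoanApplication loan) {
        // Admins may read loans from any branch.
        if (callerContext.hasRole("loan_admin")) {
            return;
        }

        // Everyone else must be a loan officer whose branch claim matches the loan's branch.
        String branch = callerContext.branch();
        if (!callerContext.hasRole("loan_officer") || !loan.branch().equals(branch)) {
            throw new ForbiddenException();
        }
    }

    public void checkCanSubmit(LoanApplication loan) {
        // Submitting requires read access first, then a loan that is still in DRAFT.
        checkCanRead(loan);
        if (loan.status() != LoanStatus.DRAFT) {
            // 409 Conflict: the loan has already left DRAFT, for example on a second submit.
            throw new WebApplicationException("Loan is not in DRAFT status", 409);
        }
    }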
}</code></code></pre><p><code>CallerContext</code> is <code>@RequestScoped</code> and reads <code>branch</code> from <code>JsonWebToken</code> at runtime (falling back to <code>SecurityIdentity</code> attributes in tests). <code>ForbiddenAccessMapper</code> turns branch denials into <code>403</code> with a small JSON body so <code>curl</code> output is obvious.</p><p><code>LoanApplicationService.submit()</code> runs in order: load loan &#8594; <code>checkCanSubmit</code> &#8594; call <code>credit-service</code> &#8594; call <code>document-service</code> &#8594; persist <code>SUBMITTED</code> only after both succeed.</p><p>The outbound REST client attaches a service token automatically:</p><pre><code><code>@Path("/internal/credit-checks")
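// configKey ties this client to the quarkus.rest-client.credit-service.* properties.
@RegisterRestClient(configKey = "credit-service")
// "internal-calls" names the OIDC client configured under quarkus.oidc-client.internal-calls.*.
@OidcClientFilter("internal-calls")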
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
public interface CreditServiceClient {

    @POST
    CreditCheckResponse run(CreditCheckRequest request);
}</code></code></pre><p>Create <code>DocumentServiceClient</code> the same way with <code>configKey = "document-service"</code> and path <code>/internal/documents</code>.</p><h2><strong>Implement credit-service and document-service</strong></h2><p>Internal services stay small on purpose. Once these endpoints start owning business rules too, the boundary gets muddy fast. Here they validate transport and permission, then do one job.</p><pre><code><code>@Path("/internal/credit-checks")
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
public class CreditResource {

    @Inject
    CreditDecisionService creditDecisionService;

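    @POST
    // Requires a bearer token that carries the credit_check_run permission (a service token scope).
    @PermissionsAllowed("credit_check_run")
    public CreditCheckResponse run(CreditCheckRequest request) {
        // No user or branch checks here: those stay in loan-service.
        return creditDecisionService.run(request);
    }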
}</code></code></pre><p><code>document-service</code> mirrors this with <code>@PermissionsAllowed("document_write")</code> on <code>POST /internal/documents</code>. Credit scoring is deterministic fake logic &#8212; this tutorial is about security boundaries, not bureau accuracy.</p><h2><strong>Wire the configuration</strong></h2><p>This is the part I usually check first in security demos, because it is where hand-wavy examples turn back into engineering. Every property below matters.</p><h3><strong>loan-service</strong></h3><p>Edge HTTPS without client certificates from human callers:</p><pre><code><code>quarkus.http.ssl-port=8443
quarkus.http.insecure-requests=disabled
quarkus.http.tls-configuration-name=edge

quarkus.tls.edge.key-store.p12.path=../infrastructure/certs/loan-service/keystore.p12
quarkus.tls.edge.key-store.p12.password=changeit

quarkus.tls.internal-client.key-store.p12.path=../infrastructure/certs/loan-service/keystore.p12
quarkus.tls.internal-client.key-store.p12.password=changeit
quarkus.tls.internal-client.trust-store.p12.path=../infrastructure/certs/truststore.p12
quarkus.tls.internal-client.trust-store.p12.password=changeit

quarkus.oidc.application-type=service

quarkus.keycloak.devservices.realm-name=loanflow
quarkus.keycloak.devservices.realm-path=loanflow-realm.json
quarkus.keycloak.devservices.port=8180

quarkus.oidc-client.internal-calls.auth-server-url=${quarkus.oidc.auth-server-url}
quarkus.oidc-client.internal-calls.client-id=loan-service
quarkus.oidc-client.internal-calls.credentials.secret=loan-service-secret
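# Sketch, not part of the original listing: outside local dev, read the client secret from the
# environment instead of committing it. LOAN_SERVICE_CLIENT_SECRET is a hypothetical variable name;
# the %prod. prefix limits the override to the prod profile, so dev mode keeps the value above.
%prod.quarkus.oidc-client.internal-calls.credentials.secret=${LOAN_SERVICE_CLIENT_SECRET}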
quarkus.oidc-client.internal-calls.grant.type=client
quarkus.oidc-client.internal-calls.early-tokens-acquisition=false

quarkus.rest-client.credit-service.url=https://localhost:8444
quarkus.rest-client.credit-service.tls-configuration-name=internal-client

quarkus.rest-client.document-service.url=https://localhost:8445
quarkus.rest-client.document-service.tls-configuration-name=internal-client</code></code></pre><p>What breaks when this is wrong:</p><ul><li><p>Missing <code>edge</code> TLS name &#8594; no HTTPS on 8443</p></li><li><p>Wrong truststore on <code>internal-client</code> &#8594; <code>PKIX path building failed</code> on downstream calls</p></li><li><p>Setting <code>quarkus.oidc.auth-server-url</code> in dev &#8594; Dev Services never starts Keycloak</p></li><li><p>Missing OIDC client <code>internal-calls</code> &#8594; REST client cannot fetch a service token</p></li><li><p><code>early-tokens-acquisition=true</code> &#8594; short-lived tokens may expire before the first downstream call</p></li></ul><h3><strong>credit-service and document-service</strong></h3><p>Internal only &#8212; require client certificates:</p><pre><code><code>quarkus.http.ssl-port=8444
quarkus.http.insecure-requests=disabled
quarkus.http.tls-configuration-name=internal-server
quarkus.http.ssl.client-auth=REQUIRED

quarkus.tls.internal-server.key-store.p12.path=../infrastructure/certs/credit-service/keystore.p12
quarkus.tls.internal-server.key-store.p12.password=changeit
quarkus.tls.internal-server.trust-store.p12.path=../infrastructure/certs/truststore.p12
quarkus.tls.internal-server.trust-store.p12.password=changeit
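# Sketch of the document-service variant, shown as comments so this credit-service file stays valid.
# The key store path is an assumption based on the one-directory-per-service layout from generate-certs.sh.
# quarkus.http.ssl-port=8445
# quarkus.tls.internal-server.key-store.p12.path=../infrastructure/certs/document-service/keystore.p12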

quarkus.oidc.application-type=service

quarkus.keycloak.devservices.realm-name=loanflow
quarkus.keycloak.devservices.realm-path=loanflow-realm.json
quarkus.keycloak.devservices.port=8180</code></code></pre><p>Use port <code>8445</code> and <code>document-service</code> cert paths for <code>document-service</code>.</p><p>What breaks:</p><ul><li><p>Missing <code>client-auth=REQUIRED</code> &#8594; anyone with a valid token reaches the internal API over TLS</p></li><li><p>Truststore missing the CA &#8594; valid client certificates fail during handshake</p></li><li><p>Hard-coded <code>quarkus.oidc.auth-server-url</code> in dev &#8594; Dev Services disabled; token validation fails if nothing listens on that URL</p></li></ul><h2><strong>Run the system</strong></h2><p>After the certificates exist, start each service in its own terminal. The first <code>quarkus:dev</code> process starts Keycloak in Podman; the others attach to the shared container:</p><pre><code><code>cd loan-service &amp;&amp; ./mvnw quarkus:dev
cd credit-service &amp;&amp; ./mvnw quarkus:dev
cd document-service &amp;&amp; ./mvnw quarkus:dev</code></code></pre><p>Wait until the startup log shows Keycloak listening on port 8180 before running token <code>curl</code> commands.</p><p>HTTPS URLs:</p><ul><li><p><code>loan-service</code> &#8212; https://localhost:8443</p></li><li><p><code>credit-service</code> &#8212; https://localhost:8444</p></li><li><p><code>document-service</code> &#8212; https://localhost:8445</p></li></ul><h2><strong>Prove it</strong></h2><p>Run these commands from the <code>loanflow-zero-trust/</code><strong> repo root</strong> &#8212; the certificate paths are relative to that directory. Your shell prompt should not be <code>~</code> unless the repo happens to live there.</p><pre><code><code>cd /path/to/loanflow-zero-trust

export CA=infrastructure/certs/ca/ca.crt
export LOAN_CERT=infrastructure/certs/loan-service</code></code></pre><p>Get a user token for Alice (local test shortcut):</p><pre><code><code>export USER_TOKEN=$(
  curl -sf http://localhost:8180/realms/loanflow/protocol/openid-connect/token \
    --user loanflow-cli:loanflow-cli-secret \
    -H 'content-type: application/x-www-form-urlencoded' \
    -d 'username=alice&amp;password=alice&amp;grant_type=password' | jq -r '.access_token'
)

echo "$USER_TOKEN" | awk -F. '{print $2}' | python3 -c "import sys,base64,json; p=sys.stdin.read().strip(); p+=('='*(-len(p)%4)); print(json.dumps(json.loads(base64.urlsafe_b64decode(p)), indent=2))"

test -n "$USER_TOKEN" &amp;&amp; test "$USER_TOKEN" != "null"</code></code></pre><p>If <code>test</code> fails, Keycloak is not ready or the realm import missed <code>loanflow-cli</code>. Re-check the Dev Services startup log.</p><p>Read a Berlin loan through the edge:</p><pre><code><code>curl -i --cacert "$CA" \
  -H "Authorization: Bearer $USER_TOKEN" \
  https://localhost:8443/api/loans/LN-100</code></code></pre><p>Expected: <code>HTTP/2 200</code> with the loan JSON.</p><p>Same call with Bob&#8217;s token against <code>LN-100</code>:</p><pre><code><code>export BOB_TOKEN=$(
  curl -sf http://localhost:8180/realms/loanflow/protocol/openid-connect/token \
    --user loanflow-cli:loanflow-cli-secret \
    -H 'content-type: application/x-www-form-urlencoded' \
    -d 'username=bob&amp;password=bob&amp;grant_type=password' | jq -r '.access_token'
)

test -n "$BOB_TOKEN" &amp;&amp; test "$BOB_TOKEN" != "null"

curl -i --cacert "$CA" \
  -H "Authorization: Bearer $BOB_TOKEN" \
  https://localhost:8443/api/loans/LN-100</code></code></pre><p>Expected: <code>HTTP/2 403</code> with <code>{"error":"access_denied"}</code>. Quarkus often returns an empty body on <code>403</code>; this demo maps branch denials to a small JSON payload so the status is obvious in the terminal. The loan-service log should still show <code>Loan access denied ... caller=bob callerBranch=hamburg loanBranch=berlin</code>.</p><p>If you see <code>401</code> instead, the token is missing, expired, or stale &#8212; re-run the <code>export BOB_TOKEN=...</code> block above. Branch policy only runs after OIDC accepts the bearer token.</p><p>Prove the internal API is really internal. Fetch a service token:</p><pre><code><code>export SERVICE_TOKEN=$(
  curl -sf http://localhost:8180/realms/loanflow/protocol/openid-connect/token \
    --user loan-service:loan-service-secret \
    -H 'content-type: application/x-www-form-urlencoded' \
    -d 'grant_type=client_credentials' | jq -r '.access_token'
)</code></code></pre><p>Call <code>credit-service</code> without a client certificate &#8212; expect TLS handshake failure:</p><pre><code><code>curl --cacert "$CA" \
  -H "Authorization: Bearer $SERVICE_TOKEN" \
  -H 'content-type: application/json' \
  -d '{"loanId":"LN-100","applicantId":"alice"}' \
  https://localhost:8444/internal/credit-checks</code></code></pre><p>Call with the <code>loan-service</code> certificate but no bearer token &#8212; expect <code>403</code>:</p><pre><code><code>curl -i --cacert "$CA" \
  --cert "$LOAN_CERT/tls.crt" \
  --key "$LOAN_CERT/tls.key" \
  -H 'content-type: application/json' \
  -d '{"loanId":"LN-100","applicantId":"alice"}' \
  https://localhost:8444/internal/credit-checks</code></code></pre><p>The client certificate satisfies mTLS, but without a bearer token the caller has no OIDC permission &#8212; Quarkus returns <code>403</code>. Plain HTTP tests in <code>@QuarkusTest</code> (no client cert) typically show <code>401</code> instead.</p><p>Call with mTLS and the service token &#8212; expect <code>200</code>:</p><pre><code><code>curl --cacert "$CA" \
  --cert "$LOAN_CERT/tls.crt" \
  --key "$LOAN_CERT/tls.key" \
  -H "Authorization: Bearer $SERVICE_TOKEN" \
  -H 'content-type: application/json' \
  -d '{"loanId":"LN-100","applicantId":"alice"}' \
  https://localhost:8444/internal/credit-checks</code></code></pre><p>Drive the full flow:</p><pre><code><code>curl --cacert "$CA" \
  -X POST \
  -H "Authorization: Bearer $USER_TOKEN" \
  https://localhost:8443/api/loans/LN-100/submit</code></code></pre><p>Expected: <code>200</code>, credit band in the response, audit document stored, second submit returns <code>409</code>.</p><h3><strong>Watch the logs</strong></h3><p>Run each service in its own terminal. After a successful submit, you should see the trust chain play out across three consoles &#8212; user token at the edge, service token on internal hops:</p><pre><code><code>loan-service     Loan submit loanId=LN-100 by alice branch=berlin
loan-service     Calling credit-service for loanId=LN-100
credit-service   Credit check loanId=LN-100 band=D caller=service-account-internal-calls
loan-service     Credit band D for loanId=LN-100
loan-service     Calling document-service for loanId=LN-100
document-service Stored audit document loanId=LN-100 branch=berlin creditBand=D caller=service-account-internal-calls
loan-service     Loan submitted loanId=LN-100 creditBand=D</code></code></pre><p>Denied requests log too &#8212; branch mismatch on read returns <code>403</code> with a <code>Loan access denied</code> line; a second submit on the same loan returns <code>409</code> with <code>Loan submit rejected</code>.</p><p>The services log principals and loan ids only. They never print bearer tokens or JWT bodies.</p><p>Or run <code>./scripts/smoke-test.sh</code> from the repo root once all three services are up &#8212; it walks every failure path in order.</p><h2><strong>Tests worth keeping</strong></h2><p>Module tests rot fast when someone renames a claim or changes a port, so keep at least:</p><ul><li><p><strong>loan-service</strong> &#8212; <code>LoanAccessPolicyTest</code> (unit), <code>LoanResourceSecurityTest</code> (<code>@QuarkusTest</code> + <code>@TestSecurity</code> for branch/admin/401)</p></li><li><p><strong>credit-service</strong> / <strong>document-service</strong> &#8212; <code>@QuarkusTest</code> with <code>OidcWiremockTestResource</code> proving <code>401</code> without a token</p></li></ul><p>For bearer-token tests on internal services, I would rather use Wiremock OIDC than drag Keycloak into every <code>./mvnw test</code> run. For end-to-end transport behavior, the smoke script is still worth more than ten paragraphs of architecture confidence.</p><h2><strong>Troubleshooting</strong></h2><p><code>401</code><strong> from loan-service when you expected branch </strong><code>403</code> &#8212; the bearer token is missing, expired, or invalid. Fetch a fresh token and confirm <code>test -n "$BOB_TOKEN"</code>. Branch policy runs only after OIDC validation succeeds.</p><p><code>curl</code><strong> prints nothing</strong> &#8212; Quarkus often returns an empty body on <code>401</code>/<code>403</code>. Add <code>-i</code> to see the status line. A <code>403</code> with a valid token usually means the access token is missing realm roles. 
Decode the token and confirm it contains <code>realm_access.roles</code> with <code>loan_officer</code>:</p><pre><code><code>echo "$USER_TOKEN" | awk -F. '{print $2}' | python3 -c "import sys,base64,json; p=sys.stdin.read().strip(); p+=('='*(-len(p)%4)); print(json.dumps(json.loads(base64.urlsafe_b64decode(p)), indent=2))"</code></code></pre><p>After changing <code>loanflow-realm.json</code>, Dev Services does <strong>not</strong> overwrite an existing realm. Stop all services, run <code>./scripts/reset-keycloak.sh</code>, start <code>loan-service</code> first, then fetch a fresh token.</p><p><code>curl: (77) error setting certificate verify locations</code> &#8212; you are not in the <code>loanflow-zero-trust/</code> repo root, or <code>./scripts/generate-certs.sh</code> has not been run. <code>cd</code> to the repo and confirm <code>infrastructure/certs/ca/ca.crt</code> exists. The smoke script resolves paths automatically; hand-typed <code>curl</code> commands need the same working directory.</p><p><code>PKIX path building failed</code> &#8212; truststore missing the CA, or REST client points at the wrong TLS bucket.</p><p><strong>TLS handshake fails before the request is logged</strong> &#8212; usually good news. <code>client-auth=REQUIRED</code> is doing its job; client certificate missing or untrusted.</p><p><code>401</code><strong> from internal services</strong> &#8212; bearer token missing, expired, wrong issuer, or OIDC metadata unreachable.</p><p><code>403</code><strong> from internal services</strong> &#8212; mTLS succeeded but bearer token missing or token valid without the permission required by <code>@PermissionsAllowed</code>.</p><p><code>403</code><strong> from loan-service</strong> &#8212; business rule fired. Check <code>branch</code> claim, roles, and seeded loan branch.</p><p>Enable OIDC trace logging when token validation is unclear:</p><pre><code><code>quarkus.log.category."io.quarkus.oidc.runtime.OidcProvider".level=TRACE
quarkus.log.category."io.quarkus.oidc.runtime.OidcProvider".min-level=TRACE</code></code></pre><h2><strong>Make it survive production</strong></h2><p>This demo is deliberately local and small. Production needs a few extra moves:</p><ul><li><p>Replace the local CA with real internal PKI or a cert-manager pipeline</p></li><li><p>Move secrets out of <code>application.properties</code> into a supported secret source</p></li><li><p>Add <code>quarkus.ssl.native=true</code> for services that make HTTPS calls in native mode</p></li><li><p>Rotate certificates and client secrets on a schedule that exists outside human memory</p></li><li><p>Add correlation IDs so a rejected downstream request traces across services</p></li><li><p>Consider audience enforcement or token exchange if downstream services later need user context</p></li></ul><p>I would not move branch policy into internal services just because the local demo works. That is where these systems start getting weird.</p><h2><strong>Close the loop</strong></h2><p>In a microservice system, zero-trust is only useful when each trust decision is visible and testable.</p><p>In this design, the channel is protected with mTLS, the internal caller proves itself with a service token, and user-level policy stays where the loan aggregate actually lives. When a request fails, you can tell whether it failed at transport, service identity, or business policy. 
That is the part I care about, because it turns security from vibes into something you can actually debug.</p>]]></content:encoded></item><item><title><![CDATA[How to Keep a Shared Bob MCP Server From Acting Like One User]]></title><description><![CDATA[How delegated access, time-limited tokens, and audit-ready identity keep shared MCP infrastructure from turning into a service account.]]></description><link>https://www.the-main-thread.com/p/bob-mcp-attribution</link><guid isPermaLink="false">https://www.the-main-thread.com/p/bob-mcp-attribution</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Thu, 04 Jun 2026 06:08:06 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/16ded947-4f8f-4c23-a9eb-4431b24a6089_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Sooner or later every shared MCP rollout runs into the same boring question: who actually did this?</p><p>I do not mean which tool name showed up in the trace or which server handled the POST. I mean who approved the deployment, closed the ticket, queried the customer record, or changed the access policy.</p><p>If the honest answer is &#8220;Bob&#8221; or, worse, the name of one long-lived bearer token sitting behind a shared server, then the architecture is unfinished. You did not build user-aware automation. You built a service account funnel with better branding.</p><p>That sounds harsh, but the current IBM Bob guidance is blunt enough to justify it. 
The <a href="https://bob.ibm.com/docs/ide/security/bob-security-guidance">IBM Bob security guidelines</a> say to use delegated access when needed, use time-limited tokens, monitor and audit all actions, and, for shared MCP servers, make sure actions are attributable to specific users or sessions. That is not a style preference. It is the difference between an automatable system and a future incident report full of shrugs.</p><p>I care about this part of enterprise MCP design because it hides inside something that looks efficient. A shared remote server is easy to centralize, easy to update, and easy to point every Bob user at through <code>.bob/mcp.json</code>. The <a href="https://bob.ibm.com/docs/ide/configuration/mcp/mcp-in-bob">Bob MCP docs</a> explicitly support project-level <code>.bob/mcp.json</code> files that teams can share through version control, and remote MCP configuration supports custom HTTP headers such as <code>Authorization</code>. The convenience is real. So is the way teams quietly reinvent the service account model and call it agent infrastructure.</p><h2><strong>The cheap version is one credential for everyone</strong></h2><p>The bad design usually starts out looking responsible, which is part of why it survives design review.</p><p>There is one remote MCP server for a shared internal system such as deployments, tickets, or change requests. Bob talks to it over HTTP. The team does not want every user setting up their own credentials yet, so someone provisions one token for the server and distributes the config.</p><p>It often looks like this:</p><pre><code><code>{
  "mcpServers": {
    "deploy-control": {
      "url": "https://mcp.internal.example.com/mcp",
      "headers": {
        "Authorization": "Bearer ${SHARED_DEPLOY_MCP_TOKEN}"
      }
    }
  }
}</code></code></pre><p>Using an environment variable instead of hardcoding the token is still better operational hygiene. It does not fix the architecture if everyone ends up using the same credential. Where the secret string lives is not the real issue. The issue is that the server sees one actor for every request.</p><p>That produces audit records like this:</p><pre><code><code>{
  "timestamp": "2026-05-23T10:14:09Z",
  "tool": "approve_deployment",
  "target": "checkout-service",
  "actor": "shared-bob-mcp",
  "result": "approved"
}</code></code></pre><p>This record answers almost nothing that matters after the fact. Which engineer asked for the action? Which Bob session was it tied to? Was the caller allowed to approve production or only staging? Was the token minted for five minutes or copied into a wiki six months ago? You can keep piling logs around this, but the identity hole stays put.</p><p>That is why I do not like hearing &#8220;we have auditing&#8221; when the audit trail only knows the server-side credential. You do not have auditing yet. You have a timestamped receipt from a shared robot account.</p><h2><strong>What attribution looks like in practice</strong></h2><p>For a shared MCP server, the minimum bar is simple even if the implementation takes work: the request that reaches the server must still be tied to a human user or a narrowly scoped session.</p><p>The <a href="https://modelcontextprotocol.io/specification/draft/basic/authorization">MCP authorization spec</a> frames HTTP-based authorization as the client accessing a protected MCP server on behalf of a resource owner. It also recommends short-lived access tokens. That lines up neatly with Bob&#8217;s own security guidance: delegated access, time-limited credentials, and auditable actions.</p><p>In practice, I want the remote server, or a gateway directly in front of it, to validate a user-bound token and emit a structured audit event that survives the entire request path. 
At minimum, keep these fields:</p><ul><li><p>The human identity: <code>sub</code>, and, if your IdP provides it safely, a stable username or email</p></li><li><p>The issuing authority and target: <code>iss</code>, <code>aud</code>, and the scopes or roles that justified the action</p></li><li><p>Token lifetime and replay clues: <code>exp</code>, and a token or request identifier such as <code>jti</code> when available</p></li><li><p>MCP request correlation: session identifier, request identifier, tool name, and target system</p></li><li><p>Decision outcome: allowed, denied, or failed, with the policy reason when possible</p></li></ul><p>The useful shape is closer to this:</p><pre><code><code>{
  "timestamp": "2026-05-23T10:14:09Z",
  "actor": {
    "sub": "00u7x9b3...",
    "email": "sam@example.com"
  },
  "client": {
    "product": "IBM Bob",
    "mcpSessionId": "1f3a4b5c-6d7e-8f9a-0b1c-2d3e4f5a6b7c",
    "tool": "approve_deployment"
  },
  "policy": {
    "scopes": ["deploy:approve"],
    "delegated": true,
    "tokenExpiresAt": "2026-05-23T10:19:09Z"
  },
  "target": "checkout-service",
  "result": "approved"
}</code></code></pre><p>Now the record can answer a serious question. Sam approved the deployment. The call came through a specific Bob session. The server saw a delegated token with the <code>deploy:approve</code> scope, and that token expired five minutes later. That is the beginning of accountability, which is a lot more useful than another dashboard that can only tell you a tool fired.</p><p>Notice what I am not asking for here. I am not asking you to store full prompt transcripts as the only audit system, or to bury the truth in a blob store full of raw HTTP dumps. Good auditability is not maximal logging. It is the ability to answer who, what, when, which permission, and which target without needing a forensic hobby.</p><h2><strong>Useful MCP headers are not the same thing as identity</strong></h2><p>There is a subtle trap here because MCP over HTTP is getting better observability support. The current <a href="https://modelcontextprotocol.io/specification/2025-06-18/basic/transports">MCP transport spec</a> defines Streamable HTTP as the standard remote transport, and <a href="https://modelcontextprotocol.io/seps/2243-http-standardization">SEP-2243</a> standardizes request headers such as <code>Mcp-Method</code>, <code>Mcp-Name</code>, and <code>Mcp-Session-Id</code> so proxies, gateways, and observability tools can reason about MCP traffic without tearing apart the JSON body.</p><p>I like that change. It helps routing, rate limiting, correlation, and troubleshooting. It does not solve attribution by itself.</p><p><code>Mcp-Session-Id</code> tells you that several requests belong to the same logical MCP session. It does not tell you which human sat behind that session. <code>Mcp-Method</code> and <code>Mcp-Name</code> tell you what kind of MCP operation happened. They do not tell you whether the caller was allowed to do it.</p><p>So keep the headers. They are useful. Just do not confuse request metadata with identity. 
A beautiful trace that still collapses every actor into one shared token is just a more expensive blind spot.</p><h2><strong>Prompt warnings are not policy</strong></h2><p>There is another shortcut teams try after they notice the identity problem. They keep the shared credential and add instructions around it:</p><ul><li><p>Only use this tool for the current user</p></li><li><p>Ask for confirmation before production actions</p></li><li><p>Never approve your own deployment</p></li></ul><p>Those rules are fine as defense in depth. They are not a security boundary.</p><p>The server has to enforce the policy because the server is where the blast radius lives. If the remote MCP tool can approve a deployment, create a change request, or read customer data, the authorization decision needs to happen with server-side knowledge of the user, the session, the scopes, and the target resource. Prompt text can remind the model. It cannot carry your authorization model.</p><p>This is also why I prefer denial events in the audit log. A server that records who was refused, for which action, and why, is much easier to trust than one that only logs successful calls and leaves failures to chat history.</p><h2><strong>Shared remote infrastructure is not always the right answer</strong></h2><p>Bob&#8217;s own transport guidance is refreshingly practical here. The <a href="https://bob.ibm.com/docs/ide/configuration/mcp/server-transports">Bob transport docs</a> describe STDIO as lower latency, simpler, and &#8220;inherently more secure with no network exposure,&#8221; and they recommend it for security-sensitive local operations. 
The MCP transport spec also says clients should support <code>stdio</code> whenever possible.</p><p>That matters because not every useful tool needs to be a shared service.</p><p>If the capability is mostly local to the developer machine, such as reading a repository, running a formatter, checking a build, or querying a local scratch database, a shared remote MCP server may add centralization without adding enough value. You take on network exposure, shared-service hardening, multi-user authorization, and attribution requirements just to avoid installing a local tool.</p><p>I would keep these workflows local by default:</p><ul><li><p>Repository and filesystem helpers</p></li><li><p>Local build, test, and formatting tools</p></li><li><p>Developer-specific scratch data or local diagnostics</p></li><li><p>Anything where the useful authority already comes from the workstation user</p></li></ul><p>I would consider a shared remote server when the target system is already centralized and the controls are worth centralizing too:</p><ul><li><p>Deployment approvals</p></li><li><p>Ticketing or change-management workflows</p></li><li><p>Internal data systems with real access policy</p></li><li><p>Shared operational tooling that needs consistent server-side enforcement</p></li></ul><p>That is the actual trade-off. Shared MCP infrastructure does not become &#8220;more enterprise&#8221; just because it is remote. 
It earns that label when the central server adds real policy, real auditability, and real operational control.</p><h2><strong>A short checklist before you roll this out</strong></h2><p>If you are about to stand up a shared MCP server for Bob, I would want these answers before the first broad rollout:</p><ol><li><p>Can the server identify the human user behind each action, not just the Bob client or a shared token?</p></li><li><p>Are credentials delegated and time-limited, or did you quietly create a durable service account path?</p></li><li><p>Does the audit event capture actor, session, tool, target, permission, and outcome in structured form?</p></li><li><p>Are authorization decisions enforced server-side for every sensitive tool call?</p></li><li><p>Do denied actions get logged with enough context to review policy behavior later?</p></li><li><p>Did you keep local-only workflows on STDIO instead of centralizing them out of habit?</p></li><li><p>If someone asks &#8220;who approved this at 10:14 UTC on May 23, 2026,&#8221; can you answer without opening a chat transcript?</p></li></ol><p>If the answer to the last question is no, stop there. The server is not ready yet.</p><h2><strong>Conclusion</strong></h2><p>The interesting failure mode in shared MCP is not tool power. That part is obvious. The failure shows up when the first working version looks good enough while quietly erasing the human actor from the record.</p><p>IBM Bob&#8217;s current security guidance does us a favor by saying the quiet part out loud: use delegated access, use time-limited tokens, audit actions, and make shared MCP activity attributable to specific users or sessions. Once you accept that, the architecture gets simpler. 
A shared MCP server is either identity-aware infrastructure, or it is a service account wearing an agent costume.</p>]]></content:encoded></item><item><title><![CDATA[Model Routing in Quarkus LangChain4j with Ollama]]></title><description><![CDATA[Build a two-lane Quarkus service that classifies prompts, routes cheap questions to a fast local model, and keeps routing decisions testable and observable.]]></description><link>https://www.the-main-thread.com/p/quarkus-langchain4j-model-routing</link><guid isPermaLink="false">https://www.the-main-thread.com/p/quarkus-langchain4j-model-routing</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Wed, 03 Jun 2026 06:08:24 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5be49c2f-b88e-4026-bf31-11ac23fd2f41_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>ForgeCI&#8217;s on-call queue looked fine until someone opened the inference bill. &#8220;What flag disables cache?&#8221; and &#8220;why does my pipeline OOM only on cached arm64 builds?&#8221; both hit the same model at the same per-token rate. The cheap questions subsidized the expensive ones.</p><p>This tutorial fixes that. We build <strong>ForgeAssist</strong>: a Quarkus service that classifies each incoming prompt, routes it to the cheapest Ollama model that can plausibly handle it, and logs the routing decision without blocking the HTTP response. 
The sample stays small: one enum, one AI service, one router, one observer, and one REST endpoint.</p><h2><strong>What you will be able to do</strong></h2><p>By the end you can:</p><ol><li><p>Configure multiple named Ollama model instances in one Quarkus application.</p></li><li><p>Declare a structured-output AiService that returns a Java enum from an LLM.</p></li><li><p>Inject named <code>ChatModel</code> beans programmatically and pick between them at runtime.</p></li><li><p>Fire a CDI event to make routing decisions observable without polluting the response path.</p></li><li><p>Write a <code>@QuarkusTest</code> that asserts routing behavior instead of answer quality.</p></li></ol><h2><strong>What we build</strong></h2><p><strong>ForgeAssist</strong> exposes <code>POST /assist</code> with a plain-text question body. Every request flows through a classifier on the fast lane (<code>qwen2.5:0.5b</code>), then dispatches to either the fast lane or the power lane (<code>llama3.2</code>) based on a <code>Complexity</code> enum. 
A CDI observer logs the decision asynchronously.</p><p>The request lands on the REST resource, the router asks the classifier on the fast model, then it picks a lane, returns the answer, and fires an async routing event on the way out.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XOgX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b40f15d-b0b6-40de-b88c-bbbc317f4a2c_6355x2520.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XOgX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b40f15d-b0b6-40de-b88c-bbbc317f4a2c_6355x2520.png 424w, https://substackcdn.com/image/fetch/$s_!XOgX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b40f15d-b0b6-40de-b88c-bbbc317f4a2c_6355x2520.png 848w, https://substackcdn.com/image/fetch/$s_!XOgX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b40f15d-b0b6-40de-b88c-bbbc317f4a2c_6355x2520.png 1272w, https://substackcdn.com/image/fetch/$s_!XOgX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b40f15d-b0b6-40de-b88c-bbbc317f4a2c_6355x2520.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XOgX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b40f15d-b0b6-40de-b88c-bbbc317f4a2c_6355x2520.png" width="1456" height="577" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b40f15d-b0b6-40de-b88c-bbbc317f4a2c_6355x2520.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:577,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:552922,&quot;alt&quot;:&quot;Flowchart&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.the-main-thread.com/i/198937281?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b40f15d-b0b6-40de-b88c-bbbc317f4a2c_6355x2520.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Flowchart" title="Flowchart" srcset="https://substackcdn.com/image/fetch/$s_!XOgX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b40f15d-b0b6-40de-b88c-bbbc317f4a2c_6355x2520.png 424w, https://substackcdn.com/image/fetch/$s_!XOgX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b40f15d-b0b6-40de-b88c-bbbc317f4a2c_6355x2520.png 848w, https://substackcdn.com/image/fetch/$s_!XOgX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b40f15d-b0b6-40de-b88c-bbbc317f4a2c_6355x2520.png 1272w, https://substackcdn.com/image/fetch/$s_!XOgX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b40f15d-b0b6-40de-b88c-bbbc317f4a2c_6355x2520.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><strong>Named model</strong> &#8212; a second configuration block in <code>application.properties</code>, distinguished by a string identifier.</p><p><strong>AiService</strong> &#8212; a Java interface Quarkus implements at build time by calling an LLM.</p><p><strong>Structured output</strong> &#8212; the return type drives schema-aware parsing; you do not hand-parse enum strings.</p><p><code>@ModelName</code> &#8212; a LangChain4j CDI qualifier that selects which named model bean to inject.</p><p><strong>CDI event</strong> &#8212; decoupled notification; <code>fire()</code> is synchronous, <code>fireAsync()</code> is asynchronous.</p><h2><strong>What you need</strong></h2><p>You have written <code>@QuarkusTest</code> before and know what a LangChain4j AI service 
interface looks like.</p><ul><li><p>JDK <strong>25</strong> (<code>java -version</code>)</p></li><li><p>Maven <strong>3.9+</strong> (<code>mvn -version</code>)</p></li><li><p><a href="https://ollama.com/">Ollama</a> installed and running (<code>ollama serve</code>)</p></li><li><p>Models pulled: <code>ollama pull qwen2.5:0.5b</code> and <code>ollama pull llama3.2</code></p></li><li><p>Quarkus CLI installed (<code>quarkus --version</code>) &#8212; optional; Maven wrapper works fine</p></li><li><p>About <strong>two &#9749;&#65039;&#9749;&#65039;</strong></p></li></ul><h2><strong>Project setup</strong></h2><pre><code><code>quarkus create app dev.forgeassist:forgeassist-routing \
  --package-name=dev.forgeassist \
  --extension='quarkus-rest,quarkus-langchain4j-ollama,quarkus-arc' \
  --java=25 \
  --no-code
cd forgeassist-routing</code></code></pre><p>When you add the Ollama extension, the generator also adds <code>quarkus-langchain4j-bom</code> to <code>dependencyManagement</code>.</p><p>Add test dependencies to <code>pom.xml</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;xml&quot;,&quot;nodeId&quot;:&quot;e2f38c76-8b08-4fab-9184-4a3caaf9a818&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-xml">        &lt;dependency&gt;
            &lt;groupId&gt;io.quarkus&lt;/groupId&gt;
            &lt;artifactId&gt;quarkus-junit-mockito&lt;/artifactId&gt;
            &lt;scope&gt;test&lt;/scope&gt;
        &lt;/dependency&gt;
        &lt;dependency&gt;
            &lt;groupId&gt;io.rest-assured&lt;/groupId&gt;
            &lt;artifactId&gt;rest-assured&lt;/artifactId&gt;
            &lt;scope&gt;test&lt;/scope&gt;
        &lt;/dependency&gt;</code></pre></div><p>Run a quick sanity check before adding any business logic:</p><pre><code><code>quarkus dev</code></code></pre><p>Quarkus should start, and the Dev UI should be reachable at <code>http://localhost:8080/q/dev-ui</code>. No model call happens yet. We are only confirming that the extension graph wires up.</p><h2><strong>Configure two Ollama models</strong></h2><p>This block is the spine of the tutorial. Misread it and you will debug phantom CDI issues for an hour.</p><p><code>src/main/resources/application.properties</code>:</p><pre><code><code>quarkus.application.name=forgeassist-routing

# Explicit host Ollama &#8212; Dev Services off for teaching clarity
quarkus.langchain4j.ollama.devservices.enabled=false
quarkus.langchain4j.ollama.base-url=http://localhost:11434

# Power lane (default unnamed bean)
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.chat-model.temperature=0.7
# Global request timeout; the fast lane overrides it below
quarkus.langchain4j.timeout=60s

# Fast lane (named "fast")
quarkus.langchain4j.fast.chat-model.provider=ollama
quarkus.langchain4j.ollama.fast.base-url=${quarkus.langchain4j.ollama.base-url}
quarkus.langchain4j.ollama.fast.chat-model.model-id=qwen2.5:0.5b
quarkus.langchain4j.ollama.fast.chat-model.temperature=0.0
quarkus.langchain4j.ollama.fast.timeout=15s</code></code></pre><p>Named model config has two parts: <code>quarkus.langchain4j.&lt;name&gt;.*</code> at the provider level and <code>quarkus.langchain4j.ollama.&lt;name&gt;.*</code> for Ollama-specific settings. Leave <code>&lt;name&gt;</code> out and you configure the default bean. Add <code>fast</code> and CDI gets a second bean exposed through <code>@ModelName("fast")</code>. The <code>quarkus.langchain4j.fast.chat-model.provider=ollama</code> line matters; skip it and the named lane does not resolve cleanly.</p><p>Temperature <code>0.0</code> on the fast model is deliberate. The classifier must be deterministic; a stochastic classifier that occasionally says SIMPLE when it means COMPLEX defeats the purpose.</p><p>Dev Services is off on purpose. Quarkus LangChain4j can auto-start Ollama and pull models in dev mode, which is convenient locally and bad for a tutorial that is supposed to show the real endpoint and the real failure path.</p><p>One quick sanity check before you move on: if Dev Services is off and Ollama is unreachable, or <code>qwen2.5:0.5b</code> is missing, what does the first AI request do? Make the prediction, then try it. That is the failure you want to recognize later when the logs are less friendly.</p><h2><strong>The complexity enum</strong></h2><p>I use a binary enum rather than SIMPLE / MODERATE / COMPLEX. More tiers need more calibration and prompt tuning, and they only help when you have three meaningfully different models to route to. Start with two lanes; extend later if the economics justify it.</p><p><code>src/main/java/dev/forgeassist/Complexity.java</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;b9fedc98-9da4-465b-be77-0951a8c8f334&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.forgeassist;

public enum Complexity {

    /**
     * Factual lookups, single-step how-tos, definitional questions.
     * Examples: "What does --dry-run do?", "List the ForgeCI environment
     * variables."
     */
    SIMPLE,

    /**
     * Multi-step reasoning, debugging with context, architectural trade-offs,
     * ambiguous or environment-specific problems.
     * Examples: "Why does my pipeline OOM only on cached arm64 builds?"
     */
    COMPLEX
}</code></pre></div><h2><strong>The classifier AiService</strong></h2><p>Quarkus LangChain4j can return Java types from an AiService method, not just <code>String</code>. With an enum return type, the framework injects the enum schema into the prompt and deserializes the response, so there is no hand-rolled parsing code. The trade-off is still real: a confused model can return something unparseable. For a two-lane classifier with a small context window, I think that risk is low enough.</p><p><code>src/main/java/dev/forgeassist/PromptClassifier.java</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;5d1e07be-b3ee-41c1-82f6-e58d6ff32239&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.forgeassist;

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService(modelName = "fast")
public interface PromptClassifier {

    @SystemMessage("""
            You are a prompt complexity classifier for ForgeCI, a CI/CD platform.
            Classify the user's question into exactly one of: SIMPLE, COMPLEX.

            SIMPLE: factual lookups, flag definitions, single-step how-tos,
                    questions answerable from documentation alone.

            COMPLEX: debugging with environment context, multi-step reasoning,
                     architectural trade-offs, questions that require inference
                     beyond what documentation states.

            Respond with ONLY the enum value. No punctuation. No explanation.
            """)
    Complexity classify(@UserMessage String prompt);
}</code></pre></div><p><code>modelName = "fast"</code> on the classifier matters. Classification is already the cheap job. Using the power model to decide whether to use the power model is circular and a little silly.</p><p>The system prompt describes both categories in ForgeCI terms on purpose. Generic &#8220;SIMPLE: short questions&#8221; prompts look tidy and perform worse once you hit the awkward edge cases.</p><p>On the wire, a simple flag question looks roughly like this (system message abbreviated):</p><pre><code><code>{
  "model": "qwen2.5:0.5b",
  "messages": [
    {
      "role": "system",
      "content": "You are a prompt complexity classifier..."
    },
    {
      "role": "user",
      "content": "What does the --no-cache flag do in forge build?"
    }
  ]
}</code></code></pre><p>Response: <code>SIMPLE</code></p><h2><strong>Routing decision record</strong></h2><p>We define the record first so the <code>fireAsync</code> line later is moving a real type, not a mystery blob.</p><p><code>src/main/java/dev/forgeassist/RoutingDecision.java</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;4c1a2a05-0f16-46fa-b530-e2d13866a3fa&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.forgeassist;

import java.time.Instant;

/**
 * Immutable record of a single model routing decision.
 * Fired as a CDI event; consumed by observers for logging and metrics.
 */
public record RoutingDecision(
        String prompt,
        Complexity complexity,
        String selectedModel,
        long classificationMillis,
        Instant timestamp) {

    public RoutingDecision(
            String prompt, Complexity complexity, String selectedModel, long classificationMillis) {
        this(prompt, complexity, selectedModel, classificationMillis, Instant.now());
    }
}</code></pre></div><p>Records give you immutability for free. The CDI event keeps the routing decision decoupled from whoever wants to watch it, so the router does not care whether you only log today or add metrics tomorrow.</p><h2><strong>The routing event observer</strong></h2><p><code>src/main/java/dev/forgeassist/RoutingEventObserver.java</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;135ab96d-aa38-4ad4-8cfb-29e5e6ef6725&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.forgeassist;

import org.jboss.logging.Logger;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.ObservesAsync;

@ApplicationScoped
public class RoutingEventObserver {

    private static final Logger LOG = Logger.getLogger(RoutingEventObserver.class);

    public void onRoutingDecision(@ObservesAsync RoutingDecision decision) {
        LOG.infof(
                "[ROUTING] complexity=%s model=%s classificationMs=%d prompt=\"%s\"",
                decision.complexity(),
                decision.selectedModel(),
                decision.classificationMillis(),
                truncate(decision.prompt(), 80));
    }

    private static String truncate(String s, int max) {
        return s.length() &lt;= max ? s : s.substring(0, max) + "&#8230;";
    }
}</code></pre></div><p>That separation is the whole point: the observer does not know about the router, the router does not know about the observer, and adding a Micrometer counter later does not require another pass through <code>ModelRouter</code>.</p><h2><strong>The model router</strong></h2><p>This is the center of the sample, so we build it in layers.</p><h3><strong>Shell and injections</strong></h3><p><code>@ModelName</code> is a Quarkus LangChain4j qualifier &#8212; not <code>@Named</code>. Injection without a qualifier resolves to the default (unnamed) bean, which is the power model here.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;144b4431-391e-4bb0-899c-ee0517af6ff3&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.forgeassist;

import dev.langchain4j.model.chat.ChatModel;
import io.quarkiverse.langchain4j.ModelName;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Event;
import jakarta.inject.Inject;

@ApplicationScoped
public class ModelRouter {

    private final PromptClassifier classifier;
    private final ChatModel fastModel;
    private final ChatModel powerModel;
    private final Event&lt;RoutingDecision&gt; routingEvents;

    @Inject
    public ModelRouter(
            PromptClassifier classifier,
            @ModelName("fast") ChatModel fastModel,
            ChatModel powerModel,
            Event&lt;RoutingDecision&gt; routingEvents) {
        this.classifier = classifier;
        this.fastModel = fastModel;
        this.powerModel = powerModel;
        this.routingEvents = routingEvents;
    }

    // route() follows below
}</code></pre></div><h3><strong>Classification and timing</strong></h3><p>At this point in execution, only the fast model has run (via the classifier AiService). The power model has done nothing.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;c28f9bf4-9c40-4095-a8a2-cf9f682d4533&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">public String route(String userPrompt) {
        long start = System.currentTimeMillis();
        Complexity complexity = classifier.classify(userPrompt);
        long classificationMillis = System.currentTimeMillis() - start;

        // dispatch continues below
    }</code></pre></div><h3><strong>Dispatch and event</strong></h3><p>Once the target model changes at runtime, drop down to the raw LangChain4j <code>ChatModel.chat(ChatRequest)</code> API instead of wrapping another AiService around it.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;6598c7e9-d7c6-4c46-acdd-d5691c1978a9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">        ChatModel selected = switch (complexity) {
            case SIMPLE -&gt; fastModel;
            case COMPLEX -&gt; powerModel;
        };

        ChatRequest request = ChatRequest.builder().messages(UserMessage.from(userPrompt)).build();

        String response = selected.chat(request).aiMessage().text();

        routingEvents.fireAsync(
                new RoutingDecision(
                        userPrompt,
                        complexity,
                        complexity == Complexity.SIMPLE ? "qwen2.5:0.5b" : "llama3.2",
                        classificationMillis));

        return response;
    }</code></pre></div><p>Complete <code>ModelRouter.java</code> with imports:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;95833db1-effd-41d6-95cd-166ad65bfea5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.forgeassist;

import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import io.quarkiverse.langchain4j.ModelName;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Event;
import jakarta.inject.Inject;

@ApplicationScoped
public class ModelRouter {

    private final PromptClassifier classifier;
    private final ChatModel fastModel;
    private final ChatModel powerModel;
    private final Event&lt;RoutingDecision&gt; routingEvents;

    @Inject
    public ModelRouter(
            PromptClassifier classifier,
            @ModelName("fast") ChatModel fastModel,
            ChatModel powerModel,
            Event&lt;RoutingDecision&gt; routingEvents) {
        this.classifier = classifier;
        this.fastModel = fastModel;
        this.powerModel = powerModel;
        this.routingEvents = routingEvents;
    }

    public String route(String userPrompt) {
        long start = System.currentTimeMillis();
        Complexity complexity = classifier.classify(userPrompt);
        long classificationMillis = System.currentTimeMillis() - start;

        ChatModel selected = switch (complexity) {
            case SIMPLE -&gt; fastModel;
            case COMPLEX -&gt; powerModel;
        };

        ChatRequest request = ChatRequest.builder().messages(UserMessage.from(userPrompt)).build();

        String response = selected.chat(request).aiMessage().text();

        routingEvents.fireAsync(
                new RoutingDecision(
                        userPrompt,
                        complexity,
                        complexity == Complexity.SIMPLE ? "qwen2.5:0.5b" : "llama3.2",
                        classificationMillis));

        return response;
    }
}</code></pre></div><p><code>fireAsync</code> keeps the HTTP response path clean: the caller gets the answer immediately and the observer runs asynchronously. That makes sense for diagnostics and lightweight counters. It is a bad fit for side effects you cannot afford to lose unless you also have a plan for seeing and recovering from async observer failures.</p><h2><strong>REST endpoint</strong></h2><p><code>src/main/java/dev/forgeassist/ForgeAssistResource.java</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;a699b673-bbdc-43ac-98cb-e94b3090c692&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.forgeassist;

import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/assist")
public class ForgeAssistResource {

    private final ModelRouter router;

    @Inject
    public ForgeAssistResource(ModelRouter router) {
        this.router = router;
    }

    @POST
    @Consumes(MediaType.TEXT_PLAIN)
    @Produces(MediaType.TEXT_PLAIN)
    public String ask(String question) {
        return router.route(question);
    }
}</code></pre></div><h2><strong>Prove it (live Ollama)</strong></h2><p>With <code>quarkus dev</code> running and Ollama listening on <code>http://localhost:11434</code>:</p><pre><code><code># Simple prompt &#8212; expect fast lane when classification cooperates
curl -s -X POST http://localhost:8080/assist \
  -H "Content-Type: text/plain" \
  -d "What does the --no-cache flag do in forge build?"

# Complex prompt &#8212; expect power lane
curl -s -X POST http://localhost:8080/assist \
  -H "Content-Type: text/plain" \
  -d "My ForgeCI pipeline passes locally but fails on cached arm64 runners with an OOM error only when layer cache is warm. What should I investigate first?"
</code></code></pre><p>You want a log line like this:</p><pre><code><code>[ROUTING] complexity=SIMPLE model=qwen2.5:0.5b classificationMs=... prompt="What does the --no-cache flag do in forge build?"</code></code></pre><p>The exact <code>classificationMs</code> value moves around, and because the log comes from an async observer it may show up just after the HTTP response. The signal that matters is <code>complexity</code><strong> + </strong><code>model</code>, not exact timings or identical answer text. Small classifiers will mislabel some edge cases. That is normal, and the production section below is where the trade-off starts to matter.</p><h2><strong>Production risks</strong></h2><p>Once this works in dev, the pleasant part is over. These are the problems that show up on a real team.</p><p><strong>Under-routing is more expensive than over-routing.</strong> The dangerous failure mode is a debugging-heavy prompt stamped SIMPLE and sent to the cheap lane, which then answers confidently and badly. When in doubt, route up, not down.</p><p><strong>Logging the raw prompt is a demo convenience.</strong> Prompt text can contain tokens, stack traces, customer data, or internal hostnames. Production systems often hash, redact, or sample this field.</p><p><strong>Routing adds a second model hop.</strong> Latency is now classification time plus answer time. Keep the classifier on a fast local model, but set explicit timeout boundaries on both lanes and decide what the API does when classification times out.</p><p><code>fireAsync()</code><strong> buys latency at the cost of failure visibility.</strong> Observer failures are harder to spot than synchronous method failures. 
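</p><p>One way to get some of that visibility back: <code>Event.fireAsync</code> returns a <code>CompletionStage</code>, so a failure handler can be attached to it. This is a minimal sketch, not the sample's code. It assumes a plain <code>CompletableFuture</code> as a stand-in for that stage so it runs without CDI, and <code>AsyncFailureVisibility</code>, <code>watch</code>, and <code>observerFailures</code> are hypothetical names introduced only for illustration:</p><pre><code class="language-java">import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncFailureVisibility {

    // Hypothetical failure counter; a real service might log a WARN or bump a metric.
    static final AtomicInteger observerFailures = new AtomicInteger();

    // Attaches a failure handler to the stage returned by an async dispatch.
    static &lt;T&gt; CompletionStage&lt;T&gt; watch(CompletionStage&lt;T&gt; dispatched) {
        return dispatched.whenComplete((event, failure) -&gt; {
            if (failure != null) {
                observerFailures.incrementAndGet();
                System.err.println("routing observer failed: " + failure);
            }
        });
    }

    public static void main(String[] args) {
        // Stand-ins for routingEvents.fireAsync(decision): one delivery succeeds, one fails.
        watch(CompletableFuture.completedFuture("delivered"));
        watch(CompletableFuture.failedFuture(new IllegalStateException("observer blew up")));
        System.out.println("observer failures seen: " + observerFailures.get());
    }
}</code></pre><p>In the router, the same handler would hang off the stage returned by <code>routingEvents.fireAsync(...)</code>. It makes failures visible; it does not make delivery durable.</p><p>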
If the event becomes business-critical, move that concern to a durable mechanism instead of pretending CDI async delivery is a queue.</p><p><strong>Dev Services is a local convenience, not production topology.</strong> The app talks to a known Ollama endpoint, models must already exist there, and health checks should reflect that dependency honestly.</p><h2><strong>Tests</strong></h2><p>LLM output moves around, so tests that assert answer content age badly. The deterministic part, and the part worth locking down first, is the routing behavior: did the classifier run, and did the router pick the expected lane? The HTTP layer is deterministic too: did <code>POST /assist</code> delegate to the router?</p><p>Use <code>@InjectMock</code> with Mockito. <code>ModelRouterTest</code> replaces the classifier and both <code>ChatModel</code> beans so the router never calls Ollama. <code>ForgeAssistResourceTest</code> replaces the router so the HTTP test stays thin and boring, which is exactly what you want from a resource test.</p><p>I would not make the first tutorial test about asynchronous CDI delivery. Readers need to trust the lane decision before they care about observer scheduling.</p><p><code>src/test/java/dev/forgeassist/ModelRouterTest.java</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;222bbb36-03a7-4f3d-b4ea-87b218c16875&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.forgeassist;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.clearInvocations;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.verifyNoInteractions;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import io.quarkiverse.langchain4j.ModelName;
import io.quarkus.test.InjectMock;
import io.quarkus.test.junit.QuarkusTest;
import jakarta.inject.Inject;

@QuarkusTest
class ModelRouterTest {

    @Inject
    ModelRouter router;

    @InjectMock
    PromptClassifier classifier;

    @InjectMock
    @ModelName("fast")
    ChatModel fastModel;

    @InjectMock
    ChatModel powerModel;

    @BeforeEach
    void stubModels() {
        when(fastModel.chat(any(ChatRequest.class))).thenReturn(response("fast-lane"));
        when(powerModel.chat(any(ChatRequest.class))).thenReturn(response("power-lane"));
        clearInvocations(fastModel, powerModel);
    }

    @Test
    void simplePromptUsesFastLane() {
        String prompt = "What does the --no-cache flag do in forge build?";
        when(classifier.classify(prompt)).thenReturn(Complexity.SIMPLE);
        clearInvocations(classifier);

        String answer = router.route(prompt);

        assertEquals("fast-lane", answer);
        verify(classifier).classify(prompt);
        verify(fastModel).chat(any(ChatRequest.class));
        verifyNoInteractions(powerModel);
    }

    @Test
    void complexPromptUsesPowerLane() {
        String prompt = "Why does my pipeline OOM only on cached arm64 builds?";
        when(classifier.classify(prompt)).thenReturn(Complexity.COMPLEX);
        clearInvocations(classifier);

        String answer = router.route(prompt);

        assertEquals("power-lane", answer);
        verify(classifier).classify(prompt);
        verify(powerModel).chat(any(ChatRequest.class));
        verifyNoInteractions(fastModel);
    }

    private static ChatResponse response(String text) {
        return ChatResponse.builder().aiMessage(AiMessage.from(text)).build();
    }
}</code></pre></div><p><code>src/test/java/dev/forgeassist/ForgeAssistResourceTest.java</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;bab4923d-5dd1-4cb1-bfea-c8e601183021&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.forgeassist;

import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.is;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;

import io.quarkus.test.InjectMock;
import io.quarkus.test.junit.QuarkusTest;

@QuarkusTest
class ForgeAssistResourceTest {

    @InjectMock
    ModelRouter router;

    @Test
    void assistEndpointDelegatesToRouter() {
        when(router.route("What does --no-cache do?")).thenReturn("fast-lane");

        given()
                .contentType("text/plain")
                .body("What does --no-cache do?")
                .when()
                .post("/assist")
                .then()
                .statusCode(200)
                .body(is("fast-lane"));
    }
}</code></pre></div><p>Run tests (no running Ollama required for assertions):</p><pre><code><code>./mvnw test</code></code></pre><p>Expect <strong>BUILD SUCCESS</strong> and <strong>3</strong> passing tests.</p><h2><strong>Extension exercises</strong></h2><p>If you still feel adventurous, here are some homework ideas:</p><p><strong>Three-tier routing</strong> &#8212; Add <code>MODERATE</code> to the enum, introduce a third Ollama model (for example <code>qwen2.5:3b</code>), and update the system prompt and switch expression. How does the classifier prompt need to change to distinguish MODERATE from both SIMPLE and COMPLEX?</p><p><strong>Confidence-aware routing</strong> &#8212; Return a record <code>ClassificationResult(Complexity complexity, int confidencePercent)</code> instead of a bare enum. Route SIMPLE classifications with <code>confidencePercent &lt; 70</code> to the power model as a fallback.</p><p><strong>Micrometer metrics observer</strong> &#8212; Add a second CDI observer that increments a <code>Counter</code> per complexity tier, and expose the metrics endpoint.</p><p><strong>Dev Services contrast</strong> &#8212; If you have Docker available, try Quarkus Dev Services for Ollama and compare the startup experience with the explicit host configuration used here.</p><h2><strong>Close the loop</strong></h2><p>ForgeAssist classifies each prompt on a cheap local model, routes SIMPLE questions to <code>qwen2.5:0.5b</code>, routes COMPLEX ones to <code>llama3.2</code>, and logs the decision without blocking the caller. 
That fixes the opening problem: trivial and heavy questions no longer burn the same expensive default.</p>]]></content:encoded></item><item><title><![CDATA[What Should Go in an AGENTS.md File?]]></title><description><![CDATA[A small repo instruction file can cut scope drift, bad assumptions, and review noise by telling coding agents how your project actually works.]]></description><link>https://www.the-main-thread.com/p/coding-agent-operating-manual</link><guid isPermaLink="false">https://www.the-main-thread.com/p/coding-agent-operating-manual</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Tue, 02 Jun 2026 06:08:50 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7c800679-adbf-4854-b7d2-66c8c2efbe9e_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I tend to like the AI post where somebody shares a <code>CLAUDE.md</code> or repo rule file, says &#8220;these 21 rules changed everything,&#8221; and then the quote tweet count starts to explode.</p><p>I like those posts more than most AI productivity advice because they point to something real. A coding agent gets much better once it stops guessing how your project works. The model name, the filename, and the exact rule count are the forgettable parts.</p><p>That sounds obvious until you watch one work without any local guidance.</p><p>By default, a coding model knows a lot about syntax and just enough about software engineering to be dangerous in the same cheerful way a junior engineer with root access would be dangerous. It can produce code. It can explain code. 
It can also decide that your tiny bug fix is a great time to &#8220;clean up&#8221; three unrelated files, assume a requirement you never stated, and speak about its own uncertainty with the confidence of a generated release note.</p><p>This breaks for a simpler reason: missing local context. Your agent does not know what &#8220;done&#8221; means in this repo, which changes are out of bounds, which conventions are non-negotiable, or when it should stop and ask instead of making its own decisions.</p><p>That is what the instruction file is for.</p><p>Call it <code>AGENTS.md</code>, <code>CLAUDE.md</code>, repo rules, or a small pile of working notes. The name matters a lot less than the job. You are giving the agent an operating manual for this codebase so it can stop guessing its way through every task.</p><h2><strong>The four rules that pull the most weight</strong></h2><p>I do not think the magic number is 21. Four rules do most of the real work.</p><ol><li><p><strong>Ask instead of assume.</strong><br>If requirements are unclear, the agent should surface the ambiguity instead of silently choosing a path. Wrong confidence is more expensive than one extra question.</p></li><li><p><strong>Start with the smallest working change.</strong><br>Agents love broad cleanup because it gives them more surface area to look helpful on. That rarely means the task is going well, so I would strongly prefer small, reversible diffs.</p></li><li><p><strong>Do not touch unrelated code.</strong><br>Most teams do not mind extra polish in theory. They mind it very much when a review gets slow because the assistant &#8220;also improved&#8221; files nobody asked about.</p></li><li><p><strong>Flag uncertainty before acting.</strong><br>A decent agent should know the difference between &#8220;I found the bug&#8221; and &#8220;I have a plausible theory.&#8221; The second state is fine. Hiding it is not.</p></li></ol><p>Those four rules are bigger than any one model. 
They are the difference between an assistant and an intern who keeps rearranging your desk while answering the wrong question.</p><h2><strong>What the file is really doing</strong></h2><p>I think of a good agent instruction file as a boundary-setting document. Its job is to reduce four common failure modes.</p><p><strong>Scope drift</strong><br>The agent starts on a bug fix, notices a naming inconsistency, then notices an outdated helper, then notices a nearby refactor opportunity, and suddenly your one-line change reaches much farther than it should. A rules file tells it what &#8220;stay in scope&#8221; actually means here.</p><p><strong>Bad defaults</strong><br>Every model comes with habits. Some over-explain. Some rush to code. Some make confident assumptions to keep the turn moving. Some will happily take a destructive action if you did not explicitly say &#8220;ask first.&#8221; If you do not set the defaults, the model will.</p><p><strong>Decision loss</strong><br>Longer sessions get weird because the model forgets which trade-offs were already settled. You said no refactor. Later it proposes one. You said keep the old API. Later it &#8220;improves&#8221; it. This is where a tiny memory stack helps more than another clever prompt.</p><p><strong>Polite nonsense</strong><br>This is my least favorite failure mode. The model has partial evidence, but it keeps writing as if it has a proof. A good working agreement makes uncertainty a first-class output, not an embarrassment to hide.</p><p>That is why the screenshot-style rule sets often split into three buckets: defaults, behavior, and memory. 
I think that split is sane, and I would keep it.</p><h2><strong>What to put in a generalized </strong><code>AGENTS.md</code></h2><p>If I were writing one file meant to survive changing models, I would keep these sections.</p><h3><strong>Defaults</strong></h3><p>This is where you define the repo&#8217;s normal operating mode.</p><ul><li><p>Keep changes narrow and task-focused</p></li><li><p>Prefer the simplest fix that works</p></li><li><p>Match existing style unless the task is explicitly a redesign</p></li><li><p>Ask before destructive actions</p></li><li><p>Explain what changed in plain language</p></li><li><p>Do not invent requirements, APIs, or test results</p></li></ul><p>Defaults matter because they catch the moments when the prompt is too short to say all of that again.</p><h3><strong>Behavior</strong></h3><p>This section tells the agent how to work, not just what to produce.</p><ul><li><p>Restate the task in concrete terms before making major changes</p></li><li><p>Confirm assumptions when they change behavior or risk</p></li><li><p>Show diffs or summarize edits clearly</p></li><li><p>Stop if the new plan expands beyond the original ask</p></li><li><p>Keep the user in the loop before irreversible steps</p></li></ul><p>This is about visibility, not ceremony. Most people can tolerate an agent being wrong for a turn. They get annoyed when it is wrong and quiet about it.</p><h3><strong>Memory</strong></h3><p>This is the part many teams skip, and for me it is where a lot of the quality gain comes from.</p><p>Your agent does not just need rules. 
It needs a place to pin decisions that should survive the next twenty turns.</p><ul><li><p><code>DECISIONS.md</code> for active trade-offs and chosen direction</p></li><li><p><code>KNOWN_ISSUES.md</code> or <code>ERRORS.md</code> for failures already observed</p></li><li><p><code>SESSION.md</code> for where work stopped and what should happen next</p></li><li><p><strong>Permanent facts</strong> for &#8220;never rename this endpoint&#8221; or &#8220;this file mirrors an external contract&#8221;</p></li></ul><p>You do not need all of these in every repo. You just need enough memory to stop paying for the same rediscovery loop over and over.</p><h3><strong>Escalation</strong></h3><p>This is where the agent learns when to slow down.</p><ul><li><p>Ask when two paths have different product consequences</p></li><li><p>Ask when requirements conflict with existing constraints</p></li><li><p>Ask before deleting, renaming, or migrating broad surfaces</p></li><li><p>Say when confidence is low</p></li><li><p>Prefer a question over a fake conclusion</p></li></ul><p>If you only add one behavioral rule, make it this kind of rule. A coding agent that escalates well is dramatically easier to trust.</p><h2><strong>A template I would actually use</strong></h2><p>Here is a generalized version I would be comfortable dropping into a project root today:</p><pre><code><code># AGENTS.md

## Goal

Help with this repository by making the smallest correct change that solves the
requested task.

## Defaults

- Stay within the requested scope
- Prefer the simplest working fix over broad rewrites
- Match existing code style and project structure
- Do not edit unrelated files just because they could be improved
- Ask before destructive or irreversible actions
- Do not claim tests passed unless you actually ran them
- Do not invent requirements, endpoints, configs, or results

## Working style

- Restate the task briefly before major edits
- Surface assumptions when they affect behavior
- Show what changed and why
- Keep diffs easy to review
- Stop and ask if the task expands into a refactor or redesign

## Uncertainty

- Flag low-confidence conclusions explicitly
- Prefer "I need to verify X" over guessing
- When there are multiple reasonable paths, present the trade-off

## Memory

- Read `DECISIONS.md` before making architectural changes
- Read `KNOWN_ISSUES.md` before debugging recurring failures
- Update `SESSION.md` with current status when work stops mid-task
- Treat pinned project facts as constraints, not suggestions

## Guardrails

- Never hide uncertainty behind polished wording
- Never change public contracts without calling it out
- Never mix requested work with opportunistic cleanup unless asked
</code></code></pre><p>That file is boring, which is exactly what you want here. The best rule file in a repo is usually the one that prevents avoidable messes, not the one that sounds smartest on social media.</p><h2><strong>This is really about interface design</strong></h2><p>What I like about these rule-file experiments is that they quietly move the conversation away from model hype and back toward engineering.</p><p>Better outcomes do not come only from upgrading the model. You also get them by improving the interface between the model and the project. A local instruction file, a short decision log, and a few explicit escalation rules can do more for day-to-day quality than another round of &#8220;which coding assistant is winning this week.&#8221;</p><p>That is also why I would generalize the idea well beyond Claude.</p><p><code>AGENTS.md</code> is a better mental model than <code>CLAUDE.md</code> if you want something durable. Models change. Product names change. The failure modes stay stubbornly the same. Every coding agent needs help with scope, defaults, uncertainty, and memory. If your team solves those once in versioned text, you do not have to solve them again from scratch every time a new assistant shows up with a shinier demo.</p><h2><strong>The real win</strong></h2><p>For me, the win is simpler than &#8220;21 rules made the model smarter.&#8221; The model stopped having to guess what kind of teammate it was supposed to be, which is a much more useful improvement anyway.</p><p>If you use coding agents seriously, give them a local operating manual. Keep it short, keep it opinionated, and keep it honest about boundaries. 
Then add just enough memory that the same wrong turn does not have to be rediscovered every Tuesday.</p><p>For me, that is just good interface design for a very fast, very eager collaborator.</p>]]></content:encoded></item><item><title><![CDATA[AI Coding Break-Even: Cheap Tokens, Expensive Software]]></title><description><![CDATA[Why the real crossover in enterprise software is review, repair, and delivery overhead, not token price.]]></description><link>https://www.the-main-thread.com/p/ai-coding-break-even</link><guid isPermaLink="false">https://www.the-main-thread.com/p/ai-coding-break-even</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Mon, 01 Jun 2026 06:08:55 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/364f5f41-e852-44be-a546-8e78c7e2081f_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I keep seeing the same chart in AI budget discussions. Token bill on one side, developer salary on the other, tiny AI bar, giant human bar, conclusion achieved.</p><p>I do not like that chart because it prices the easiest part of software and quietly ignores the rest.</p><p>Enterprise teams do not buy keystrokes. They buy a change that made it through product scoping, design, implementation, review, testing, security checks, release plumbing, and production. 
If you compare tokens to one developer writing code, you are comparing a narrow input cost to a full delivery salary and acting surprised when the machine looks cheap.</p><p>As of <strong>May 21, 2026</strong>, <a href="https://developers.openai.com/api/docs/models/gpt-5.4/">OpenAI lists GPT-5.4 at $2.50 per million input tokens and $15 per million output tokens</a>. <a href="https://www.anthropic.com/pricing">Anthropic lists Claude Sonnet 4 at $3 per million input tokens and $15 per million output tokens</a>. Those are real numbers, and they are low enough that I am not going to pretend otherwise. AI-assisted coding is cheap on raw tokens.</p><p>The problem is that raw tokens are rarely where enterprise software wins or loses money.</p><h2><strong>The AI side is a spend band, not a number</strong></h2><p>The first thing I would stop doing is talking about &#8220;the AI cost&#8221; as if it were one number.</p><p>Anthropic&#8217;s <a href="https://code.claude.com/docs/en/costs">Claude Code cost guide</a> says the average across enterprise deployments is around <strong>$13 per developer per active day</strong>, roughly <strong>$150 to $250 per developer per month</strong>, and still <strong>below $30 per active day for 90% of users</strong>. That is a useful center of gravity. It tells you what ordinary enterprise usage looks like when people are not trying to become folklore on Hacker News.</p><p>The public ceiling is a different story. I pulled the live <a href="https://tkmx.odio.dev/api/usage?days=28">HN Tokenmaxxing feed</a> on <strong>May 21, 2026</strong> and recomputed the trailing seven-day window. That sample showed <strong>28 active users</strong>, about <strong>$54,739.46</strong> in total spend, a <strong>median active-user cost of $98.25 per day</strong>, an <strong>average of $279.28 per day</strong>, and top users burning <strong>$4,575 to $9,381 per week</strong> each.</p><p>Those numbers are not contradictory. 
They describe different populations.</p><p>Anthropic gives you a managed enterprise baseline. Tokenmaxxing gives you a public power-user ceiling full of people who use long context, multiple agents, and enough model calls to make finance curious. It is useful precisely because it is a bit unhinged. It shows the upper edge of behavior once teams stop treating AI like autocomplete and start treating it like a second operating mode.</p><p>It also moves around. Ten days earlier, the same public sample produced a materially higher median. That is another reason I would not build a strategy deck around one leaderboard screenshot. The AI side of the curve is a spend band, not a point estimate.</p><h2><strong>The denominator is the whole delivery chain</strong></h2><p>The usual chart also cheats on the human side.</p><p>To make the comparison less fuzzy, I modeled one modest enterprise feature slice using <a href="https://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm">BLS median wages</a> for the roles that usually touch real delivery work, then converted those wages to loaded employer cost using the <a href="https://www.bls.gov/news.release/archives/ecec_03202026.htm">BLS private-industry compensation release published March 20, 2026</a>. That multiplier comes out to about <strong>1.43x wages</strong>.</p><p>The feature slice is intentionally ordinary: six hours of product or project work, six hours of design, 24 hours of implementation, eight hours of QA, three hours of security review, three hours of release or ops work, and four hours of management or review coordination. Nothing heroic. Nothing transformation-program sized. 
Just a change that still has to survive the whole system.</p><p>That model lands at about <strong>$4,496</strong> in loaded labor cost.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EKvE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a59786-4b8f-406c-81ba-761dfd230401_2302x1628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EKvE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a59786-4b8f-406c-81ba-761dfd230401_2302x1628.png 424w, https://substackcdn.com/image/fetch/$s_!EKvE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a59786-4b8f-406c-81ba-761dfd230401_2302x1628.png 848w, https://substackcdn.com/image/fetch/$s_!EKvE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a59786-4b8f-406c-81ba-761dfd230401_2302x1628.png 1272w, https://substackcdn.com/image/fetch/$s_!EKvE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a59786-4b8f-406c-81ba-761dfd230401_2302x1628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EKvE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a59786-4b8f-406c-81ba-761dfd230401_2302x1628.png" width="1456" height="1030" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3a59786-4b8f-406c-81ba-761dfd230401_2302x1628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1030,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:251637,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.the-main-thread.com/i/198658129?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a59786-4b8f-406c-81ba-761dfd230401_2302x1628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EKvE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a59786-4b8f-406c-81ba-761dfd230401_2302x1628.png 424w, https://substackcdn.com/image/fetch/$s_!EKvE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a59786-4b8f-406c-81ba-761dfd230401_2302x1628.png 848w, https://substackcdn.com/image/fetch/$s_!EKvE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a59786-4b8f-406c-81ba-761dfd230401_2302x1628.png 1272w, https://substackcdn.com/image/fetch/$s_!EKvE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a59786-4b8f-406c-81ba-761dfd230401_2302x1628.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Implementation is only <strong>48.7%</strong> of the bill. Planning, design, QA, security, release, and coordination are the other <strong>51.3%</strong>.</p><p>That is the part the token-versus-salary chart keeps deleting. Even if code generation gets dramatically cheaper, more than half the lifecycle bill is still waiting for you after the model says it is done. Software is full of second-order costs. The model only touches some of them.</p><h2><strong>Break-even is a review curve</strong></h2><p>If I had to keep one chart and throw away the rest, I would keep the review curve.</p><p>The useful x-axis is not tokens. 
It is <strong>human review and repair hours required after the AI run</strong>.</p><p>For the same coding-shaped unit of work, the comparison is simple:</p><ul><li><p><strong>Human-only cost</strong> = loaded developer time</p></li><li><p><strong>AI-led cost</strong> = model spend + human review and repair time</p></li></ul><p>Using the feature model above, a loaded developer day is about <strong>$730</strong>, or about <strong>$91.25 per hour</strong>. Once you combine that with the current spend bands from Anthropic and Tokenmaxxing, you get this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7Fv9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ed0a46-464d-4af8-a172-5c069829a52d_2168x1408.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Fv9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ed0a46-464d-4af8-a172-5c069829a52d_2168x1408.png 424w, https://substackcdn.com/image/fetch/$s_!7Fv9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ed0a46-464d-4af8-a172-5c069829a52d_2168x1408.png 848w, https://substackcdn.com/image/fetch/$s_!7Fv9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ed0a46-464d-4af8-a172-5c069829a52d_2168x1408.png 1272w, https://substackcdn.com/image/fetch/$s_!7Fv9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ed0a46-464d-4af8-a172-5c069829a52d_2168x1408.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!7Fv9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ed0a46-464d-4af8-a172-5c069829a52d_2168x1408.png" width="1456" height="946" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79ed0a46-464d-4af8-a172-5c069829a52d_2168x1408.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:946,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:214279,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.the-main-thread.com/i/198658129?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ed0a46-464d-4af8-a172-5c069829a52d_2168x1408.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7Fv9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ed0a46-464d-4af8-a172-5c069829a52d_2168x1408.png 424w, https://substackcdn.com/image/fetch/$s_!7Fv9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ed0a46-464d-4af8-a172-5c069829a52d_2168x1408.png 848w, https://substackcdn.com/image/fetch/$s_!7Fv9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ed0a46-464d-4af8-a172-5c069829a52d_2168x1408.png 1272w, https://substackcdn.com/image/fetch/$s_!7Fv9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ed0a46-464d-4af8-a172-5c069829a52d_2168x1408.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>At Anthropic&#8217;s enterprise average of <strong>$13 per active day</strong>, AI still beats a human day on raw cost until the human spends almost <strong>7.9 hours</strong> reviewing and repairing it. At Anthropic&#8217;s <strong>90th-percentile</strong> ceiling of <strong>$30 per active day</strong>, the crossover is still about <strong>7.7 hours</strong>. At the current Tokenmaxxing <strong>median</strong> of <strong>$98.25 per day</strong>, the crossover drops to about <strong>6.9 hours</strong>. 
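</p><p>The crossover figures in this section all come from one subtraction and one division: take the loaded developer day, subtract the daily model spend, and divide by the loaded hourly rate. Here is a minimal Java sketch of that arithmetic, assuming the $730 day and $91.25 hour from the feature model. The class and method names are illustrative only and do not come from any tool mentioned here:</p><pre><code class="language-java">public class BreakEven {

    // Loaded cost figures taken from the feature model in this article.
    static final double LOADED_DAY_USD = 730.00;
    static final double LOADED_HOUR_USD = 91.25;

    // Hours of human review and repair at which AI-led cost equals human-only cost.
    // AI-led cost   = model spend + hours * loaded hourly rate
    // Human-only    = one loaded developer day
    static double crossoverHours(double modelSpendPerDayUsd) {
        return (LOADED_DAY_USD - modelSpendPerDayUsd) / LOADED_HOUR_USD;
    }

    public static void main(String[] args) {
        String[] labels = {
            "Anthropic enterprise average",
            "Anthropic 90th percentile",
            "Tokenmaxxing median",
            "Tokenmaxxing average",
            "Frontier day"
        };
        double[] spendPerDayUsd = {13.00, 30.00, 98.25, 279.28, 550.00};

        // Print one line per spend band, rounded to one decimal like the prose.
        for (int i = 0; i != labels.length; i++) {
            System.out.printf("%-30s $%7.2f per day: crossover at %.1f hours%n",
                    labels[i], spendPerDayUsd[i], crossoverHours(spendPerDayUsd[i]));
        }
    }
}</code></pre><p>The constants are this article&#8217;s model, not a benchmark. The more useful exercise is swapping in your own loaded rate and observed daily spend and seeing where your crossover lands.</p><p>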
At the current Tokenmaxxing <strong>average</strong> of <strong>$279.28 per day</strong>, the crossover drops to about <strong>4.9 hours</strong>. At an intentionally ugly frontier day of <strong>$550</strong>, you only get about <strong>2.0 hours</strong> of human cleanup before AI loses the raw coding-day comparison.</p><p>That is the real question for engineering leaders:</p><p><strong>How many hours of senior-human verification, correction, and coordination does this class of task usually need after the model claims success?</strong></p><p>Once you ask it that way, the task sorting gets a lot less mystical.</p><h2><strong>Where AI is actually strong</strong></h2><p>I think there are three broad zones.</p><p><strong>AI-led work</strong> is the place where the output is cheap to check and the blast radius is tightly bounded. Repo search, code explanation, narrow transforms, test scaffolding with a known oracle, documentation drafts, log triage, and boring CRUD inside one known framework all live here. These are the tasks where the Anthropic-style <strong>$13 to $30 active day</strong> economics are genuinely compelling because the review burden stays low.</p><p><strong>Hybrid work</strong> is where most serious teams will make their money. Bounded feature work inside one service, behavior-preserving refactors, migration scripts with rollback paths, build repairs with a fast verification loop, and bug fixes anchored by deterministic tests fit here. AI can do a lot of the drafting and search. A human still has to own the change shape, the acceptance criteria, and the final judgment. In this zone, pure human work wastes cheap drafting power and fully agentic work creates expensive recovery loops.</p><p><strong>Human-led work</strong> returns the moment review, coordination, or ambiguity starts dominating the bill. 
Product discovery, architecture under uncertainty, security and compliance changes, cross-team design, rollout strategy, incident response, and distributed-systems behavior under failure all live here. The model may still help, often a lot, but it is helping around the edges of the expensive thing. It is not replacing the expensive thing.</p><p>That is why I do not find &#8220;AI replaces a developer&#8221; especially useful as a framing for enterprise software. At best, AI replaces or compresses some implementation-shaped slices inside a much larger chain of work.</p><h2><strong>The rest of the evidence points the same way</strong></h2><p><a href="https://cloud.google.com/blog/products/devops-sre/announcing-the-2024-dora-report">Google&#8217;s 2024 DORA report summary</a> is one of the more interesting reports because it refuses to tell a neat success story. More AI adoption was associated with better documentation quality, better code quality, and faster code review. It was also associated with <strong>lower delivery throughput</strong> and <strong>lower delivery stability</strong>. DORA&#8217;s later write-up on <a href="https://dora.dev/insights/balancing-ai-tensions/">balancing AI tensions in the SDLC</a> says the quiet part out loud: teams often spend the saved drafting time on <strong>auditing and verification</strong>.</p><p>That fits the curve almost perfectly. AI speeds up local creation. The bill comes back during checking.</p><p><a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/">METR&#8217;s July 2025 study</a> found that experienced open-source developers working in large repositories they already knew well took <strong>19% longer</strong> when AI tools were allowed. The 2026 follow-up says current uplift is harder to measure cleanly because tools improved, people increasingly refuse to work without them, and high-leverage usage patterns keep changing. That is a very modern research result. The tools matter. 
The measurement is messy. Anyone selling you one permanent productivity number is enjoying the simplicity more than the truth.</p><p>Security tells the same story from a less cheerful angle. <a href="https://www.gitguardian.com/state-of-secrets-sprawl-report-2026">GitGuardian&#8217;s 2026 State of Secrets Sprawl report</a> says public GitHub saw about <strong>29 million</strong> newly leaked hardcoded secrets in 2025, and that AI-assisted commits leak secrets at roughly <strong>2x</strong> the baseline rate. Cheap generation is still cheap when it creates a secret leak. The expensive part arrives later, with rotation, cleanup, review, and the very ordinary question of who signed off on the thing.</p><p>Even the MIT numbers people like to quote need more care than they usually get. <a href="https://iceberg.mit.edu/report.pdf">Project Iceberg&#8217;s report</a> says <strong>11.7% of U.S. wage value</strong> is technically exposed to current AI capability. That is interesting. It is also a statement about technical exposure, not a clean proof of near-term economic replacement in enterprise software organizations with real delivery, security, and coordination costs.</p><h2><strong>The takeaway I would give an engineering leader</strong></h2><p>If you run an engineering team, platform group, or budget process, my takeaway is pretty simple.</p><p>Do not ask whether AI is cheaper than developers. 
Ask whether AI lowers the total cost of getting a production-worthy change through your delivery system once review, repair, security, rollout, and coordination are included.</p><p>That changes what you measure.</p><ul><li><p>Track <strong>review and repair hours</strong> alongside model spend</p></li><li><p>Track <strong>rework after review</strong>, not just output volume</p></li><li><p>Track <strong>escaped defects, hotfixes, and rollback rate</strong></p></li><li><p>Track <strong>security findings and secret exposure</strong></p></li><li><p>Track <strong>cycle time to a trusted production change</strong>, not just time to first draft</p></li></ul><p>If those numbers improve, I do not care if the token bill goes up. You bought leverage.</p><p>If output volume goes up while review pain, rework, and incidents stay flat or get worse, you probably bought a faster typing machine and a more expensive queue behind it.</p><p><strong>Cheap tokens matter. Expensive software still matters more.</strong> The real break-even point is where human review, repair, and lifecycle coordination start growing faster than the AI draft got cheaper.</p>]]></content:encoded></item><item><title><![CDATA[Quarkus Container Image Strategy: When Buildpacks Beat Jib]]></title><description><![CDATA[Build the same Quarkus service with Jib and Buildpacks, then decide whether your team needs a boring image build or a rebase-friendly platform contract.]]></description><link>https://www.the-main-thread.com/p/quarkus-buildpacks-vs-jib</link><guid 
isPermaLink="false">https://www.the-main-thread.com/p/quarkus-buildpacks-vs-jib</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Sun, 31 May 2026 06:08:28 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/470481a7-294e-46ea-bb03-4bcc08b21b4f_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Both Jib and <a href="https://buildpacks.io/">Buildpacks</a> let you skip a handwritten Dockerfile. That is the least interesting thing about them.</p><p>The real choice is operational. Do you want the app team to build a predictable image with as little ceremony as possible, or do you want the image build to carry stronger platform policy, richer metadata, and the option to patch the runtime base without rebuilding the app? Those are not the same job, and Quarkus gives you a path for both.</p><p>I think this is where a lot of container-image advice gets mushy. We flatten the decision into &#8220;what command builds the image?&#8221; when the better question is &#8220;what contract am I taking on?&#8221; Jib is mostly an application-builder story. Buildpacks are closer to a platform-builder story, even when a developer triggers the command on a laptop.</p><p>This walkthrough keeps the application intentionally small so the comparison stays honest. We build the same Quarkus service twice, first with Jib and then with Buildpacks, and then we look at the part that actually changes the decision: metadata, builder control, and rebase behavior.</p><h2><strong>Prerequisites</strong></h2><p>You need a normal Quarkus setup and a working local container runtime. 
We are not doing anything exotic here, but I do want the commands to be reproducible without filling in missing pieces from memory.</p><ul><li><p>JDK 21 installed</p></li><li><p><a href="https://quarkus.io/guides/cli-tooling">Quarkus CLI</a> on your <code>PATH</code></p></li><li><p>Docker available locally</p></li><li><p>The <code>pack</code><a href="https://buildpacks.io/docs/for-platform-operators/how-to/integrate-ci/pack/"> CLI</a> for image inspection and rebase checks</p></li><li><p>About 2 &#9749;&#65039;</p></li></ul><p>I am using Docker here because the current Quarkus Buildpack integration in the <a href="https://quarkus.io/guides/container-image">Quarkus container image guide</a> expects a Docker-backed flow. Buildpacks as an ecosystem are broader than that. This article is about what Quarkus gives you today, not every possible CNB setup.</p><h2><strong>Create the sample once</strong></h2><p>Create a tiny REST application:</p><pre><code><code>quarkus create app dev.themainthread:container-choice-demo \
  --extension=rest-jackson \
  --java=21 \
  --no-code</code></code></pre><p>I am using <code>--no-code</code> because the application itself is not the story. We only need one endpoint that proves the image runs.</p><p>Create <code>src/main/java/dev/themainthread/container/HelloResource.java</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;5f3b7167-794d-480a-a194-431cdbdb4b8d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.themainthread.container;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/hello")
public class HelloResource {

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    public String hello() {
        return "hello from quarkus";
    }
}</code></pre></div><p>Start the app once and verify the endpoint:</p><pre><code><code>cd container-choice-demo
quarkus dev</code></code></pre><p>In another terminal:</p><pre><code><code>curl http://localhost:8080/hello</code></code></pre><p>You should get:</p><pre><code><code>hello from quarkus</code></code></pre><p>Stop dev mode after that. The application is now boring in exactly the way I want. If one build path behaves differently from the other, the image tooling is the variable.</p><p>I am also using the Quarkus CLI image commands on purpose. They let us switch between supported image builders without turning the project setup into the story.</p><h2><strong>First path: Jib</strong></h2><p>Build the image with Jib:</p><pre><code><code>quarkus image build jib --group mainthread --name container-choice-demo</code></code></pre><p>For a local build, Quarkus wires the result into the local container daemon so you can run it immediately:</p><pre><code><code>docker run --rm -p 8080:8080 mainthread/container-choice-demo:1.0.0-SNAPSHOT</code></code></pre><p>Then check the endpoint again:</p><pre><code><code>curl http://localhost:8080/hello</code></code></pre><p>If the container answers, Jib did its job. That sounds obvious, but it is worth saying because this is what a lot of teams actually need: take a normal Quarkus service, build an image, move on with life.</p><p>What Jib gets right is the lack of drama. It understands Java application layering well, it fits naturally into Quarkus image tooling, and in push-oriented CI pipelines it can avoid some of the local-daemon assumptions that other paths drag in. If your requirement is &#8220;build a container image for this service and keep the pipeline boring,&#8221; Jib is a very strong default.</p><p>That does not mean it is universally better. It means the default burden is low. 
You do not have to decide on a builder stack up front, and you do not have to explain to the rest of the team why image creation suddenly has opinions about lifecycle binaries and run images.</p><h2><strong>Second path: Buildpacks</strong></h2><p>Before running the Buildpacks path, switch the Maven Wrapper away from the generated <code>only-script</code> mode:</p><pre><code><code>mvn -N org.apache.maven.plugins:maven-wrapper-plugin:3.3.4:wrapper -Dtype=source</code></code></pre><p>I am doing this up front because current generated wrappers can be awkward inside builder containers. In <code>only-script</code> mode, <code>mvnw</code> can fall back from the Maven ZIP distribution to the TAR.GZ distribution when <code>unzip</code> is missing in the build environment. If the checksum in <code>maven-wrapper.properties</code> matches the ZIP, that fallback can make the Buildpacks build fail even though the wrapper file looks correct. Setting <code>type=source</code> avoids that path and keeps the wrapper self-contained without requiring a checked-in wrapper JAR.</p><p>Also, Quarkus defaults to <code>fast-jar</code>, which writes the runnable application under <code>target/quarkus-app/</code>. Paketo&#8217;s Maven buildpack, by default, looks for built artifacts with the pattern <code>target/*.[ejw]ar</code>. For this comparison piece, the least awkward fix is to have the build inside the builder produce an <code>uber-jar</code> instead.</p><p>Now build the same application with Buildpacks:</p><pre><code><code>quarkus image build buildpack \
  --group mainthread \
  --name container-choice-demo \
  --build-env 'BP_MAVEN_ADDITIONAL_BUILD_ARGUMENTS=-Dquarkus.package.jar.type=uber-jar' \
  --builder-image paketobuildpacks/builder-jammy-base</code></code></pre><p>A builder image is the build-side toolkit for a Buildpacks build. It is an OCI image that bundles the lifecycle binaries, an ordered set of buildpacks, a build-time base image, and a reference to the run image that will sit under the final application image.</p><p>That means <code>paketobuildpacks/builder-jammy-base</code> is not just a helper image name. It affects which buildpacks get a chance to detect your project, what environment they run in while producing layers, and which runtime base line the exported image starts from. Later, when we get to rebase, that run-image relationship becomes the interesting part.</p><p>That is why I am pinning the builder explicitly instead of hand-waving it away as a default. In a Buildpacks flow, the builder is part of the image contract.</p><p>Run the image:</p><pre><code><code>docker run --rm -p 8080:8080 mainthread/container-choice-demo:1.0.0-SNAPSHOT</code></code></pre><p>And verify it:</p><pre><code><code>curl http://localhost:8080/hello</code></code></pre><p>At this point both tools look similar from the outside. We built a Quarkus service, we started a container, and the endpoint answered. If that is where you stop, the article collapses into &#8220;two ways to avoid a Dockerfile.&#8221; That is not the interesting part.</p><p>The interesting part is what the Buildpack image knows about itself.</p><h2><strong>Inspect the Buildpack image</strong></h2><p>Use <code>pack</code> to inspect the image metadata:</p><pre><code><code>pack inspect-image mainthread/container-choice-demo:1.0.0-SNAPSHOT</code></code></pre><p>You should see sections for the run image, the buildpacks that participated in the build, and the process types baked into the image. That matters because the resulting artifact is not just &#8220;some filesystem layers Jib assembled.&#8221; It is an image with a buildpack lineage. 
The <a href="https://buildpacks.io/docs/for-app-developers/how-to/build-outputs/inspect-app/">inspect-image command</a> is where that becomes visible.</p><p>This is one of the first places where platform concerns show up. A platform team can standardize on a sanctioned builder and know that every service image carries the same kind of metadata and the same base-image policy. An app team can still trigger the build, but the artifact says more about how it came to exist.</p><p>You also get a better answer when somebody asks, &#8220;What exactly built this image?&#8221; With Buildpacks, that question has first-class metadata behind it. With Jib, the answer is usually simpler, but it is also mostly &#8220;our build assembled these layers.&#8221; That can be enough. Sometimes it is not.</p><h2><strong>Rebase is the whole point</strong></h2><p>If you want one reason Buildpacks still matter in 2026, this is the one I would use.</p><p>Try:</p><pre><code><code>pack rebase mainthread/container-choice-demo:1.0.0-SNAPSHOT</code></code></pre><p>Rebase swaps the run-image layers under a buildpack-produced application image without rebuilding the application itself. When there is an updated run image available, you can patch the base and keep the application layers intact. The <a href="https://buildpacks.io/docs/for-app-developers/concepts/rebase/">Buildpacks rebase docs</a> are worth reading here because this is the feature that changes the operational conversation.</p><p>That is a very different operational story from &#8220;run the Java build again.&#8221; If your platform team owns runtime base images and your application teams own code, Buildpacks let those responsibilities meet in a cleaner place. A CVE in the base image does not automatically mean every service has to rerun the whole application build just to move onto a patched runtime layer.</p><p>This is also where I stop thinking of Buildpacks as a developer convenience feature. Convenience is nice. Rebase is strategy. 
It only matters if your organization actually separates application rebuilds from base-image maintenance, but when that split is real, Jib and Buildpacks are not interchangeable anymore.</p><h2><strong>Where Jib still wins</strong></h2><p>I would still pick Jib first for a lot of Quarkus services.</p><p>If the team mostly wants a reliable image build inside the application pipeline, Jib is easier to explain and easier to own. There is less builder policy to reason about. There are fewer moving pieces to standardize. The mental model is close to the application build itself: compile the app, assemble a sensible layered image, push it where it needs to go.</p><p>That simplicity matters. Teams rarely regret the simpler pipeline on day one. They regret it when the complicated alternative did not buy them anything real.</p><p>The mistake is treating Buildpacks as automatically more modern because they are more opinionated. More opinionated is only better when the opinion matches a problem you actually have.</p><h2><strong>Where Buildpacks earn their keep</strong></h2><p>I would pick Buildpacks when one or more of these are true:</p><ul><li><p>The platform team wants to standardize the builder and the runtime base across many services</p></li><li><p>You care about buildpack metadata and image inspection as part of your delivery story</p></li><li><p>Rebase is operationally useful in your environment</p></li><li><p>You want image construction to express platform policy, not just application packaging</p></li></ul><p>That is a narrower case than &#8220;all Java services should use Buildpacks.&#8221; It is also a much more honest one.</p><h2><strong>A few traps worth calling out</strong></h2><p>Do not turn <code>quarkus.container-image.build=true</code> into a permanent property just because a tutorial wanted shorter commands. 
The current Quarkus docs explicitly warn against doing that for the Buildpack path because it can lead to nested builds in places you did not intend.</p><p>If you pass Buildpack-specific environment, use the current Quarkus configuration shape:</p><pre><code><code>quarkus.buildpack.builder-env."BP_JVM_VERSION"=21</code></code></pre><p>That is the kind of detail stale articles get wrong. Older config examples using <code>quarkus.buildpack.env.*</code> are easy to find and easy to cargo-cult into a project that no longer matches the docs.</p><p>Also, pin the builder you actually mean. <code>latest</code> is a lousy platform contract. If builder choice is part of the image policy, treat it like policy.</p><p>Finally, do not force Buildpacks into a team that only needs an ordinary image build. If nobody will inspect the metadata, nobody will use rebase, and nobody cares about standardizing on a sanctioned builder line, Jib is probably the better answer because it asks less from everyone.</p><h2><strong>My rule of thumb</strong></h2><p>If I am building a normal Quarkus service and I want the image step to stay boring, I pick Jib.</p><p>If I want the image to carry platform policy, builder identity, and rebase-friendly runtime separation, I pick Buildpacks.</p><p>That is the whole decision for me. Not &#8220;which one avoids a Dockerfile?&#8221; Both do. The better question is who owns the image contract after the Java build is over.</p><p>Quarkus makes it easy to try both, which is exactly what it should do. 
The mistake is pretending they solve the same problem just because they can both produce a container image from the same source tree.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.the-main-thread.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.the-main-thread.com/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Trace a Quarkus LangChain4j App in LangSmith]]></title><description><![CDATA[Send Quarkus LangChain4j traces to LangSmith over OpenTelemetry, inspect tool calls and failures, and keep tests local with a deterministic ChatModel stub.]]></description><link>https://www.the-main-thread.com/p/quarkus-langchain4j-langsmith</link><guid isPermaLink="false">https://www.the-main-thread.com/p/quarkus-langchain4j-langsmith</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Sat, 30 May 2026 06:08:36 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a4d6b676-3ed7-44e1-b99a-9bc5089b95b2_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>LangSmith still looks like a Python-and-LangChain product in a lot of Java teams. When I started wiring Quarkus LangChain4j into it, I assumed I would need a Java SDK bridge or some Arconia-style semantic-convention shim before the traces would look sane in the UI.</p><p>Turns out I did not. Quarkus LangChain4j already emits OpenTelemetry spans with <strong>GenAI</strong> attributes when <code>quarkus-opentelemetry</code> is on the classpath. LangSmith accepts generic OTLP from non-LangChain apps and maps those attributes into its tracing UI. 
So the work here is exporter wiring, deciding how much prompt and tool payload to ship, and generating traces worth opening.</p><p>We build <strong>SignalDesk</strong>, a fictional on-call support assistant with one REST endpoint, one AI service, one runbook tool, and three request shapes: plain chat, tool call, and controlled failure. You run it locally against Ollama, export traces to LangSmith, and keep CI green with a deterministic test stub.</p><p>For a simpler, hand-crafted approach to Quarkus observability, see <a href="https://www.the-main-thread.com/p/llm-observability-quarkus-langchain4j">LLM observability with Quarkus and LangChain4j</a>. This post stays focused on standards: <strong>LangSmith over OTLP from a Quarkus AI service</strong>.</p><h2><strong>What we build</strong></h2><p><strong>SignalDesk</strong> exposes <code>POST /signaldesk/assist</code> and returns:</p><ul><li><p><code>answer</code> &#8212; model text</p></li><li><p><code>usedTool</code> &#8212; whether a tool ran</p></li><li><p><code>toolName</code> &#8212; e.g. <code>lookupRunbook</code></p></li><li><p><code>outcome</code> &#8212; <code>OK</code>, <code>TOOL_FAILED</code>, or <code>DEGRADED</code></p></li></ul><p>Three prompts drive three trace stories:</p><ul><li><p><strong>Plain chat</strong> &#8212; &#8220;What is our SLA for SEV-2?&#8221; (no tool)</p></li><li><p><strong>Tool path</strong> &#8212; &#8220;SEV-1 database failover &#8212; which runbook?&#8221; (<code>lookupRunbook</code>)</p></li><li><p><strong>Failure path</strong> &#8212; &#8220;Trigger runbook lookup for UNKNOWN-PLAN&#8221; (tool returns an error; HTTP stays 200 with <code>outcome: TOOL_FAILED</code>)</p></li></ul><h2><strong>What you need</strong></h2><p>You should have run Quarkus in dev mode and called a JSON endpoint before. 
LangChain4j AI service interfaces should look familiar.</p><ul><li><p>JDK <strong>21</strong></p></li><li><p><strong>Ollama</strong> on http://localhost:11434 with a tool-capable model (defaults to <code>llama3.2</code>; set <code>OLLAMA_MODEL</code> if you standardize on something else)</p></li><li><p><strong>LangSmith</strong> account, API key, and the OTLP endpoint from your project settings</p></li><li><p>About <strong>4-5 &#9749;&#65039;</strong></p></li></ul><h2><strong>Project setup</strong></h2><p>Follow along or clone the project from my GitHub. From the repo root (adjust the path if you nest the module elsewhere):</p><pre><code><code>quarkus create app dev.signaldesk:signaldesk-langsmith \
  --package-name=dev.signaldesk \
  --extensions='rest-jackson,quarkus-langchain4j-ollama,quarkus-opentelemetry' \
  --java=21 \
  --no-code
cd signaldesk-langsmith</code></code></pre><p>The generator already adds <code>quarkus-langchain4j-bom</code> to <code>dependencyManagement</code> when you pick the LangChain4j Ollama extension. That is the same outcome you get from <a href="https://code.quarkus.io/">code.quarkus.io</a> with that extension selected.</p><p>Add the <code>rest-assured</code> dependency to your <code>pom.xml</code>:</p><ul><li><p><code>rest-assured</code> &#8212; HTTP contract tests</p></li></ul><p>Enable parameter names on <code>maven-compiler-plugin</code> (<code>&lt;parameters&gt;true&lt;/parameters&gt;</code>) so that <code>{{question}}</code> templates can resolve against method parameter names.</p><p>Package root: <code>dev.signaldesk</code>.</p><h2><strong>Assistant and runbook tool</strong></h2><p>I keep the assistant deliberately small here. One AI service, one short system message, and one tool box are enough for LangSmith to show a trace tree you would actually want to inspect: parent chat span, child tool span, and token usage without extra noise.</p><p><code>SignalDeskAssistant</code><strong>:</strong></p><pre><code><code>@RegisterAiService
@ApplicationScoped
public interface SignalDeskAssistant {

    @SystemMessage(
            """
            You are SignalDesk, an internal support assistant for on-call engineers.
            Answer SLA and policy questions directly when no runbook lookup is needed.
            For SEV-1 failover or explicit runbook requests, call lookupRunbook with service and severity.
            Keep answers short.""")
    @UserMessage("{{question}}")
    @ToolBox(RunbookTools.class)
    String assist(String question);
}</code></code></pre><p><code>RunbookTools.lookupRunbook</code> records invocations on a request-scoped <code>AssistTrace</code> and returns a fake runbook string. For <code>UNKNOWN-PLAN</code> it records a tool failure and returns an <code>ERROR:</code> line. LangChain4j then feeds that back to the model instead of throwing through the whole stack, which is a lot easier to demo and inspect:</p><pre><code><code>@Tool("Looks up the on-call runbook for a service and severity. Use for failover or incident response.")
public String lookupRunbook(String service, String severity) {
    if (service != null &amp;&amp; service.toUpperCase().contains("UNKNOWN-PLAN")) {
        assistTrace.recordToolFailure(TOOL_NAME);
        return "ERROR: runbook not found for service " + service;
    }
    assistTrace.recordTool(TOOL_NAME);
    return "runbook-" + service + "-" + severity + ": page platform-oncall, follow failover checklist RB-12";
}</code></code></pre><h2><strong>REST endpoint and </strong><code>AssistTrace</code></h2><p><code>SignalDeskResource</code> delegates to <code>SignalDeskService</code>, which resets <code>AssistTrace</code>, calls the assistant, and maps trace state into <code>AssistResponse</code>. When the tool reports failure, <code>outcome</code> becomes <code>TOOL_FAILED</code> even though HTTP stays <strong>200</strong>. I like that split here because API success and tool success are different stories, and the trace should make that visible.</p><h2><strong>OpenTelemetry &#8594; LangSmith</strong></h2><p>Add <code>quarkus-opentelemetry</code> (included if you used the create command above). Configure OTLP export in <code>src/main/resources/application.properties</code>:</p><pre><code><code>quarkus.application.name=signaldesk-langsmith

quarkus.langchain4j.ollama.base-url=http://localhost:11434
quarkus.langchain4j.ollama.chat-model.model-id=${OLLAMA_MODEL:llama3.2}
quarkus.langchain4j.ollama.chat-model.temperature=0.2
quarkus.langchain4j.timeout=120s
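
# Sketch, assuming the standard Quarkus log-category syntax (the demo's exact lines may differ):
# dev-mode debug logging for the OTLP exporter, useful for spotting 401, 404,
# or "Failed to export" in the console when LangSmith stays empty.
%dev.quarkus.log.category."io.opentelemetry.exporter".level=DEBUG
%dev.quarkus.log.category."io.quarkus.opentelemetry".level=DEBUG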

quarkus.otel.exporter.otlp.protocol=http/protobuf
quarkus.otel.exporter.otlp.traces.protocol=http/protobuf
quarkus.otel.exporter.otlp.traces.endpoint=${OTEL_EXPORTER_OTLP_ENDPOINT:${LANGSMITH_OTLP_ENDPOINT:https://api.smith.langchain.com/otel}}
quarkus.otel.exporter.otlp.traces.headers=x-api-key=${LANGSMITH_API_KEY:},Langsmith-Project=${LANGSMITH_PROJECT:signaldesk-langsmith}
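# Alternative sketch for project names that contain spaces: pass the complete header
# string through a single variable instead of the line above. LANGSMITH_OTLP_HEADERS is
# this demo's own variable name, and the value format it expects is an assumption to
# verify in your environment before relying on it.
# quarkus.otel.exporter.otlp.traces.headers=${LANGSMITH_OTLP_HEADERS}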
quarkus.otel.traces.sampler=parentbased_always_on</code></code></pre><p>Three details burned me during smoke testing:</p><p><strong>1. Use the OTLP base URL ending in </strong><code>/otel</code><strong>, not </strong><code>/otel/v1/traces</code><strong>.</strong> LangSmith&#8217;s <a href="https://docs.langchain.com/langsmith/trace-with-opentelemetry">OpenTelemetry guide</a> often shows <code>&#8230;/otel</code> as the endpoint. With <code>http/protobuf</code>, Quarkus <strong>appends</strong> <code>v1/traces</code> automatically (<a href="https://quarkus.io/guides/opentelemetry">Quarkus OpenTelemetry guide</a>). If you also put <code>/v1/traces</code> in the property, export silently hits a double path (<code>&#8230;/otel/v1/traces/v1/traces</code>) and the dashboard stays empty even though spans exist locally in the logs.</p><p><strong>2. Quarkus does not read the Python LangChain SDK env vars.</strong> These are ignored for this app:</p><ul><li><p><code>LANGSMITH_TRACING</code></p></li><li><p><code>LANGSMITH_OTEL_ENABLED</code></p></li><li><p><code>LANGSMITH_ENDPOINT</code> (API host, not the OTLP exporter URL)</p></li></ul><p>Use OTLP variables instead:</p><ul><li><p><code>LANGSMITH_API_KEY</code> &#8594; <code>x-api-key</code> header</p></li><li><p><code>OTEL_EXPORTER_OTLP_ENDPOINT</code> or <code>LANGSMITH_OTLP_ENDPOINT</code> &#8594; traces endpoint (base <code>/otel</code> URL)</p></li><li><p><code>LANGSMITH_PROJECT</code> &#8594; <code>Langsmith-Project</code> header (must match the project name in the UI)</p></li></ul><p><strong>3. Regional hosts matter.</strong> EU accounts need the EU OTLP host, not the US default:</p><ul><li><p><strong>EU:</strong> <code>https://eu.api.smith.langchain.com/otel</code></p></li><li><p><strong>US:</strong> <code>https://api.smith.langchain.com/otel</code></p></li></ul><p>Export credentials in the <strong>same shell</strong> you use to start dev mode, then restart:</p><pre><code><code>export LANGSMITH_API_KEY=lsv2_pt_...
export OTEL_EXPORTER_OTLP_ENDPOINT=https://eu.api.smith.langchain.com/otel
export LANGSMITH_PROJECT=signaldesk-langsmith
./mvnw quarkus:dev</code></code></pre><p>Use a <strong>slug</strong> for <code>LANGSMITH_PROJECT</code> (no spaces). Values like &#8220;<code>Quarkus Test App</code>&#8221; break Quarkus comma-separated headers, and LangSmith may only receive <code>Langsmith-Project=Quarkus</code>. If you need spaces, set the full header string via <code>LANGSMITH_OTLP_HEADERS</code> and wire that property in <code>application.properties</code> instead.</p><h3><strong>Startup check: </strong><code>OtelExportConfigProbe</code></h3><p>The demo includes a small startup bean that logs the resolved OTLP endpoint, the project name it sees, whether an API key reached the exporter config, and a few common-footgun warnings. On boot you want something like:</p><pre><code><code>OTLP traces: protocol=http/protobuf endpoint=https://eu.api.smith.langchain.com/otel project=signaldesk-langsmith apiKeySet=true</code></code></pre><p>If <code>apiKeySet=false</code> or the endpoint still shows <code>&#8230;/otel/v1/traces</code>, fix the env vars and restart before you curl.</p><p>The app still starts and answers requests even when export is misconfigured. LangSmith just stays empty until the endpoint, key, and project line up. Once they do, traces usually show up in the named project within about 10 to 30 seconds of each request.</p><p>Quarkus LangChain4j records spans for chat model calls with attributes such as <code>gen_ai.operation.name</code>, <code>gen_ai.system</code>, model id, and token usage. No custom instrumentation class is required for the baseline path.</p><h2><strong>Rich trace content (and why it is dangerous)</strong></h2><p>LangSmith is much easier to debug when prompts and tool payloads are on the span. Quarkus exposes that as configuration, not code:</p><pre><code><code>quarkus.langchain4j.tracing.include-prompt=true
quarkus.langchain4j.tracing.include-completion=true
quarkus.langchain4j.tracing.include-tool-arguments=true
quarkus.langchain4j.tracing.include-tool-result=true</code></code></pre><p>I turn these on for local debugging and leave them off by default in production unless there is redaction, a retention policy, and actual legal sign-off. Customer text, tokens, and runbook arguments end up in a third-party trace store very quickly.</p><p>For local export failures, the demo turns on <code>%dev</code> debug logging for <code>io.opentelemetry.exporter</code> and <code>io.quarkus.opentelemetry</code> &#8212; search the console for <code>401</code>, <code>404</code>, or <code>Failed to export</code> after a curl.</p><h2><strong>Optional: app metadata on spans</strong></h2><p>If you want an app-specific filter that survives in LangSmith, implement <code>ChatModelSpanContributor</code>:</p><pre><code><code>@ApplicationScoped
public class SignalDeskSpanContributor implements ChatModelSpanContributor {

    @Override
    public void onRequest(ChatModelRequestContext requestContext, Span currentSpan) {
        currentSpan.setAttribute("signaldesk.workflow", "signaldesk-assist");
    }
    // onResponse / onError &#8212; same attribute
}</code></code></pre><p>That sits <strong>beside</strong> standard <code>gen_ai.*</code> data, not instead of it.</p><h2><strong>Prove it</strong></h2><p><strong>CI (no Ollama, no LangSmith):</strong></p><pre><code><code>./mvnw test</code></code></pre><p><code>src/test/resources/application.properties</code>:</p><pre><code><code>%test.quarkus.langchain4j.ollama.devservices.enabled=false
%test.quarkus.langchain4j.devservices.enabled=false
%test.quarkus.otel.sdk.disabled=true
%test.quarkus.langchain4j.tracing.include-prompt=false
%test.quarkus.langchain4j.tracing.include-completion=false
%test.quarkus.langchain4j.tracing.include-tool-arguments=false
%test.quarkus.langchain4j.tracing.include-tool-result=false</code></code></pre><p>Tests use <code>SignalDeskStubChatModel</code> via <code>SignalDeskStubProfile</code> (<code>getEnabledAlternatives()</code>), with keyword-driven tool versus no-tool behavior that matches the three curl recipes.</p><p><strong>Manual traces (Ollama + LangSmith):</strong></p><p>Plain chat:</p><pre><code><code>curl -s -X POST http://localhost:8080/signaldesk/assist \
  -H 'Content-Type: application/json' \
  -d '{"question":"What is our SLA for SEV-2?"}' | jq</code></code></pre><p>Tool path:</p><pre><code><code>curl -s -X POST http://localhost:8080/signaldesk/assist \
  -H 'Content-Type: application/json' \
  -d '{"question":"SEV-1 database failover &#8212; which runbook?"}' | jq</code></code></pre><p>Failure path:</p><pre><code><code>curl -s -X POST http://localhost:8080/signaldesk/assist \
  -H 'Content-Type: application/json' \
  -d '{"question":"Trigger runbook lookup for UNKNOWN-PLAN"}' | jq</code></code></pre><p><strong>LangSmith checklist</strong> after each call (project <code>signaldesk-langsmith</code> in the UI):</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0ipu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c8312-08ec-4e1b-8a60-163557856543_4327x1062.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0ipu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c8312-08ec-4e1b-8a60-163557856543_4327x1062.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0ipu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c8312-08ec-4e1b-8a60-163557856543_4327x1062.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0ipu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c8312-08ec-4e1b-8a60-163557856543_4327x1062.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0ipu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c8312-08ec-4e1b-8a60-163557856543_4327x1062.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0ipu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c8312-08ec-4e1b-8a60-163557856543_4327x1062.jpeg" width="1456" height="357" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f6c8312-08ec-4e1b-8a60-163557856543_4327x1062.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:357,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:440756,&quot;alt&quot;:&quot;LangSmith Screenshot&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.the-main-thread.com/i/198565494?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c8312-08ec-4e1b-8a60-163557856543_4327x1062.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="LangSmith Screenshot" title="LangSmith Screenshot" srcset="https://substackcdn.com/image/fetch/$s_!0ipu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c8312-08ec-4e1b-8a60-163557856543_4327x1062.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0ipu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c8312-08ec-4e1b-8a60-163557856543_4327x1062.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0ipu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c8312-08ec-4e1b-8a60-163557856543_4327x1062.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0ipu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c8312-08ec-4e1b-8a60-163557856543_4327x1062.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Open the <strong>Runs</strong> tab for your project. 
After the tool-path curl, the table should look roughly like this:</p><ul><li><p><code>POST /signaldesk/assist</code> &#8212; HTTP entry (~0.8s), often the slowest row</p></li><li><p><code>langchain4j.services.SignalDeskAssistant.assist</code> (or truncated) &#8212; AI service span</p></li><li><p><code>completion llama3.2</code> &#8212; model span with Input/Output columns showing prompt and answer text; <strong>Tokens</strong> populated (exact count varies by model and prompt)</p></li><li><p><code>lookupRunbook</code> &#8212; tool span with Input <code>database</code> (or similar) and Output <code>runbook-database-SEV-&#8230;</code></p></li><li><p><code>OTEL_SPAN_ID</code> in Metadata on each row &#8212; confirms generic OTLP export, not a LangChain-only SDK path</p></li></ul><p>Drill into one run for the <strong>trace tree</strong> (HTTP &#8594; service &#8594; completion &#8594; tool). The flat Runs table is enough to prove export. The tree view is the part worth keeping for the article screenshot.</p><p>Per recipe:</p><ul><li><p><strong>Plain chat</strong> &#8212; <code>completion</code> rows without a <code>lookupRunbook</code> row</p></li><li><p><strong>Tool path</strong> &#8212; <code>lookupRunbook</code> plus one or more <code>completion llama3.2</code> rows (tool-capable models may take two model hops)</p></li><li><p><strong>Failure path</strong> &#8212; <code>lookupRunbook</code> with error content in Output; JSON shows <code>"outcome":"TOOL_FAILED"</code> even when HTTP stays 200</p></li></ul><h2><strong>Make it survive production</strong></h2><p>Keep the claims modest. This setup gives you useful LangSmith traces from Quarkus over OTLP. 
It does not promise full parity with LangSmith&#8217;s language-specific SDKs, every LangSmith feature via generic OTLP, or a safe default for shipping full prompts on production traffic.</p><p><strong>Nothing in the dashboard?</strong> Work through this order &#8212; it matches what broke in practice:</p><ol><li><p><strong>Endpoint suffix</strong> &#8212; property must end with <code>/otel</code>, not <code>/otel/v1/traces</code></p></li><li><p><strong>Region</strong> &#8212; EU UI needs <code>eu.api.smith.langchain.com</code>, not <code>api.smith.langchain.com</code></p></li><li><p><strong>Env vars in the dev shell</strong> &#8212; restart <code>quarkus:dev</code> after <code>export</code>; IDE runs often miss them</p></li><li><p><strong>Project name</strong> &#8212; <code>LANGSMITH_PROJECT</code> must match the LangSmith project; avoid spaces in the value unless you use <code>LANGSMITH_OTLP_HEADERS</code></p></li><li><p><strong>Wrong tab</strong> &#8212; traces land under the named project&#8217;s Runs, not &#8220;default&#8221;</p></li><li><p><strong>Batch delay</strong> &#8212; wait 10&#8211;30 seconds and refresh</p></li></ol><p><strong>Missing API key</strong> &#8212; the app should still answer; OTLP export may warn or drop spans. <code>OtelExportConfigProbe</code> logs <code>apiKeySet=false</code> when the key did not reach the JVM.</p><p><strong>PII and secrets</strong> &#8212; prompts, completions, and tool arguments can contain customer data. Once traffic is real, I would rather put sampling, redaction, or a collector in front of LangSmith than pretend the debug-friendly defaults are somehow production policy. 
The follow-up that belongs in its own post is one Quarkus app with collector fan-out to LangSmith for AI traces and Tempo for everything else.</p><p><strong>Metrics</strong> &#8212; traces answer &#8220;what happened on this request?&#8221; Micrometer dashboards answer &#8220;what is the burn rate?&#8221; The sibling piece in this series is production-style AI metrics, not this article.</p><h2><strong>Close</strong></h2><p>SignalDesk is small on purpose. Quarkus LangChain4j already speaks the OpenTelemetry GenAI dialect LangSmith understands, so the real decision is how you export and how much content you let leave the JVM. Once the plain chat, tool, and failure traces look right in LangSmith, you have the same debugging loop Python teams enjoy, without rewriting the app in LangChain.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.the-main-thread.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.the-main-thread.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Agent-Ready APIs Need Boring Discovery, Not AI Magic]]></title><description><![CDATA[Implement Cloudflare-style readiness in Quarkus with robots.txt, llms.txt, Link headers, Markdown negotiation, OAuth metadata, API Catalog, MCP, and Agent Skills.]]></description><link>https://www.the-main-thread.com/p/quarkus-agent-readiness</link><guid isPermaLink="false">https://www.the-main-thread.com/p/quarkus-agent-readiness</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Fri, 29 May 2026 06:08:25 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/52d89d9c-b6c3-462c-9885-6112a7ba6164_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The first request from an agent is usually a tiny archaeology project. 
It knows a host name, maybe one URL, and then it has to guess where the useful machine-readable bits live. Humans can click through docs. Agents start with HTTP, headers, and a small amount of patience.</p><p>Cloudflare&#8217;s <a href="https://blog.cloudflare.com/agent-readiness/">Agent Readiness score</a> turns that problem into a concrete audit. The scanner at <a href="https://isitagentready.com/">isitagentready.com</a> checks discoverability, content accessibility, bot access control, protocol discovery, and commerce signals. Some of those checks are old web plumbing wearing new boots. Some are new enough that the paint is still wet.</p><p>For me, &#8220;agent-ready&#8221; starts with the boring parts that remove guessing. Give the client crawl rules, a sitemap, Link headers, a compact <code>llms.txt</code>, a real OpenAPI pointer, OAuth protected-resource metadata, and an MCP endpoint if you expose tools. None of this makes the API smarter. It just stops wasting requests on detective work.</p><p>We will build that around <strong>Meridian</strong>, a small Quarkus knowledge-base API with public article search, protected write endpoints, Markdown article content, Keycloak-backed OAuth, and MCP tools. The domain is deliberately boring. That is good; the plumbing is the thing we care about here.</p><p>This article uses Java 21 and Quarkus 3.34.5. 
The Quarkiverse MCP HTTP extension is brought in through the Quarkus platform MCP server BOM, so the extension version follows the platform you choose.</p><p>The final project lives at <code>https://github.com/myfear/the-main-thread/meridian-agent-ready</code>.</p><h2><strong>What We Build</strong></h2><p>Meridian exposes these agent-facing pieces:</p><ul><li><p><code>robots.txt</code>, <code>sitemap.xml</code>, and <code>llms.txt</code> at the root</p></li><li><p>RFC 8288 Link headers from the API root</p></li><li><p>Markdown responses for article content when the client sends <code>Accept: text/markdown</code></p></li><li><p>Content usage signals on article responses and crawler rules</p></li><li><p>OAuth 2.0 Protected Resource Metadata from Quarkus OIDC</p></li><li><p>an RFC 9727 API Catalog that links to the Quarkus OpenAPI document</p></li><li><p>a Quarkus MCP server over HTTP/SSE</p></li><li><p>an MCP Server Card for pre-connection discovery</p></li><li><p>an Agent Skills discovery index with a small <code>SKILL.md</code></p></li></ul><p>The maturity is uneven. <code>robots.txt</code>, sitemaps, Link headers, OpenAPI, OIDC, and RFC 9728 are boring enough to ship. The MCP endpoint is practical today. API Catalog is a real RFC, but client support is still early. <code>llms.txt</code>, Content Signals, Markdown negotiation, MCP Server Cards, and Agent Skills are useful conventions with uneven adoption. Ship them with clear boundaries. New standards are where confident blog posts go to embarrass themselves six months later. Including this article probably. 
We will see.</p><h2><strong>What You Need</strong></h2><p>You need a current Quarkus CLI, Java 21, Maven, and a container runtime if you want Keycloak Dev Services locally.</p><ul><li><p>Java 21</p></li><li><p>Quarkus CLI</p></li><li><p>Maven 3.9+</p></li><li><p>Podman or Docker for Keycloak Dev Services</p></li><li><p>Basic Jakarta REST and OAuth knowledge</p></li><li><p>Some amount of your favorite beverage &#9749;&#65039;</p></li></ul><p>Create the project or <a href="https://github.com/myfear/the-main-thread/tree/main/meridian">directly clone the repository</a>.</p><pre><code><code>quarkus create app dev.the-main-thread:meridian \
  --package-name=dev.themainthread.meridian \
  --extension=quarkus-rest,quarkus-rest-jackson,quarkus-smallrye-openapi,quarkus-smallrye-health,quarkus-oidc,io.quarkiverse.mcp:quarkus-mcp-server-http</code></code></pre><p>Use <code>--package-name</code> because <code>dev.the-main-thread</code> is a fine Maven group id and a broken Java package. <code>quarkus-smallrye-health</code> is optional, but the API catalog below links to <code>/q/health</code>, so we add it here instead of pretending the endpoint appears by magic.</p><p>The extensions are simple enough:</p><ul><li><p><code>quarkus-rest</code>: Jakarta REST endpoints</p></li><li><p><code>quarkus-rest-jackson</code>: JSON serialization</p></li><li><p><code>quarkus-smallrye-openapi</code>: generated OpenAPI at <code>/q/openapi</code></p></li><li><p><code>quarkus-smallrye-health</code>: health endpoint for catalog status links</p></li><li><p><code>quarkus-oidc</code>: bearer-token security and protected-resource metadata</p></li><li><p><code>io.quarkiverse.mcp:quarkus-mcp-server-http</code>: MCP over Streamable HTTP and SSE</p></li></ul><p>The MCP artifact name matters. Older examples used <code>quarkus-mcp-server-sse</code>. The current Quarkiverse extension page lists <code>quarkus-mcp-server-http</code>; it still exposes the SSE endpoint at <code>/mcp/sse</code> when you use the HTTP transport.</p><h2><strong>Static Discovery Files</strong></h2><p>Start with files. It is hard to beat a static file when the job is &#8220;be there before any application logic runs.&#8221;</p><p>Quarkus serves static files from <code>src/main/resources/META-INF/resources/</code>. A file at <code>src/main/resources/META-INF/resources/robots.txt</code> becomes <code>/robots.txt</code>.</p><h3><strong>robots.txt</strong></h3><p><code>robots.txt</code> is still about crawl access. <a href="https://www.cloudflare.com/learning/bots/what-is-robots-txt/">It tells automated clients</a> which paths they may fetch. 
Cloudflare-style Content Signals can also live there for crawlers that understand them.</p><p>Create <code>src/main/resources/META-INF/resources/robots.txt</code>:</p><pre><code><code>User-agent: *
Allow: /
Disallow: /admin/
Disallow: /internal/

User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /internal/

User-agent: ClaudeBot
Allow: /
Disallow: /admin/
Disallow: /internal/

Content-Signal: search=yes, ai-input=yes, ai-train=no

Sitemap: https://api.meridian.dev/sitemap.xml</code></code></pre><p>The crawler rules and the usage signal do different jobs. <code>Disallow</code> blocks fetching a path. <code>Content-Signal</code> states allowed use for content the crawler is allowed to fetch. It is trust-based, so it works only with clients that respect the signal. Annoying, but still better than silence.</p><h3><strong>sitemap.xml</strong></h3><p>A REST API needs a sitemap with stable public entry points, not every protected write URL you ever added.</p><p>Create <code>src/main/resources/META-INF/resources/sitemap.xml</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;xml&quot;,&quot;nodeId&quot;:&quot;17c45120-3ec5-4c92-8446-9463db915558&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"&gt;
    &lt;url&gt;
        &lt;loc&gt;https://api.meridian.dev/api/v1/articles&lt;/loc&gt;
        &lt;changefreq&gt;hourly&lt;/changefreq&gt;
        &lt;priority&gt;1.0&lt;/priority&gt;
    &lt;/url&gt;
    &lt;url&gt;
        &lt;loc&gt;https://api.meridian.dev/q/openapi?format=json&lt;/loc&gt;
        &lt;changefreq&gt;weekly&lt;/changefreq&gt;
        &lt;priority&gt;0.9&lt;/priority&gt;
    &lt;/url&gt;
    &lt;url&gt;
        &lt;loc&gt;https://api.meridian.dev/.well-known/api-catalog&lt;/loc&gt;
        &lt;changefreq&gt;monthly&lt;/changefreq&gt;
        &lt;priority&gt;0.8&lt;/priority&gt;
    &lt;/url&gt;
&lt;/urlset&gt;</code></pre></div><p>If articles are public and individually addressable, generate this from your database instead. Keep the URL stable and change the implementation behind it.</p><h3><strong>llms.txt</strong></h3><p><code>llms.txt</code> is a convention, not an IETF standard. Keep it short. It gives a model a small map of the service and links to the files worth reading.</p><p>Create <code>src/main/resources/META-INF/resources/llms.txt</code>:</p><pre><code><code># Meridian Knowledge API

&gt; Meridian is a Quarkus API for searching and reading structured knowledge
&gt; articles. Public clients can search and retrieve article content. Write
&gt; operations require OAuth bearer tokens from the Meridian realm.

Meridian exposes JSON for API operations and Markdown for article content when
clients send `Accept: text/markdown`.

## API

- [OpenAPI JSON](https://api.meridian.dev/q/openapi?format=json): Machine-readable REST API contract
- [API catalog](https://api.meridian.dev/.well-known/api-catalog): RFC 9727 Linkset catalog
- [OAuth protected resource metadata](https://api.meridian.dev/.well-known/oauth-protected-resource): Authorization server and scope discovery
- [MCP Server Card](https://api.meridian.dev/.well-known/mcp/server-card.json): Pre-connection MCP discovery
- [Agent Skills index](https://api.meridian.dev/.well-known/agent-skills/index.json): Skills published for agent clients

## Content

- [Article search](https://api.meridian.dev/api/v1/articles): Public article search endpoint
- [MCP endpoint](https://api.meridian.dev/mcp): MCP Streamable HTTP endpoint
- [MCP SSE endpoint](https://api.meridian.dev/mcp/sse): Compatibility endpoint for older MCP clients

## Optional

- [Human documentation](https://docs.meridian.dev): Developer guide and examples</code></code></pre><p>That is enough. Once this becomes a second documentation site, you have invented documentation drift with a nicer filename.</p><h2><strong>Link Headers at the Front Door</strong></h2><p>Cloudflare&#8217;s scanner also looks for <a href="https://www.rfc-editor.org/rfc/rfc8288">RFC 8288 Link headers</a> because an agent should not have to parse HTML before it can find the useful machine-readable documents. For an API, I like making <code>/</code> a tiny discovery response instead of a blank 404. It gives humans a pulse check and gives agents headers they can follow.</p><p>Create <code>src/main/java/dev/themainthread/meridian/resource/DiscoveryResource.java</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;61418077-1f79-490d-bc62-f0b5decb5356&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.themainthread.meridian.resource;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;
import java.util.Map;
import org.eclipse.microprofile.config.inject.ConfigProperty;

@ApplicationScoped
@Path("/")
public class DiscoveryResource {

    @ConfigProperty(name = "meridian.api.base-url")
    String apiBaseUrl;

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Response index() {
        Map&lt;String, String&gt; body = Map.of(
                "service", "Meridian Knowledge API",
                "articles", apiBaseUrl + "/api/v1/articles",
                "openapi", apiBaseUrl + "/q/openapi?format=json",
                "apiCatalog", apiBaseUrl + "/.well-known/api-catalog",
                "mcpServerCard", apiBaseUrl + "/.well-known/mcp/server-card.json",
                "agentSkills", apiBaseUrl + "/.well-known/agent-skills/index.json");

        return withDiscoveryLinks(Response.ok(body), apiBaseUrl).build();
    }

    static Response.ResponseBuilder withDiscoveryLinks(
            Response.ResponseBuilder response, String apiBaseUrl) {
        return response
                .header("Link",
                        "&lt;" + apiBaseUrl
                                + "/.well-known/api-catalog&gt;; rel=\"api-catalog\"; type=\"application/linkset+json\"")
                .header("Link",
                        "&lt;" + apiBaseUrl + "/q/openapi?format=json&gt;; rel=\"service-desc\"; type=\"application/json\"")
                .header("Link",
                        "&lt;" + apiBaseUrl + "/llms.txt&gt;; rel=\"describedby\"; type=\"text/plain\"")
                .header("Link",
                        "&lt;" + apiBaseUrl
                                + "/.well-known/mcp/server-card.json&gt;; rel=\"mcp-server-card\"; type=\"application/json\"")
                .header("Link",
                        "&lt;" + apiBaseUrl
                                + "/.well-known/agent-skills/index.json&gt;; rel=\"agent-skills\"; type=\"application/json\"");
    }
}
</code></pre></div><p><code>service-desc</code> and <code>describedby</code> are broadly useful. <code>api-catalog</code>, <code>mcp-server-card</code>, and <code>agent-skills</code> are newer relation names, so treat them as friendly hints rather than the only discovery path. The actual files still need stable URLs.</p><h2><strong>Markdown for Article Content</strong></h2><p>Cloudflare&#8217;s Markdown for Agents feature made one convention visible: clients can ask for Markdown with <code>Accept: text/markdown</code>, and the response can carry <code>Vary: Accept</code>, <code>X-Markdown-Tokens</code>, and <code>Content-Signal</code>. The CDN can do this, but a Quarkus app can do it itself.</p><p>I prefer choosing the representation in the resource method. A response filter that rewrites <code>text/html</code> after Jakarta REST has already picked a response type looks elegant, then quietly serves HTML with a Markdown content type. That is a small bug with excellent hiding skills.</p><p>Add the HTML-to-Markdown converter:</p><pre><code><code>&lt;dependency&gt;
  &lt;groupId&gt;com.vladsch.flexmark&lt;/groupId&gt;
  &lt;artifactId&gt;flexmark-html2md-converter&lt;/artifactId&gt;
  &lt;version&gt;0.64.8&lt;/version&gt;
&lt;/dependency&gt;</code></code></pre><p>Then expose article content like this. Put list, create, and content negotiation on the <strong>same</strong> Jakarta REST class with <code>@Path("/api/v1/articles")</code>. Registering two application-scoped resources with that identical class-level path is brittle; merge the methods instead.</p><pre><code><code>package dev.themainthread.meridian.resource;

import com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter;
import dev.themainthread.meridian.service.Article;
import dev.themainthread.meridian.service.ArticleListResponse;
import dev.themainthread.meridian.service.ArticleService;
import dev.themainthread.meridian.service.CreateArticleRequest;
import io.quarkus.security.Authenticated;
import jakarta.annotation.security.PermitAll;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.DefaultValue;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.NotFoundException;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.QueryParam;
import jakarta.ws.rs.core.Context;
import jakarta.ws.rs.core.HttpHeaders;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.core.UriBuilder;
import org.eclipse.microprofile.openapi.annotations.Operation;
import org.eclipse.microprofile.openapi.annotations.responses.APIResponse;

@ApplicationScoped
@Path("/api/v1/articles")
public class ArticlesResource {

    private static final MediaType TEXT_MARKDOWN_TYPE =
        new MediaType("text", "markdown");

    private static final FlexmarkHtmlConverter HTML_TO_MARKDOWN =
        FlexmarkHtmlConverter.builder().build();

    @Inject
    ArticleService articleService;

    @GET
    @PermitAll
    @Produces(MediaType.APPLICATION_JSON)
    @Operation(
        summary = "Search and list public articles",
        description = """
            Searches public knowledge articles by title and body text. When the q
            parameter is present, results are ordered by search relevance. Without q,
            results are ordered by publication date. This endpoint does not require
            authentication.
            """)
    @APIResponse(responseCode = "200", description = "Paginated article list")
    public ArticleListResponse listArticles(
            @QueryParam("q") String query,
            @QueryParam("page") @DefaultValue("0") int page,
            @QueryParam("size") @DefaultValue("20") int size) {
        return articleService.search(query, page, size);
    }

    @POST
    @Authenticated
    @Consumes(MediaType.APPLICATION_JSON)
    @Produces(MediaType.APPLICATION_JSON)
    public Response createArticle(CreateArticleRequest request) {
        Article created = articleService.create(request.title(), request.content());
        return Response.created(
                UriBuilder.fromPath("/api/v1/articles").path(created.id()).build())
            .entity(created)
            .build();
    }

    @GET
    @PermitAll
    @Path("/{id}/content")
    public Response getArticleContent(@PathParam("id") String id,
                                      @Context HttpHeaders headers) {
        Article article = articleService.findById(id)
            .orElseThrow(NotFoundException::new);

        if (acceptsMarkdown(headers)) {
            String markdown = article.markdownContent();
            if (markdown == null || markdown.isBlank()) {
                markdown = HTML_TO_MARKDOWN.convert(article.htmlContent());
            }

            return Response.ok(markdown, TEXT_MARKDOWN_TYPE)
                .header("Vary", HttpHeaders.ACCEPT)
                .header("X-Markdown-Tokens", estimateTokens(markdown))
                .header("Content-Signal", "search=yes, ai-input=yes, ai-train=no")
                .build();
        }

        return Response.ok(article.htmlContent(), MediaType.TEXT_HTML_TYPE)
            .header("Vary", HttpHeaders.ACCEPT)
            .build();
    }

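    // Serves Markdown only when the client explicitly prefers it.
    // getAcceptableMediaTypes() is sorted by q-value, highest preference first,
    // so the first decisive entry wins.
    private boolean acceptsMarkdown(HttpHeaders headers) {
        for (MediaType requested : headers.getAcceptableMediaTypes()) {
            // Wildcards such as */* or text/* fall back to the HTML default.
            if (requested.isWildcardType() || requested.isWildcardSubtype()) {
                return false;
            }
            if (requested.isCompatible(TEXT_MARKDOWN_TYPE)) {
                return true;
            }
            if (requested.isCompatible(MediaType.TEXT_HTML_TYPE)) {
                return false;
            }
        }
        return false;
    }

    // Rough size hint: assumes about four characters per token.
    private int estimateTokens(String value) {
        return Math.max(1, value.length() / 4);
    }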
}
</code></code></pre><p>For production, store Markdown at authoring time and render HTML from it. Converting HTML back to Markdown is fine as a migration bridge. I would not make that the long-term content model, because some structure is already gone by the time you scrape rendered HTML back into text.</p><p>The <code>Vary: Accept</code> header is the cache safety bit. Without it, a CDN can cache the HTML response and serve it to a client that asked for Markdown. That failure looks like &#8220;the agent is dumb&#8221; until you check the headers.</p><div><hr></div><h2><strong>Content Usage Signals</strong></h2><p>The Markdown endpoint above sets <code>Content-Signal</code> on article content. For the remaining Jakarta REST responses, use a small response filter. Static files are served by the HTTP layer and do not pass through this filter, so keep crawler-facing signals directly in <code>robots.txt</code>.</p><pre><code><code>package dev.themainthread.meridian.filter;

import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerResponseContext;
import jakarta.ws.rs.container.ContainerResponseFilter;
import jakarta.ws.rs.ext.Provider;
import java.io.IOException;

@Provider
public class ContentSignalsFilter implements ContainerResponseFilter {

    @Override
    public void filter(ContainerRequestContext requestContext,
                       ContainerResponseContext responseContext) throws IOException {
        // Use getPath(true): RESTEasy Reactive (Quarkus REST) rejects getPath(false)
        // with "We do not support non-decoded parameters".
        String path = requestContext.getUriInfo().getPath(true);

        if (path.startsWith("api/v1/articles") || path.startsWith("/api/v1/articles")) {
            responseContext.getHeaders().putSingle(
                "Content-Signal",
                "search=yes, ai-input=yes, ai-train=no");
            return;
        }

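        // Discovery documents. Static files such as robots.txt and llms.txt
        // bypass this filter, so those two checks only matter if a Jakarta REST
        // resource ever serves them.
        if (path.startsWith(".well-known")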
            || path.startsWith("/.well-known")
            || path.equals("robots.txt")
            || path.equals("/robots.txt")
            || path.equals("llms.txt")
            || path.equals("/llms.txt")) {
            responseContext.getHeaders().putSingle(
                "Content-Signal",
                "search=yes, ai-input=yes, ai-train=yes");
        }
    }
}
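</code></code></pre><p>The filter spells out every prefix twice, once with and once without the leading slash. A minimal sketch of an alternative, assuming a hypothetical <code>PathNormalizer</code> helper that is not part of the original project: strip one leading slash first, then write each rule once.</p><pre><code><code>// Hypothetical helper, not part of the original project.
final class PathNormalizer {

    private PathNormalizer() {
    }

    // Removes a single leading slash so "/api/v1/articles" and
    // "api/v1/articles" compare equal.
    static String stripLeadingSlash(String path) {
        if (path == null || path.isEmpty()) {
            return "";
        }
        return path.charAt(0) == '/' ? path.substring(1) : path;
    }

    // True when the normalized path sits under the article API.
    static boolean isArticlePath(String path) {
        return stripLeadingSlash(path).startsWith("api/v1/articles");
    }
}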
</code></code></pre><p>With <code>getPath(true)</code>, the path may include a leading slash depending on the runtime. Match both <code>api/v1/...</code> and <code>/api/v1/...</code>. Also avoid <code>getPath(false)</code> on Quarkus REST (RESTEasy Reactive): it throws <code>IllegalArgumentException</code> because non-decoded paths are not supported.</p><div><hr></div><h2><strong>OAuth Discovery with RFC 9728</strong></h2><p>For protected resources, use <a href="https://www.rfc-editor.org/rfc/rfc9728">RFC 9728 OAuth 2.0 Protected Resource Metadata</a>. This tells a client which authorization server protects the API and which scopes are worth asking for.</p><p>You do not need to hand-write the metadata document in Quarkus. <code>quarkus-oidc</code> has protected resource metadata support, and it is disabled by default for a good reason: publishing authorization-server details is a deliberate choice.</p><p>Keep the shared OIDC behavior separate from the production host names:</p><pre><code><code>quarkus.oidc.client-id=meridian-api
quarkus.oidc.application-type=service

quarkus.oidc.resource-metadata.enabled=true
quarkus.oidc.resource-metadata.scopes=read,write,admin
</code></code></pre><p>Then set production URLs in the production profile:</p><pre><code><code>%prod.meridian.api.base-url=https://api.meridian.dev
%prod.quarkus.oidc.auth-server-url=https://auth.meridian.dev/realms/meridian
%prod.quarkus.oidc.resource-metadata.resource=${meridian.api.base-url}
</code></code></pre><p>For local verification, use a dev profile. Leave <code>quarkus.oidc.auth-server-url</code> unset in dev if you want Quarkus Dev Services to start Keycloak for you. Quarkus forces HTTPS resource metadata by default, which is correct for production and annoying for localhost:</p><pre><code><code>%dev.meridian.api.base-url=http://localhost:8080
%dev.quarkus.oidc.resource-metadata.resource=${meridian.api.base-url}
%dev.quarkus.oidc.resource-metadata.force-https-scheme=false
</code></code></pre><p>With that enabled, the default protected resource metadata route is:</p><pre><code><code>/.well-known/oauth-protected-resource
</code></code></pre><p>A client that calls a protected endpoint without a token should also see a challenge like this:</p><pre><code><code>HTTP/1.1 401 Unauthorized
WWW-Authenticate: Bearer resource_metadata="https://api.meridian.dev/.well-known/oauth-protected-resource"
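</code></code></pre><p>On the client side, the challenge only helps if the metadata URL can be pulled out of it. A minimal sketch, assuming a hypothetical <code>ChallengeParser</code> class that is not part of Meridian. It handles a single <code>Bearer</code> challenge with a plain quoted value, not every challenge form HTTP allows:</p><pre><code><code>import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client-side helper, not part of the Meridian service.
final class ChallengeParser {

    // Matches resource_metadata="..." inside a WWW-Authenticate value.
    private static final Pattern RESOURCE_METADATA =
        Pattern.compile("resource_metadata=\"([^\"]*)\"");

    private ChallengeParser() {
    }

    // Returns the metadata URL from a Bearer challenge, if present.
    static Optional&lt;String&gt; resourceMetadata(String wwwAuthenticate) {
        if (wwwAuthenticate == null
                || !wwwAuthenticate.regionMatches(true, 0, "Bearer", 0, 6)) {
            return Optional.empty();
        }
        Matcher matcher = RESOURCE_METADATA.matcher(wwwAuthenticate);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}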
</code></code></pre><p>The parameter name is <code>resource_metadata</code>. Do not use <code>as_uri</code> here. That is not the RFC 9728 contract, and some OAuth-aware clients will simply miss the metadata URL.</p><p>Treat <code>jwks_uri</code> carefully. In RFC 9728 it belongs to the protected resource, for cases where the resource signs responses. It is separate from the Keycloak realm certificate endpoint used to verify access tokens. Meridian does not sign resource responses, so we leave it out.</p><p>Metadata is still only metadata. It does not enforce scopes. Protect your write methods with <code>io.quarkus.security.Authenticated</code>, <code>@PermissionsAllowed</code>, route permissions, or your normal Keycloak authorization setup. The discovery document helps the client ask for the right token; it does not make a bad token good.</p><div><hr></div><h2><strong>API Catalog with RFC 9727</strong></h2><p>OpenAPI tells a client what operations exist. The <a href="https://www.rfc-editor.org/rfc/rfc9727.html">RFC 9727 API Catalog</a> tells it where to find those API descriptions.</p><p>Quarkus already exposes OpenAPI at <code>/q/openapi</code>; request JSON with <code>/q/openapi?format=json</code>. The catalog at <code>/.well-known/api-catalog</code> should be a Linkset document, not a custom <code>{ "apis": [...] }</code> object.</p><p>Create a resource for the catalog:</p><pre><code><code>package dev.themainthread.meridian.resource;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.HEAD;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.Response;
import java.util.List;
import java.util.Map;
import org.eclipse.microprofile.config.inject.ConfigProperty;

@ApplicationScoped
@Path("/.well-known")
public class ApiCatalogResource {

    private static final String API_CATALOG_TYPE =
        "application/linkset+json; profile=\"https://www.rfc-editor.org/info/rfc9727\"";

    @ConfigProperty(name = "meridian.api.base-url")
    String apiBaseUrl;

    @HEAD
    @Path("/api-catalog")
    public Response apiCatalogHead() {
        // HEAD mirrors the GET status and content type, without a body.
        return DiscoveryResource.withDiscoveryLinks(Response.ok(), apiBaseUrl)
            .type(API_CATALOG_TYPE)
            .build();
    }

    @GET
    @Path("/api-catalog")
    @Produces("application/linkset+json")
    public Response apiCatalog() {
        Map&lt;String, Object&gt; catalog = Map.of(
            "linkset", List.of(
                Map.of(
                    "anchor", apiBaseUrl + "/api/v1",
                    "service-desc", List.of(
                        Map.of(
                            "href", apiBaseUrl + "/q/openapi?format=json",
                            "type", "application/json"
                        )
                    ),
                    "service-doc", List.of(
                        Map.of(
                            "href", "https://docs.meridian.dev",
                            "type", "text/html"
                        )
                    ),
                    "status", List.of(
                        Map.of(
                            "href", apiBaseUrl + "/q/health",
                            "type", "application/json"
                        )
                    )
                )
            )
        );

        return DiscoveryResource.withDiscoveryLinks(Response.ok(catalog), apiBaseUrl)
            .type(API_CATALOG_TYPE)
            .build();
    }
}
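</code></code></pre><p>Both catalog methods attach the discovery <code>Link</code> headers, so a client can follow them without reading the body. A minimal client-side sketch, assuming a hypothetical <code>LinkHeaders</code> helper that is not part of Meridian. It handles only the simple quoted, single-value <code>rel</code> form this resource emits, not every RFC 8288 variation:</p><pre><code><code>import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client-side helper, not part of the Meridian service.
final class LinkHeaders {

    // Captures the target URI and the quoted rel value of one link.
    private static final Pattern LINK =
        Pattern.compile("&lt;([^&gt;]*)&gt;[^,]*?;\\s*rel=\"([^\"]*)\"");

    private LinkHeaders() {
    }

    // Returns the first link target whose rel equals the requested relation.
    static Optional&lt;String&gt; targetFor(List&lt;String&gt; linkHeaderValues, String rel) {
        for (String value : linkHeaderValues) {
            Matcher matcher = LINK.matcher(value);
            while (matcher.find()) {
                if (matcher.group(2).equals(rel)) {
                    return Optional.of(matcher.group(1));
                }
            }
        }
        return Optional.empty();
    }
}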
</code></code></pre><p>Keep the catalog small. It points to the machine-readable descriptions that already exist; it is not the place to rebuild your whole API model.</p><p>The <code>listArticles</code> method above also shows why OpenAPI descriptions matter. &#8220;Returns a list&#8221; is not enough. A planner needs to know when to call the operation, how the results are ordered, and whether it needs a token.</p><div><hr></div><h2><strong>MCP Tool Access</strong></h2><p>The Quarkiverse MCP HTTP extension exposes the MCP endpoint at <code>/mcp</code> for the current Streamable HTTP transport and <code>/mcp/sse</code> for older SSE clients. The default root path is already <code>/mcp</code>, so only set it when you want to be explicit:</p><pre><code><code>quarkus.mcp.server.http.root-path=/mcp
quarkus.mcp.server.server-info.name=meridian
quarkus.mcp.server.server-info.title=Meridian Knowledge API
quarkus.mcp.server.server-info.version=1.0.0
quarkus.mcp.server.server-info.description=Tools for searching and reading Meridian articles.
</code></code></pre><p>Protect the endpoint if the tools expose anything sensitive:</p><pre><code><code>quarkus.http.auth.permission.mcp.paths=/mcp,/mcp/*
quarkus.http.auth.permission.mcp.policy=authenticated
</code></code></pre><p>Then implement the tools:</p><pre><code><code>package dev.themainthread.meridian.mcp;

import com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter;
import dev.themainthread.meridian.service.Article;
import dev.themainthread.meridian.service.ArticleService;
import io.quarkiverse.mcp.server.Tool;
import io.quarkiverse.mcp.server.ToolArg;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import java.util.stream.Collectors;

@ApplicationScoped
public class MeridianMcpTools {

    private static final FlexmarkHtmlConverter HTML_TO_MARKDOWN =
        FlexmarkHtmlConverter.builder().build();

    @Inject
    ArticleService articleService;

    @Tool(description = "Search public Meridian articles by keyword or phrase.")
    public String searchArticles(
            @ToolArg(description = "Search keyword or phrase") String query) {
        return articleService.search(query, 0, 10).items()
            .stream()
            .map(article -&gt; article.id() + ": " + article.title())
            .collect(Collectors.joining("\n"));
    }

    @Tool(description = "Read one Meridian article as Markdown.")
    public String getArticle(
            @ToolArg(description = "Article ID") String id) {
        return articleService.findById(id)
            .map(MeridianMcpTools::markdownOrConverted)
            .orElse("No article found for ID `" + id + "`.");
    }

    private static String markdownOrConverted(Article article) {
        String markdown = article.markdownContent();
        if (markdown != null &amp;&amp; !markdown.isBlank()) {
            return markdown;
        }
        return HTML_TO_MARKDOWN.convert(article.htmlContent());
    }
}
</code></code></pre><p>Tool output is not a web page. Return Markdown or compact plain text. HTML in a tool response usually gives the model more tokens and less meaning. That trade is bad and not even interesting.</p><p>Cloudflare checks MCP Server Card discovery before a client connects. The MCP protocol still negotiates capabilities during <code>initialize</code>, so the card is a map, not the source of truth. Keep it small and regenerate it when tools change.</p><p>Create <code>src/main/java/dev/themainthread/meridian/resource/McpServerCardResource.java</code>:</p><pre><code><code>package dev.themainthread.meridian.resource;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;
import java.util.List;
import java.util.Map;
import org.eclipse.microprofile.config.inject.ConfigProperty;

@ApplicationScoped
@Path("/.well-known")
public class McpServerCardResource {

    @ConfigProperty(name = "meridian.api.base-url")
    String apiBaseUrl;

    @GET
    @Path("/mcp/server-card.json")
    @Produces(MediaType.APPLICATION_JSON)
    public Response serverCard() {
        return Response.ok(card()).build();
    }

    @GET
    @Path("/mcp.json")
    @Produces(MediaType.APPLICATION_JSON)
    public Response legacyServerCard() {
        return Response.ok(card()).build();
    }

    private Map&lt;String, Object&gt; card() {
        return Map.of(
            "$schema",
            "https://static.modelcontextprotocol.io/schemas/mcp-server-card/v1.json",
            "version",
            "1.0",
            "protocolVersion",
            "2025-06-18",
            "serverInfo",
            Map.of(
                "name", "meridian",
                "title", "Meridian Knowledge API",
                "version", "1.0.0"),
            "description",
            "Search and read public Meridian knowledge articles.",
            "transport",
            Map.of(
                "type", "streamable-http",
                "endpoint", apiBaseUrl + "/mcp"),
            "authentication",
            Map.of(
                "required", true,
                "schemes", List.of("bearer"),
                "resourceMetadata", apiBaseUrl + "/.well-known/oauth-protected-resource"),
            "tools",
            List.of(
                Map.of(
                    "name", "searchArticles",
                    "title", "Search articles",
                    "description", "Search public Meridian articles by keyword or phrase.",
                    "inputSchema", Map.of(
                        "type", "object",
                        "properties", Map.of(
                            "query", Map.of(
                                "type", "string",
                                "description", "Search keyword or phrase")),
                        "required", List.of("query"))),
                Map.of(
                    "name", "getArticle",
                    "title", "Read article",
                    "description", "Read one Meridian article as Markdown.",
                    "inputSchema", Map.of(
                        "type", "object",
                        "properties", Map.of(
                            "id", Map.of(
                                "type", "string",
                                "description", "Article ID")),
                        "required", List.of("id")))));
    }
}
</code></code></pre><p>I serve both <code>/.well-known/mcp/server-card.json</code> and <code>/.well-known/mcp.json</code> because the discovery convention is still settling and different tools have looked in different places. That is mildly annoying, but cheaper than asking agents to guess.</p><div><hr></div><h2><strong>Agent Skills Discovery</strong></h2><p><a href="https://agentskills.io/specification">Agent Skills</a> are still just instruction bundles: a directory with a <code>SKILL.md</code> file, optional scripts, references, and assets. Cloudflare&#8217;s scanner also looks for the proposed discovery index at <code>/.well-known/agent-skills/index.json</code>, so Meridian publishes a tiny index and a single skill document.</p><p>The useful distinction is this: Quarkus serves the skill, but it does not execute the skill. The document teaches an agent how to use the API after the agent chooses to load it. Keep it narrow, version it with the service, and avoid stuffing policy novels into it. Agents are very good at following stale instructions with confidence.</p><p>Create <code>src/main/java/dev/themainthread/meridian/resource/AgentSkillsResource.java</code>:</p><pre><code><code>package dev.themainthread.meridian.resource;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;
import org.eclipse.microprofile.config.inject.ConfigProperty;

@ApplicationScoped
@Path("/.well-known/agent-skills")
public class AgentSkillsResource {

    private static final String DISCOVERY_SCHEMA =
        "https://schemas.agentskills.io/discovery/0.2.0/schema.json";

    private static final String SKILL_MD = """
        ---
        name: meridian
        description: Search and read Meridian knowledge articles. Use when a user asks for Meridian article IDs, public article search, or Markdown article content.
        ---

        # Meridian

        Use the public search endpoint first unless the user already provided an article ID.

        ## Search

        Call:

        ```text
        GET https://api.meridian.dev/api/v1/articles?q={query}
        ```

        Use returned article IDs for follow-up reads.

        ## Read

        Call:

        ```text
        GET https://api.meridian.dev/api/v1/articles/{id}/content
        Accept: text/markdown
        ```

        Prefer Markdown content. Do not scrape HTML unless Markdown is unavailable.
        """;

    @ConfigProperty(name = "meridian.api.base-url")
    String apiBaseUrl;

    @GET
    @Path("/index.json")
    @Produces(MediaType.APPLICATION_JSON)
    public Response index() {
        Map&lt;String, Object&gt; index = Map.of(
            "$schema",
            DISCOVERY_SCHEMA,
            "skills",
            List.of(Map.of(
                "name",
                "meridian",
                "type",
                "skill-md",
                "description",
                "Search and read Meridian knowledge articles.",
                "url",
                apiBaseUrl + "/.well-known/agent-skills/meridian/SKILL.md",
                "digest",
                sha256Digest(SKILL_MD))));

        return Response.ok(index).build();
    }

    @GET
    @Path("/meridian/SKILL.md")
    @Produces("text/markdown")
    public Response skill() {
        return Response.ok(SKILL_MD, "text/markdown")
            .header("Cache-Control", "public, max-age=3600")
            .build();
    }

    private static String sha256Digest(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(value.getBytes(StandardCharsets.UTF_8));
            return "sha256:" + HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required by the Java platform", e);
        }
    }
}
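</code></code></pre><p>A discovery client can recompute the digest over the bytes it downloaded and compare it with the index entry. A minimal sketch, assuming a hypothetical <code>SkillDigestVerifier</code> class that is not part of Meridian, and the <code>sha256:</code>-prefixed lowercase hex format that <code>sha256Digest</code> produces above:</p><pre><code><code>import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical client-side check, not part of the Meridian service.
final class SkillDigestVerifier {

    private SkillDigestVerifier() {
    }

    // Recomputes the "sha256:" digest over the downloaded bytes and compares
    // it with the value advertised in index.json.
    static boolean matches(byte[] downloadedSkill, String advertisedDigest) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(downloadedSkill);
            String actual = "sha256:" + HexFormat.of().formatHex(hash);
            return actual.equals(advertisedDigest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required by the Java platform", e);
        }
    }
}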
</code></code></pre><p>The digest is not decorative. Discovery clients can use it to notice when the served skill changed. If your skill grows beyond one small file, move to an archive artifact and validate the archive like you would any other executable input from the Internet.</p><h2><strong>CORS and Browser-Hosted Agents</strong></h2><p>If browser-hosted clients call your API, expose the headers they need to read. Keep the origins explicit for a protected API. <code>*</code> is fine for a public static demo and a poor default for anything with write paths.</p><pre><code><code>quarkus.http.cors.enabled=true
quarkus.http.cors.origins=https://app.meridian.dev,https://inspector.modelcontextprotocol.io
quarkus.http.cors.methods=GET,POST,PUT,DELETE,OPTIONS
quarkus.http.cors.headers=Accept,Authorization,Content-Type
quarkus.http.cors.exposed-headers=Content-Signal,Link,X-Markdown-Tokens,Vary,WWW-Authenticate
quarkus.http.cors.access-control-max-age=1H
quarkus.http.cors.access-control-allow-credentials=false
</code></code></pre><p>The property is <code>quarkus.http.cors.enabled</code>, not <code>quarkus.http.cors</code>. The older short form appears in many examples, and examples are how configuration drift learns to travel. If your browser client uses cookies instead of bearer tokens, revisit <code>access-control-allow-credentials</code>; for this API, bearer tokens in the <code>Authorization</code> header keep the browser credential rules simpler.</p><h2><strong>Verify the Chain</strong></h2><p>Run the application locally first:</p><pre><code><code>quarkus dev</code></code></pre><p>Then check the root discovery response and the static discovery files:</p><pre><code><code>curl -i http://localhost:8080/
curl -i http://localhost:8080/robots.txt
curl -i http://localhost:8080/sitemap.xml
curl -i http://localhost:8080/llms.txt</code></code></pre><p>You should see <code>200 OK</code>. On <code>/</code>, check that the response includes <code>Link</code> headers for the API catalog, OpenAPI document, MCP Server Card, and Agent Skills index. For <code>robots.txt</code>, check that the <code>Content-Signal</code> line is in the body.</p><p>Check Markdown negotiation:</p><pre><code><code>curl -i http://localhost:8080/api/v1/articles/intro-to-meridian/content \
  -H "Accept: text/markdown"</code></code></pre><p>Expected headers:</p><pre><code><code>HTTP/1.1 200 OK
Content-Type: text/markdown
Vary: Accept
X-Markdown-Tokens: ...
Content-Signal: search=yes, ai-input=yes, ai-train=no</code></code></pre><p>Check the API catalog:</p><pre><code><code>curl -i http://localhost:8080/.well-known/api-catalog \
  -H "Accept: application/linkset+json"</code></code></pre><p>Expected headers:</p><pre><code><code>HTTP/1.1 200 OK
Content-Type: application/linkset+json; profile="https://www.rfc-editor.org/info/rfc9727"</code></code></pre><p>Check the required <code>HEAD</code> response as well:</p><pre><code><code>curl -I http://localhost:8080/.well-known/api-catalog</code></code></pre><p>Expected header:</p><pre><code><code>Link: &lt;http://localhost:8080/.well-known/api-catalog&gt;; rel="api-catalog"; type="application/linkset+json"</code></code></pre><p>Expected body shape:</p><pre><code><code>{
  "linkset": [
    {
      "anchor": "http://localhost:8080/api/v1",
      "service-desc": [
        {
          "href": "http://localhost:8080/q/openapi?format=json",
          "type": "application/json"
        }
      ]
    }
  ]
}</code></code></pre><p>Check protected resource metadata:</p><pre><code><code>curl -i http://localhost:8080/.well-known/oauth-protected-resource</code></code></pre><p>Expected body fields include:</p><pre><code><code>{
  "resource": "http://localhost:8080",
  "authorization_servers": [
    "http://localhost:8180/realms/quarkus"
  ],
  "scopes_supported": [
    "read",
    "write",
    "admin"
  ]
}</code></code></pre><p>The exact <code>authorization_servers</code> URLs depend on your Dev Services setup. When Docker is unavailable, Quarkus may start an embedded OIDC dev service instead of Keycloak; the array can contain that base URL (without a <code>/realms/...</code> path). In production, the same field should point at your real issuer, for example <code>https://auth.meridian.dev/realms/meridian</code>.</p><p>Then hit a protected endpoint without a token (your write method should use <code>io.quarkus.security.Authenticated</code>; there is no <code>jakarta.annotation.security.Authenticated</code>):</p><pre><code><code>curl -i http://localhost:8080/api/v1/articles \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"title":"Draft","content":"No token yet"}'</code></code></pre><p>Expected result:</p><pre><code><code>HTTP/1.1 401 Unauthorized
WWW-Authenticate: Bearer resource_metadata="..."</code></code></pre><p>For MCP, use a real MCP client or MCP Inspector with a bearer token against:</p><pre><code><code>http://localhost:8080/mcp</code></code></pre><p>For older SSE clients, use the compatibility endpoint:</p><pre><code><code>http://localhost:8080/mcp/sse</code></code></pre><p>Do not call the MCP endpoint with plain <code>curl</code> and expect a friendly REST document. It is a protocol endpoint, not a brochure.</p><p>You can still check the pre-connection discovery documents with <code>curl</code>:</p><pre><code><code>curl -i http://localhost:8080/.well-known/mcp/server-card.json
curl -i http://localhost:8080/.well-known/mcp.json
curl -i http://localhost:8080/.well-known/agent-skills/index.json
curl -i http://localhost:8080/.well-known/agent-skills/meridian/SKILL.md</code></code></pre><p>The server card should name the <code>streamable-http</code> transport and point at <code>http://localhost:8080/mcp</code>. The skills index should contain a <code>skills</code> array with a <code>skill-md</code> entry and a <code>sha256:</code> digest.</p><h2><strong>What I Would Not Ship Blindly</strong></h2><p>Some adjacent ideas are worth watching, but I would not make the main tutorial depend on them.</p><p><strong>MCP Server Cards</strong> are useful as pre-connection hints, but I would not let the card drift from the real MCP implementation. Let MCP <code>initialize</code> do the authoritative capability exchange and treat the card as an index.</p><p><strong>Agent Skills</strong> are useful as packaged instructions. Keep the skill small and specific. A giant skill that restates your whole developer portal is just documentation drift with YAML frontmatter.</p><p><strong>A2A Agent Cards</strong> matter when your service is itself an agent that other agents should delegate to. Meridian is an API with MCP tools, not an A2A server, so adding an A2A card here would be theater.</p><p><strong>WebMCP</strong> is aimed at browser-exposed tool surfaces. It is worth watching, but it does not change this server-side Quarkus API.</p><p><strong>x402 and other agent payment protocols</strong> are moving quickly. x402 has real momentum after the Linux Foundation announcement in April 2026, but paid API flows need product, fraud, accounting, and support decisions. Cloudflare&#8217;s live scanner also lists MPP, UCP, and ACP; that is a separate article. Maybe in the future.</p><p><strong>Web Bot Auth</strong> is becoming more practical through Cloudflare and AWS WAF support. Use it when bot identity matters for enforcement. 
It is not required for the discovery flow we built here.</p><h2><strong>Summary</strong></h2><p>Making a Quarkus API easier for agents is mostly about removing guesswork.</p><p><code>robots.txt</code>, <code>sitemap.xml</code>, <code>llms.txt</code>, and Link headers give clients a first map. Markdown content negotiation keeps article bodies cheap to read. Content Signals make usage intent explicit. Quarkus OIDC can publish RFC 9728 protected resource metadata when you opt in. RFC 9727 gives your OpenAPI document a proper catalog. The Quarkiverse MCP extension gives tool-capable clients a protocol endpoint, and the server card plus Agent Skills index make that tool surface easier to discover before a client connects.</p><p>None of this turns a bad API into a good one. It just makes the good parts discoverable without asking an agent to reverse-engineer your service by failing requests at it. Boring repeatability wins again.</p>]]></content:encoded></item><item><title><![CDATA[When to Use Java Records vs Builders in a Quarkus API]]></title><description><![CDATA[Use a small OrderDesk API to keep simple DTOs as records, staged order assembly readable with a builder, and request validation at the edge.]]></description><link>https://www.the-main-thread.com/p/java-records-builders-quarkus</link><guid isPermaLink="false">https://www.the-main-thread.com/p/java-records-builders-quarkus</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Thu, 28 May 2026 06:08:09 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4ac199ab-cf75-4fb3-a06a-dfe9b48d238d_1731x909.png" 
length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A constructor change shipped on Friday. Two <code>String</code> parameters swapped places in a twelve-argument <code>OrderDto</code> call. Everything compiled. Invoices went out with shipping addresses in the customer name field until someone noticed the pattern in support tickets on Monday.</p><p>Records fixed the boilerplate part of that story. The staged assembly problem stayed exactly where it was: inventory, pricing, shipping, and fraud checks keep adding fields until the canonical constructor turns into a minefield.</p><p>We build <strong>OrderDesk</strong>, a small Quarkus API that shows where records are enough on their own, where a hand-written builder still earns its keep, and how Bean Validation fits on record request bodies. The sample uses <strong>Quarkus 3.35.2</strong>.</p><h2><strong>What we build</strong></h2><p><strong>OrderDesk</strong> is a compact e-commerce slice that:</p><ul><li><p>exposes <code>GET /products</code> with simple <strong>record</strong> DTOs;</p></li><li><p>exposes <code>GET /orders/sample</code> with a multi-field <strong>OrderDto</strong> assembled through <strong>OrderDtoBuilder</strong> and a staged <strong>OrderAssemblyService</strong>;</p></li><li><p>exposes <code>POST /orders</code> with a validated <strong>CreateOrderRequest</strong> record;</p></li><li><p>ships unit and <code>@QuarkusTest</code> coverage for builder invariants and HTTP validation.</p></li></ul><h2><strong>What you need</strong></h2><p>You have written Jakarta REST resources before and know what a DTO is for.</p><ul><li><p>JDK <strong>21</strong></p></li><li><p><strong>Quarkus CLI</strong> or Maven</p></li><li><p><code>curl</code> for manual checks</p></li><li><p>About two &#9749;&#65039;&#9749;&#65039;</p></li></ul><h2><strong>Project setup</strong></h2><p>Create the project:</p><pre><code><code>quarkus create app com.orderdesk:orderdesk-records-builders \
  --extension='quarkus-rest-jackson,hibernate-validator' \
  --java=21 \
  --no-code
cd orderdesk-records-builders</code></code></pre><p>Extensions:</p><ul><li><p><code>quarkus-rest-jackson</code> &#8212; REST endpoints and Jackson JSON serialization for records</p></li><li><p><code>hibernate-validator</code> &#8212; Bean Validation on request bodies and method parameters</p></li></ul><p>Use package <code>com.orderdesk</code> for application code.</p><h2><strong>Simple record DTOs</strong></h2><p>Before records, a three-field product DTO was mostly ceremony: fields, constructor, getters, <code>equals</code>, <code>hashCode</code>, and often Lombok because nobody wanted to maintain it by hand.</p><p>A record states the contract in one place &#8212; immutable data, value semantics, transparent state:</p><pre><code><code>package com.orderdesk;

import java.math.BigDecimal;

public record ProductDto(
        Long id,
        String name,
        BigDecimal price
) {
}</code></code></pre><p>Jackson deserializes records through the canonical constructor on current Quarkus, so this sample does not need extra annotations.</p><h3><strong>Product catalog and list endpoint</strong></h3><pre><code><code>package com.orderdesk;

import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.Optional;

import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class ProductCatalog {

    private static final Map&lt;Long, ProductDto&gt; PRODUCTS = Map.of(
            1L, new ProductDto(1L, "Mechanical Keyboard", new BigDecimal("149.99")),
            2L, new ProductDto(2L, "Vertical Mouse", new BigDecimal("89.99")));

    public List&lt;ProductDto&gt; listAll() {
        return PRODUCTS.keySet().stream()
                .sorted()
                .map(PRODUCTS::get)
                .toList();
    }

    public Optional&lt;ProductDto&gt; findById(long id) {
        return Optional.ofNullable(PRODUCTS.get(id));
    }
}</code></code></pre><pre><code><code>package com.orderdesk;

import java.util.List;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;

@Path("/products")
public class ProductResource {

    private final ProductCatalog catalog;

    public ProductResource(ProductCatalog catalog) {
        this.catalog = catalog;
    }

    @GET
    public List&lt;ProductDto&gt; listProducts() {
        return catalog.listAll();
    }
}</code></code></pre><p>Start dev mode:</p><pre><code><code>./mvnw quarkus:dev</code></code></pre><p>List products:</p><pre><code><code>curl -s http://localhost:8080/products | jq</code></code></pre><p>You should see two products with <code>id</code>, <code>name</code>, and <code>price</code>. <code>Map.of</code> is fine for lookup by id, but do not assume <code>values()</code> iteration order &#8212; sort keys (or use a <code>LinkedHashMap</code>) when the API must return a stable list. This is the sweet spot: small immutable carriers, no construction pipeline, no builder.</p><h2><strong>When the canonical constructor stops scaling</strong></h2><p>Real order DTOs grow. Cart submission, inventory confirmation, tax, shipping, fraud scoring, addresses &#8212; fields arrive in stages from different steps. The record still fits as the <strong>immutable result</strong>:</p><pre><code><code>package com.orderdesk;

import java.math.BigDecimal;
import java.time.Instant;
import java.util.List;

public record OrderDto(
        String orderId,
        String customerId,
        List&lt;ProductDto&gt; products,
        BigDecimal subtotal,
        BigDecimal tax,
        BigDecimal shipping,
        BigDecimal total,
        String currency,
        String shippingAddress,
        String billingAddress,
        String status,
        Integer fraudScore,
        Instant createdAt
) {
    public OrderDto {
        if (orderId == null || orderId.isBlank()) {
            throw new IllegalArgumentException("orderId must not be blank");
        }
    }
}</code></code></pre><p>The compact constructor is the right place for <strong>single-field invariants</strong> that must hold on every instance. Cross-field rules like subtotal matching line items or shipping being required for physical goods deserve a different home, so we put those on the builder.</p><p>Calling <code>new OrderDto(...)</code> with a dozen positional arguments is where teams swap two <code>String</code> values and ship garbage. Named assembly fixes that without giving up record immutability.</p><h2><strong>OrderDtoBuilder</strong></h2><p>The builder is mutable scratch space. <code>build()</code> turns that scratch space into the record once, with validation and defensive copies:</p><pre><code><code>package com.orderdesk;

import java.math.BigDecimal;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class OrderDtoBuilder {

    private String orderId;
    private String customerId;
    private List&lt;ProductDto&gt; products = new ArrayList&lt;&gt;();
    private BigDecimal subtotal = BigDecimal.ZERO;
    private BigDecimal tax = BigDecimal.ZERO;
    private BigDecimal shipping = BigDecimal.ZERO;
    private BigDecimal total = BigDecimal.ZERO;
    private String currency = "EUR";
    private String shippingAddress;
    private String billingAddress;
    private String status = "CREATED";
    private Integer fraudScore = 0;
    private Instant createdAt = Instant.now();

    public OrderDtoBuilder orderId(String orderId) {
        this.orderId = orderId;
        return this;
    }

    public OrderDtoBuilder customerId(String customerId) {
        this.customerId = customerId;
        return this;
    }

    public OrderDtoBuilder addProduct(ProductDto product) {
        this.products.add(product);
        return this;
    }

    public OrderDtoBuilder subtotal(BigDecimal subtotal) {
        this.subtotal = subtotal;
        return this;
    }

    public OrderDtoBuilder tax(BigDecimal tax) {
        this.tax = tax;
        return this;
    }

    public OrderDtoBuilder shipping(BigDecimal shipping) {
        this.shipping = shipping;
        return this;
    }

    public OrderDtoBuilder total(BigDecimal total) {
        this.total = total;
        return this;
    }

    public OrderDtoBuilder currency(String currency) {
        this.currency = currency;
        return this;
    }

    public OrderDtoBuilder shippingAddress(String shippingAddress) {
        this.shippingAddress = shippingAddress;
        return this;
    }

    public OrderDtoBuilder billingAddress(String billingAddress) {
        this.billingAddress = billingAddress;
        return this;
    }

    public OrderDtoBuilder status(String status) {
        this.status = status;
        return this;
    }

    public OrderDtoBuilder fraudScore(Integer fraudScore) {
        this.fraudScore = fraudScore;
        return this;
    }

    List&lt;ProductDto&gt; productsSnapshot() {
        return List.copyOf(products);
    }

    BigDecimal totalSnapshot() {
        return total;
    }

    public OrderDto build() {
        if (customerId == null || customerId.isBlank()) {
            throw new IllegalStateException("Customer ID is required");
        }

        if (products.isEmpty()) {
            throw new IllegalStateException("At least one product is required");
        }

        if (orderId == null || orderId.isBlank()) {
            throw new IllegalStateException("Order ID is required");
        }

        return new OrderDto(
                orderId,
                customerId,
                List.copyOf(products),
                subtotal,
                tax,
                shipping,
                total,
                currency,
                shippingAddress,
                billingAddress,
                status,
                fraudScore,
                createdAt);
    }
}</code></code></pre><p><code>.shipping(new BigDecimal("9.99"))</code> is easier to read at 2am than the seventh anonymous argument to a constructor. Defaults like <code>currency = "EUR"</code> and <code>status = "CREATED"</code> live in one place instead of spreading across call sites.</p><h2><strong>Staged enrichment with OrderAssemblyService</strong></h2><p>This is the part records do not solve on their own. One builder instance walks through inventory, pricing, shipping, and fraud enrichment before <code>build()</code> freezes the record.</p><pre><code><code>package com.orderdesk;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.UUID;

import org.jboss.logging.Logger;

import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class OrderAssemblyService {

    private static final Logger LOG = Logger.getLogger(OrderAssemblyService.class);

    private static final BigDecimal TAX_RATE = new BigDecimal("0.19");
    private static final BigDecimal SHIPPING_FLAT = new BigDecimal("9.99");

    private final ProductCatalog catalog;

    public OrderAssemblyService(ProductCatalog catalog) {
        this.catalog = catalog;
    }

    public OrderDto buildSampleOrder(String orderId) {
        ProductDto keyboard = catalog.findById(1L).orElseThrow();
        ProductDto mouse = catalog.findById(2L).orElseThrow();

        OrderDtoBuilder builder = new OrderDtoBuilder()
                .orderId(orderId)
                .customerId("customer-42")
                .addProduct(keyboard)
                .addProduct(mouse)
                .shippingAddress("Main Street 10")
                .billingAddress("Main Street 10");

        applyInventory(builder);
        applyPricing(builder);
        applyShipping(builder);
        applyFraud(builder);

        return builder.build();
    }

    public OrderDto assembleFromRequest(CreateOrderRequest request) {
        OrderDtoBuilder builder = new OrderDtoBuilder()
                .orderId(UUID.randomUUID().toString())
                .customerId(request.customerId())
                .shippingAddress(request.shippingAddress())
                .billingAddress(request.shippingAddress());

        for (Long productId : request.productIds()) {
            ProductDto product = catalog.findById(productId)
                    .orElseThrow(() -&gt; new IllegalArgumentException("Unknown product: " + productId));
            builder.addProduct(product);
        }

        applyInventory(builder);
        applyPricing(builder);
        applyShipping(builder);
        applyFraud(builder);

        OrderDto order = builder.build();
        LOG.infof("Assembled order %s for customer %s", order.orderId(), order.customerId());
        return order;
    }

    private void applyInventory(OrderDtoBuilder builder) {
        builder.status("INVENTORY_CONFIRMED");
        LOG.debug("Inventory enrichment applied");
    }

    private void applyPricing(OrderDtoBuilder builder) {
        BigDecimal subtotal = builder.productsSnapshot().stream()
                .map(ProductDto::price)
                .reduce(BigDecimal.ZERO, BigDecimal::add);
        BigDecimal tax = subtotal.multiply(TAX_RATE).setScale(2, RoundingMode.HALF_UP);
        BigDecimal totalBeforeShipping = subtotal.add(tax);

        builder.subtotal(subtotal).tax(tax).total(totalBeforeShipping).status("PRICED");
        LOG.debugf("Pricing enrichment applied: subtotal=%s tax=%s", subtotal, tax);
    }

    private void applyShipping(OrderDtoBuilder builder) {
        BigDecimal totalWithShipping = builder.totalSnapshot().add(SHIPPING_FLAT);
        builder.shipping(SHIPPING_FLAT).total(totalWithShipping).status("SHIPPING_QUOTED");
        LOG.debug("Shipping enrichment applied");
    }

    private void applyFraud(OrderDtoBuilder builder) {
        int score = builder.totalSnapshot().compareTo(new BigDecimal("200")) &gt; 0 ? 12 : 5;
        builder.fraudScore(score).status("READY");
        LOG.debugf("Fraud enrichment applied: score=%d", score);
    }
}</code></code></pre><p>Each <code>apply*</code> method stands in for another service in a larger system. The builder is the assembly boundary. The record is what you hand back to REST or messaging once the work is done.</p><h3><strong>Order endpoints</strong></h3><pre><code><code>package com.orderdesk;

import org.eclipse.microprofile.config.inject.ConfigProperty;

import jakarta.validation.Valid;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/orders")
@Produces(MediaType.APPLICATION_JSON)
public class OrderResource {

    private final OrderAssemblyService assemblyService;
    private final String sampleOrderId;

    public OrderResource(
            OrderAssemblyService assemblyService,
            @ConfigProperty(name = "orderdesk.sample-order-id", defaultValue = "ORD-2026-001") String sampleOrderId) {
        this.assemblyService = assemblyService;
        this.sampleOrderId = sampleOrderId;
    }

    @GET
    @Path("/sample")
    public OrderDto sampleOrder() {
        return assemblyService.buildSampleOrder(sampleOrderId);
    }

    @POST
    @Consumes(MediaType.APPLICATION_JSON)
    public OrderDto createOrder(@Valid CreateOrderRequest request) {
        return assemblyService.assembleFromRequest(request);
    }
}</code></code></pre><p>Optional config in <code>application.properties</code> keeps the sample order id stable in logs:</p><pre><code><code>orderdesk.sample-order-id=ORD-2026-001</code></code></pre><p>Fetch the sample order:</p><pre><code><code>curl -s http://localhost:8080/orders/sample | jq '.orderId, .status, .total'</code></code></pre><p>Expect <code>ORD-2026-001</code>, <code>READY</code>, and a computed total that includes tax and flat shipping.</p><h2><strong>Validated request records</strong></h2><p>Ingress DTOs are usually the easy case: immutable data carriers with validation at the edge. Bean Validation annotations on record components work the same as on classes:</p><pre><code><code>package com.orderdesk;

import java.util.List;

import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.NotEmpty;

public record CreateOrderRequest(
        @NotBlank String customerId,
        @NotEmpty List&lt;Long&gt; productIds,
        @NotBlank String shippingAddress
) {
}</code></code></pre><p>Create an order:</p><pre><code><code>curl -s -X POST http://localhost:8080/orders \
  -H "Content-Type: application/json" \
  -d '{
    "customerId": "customer-77",
    "productIds": [1],
    "shippingAddress": "Tech Street 42"
  }' | jq '.customerId, .status, (.products | length)'</code></code></pre><p>Send an empty payload and Quarkus returns <strong>HTTP 400</strong> with constraint violations before your assembly service gets a chance to build anything.</p><h2><strong>Where builders are overkill</strong></h2><p>A builder on a three-field <code>ProductDto</code> buys you very little. There are no staged steps, no cross-field rules, and no enrichment pipeline. Use the record constructor and move on.</p><p>My rule of thumb:</p><ul><li><p><strong>Small immutable DTO</strong> &#8594; record only</p></li><li><p><strong>Staged or multi-step construction</strong> &#8594; record + builder (or factory) at the assembly boundary</p></li><li><p><strong>Mutable domain object</strong> &#8594; regular class</p></li><li><p><strong>ORM entity</strong> &#8594; regular class, separate from API DTOs</p></li></ul><p>Wrapper records like <code>record CustomerId(String value)</code> help when several <code>String</code> parameters would otherwise look identical to the compiler. They are cheap insurance on large payloads; this sample keeps plain strings so the builder contrast stays obvious.</p><h2><strong>Under load and across versions</strong></h2><p><strong>Thread confinement</strong> &#8212; create a new <code>OrderDtoBuilder</code> per request. Builders are mutable scratch pads. Putting one in an <code>@ApplicationScoped</code> bean and reusing it across threads will leak order state between customers. I saw the same failure mode with reusable JAXB builders years ago; the shape changes, the bug does not.</p><p><strong>Validation boundaries</strong> &#8212; record compact constructors for invariants that must hold on every instance. Builders for cross-field assembly rules before the record exists. Bean Validation on ingress records for wire-format constraints. 
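</p><p>Here is a minimal sketch of the first two layers. <code>PricedOrder</code> and <code>PricedOrderBuilder</code> are hypothetical, trimmed-down stand-ins for <code>OrderDto</code> and <code>OrderDtoBuilder</code>, and the total check is an assumed cross-field rule that the OrderDesk builder shown earlier does not enforce. Bean Validation is left out so the class runs on a plain JDK:</p><pre><code><code>import java.math.BigDecimal;

public class ValidationLayersSketch {

    // Hypothetical, trimmed-down stand-in for OrderDto.
    record PricedOrder(String orderId, BigDecimal subtotal, BigDecimal tax,
                       BigDecimal shipping, BigDecimal total) {
        PricedOrder {
            // Layer 1: single-field invariant that must hold on every instance.
            if (orderId == null || orderId.isBlank()) {
                throw new IllegalArgumentException("orderId must not be blank");
            }
        }
    }

    // Hypothetical builder: mutable scratch space, one instance per request.
    static class PricedOrderBuilder {
        private String orderId;
        private BigDecimal subtotal = BigDecimal.ZERO;
        private BigDecimal tax = BigDecimal.ZERO;
        private BigDecimal shipping = BigDecimal.ZERO;
        private BigDecimal total = BigDecimal.ZERO;

        PricedOrderBuilder orderId(String orderId) { this.orderId = orderId; return this; }
        PricedOrderBuilder subtotal(BigDecimal subtotal) { this.subtotal = subtotal; return this; }
        PricedOrderBuilder tax(BigDecimal tax) { this.tax = tax; return this; }
        PricedOrderBuilder shipping(BigDecimal shipping) { this.shipping = shipping; return this; }
        PricedOrderBuilder total(BigDecimal total) { this.total = total; return this; }

        PricedOrder build() {
            // Layer 2: cross-field rule, checked before the record exists.
            BigDecimal expected = subtotal.add(tax).add(shipping);
            // compareTo ignores scale, so 33.79 and 33.790 count as equal.
            if (total.compareTo(expected) != 0) {
                throw new IllegalStateException(
                        "total " + total + " does not match subtotal + tax + shipping = " + expected);
            }
            return new PricedOrder(orderId, subtotal, tax, shipping, total);
        }
    }

    public static void main(String[] args) {
        PricedOrderBuilder builder = new PricedOrderBuilder()
                .orderId("ORD-1")
                .subtotal(new BigDecimal("20.00"))
                .tax(new BigDecimal("3.80"))
                .shipping(new BigDecimal("9.99"));

        try {
            builder.build(); // total is still ZERO, so the cross-field rule fails
        } catch (IllegalStateException e) {
            System.out.println("Builder rejected: " + e.getMessage());
        }

        PricedOrder order = builder.total(new BigDecimal("33.79")).build();
        System.out.println("Built " + order.orderId() + " with total " + order.total());
    }
}</code></code></pre><p>The sketch compares with <code>compareTo</code> instead of <code>equals</code> because <code>BigDecimal.equals</code> also compares scale, so <code>33.79</code> and <code>33.790</code> would not match.</p><p>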
If you force all three concerns into one constructor, the result usually reads like a bug report.</p><p><strong>DTO evolution</strong> &#8212; records are strict. When upstream APIs add fields gradually, builders absorb defaults and optional steps more gracefully than widening every <code>new OrderDto(...)</code> call site.</p><p><strong>Jackson</strong> &#8212; records serialize cleanly in current Quarkus. Builders still help when you normalize inconsistent external JSON into one internal DTO shape before <code>build()</code>.</p><h2><strong>Prove it</strong></h2><p>Run the test suite:</p><pre><code><code>./mvnw test</code></code></pre><p>Seven tests cover builder invariants, record compact-constructor rejection, product listing, sample order shape, successful POST assembly, and validation failures on bad POST bodies.</p><p>Manual smoke checks while <code>quarkus:dev</code> is running:</p><pre><code><code>curl -s http://localhost:8080/products
curl -s http://localhost:8080/orders/sample
curl -s -X POST http://localhost:8080/orders \
  -H "Content-Type: application/json" \
  -d '{"customerId":"","productIds":[],"shippingAddress":""}'</code></code></pre><p>The last call should return <strong>400</strong>, not a half-built order.</p><h2><strong>Closing</strong></h2><p>Records removed most of the DTO boilerplate I used to carry. Careful construction still matters when objects grow, enrich in stages, or need defaults and cross-field checks before they exist.</p><p>OrderDesk is intentionally small: records for the simple shapes, a builder at the assembly boundary, validation split across ingress annotations, builder <code>build()</code>, and the record compact constructor. That split is the point.</p><p>The finished sample lives in the <code>orderdesk-records-builders</code><a href="https://github.com/myfear/the-main-thread/tree/main/orderdesk-records-builders"> repository</a>.</p>]]></content:encoded></item><item><title><![CDATA[Code LLM Language Support Is a Workflow Risk Question]]></title><description><![CDATA[A practical way to evaluate tokenizers, benchmark bias, framework awareness, and agent tooling before you trust a model in a real repository.]]></description><link>https://www.the-main-thread.com/p/code-llm-language-support</link><guid isPermaLink="false">https://www.the-main-thread.com/p/code-llm-language-support</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Wed, 27 May 2026 06:08:17 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/eafd0d25-946f-4be1-a039-b007f3f9f18d_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Open almost any code model product page and you will find the same promise in 
slightly different clothes: supports more than 100 programming languages.</p><p>That line sounds clean and authoritative. It also hides most of the detail developers actually need.</p><p>In an IDE, language support has visible edges. Java support means parsers, navigation, refactoring, formatting, debugger integration, build awareness, and at least some framework understanding. When the support is thin, the editor embarrasses itself in public.</p><p>Code LLMs work on a looser boundary. A model can read the source text, imitate the syntax, and still feel unreliable once the task moves past a short example. That is where the confusion starts. A first demo in Kotlin or COBOL or PL/SQL often works well enough to create optimism. Then the real repository shows up. Imports drift. Framework habits disappear. The agent invents APIs, rewrites build files like it met Maven five minutes ago, or writes Java that reads like Python wearing office clothes.</p><p>So the real question is not &#8220;does the model support my language?&#8221; It is &#8220;which layers of support am I actually getting?&#8221;</p><p>For code LLMs, language support is a stack. The tokenizer has to represent the source efficiently. The training data has to include enough examples to build real patterns. The benchmark has to measure something close to your work. The agent stack has to recover the right files, builds, generated sources, and framework context. By the time a vendor compresses all of that into one word, most of the useful signal is gone.</p><h2><strong>The Tokenizer Sets the Floor</strong></h2><p>The tokenizer is the first bottleneck.</p><p>Before the model can reason about code, it has to see code as tokens. Common fragments in popular languages usually collapse into efficient token sequences because they appear over and over in tokenizer training and model pretraining. 
Rare constructs get split into smaller and noisier pieces, which means the same source file burns more context window and arrives at the model with weaker internal structure.</p><p>You feel that cost quickly. Long files become harder to hold together. Cross-file reasoning gets shakier. Identifiers start drifting. Odd syntax grows brittle. The model may still &#8220;know&#8221; the language in the loose marketing sense, but it knows it through poorer building blocks.</p><p>This explains some surprisingly uneven behavior. Two languages can have roughly similar training exposure and still feel very different in practice because one of them is more expensive for the tokenizer to represent. Shared structure helps too. A model with heavy Java exposure often does better in Kotlin than you would guess from Kotlin&#8217;s share alone because the type system, package layout, inheritance patterns, and framework habits overlap. That transfer fades once you move into languages with different shapes and different idioms.</p><p>If I were evaluating a new coding model, I would start here. Paste a real file, not a neat sample, and see how the system behaves once the source gets long, dense, and slightly ugly.</p><h2><strong>Benchmarks Explain a Lot of the Overconfidence</strong></h2><p>The benchmark story matters because it trained the whole industry to talk about code ability in a very narrow way.</p><p><a href="https://arxiv.org/abs/2107.03374">HumanEval</a> gave the field an early focal point: 164 handwritten Python tasks scored by whether generated answers passed the tests. It taught everyone to celebrate a kind of success that maps poorly to enterprise development. HumanEval says a lot about short function synthesis. It says very little about framework habits, repository structure, build logic, or safe multi-file edits.</p><p>The benchmark picture is broader now. <a href="https://arxiv.org/abs/2208.08227">MultiPL-E</a> expanded evaluation across languages. 
<a href="https://arxiv.org/abs/2306.03091">RepoBench</a> pushed toward repository-level completion. <a href="https://www.swebench.com/SWE-bench/">SWE-bench</a>, <a href="https://www.swebench.com/multilingual">SWE-bench Multilingual</a>, and <a href="https://arxiv.org/abs/2504.02605">Multi-SWE-bench</a> all move closer to actual software engineering work.</p><p>The most relevant example for this article is <a href="https://github.com/Tencent-Hunyuan/AutoCodeBenchmark">Tencent Hunyuan&#8217;s AutoCodeBench</a>. The project describes a full set with 3,920 problems spread evenly across 20 programming languages, plus lighter and completion-style subsets. That balanced coverage already makes it more useful for language-support discussions than the older Python-centered staples. The associated ICLR 2026 paper also groups Java with Python, C++, and C# in its &#8220;popular languages&#8221; slice, then compares that group with lower-resource languages such as Racket, Shell, Elixir, and TypeScript.</p><p>That split surfaces two different truths at once. First, multilingual balance matters. Python-only comfort scores hide a lot. Second, even a stronger multilingual benchmark still measures a particular kind of coding work. AutoCodeBench covers more languages and raises the difficulty, but it remains a code-generation benchmark. It tells you more about benchmark Java than about enterprise Java.</p><p>That difference matters because Java usually looks healthier at the syntax and algorithm level than it does in a real service repository.</p><h2><strong>Java Is Where the Gap Becomes Obvious</strong></h2><p>Java gives vendors a flattering place to stand.</p><p>Models usually learn enough Java syntax to look competent early. The language is common, verbose in a helpful way, and heavily represented in public code. 
A model can post respectable results on a multilingual benchmark and still feel clumsy inside a Quarkus or Spring codebase a few minutes later.</p><p>Enterprise Java raises the bar in a very ordinary way. The hard part is not writing a class with the right braces. The hard part is fitting that class into CDI scopes, REST conventions, serialization behavior, annotation processors, test slices, generated sources, Maven or Gradle rules, extension configuration, and the local conventions the team built over time. A Quarkus service is Java plus a build model plus a runtime model plus a pile of framework habits that only make sense together.</p><p>That is why Java needs its own paragraph in this discussion. A benchmark that says &#8220;the model is decent at Java&#8221; can still be completely consistent with an engineer saying &#8220;this thing keeps producing awkward Quarkus code.&#8221; Those statements describe different levels of the ladder. One is language-level code generation. The other is ecosystem-level work in a repository that has history, tooling, and consequences.</p><p>Don&#8217;t forget to read more about the Quarkus Agent MCP in my earlier article:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;870e48dc-ff63-4beb-8a8b-81c53040b7fb&quot;,&quot;caption&quot;:&quot;Yesterday at IBM Bob Day, I was supposed to show the new Quarkus Agent MCP.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;My IBM Bob Day Demo Failed. Quarkus Agent MCP Got Better.&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:72758027,&quot;name&quot;:&quot;Markus Eisele&quot;,&quot;bio&quot;:&quot;I&#8217;ve spent 20+ years helping Java systems adapt without breaking. Here I share the architecture, tools, and thinking behind that work. 
Java Champion &#183; Developer &#183; IBM Research.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00dcb2da-5c09-46bf-a265-2a22ed32250b_800x800.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-06T06:08:19.561Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3bd9a58f-0e9b-47a2-8950-6ebd2a3a96c0_1731x909.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.the-main-thread.com/p/quarkus-agent-mcp-ibm-bob&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:196082783,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:0,&quot;publication_id&quot;:4194688,&quot;publication_name&quot;:&quot;The Main Thread&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!8sdd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81643b8a-6240-4cd1-9f3a-8fd19cc3a455_254x254.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><p>I see weak Java support show up less as loud failure and more as code that technically works while feeling wrong. Wrong annotations. Strange dependency choices. Reinvented framework features. Methods that compile and quietly ignore the architectural shape around them. 
That kind of output is annoying because it looks finished right up until somebody has to maintain it.</p><h2><strong>Agents Changed What &#8220;Support&#8221; Means</strong></h2><p>The base model still matters, but the surrounding stack now shapes the developer experience just as much.</p><p>A modern coding agent usually wraps the model with repository indexing, retrieval, syntax repair, AST parsing, build inspection, semantic search, diff awareness, and sometimes framework-specific helpers. Developers experience all of that as one tool, which means language support has become a property of the system, not just the model.</p><p>This is why two tools built on similar foundation models can feel wildly different in the same language. If one agent can inspect the build, trace imports, pull the right sibling files, and recognize generated sources before it writes a diff, the language suddenly feels much better supported. The model did not become wiser in the abstract. The system simply stopped asking it to guess so much.</p><p>Java benefits from that stack more than most marketing copy admits. A modest model with strong repository tooling can feel better in a Quarkus codebase than a stronger raw model that only sees the current file. That is also why language-support claims should be evaluated at system level. &#8220;Is this model good at Java?&#8221; is usually too small a question. 
&#8220;Can this toolchain help my team change Java code safely?&#8221; is much closer to the real one.</p><h2><strong>I Prefer a Support Ladder to a Checkbox</strong></h2><p>The cleanest way to talk about this is as a ladder.</p><ol><li><p><strong>Tokenizable</strong><br>The system can ingest the source text and produce syntax-shaped output.</p></li><li><p><strong>Snippet-capable</strong><br>It handles small isolated tasks, boilerplate, local completions, and benchmark-style functions reasonably well.</p></li><li><p><strong>Ecosystem-capable</strong><br>It works with mainstream frameworks, builds, tests, and repository conventions in a way that feels idiomatic rather than uncanny.</p></li><li><p><strong>Workflow-capable</strong><br>It survives long files, cross-file edits, refactors, generated code, build logic, and architectural consistency with a low enough error rate that you would trust it in serious work.</p></li></ol><p>Most vendor claims stop around level one or two and borrow the emotional confidence of level three or four. That is the whole problem.</p><p>Very few systems live at the top of this ladder across many languages. 
When they get close, they usually arrive with a lot of help: better retrieval, tighter workflows, language-aware tooling, fine-tuning, and careful context construction.</p><h2><strong>How I Would Evaluate Language Support</strong></h2><p>I would test five things.</p><ol><li><p><strong>Real-file behavior</strong><br>Paste production code, let the file get long, and watch for identifier drift, repeated fragments, truncation, or unstable completions.</p></li><li><p><strong>Fill-in-the-middle editing</strong><br>Ask the agent to complete a method inside an existing file while preserving style, imports, and the surrounding abstraction.</p></li><li><p><strong>Framework fit</strong><br>Make it touch the ecosystem that matters to your team: Quarkus CDI, Spring configuration, JPA mappings, Gradle or Maven, serializers, test fixtures, generated code.</p></li><li><p><strong>Multi-file reasoning</strong><br>Have it trace behavior across modules, update several files, and keep the interfaces intact.</p></li><li><p><strong>Refactoring discipline</strong><br>Ask for a change that needs restraint: rename an abstraction, migrate an API, keep the tests passing, avoid gratuitous build churn.</p></li></ol><p>Java deserves special treatment in this evaluation because the difference between &#8220;can write Java&#8221; and &#8220;can work in our Java system&#8221; is often huge. I would always include the build file, the framework configuration, and at least one generated or framework-owned edge in the test. Otherwise the evaluation stops right before the interesting part.</p><h2><strong>What To Do When Support Feels Thin</strong></h2><p>Thin support usually means the base model is carrying work that should have been distributed across the rest of the stack.</p><p>Better retrieval is usually the fastest improvement. 
When the system can pull the right interfaces, configs, generated sources, and neighboring files, a merely decent model often becomes much more useful because it stops improvising key details.</p><p>Language-aware tools help too. Build inspectors, AST tooling, language servers, test runners, framework detectors, and repository summaries can carry a lot of the burden that teams currently describe as &#8220;the model understanding Java.&#8221;</p><p>Fine-tuning or adapters also make sense when the underlying model is already close and the real gap sits around internal frameworks, modernization patterns, or house conventions. That path is a lot more practical than hoping one general model will absorb every proprietary DSL and every local habit by osmosis.</p><p>Context construction deserves more attention as well. A lot of supposed language weakness is really missing-context weakness. The model never saw the build file, the generated source, the parent abstraction, or the one configuration class that explains the rest of the service. The bug gets filed under &#8220;Java support&#8221; anyway because that is the label people have available.</p><h2><strong>The Sentence I Wish Vendors Would Use</strong></h2><p>Here is the version I would trust more:</p><blockquote><p>This system reads the language, performs well on small benchmark tasks, has uneven framework depth, and becomes much more useful when paired with retrieval and tooling tuned to your repository.</p></blockquote><p>That sentence has less marketing lift, but it describes the world developers actually live in.</p><p>For enterprise teams, the real issue is workflow support. Your files, your builds, your frameworks, your refactors, your generated sources, your failure modes, and your maintenance burden after the demo glow wears off. Once you look at language support through that lens, a lot of strange agent behavior becomes easier to explain. 
You are usually looking at partial support wearing the confidence of full support.</p>]]></content:encoded></item><item><title><![CDATA[Quarkus Server-Rendered UI for Teams That Want Less Frontend Overhead]]></title><description><![CDATA[A practical reading path for building fast business UIs with Qute, HTMX, SSE, and pagination without dragging every screen into frontend theater.]]></description><link>https://www.the-main-thread.com/p/qute-htmx-server-rendered-ui</link><guid isPermaLink="false">https://www.the-main-thread.com/p/qute-htmx-server-rendered-ui</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Tue, 26 May 2026 06:08:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e4b58ed5-e922-4225-beef-0ad761328047_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Sometimes I just want to click &#8220;view source&#8221; and still understand how the page was built.</p><p>That preference is not nostalgia. It is usually a reaction to avoidable complexity. A lot of internal tools, back-office systems, dashboards, and workflow UIs do not need a full client-side state machine plus a small museum of JavaScript build tooling. They need to load fast, render predictably, and let the backend keep most of the application logic where the backend team can actually maintain it.</p><p>Quarkus is unusually good at this style when you combine Qute, HTMX, SSE, and a little discipline. 
This page gathers the relevant articles into one path so readers can move from &#8220;server-rendered UI sounds interesting&#8221; to &#8220;I can build a real interface this way and not feel embarrassed by it.&#8221;</p><h2><strong>Why I still like server-rendered UI</strong></h2><p>The strongest argument is not that JavaScript is bad. JavaScript is fine. The stronger argument is that many teams keep paying frontend complexity costs for features that never needed that level of machinery in the first place.</p><p>If your UI mostly shows data, collects input, performs searches, paginates lists, streams a few updates, and needs sane HTML semantics, a server-rendered approach can be both faster to build and easier to reason about. The trade-off is that you need to be deliberate about templates, fragment reuse, and small interaction patterns. That is where this cluster helps.</p><h2><strong>Start with the pieces that make the approach feel real</strong></h2><p>I would begin with the posts that show the core stack plainly instead of treating it like a retro curiosity.</p><ul><li><p><a href="https://www.the-main-thread.com/p/htmx-quarkus-server-rendered-ui-java">HTMX with Quarkus for server-rendered UI</a> is the natural starting point because it makes the interaction model concrete</p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-qute-type-safe-templating">Qute type-safe templating</a> matters early, because maintainable templates beat clever templates every time</p></li><li><p><a href="https://www.the-main-thread.com/p/lean-business-ui-quarkus-qute-no-javascript">Lean business UI with Quarkus Qute and no JavaScript</a> is where the approach stops sounding theoretical and starts sounding like a reasonable platform choice</p></li></ul><p>That trio gives you the baseline: render HTML on the server, use HTMX for targeted interaction, and keep the template layer honest enough that refactors do not become archaeology.</p><h2><strong>Real interfaces live or 
die on composition</strong></h2><p>A lot of server-rendered demos look nice until the second page. Then shared layout, nested fragments, breadcrumbs, and error handling show up, and suddenly the &#8220;simple&#8221; stack needs adult supervision.</p><p>That is why I would move next into the composition and navigation posts:</p><ul><li><p><a href="https://www.the-main-thread.com/p/quarkus-qute-dynamic-includes-tutorial">Qute dynamic includes</a> helps when the page is no longer one static template</p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-qute-dynamic-breadcrumb">Qute dynamic breadcrumb</a> is small, but it solves the sort of repeated UI problem that tells you whether the stack still feels pleasant after a week</p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-custom-error-pages-rest-qute">Custom error pages with REST and Qute</a> matters because polished happy paths are easy and production errors are where interfaces become memorable for the wrong reason</p></li></ul><p>This part of the cluster is less flashy, which is exactly why it matters. Mature UI stacks are usually decided by how they handle repeated structure and awkward edges, not by how elegant the first demo looked on a Friday afternoon.</p><h2><strong>Interaction does not need a giant frontend rewrite</strong></h2><p>The usual objection is that server-rendered UI cannot stay interactive enough. I think that objection is often based on a stale mental model.</p><p>You can do a lot with targeted partial updates, infinite scroll, streaming events, and a few carefully chosen endpoints. 
You do not need to rebuild the web every time a list grows longer than one page.</p><p>This set makes that case well:</p><ul><li><p><a href="https://www.the-main-thread.com/p/quarkus-cursor-pagination-infinite-scroll">Cursor pagination with infinite scroll</a> shows the simple version</p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-cursor-pagination-full-text-search-infinite-scroll">Cursor pagination, full-text search, and infinite scroll</a> pushes the pattern into a more realistic data-heavy flow</p></li><li><p><a href="https://www.the-main-thread.com/p/real-time-iss-tracker-quarkus-sse-qute-java">Real-time ISS tracker with SSE and Qute</a> adds streaming updates without dragging the whole architecture into SPA territory</p></li></ul><p>For me, this is the part where server-rendered UI stops being a style preference and becomes a practical engineering option. Once you see live updates and richer search flows handled this way, the &#8220;you obviously need a huge frontend framework&#8221; reflex starts looking less obvious.</p><h2><strong>There is still room for bigger abstractions when you need them</strong></h2><p>Not every reader wants the lightest possible stack. Some want more framework help. Some are migrating from Spring MVC. 
Some want a fuller web app feel with routing and conventions already in place.</p><p>That is where these articles fit:</p><ul><li><p><a href="https://www.the-main-thread.com/p/spring-boot-to-quarkus-qute-migration-guide">Spring Boot to Quarkus with Qute</a> is the bridge if your team still thinks in Spring web terms</p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-renarde-full-stack-java-web-tutorial">Quarkus Renarde full-stack Java web tutorial</a> is useful when you want a more opinionated web layer on top of the same general philosophy</p></li><li><p><a href="https://www.the-main-thread.com/p/build-twitter-clone-quarkus-kafka-qute">Build a Twitter clone with Quarkus, Kafka, and Qute</a> is a bigger example for readers who need proof that the approach scales past toy screens</p></li></ul><p>I like having these in the cluster because they keep the conversation honest. The choice is not between &#8220;plain HTML forever&#8221; and &#8220;single-page app for everything.&#8221; There is a middle ground, and Quarkus gives you several ways to stand there comfortably.</p><h2><strong>The reading order I would use</strong></h2><p>If I were building a serious server-rendered UI path for Quarkus readers, I would use this order:</p><ol><li><p><a href="https://www.the-main-thread.com/p/htmx-quarkus-server-rendered-ui-java">HTMX with Quarkus for server-rendered UI</a></p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-qute-type-safe-templating">Qute type-safe templating</a></p></li><li><p><a href="https://www.the-main-thread.com/p/lean-business-ui-quarkus-qute-no-javascript">Lean business UI with Quarkus Qute and no JavaScript</a></p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-qute-dynamic-includes-tutorial">Qute dynamic includes</a></p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-qute-dynamic-breadcrumb">Qute dynamic breadcrumb</a></p></li><li><p><a 
href="https://www.the-main-thread.com/p/quarkus-custom-error-pages-rest-qute">Custom error pages with REST and Qute</a></p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-cursor-pagination-infinite-scroll">Cursor pagination with infinite scroll</a></p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-cursor-pagination-full-text-search-infinite-scroll">Cursor pagination, full-text search, and infinite scroll</a></p></li><li><p><a href="https://www.the-main-thread.com/p/real-time-iss-tracker-quarkus-sse-qute-java">Real-time ISS tracker with SSE and Qute</a></p></li></ol><p>If your team is migrating from Spring, read the Spring migration piece after the first three. If your team wants a fuller framework opinion, bring Renarde in sooner.</p><h2><strong>What this cluster is trying to prove</strong></h2><p>I want readers to come away with a much simpler idea than &#8220;Qute versus React&#8221; or &#8220;HTMX versus JavaScript.&#8221; Those comparisons get noisy fast and they often miss the point.</p><p>The better question is this: what is the cheapest architecture that still gives you a clear, maintainable, responsive interface for the job in front of you? For a surprising number of business applications, the answer is still server-rendered HTML with a few smart interaction patterns and a backend that owns its behavior openly. 
Quarkus is very good at that kind of work, and this cluster should make the case without treating plain HTML like a guilty secret.</p>]]></content:encoded></item><item><title><![CDATA[Quarkus Graceful Shutdown That Holds Up During Rolling Deploys]]></title><description><![CDATA[A hands-on guide to readiness, shutdown delay, timeout, and Kubernetes timing so in-flight requests finish cleanly during rolling deploys.]]></description><link>https://www.the-main-thread.com/p/quarkus-graceful-shutdown</link><guid isPermaLink="false">https://www.the-main-thread.com/p/quarkus-graceful-shutdown</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Mon, 25 May 2026 06:08:54 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2cf12763-e7ed-47ac-ab3f-3f260c9bb2a6_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Rolling deploys are when I find out a service was never really ready to go away. The pod gets a <code>SIGTERM</code>, the load balancer still has the instance in rotation for another probe interval, and a payment handoff that was fine a second ago comes back as <code>503 Service Unavailable</code> or just drops the connection. The app looked healthy in unit tests. Production was doing a choreographed shutdown and nobody told the JVM.</p><p>Quarkus has had graceful shutdown settings for a while. 
In <a href="https://quarkus.io/blog/quarkus-3-32-released/">Quarkus 3.32</a> the HTTP stack got a meaningful upgrade (<a href="https://github.com/quarkusio/quarkus/pull/50975">PR #50975</a>): during shutdown it tries to <strong>answer</strong> requests instead of spraying <code>503</code> at everything. That helps. It does not replace the deploy protocol you still need: fail readiness, stop new traffic, let in-flight HTTP finish, then exit.</p><p>We build <strong>OrderBridge</strong>, a tiny payment handoff API, then prove shutdown behavior with a script that keeps a long request in flight while the JVM receives <code>SIGTERM</code>. The sample uses <strong>Quarkus 3.35.2</strong>.</p><h2><strong>What we build</strong></h2><p><strong>OrderBridge</strong> is a small service that:</p><ul><li><p>exposes <code>GET /orders/{id}</code> for a quick status check;</p></li><li><p>exposes <code>POST /orders/handoff</code> that simulates a five-second payment gateway call;</p></li><li><p>exposes SmallRye Health liveness and readiness on <code>/q/health/live</code> and <code>/q/health/ready</code>;</p></li><li><p>logs startup, shutdown delay, and shutdown events;</p></li><li><p>ships two Quarkus profiles: <code>naive</code> (defaults) and <code>graceful</code> (timeout + delay);</p></li><li><p>includes a bash script that terminates the packaged app mid-handoff so you can see the difference.</p></li></ul><h2><strong>What you need</strong></h2><p>I assume you have shipped Jakarta REST apps and have seen Kubernetes readiness probes in the wild. This is not a hello-world.</p><ul><li><p>JDK <strong>21</strong></p></li><li><p><strong>Quarkus CLI</strong> or Maven</p></li><li><p><code>curl</code> and bash for the shutdown script</p></li><li><p>About <strong>40 minutes</strong></p></li></ul><h2><strong>Project setup</strong></h2><p>Create the project:</p><pre><code><code>quarkus create app com.orderbridge:orderbridge-graceful-shutdown \
  --extension='rest-jackson,smallrye-health' \
  --java=21 \
  --no-code
cd orderbridge-graceful-shutdown
</code></code></pre><p>Extensions:</p><ul><li><p><code>rest-jackson</code> &#8212; JSON endpoints and the HTTP stack that participates in graceful shutdown</p></li><li><p><code>smallrye-health</code> &#8212; readiness and liveness probes</p></li></ul><p>Use package <code>com.orderbridge</code> for application code.</p><p>Add the Maven wrapper from the <a href="https://quarkus.io/guides/getting-started">Quarkus getting started guide</a> if your tree does not already include <code>./mvnw</code>, or clone the finished sample linked at the end.</p><h2><strong>Order status and payment handoff</strong></h2><p>Payment handoffs are the interesting case: they run for seconds while deploys happen for seconds. A fast status endpoint gives us something cheap to hit while the slow one is in flight.</p><h3><strong>DTO records</strong></h3><pre><code><code>package com.orderbridge;

public record HandoffRequest(String orderId, long amountCents) {
}
</code></code></pre><pre><code><code>package com.orderbridge;

public record HandoffResult(String orderId, String status, long elapsedMs) {
}
</code></code></pre><pre><code><code>package com.orderbridge;

public record OrderStatus(String orderId, String status) {
}
</code></code></pre><h3><strong>OrderService</strong></h3><p>The handoff sleeps to mimic an external gateway. Production code should use timeouts, cancellation, and idempotency keys &#8212; not <code>Thread.sleep</code>. For teaching shutdown, sleep is honest about &#8220;long request in flight.&#8221;</p><pre><code><code>package com.orderbridge;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.eclipse.microprofile.config.inject.ConfigProperty;
import org.jboss.logging.Logger;

import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class OrderService {

    private static final Logger LOG = Logger.getLogger(OrderService.class);

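    // In-memory status store for the demo; contents are lost when the JVM exits.
    private final Map&lt;String, String&gt; statuses = new ConcurrentHashMap&lt;&gt;();

    // Simulated gateway latency in milliseconds; the test profile overrides it in application.properties.
    @ConfigProperty(name = "orderbridge.handoff.delay-ms", defaultValue = "5000")
    long handoffDelayMs;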

    public OrderStatus status(String orderId) {
        String status = statuses.getOrDefault(orderId, "CREATED");
        return new OrderStatus(orderId, status);
    }

    public HandoffResult handoff(HandoffRequest request) {
        long started = System.currentTimeMillis();
        LOG.infof("Starting payment handoff for order %s", request.orderId());
        statuses.put(request.orderId(), "HANDOFF_IN_PROGRESS");

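        try {
            // Stand-in for the slow gateway call; this is the in-flight work shutdown has to wait for.
            Thread.sleep(handoffDelayMs);
        } catch (InterruptedException interrupted) {
            // Restore the interrupt flag and record the aborted handoff before failing the request.
            Thread.currentThread().interrupt();
            statuses.put(request.orderId(), "HANDOFF_INTERRUPTED");
            throw new IllegalStateException("Payment handoff interrupted during shutdown", interrupted);
        }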

        statuses.put(request.orderId(), "HANDOFF_COMPLETE");
        long elapsedMs = System.currentTimeMillis() - started;
        LOG.infof("Payment handoff complete for order %s in %d ms", request.orderId(), elapsedMs);
        return new HandoffResult(request.orderId(), "HANDOFF_COMPLETE", elapsedMs);
    }
}
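</code></code></pre><p>Here is a minimal sketch of what &#8220;timeouts and cancellation&#8221; could look like in plain Java. It is not part of the OrderBridge sample: <code>BoundedHandoff</code>, <code>callGateway</code>, and <code>handoffWithBudget</code> are hypothetical names I made up for illustration, and the sleep still stands in for the gateway. The only point is that the caller waits for a fixed budget and then interrupts the call instead of blocking for as long as the dependency likes. Idempotency keys are out of scope here.</p><pre><code><code>import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not part of the OrderBridge sample.
public class BoundedHandoff {

    // Stand-in for a real gateway client: sleeps to simulate latency.
    static String callGateway(long latencyMs) throws InterruptedException {
        Thread.sleep(latencyMs);
        return "HANDOFF_COMPLETE";
    }

    // Waits at most budgetMs for the gateway call, then interrupts it.
    public static String handoffWithBudget(long latencyMs, long budgetMs) {
        // One executor per call keeps the sketch short; real code would reuse a managed one.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future&lt;String&gt; call = executor.submit(() -&gt; callGateway(latencyMs));
        try {
            return call.get(budgetMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException timedOut) {
            call.cancel(true); // interrupts the thread still sleeping in callGateway
            return "HANDOFF_TIMED_OUT";
        } catch (InterruptedException interrupted) {
            Thread.currentThread().interrupt(); // the caller itself was interrupted
            call.cancel(true);
            return "HANDOFF_INTERRUPTED";
        } catch (ExecutionException failed) {
            return "HANDOFF_FAILED";
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(handoffWithBudget(50, 500));   // fast gateway, inside the budget
        System.out.println(handoffWithBudget(2000, 100)); // slow gateway, budget exceeded
    }
}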
</code></code></pre><h3><strong>OrderResource</strong></h3><pre><code><code>package com.orderbridge;

import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/orders")
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
public class OrderResource {

    private final OrderService orderService;

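    // Constructor injection: Quarkus CDI (ArC) injects a bean's single constructor without an explicit @Inject.
    public OrderResource(OrderService orderService) {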
        this.orderService = orderService;
    }

    @GET
    @Path("/{id}")
    public OrderStatus get(@PathParam("id") String orderId) {
        return orderService.status(orderId);
    }

    @POST
    @Path("/handoff")
    public HandoffResult handoff(HandoffRequest request) {
        return orderService.handoff(request);
    }
}
</code></code></pre><h3><strong>Base configuration</strong></h3><p>In <code>src/main/resources/application.properties</code>:</p><pre><code><code>orderbridge.handoff.delay-ms=5000

%script.quarkus.http.port=18080
%script.quarkus.log.console.level=INFO

%graceful.quarkus.shutdown.timeout=15s
%graceful.quarkus.shutdown.delay-enabled=true
%graceful.quarkus.shutdown.delay=5s
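# Rough budget for this profile: up to 5s delay + 15s timeout = about 20s before the JVM exits.
# Assumption: the pod keeps the Kubernetes default terminationGracePeriodSeconds of 30s, so 20s still fits.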

%test.orderbridge.handoff.delay-ms=100
%test.quarkus.shutdown.timeout=30s
%test.quarkus.shutdown.delay-enabled=true
%test.quarkus.shutdown.delay=2s
</code></code></pre><p>The <code>script</code> profile pins port <code>18080</code> so our shutdown script does not fight dev mode on <code>8080</code>. The <code>graceful</code> profile turns on the shutdown recipe we want in production. Tests use a 100 ms handoff so <code>./mvnw test</code> stays snappy.</p><p>Run dev mode and try the endpoints:</p><pre><code><code>./mvnw quarkus:dev
</code></code></pre><pre><code><code>curl -s http://localhost:8080/orders/ORD-1
curl -s -X POST http://localhost:8080/orders/handoff \
  -H 'Content-Type: application/json' \
  -d '{"orderId":"ORD-1","amountCents":2500}'
</code></code></pre><p>The handoff blocks for about five seconds, then returns <code>"status":"HANDOFF_COMPLETE"</code>.</p><h2><strong>Readiness vs liveness during shutdown</strong></h2><p><strong>Liveness</strong> answers: is the process alive? If this fails, Kubernetes restarts the pod.</p><p><strong>Readiness</strong> answers: should traffic be sent here? If this fails, the pod stays running but is removed from Service endpoints.</p><p>During a rolling update, you want readiness to flip <strong>DOWN</strong> while the instance can still finish work it already accepted. That is exactly what Quarkus shutdown delay is for. The <a href="https://quarkus.io/guides/lifecycle#graceful-shutdown">lifecycle guide</a> describes a delay window where HTTP still runs, but readiness reports down so orchestrators and load balancers stop sending <strong>new</strong> connections.</p><p>Check probes manually:</p><pre><code><code>curl -s http://localhost:8080/q/health/live
curl -s http://localhost:8080/q/health/ready
</code></code></pre><p>While the app is running, both return <code>"status":"UP"</code>.</p><h2><strong>See naive shutdown hurt the client</strong></h2><p>With no <code>quarkus.shutdown.timeout</code> and no delay, Quarkus does not wait for your handoff the way you expect. The server log may still show &#8220;handoff complete,&#8221; while the client sees a dead connection.</p><p>The module ships <code>scripts/demonstrate-shutdown.sh</code>. It packages the app, starts the JVM, fires a long <code>POST /orders/handoff</code>, sends <code>SIGTERM</code>, and polls readiness.</p><p>Run the <strong>naive</strong> profile (script port only &#8212; no graceful settings):</p><pre><code><code>./scripts/demonstrate-shutdown.sh naive
</code></code></pre><p>On my machine the handoff line looked like:</p><pre><code><code>handoff HTTP 000 (total 0.521690s)
</code></code></pre><p>HTTP <code>000</code> means curl never got a response &#8212; connection gone. The log often still shows the handoff finishing a few seconds later, which is the frustrating split-brain: server thought it was fine, client already gave up.</p><p>That is the bug we are fixing: shutdown was never part of the contract we tested.</p><h2><strong>Add </strong><code>quarkus.shutdown.timeout</code></h2><p><code>quarkus.shutdown.timeout</code> is a <strong>runtime</strong> setting. When set, Quarkus waits for active HTTP requests to complete before tearing down, up to the limit. The <a href="https://quarkus.io/guides/lifecycle#graceful-shutdown">lifecycle guide</a> documents it; it is off by default.</p><p>Add only the timeout first (you can use a throwaway profile or add it to <code>%graceful</code> later):</p><pre><code><code>quarkus.shutdown.timeout=15s
</code></code></pre><p><strong>Too short</strong> &#8212; in-flight handoffs get cut off; clients see resets or errors even though the pod had time to drain.</p><p><strong>Too long</strong> &#8212; deploys stall because Kubernetes will eventually send <code>SIGKILL</code> when <code>terminationGracePeriodSeconds</code> runs out.</p><p>Fifteen seconds is generous for our five-second fake gateway, but it leaves headroom for real network jitter.</p><h2><strong>Enable shutdown delay (readiness first)</strong></h2><p>Timeout alone does not tell the load balancer to stop <strong>new</strong> traffic early. For that you need delay:</p><ul><li><p><code>quarkus.shutdown.delay-enabled=true</code> &#8212; <strong>build time</strong>. You must package the app with this set; flipping it only at runtime is not enough.</p></li><li><p><code>quarkus.shutdown.delay</code> &#8212; <strong>runtime</strong> duration of the pre-shutdown phase.</p></li></ul><p>Our <code>%graceful</code> profile sets:</p><pre><code><code>%graceful.quarkus.shutdown.timeout=15s
%graceful.quarkus.shutdown.delay-enabled=true
%graceful.quarkus.shutdown.delay=5s
</code></code></pre><p>Package with the graceful profile baked in:</p><pre><code><code>./mvnw package -DskipTests -Dquarkus.profile=graceful,script
</code></code></pre><p>What happens on <code>SIGTERM</code> with delay enabled:</p><ol><li><p>Delay phase starts. Readiness goes <strong>DOWN</strong> (SmallRye reports a <code>Graceful Shutdown</code> check).</p></li><li><p>Existing connections can still complete work.</p></li><li><p>After the delay, Quarkus moves toward full shutdown, honoring <code>quarkus.shutdown.timeout</code> for active HTTP.</p></li><li><p>Process exits.</p></li></ol><p>Run the script in graceful mode:</p><pre><code><code>./scripts/demonstrate-shutdown.sh graceful
</code></code></pre><p>Excerpt from a successful run:</p><pre><code><code>-- Readiness before shutdown: UP --
ready HTTP 200
-- Polling readiness during shutdown --
[15:24:23] readiness HTTP 503
[15:24:24] readiness HTTP 503
...
-- Handoff result --
{"orderId":"ORD-SHUTDOWN","status":"HANDOFF_COMPLETE","elapsedMs":5003}
handoff HTTP 200 (total 5.101526s)
</code></code></pre><p>Readiness flips to <strong>503</strong> while the handoff still returns <strong>200</strong>. That is the protocol you want: orchestrator sees &#8220;not ready,&#8221; client already in flight gets an answer.</p><h2><strong>Lifecycle hooks for visibility</strong></h2><p>Observers make the sequence visible in logs &#8212; useful when someone swears &#8220;readiness failed too early.&#8221;</p><pre><code><code>package com.orderbridge;

import org.jboss.logging.Logger;

import io.quarkus.runtime.ShutdownEvent;
import io.quarkus.runtime.StartupEvent;
import io.quarkus.runtime.ShutdownDelayInitiatedEvent;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Observes;

@ApplicationScoped
public class ShutdownLifecycle {

    private static final Logger LOG = Logger.getLogger(ShutdownLifecycle.class);

    void onStart(@Observes StartupEvent event) {
        LOG.info("OrderBridge is ready to accept traffic");
    }

    void onShutdownDelay(@Observes ShutdownDelayInitiatedEvent event) {
        LOG.info("Shutdown delay started &#8212; readiness should fail while HTTP still winds down");
    }

    void onStop(@Observes ShutdownEvent event) {
        LOG.info("OrderBridge shutdown event fired");
    }
}
</code></code></pre><p>You can use <code>@ShutdownDelayInitiated</code> on a method instead of the event observer; behavior is the same. Methods marked that way run when delay starts &#8212; keep them fast, because they participate in the shutdown path.</p><p>In graceful mode you should see <code>Shutdown delay started</code> before <code>OrderBridge shutdown event fired</code>, and SmallRye log lines like <code>Reporting health down status</code> with a <code>Graceful Shutdown</code> check.</p><h2><strong>Kubernetes timing that actually matches Quarkus</strong></h2><p>Quarkus only controls what happens <strong>inside</strong> the pod after <code>SIGTERM</code>. Your platform still needs enough time and sensible probes.</p><p><code>terminationGracePeriodSeconds</code> &#8212; must be at least <strong>delay + timeout + buffer</strong>. With our example (5s delay + 15s timeout), I would not go below 25s; 30s is a comfortable default.</p><p><strong>Readiness probe </strong><code>periodSeconds</code><strong> and </strong><code>failureThreshold</code> &#8212; control how quickly the Service removes the pod from endpoints. If the probe period is 10s and threshold is 3, it can take ~30s before Kubernetes stops routing even after readiness is DOWN. Align that with how fast your ingress or service mesh drains.</p><p><code>preStop</code><strong> hooks</strong> &#8212; optional sleep can help legacy load balancers that ignore readiness. Prefer fixing probe and delay alignment first; arbitrary <code>sleep 15</code> in <code>preStop</code> hides misconfiguration.</p><p><strong>Liveness during shutdown</strong> &#8212; do not point liveness at something that fails during graceful drain unless you want the kubelet to restart a pod that is intentionally winding down.</p><h2><strong>What graceful shutdown does not cover</strong></h2><p>The <a href="https://quarkus.io/guides/lifecycle#graceful-shutdown">lifecycle guide</a> is explicit: only extensions that opt in participate. 
Today the documented graceful path is <strong>HTTP</strong>. Kafka consumers, scheduled jobs, and background queues need their own drain story.</p><p>Long business work still needs <strong>application-level timeouts</strong>. <code>quarkus.shutdown.timeout</code> bounds how long Quarkus waits on HTTP; it does not make an unbounded database migration safe.</p><p>Native image and dev mode follow the same configuration ideas, but always re-run your shutdown script on the artifact you actually deploy.</p><h2><strong>Prove it</strong></h2><p>Unit and resource tests:</p><pre><code><code>./mvnw test
</code></code></pre><p>Integration test against the packaged runner:</p><pre><code><code>./mvnw verify
</code></code></pre><p>Shutdown demonstrations:</p><pre><code><code>./scripts/demonstrate-shutdown.sh naive
./scripts/demonstrate-shutdown.sh graceful
</code></code></pre><p>Expect naive mode to show readiness dropping only when the process is already gone, and the handoff client to fail often. Expect graceful mode to show readiness <strong>503</strong> for several seconds and the handoff to finish with <strong>HTTP 200</strong>.</p><h2><strong>Closing</strong></h2><p>Graceful shutdown in Quarkus is a small set of properties, but the real work is the deploy protocol: readiness fails first, new traffic stops, in-flight HTTP gets a bounded chance to finish, then the process exits. The 3.32 HTTP improvements reduce surprise <code>503</code> responses during that window; they do not remove the need for delay and timeout.</p><p>When you wire this into a real service, copy the recipe: enable delay at <strong>build</strong> time, set delay and timeout for your slowest acceptable request, align Kubernetes termination and probes, and keep a script like ours that proves behavior under <code>SIGTERM</code>, not just in a happy-path integration test.</p><p>Source for the full sample lives in the <code>orderbridge-graceful-shutdown</code><a href="https://github.com/myfear/the-main-thread/tree/main/orderbridge-graceful-shutdown"> repository</a>.</p>]]></content:encoded></item><item><title><![CDATA[Quarkus API Error Contracts That Reduce Client Friction]]></title><description><![CDATA[A practical reading path for Java teams that want stable error payloads, cleaner HTTP responses, stronger OpenAPI contracts, and saner versioning.]]></description><link>https://www.the-main-thread.com/p/quarkus-api-errors-rfc9457</link><guid 
isPermaLink="false">https://www.the-main-thread.com/p/quarkus-api-errors-rfc9457</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Sun, 24 May 2026 06:08:55 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5145662f-93b0-4340-829d-0eeeaebe8ec1_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Open any random internal service and there is a fair chance the error story still sounds like this: status code, vague message, stack trace if the framework got emotional, and a Slack thread later because no two consumers interpreted it the same way.</p><p>I do not think error handling is the glamorous part of API work. I do think it is one of the clearest ways to tell whether a team treats its API as a product or as a side effect. Good error responses shorten debugging, reduce support noise, and make clients simpler. Bad ones create folklore. Folklore always grows faster than docs.</p><p>This page pulls together the Quarkus pieces on API errors, HTTP responses, OpenAPI, versioning, and deprecation into one reading path. The common thread is simple: your API contract includes the unhappy path, and it gets expensive when you design that part last.</p><h2><strong>Start with the error contract, not the annotation pile</strong></h2><p>A lot of teams begin with framework mechanics. Which mapper should I write? Which exception class should I throw? Which annotation turns this into JSON?</p><p>Those are real implementation questions, but they come second. First you need to decide what a client can rely on when things go wrong. Is there a stable error shape? Are validation problems distinct from business conflicts? 
Can clients automate against the response, or do they need to guess from prose?</p><p>These are the right entry points:</p><ul><li><p><a href="https://www.the-main-thread.com/p/rfc-9457-quarkus-api-error-handling-swagger">RFC 9457 API error handling in Quarkus</a> is the central piece if you want a modern, explicit error format instead of improvised JSON</p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-rfc9457-api-error-handling">Quarkus RFC 9457 API error handling</a> is a good companion when you want the same topic from a slightly different angle</p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-rfc7807-error-handling-java">Quarkus RFC 7807 error handling</a> matters because many teams still encounter that terminology in existing systems and older discussions</p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-http-response-guide-java-developers">Mastering HTTP responses in Quarkus</a> broadens the conversation beyond exceptions and into deliberate response design</p></li></ul><p>If I had to collapse the whole cluster into one sentence, it would be this: treat error payloads as part of the interface, not as whatever falls out after a thrown exception.</p><h2><strong>Documentation and runtime behavior need to agree</strong></h2><p>One of the more annoying failure modes in API work is when the docs tell a cleaner story than the implementation. The OpenAPI spec shows one thing, Swagger UI suggests another, and the running service still finds a third option under pressure.</p><p>That gap is not cosmetic. 
If your contract only exists in generated documentation, consumers will learn the truth from production incidents instead.</p><p>This is where I would go next:</p><ul><li><p><a href="https://www.the-main-thread.com/p/quarkus-openapi-java-coffee-api-tutorial">OpenAPI and Quarkus for Java developers</a> is the practical foundation for making the contract visible</p></li><li><p><a href="https://www.the-main-thread.com/p/springdoc-vs-quarkus-openapi-zero-config">Springdoc vs Quarkus OpenAPI</a> is helpful if your mental model still comes from the Spring world</p></li><li><p><a href="https://www.the-main-thread.com/p/implementing-zalando-restful-api">Implementing Zalando RESTful API guidelines</a> pushes the conversation from &#8220;valid JSON&#8221; to &#8220;consistent platform behavior&#8221;</p></li></ul><p>My bias here is boring consistency. I would rather ship a smaller API with one clear problem format, one predictable validation story, and one documented deprecation policy than a larger API that improvises every time it needs to say no.</p><h2><strong>The contract changes over time, which is where teams get sentimental</strong></h2><p>API versioning and deprecation discussions have a habit of becoming philosophical. They do not need to. The real question is usually much narrower: how do you evolve the contract without making client behavior weird, brittle, or expensive to support?</p><p>If your error story is already unstable, versioning gets even uglier because now both success and failure semantics can drift at the same time. 
That is how &#8220;backward compatible enough&#8221; becomes a quarterly ritual in damage control.</p><p>These pieces belong together:</p><ul><li><p><a href="https://www.the-main-thread.com/p/quarkus-api-versioning-strategies-java">Quarkus API versioning strategies</a> covers the mechanics and trade-offs of changing public contracts</p></li><li><p><a href="https://www.the-main-thread.com/p/ai-api-versioning-quarkus-3-strategies">AI API versioning with Quarkus</a> is worth reading if your endpoints front models or agentic workflows that change faster than the transport layer</p></li><li><p><a href="https://www.the-main-thread.com/p/when-to-deprecate-apis-java-quarkus-guide">When to deprecate APIs in Java and Quarkus</a> is the piece I would hand to anyone who thinks deprecation is just a note in a changelog</p></li></ul><p>The trade-off is not between purity and pragmatism. The trade-off is between making change explicit now or paying for hidden inconsistency later. I prefer the first bill.</p><h2><strong>A useful API error stack has layers</strong></h2><p>By this point the cluster is really about four connected layers.</p><ul><li><p><strong>Error shape</strong> gives clients something stable to parse</p></li><li><p><strong>HTTP response design</strong> keeps status codes and payload intent aligned</p></li><li><p><strong>OpenAPI and guidelines</strong> make the contract visible before runtime</p></li><li><p><strong>Versioning and deprecation</strong> keep that contract survivable after change</p></li></ul><p>Miss one of those layers and the others start compensating badly. Teams without a stable error shape tend to over-explain in docs. Teams without decent docs tend to lean on tribal knowledge. Teams without deprecation discipline keep old client assumptions alive for much longer than anyone admits in architecture reviews.</p><p>That is why I like this cluster as a group instead of as isolated articles. Each post solves one slice. 
Together they describe what mature API behavior feels like.</p><h2><strong>The reading order I would use</strong></h2><p>If your team is still improvising error responses, I would read these in order:</p><ol><li><p><a href="https://www.the-main-thread.com/p/rfc-9457-quarkus-api-error-handling-swagger">RFC 9457 API error handling in Quarkus</a></p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-rfc7807-error-handling-java">Quarkus RFC 7807 error handling</a></p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-http-response-guide-java-developers">Mastering HTTP responses in Quarkus</a></p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-openapi-java-coffee-api-tutorial">OpenAPI and Quarkus for Java developers</a></p></li><li><p><a href="https://www.the-main-thread.com/p/implementing-zalando-restful-api">Implementing Zalando RESTful API guidelines</a></p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-api-versioning-strategies-java">Quarkus API versioning strategies</a></p></li><li><p><a href="https://www.the-main-thread.com/p/when-to-deprecate-apis-java-quarkus-guide">When to deprecate APIs in Java and Quarkus</a></p></li></ol><p>If you are migrating from Spring, move <a href="https://www.the-main-thread.com/p/springdoc-vs-quarkus-openapi-zero-config">Springdoc vs Quarkus OpenAPI</a> earlier. If your biggest pain is client churn, move versioning and deprecation earlier. The shape of the path can change. The basic idea should not.</p><h2><strong>What this cluster should change in practice</strong></h2><p>After working through this material, I want the team to stop saying &#8220;we handle errors&#8221; when what they really mean is &#8220;exceptions eventually become JSON.&#8221;</p><p>Good API design is quieter than that. Clients know what failed. The payload shape stays stable enough to automate against. Status codes carry real intent. Docs tell the same story as runtime behavior. 
Deprecations arrive with a plan instead of a shrug. That is the standard worth aiming for, and Quarkus gives you enough control to get there without turning the service into annotation archaeology.</p>]]></content:encoded></item><item><title><![CDATA[Quarkus Reflection-Free Jackson Serializers: Migrate with Contract Tests]]></title><description><![CDATA[Use a small catalog API to enable reflection-free Jackson, keep tricky JSON payloads honest with contract tests, and benchmark the change without guessing.]]></description><link>https://www.the-main-thread.com/p/quarkus-reflection-free-jackson</link><guid isPermaLink="false">https://www.the-main-thread.com/p/quarkus-reflection-free-jackson</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Sat, 23 May 2026 06:08:50 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0f1441e7-5162-40a8-b88d-6988d9f769b1_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Reflection-based JSON serialization is the kind of cost I can ignore right up to the moment I care about startup, native images, or a REST path that actually gets hit. Then it is suddenly everywhere: field introspection on every request, extra GraalVM reachability configuration, and one more place where JVM and native behavior can drift apart.</p><p>Quarkus has been chipping away at that problem with build-time metaprogramming: generated <code>StdSerializer</code> implementations that write JSON without poking through DTO fields via reflection at runtime. Mario Fusco walked through the mechanics and the ~12% throughput lift on a synthetic benchmark in the <a href="https://quarkus.io/blog/quarkus-metaprogramming/">metaprogramming blog post</a>. 
For a long time, the feature stayed opt-in behind <code>quarkus.rest.jackson.optimization.enable-reflection-free-serializers=true</code>.</p><p>In April 2026 the team <a href="https://quarkus.io/blog/reflection-free-jsckson-serializers/">announced planned default enablement in Quarkus 3.35</a>. That default did not flip in 3.35 after all. Community testing found edge cases, and the <a href="https://quarkus.io/blog/quarkus-3-35-released/">3.35 release notes</a> moved the change to <strong>3.36</strong>. For now, the safe mental model is still <strong>opt-in migration plus contract tests</strong>, not &#8220;flip a version and hope.&#8221;</p><p>We make that migration concrete on a small catalog API: baseline Jackson, reflection-free serializers on a profile, JSON regression tests for generics and polymorphism, and a repeatable benchmark harness.</p><p>The sample uses <strong>Quarkus 3.35.2</strong>. Re-check the REST guide after upgrades because serializer coverage is moving quickly.</p><h2><strong>What we build</strong></h2><p>We end up with <strong>CatalogAPI</strong>, a product catalog service that:</p><ul><li><p>exposes CRUD on <code>/products</code> backed by PostgreSQL (Dev Services in dev mode; tests run on in-memory H2);</p></li><li><p>returns <strong>record DTOs</strong>, a <strong>generic </strong><code>Page&lt;T&gt;</code><strong> envelope</strong>, and <strong>polymorphic catalog payloads</strong>;</p></li><li><p>serializes a <code>Money</code><strong> value type</strong> through a custom Jackson serializer registered with <code>ObjectMapperCustomizer</code>;</p></li><li><p>runs the same JSON contract tests under <strong>baseline</strong> and <code>reflection-free</code> profiles;</p></li><li><p>includes a script to compare cold startup and throughput between those profiles.</p></li></ul><h2><strong>What you need</strong></h2><p>I assume you already write Jakarta REST resources and have shipped Jackson DTOs before. 
This is a migration tutorial, not a Jackson primer.</p><ul><li><p>JDK <strong>21</strong></p></li><li><p><strong>Quarkus CLI</strong> or Maven</p></li><li><p>Docker or Podman if you want <strong>native image</strong> builds with containerized GraalVM</p></li><li><p>Basic Jackson annotations (<code>@JsonSubTypes</code>, custom serializers)</p></li><li><p>About <strong>45 minutes</strong></p></li></ul><h2><strong>Project setup</strong></h2><p>Create the project:</p><pre><code><code>quarkus create app com.catalogapi:catalogapi-reflection-free-jackson \
  --extension='quarkus-rest-jackson,hibernate-orm-panache,jdbc-postgresql,smallrye-openapi' \
  --java=21 \
  --no-code
</code></code></pre><p>We only need four extensions here:</p><ul><li><p><code>quarkus-rest-jackson</code> &#8212; REST endpoints and the Jackson stack we are migrating</p></li><li><p><code>hibernate-orm-panache</code> and <code>jdbc-postgresql</code> &#8212; small realistic persistence layer</p></li><li><p><code>smallrye-openapi</code> &#8212; generated OpenAPI so the service looks like an internal API, not a toy benchmark</p></li></ul><p>Use package <code>com.catalogapi</code> for application code and <code>com.catalogapi.json</code> for DTOs.</p><h2><strong>Catalog model and CRUD</strong></h2><h3><strong>Product entity</strong></h3><pre><code><code>package com.catalogapi;

import io.quarkus.hibernate.orm.panache.PanacheEntity;
import jakarta.persistence.Column;
import jakarta.persistence.Entity;

@Entity
public class Product extends PanacheEntity {

    @Column(nullable = false, unique = true)
    public String sku;

    @Column(nullable = false)
    public String name;

    @Column(nullable = false)
    public int priceCents;

    @Column(nullable = false)
    public String category;
}
</code></code></pre><h3><strong>Seed data</strong></h3><p>Create <code>src/main/resources/import.sql</code>:</p><pre><code><code>INSERT INTO Product (id, sku, name, priceCents, category) VALUES (nextval('product_SEQ'), 'SKU-001', 'Mechanical Keyboard', 12999, 'peripherals');
INSERT INTO Product (id, sku, name, priceCents, category) VALUES (nextval('product_SEQ'), 'SKU-002', 'USB-C Hub', 4999, 'peripherals');
INSERT INTO Product (id, sku, name, priceCents, category) VALUES (nextval('product_SEQ'), 'SKU-003', '27-inch Monitor', 34999, 'displays');
INSERT INTO Product (id, sku, name, priceCents, category) VALUES (nextval('product_SEQ'), 'SKU-004', 'Desk Bundle', 52998, 'bundles');
</code></code></pre><p>Use <code>nextval('product_SEQ')</code> so Hibernate&#8217;s sequence stays aligned with fixed seed rows. If you only insert literal <code>id</code> values, the first <code>persist()</code> after startup can collide with the primary key.</p><h3><strong>Dev configuration</strong></h3><p>In <code>application.properties</code>:</p><pre><code><code>%dev.quarkus.datasource.db-kind=postgresql
%dev.quarkus.hibernate-orm.schema-management.strategy=drop-and-create
%dev.quarkus.hibernate-orm.sql-load-script=import.sql

%test.quarkus.datasource.db-kind=h2
%test.quarkus.datasource.jdbc.url=jdbc:h2:mem:catalogtest;DB_CLOSE_DELAY=-1;MODE=PostgreSQL
%test.quarkus.hibernate-orm.schema-management.strategy=drop-and-create
%test.quarkus.hibernate-orm.sql-load-script=import.sql
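</code></code></pre><p>Dev mode uses PostgreSQL Dev Services. Tests use in-memory H2, so <code>./mvnw test</code> stays Docker-free on the machine running CI. The H2 profile needs the <code>quarkus-jdbc-h2</code> extension on the test classpath, which the create command above does not add.</p><h3><strong>Product resource</strong></h3><pre><code><code>package com.catalogapi;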

import java.util.List;

import org.eclipse.microprofile.openapi.annotations.Operation;
import org.eclipse.microprofile.openapi.annotations.tags.Tag;

import com.catalogapi.json.ProductInput;

import jakarta.transaction.Transactional;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.NotFoundException;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.PUT;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;

@Path("/products")
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
@Tag(name = "Products")
public class ProductResource {

    @GET
    @Operation(summary = "List all products")
    public List&lt;Product&gt; list() {
        return Product.listAll();
    }

    @GET
    @Path("/{id}")
    @Operation(summary = "Get one product by id")
    public Product get(@PathParam("id") long id) {
        return Product.&lt;Product&gt;findByIdOptional(id)
                .orElseThrow(NotFoundException::new);
    }

    @POST
    @Transactional
    @Operation(summary = "Create a product")
    public Response create(ProductInput input) {
        Product product = new Product();
        product.sku = input.sku();
        product.name = input.name();
        product.priceCents = input.priceCents();
        product.category = input.category();
        product.persist();
        return Response.status(Response.Status.CREATED).entity(product).build();
    }

    @PUT
    @Path("/{id}")
    @Transactional
    @Operation(summary = "Update a product")
    public Product update(@PathParam("id") long id, ProductInput input) {
        Product product = Product.&lt;Product&gt;findByIdOptional(id)
                .orElseThrow(NotFoundException::new);
        product.sku = input.sku();
        product.name = input.name();
        product.priceCents = input.priceCents();
        product.category = input.category();
        return product;
    }
}
</code></code></pre><p><code>ProductInput</code> is just a small record:</p><pre><code><code>package com.catalogapi.json;

public record ProductInput(String sku, String name, int priceCents, String category) {
}
</code></code></pre><h3><strong>Prove CRUD</strong></h3><p>Start dev mode:</p><pre><code><code>./mvnw quarkus:dev
</code></code></pre><p>List products:</p><pre><code><code>curl -s http://localhost:8080/products | jq .
</code></code></pre><p>You should see the four seeded products. Fetch one next:</p><pre><code><code>curl -s http://localhost:8080/products/1 | jq .
</code></code></pre><p>Expected shape:</p><pre><code><code>{
  "id": 1,
  "sku": "SKU-001",
  "name": "Mechanical Keyboard",
  "priceCents": 12999,
  "category": "peripherals"
}
</code></code></pre><h2><strong>JSON edge cases worth testing before you flip the switch</strong></h2><p>Real services rarely stop at flat entities. CatalogAPI adds four patterns that tend to find serializer gaps quickly.</p><h3><strong>Record summaries with custom </strong><code>Money</code><strong> serialization</strong></h3><pre><code><code>package com.catalogapi.json;

public record Money(String currency, long amountMinor) {
}
</code></code></pre><pre><code><code>package com.catalogapi.json;

public record ProductSummary(long id, String sku, String name, Money price) {
}
</code></code></pre><pre><code><code>package com.catalogapi.json;

import java.io.IOException;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.ser.std.StdSerializer;

public class MoneySerializer extends StdSerializer&lt;Money&gt; {

    public MoneySerializer() {
        super(Money.class);
    }

    @Override
    public void serialize(Money value, JsonGenerator generator, SerializerProvider serializers) throws IOException {
        generator.writeStartObject();
        generator.writeStringField("currency", value.currency());
        generator.writeNumberField("amountMinor", value.amountMinor());
        generator.writeStringField("display", value.currency() + " " + formatMinor(value.amountMinor()));
        generator.writeEndObject();
    }

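    // Format minor units as a decimal string; the sign is applied separately
    // so amounts between -99 and -1 do not lose their minus sign.
    private static String formatMinor(long amountMinor) {
        long absolute = Math.abs(amountMinor);
        long major = absolute / 100;
        long minor = absolute % 100;
        return (amountMinor &lt; 0 ? "-" : "") + major + "." + (minor &lt; 10 ? "0" : "") + minor;
    }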
}
</code></code></pre><p>Register the module at startup:</p><pre><code><code>package com.catalogapi.jackson;

import com.catalogapi.json.Money;
import com.catalogapi.json.MoneySerializer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.module.SimpleModule;

import io.quarkus.jackson.ObjectMapperCustomizer;
import jakarta.inject.Singleton;

@Singleton
public class CatalogJacksonCustomizer implements ObjectMapperCustomizer {

    @Override
    public void customize(ObjectMapper mapper) {
        SimpleModule module = new SimpleModule();
        module.addSerializer(Money.class, new MoneySerializer());
        mapper.registerModule(module);
    }
}
</code></code></pre><p>I keep the entity-to-summary mapping in a tiny helper:</p><pre><code><code>package com.catalogapi;

import com.catalogapi.json.Money;
import com.catalogapi.json.ProductSummary;

final class ProductMapper {

    private ProductMapper() {
    }

    static ProductSummary toSummary(Product product) {
        return new ProductSummary(
                product.id,
                product.sku,
                product.name,
                new Money("USD", product.priceCents));
    }
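
    // Sketch only: the text names toMoney but does not show it, so this
    // signature is an assumption. It reuses the hard-coded "USD" currency
    // from toSummary and treats priceCents as minor units.
    static Money toMoney(Product product) {
        return new Money("USD", product.priceCents);
    }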
}
</code></code></pre><p>Add <code>toMoney</code> next to <code>toSummary</code> in <code>ProductMapper</code>, then add two endpoints to <code>ProductResource</code>. The snippet omits imports; add <code>Page</code> and <code>ProductSummary</code> from <code>com.catalogapi.json</code> if the class does not already import them:</p><pre><code><code>    @GET
    @Path("/summaries")
    @Operation(summary = "List product summaries as records with custom Money serialization")
    public List&lt;ProductSummary&gt; summaries() {
        return Product.&lt;Product&gt;listAll().stream()
                .map(ProductMapper::toSummary)
                .toList();
    }

    @GET
    @Path("/page")
    @Operation(summary = "Paged product summaries in a generic envelope")
    public Page&lt;ProductSummary&gt; page() {
        List&lt;ProductSummary&gt; items = Product.&lt;Product&gt;listAll().stream()
                .map(ProductMapper::toSummary)
                .toList();
        return new Page&lt;&gt;(items, items.size());
    }
</code></code></pre><p>With <code>Page</code> defined as:</p><pre><code><code>package com.catalogapi.json;

import java.util.List;

public record Page&lt;T&gt;(List&lt;T&gt; items, int total) {
}
</code></code></pre><p>Checkpoint:</p><pre><code><code>curl -s http://localhost:8080/products/summaries | jq '.[0].price'
</code></code></pre><p>Expected:</p><pre><code><code>{
  "currency": "USD",
  "amountMinor": 12999,
  "display": "USD 129.99"
}
</code></code></pre><h3><strong>Polymorphic catalog payloads</strong></h3><pre><code><code>package com.catalogapi.json;

import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;

@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
@JsonSubTypes({
        @JsonSubTypes.Type(value = ProductView.class, name = "product"),
        @JsonSubTypes.Type(value = BundleView.class, name = "bundle")
})
public sealed interface CatalogPayload permits ProductView, BundleView {
}
</code></code></pre><pre><code><code>package com.catalogapi.json;

public record ProductView(long id, String sku, Money price) implements CatalogPayload {
}
</code></code></pre><pre><code><code>package com.catalogapi.json;

import java.util.List;

public record BundleView(String name, List&lt;String&gt; skuList, Money totalPrice) implements CatalogPayload {
}
</code></code></pre><pre><code><code>package com.catalogapi;

import java.util.List;

import org.eclipse.microprofile.openapi.annotations.Operation;
import org.eclipse.microprofile.openapi.annotations.tags.Tag;

import com.catalogapi.json.BundleView;
import com.catalogapi.json.CatalogPayload;
import com.catalogapi.json.ProductView;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/catalog/payloads")
@Produces(MediaType.APPLICATION_JSON)
@Tag(name = "Catalog payloads")
public class CatalogPayloadResource {

    @GET
    @Path("/demo")
    @Operation(summary = "Polymorphic catalog payloads for contract testing")
    public List&lt;CatalogPayload&gt; demo() {
        Product keyboard = Product.find("sku", "SKU-001").firstResult();
        Product hub = Product.find("sku", "SKU-002").firstResult();
        Product bundle = Product.find("sku", "SKU-004").firstResult();

        return List.of(
                new ProductView(keyboard.id, keyboard.sku, ProductMapper.toMoney(keyboard)),
                new ProductView(hub.id, hub.sku, ProductMapper.toMoney(hub)),
                new BundleView(
                        bundle.name,
                        List.of("SKU-001", "SKU-002"),
                        ProductMapper.toMoney(bundle)));
    }
}
</code></code></pre><p>This resource relies on the <code>ProductMapper.toMoney</code> helper added alongside <code>toSummary</code> earlier.</p><p>Checkpoint:</p><pre><code><code>curl -s http://localhost:8080/catalog/payloads/demo | jq '.[2]'
</code></code></pre><p>You should see <code>"type": "bundle"</code> and a <code>totalPrice.display</code> field.</p><h2><strong>Lock the baseline JSON contract</strong></h2><p>Before enabling reflection-free serializers, capture what &#8220;correct&#8221; looks like in tests. REST Assured plus Hamcrest json-path is boring in the best way: it keeps regressions out of eyeball diffs.</p><p>Shared assertions live in <code>JsonContractAssertions</code>:</p><pre><code><code>package com.catalogapi;

import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.hasItem;
import static org.hamcrest.Matchers.hasSize;
import static org.hamcrest.Matchers.is;

final class JsonContractAssertions {

    private JsonContractAssertions() {
    }

    static void assertSummariesContract() {
        given().when()
                .get("/products/summaries")
                .then()
                .statusCode(200)
                .body("$", hasSize(4))
                .body("sku", hasItem("SKU-001"))
                .body("find { it.sku == 'SKU-001' }.price.currency", equalTo("USD"))
                .body("find { it.sku == 'SKU-001' }.price.amountMinor", equalTo(12999))
                .body("find { it.sku == 'SKU-001' }.price.display", equalTo("USD 129.99"));
    }

    static void assertPageContract() {
        given().when()
                .get("/products/page")
                .then()
                .statusCode(200)
                .body("total", equalTo(4))
                .body("items", hasSize(4))
                .body("items[0].id", is(1))
                .body("items[0].price.display", equalTo("USD 129.99"));
    }

    static void assertPolymorphicContract() {
        given().when()
                .get("/catalog/payloads/demo")
                .then()
                .statusCode(200)
                .body("$", hasSize(3))
                .body("[0].type", equalTo("product"))
                .body("[0].sku", equalTo("SKU-001"))
                .body("[2].type", equalTo("bundle"))
                .body("[2].name", equalTo("Desk Bundle"))
                .body("[2].skuList", hasSize(2))
                .body("[2].totalPrice.display", equalTo("USD 529.98"));
    }
}
</code></code></pre><p>Baseline tests:</p><pre><code><code>package com.catalogapi;

import org.junit.jupiter.api.Test;

import io.quarkus.test.junit.QuarkusTest;

@QuarkusTest
class JsonContractBaselineTest {

    @Test
    void summariesMatchBaselineContract() {
        JsonContractAssertions.assertSummariesContract();
    }

    @Test
    void pageMatchesBaselineContract() {
        JsonContractAssertions.assertPageContract();
    }

    @Test
    void polymorphicPayloadMatchesBaselineContract() {
        JsonContractAssertions.assertPolymorphicContract();
    }
}
</code></code></pre><p>Run:</p><pre><code><code>./mvnw test
</code></code></pre><p>All tests should pass on the default profile.</p><h2><strong>Enable reflection-free serializers</strong></h2><p>Add a dedicated profile in <code>application.properties</code>:</p><pre><code><code>%reflection-free.quarkus.rest.jackson.optimization.enable-reflection-free-serializers=true
</code></code></pre><p>Run dev mode with the profile:</p><pre><code><code>./mvnw quarkus:dev -Dquarkus.profile=reflection-free
</code></code></pre><p>Re-run the same <code>curl</code> checks. On CatalogAPI the payloads matched baseline in my runs, but your service may not be that polite. That is why the tests exist.</p><h3><strong>Run the same tests under reflection-free</strong></h3><p><code>ReflectionFreeProfile</code> turns on the same flag in tests through a config override:</p><pre><code><code>package com.catalogapi;

import java.util.Map;

import io.quarkus.test.junit.QuarkusTestProfile;

public class ReflectionFreeProfile implements QuarkusTestProfile {

    @Override
    public Map&lt;String, String&gt; getConfigOverrides() {
        return Map.of(
                "quarkus.rest.jackson.optimization.enable-reflection-free-serializers", "true");
    }
}
</code></code></pre><p>Keep the active profile as <code>test</code> (the default for <code>@QuarkusTest</code>). Only override the serializer flag. If you replace the whole profile with <code>reflection-free</code>, it is easy to drop the <code>%test</code> datasource settings and start debugging the wrong failure.</p><pre><code><code>package com.catalogapi;

import org.junit.jupiter.api.Test;

import io.quarkus.test.junit.QuarkusTest;
import io.quarkus.test.junit.TestProfile;

@QuarkusTest
@TestProfile(ReflectionFreeProfile.class)
class JsonContractReflectionFreeTest {

    @Test
    void summariesMatchReflectionFreeContract() {
        JsonContractAssertions.assertSummariesContract();
    }

    @Test
    void pageMatchesReflectionFreeContract() {
        JsonContractAssertions.assertPageContract();
    }

    @Test
    void polymorphicPayloadMatchesReflectionFreeContract() {
        JsonContractAssertions.assertPolymorphicContract();
    }
}
</code></code></pre><p>Run <code>./mvnw test</code> again. If anything fails here but passed in baseline, you have a migration bug &#8212; not a &#8220;maybe&#8221; problem.</p><h2><strong>What the build actually generates (and what it skips)</strong></h2><p>Watch the build log with reflection-free enabled. For CatalogAPI you will see generated serializers for DTOs like <code>ProductSummary</code> and <code>Page</code>, while <code>Product</code><strong> itself is skipped</strong> because JPA&#8217;s <code>@Column</code> is not supported by the generator yet:</p><pre><code><code>Skipping generation of reflection-free Jackson serializer for class com.catalogapi.Product
because it contains the unsupported Jackson annotation jakarta.persistence.Column
</code></code></pre><p>That detail matters in production: <strong>returning Panache entities directly from REST still goes through reflection-based Jackson</strong> for this entity. Migration work should focus on DTOs you control, or you should accept mixed mode until coverage catches up. The <a href="https://quarkus.io/guides/rest">REST guide</a> documents the optimization flag; the metaprogramming post explains the Gizmo-generated <code>StdSerializer</code> classes registered on the shared <code>ObjectMapper</code>.</p><p>Custom serializers registered through <code>ObjectMapperCustomizer</code> remain part of the contract &#8212; our <code>MoneySerializer</code> is exactly the kind of module you should re-test after flipping the flag.</p><h2><strong>Benchmarks without fooling yourself</strong></h2><p>The module includes <code>scripts/compare-json-serialization.sh</code>. It packages a JVM runner with an in-memory <code>benchmark</code> profile (H2, seeded data), so you do not need PostgreSQL running for measurements:</p><pre><code><code>%benchmark.quarkus.datasource.db-kind=h2
%benchmark.quarkus.datasource.jdbc.url=jdbc:h2:mem:catalog;DB_CLOSE_DELAY=-1;MODE=PostgreSQL
%benchmark.quarkus.hibernate-orm.schema-management.strategy=drop-and-create
%benchmark.quarkus.hibernate-orm.sql-load-script=import.sql
</code></code></pre><p>Run baseline and reflection-free back to back:</p><pre><code><code>./scripts/compare-json-serialization.sh
./scripts/compare-json-serialization.sh reflection-free
</code></code></pre><p>On my machine (Apple Silicon laptop, local loopback) a recent run looked like:</p><ul><li><p><strong>Baseline</strong> &#8212; cold ready in ~1250 ms; Quarkus log reported <code>started in 0.855s</code> after the process was already warming.</p></li><li><p><strong>Reflection-free</strong> &#8212; cold ready in a similar band; Quarkus log <code>started in 0.876s</code>.</p></li></ul><p>Throughput on <code>/products/summaries</code> with <code>hey</code> was effectively identical between profiles for this small payload. That is still a useful result: <strong>do not expect fireworks on every endpoint</strong>. The gains concentrate on serialization-heavy paths and native-image reachability, not on a four-row catalog listing.</p><p>Treat any numbers as <strong>relative signals</strong>, not scripture. Match JDK, CPU power profile, dataset size, and warmup when you compare before and after in your own environment.</p><h3><strong>Native image proof</strong></h3><p>If native image is part of your deployment story, include it in the migration proof:</p><pre><code><code>./mvnw package -Dnative -Dquarkus.native.container-build=true -Dquarkus.profile=benchmark,reflection-free
</code></code></pre><p>Compare runner size under <code>target/*-runner</code> and cold-start time the same way you would for any native rollout. Reflection-free serializers reduce Jackson&#8217;s reflection footprint; they do not replace native configuration for libraries you still register reflectively.</p><h2><strong>Production migration checklist</strong></h2><p>When you roll this out on a real service, I would keep the checklist short:</p><ol><li><p><strong>Enable in staging first</strong> with <code>quarkus.rest.jackson.optimization.enable-reflection-free-serializers=true</code> on a canary profile.</p></li><li><p><strong>Run JSON contract tests</strong> on every response type you care about &#8212; records, generics, polymorphism, custom serializers, views.</p></li><li><p><strong>Watch build logs</strong> for &#8220;Skipping generation&#8221; lines; map those classes to DTOs or accept reflection fallback.</p></li><li><p><strong>Keep CI native builds</strong> if you ship native images &#8212; serializer changes show up in reachability and startup, not only in unit tests.</p></li><li><p><strong>Know the escape hatch</strong> &#8212; set the property back to <code>false</code> if you hit an unsupported Jackson feature. Fighting with <code>@RegisterForReflection</code> on DTOs is usually the wrong lever.</p></li><li><p><strong>Track Quarkus 3.36</strong> &#8212; default enablement is coming; tests you write now are the safety net when your platform BOM moves.</p></li></ol><h2><strong>Closing</strong></h2><p>Reflection-free Jackson serializers are Quarkus doing what it usually does best: push work to build time, trim runtime reflection, and make native image less annoying. They are not a silent drop-in for every <code>@Column</code>-annotated entity or every exotic Jackson module.</p><p>CatalogAPI shows a practical migration path: baseline behavior, explicit profile, shared contract tests, and benchmarks that stay honest about scope. 
Run the tests, read the build log, measure the endpoints that actually matter in your service, and keep the config escape hatch one property away.</p><p>Source for the full sample lives in the <code>catalogapi-reflection-free-jackson</code> <a href="https://github.com/myfear/the-main-thread/tree/main/catalogapi-reflection-free-jackson">repository</a>.</p>]]></content:encoded></item><item><title><![CDATA[Building a Quarkus Testing Strategy That Reflects Real Production Risk]]></title><description><![CDATA[A curated path through QuarkusTest, QuarkusIntegrationTest, JUnit 6, contract tests, browser tests, and coverage so Java teams can match test strategy to real failure modes.]]></description><link>https://www.the-main-thread.com/p/quarkus-testing-reading-path</link><guid isPermaLink="false">https://www.the-main-thread.com/p/quarkus-testing-reading-path</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Fri, 22 May 2026 06:08:48 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/846083c5-44aa-4e60-ad7d-62ece7e98515_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Green builds are generous. They will tell you everything is fine right up to the moment your packaged app fails in CI, your browser flow breaks on a real page, or your mock quietly protected you from the one problem you actually needed to see.</p><p>That is part of why I like Quarkus testing. The framework makes it pleasantly easy to get started, which means you can move from zero tests to useful feedback quickly. 
The trap is that teams often stop at the first layer that feels respectable. A couple of <code>@QuarkusTest</code> classes, some REST assertions, maybe a coverage report, and now everyone feels responsible. Meanwhile the real boundaries of the system are still mostly untested.</p><p>This page is the map I wish more teams started with. It pulls the Quarkus testing articles on The Main Thread into one path, so you can build a test strategy that matches how the application actually fails.</p><h2><strong>The first mistake is usually about boundaries</strong></h2><p>Most testing problems are not tool problems at first. They are boundary problems. Teams mix up app-level tests, packaged-artifact tests, contract tests, browser tests, and architectural checks, then wonder why the suite feels both slow and incomplete.</p><p>If you fix the boundaries, tool choice gets easier. If you skip that step, every new tool just gives you another way to be vaguely confident.</p><p>These are the pieces I would start with:</p><ul><li><p><a href="https://www.the-main-thread.com/p/quarkus-testing-quarkustest-vs-quarkusintegrationtest">QuarkusTest vs QuarkusIntegrationTest</a> explains where the line really is between running inside the Quarkus test harness and testing the built artifact you are about to ship</p></li><li><p><a href="https://www.the-main-thread.com/p/junit6-quarkus-modern-java-testing">JUnit 6 for modern Quarkus testing</a> is the right place to get current with the test stack itself instead of carrying old assumptions forward</p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-testing-2026-java-component-tests">Quarkus testing in 2026 with component tests</a> is where the conversation becomes more honest, because component tests tend to match the seams where production systems actually wobble</p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-dev-services-continuous-testing">Quarkus Dev Services and continuous testing</a> is worth reading early 
if your inner loop still feels slower than it should</p></li></ul><p>That group gives you the vocabulary. Once you know what kind of test you are trying to write, you stop asking one test style to do four jobs badly.</p><h2><strong>HTTP tests are easy to write and easy to oversell</strong></h2><p>A lot of Quarkus services expose REST endpoints, so HTTP testing is usually where teams spend their first real effort. That is fine. It is also where optimism becomes a coding standard if nobody is careful.</p><p>A passing endpoint test proves one narrow thing: the service answered the request you wrote. It does not automatically prove your schema is stable, your downstream contracts behave, or your packaged application still works outside the warm comfort of the dev test harness.</p><p>For that layer, I would use this set:</p><ul><li><p><a href="https://www.the-main-thread.com/p/quarkus-api-testing-restassured-pact-jqwik">API testing with RestAssured, Pact, and jqwik</a> is the broadest piece in the cluster, because it shows how request-level checks, contract thinking, and property-style testing fit together</p></li><li><p><a href="https://www.the-main-thread.com/p/mock-external-apis-quarkus-wiremock-java">Mock external APIs with Quarkus and WireMock</a> matters when your service depends on systems you do not control and you still want deterministic failures</p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-rest-api-testing-vs-spring-mockmvc">Quarkus REST API testing vs Spring MockMvc</a> is useful if your team is migrating habits from Spring and keeps reaching for the wrong mental model</p></li></ul><p>My preference here is simple: treat endpoint tests as contract checks for your application surface, not as a magic umbrella for everything behind it. 
Once people stop pretending a neat <code>200 OK</code> assertion proves the whole service is healthy, the rest of the testing stack starts to make more sense.</p><h2><strong>Some problems only show up when the app behaves like a real app</strong></h2><p>There is a class of failure that unit tests and narrow HTTP checks will never show you clearly enough. Browser behavior, packaging mistakes, wiring drift, architectural erosion, and awkward user flows all live here.</p><p>That does not mean every project needs a giant end-to-end circus. It means you should add the smallest wider-angle tests that catch the kinds of mistakes your team actually makes.</p><p>This is the part of the cluster I would reach for next:</p><ul><li><p><a href="https://www.the-main-thread.com/p/quarkus-playwright-end-to-end-browser-testing-java">Playwright end-to-end browser testing with Quarkus</a> covers the UI path when a browser is part of the system, which means the browser should stop being treated as a rumor</p></li><li><p><a href="https://www.the-main-thread.com/p/architecture-testing-java-quarkus-taikai">Architecture testing with Quarkus and Taikai</a> helps when the risk is not a broken endpoint but a codebase slowly turning into a junk drawer</p></li><li><p><a href="https://www.the-main-thread.com/p/mutation-testing-quarkus-java-tutorial">Mutation testing for Quarkus</a> is useful when assertions are technically present but morally absent</p></li></ul><p>I would not start with mutation testing on day one. I would start with cleaner boundaries and a better mix of test types, then use mutation testing where the suite is already stable enough to deserve that level of scrutiny. Otherwise you are paying for a very sophisticated way to confirm the basics were still fuzzy.</p><h2><strong>Coverage is useful, but it is a trailing indicator</strong></h2><p>Coverage numbers are fine. I am not anti-coverage. 
I am anti-pretending a coverage percentage answers a design question.</p><p>Teams often reach for coverage because it is legible. It produces a number, the number is larger after more work, and dashboards love that sort of thing. The drawback is that coverage does not tell you whether the important paths were checked under meaningful conditions. It tells you what was executed.</p><p>That is why <a href="https://www.the-main-thread.com/p/quarkus-jacoco-test-coverage">JaCoCo test coverage in Quarkus</a> belongs later in the reading path, not first. Read it after you are already thinking clearly about test boundaries, contracts, and component seams. Then the coverage report becomes a guide for missing cases instead of a decorative moral certificate.</p><h2><strong>The reading order I would use</strong></h2><p>If I were building a Quarkus testing strategy from scratch or cleaning up an inherited one, I would read these in this order:</p><ol><li><p><a href="https://www.the-main-thread.com/p/quarkus-testing-quarkustest-vs-quarkusintegrationtest">QuarkusTest vs QuarkusIntegrationTest</a></p></li><li><p><a href="https://www.the-main-thread.com/p/junit6-quarkus-modern-java-testing">JUnit 6 for modern Quarkus testing</a></p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-testing-2026-java-component-tests">Quarkus testing in 2026 with component tests</a></p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-api-testing-restassured-pact-jqwik">API testing with RestAssured, Pact, and jqwik</a></p></li><li><p><a href="https://www.the-main-thread.com/p/mock-external-apis-quarkus-wiremock-java">Mock external APIs with Quarkus and WireMock</a></p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-playwright-end-to-end-browser-testing-java">Playwright end-to-end browser testing with Quarkus</a></p></li><li><p><a href="https://www.the-main-thread.com/p/architecture-testing-java-quarkus-taikai">Architecture testing with Quarkus and 
Taikai</a></p></li><li><p><a href="https://www.the-main-thread.com/p/quarkus-jacoco-test-coverage">JaCoCo test coverage in Quarkus</a></p></li></ol><p>If your current pain is inner-loop speed, move <a href="https://www.the-main-thread.com/p/quarkus-dev-services-continuous-testing">Quarkus Dev Services and continuous testing</a> up earlier. If your pain is browser behavior, move Playwright earlier. The point is not strict obedience to a list. The point is to stop reading isolated testing articles as if they are interchangeable.</p><h2><strong>What this cluster is really trying to prevent</strong></h2><p>The big failure mode is not &#8220;too few tests.&#8221; The big failure mode is a suite that looks busy while hiding the actual risk profile of the service.</p><p>I want Quarkus teams to get to a calmer place than that. Know which tests prove the code inside the harness. Know which ones prove the packaged app. Know which ones protect contracts with other systems. Know which ones give you browser confidence, structural discipline, and useful coverage data. 
After that, the green build means a lot more than &#8220;the computer was polite today.&#8221;</p>]]></content:encoded></item><item><title><![CDATA[Teach a Local Model an Agent Command With LoRA]]></title><description><![CDATA[A hands-on Quarkus tutorial using MLX LM on a Mac to train a small adapter, compare base and adapted behavior, and make a local model honor a private agent contract.]]></description><link>https://www.the-main-thread.com/p/lora-agent-command</link><guid isPermaLink="false">https://www.the-main-thread.com/p/lora-agent-command</guid><dc:creator><![CDATA[Markus Eisele]]></dc:creator><pubDate>Thu, 21 May 2026 06:11:10 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/78c29a66-960c-4ce5-8b47-3aa547b4a353_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When developers see agents in action, the discussion usually lands on harnesses or raw model capability. If the result is weak, the next suggestion is predictable: use a bigger model, add a smarter planner, bolt on more tools, or find a fancier framework that promises to keep the whole thing under control.</p><p>Some of that helps. Some of it is also a distraction. Agent systems do not fail only because the model is too small or the harness is too simple. They also fail because the model is unreliable at the tiny private contracts that make cooperative behavior possible: route labels, planner tags, tool payloads, hand-off markers, or one odd internal command that means &#8220;switch into protocol mode now.&#8221;</p><p>That is the part people tend to miss. 
Effective agent teams are not just a story about capabilities. They are a story about capabilities plus adapters. Tools give the model reach. Adapters can give it a repeatable behavior inside one narrow lane, which is often the difference between &#8220;interesting demo&#8221; and &#8220;system I can actually wire into code.&#8221;</p><p>This tutorial uses a deliberately small example to make that visible. We are going to teach a local model one agent-specific trick and then make Quarkus prove whether it learned the trick or not.</p><p>Here is the prompt that should make an agent developer slightly nervous:</p><pre><code><code>devcard Dev Services with PostgreSQL</code></code></pre><p>A human can guess what that means. A small local model usually cannot. It may ignore <code>devcard</code>, turn the whole thing into a normal explanation, or produce JSON that looks close enough to fool you right up to the moment your parser throws an exception.</p><p>That is a real agent problem. Prompting helps, but small models are not consistently obedient just because we put &#8220;return JSON only&#8221; in uppercase.</p><p>This is where LoRA becomes interesting in a way that is easy to show and honest to explain. We are not going to teach the model all of Quarkus. We are going to teach it one repeatable habit:</p><ul><li><p>when the prompt contains <code>devcard</code>, emit a strict JSON object</p></li><li><p>when the prompt does not contain <code>devcard</code>, answer like a normal assistant</p></li></ul><p>That sounds small because it is small. That is also why it makes a good tutorial. 
The adapter effect is visible, the Quarkus app stays understandable, and the agentic point is hard to miss.</p><p>By the end, we will have a local Quarkus app that compares the base model with the adapted one side by side, validates the output, and renders a deterministic developer card only when the model follows the contract.</p><h2><strong>What We Build</strong></h2><p>The flow is intentionally boring:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gpOL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb862ab46-7142-4c64-bcee-c8799c25c2d1_2213x2310.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gpOL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb862ab46-7142-4c64-bcee-c8799c25c2d1_2213x2310.png 424w, https://substackcdn.com/image/fetch/$s_!gpOL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb862ab46-7142-4c64-bcee-c8799c25c2d1_2213x2310.png 848w, https://substackcdn.com/image/fetch/$s_!gpOL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb862ab46-7142-4c64-bcee-c8799c25c2d1_2213x2310.png 1272w, https://substackcdn.com/image/fetch/$s_!gpOL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb862ab46-7142-4c64-bcee-c8799c25c2d1_2213x2310.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!gpOL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb862ab46-7142-4c64-bcee-c8799c25c2d1_2213x2310.png" width="458" height="478.13186813186815" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b862ab46-7142-4c64-bcee-c8799c25c2d1_2213x2310.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1520,&quot;width&quot;:1456,&quot;resizeWidth&quot;:458,&quot;bytes&quot;:215518,&quot;alt&quot;:&quot;Example flow&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.the-main-thread.com/i/197689291?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb862ab46-7142-4c64-bcee-c8799c25c2d1_2213x2310.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Example flow" title="Example flow" srcset="https://substackcdn.com/image/fetch/$s_!gpOL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb862ab46-7142-4c64-bcee-c8799c25c2d1_2213x2310.png 424w, https://substackcdn.com/image/fetch/$s_!gpOL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb862ab46-7142-4c64-bcee-c8799c25c2d1_2213x2310.png 848w, https://substackcdn.com/image/fetch/$s_!gpOL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb862ab46-7142-4c64-bcee-c8799c25c2d1_2213x2310.png 1272w, 
https://substackcdn.com/image/fetch/$s_!gpOL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb862ab46-7142-4c64-bcee-c8799c25c2d1_2213x2310.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The contract looks like this:</p><pre><code><code>{
  "command": "devcard",
  "topic": "dev-services",
  "technologies": ["postgresql"],
  "includeExample": true,
  "includeWarning": true
}</code></code></pre><p>The model is only responsible for producing that object. The Quarkus app does the rest. That split matters because it is how real agent systems stay sane: let the model classify or structure the request, then let ordinary code validate, route, and render.</p><p>One more detail is worth saying out loud before we start. The MLX LM docs note that if you train against a quantized model, the command uses QLoRA under the covers. We are going to use the default 4-bit MLX model from the README, so the mechanics here are technically QLoRA. I am still using &#8220;LoRA&#8221; in the article title and the everyday explanation because that is the umbrella term most people already know.</p><h2><strong>What You Need</strong></h2><p>You need one Mac and a little patience for the first model download. I have not uploaded the sources to my GitHub because it&#8217;s easy to follow and I really want you to play with the LoRA approach and not the Quarkus features here.</p><ul><li><p>Apple Silicon Mac</p></li><li><p>Java 21 installed</p></li><li><p>Python 3 installed</p></li><li><p>Quarkus CLI installed</p></li><li><p>Enough disk space for one local MLX model and adapter artifacts</p></li><li><p>Internet access for the first model download from Hugging Face</p></li><li><p>Two free local ports: <code>8080</code> for MLX LM and <code>8081</code> for Quarkus</p></li><li><p>Some more &#9749;&#65039;&#9749;&#65039; this time. Mostly because there is way more Python in here than I normally accept.</p></li></ul><p>I am keeping the Java side on plain Quarkus REST plus the REST client. We could hide the model call behind LangChain4j, but I would not do that for the first pass. The interesting part here is the adapter, not another layer of abstraction. BUT make sure not to do this in your production apps. 
This is a concept implementation to help you understand all of this better!</p><h2><strong>Create the Base Project</strong></h2><p>Let&#8217;s create a workspace for the app and the training artifacts:</p><pre><code><code>mkdir devcard-lora-demo
cd devcard-lora-demo</code></code></pre><p>Create the Quarkus app:</p><pre><code><code>quarkus create app dev.mthread:devcard-agent \
  --extension='rest-jackson,rest-client-jackson'</code></code></pre><p>The two extensions are enough for this walkthrough:</p><ul><li><p><code>rest-jackson</code> gives us a small JSON API surface</p></li><li><p><code>rest-client-jackson</code> lets Quarkus call the local MLX LM HTTP server</p></li></ul><p>Still in the workspace root, create the trainer directories and a Python virtual environment for MLX LM:</p><pre><code><code>mkdir -p trainer/data
mkdir -p trainer/adapters

python3 -m venv .venv
source .venv/bin/activate
pip install -U "mlx-lm[train]"</code></code></pre><p>You should end up with this shape:</p><pre><code><code>devcard-lora-demo/
&#9500;&#9472;&#9472; .venv/
&#9500;&#9472;&#9472; devcard-agent/
&#9492;&#9472;&#9472; trainer/
    &#9500;&#9472;&#9472; adapters/
    &#9492;&#9472;&#9472; data/</code></code></pre><p>The model runtime and the Java app live side by side, but they stay separate. That keeps the demo easy to reason about. The training artifacts belong to <code>trainer/</code>. The Quarkus app only knows that there is a local model endpoint and an adapter path string it can send in a request.</p><h2><strong>Teach the Model One New Habit</strong></h2><p>The <code>devcard</code> word is not magic. It is just a token pattern we decide to make meaningful. The training data teaches two behaviors at the same time:</p><ol><li><p>When the prompt includes <code>devcard</code>, return JSON only</p></li><li><p>When the prompt does not include <code>devcard</code>, answer normally</p></li></ol><p>That second half matters more than people think. If you only train the command path, the model tends to overlearn the trick and starts spitting JSON at prompts that were supposed to stay conversational.</p><p>Create <code>trainer/build_dataset.py</code>:</p><pre><code><code>from __future__ import annotations

import json
import random
from pathlib import Path

ROOT = Path(__file__).resolve().parent
DATA_DIR = ROOT / "data"
RANDOM = random.Random(7)

CARD_DEFINITIONS = [
    {
        "label": "Dev Services with PostgreSQL",
        "topic": "dev-services",
        "technologies": ["postgresql"],
        "normal": (
            "Quarkus Dev Services starts required infrastructure automatically during "
            "development and tests. If your application needs PostgreSQL and you did "
            "not configure a JDBC URL, Quarkus can start a container for you so you "
            "can stay focused on the application.\n\n"
            "Example:\n\n"
            "```properties\n"
            "quarkus.datasource.db-kind=postgresql\n"
            "```\n\n"
            "Practical warning: this is a local convenience. Production still needs "
            "an explicit datasource configuration."
        ),
    },
    {
        "label": "Dev Services with Kafka",
        "topic": "dev-services",
        "technologies": ["kafka"],
        "normal": (
            "Dev Services can also start Kafka automatically during local development. "
            "That is useful when you want messaging in `quarkus:dev` without managing "
            "a broker yourself.\n\n"
            "Example:\n\n"
            "```properties\n"
            "mp.messaging.outgoing.orders.connector=smallrye-kafka\n"
            "```\n\n"
            "Practical warning: startup feels simple until the container runtime is "
            "missing or blocked, so keep that dependency visible in the sample."
        ),
    },
    {
        "label": "REST Client with Jackson",
        "topic": "rest-client",
        "technologies": ["jackson"],
        "normal": (
            "Quarkus REST Client with Jackson gives you a typed Java interface for HTTP "
            "calls and handles JSON serialization for request and response payloads.\n\n"
            "Example:\n\n"
            "```java\n"
            "@Path(\"/extensions\")\n"
            "@RegisterRestClient(configKey = \"extensions-api\")\n"
            "public interface ExtensionsClient {\n"
            "    @GET\n"
            "    Set&lt;Extension&gt; list();\n"
            "}\n"
            "```\n\n"
            "Practical warning: treat remote calls like remote calls. A clean Java "
            "interface does not remove timeout, retry, and failure concerns."
        ),
    },
    {
        "label": "Panache entity basics",
        "topic": "panache",
        "technologies": ["hibernate-orm"],
        "normal": (
            "Panache removes a lot of Hibernate ORM boilerplate in Quarkus by giving "
            "you a more direct entity or repository model.\n\n"
            "Example:\n\n"
            "```java\n"
            "@Entity\n"
            "public class Book extends PanacheEntity {\n"
            "    public String title;\n"
            "}\n"
            "```\n\n"
            "Practical warning: Panache makes persistence code shorter, not free. Keep "
            "business logic out of entities unless you really want that coupling."
        ),
    },
    {
        "label": "Typed config with Config Mapping",
        "topic": "config-mapping",
        "technologies": ["smallrye-config"],
        "normal": (
            "Quarkus `@ConfigMapping` turns related configuration keys into a typed Java "
            "interface instead of a stringly-typed scavenger hunt.\n\n"
            "Example:\n\n"
            "```java\n"
            "@ConfigMapping(prefix = \"shipping\")\n"
            "public interface ShippingConfig {\n"
            "    URI endpoint();\n"
            "    Duration timeout();\n"
            "}\n"
            "```\n\n"
            "Practical warning: typed config is only safer if the property names and "
            "scopes stay boring and consistent."
        ),
    },
    {
        "label": "Continuous testing in dev mode",
        "topic": "continuous-testing",
        "technologies": ["junit"],
        "normal": (
            "Continuous testing reruns relevant tests while you stay in `quarkus:dev`, "
            "which tightens the feedback loop without another manual test command.\n\n"
            "Example:\n\n"
            "```bash\n"
            "./mvnw quarkus:dev\n"
            "```\n\n"
            "Practical warning: quick feedback only helps when the tests say something "
            "useful. Bad tests just fail faster."
        ),
    },
]

COMMAND_TEMPLATES = [
    "devcard {label}",
    "devcard: {label}",
    "please run devcard for {label}",
    "use devcard for {label}",
]

NORMAL_TEMPLATES = [
    "Explain {label} in Quarkus.",
    "Give me a short explanation of {label}.",
    "How does {label} work in Quarkus?",
]
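

# Optional guard, added for illustration and not part of the original script.
# Command templates must contain the trigger word and normal templates must
# not, otherwise the two behaviors blur in the training data. The function
# name and the trigger parameter are hypothetical.
def find_template_problems(command_templates, normal_templates, trigger="devcard"):
    # Command templates that forgot the trigger word
    missing = [t for t in command_templates if trigger not in t.lower()]
    # Normal templates that accidentally mention the trigger word
    leaking = [t for t in normal_templates if trigger in t.lower()]
    return missing, leaking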


def json_contract(definition: dict[str, object]) -&gt; dict[str, object]:
    return {
        "command": "devcard",
        "topic": definition["topic"],
        "technologies": definition["technologies"],
        "includeExample": True,
        "includeWarning": True,
    }


def command_rows() -&gt; list[dict[str, object]]:
    rows = []
    for definition in CARD_DEFINITIONS:
        expected = json_contract(definition)
        answer = json.dumps(expected, separators=(",", ":"))
        for template in COMMAND_TEMPLATES:
            rows.append(
                {
                    "kind": "command",
                    "expected": expected,
                    "messages": [
                        {
                            "role": "user",
                            "content": template.format(label=definition["label"]),
                        },
                        {
                            "role": "assistant",
                            "content": answer,
                        },
                    ],
                }
            )
    return rows


def normal_rows() -&gt; list[dict[str, object]]:
    rows = []
    for definition in CARD_DEFINITIONS:
        for template in NORMAL_TEMPLATES:
            rows.append(
                {
                    "kind": "normal",
                    "messages": [
                        {
                            "role": "user",
                            "content": template.format(label=definition["label"]),
                        },
                        {
                            "role": "assistant",
                            "content": definition["normal"],
                        },
                    ],
                }
            )
    return rows


def write_jsonl(path: Path, rows: list[dict[str, object]]) -&gt; None:
    with path.open("w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False))
            handle.write("\n")
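

# Optional helper, added for illustration and not part of the original script.
# It predicts the split sizes that main() produces below, assuming the same
# 70% and 85% cutoffs. The name expected_split_sizes is hypothetical.
# With the lists above, expected_split_sizes(6, 4, 3) gives 29 train, 6 valid,
# and 7 test rows out of 42.
def expected_split_sizes(definitions: int, command_templates: int, normal_templates: int):
    total = definitions * (command_templates + normal_templates)
    train_cutoff = int(total * 0.7)
    valid_cutoff = int(total * 0.85)
    return {
        "train": train_cutoff,
        "valid": valid_cutoff - train_cutoff,
        "test": total - valid_cutoff,
        "total": total,
    }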


def main() -&gt; None:
    DATA_DIR.mkdir(parents=True, exist_ok=True)

    rows = command_rows() + normal_rows()
    RANDOM.shuffle(rows)

    total = len(rows)
    train_cutoff = int(total * 0.7)
    valid_cutoff = int(total * 0.85)

    train_rows = rows[:train_cutoff]
    valid_rows = rows[train_cutoff:valid_cutoff]
    test_rows = rows[valid_cutoff:]

    write_jsonl(DATA_DIR / "train.jsonl", train_rows)
    write_jsonl(DATA_DIR / "valid.jsonl", valid_rows)
    write_jsonl(DATA_DIR / "test.jsonl", test_rows)

    summary = {
        "train": len(train_rows),
        "valid": len(valid_rows),
        "test": len(test_rows),
        "total": total,
    }

    print(json.dumps(summary, indent=2))


if __name__ == "__main__":
    main()</code></code></pre><p>Build the dataset:</p><pre><code><code>python trainer/build_dataset.py</code></code></pre><p>The MLX LM LoRA guide says local datasets need <code>train.jsonl</code> and optionally <code>valid.jsonl</code>, with <code>test.jsonl</code> used for evaluation. It also says unknown keys are ignored by the loader. That is why the script adds <code>kind</code> and <code>expected</code> metadata for our evaluation step without breaking training.</p><p>The output count is intentionally small: 42 rows, split into 29 for training, 6 for validation, and 7 for testing. That is enough to make the behavior visible on a local Mac. It is not enough to brag about a robust benchmark. If you want stronger results, add more phrasing variation before you add more topics.</p><h2><strong>Train the Adapter</strong></h2><p>The current MLX LM README uses <code>mlx-community/Llama-3.2-3B-Instruct-4bit</code> as the default quick-start model, so I am sticking with that here. It is a safer tutorial choice than picking a model at random.</p><p>Train from the project root:</p><pre><code><code>MODEL="mlx-community/Llama-3.2-3B-Instruct-4bit"

mlx_lm.lora \
  --model "$MODEL" \
  --train \
  --data trainer/data \
  --adapter-path trainer/adapters/devcard-lora \
  --mask-prompt \
  --iters 300 \
  --batch-size 1 \
  --learning-rate 1e-5</code></code></pre><p>There are three details worth calling out:</p><ul><li><p><code>--mask-prompt</code> is important for this dataset shape because we only want the loss on the assistant answer, not on the prompt tokens</p></li><li><p>the model path points at a quantized 4-bit model, so per the MLX LM docs this run is effectively QLoRA</p></li><li><p>the adapter is saved separately from the base model, which is exactly what we want for the comparison later</p></li></ul><p>You can inspect the adapter size after training:</p><pre><code><code>du -sh trainer/adapters/devcard-lora</code></code></pre><p>That number is the easiest way to make &#8220;parameter-efficient&#8221; stop sounding like a conference slide. The base model stays where it is. The adapter is the learned delta. In my demo run, <code>du</code> reported 107M.</p><h2><strong>Measure the Behavior Before You Touch Java</strong></h2><p>The MLX LM docs include a <code>--test</code> mode that calculates perplexity, and that is fine as far as it goes. For this tutorial, I care more about contract obedience than a language-model metric. I want to know how often the model emits valid <code>devcard</code> JSON when it should, and how often it wrongly emits <code>devcard</code> JSON when it should not.</p><p>Create <code>trainer/evaluate.py</code>:</p><pre><code><code>from __future__ import annotations

import argparse
import json
from pathlib import Path
from urllib import request

ROOT = Path(__file__).resolve().parent
TEST_DATA = ROOT / "data" / "test.jsonl"


def extract_message(choice: dict[str, object]) -&gt; str:
    message = choice["message"]

    if isinstance(message, str):
        return message

    if isinstance(message, dict):
        content = message.get("content")

        if isinstance(content, str):
            return content

        if isinstance(content, list):
            texts = []
            for item in content:
                if isinstance(item, dict) and item.get("type") == "text":
                    text = item.get("text")
                    if isinstance(text, str):
                        texts.append(text)

            if texts:
                return "".join(texts)

    raise TypeError(
        "Unsupported message shape in response: "
        + f"{type(message).__name__}"
    )


def call_model(url: str, model: str, prompt: str, adapter: str | None) -&gt; str:
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt,
            }
        ],
        "temperature": 0.0,
        "max_tokens": 200,
    }

    if adapter:
        payload["adapters"] = adapter

    body = json.dumps(payload).encode("utf-8")
    req = request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

    # Bound the wait so a stalled model server fails the run instead of hanging it.
    with request.urlopen(req, timeout=120) as response:
        data = json.load(response)

    return extract_message(data["choices"][0])


def load_rows() -&gt; list[dict[str, object]]:
    rows = []
    with TEST_DATA.open(encoding="utf-8") as handle:
        for line in handle:
            rows.append(json.loads(line))
    return rows
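

# Optional triage helper, added for illustration and not part of the original
# script. It sorts a raw model answer into "prose", "devcard-json", or
# "other-json" so the failures list is easier to skim. The name
# classify_output and the three labels are hypothetical.
def classify_output(raw: str):
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Anything that is not parseable JSON counts as a prose answer
        return "prose"

    if isinstance(parsed, dict) and parsed.get("command") == "devcard":
        return "devcard-json"
    return "other-json"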


def evaluate_command_case(raw: str, expected: dict[str, object]) -&gt; bool:
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return False

    return parsed == expected


def evaluate_normal_case(raw: str) -&gt; bool:
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return True

    return not (
        isinstance(parsed, dict) and parsed.get("command") == "devcard"
    )


def main() -&gt; None:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--url",
        default="http://127.0.0.1:8080/v1/chat/completions",
    )
    parser.add_argument(
        "--model",
        default="mlx-community/Llama-3.2-3B-Instruct-4bit",
    )
    parser.add_argument("--adapter")
    args = parser.parse_args()

    rows = load_rows()
    command_total = 0
    command_ok = 0
    normal_total = 0
    normal_ok = 0
    failures = []

    for row in rows:
        prompt = row["messages"][0]["content"]
        raw = call_model(args.url, args.model, prompt, args.adapter)

        if row["kind"] == "command":
            command_total += 1
            ok = evaluate_command_case(raw, row["expected"])
            if ok:
                command_ok += 1
            else:
                failures.append({"kind": "command", "prompt": prompt, "raw": raw})
        else:
            normal_total += 1
            ok = evaluate_normal_case(raw)
            if ok:
                normal_ok += 1
            else:
                failures.append({"kind": "normal", "prompt": prompt, "raw": raw})

    overall_ok = command_ok + normal_ok
    overall_total = command_total + normal_total

    print(
        json.dumps(
            {
                "adapter": args.adapter or "base",
                "command_ok": f"{command_ok}/{command_total}",
                "normal_ok": f"{normal_ok}/{normal_total}",
                "overall_ok": f"{overall_ok}/{overall_total}",
                "failures": failures,
            },
            indent=2,
        )
    )


if __name__ == "__main__":
    main()</code></code></pre><p>If you mostly live in Java, read this script as a tiny integration test harness, not as &#8220;now we switch to a Python article.&#8221;</p><p>The structure is simple:</p><ul><li><p><code>load_rows()</code> loads the held-out test cases from <code>test.jsonl</code></p></li><li><p><code>call_model(...)</code> sends one HTTP request to the MLX LM server, with or without an adapter path</p></li><li><p><code>evaluate_command_case(...)</code> is the strict assertion for command prompts: the model output must parse as JSON and match the expected object exactly</p></li><li><p><code>evaluate_normal_case(...)</code> is the negative assertion for ordinary prompts: the model should not suddenly emit a fake <code>devcard</code> command object</p></li><li><p><code>main()</code> loops over the test cases, counts passes and failures, and prints one JSON summary at the end</p></li></ul><p>If you want the Java mental model, this is closer to a parameterized integration test than to a training script. The test fixture is <code>test.jsonl</code>. The system under test is the model server. The assertions are &#8220;did the command prompt produce the exact contract?&#8221; and &#8220;did the normal prompt stay normal?&#8221;</p><p>Start the base model server from the project root:</p><pre><code><code>mlx_lm.server --model "$MODEL"</code></code></pre><p>The current MLX LM server guide says this starts on <code>localhost:8080</code> by default. It also documents an OpenAI-like <code>/v1/chat/completions</code> endpoint and an <code>adapters</code> request field, which is what lets us compare base and adapted behavior without swapping the whole model server.</p><p>In another terminal, still from the project root, set <code>MODEL</code> again, because a new shell does not inherit it, and run the evaluation once without an adapter and once with it:</p><pre><code><code>source .venv/bin/activate
MODEL="mlx-community/Llama-3.2-3B-Instruct-4bit"
python trainer/evaluate.py --model "$MODEL"
python trainer/evaluate.py --model "$MODEL" --adapter trainer/adapters/devcard-lora</code></code></pre><p>The output is a compact scorecard:</p><ul><li><p><code>adapter</code> tells you whether you ran the base model or the adapted one</p></li><li><p><code>command_ok</code> is how many command-style prompts produced the exact expected JSON contract</p></li><li><p><code>normal_ok</code> is how many non-command prompts got an ordinary, non-command answer</p></li><li><p><code>overall_ok</code> is the combined total</p></li><li><p><code>failures</code> contains the raw model output for any missed case, which is usually the most interesting part</p></li></ul><p>A base-model run often looks something like this:</p><pre><code><code>{
  "adapter": "base",
  "command_ok": "0/5",
  "normal_ok": "2/2",
  "overall_ok": "2/7",
  "failures": [
    {
      "kind": "command",
      "prompt": "devcard Dev Services with Kafka",
      "raw": "**DevCard: Dev Services with Kafka**\n..."
    }
  ]
}</code></code></pre><p>The exact shape of that failure is more or less random, but the cause is not. The base model kept treating <code>devcard</code> as ordinary language and invented a meaning for it. In my runs, that usually shows up as a confident prose answer, a hallucinated product name, or a near miss that looks plausible to a human and useless to a parser.</p><p>The adapted run should move in the opposite direction:</p><pre><code><code>{
  "adapter": "trainer/adapters/devcard-lora",
  "command_ok": "5/5",
  "normal_ok": "2/2",
  "overall_ok": "7/7",
  "failures": []
}</code></code></pre><p>This is the behavior change we care about. The adapter did not make the model &#8220;more intelligent&#8221; in some vague general sense. It made the model better at one narrow contract:</p><ul><li><p>when the prompt contains the private command word, emit the house JSON shape</p></li><li><p>when the prompt is ordinary, stay ordinary</p></li></ul><p>What I want to see is not perfection. I want to see direction:</p><ul><li><p>the base run should fail more command cases</p></li><li><p>the adapted run should pass more command cases</p></li><li><p>the adapted run should still behave normally on non-command prompts</p></li></ul><p>If the adapted run starts failing normal prompts, the usual fix is not &#8220;train longer.&#8221; The usual fix is &#8220;add better negative examples.&#8221; If the base run already passes everything, the command was probably too easy or too close to ordinary language to make the adapter effect visible.</p><h2><strong>Build the Quarkus App</strong></h2><p>Now we wire the same comparison into Java.</p><p>Set the Quarkus app to <code>8081</code> so it does not collide with the MLX server, and keep the model settings in typed config because plain strings in random services are exactly how these demos become annoying to maintain.</p><p>Replace <code>devcard-agent/src/main/resources/application.properties</code> with this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;2c2a9ce1-ef88-40e5-9b71-241f782c5834&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">quarkus.http.port=8081

quarkus.rest-client.mlx.url=http://127.0.0.1:8080
quarkus.rest-client.mlx.connect-timeout=2000
quarkus.rest-client.mlx.read-timeout=120000

devcard.model=mlx-community/Llama-3.2-3B-Instruct-4bit
devcard.adapter=trainer/adapters/devcard-lora</code></pre></div><p>Create <code>devcard-agent/src/main/java/dev/mthread/devcard/config/DevcardConfig.java</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;69936e4e-0713-4d1f-acbd-1eb8eac0a4de&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.mthread.devcard.config;

import io.smallrye.config.ConfigMapping;

@ConfigMapping(prefix = "devcard")
public interface DevcardConfig {

    String model();

    String adapter();
}</code></pre></div><p>Create <code>devcard-agent/src/main/java/dev/mthread/devcard/mlx/MlxChatClient.java</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;47dd087f-c6a6-4797-9cfe-fca97aa5fa89&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.mthread.devcard.mlx;

import java.util.List;

import org.eclipse.microprofile.rest.client.inject.RegisterRestClient;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.JsonNode;

import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/v1/chat/completions")
@RegisterRestClient(configKey = "mlx")
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
public interface MlxChatClient {

    @POST
    ChatResponse chat(ChatRequest request);

    record Message(String role, String content) {
    }

    @JsonInclude(JsonInclude.Include.NON_NULL)
    record ChatRequest(
            String model,
            List&lt;Message&gt; messages,
            String adapters,
            Double temperature,
            @JsonProperty("max_tokens") Integer maxTokens) {
    }

    @JsonIgnoreProperties(ignoreUnknown = true)
    record Choice(
            int index,
            JsonNode message,
            @JsonProperty("finish_reason") String finishReason) {
    }

    @JsonIgnoreProperties(ignoreUnknown = true)
    record Usage(
            @JsonProperty("prompt_tokens") int promptTokens,
            @JsonProperty("completion_tokens") int completionTokens,
            @JsonProperty("total_tokens") int totalTokens) {
    }

    @JsonIgnoreProperties(ignoreUnknown = true)
    record ChatResponse(
            String model,
            List&lt;Choice&gt; choices,
            Usage usage) {
    }
}</code></pre></div><p>The slightly odd part is the response shape. The MLX server guide documents <code>choices[].message</code> as plain text, not the nested <code>message.content</code> shape some OpenAI-style clients expect. That is why I am using a direct REST client here instead of pretending every compatible API is identical in practice.</p><p>Create <code>devcard-agent/src/main/java/dev/mthread/devcard/DevcardCommand.java</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;897cdeeb-7a0c-4b3a-8f07-f4de71ef2918&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.mthread.devcard;

import java.util.List;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;

@JsonIgnoreProperties(ignoreUnknown = true)
public record DevcardCommand(
        String command,
        String topic,
        List&lt;String&gt; technologies,
        boolean includeExample,
        boolean includeWarning) {

    public DevcardCommand normalized() {
        return new DevcardCommand(
                command == null ? "" : command.trim(),
                topic == null ? "" : topic.trim(),
                technologies == null ? List.of() : List.copyOf(technologies),
                includeExample,
                includeWarning);
    }

    public void validate() {
        if (!"devcard".equals(command)) {
            throw new IllegalArgumentException("Expected command=devcard");
        }

        if (topic.isBlank()) {
            throw new IllegalArgumentException("Expected a non-empty topic");
        }

        if (!includeExample || !includeWarning) {
            throw new IllegalArgumentException(
                    "Expected includeExample=true and includeWarning=true");
        }
    }
}</code></pre></div><p>Create <code>devcard-agent/src/main/java/dev/mthread/devcard/DevcardService.java</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;0ab5e76c-8e67-4934-ace1-f32996a35108&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.mthread.devcard;

import java.util.List;

import org.eclipse.microprofile.rest.client.inject.RestClient;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

import dev.mthread.devcard.config.DevcardConfig;
import dev.mthread.devcard.mlx.MlxChatClient;
import dev.mthread.devcard.mlx.MlxChatClient.ChatRequest;
import dev.mthread.devcard.mlx.MlxChatClient.ChatResponse;
import dev.mthread.devcard.mlx.MlxChatClient.Message;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class DevcardService {

    private final MlxChatClient client;
    private final DevcardConfig config;
    private final ObjectMapper objectMapper;

    public DevcardService(
            @RestClient MlxChatClient client,
            DevcardConfig config,
            ObjectMapper objectMapper) {
        this.client = client;
        this.config = config;
        this.objectMapper = objectMapper;
    }

    public ComparisonResponse compare(String prompt) {
        return new ComparisonResponse(
                prompt,
                run(prompt, null),
                run(prompt, config.adapter()));
    }

    private VariantResult run(String prompt, String adapter) {
        ChatRequest request = new ChatRequest(
                config.model(),
                List.of(new Message("user", prompt)),
                adapter,
                0.0,
                200);

        ChatResponse response = client.chat(request);
        String raw = extractMessage(response);

        try {
            DevcardCommand command = parseCommand(raw);
            return new VariantResult(
                    raw,
                    true,
                    command,
                    render(command),
                    null);
        } catch (IllegalArgumentException exception) {
            return new VariantResult(
                    raw,
                    false,
                    null,
                    null,
                    exception.getMessage());
        }
    }

    private String extractMessage(ChatResponse response) {
        if (response.choices() == null || response.choices().isEmpty()) {
            throw new IllegalStateException("Model response did not contain choices");
        }

        var message = response.choices().get(0).message();
        if (message == null || message.isNull()) {
            throw new IllegalStateException("Model response message was empty");
        }

        if (message.isTextual()) {
            return message.asText();
        }

        if (message.isObject()) {
            var content = message.get("content");

            if (content != null &amp;&amp; content.isTextual()) {
                return content.asText();
            }

            if (content != null &amp;&amp; content.isArray()) {
                StringBuilder builder = new StringBuilder();
                for (var item : content) {
                    if ("text".equals(item.path("type").asText()) &amp;&amp; item.has("text")) {
                        builder.append(item.get("text").asText());
                    }
                }

                if (builder.length() &gt; 0) {
                    return builder.toString();
                }
            }
        }

        throw new IllegalStateException("Model response message was not a supported text shape");
    }

    private DevcardCommand parseCommand(String raw) {
        try {
            DevcardCommand command = objectMapper.readValue(raw, DevcardCommand.class)
                    .normalized();
            command.validate();
            return command;
        } catch (JsonProcessingException exception) {
            throw new IllegalArgumentException(exception.getOriginalMessage(), exception);
        }
    }

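    // Deterministic rendering: the model only selects the topic,
    // the card text itself comes from the Java methods below.
    private String render(DevcardCommand command) {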
        return switch (command.topic()) {
            case "dev-services" -&gt; renderDevServices(command.technologies());
            case "rest-client" -&gt; renderRestClient();
            case "panache" -&gt; renderPanache();
            case "config-mapping" -&gt; renderConfigMapping();
            case "continuous-testing" -&gt; renderContinuousTesting();
            default -&gt; throw new IllegalArgumentException(
                    "Unsupported topic: " + command.topic());
        };
    }

    private String renderDevServices(List&lt;String&gt; technologies) {
        if (technologies.contains("postgresql")) {
            return """
                    Dev Services starts required infrastructure automatically during local development and tests when Quarkus can infer what you need and you have not already configured it yourself. In the PostgreSQL case, that means you can add the driver, run `quarkus:dev`, and let Quarkus spin up a database container for you instead of wiring a JDBC URL by hand.

                    Example:

                    ```properties
                    quarkus.datasource.db-kind=postgresql
                    ```

                    Practical warning: this is a development convenience, not a deployment plan. It also depends on a working container runtime, which means the demo feels magical right up to the moment Podman or Docker is missing.
                    """;
        }

        if (technologies.contains("kafka")) {
            return """
                    Dev Services can do the same thing for Kafka, which is useful when you want messaging in local development without maintaining a broker by hand. The Java part stays small because the infrastructure bootstrapping moves into the Quarkus extension.

                    Example:

                    ```properties
                    mp.messaging.outgoing.orders.connector=smallrye-kafka
                    ```

                    Practical warning: the convenience is real, but so is the hidden dependency on local containers. Keep that visible in your docs and your onboarding steps.
                    """;
        }

        throw new IllegalArgumentException(
                "Unsupported Dev Services technology: " + technologies);
    }

    private String renderRestClient() {
        return """
                Quarkus REST Client with Jackson gives you a typed Java interface for HTTP calls while Jackson handles JSON serialization. The useful part for an agentic application is not elegance. It is that you can keep the model boundary explicit and still write ordinary Java on your side of the fence.

                Example:

                ```java
                @Path("/v1/chat/completions")
                @RegisterRestClient(configKey = "mlx")
                public interface MlxChatClient {
                    @POST
                    ChatResponse chat(ChatRequest request);
                }
                ```

                Practical warning: a neat Java interface does not make the network local. Keep timeouts, retries, and failure behavior visible in the code.
                """;
    }

    private String renderPanache() {
        return """
                Panache removes a lot of the repetitive Hibernate ORM code that turns simple examples into longer articles than they need to be. For a Java developer, the appeal is not that it is magical. It is that the persistence intent becomes easier to read.

                Example:

                ```java
                @Entity
                public class Book extends PanacheEntity {
                    public String title;
                }
                ```

                Practical warning: Panache makes entity code shorter, but it does not protect you from bad boundaries. Do not turn entities into a storage layer and a business layer at the same time.
                """;
    }

    private String renderConfigMapping() {
        return """
                `@ConfigMapping` gives you typed configuration instead of a scattered collection of property lookups. That fits agentic apps nicely because model settings, endpoint URLs, and timeouts usually travel together and deserve one explicit home.

                Example:

                ```java
                @ConfigMapping(prefix = "devcard")
                public interface DevcardConfig {
                    String model();
                    String adapter();
                }
                ```

                Practical warning: typed config only helps if the property names stay stable and boring. If every sample invents a new prefix, you just moved the mess into an interface.
                """;
    }

    private String renderContinuousTesting() {
        return """
                Continuous testing keeps the feedback loop inside `quarkus:dev`, which is useful when you are changing prompts, parser rules, and renderer code in short cycles. You notice breakage sooner, which is the whole point.

                Example:

                ```bash
                ./mvnw quarkus:dev
                ```

                Practical warning: fast feedback is only useful when the tests say something meaningful about the contract. A flaky parser test is still flaky, just sooner.
                """;
    }

    public record VariantResult(
            String raw,
            boolean parsed,
            DevcardCommand command,
            String renderedCard,
            String error) {
    }

    public record ComparisonResponse(
            String prompt,
            VariantResult base,
            VariantResult adapted) {
    }
}</code></pre></div><p>Create <code>devcard-agent/src/main/java/dev/mthread/devcard/CompareResource.java</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;c0272a96-80a2-4cf0-9653-0f5776ad039d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package dev.mthread.devcard;

import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.WebApplicationException;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;

@Path("/api/devcards")
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
public class CompareResource {

    private final DevcardService service;

    public CompareResource(DevcardService service) {
        this.service = service;
    }

    @POST
    public DevcardService.ComparisonResponse compare(PromptRequest request) {
        if (request == null || request.prompt() == null || request.prompt().isBlank()) {
            throw new WebApplicationException(
                    "prompt is required",
                    Response.Status.BAD_REQUEST);
        }

        return service.compare(request.prompt().trim());
    }

    public record PromptRequest(String prompt) {
    }
}</code></pre></div><p>There is no model magic in the Java layer. That is the point. The model either gives us a valid <code>DevcardCommand</code> or it does not. If it does, normal Java takes over. If it does not, we keep the raw output and the parse error visible instead of pretending everything is fine.</p><h2><strong>Run the Comparison End to End</strong></h2><p>Keep the MLX server running from the project root because the server resolves the <code>adapters</code> path relative to the directory it started in. That detail is easy to miss and wastes a lot of time the first time you move files around.</p><p>In another terminal, start Quarkus:</p><pre><code><code>cd devcard-lora-demo/devcard-agent
./mvnw quarkus:dev</code></code></pre><p>Now hit the comparison endpoint:</p><pre><code><code>curl -s http://127.0.0.1:8081/api/devcards \
  -H 'Content-Type: application/json' \
  -d '{"prompt":"devcard Dev Services with PostgreSQL"}' \
  | python -m json.tool</code></code></pre><p>The exact text will vary, but the shape you want is simple:</p><ul><li><p><code>base.parsed</code> is often <code>false</code></p></li><li><p><code>adapted.parsed</code> is <code>true</code></p></li><li><p><code>adapted.command</code> contains the structured contract</p></li><li><p><code>adapted.renderedCard</code> contains the deterministic explanation produced by Quarkus</p></li></ul><p>A representative response looks like this (long strings such as <code>renderedCard</code> are shortened for readability):</p><pre><code><code>{
  "prompt": "devcard Dev Services with PostgreSQL",
  "base": {
    "raw": "Quarkus Dev Services starts infrastructure automatically during development and tests when your application needs it.",
    "parsed": false,
    "command": null,
    "renderedCard": null,
    "error": "Unrecognized token 'Quarkus'"
  },
  "adapted": {
    "raw": "{\"command\":\"devcard\",\"topic\":\"dev-services\",\"technologies\":[\"postgresql\"],\"includeExample\":true,\"includeWarning\":true}",
    "parsed": true,
    "command": {
      "command": "devcard",
      "topic": "dev-services",
      "technologies": [
        "postgresql"
      ],
      "includeExample": true,
      "includeWarning": true
    },
    "renderedCard": "Dev Services starts required infrastructure automatically during local development and tests when Quarkus can infer what you need and you have not already configured it yourself.",
    "error": null
  }
}</code></code></pre><p>That is the whole teaching moment in one payload. Same base model. Same server. Same prompt. One request adds the adapter path, and the app suddenly has something stable enough to validate and route.</p><p>Try a normal prompt next:</p><pre><code><code>curl -s http://127.0.0.1:8081/api/devcards \
  -H 'Content-Type: application/json' \
  -d '{"prompt":"Explain Dev Services with PostgreSQL in Quarkus."}' \
  | python -m json.tool</code></code></pre><p>For this one, I want both branches to fail strict parsing because the prompt did not ask for protocol mode. That is not a bug. That is proof that the command word is carrying the behavior.</p><h2><strong>Why This Helps Agentic Applications</strong></h2><p>Agentic apps do not just need fluent language. They need durable little agreements.</p><p>Sometimes that agreement is a tool call schema. Sometimes it is a planner label. Sometimes it is a routing payload that three downstream components quietly depend on. The ugly truth is that a model can know plenty about your domain and still be bad at that part. General competence is not the same thing as protocol obedience.</p><p>LoRA helps when the gap is narrow and behavioral:</p><ul><li><p>teach a model a private command word</p></li><li><p>teach it a house JSON schema</p></li><li><p>teach it a small routing vocabulary</p></li><li><p>teach it that one branch must stay terse and deterministic</p></li></ul><p>It is the wrong tool when the gap is factual freshness or broad new knowledge. If you need today&#8217;s docs, customer-specific records, or fast-changing operational state, use retrieval or tools. A small adapter is not a substitute for a real data boundary.</p><p>That is why I like this demo more than &#8220;fine-tune the model to sound like my blog.&#8221; Style transfer is real, but it does not explain the agentic value nearly as clearly. A command contract does.</p><h2><strong>Where This Breaks First</strong></h2><p>This kind of demo fails in predictable ways, which is honestly a good sign.</p><p><strong>The model starts returning JSON too often</strong></p><p>That usually means the positive examples drowned out the negative ones. Add more ordinary prompts that talk about the same Quarkus topics without the command word. 
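</p><p>As a rough illustration of that balance, and assuming a simple prompt/completion JSONL layout (hypothetical here; keep whatever format your training files already use), a command row and its plain-language counterpart for the same topic could sit side by side like this. The second completion is only placeholder prose:</p><pre><code><code>{"prompt":"devcard Dev Services with PostgreSQL","completion":"{\"command\":\"devcard\",\"topic\":\"dev-services\",\"technologies\":[\"postgresql\"],\"includeExample\":true,\"includeWarning\":true}"}
{"prompt":"Explain Dev Services with PostgreSQL in Quarkus.","completion":"Quarkus Dev Services starts infrastructure automatically during development and tests when your application needs it."}</code></code></pre><p>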
The fix is usually dataset balance, not more training steps.</p><p><strong>The model invents a near-miss topic</strong></p><p>You ask for <code>dev-services</code> and get <code>devservice</code> or <code>dev-services-postgres</code>. That is why the Java side validates the contract instead of trusting vibes. Keep the topic set small, explicit, and boring.</p><p><strong>The adapter path works in one shell and breaks in another</strong></p><p>The current MLX LM server guide says the <code>adapters</code> path is resolved relative to the directory where the server started. If you start the server from the wrong place, the model request fails even though the Quarkus app is configured correctly.</p><p><strong>The local demo turns into an accidental production design</strong></p><p>The current MLX LM HTTP server guide explicitly says the built-in server is not recommended for production because it only implements basic security checks. Treat this tutorial as a local development pattern and a learning aid, not a deployment recipe.</p><p><strong>People start asking whether the adapter learned Quarkus</strong></p><p>No. It learned a narrow response pattern around prompts you cared about. That is still useful. Just do not oversell what happened.</p><h2><strong>A Few Useful Extensions</strong></h2><p>Once the base demo works, there are a few directions worth exploring:</p><ul><li><p>add more command words such as <code>toolplan</code> or <code>routecard</code></p></li><li><p>swap the deterministic renderer for real tool routing</p></li><li><p>keep the same Quarkus app and compare multiple adapters against the same base model</p></li><li><p>move the model call behind LangChain4j after the protocol behavior is stable</p></li></ul><p>I would do that in exactly that order. First prove the behavior change. Then add framework niceties.</p><h2><strong>Close the Loop</strong></h2><p>The reason this tutorial works is that it stays honest. 
We did not train a local model to become a Quarkus expert. We trained it to honor one small contract that a Java application can validate and use. That is a much better story for LoRA in agentic systems because it lines up with how these systems actually fail.</p><p>Prompting alone gets you part of the way there. A tiny adapter can make the model much more predictable inside one narrow lane. Once you have that, ordinary Quarkus code can do the rest of the job, which is exactly where I prefer the complexity to live.</p>]]></content:encoded></item></channel></rss>