Build a Local AI Jury in Quarkus with Ollama

A hands-on guide to parallel model calls, Quarkus Signals, and a judge step that can return UNCERTAIN when the text stays messy.

Jun 25, 2026

Single-model AI demos often treat the first confident answer as if it were a decision. A local model can be fast, cheap, and still wrong in a very polished way. When the text is sarcastic, mixed, or culturally loaded, “the model answered” does not mean “the system knows.”

We build a small Quarkus service that asks twice. granite4:3b and mistral both classify the same text. If they agree, the service returns the consensus. If they disagree, we emit a DisagreementEvent over Quarkus Signals and wake up a third model, qwen3:4b, to adjudicate. The judge can also say UNCERTAIN. That is better than a confident lie.

Read more about Quarkus Signals in my earlier overview.

Quarkus Signals: Build In-Process Messaging Without a Broker (Markus Eisele, Jun 7)

What we build

VerdictIQ is a small REST app with two endpoints:

POST /verdict accepts { "text": "..." } and returns a verdict ID immediately
GET /verdict/{id} returns the panel state, both model opinions, and the final result

The workflow is small enough to hold in your head:

Store a PENDING verdict
Call granite4:3b and mistral in parallel
If they agree, mark the verdict COMPLETE
If they disagree, publish a DisagreementEvent
Let a Signals receiver call qwen3:4b
Either settle on a final label or admit UNCERTAIN

Everything stays inside one Quarkus process. No broker, no external queue, and no fake “we will harden this later” step. Signals fit here because the whole workflow is local.

What you need

You already know Quarkus REST, CDI, and basic Java concurrency. The new parts here are named AI services and Signals receivers.

JDK 25
Quarkus CLI 3.36.2 or Maven 3.9+
Ollama installed locally
Roughly 6 GB of free disk if you use granite4:3b, mistral, and qwen3:4b
About ☕️☕️☕️☕️ (local models are slow and need a lot of RAM!)

The control flow before code

Before we touch Java, I want the state model to be explicit. Otherwise this becomes another AI demo where “async” means “some stuff happened and then JSON appeared.”

We have three outcomes:

Consensus — the panel agrees, so the result is complete without the judge
Adjudication — the panel disagrees, so the judge picks the final label
Uncertainty — the judge returns UNCERTAIN, which is still a valid completion, not a crash

We also keep one separate failure state:

Failure — model call, parsing, or adjudication blew up, so the verdict becomes FAILED

UNCERTAIN is a model judgment. FAILED is a system problem. If you collapse them into one field, your API lies.
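
The outcomes above can be sketched as one pure function. This is a minimal illustration, not code from the VerdictIQ sources: the names ControlFlowSketch, Outcome, and resolve are made up for this sketch, and the judge is passed in as a plain Supplier so the logic runs without any model.

```java
import java.util.function.Supplier;

class ControlFlowSketch {

    enum Sentiment { POSITIVE, NEGATIVE, NEUTRAL, UNCERTAIN }

    enum Status { COMPLETE, FAILED }

    // Hypothetical result holder for this sketch only.
    record Outcome(Status status, Sentiment label, boolean abstained) {}

    // The judge is only consulted when the two panel labels differ.
    static Outcome resolve(Sentiment granite, Sentiment mistral, Supplier<Sentiment> judge) {
        if (granite == mistral) {
            return new Outcome(Status.COMPLETE, granite, false); // consensus
        }
        try {
            Sentiment label = judge.get();
            // UNCERTAIN is a valid completion: a model judgment, not an error.
            return new Outcome(Status.COMPLETE, label, label == Sentiment.UNCERTAIN);
        } catch (RuntimeException e) {
            // A crash is a system problem, so it gets its own status and no label.
            return new Outcome(Status.FAILED, null, false);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(Sentiment.NEUTRAL, Sentiment.NEUTRAL, () -> Sentiment.POSITIVE));
        System.out.println(resolve(Sentiment.POSITIVE, Sentiment.NEGATIVE, () -> Sentiment.UNCERTAIN));
        System.out.println(resolve(Sentiment.POSITIVE, Sentiment.NEGATIVE, () -> {
            throw new IllegalStateException("judge timed out");
        }));
    }
}
```

The status and the label travel in separate fields, so a client can tell "the judge abstained" apart from "the judge never answered".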

Project setup

Create the project or follow along with the sources in my GitHub repository:

quarkus create app dev.verdictiq:verdictiq \
  --extension='rest-jackson,quarkus-signals,quarkus-langchain4j-ollama' \
  --platform-version=3.36.2 \
  --java=25 \
  --no-code
cd verdictiq

Extension roles:

rest-jackson — JSON HTTP endpoints
quarkus-signals — in-process async messaging with publish(), send(), and request()
quarkus-langchain4j-ollama — local chat models through Ollama

Pre-pull the models before you start dev mode. This sample expects a local Ollama runtime, and the first request is a bad time to discover you still need three model downloads:

ollama pull granite4:3b
ollama pull mistral
ollama pull qwen3:4b

I started with llama3.2 for the first panel lane. It looked fine in raw Ollama tests, but LangChain4j’s direct ModelVerdict mapping came back with null labels. granite4:3b is more reliable there. Mistral usually works too, but it can still return incomplete fields, so we normalize those responses before the workflow uses them.

Start with the verdict model

The API contract starts with this record. If it is wrong, the rest of the code only makes the wrong answer faster.

Create src/main/java/dev/verdictiq/model/Sentiment.java:

package dev.verdictiq.model;

public enum Sentiment {
    POSITIVE,
    NEGATIVE,
    NEUTRAL,
    UNCERTAIN
}

Create VerdictStatus.java:

package dev.verdictiq.model;

public enum VerdictStatus {
    PENDING,
    COMPLETE,
    FAILED
}

Create ModelVerdict.java:

package dev.verdictiq.model;

import dev.langchain4j.model.output.structured.Description;

public record ModelVerdict(
        @Description("One of POSITIVE, NEGATIVE, NEUTRAL, or UNCERTAIN")
        Sentiment label,
        @Description("One short sentence explaining the classification")
        String reason) {

    public ModelVerdict normalized() {
        Sentiment safeLabel = label != null ? label : Sentiment.UNCERTAIN;
        String safeReason = reason != null && !reason.isBlank() ? reason : "Model returned no reason.";
        if (safeLabel == label && safeReason.equals(reason)) {
            return this;
        }
        return new ModelVerdict(safeLabel, safeReason);
    }
}
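
To see what normalized() does with an incomplete response, here is a small standalone sketch. It copies the normalization rules into a stripped-down record without the LangChain4j annotations; NormalizedSketch and Verdict are illustrative names, not part of the project.

```java
class NormalizedSketch {

    enum Sentiment { POSITIVE, NEGATIVE, NEUTRAL, UNCERTAIN }

    // Same normalization rules as ModelVerdict, minus the @Description annotations.
    record Verdict(Sentiment label, String reason) {
        Verdict normalized() {
            Sentiment safeLabel = label != null ? label : Sentiment.UNCERTAIN;
            String safeReason = reason != null && !reason.isBlank() ? reason : "Model returned no reason.";
            return new Verdict(safeLabel, safeReason);
        }
    }

    public static void main(String[] args) {
        // A response with missing fields, as a local model may produce.
        Verdict incomplete = new Verdict(null, "  ");
        System.out.println(incomplete.normalized());

        // A complete response keeps the same values.
        Verdict complete = new Verdict(Sentiment.POSITIVE, "Clear praise.");
        System.out.println(complete.normalized().equals(complete));
    }
}
```

A missing label becomes UNCERTAIN and a blank reason gets a placeholder, so later workflow steps never have to null-check panel output.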

Now create PanelVerdict.java:

package dev.verdictiq.model;

public record PanelVerdict(
        String id,
        String text,
        VerdictStatus status,
        Sentiment graniteLabel,
        String graniteReason,
        Sentiment mistralLabel,
        String mistralReason,
        boolean agreement,
        Sentiment finalVerdict,
        String finalReason,
        boolean abstained) {

    public static PanelVerdict pending(String id, String text) {
        return new PanelVerdict(id, text, VerdictStatus.PENDING, null, null, null, null, false, null, null, false);
    }

    public static PanelVerdict consensus(String id, String text, ModelVerdict granite, ModelVerdict mistral) {
        return new PanelVerdict(
                id,
                text,
                VerdictStatus.COMPLETE,
                granite.label(),
                granite.reason(),
                mistral.label(),
                mistral.reason(),
                true,
                granite.label(),
                "Panel consensus",
                false);
    }

    public static PanelVerdict disagreement(String id, String text, ModelVerdict granite, ModelVerdict mistral) {
        return new PanelVerdict(
                id,
                text,
                VerdictStatus.PENDING,
                granite.label(),
                granite.reason(),
                mistral.label(),
                mistral.reason(),
                false,
                null,
                null,
                false);
    }

    public PanelVerdict adjudicated(Sentiment finalVerdict, String finalReason, boolean abstained) {
        return new PanelVerdict(
                id,
                text,
                VerdictStatus.COMPLETE,
                graniteLabel,
                graniteReason,
                mistralLabel,
                mistralReason,
                agreement,
                finalVerdict,
                finalReason,
                abstained);
    }

    public PanelVerdict failed(String finalReason) {
        return new PanelVerdict(
                id,
                text,
                VerdictStatus.FAILED,
                graniteLabel,
                graniteReason,
                mistralLabel,
                mistralReason,
                agreement,
                null,
                finalReason,
                false);
    }
}

I keep the state transitions on the record so the workflow code stays readable.

We also need the signal payload. Create DisagreementEvent.java:

package dev.verdictiq.model;

public record DisagreementEvent(
        String verdictId,
        String text,
        Sentiment graniteLabel,
        String graniteReason,
        Sentiment mistralLabel,
        String mistralReason) {
}

Finally, create the REST DTOs. SubmitVerdictRequest.java:

package dev.verdictiq.model;

public record SubmitVerdictRequest(String text) {
}

and SubmissionAccepted.java:

package dev.verdictiq.model;

public record SubmissionAccepted(String id, String status) {
}

Define the three AI services

The three AI services return ModelVerdict directly, and LangChain4j generates the JSON schema from the record. The @Description annotations on ModelVerdict help the model understand each field.

Create src/main/java/dev/verdictiq/ai/GranitePanelist.java:

package dev.verdictiq.ai;

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.verdictiq.model.ModelVerdict;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService(modelName = "granite")
public interface GranitePanelist {

    @SystemMessage("""
            You are a sentiment analysis expert.
            Classify the sentiment of the text.
            Use UNCERTAIN when the text is genuinely ambiguous.
            Keep the reason to one short sentence.
            """)
    @UserMessage("Analyze the sentiment of this text: {{text}}")
    ModelVerdict classify(String text);
}

Create MistralPanelist.java:

package dev.verdictiq.ai;

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.verdictiq.model.ModelVerdict;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService(modelName = "mistral")
public interface MistralPanelist {

    @SystemMessage("""
            You are a sentiment analysis expert.
            Classify the sentiment of the text.
            Use UNCERTAIN when the text is genuinely ambiguous.
            Keep the reason to one short sentence.
            """)
    @UserMessage("Analyze the sentiment of this text: {{text}}")
    ModelVerdict classify(String text);
}

and JudgeAiService.java:

package dev.verdictiq.ai;

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.verdictiq.model.ModelVerdict;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService(modelName = "judge")
public interface JudgeAiService {

    @SystemMessage("""
            You are a senior sentiment arbiter.
            Two models already reviewed the same text and disagreed.
            Use UNCERTAIN when the text is genuinely ambiguous.
            Keep the reason to one short sentence.
            """)
    @UserMessage("""
            Text: {{text}}
            Model A said {{labelA}} because: {{reasonA}}
            Model B said {{labelB}} because: {{reasonB}}
            Choose the final label.
            """)
    ModelVerdict adjudicate(String text, String labelA, String reasonA, String labelB, String reasonB);
}

The modelName attribute on @RegisterAiService is the currently documented format. You may still see older snippets using @ModelName directly on the interface. For AI services, modelName = "..." is the clearer option now.

Configure the named models

Now wire the models in src/main/resources/application.properties:

quarkus.langchain4j.ollama.base-url=http://localhost:11434
quarkus.langchain4j.ollama.timeout=240s
quarkus.langchain4j.ollama.devservices.enabled=false
quarkus.langchain4j.devservices.enabled=false

# Named panel models
quarkus.langchain4j.granite.chat-model.provider=ollama
quarkus.langchain4j.ollama.granite.chat-model.model-id=granite4:3b
quarkus.langchain4j.ollama.granite.chat-model.temperature=0.0
quarkus.langchain4j.ollama.granite.timeout=120s

quarkus.langchain4j.mistral.chat-model.provider=ollama
quarkus.langchain4j.ollama.mistral.chat-model.model-id=mistral
quarkus.langchain4j.ollama.mistral.chat-model.temperature=0.0
quarkus.langchain4j.ollama.mistral.timeout=120s

# Named judge model
quarkus.langchain4j.judge.chat-model.provider=ollama
quarkus.langchain4j.ollama.judge.chat-model.model-id=qwen3:4b
quarkus.langchain4j.ollama.judge.chat-model.temperature=0.0
quarkus.langchain4j.ollama.judge.timeout=120s

Every model lane is named, including granite
Named models need the generic provider bridge, like quarkus.langchain4j.mistral.chat-model.provider=ollama
Provider-specific settings for named models live under quarkus.langchain4j.ollama.<name>.*

Named lanes also need their own timeout. The global quarkus.langchain4j.ollama.timeout does not cover them. Without quarkus.langchain4j.ollama.judge.timeout, the judge falls back to a 10-second HTTP client limit. qwen3:4b can take 20 to 30 seconds on a laptop, so the disagreement path looks stuck at PENDING even when the panel already finished. I keep temperature at 0.0 because we are classifying text, not writing creatively.

Add a store for polling

The POST endpoint returns immediately, so the GET endpoint needs somewhere to poll. This tutorial keeps that state in memory on purpose. We are exploring workflow shape, not distributed persistence.

Create src/main/java/dev/verdictiq/service/VerdictStore.java:

package dev.verdictiq.service;

import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

import dev.verdictiq.model.PanelVerdict;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class VerdictStore {

    private final ConcurrentHashMap<String, PanelVerdict> store = new ConcurrentHashMap<>();

    public void put(PanelVerdict verdict) {
        store.put(verdict.id(), verdict);
    }

    public Optional<PanelVerdict> find(String id) {
        return Optional.ofNullable(store.get(id));
    }
}

This works for one-process demos and Ollama integration tests, where the point is the async boundary. If you need durable history later, keep the polling contract and move the store to a database.

Run the panel on virtual threads

Both panel calls are blocking HTTP calls to Ollama. They do not belong on the request thread, and they definitely do not belong on an event loop.

We will create one application-scoped virtual-thread executor and qualify it so the intent stays visible.

Create src/main/java/dev/verdictiq/qualifier/PanelWork.java:

package dev.verdictiq.qualifier;

import static java.lang.annotation.ElementType.FIELD;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.ElementType.PARAMETER;
import static java.lang.annotation.RetentionPolicy.RUNTIME;

import java.lang.annotation.Retention;
import java.lang.annotation.Target;

import jakarta.inject.Qualifier;

@Qualifier
@Retention(RUNTIME)
@Target({ FIELD, PARAMETER, METHOD })
public @interface PanelWork {
}

Now create src/main/java/dev/verdictiq/service/PanelExecutorProducer.java:

package dev.verdictiq.service;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import dev.verdictiq.qualifier.PanelWork;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Disposes;
import jakarta.enterprise.inject.Produces;

@ApplicationScoped
public class PanelExecutorProducer {

    @Produces
    @ApplicationScoped
    @PanelWork
    ExecutorService panelExecutor() {
        return Executors.newVirtualThreadPerTaskExecutor();
    }

    void close(@Disposes @PanelWork ExecutorService executorService) throws Exception {
        executorService.close();
    }
}

The executor exists once, stays explicit, and shuts down with the application.
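
As a standalone illustration of why virtual threads suit this kind of work, the sketch below runs two simulated blocking calls on a virtual-thread executor. It assumes JDK 21 or newer. slowClassify is a made-up stand-in for an Ollama call, and the executor is opened with try-with-resources here instead of being a CDI-managed bean.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelPanelSketch {

    // Stand-in for a blocking Ollama call; the sleep simulates inference time.
    static String slowClassify(String model, long millis) throws InterruptedException {
        Thread.sleep(millis);
        return model + ":NEUTRAL";
    }

    static List<String> runBoth() {
        // try-with-resources calls close(), which waits for submitted tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> first = executor.submit(() -> slowClassify("granite", 300));
            Future<String> second = executor.submit(() -> slowClassify("mistral", 300));
            // Both tasks are already running, so the two waits overlap instead of adding up.
            return List.of(first.get(), second.get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the panel", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A panel call failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<String> results = runBoth();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(results + " in about " + elapsedMillis + " ms");
    }
}
```

Each submitted task gets its own virtual thread, so a blocked call parks cheaply instead of holding a platform thread.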

Activate request context for off-thread AI calls

LangChain4j registers @RegisterAiService beans as request-scoped by default. That is fine on the HTTP thread. It breaks the moment we call those beans from the virtual-thread executor or from a Signals receiver.

The fix is a separate @ApplicationScoped invoker with @ActivateRequestContext on each method. CDI does not guarantee interception on self-invocation, and private methods are never intercepted, so this cannot live inside VerdictPanel or PanelArbiter as a private helper.

Create src/main/java/dev/verdictiq/ai/PanelAiInvoker.java:

package dev.verdictiq.ai;

import dev.verdictiq.model.ModelVerdict;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.context.control.ActivateRequestContext;

@ApplicationScoped
public class PanelAiInvoker {

    private final GranitePanelist granitePanelist;
    private final MistralPanelist mistralPanelist;
    private final JudgeAiService judgeAiService;

    public PanelAiInvoker(
            GranitePanelist granitePanelist,
            MistralPanelist mistralPanelist,
            JudgeAiService judgeAiService) {
        this.granitePanelist = granitePanelist;
        this.mistralPanelist = mistralPanelist;
        this.judgeAiService = judgeAiService;
    }

    @ActivateRequestContext
    public ModelVerdict classifyWithGranite(String text) {
        return granitePanelist.classify(text).normalized();
    }

    @ActivateRequestContext
    public ModelVerdict classifyWithMistral(String text) {
        return mistralPanelist.classify(text).normalized();
    }

    @ActivateRequestContext
    public ModelVerdict adjudicate(
            String text,
            String labelA,
            String reasonA,
            String labelB,
            String reasonB) {
        return judgeAiService.adjudicate(text, labelA, reasonA, labelB, reasonB).normalized();
    }
}

normalized() is the guardrail. Local models can return null labels even when the HTTP call succeeds. We coerce those gaps to UNCERTAIN before the workflow stores panel state or wakes the judge.

Without this bridge, the first live curl poll after submit comes back FAILED with a “RequestScoped context was not active” message. The stub-based tests still pass because the test alternatives are @ApplicationScoped.

Build the panel workflow

Now we can write the main service.

Save a PENDING verdict
Run both panelists in parallel
Store consensus immediately or store the disagreement state
Publish a DisagreementEvent only when needed

Create src/main/java/dev/verdictiq/service/VerdictPanel.java:

package dev.verdictiq.service;

import java.util.UUID;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

import dev.verdictiq.ai.PanelAiInvoker;
import dev.verdictiq.model.DisagreementEvent;
import dev.verdictiq.model.ModelVerdict;
import dev.verdictiq.model.PanelVerdict;
import dev.verdictiq.qualifier.PanelWork;
import io.quarkus.signals.Signal;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class VerdictPanel {

    private final PanelAiInvoker panelAiInvoker;
    private final VerdictStore store;
    private final Signal<DisagreementEvent> disagreementSignal;
    private final ExecutorService panelExecutor;

    public VerdictPanel(
            PanelAiInvoker panelAiInvoker,
            VerdictStore store,
            Signal<DisagreementEvent> disagreementSignal,
            @PanelWork ExecutorService panelExecutor) {
        this.panelAiInvoker = panelAiInvoker;
        this.store = store;
        this.disagreementSignal = disagreementSignal;
        this.panelExecutor = panelExecutor;
    }

    public String submit(String text) {
        String id = UUID.randomUUID().toString();
        store.put(PanelVerdict.pending(id, text));
        panelExecutor.submit(() -> runPanel(id, text));
        return id;
    }

    private void runPanel(String id, String text) {
        Future<ModelVerdict> graniteFuture = panelExecutor.submit(() -> panelAiInvoker.classifyWithGranite(text));
        Future<ModelVerdict> mistralFuture = panelExecutor.submit(() -> panelAiInvoker.classifyWithMistral(text));

        try {
            ModelVerdict granite = graniteFuture.get();
            ModelVerdict mistral = mistralFuture.get();

            if (granite.label() == mistral.label()) {
                store.put(PanelVerdict.consensus(id, text, granite, mistral));
                return;
            }

            store.put(PanelVerdict.disagreement(id, text, granite, mistral));

            disagreementSignal.publish(new DisagreementEvent(
                    id,
                    text,
                    granite.label(),
                    granite.reason(),
                    mistral.label(),
                    mistral.reason()));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            markFailed(id, text, e);
        } catch (ExecutionException | RuntimeException e) {
            markFailed(id, text, e);
        }
    }

    private void markFailed(String id, String text, Exception failure) {
        String message = failure.getCause() != null && failure.getCause().getMessage() != null
                ? failure.getCause().getMessage()
                : failure.getMessage();

        PanelVerdict current = store.find(id).orElse(PanelVerdict.pending(id, text));
        store.put(current.failed(message == null ? "Panel processing failed." : message));
    }
}

The service returns the ID before inference completes, so the HTTP contract stays fast. The actual panel work moves to virtual threads, where blocking Ollama calls are normal.

Notice what publish() is doing here. It is not “send this to the next step.” It is “announce that the panel disagreed.” If we later decide to add an audit receiver, a metric receiver, or a human-review receiver, the panel service does not need to know.
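
The "announce, do not address" idea can be shown in plain Java. The sketch below is an analogy only and does not use the Quarkus Signals API: Announcer is a hypothetical class, and it delivers synchronously on the caller's thread, which is simpler than what the extension does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

class AnnouncementSketch {

    // Hypothetical stand-in for a signal channel; Quarkus Signals handles this wiring in the real app.
    static class Announcer<T> {
        private final List<Consumer<T>> receivers = new CopyOnWriteArrayList<>();

        void register(Consumer<T> receiver) {
            receivers.add(receiver);
        }

        // The publisher hands over the event and does not know who receives it.
        void publish(T event) {
            receivers.forEach(receiver -> receiver.accept(event));
        }
    }

    record Disagreement(String verdictId) {}

    static List<String> demo() {
        Announcer<Disagreement> announcer = new Announcer<>();
        List<String> log = new ArrayList<>();

        announcer.register(event -> log.add("judge:" + event.verdictId()));
        // Adding an audit receiver later needs no change on the publishing side.
        announcer.register(event -> log.add("audit:" + event.verdictId()));

        announcer.publish(new Disagreement("v-1"));
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The publishing code calls publish() once, whether one receiver or five are registered.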

Let Signals wake the judge

Now we wire the disagreement path. We listen for DisagreementEvent, call the judge model, and update the stored verdict.

Create src/main/java/dev/verdictiq/service/PanelArbiter.java:

package dev.verdictiq.service;

import org.jboss.logging.Logger;

import dev.verdictiq.ai.PanelAiInvoker;
import dev.verdictiq.model.DisagreementEvent;
import dev.verdictiq.model.ModelVerdict;
import dev.verdictiq.model.PanelVerdict;
import dev.verdictiq.model.Sentiment;
import io.quarkus.signals.Receives;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class PanelArbiter {

    private static final Logger LOG = Logger.getLogger(PanelArbiter.class);

    private final PanelAiInvoker panelAiInvoker;
    private final VerdictStore store;

    public PanelArbiter(PanelAiInvoker panelAiInvoker, VerdictStore store) {
        this.panelAiInvoker = panelAiInvoker;
        this.store = store;
    }

    void onDisagreement(@Receives DisagreementEvent event) {
        PanelVerdict current = store.find(event.verdictId()).orElse(PanelVerdict.pending(event.verdictId(), event.text()));

        try {
            ModelVerdict finalVerdict = panelAiInvoker.adjudicate(
                    event.text(),
                    labelName(event.graniteLabel()),
                    safeReason(event.graniteReason()),
                    labelName(event.mistralLabel()),
                    safeReason(event.mistralReason()));

            boolean abstained = finalVerdict.label() == Sentiment.UNCERTAIN;

            store.put(current.adjudicated(finalVerdict.label(), finalVerdict.reason(), abstained));
        } catch (RuntimeException e) {
            LOG.errorf(e, "Judge failed for verdict %s", event.verdictId());
            store.put(current.failed("Judge failed: " + safeMessage(e)));
        }
    }

    private String labelName(Sentiment label) {
        return label != null ? label.name() : Sentiment.UNCERTAIN.name();
    }

    private String safeReason(String reason) {
        return reason != null && !reason.isBlank() ? reason : "Model returned no reason.";
    }

    private String safeMessage(Throwable failure) {
        return failure.getMessage() == null ? failure.getClass().getSimpleName() : failure.getMessage();
    }
}

I expected to put @Blocking on this receiver because the judge call is another blocking Ollama request. But @Blocking is limited to Quarkus entrypoints, and a Signals receiver is not one of them.

@Receives is on the method parameter, which is the Signals receiver shape the guide documents
The store is updated before and after the signal hop, so polling always has a real state to read

If you are coming from CDI events, qualifier semantics are the part to watch. A receiver with no qualifier listens on @Default, not “all signals of this type.” We do not need qualifiers in this sample, but keep that rule in mind when your workflow grows.

Expose the REST endpoints

The REST layer is straightforward.

Create src/main/java/dev/verdictiq/rest/VerdictResource.java:

package dev.verdictiq.rest;

import dev.verdictiq.model.PanelVerdict;
import dev.verdictiq.model.SubmissionAccepted;
import dev.verdictiq.model.SubmitVerdictRequest;
import dev.verdictiq.model.VerdictStatus;
import dev.verdictiq.service.VerdictPanel;
import dev.verdictiq.service.VerdictStore;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;

@Path("/verdict")
@ApplicationScoped
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
public class VerdictResource {

    private final VerdictPanel verdictPanel;
    private final VerdictStore store;

    public VerdictResource(VerdictPanel verdictPanel, VerdictStore store) {
        this.verdictPanel = verdictPanel;
        this.store = store;
    }

    @POST
    public Response submit(SubmitVerdictRequest request) {
        if (request == null || request.text() == null || request.text().isBlank()) {
            return Response.status(Response.Status.BAD_REQUEST)
                    .entity(new ErrorMessage("text is required"))
                    .build();
        }

        String id = verdictPanel.submit(request.text());
        return Response.accepted(new SubmissionAccepted(id, VerdictStatus.PENDING.name())).build();
    }

    @GET
    @Path("/{id}")
    public Response get(@PathParam("id") String id) {
        return store.find(id)
                .<Response>map(verdict -> Response.ok(verdict).build())
                .orElseGet(() -> Response.status(Response.Status.NOT_FOUND).build());
    }

    public record ErrorMessage(String error) {
    }
}

The API surface stays small as well.

Run it once before we talk about tests

Start dev mode:

./mvnw quarkus:dev

Submit one ambiguous sentence:

curl -s -X POST http://localhost:8080/verdict \
  -H 'Content-Type: application/json' \
  -d '{"text":"I guess the service was fine, not terrible."}' | jq

Expected shape:

{
  "id": "a6e10a6d-373e-4246-87a2-94be71e0bb1e",
  "status": "PENDING"
}

Poll the verdict. On a disagreement path, the first poll may still show PENDING with both panel opinions filled in while qwen3:4b adjudicates. Keep polling until status leaves PENDING:

curl -s http://localhost:8080/verdict/a6e10a6d-373e-4246-87a2-94be71e0bb1e | jq
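
If you would rather script the polling than repeat curl by hand, the loop can be sketched like this. The status source is a Supplier so the sketch runs without a server; a real client would replace it with a GET /verdict/{id} call and read the status field. PollingSketch and pollUntilSettled are illustrative names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

class PollingSketch {

    // Calls fetchStatus until it returns something other than PENDING or the deadline passes.
    static String pollUntilSettled(Supplier<String> fetchStatus, Duration timeout, Duration interval) {
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            String status = fetchStatus.get();
            if (!"PENDING".equals(status)) {
                return status;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while polling", e);
            }
        }
        throw new IllegalStateException("Timed out while the verdict was still PENDING");
    }

    public static void main(String[] args) {
        // Fake status source: PENDING twice, then COMPLETE.
        int[] calls = { 0 };
        Supplier<String> fake = () -> ++calls[0] < 3 ? "PENDING" : "COMPLETE";

        System.out.println(pollUntilSettled(fake, Duration.ofSeconds(5), Duration.ofMillis(50)));
        System.out.println("polls: " + calls[0]);
    }
}
```

Both COMPLETE and FAILED end the loop, so the caller still has to check which one came back.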

On my laptop, that sentence disagreed and the judge abstained:

{
  "id": "a6e10a6d-373e-4246-87a2-94be71e0bb1e",
  "text": "I guess the service was fine, not terrible.",
  "status": "COMPLETE",
  "graniteLabel": "UNCERTAIN",
  "graniteReason": "The phrase 'I guess' indicates uncertainty about the overall experience.",
  "mistralLabel": "NEUTRAL",
  "mistralReason": "The text mentions that the service was 'fine' but also notes it wasn't 'terrible', indicating a neutral sentiment.",
  "agreement": false,
  "finalVerdict": "UNCERTAIN",
  "finalReason": "The phrase 'I guess' indicates uncertainty about the service.",
  "abstained": true
}

That is what we built for. Granite saw hedging. Mistral saw neutral. The judge returned UNCERTAIN and set abstained = true.

When the panel agrees instead, the same poll shape collapses to consensus: matching graniteLabel and mistralLabel, agreement = true, finalReason = "Panel consensus", and abstained = false.

Run the stub tests before you test against real Ollama models. VerdictResourceTest uses fake AI services from src/test/java/dev/verdictiq/testsupport/ instead of Ollama. Those checks stay fast and give the same result every time.

The default suite covers these cases:

blank input returns 400
unknown verdict IDs return 404
panel consensus completes without the judge
disagreement can end in UNCERTAIN
a panel exception becomes FAILED
judge failure becomes FAILED

Keep that suite green on every change. The Ollama test below checks something else.

Test ambiguous sentences with Ollama

This section is where the sample gets useful. Clean demo text is polite. Ambiguous text tells you more.

Create src/test/resources/ambiguous-texts.json:

[
  { "category": "sarcasm", "text": "Oh great, another Monday." },
  { "category": "sarcasm", "text": "Sure, because that always works out so well." },
  { "category": "double-negative", "text": "I wouldn't say it wasn't without its problems." },
  { "category": "cultural-idiom", "text": "That presentation was sick." },
  { "category": "cultural-idiom", "text": "The food was literally fire." },
  { "category": "dry-humor", "text": "Fantastic. Everything is on fire. Wonderful." },
  { "category": "mixed-signal", "text": "The hotel was conveniently located near the airport, which meant we could hear every plane." },
  { "category": "understatement", "text": "The flight delay was not ideal." },
  { "category": "clearly-positive", "text": "This is the best Java framework I have ever used." },
  { "category": "clearly-negative", "text": "The deployment failed and took down production for six hours." }
]

I keep this in a resource file so I can add more sentences over time. When a local model surprises you, add that sentence here.

Now create src/test/java/dev/verdictiq/AmbiguousText.java:

package dev.verdictiq;

public record AmbiguousText(String category, String text) {
}

and src/test/java/dev/verdictiq/VerdictBatteryTest.java:

package dev.verdictiq;

import static io.restassured.RestAssured.given;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertNotNull;

import java.io.IOException;
import java.io.InputStream;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

import org.jboss.logging.Logger;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.condition.EnabledIfSystemProperty;

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

import dev.verdictiq.model.PanelVerdict;
import dev.verdictiq.model.SubmissionAccepted;
import dev.verdictiq.model.SubmitVerdictRequest;
import dev.verdictiq.model.VerdictStatus;
import io.quarkus.test.junit.QuarkusTest;
import io.restassured.http.ContentType;
import jakarta.inject.Inject;

@QuarkusTest
@EnabledIfSystemProperty(named = "verdictiq.live", matches = "true")
class VerdictBatteryTest {

    private static final Logger LOG = Logger.getLogger(VerdictBatteryTest.class);

    @Inject
    ObjectMapper objectMapper;

    @Test
    void runsTheAmbiguousBattery() throws Exception {
        List<AmbiguousText> samples = loadSamples();
        List<PanelVerdict> results = new ArrayList<>();

        for (AmbiguousText sample : samples) {
            SubmissionAccepted accepted = given()
                    .contentType(ContentType.JSON)
                    .body(new SubmitVerdictRequest(sample.text()))
                    .when()
                    .post("/verdict")
                    .then()
                    .statusCode(202)
                    .extract()
                    .as(SubmissionAccepted.class);

            PanelVerdict verdict = waitForVerdict(accepted.id());
            results.add(verdict);

            assertEquals(VerdictStatus.COMPLETE, verdict.status(),
                    () -> "Verdict did not complete for: " + sample.text() + " (" + verdict.finalReason() + ")");
            assertNotNull(verdict.finalVerdict(), () -> "Final label missing for: " + sample.text());
        }

        logSummary(samples, results);
    }

    private List<AmbiguousText> loadSamples() throws IOException {
        try (InputStream stream = Thread.currentThread().getContextClassLoader().getResourceAsStream("ambiguous-texts.json")) {
            if (stream == null) {
                throw new IllegalStateException("ambiguous-texts.json was not found.");
            }
            return objectMapper.readValue(stream, new TypeReference<List<AmbiguousText>>() {
            });
        }
    }

    private PanelVerdict waitForVerdict(String id) throws InterruptedException {
        Instant deadline = Instant.now().plus(Duration.ofMinutes(2));

        while (Instant.now().isBefore(deadline)) {
            PanelVerdict verdict = given()
                    .when()
                    .get("/verdict/{id}", id)
                    .then()
                    .statusCode(200)
                    .extract()
                    .as(PanelVerdict.class);

            if (verdict.status() != VerdictStatus.PENDING) {
                return verdict;
            }

            Thread.sleep(250);
        }

        throw new IllegalStateException("Timed out waiting for verdict " + id);
    }

    private void logSummary(List<AmbiguousText> samples, List<PanelVerdict> results) {
        long disagreements = results.stream().filter(result -> !result.agreement()).count();
        long abstentions = results.stream().filter(PanelVerdict::abstained).count();

        StringBuilder table = new StringBuilder();
        table.append(System.lineSeparator());
        table.append(String.format("%-18s %-6s %-11s %-11s %-11s %-10s%n",
                "category", "agree", "granite", "mistral", "final", "abstained"));

        for (int i = 0; i < samples.size(); i++) {
            AmbiguousText sample = samples.get(i);
            PanelVerdict verdict = results.get(i);

            table.append(String.format("%-18s %-6s %-11s %-11s %-11s %-10s%n",
                    sample.category(),
                    verdict.agreement(),
                    verdict.graniteLabel(),
                    verdict.mistralLabel(),
                    verdict.finalVerdict(),
                    verdict.abstained()));
        }

        table.append(System.lineSeparator());
        table.append("Disagreements: ").append(disagreements).append(System.lineSeparator());
        table.append("Abstentions: ").append(abstentions);

        LOG.info(table.toString());
    }
}

This test calls real Ollama models. The stub tests above already check the workflow shape. I only enable it with -Dverdictiq.live=true so CI does not fail when Ollama is not running, a model is missing, or a borderline sentence changes behavior after a model update.

The assertion style is important:

We assert the contract shape and workflow completion
We log disagreements and abstentions instead of pretending that every local model pair will disagree on the same rows forever
We do not assert exact labels for every sample

That last rule keeps the test honest. These are probabilistic systems. If you pin exact wording or exact labels for every borderline sentence, you are not testing robustness. You are testing whether Tuesday behaves exactly like Monday.

Prove it

Start with the deterministic suite:

./mvnw test

That run should stay stable even if you have not pulled the Ollama models yet.

Then run the Ollama integration test:

./mvnw test -Dverdictiq.live=true

On a laptop, that run takes about five to six minutes. Expect the usual Quarkus test output plus a logged table similar to this:

category           agree  granite     mistral     final       abstained 
sarcasm            false  NEGATIVE    UNCERTAIN   NEGATIVE    false     
sarcasm            false  NEGATIVE    UNCERTAIN   NEGATIVE    false     
double-negative    true   UNCERTAIN   UNCERTAIN   UNCERTAIN   false     
cultural-idiom     false  POSITIVE    UNCERTAIN   POSITIVE    false     
cultural-idiom     false  POSITIVE    UNCERTAIN   POSITIVE    false     
dry-humor          true   UNCERTAIN   UNCERTAIN   UNCERTAIN   false     
mixed-signal       false  UNCERTAIN   NEGATIVE    NEGATIVE    false     
understatement     false  NEGATIVE    UNCERTAIN   NEGATIVE    false     
clearly-positive   false  POSITIVE    UNCERTAIN   POSITIVE    false     
clearly-negative   false  NEGATIVE    UNCERTAIN   NEGATIVE    false  

Disagreements: 8
Abstentions: 0

The exact rows will vary. Look for the pattern:

clear sentences usually converge
ambiguous sentences disagree more often
some of them deserve UNCERTAIN

If you get zero disagreements across these sentences, the input set is too easy, the two panel models are too similar, or both. That result is still useful. It tells you the panel is not buying you much yet.

Make it survive

The happy path works now. Many AI tutorials stop here. Production trouble usually starts here too.

Model latency is your first queue

The panel looks cheap until you count the calls. Two panel calls plus one judge call eat into a single request's latency budget very quickly. If you ever swap the local models for hosted ones, the same fan-out also multiplies API cost.

The POST endpoint avoids holding the client open, and the virtual-thread executor keeps blocking calls off the request path. It does not make model latency disappear. If you expect real throughput, add explicit concurrency limits, request budgets, and maybe a durable queue before you decide this belongs on a public path.
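One way to add an explicit concurrency limit is a semaphore in front of the model calls. The sketch below is plain Java and not part of the VerdictIQ sources: `ModelCallLimiter`, the permit count, and the wait budget are hypothetical names and values picked for illustration.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper: caps how many model calls run at the same time.
final class ModelCallLimiter {

    private final Semaphore permits;
    private final Duration maxWait;

    ModelCallLimiter(int maxConcurrentCalls, Duration maxWait) {
        this.permits = new Semaphore(maxConcurrentCalls);
        this.maxWait = maxWait;
    }

    // Runs the call if a permit frees up within maxWait. Otherwise returns empty,
    // so the caller can reject the request instead of piling up more work.
    <T> Optional<T> tryCall(Supplier<T> modelCall) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWait.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            return Optional.empty();
        }
        if (!acquired) {
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(modelCall.get());
        } finally {
            permits.release(); // always hand the permit back, even if the call throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Because the panel already runs on virtual threads, blocking on the semaphore is cheap. The wait budget is what turns "slow" into an explicit rejection the API can report.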

UNCERTAIN is not failure

UNCERTAIN means the judge looked at the text and refused fake confidence. FAILED means your system broke. Those two states are operationally different, and your API should say so directly.

If you later add alerts or dashboards, count them separately. A spike in UNCERTAIN may mean your input set changed. A spike in FAILED means your service is sick.
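A minimal sketch of counting the two states separately, assuming plain JDK counters rather than any particular metrics library. `Outcome` and `OutcomeCounters` are hypothetical names and do not appear in the VerdictIQ code.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical outcome buckets: a settled label, an honest abstention, a broken run.
enum Outcome { DECIDED, UNCERTAIN, FAILED }

final class OutcomeCounters {

    // The map is fully populated in the constructor and never modified afterwards,
    // so concurrent reads are safe; LongAdder handles the concurrent increments.
    private final Map<Outcome, LongAdder> counters = new EnumMap<>(Outcome.class);

    OutcomeCounters() {
        for (Outcome outcome : Outcome.values()) {
            counters.put(outcome, new LongAdder());
        }
    }

    void record(Outcome outcome) {
        counters.get(outcome).increment();
    }

    long count(Outcome outcome) {
        return counters.get(outcome).sum();
    }
}
```

Keeping UNCERTAIN and FAILED in separate buckets means an alert on one never hides behind the other.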

Keep disagreement in-process until you actually need a broker

Signals are a good fit here because the whole workflow belongs to one process. We want type-safe async coordination, not distributed choreography.

If the next step is “open a human review task in another service” or “fan this event into analytics and audit storage,” then moving the disagreement event to Reactive Messaging may be the right call. Right now it would be extra machinery with no extra truth.

In-memory verdicts need expiry

VerdictStore is a ConcurrentHashMap, and it keeps every verdict forever. That works for a tutorial. After a few days of real traffic, it becomes a memory problem.

If you keep the polling contract, add one of these next:

a TTL cleanup job
a max-size policy
a persistent store with explicit retention

Otherwise the nice simple polling API becomes a slow memory leak with better naming.
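The TTL option can be sketched in plain Java. This is not the article's VerdictStore: `ExpiringStore`, its injected time source, and the method names are assumptions made for illustration, and the cleanup method would still need something to call it periodically, for example a scheduled job.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical TTL wrapper around a ConcurrentHashMap.
final class ExpiringStore<V> {

    private record Entry<T>(T value, Instant storedAt) {}

    private final ConcurrentHashMap<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Duration ttl;
    private final Supplier<Instant> now; // injected so tests can control time

    ExpiringStore(Duration ttl, Supplier<Instant> now) {
        this.ttl = ttl;
        this.now = now;
    }

    void put(String id, V value) {
        entries.put(id, new Entry<>(value, now.get()));
    }

    // Expired entries are hidden from readers even before cleanup removes them.
    Optional<V> get(String id) {
        Entry<V> entry = entries.get(id);
        if (entry == null || isExpired(entry)) {
            return Optional.empty();
        }
        return Optional.of(entry.value());
    }

    // Call this periodically to actually free the memory.
    void evictExpired() {
        entries.values().removeIf(this::isExpired);
    }

    int size() {
        return entries.size();
    }

    private boolean isExpired(Entry<V> entry) {
        return !now.get().isBefore(entry.storedAt().plus(ttl));
    }
}
```

With this shape a client that polls after the TTL sees the same "not found" result whether or not the cleanup has run yet, which keeps the polling contract predictable.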

Pre-pull and warm the models

This sample keeps the Ollama base URL explicit and disables Dev Services, so the runtime story stays simple: start Ollama locally, pull the models once, and use the same endpoint everywhere.

For anything beyond the first local run, pre-pull the models and hit the endpoint once before demos or tests. Cold local inference is still cold inference.
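A warm-up call can be sketched with the JDK HTTP client. This assumes that "the endpoint" means the application's own POST /verdict and that the app listens on Quarkus's default http://localhost:8080. `WarmUp` and its method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical warm-up helper; the base URL is an assumption, not taken from the article.
final class WarmUp {

    // Builds POST {baseUri}/verdict with the { "text": "..." } body the endpoint accepts.
    static HttpRequest warmUpRequest(URI baseUri, String text) {
        // Minimal escaping for this sketch; use a JSON library for arbitrary input.
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return HttpRequest.newBuilder(baseUri.resolve("/verdict"))
                .timeout(Duration.ofMinutes(2))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"text\":\"" + escaped + "\"}"))
                .build();
    }

    // Sends the request and returns the HTTP status code.
    static int send(HttpRequest request) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    public static void main(String[] args) throws Exception {
        int status = send(warmUpRequest(URI.create("http://localhost:8080"), "Warm-up sentence."));
        System.out.println("Warm-up submission returned HTTP " + status);
    }
}
```

The POST only accepts the work, so the panel models load while the verdict is still PENDING; poll GET /verdict/{id} until it settles before you start the demo. The judge only runs on disagreement, so one warm-up request may leave qwen3:4b cold.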

Close the loop

We built a Quarkus service that asks twice, disagrees on purpose, and uses Signals to make disagreement visible. The mental shift is simple: a wrong local-model label gets much worse when the system presents it as final.
