Build a Local AI Jury in Quarkus with Ollama
A hands-on guide to parallel model calls, Quarkus Signals, and a judge step that can return UNCERTAIN when the text stays messy.
Single-model AI demos often treat the first confident answer as if it were a decision. A local model can be fast, cheap, and still wrong in a very polished way. When the text is sarcastic, mixed, or culturally loaded, “the model answered” does not mean “the system knows.”
We build a small Quarkus service that asks twice. granite4:3b and mistral both classify the same text. If they agree, the service returns the consensus. If they disagree, we emit a DisagreementEvent over Quarkus Signals and wake up a third model, qwen3:4b, to adjudicate. The judge can also say UNCERTAIN. That is better than a confident lie.
Read more about Quarkus Signals in my earlier overview.
What we build
VerdictIQ is a small REST app with two endpoints:
POST /verdict accepts { "text": "..." } and returns a verdict ID immediately
GET /verdict/{id} returns the panel state, both model opinions, and the final result
The workflow is small enough to hold in your head:
Store a PENDING verdict
Call granite4:3b and mistral in parallel
If they agree, mark the verdict COMPLETE
If they disagree, publish a DisagreementEvent
Let a Signals receiver call qwen3:4b
Either settle on a final label or admit UNCERTAIN
Everything stays inside one Quarkus process. No broker, no external queue, and no fake “we will harden this later” step. Signals fit here because the whole workflow is local.
What you need
You already know Quarkus REST, CDI, and basic Java concurrency. The new parts here are named AI services and Signals receivers.
JDK 25
Quarkus CLI 3.36.2 or Maven 3.9+
Ollama installed locally
Roughly 6 GB of free disk if you use granite4:3b, mistral, and qwen3:4b
About ☕️☕️☕️☕️ (local models are slow and need a lot of RAM!)
The control flow before code
Before we touch Java, I want the state model to be explicit. Otherwise this becomes another AI demo where “async” means “some stuff happened and then JSON appeared.”
We have three outcomes:
Consensus: the panel agrees, so the result is complete without the judge
Adjudication: the panel disagrees, so the judge picks the final label
Uncertainty: the judge returns UNCERTAIN, which is still a valid completion, not a crash
We also keep one separate failure state:
Failure: model call, parsing, or adjudication blew up, so the verdict becomes FAILED
UNCERTAIN is a model judgment. FAILED is a system problem. If you collapse them into one field, your API lies.
Project setup
Create the project or follow along with the sources in my GitHub repository:
quarkus create app dev.verdictiq:verdictiq \
--extension='rest-jackson,quarkus-signals,quarkus-langchain4j-ollama' \
--platform-version=3.36.2 \
--java=25 \
--no-code
cd verdictiq
Extension roles:
rest-jackson: JSON HTTP endpoints
quarkus-signals: in-process async messaging with publish(), send(), and request()
quarkus-langchain4j-ollama: local chat models through Ollama
Pre-pull the models before you start dev mode. This sample expects a local Ollama runtime, and the first request is a bad time to discover you still need three model downloads:
ollama pull granite4:3b
ollama pull mistral
ollama pull qwen3:4b
I started with llama3.2 for the first panel lane. It looked fine in raw Ollama tests, but LangChain4j’s direct ModelVerdict mapping came back with null labels. granite4:3b is more reliable there. Mistral usually works too, but it can still return incomplete fields, so we normalize those responses before the workflow uses them.
Start with the verdict model
The API contract starts with this record. If it is wrong, the rest of the code only makes the wrong answer faster.
Create src/main/java/dev/verdictiq/model/Sentiment.java:
package dev.verdictiq.model;
public enum Sentiment {
POSITIVE,
NEGATIVE,
NEUTRAL,
UNCERTAIN
}
Create VerdictStatus.java:
package dev.verdictiq.model;
public enum VerdictStatus {
PENDING,
COMPLETE,
FAILED
}
Create ModelVerdict.java:
package dev.verdictiq.model;
import dev.langchain4j.model.output.structured.Description;
public record ModelVerdict(
@Description("One of POSITIVE, NEGATIVE, NEUTRAL, or UNCERTAIN")
Sentiment label,
@Description("One short sentence explaining the classification")
String reason) {
public ModelVerdict normalized() {
Sentiment safeLabel = label != null ? label : Sentiment.UNCERTAIN;
String safeReason = reason != null && !reason.isBlank() ? reason : "Model returned no reason.";
if (safeLabel == label && safeReason.equals(reason)) {
return this;
}
return new ModelVerdict(safeLabel, safeReason);
}
}
Now create PanelVerdict.java:
package dev.verdictiq.model;
public record PanelVerdict(
String id,
String text,
VerdictStatus status,
Sentiment graniteLabel,
String graniteReason,
Sentiment mistralLabel,
String mistralReason,
boolean agreement,
Sentiment finalVerdict,
String finalReason,
boolean abstained) {
public static PanelVerdict pending(String id, String text) {
return new PanelVerdict(id, text, VerdictStatus.PENDING, null, null, null, null, false, null, null, false);
}
public static PanelVerdict consensus(String id, String text, ModelVerdict granite, ModelVerdict mistral) {
return new PanelVerdict(
id,
text,
VerdictStatus.COMPLETE,
granite.label(),
granite.reason(),
mistral.label(),
mistral.reason(),
true,
granite.label(),
"Panel consensus",
false);
}
public static PanelVerdict disagreement(String id, String text, ModelVerdict granite, ModelVerdict mistral) {
return new PanelVerdict(
id,
text,
VerdictStatus.PENDING,
granite.label(),
granite.reason(),
mistral.label(),
mistral.reason(),
false,
null,
null,
false);
}
public PanelVerdict adjudicated(Sentiment finalVerdict, String finalReason, boolean abstained) {
return new PanelVerdict(
id,
text,
VerdictStatus.COMPLETE,
graniteLabel,
graniteReason,
mistralLabel,
mistralReason,
agreement,
finalVerdict,
finalReason,
abstained);
}
public PanelVerdict failed(String finalReason) {
return new PanelVerdict(
id,
text,
VerdictStatus.FAILED,
graniteLabel,
graniteReason,
mistralLabel,
mistralReason,
agreement,
null,
finalReason,
false);
}
}
I keep the state transitions on the record because the workflow code stays readable.
We also need the signal payload. Create DisagreementEvent.java:
package dev.verdictiq.model;
public record DisagreementEvent(
String verdictId,
String text,
Sentiment graniteLabel,
String graniteReason,
Sentiment mistralLabel,
String mistralReason) {
}
Finally, create the REST DTOs. SubmitVerdictRequest.java:
package dev.verdictiq.model;
public record SubmitVerdictRequest(String text) {
}
and SubmissionAccepted.java:
package dev.verdictiq.model;
public record SubmissionAccepted(String id, String status) {
}
Define the three AI services
The three AI services return ModelVerdict directly, and LangChain4j generates the JSON schema from the record. The @Description annotations on ModelVerdict help the model understand each field.
Create src/main/java/dev/verdictiq/ai/GranitePanelist.java:
package dev.verdictiq.ai;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.verdictiq.model.ModelVerdict;
import io.quarkiverse.langchain4j.RegisterAiService;
@RegisterAiService(modelName = "granite")
public interface GranitePanelist {
@SystemMessage("""
You are a sentiment analysis expert.
Classify the sentiment of the text.
Use UNCERTAIN when the text is genuinely ambiguous.
Keep the reason to one short sentence.
""")
@UserMessage("Analyze the sentiment of this text: {{text}}")
ModelVerdict classify(String text);
}
Create MistralPanelist.java:
package dev.verdictiq.ai;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.verdictiq.model.ModelVerdict;
import io.quarkiverse.langchain4j.RegisterAiService;
@RegisterAiService(modelName = "mistral")
public interface MistralPanelist {
@SystemMessage("""
You are a sentiment analysis expert.
Classify the sentiment of the text.
Use UNCERTAIN when the text is genuinely ambiguous.
Keep the reason to one short sentence.
""")
@UserMessage("Analyze the sentiment of this text: {{text}}")
ModelVerdict classify(String text);
}
and JudgeAiService.java:
package dev.verdictiq.ai;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.verdictiq.model.ModelVerdict;
import io.quarkiverse.langchain4j.RegisterAiService;
@RegisterAiService(modelName = "judge")
public interface JudgeAiService {
@SystemMessage("""
You are a senior sentiment arbiter.
Two models already reviewed the same text and disagreed.
Use UNCERTAIN when the text is genuinely ambiguous.
Keep the reason to one short sentence.
""")
@UserMessage("""
Text: {{text}}
Model A said {{labelA}} because: {{reasonA}}
Model B said {{labelB}} because: {{reasonB}}
Choose the final label.
""")
ModelVerdict adjudicate(String text, String labelA, String reasonA, String labelB, String reasonB);
}
The modelName attribute on @RegisterAiService is the currently documented format. You may still see older snippets using @ModelName directly on the interface. For AI services, modelName = "..." is the clearer option now.
Configure the named models
Now wire the models in src/main/resources/application.properties:
quarkus.langchain4j.ollama.base-url=http://localhost:11434
quarkus.langchain4j.ollama.timeout=240s
quarkus.langchain4j.ollama.devservices.enabled=false
quarkus.langchain4j.devservices.enabled=false
# Named panel models
quarkus.langchain4j.granite.chat-model.provider=ollama
quarkus.langchain4j.ollama.granite.chat-model.model-id=granite4:3b
quarkus.langchain4j.ollama.granite.chat-model.temperature=0.0
quarkus.langchain4j.ollama.granite.timeout=120s
quarkus.langchain4j.mistral.chat-model.provider=ollama
quarkus.langchain4j.ollama.mistral.chat-model.model-id=mistral
quarkus.langchain4j.ollama.mistral.chat-model.temperature=0.0
quarkus.langchain4j.ollama.mistral.timeout=120s
# Named judge model
quarkus.langchain4j.judge.chat-model.provider=ollama
quarkus.langchain4j.ollama.judge.chat-model.model-id=qwen3:4b
quarkus.langchain4j.ollama.judge.chat-model.temperature=0.0
quarkus.langchain4j.ollama.judge.timeout=120s
Every model lane is named, including granite
Named models need the generic provider bridge, like quarkus.langchain4j.mistral.chat-model.provider=ollama
Provider-specific settings for named models live under quarkus.langchain4j.ollama.<name>.*
Named lanes also need their own timeout. The global quarkus.langchain4j.ollama.timeout does not cover them. Without quarkus.langchain4j.ollama.judge.timeout, the judge falls back to a 10-second HTTP client limit. qwen3:4b can take 20-30 seconds on a laptop, so the disagreement path looks stuck at PENDING even when the panel already finished. I keep temperature at 0.0 because we are classifying, not creative writing.
Add a store for polling
The POST endpoint returns immediately, so the GET endpoint needs somewhere to poll. This tutorial keeps that state in memory on purpose. We are exploring workflow shape, not distributed persistence.
Create src/main/java/dev/verdictiq/service/VerdictStore.java:
package dev.verdictiq.service;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import dev.verdictiq.model.PanelVerdict;
import jakarta.enterprise.context.ApplicationScoped;
@ApplicationScoped
public class VerdictStore {
private final ConcurrentHashMap<String, PanelVerdict> store = new ConcurrentHashMap<>();
public void put(PanelVerdict verdict) {
store.put(verdict.id(), verdict);
}
public Optional<PanelVerdict> find(String id) {
return Optional.ofNullable(store.get(id));
}
}
This works for one-process demos and Ollama integration tests, where the point is the async boundary. If you need durable history later, keep the polling contract and move the store to a database.
Run the panel on virtual threads
Both panel calls are blocking HTTP calls to Ollama. They do not belong on the request thread, and they definitely do not belong on an event loop.
We will create one application-scoped virtual-thread executor and qualify it so the intent stays visible.
Create src/main/java/dev/verdictiq/qualifier/PanelWork.java:
package dev.verdictiq.qualifier;
import static java.lang.annotation.ElementType.FIELD;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.ElementType.PARAMETER;
import static java.lang.annotation.RetentionPolicy.RUNTIME;
import java.lang.annotation.Retention;
import java.lang.annotation.Target;
import jakarta.inject.Qualifier;
@Qualifier
@Retention(RUNTIME)
@Target({ FIELD, PARAMETER, METHOD })
public @interface PanelWork {
}
Now create src/main/java/dev/verdictiq/service/PanelExecutorProducer.java:
package dev.verdictiq.service;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import dev.verdictiq.qualifier.PanelWork;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Disposes;
import jakarta.enterprise.inject.Produces;
@ApplicationScoped
public class PanelExecutorProducer {
@Produces
@ApplicationScoped
@PanelWork
ExecutorService panelExecutor() {
return Executors.newVirtualThreadPerTaskExecutor();
}
void close(@Disposes @PanelWork ExecutorService executorService) throws Exception {
executorService.close();
}
}
The executor exists once, stays explicit, and shuts down with the application.
Activate request context for off-thread AI calls
LangChain4j registers @RegisterAiService beans as request-scoped by default. That is fine on the HTTP thread. It breaks the moment we call those beans from the virtual-thread executor or from a Signals receiver.
The fix is a separate @ApplicationScoped invoker with @ActivateRequestContext on each method. CDI interceptors do not apply to self-invocation, so this cannot live inside VerdictPanel or PanelArbiter as a private helper.
Create src/main/java/dev/verdictiq/ai/PanelAiInvoker.java:
package dev.verdictiq.ai;
import dev.verdictiq.model.ModelVerdict;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.context.control.ActivateRequestContext;
@ApplicationScoped
public class PanelAiInvoker {
private final GranitePanelist granitePanelist;
private final MistralPanelist mistralPanelist;
private final JudgeAiService judgeAiService;
public PanelAiInvoker(
GranitePanelist granitePanelist,
MistralPanelist mistralPanelist,
JudgeAiService judgeAiService) {
this.granitePanelist = granitePanelist;
this.mistralPanelist = mistralPanelist;
this.judgeAiService = judgeAiService;
}
@ActivateRequestContext
public ModelVerdict classifyWithGranite(String text) {
return granitePanelist.classify(text).normalized();
}
@ActivateRequestContext
public ModelVerdict classifyWithMistral(String text) {
return mistralPanelist.classify(text).normalized();
}
@ActivateRequestContext
public ModelVerdict adjudicate(
String text,
String labelA,
String reasonA,
String labelB,
String reasonB) {
return judgeAiService.adjudicate(text, labelA, reasonA, labelB, reasonB).normalized();
}
}
normalized() is the guardrail. Local models can return null labels even when the HTTP call succeeds. We coerce those gaps to UNCERTAIN before the workflow stores panel state or wakes the judge.
Without this bridge, the first live curl poll after submit will come back FAILED with a RequestScoped context was not active message. The stub-based tests still pass because the test alternatives are @ApplicationScoped.
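The normalized() guardrail is easy to check on its own. Here is a minimal sketch that copies its logic into a standalone record without the LangChain4j annotations, so it runs with nothing but the JDK. Label, Verdict, and NormalizedDemo are stand-in names for this example only; they are not part of the project.

```java
// Standalone sketch: mirrors ModelVerdict.normalized() without LangChain4j.
// Label and Verdict are illustrative stand-ins for Sentiment and ModelVerdict.
enum Label { POSITIVE, NEGATIVE, NEUTRAL, UNCERTAIN }

record Verdict(Label label, String reason) {

    // Replace missing fields with safe defaults; return this when nothing changes.
    Verdict normalized() {
        Label safeLabel = label != null ? label : Label.UNCERTAIN;
        String safeReason = reason != null && !reason.isBlank() ? reason : "Model returned no reason.";
        if (safeLabel == label && safeReason.equals(reason)) {
            return this;
        }
        return new Verdict(safeLabel, safeReason);
    }
}

class NormalizedDemo {
    public static void main(String[] args) {
        // A response with a null label and a blank reason gets both defaults.
        Verdict incomplete = new Verdict(null, "  ");
        System.out.println(incomplete.normalized());

        // A complete response passes through as the same instance.
        Verdict complete = new Verdict(Label.NEGATIVE, "Clear complaint.");
        System.out.println(complete.normalized() == complete);
    }
}
```

The point of the sketch is the asymmetry: a broken response becomes an explicit UNCERTAIN with a placeholder reason, while a healthy response is not copied or altered.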
Build the panel workflow
Now we can write the main service.
Save a
PENDINGverdictRun both panelists in parallel
Store consensus immediately or store the disagreement state
Publish a
DisagreementEventonly when needed
Create src/main/java/dev/verdictiq/service/VerdictPanel.java:
package dev.verdictiq.service;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.UUID;
import dev.verdictiq.ai.PanelAiInvoker;
import dev.verdictiq.model.DisagreementEvent;
import dev.verdictiq.model.ModelVerdict;
import dev.verdictiq.model.PanelVerdict;
import dev.verdictiq.qualifier.PanelWork;
import io.quarkus.signals.Signal;
import jakarta.enterprise.context.ApplicationScoped;
@ApplicationScoped
public class VerdictPanel {
private final PanelAiInvoker panelAiInvoker;
private final VerdictStore store;
private final Signal<DisagreementEvent> disagreementSignal;
private final ExecutorService panelExecutor;
public VerdictPanel(
PanelAiInvoker panelAiInvoker,
VerdictStore store,
Signal<DisagreementEvent> disagreementSignal,
@PanelWork ExecutorService panelExecutor) {
this.panelAiInvoker = panelAiInvoker;
this.store = store;
this.disagreementSignal = disagreementSignal;
this.panelExecutor = panelExecutor;
}
public String submit(String text) {
String id = UUID.randomUUID().toString();
store.put(PanelVerdict.pending(id, text));
panelExecutor.submit(() -> runPanel(id, text));
return id;
}
private void runPanel(String id, String text) {
Future<ModelVerdict> graniteFuture = panelExecutor.submit(() -> panelAiInvoker.classifyWithGranite(text));
Future<ModelVerdict> mistralFuture = panelExecutor.submit(() -> panelAiInvoker.classifyWithMistral(text));
try {
ModelVerdict granite = graniteFuture.get();
ModelVerdict mistral = mistralFuture.get();
if (granite.label() == mistral.label()) {
store.put(PanelVerdict.consensus(id, text, granite, mistral));
return;
}
store.put(PanelVerdict.disagreement(id, text, granite, mistral));
disagreementSignal.publish(new DisagreementEvent(
id,
text,
granite.label(),
granite.reason(),
mistral.label(),
mistral.reason()));
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
markFailed(id, text, e);
} catch (ExecutionException | RuntimeException e) {
markFailed(id, text, e);
}
}
private void markFailed(String id, String text, Exception failure) {
String message = failure.getCause() != null && failure.getCause().getMessage() != null
? failure.getCause().getMessage()
: failure.getMessage();
PanelVerdict current = store.find(id).orElse(PanelVerdict.pending(id, text));
store.put(current.failed(message == null ? "Panel processing failed." : message));
}
}
The service returns the ID before inference completes, so the HTTP contract stays fast. The actual panel work moves to virtual threads, where blocking Ollama calls are normal.
Notice what publish() is doing here. It is not “send this to the next step.” It is “announce that the panel disagreed.” If we later decide to add an audit receiver, a metric receiver, or a human-review receiver, the panel service does not need to know.
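If you want to poke at the fan-out-and-compare step from runPanel without Ollama or Quarkus, a plain-JDK sketch is enough. This one assumes two hypothetical Callable<String> stand-ins for the panel models and uses a small fixed thread pool in place of the virtual-thread executor, so it also runs on older JDKs. PanelSketch is an illustrative name, not a class from the project.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: PanelSketch and its Callable arguments are hypothetical stand-ins
// for the two panel AI services. Nothing here calls Ollama.
class PanelSketch {

    // Runs both classifiers concurrently and returns the shared label,
    // or Optional.empty() when they disagree.
    static Optional<String> consensus(ExecutorService executor,
                                      Callable<String> first,
                                      Callable<String> second) {
        // Submit both tasks before waiting on either, so the calls overlap.
        Future<String> a = executor.submit(first);
        Future<String> b = executor.submit(second);
        try {
            String labelA = a.get();
            String labelB = b.get();
            return labelA.equals(labelB) ? Optional.of(labelA) : Optional.empty();
        } catch (InterruptedException e) {
            // Restore the interrupt flag, as the article's runPanel does.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the panel", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A panel call failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // A fixed pool stands in for the virtual-thread executor used in the article.
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            System.out.println(consensus(executor, () -> "NEGATIVE", () -> "NEGATIVE"));
            System.out.println(consensus(executor, () -> "NEGATIVE", () -> "UNCERTAIN"));
        } finally {
            executor.shutdown();
        }
    }
}
```

An empty result is the moment where the real service stores the disagreement state and publishes the event.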
Let Signals wake the judge
Now we wire the disagreement path. We listen for DisagreementEvent, call the judge model, and update the stored verdict.
Create src/main/java/dev/verdictiq/service/PanelArbiter.java:
package dev.verdictiq.service;
import org.jboss.logging.Logger;
import dev.verdictiq.ai.PanelAiInvoker;
import dev.verdictiq.model.DisagreementEvent;
import dev.verdictiq.model.ModelVerdict;
import dev.verdictiq.model.PanelVerdict;
import dev.verdictiq.model.Sentiment;
import io.quarkus.signals.Receives;
import jakarta.enterprise.context.ApplicationScoped;
@ApplicationScoped
public class PanelArbiter {
private static final Logger LOG = Logger.getLogger(PanelArbiter.class);
private final PanelAiInvoker panelAiInvoker;
private final VerdictStore store;
public PanelArbiter(PanelAiInvoker panelAiInvoker, VerdictStore store) {
this.panelAiInvoker = panelAiInvoker;
this.store = store;
}
void onDisagreement(@Receives DisagreementEvent event) {
PanelVerdict current = store.find(event.verdictId()).orElse(PanelVerdict.pending(event.verdictId(), event.text()));
try {
ModelVerdict finalVerdict = panelAiInvoker.adjudicate(
event.text(),
labelName(event.graniteLabel()),
safeReason(event.graniteReason()),
labelName(event.mistralLabel()),
safeReason(event.mistralReason()));
boolean abstained = finalVerdict.label() == Sentiment.UNCERTAIN;
store.put(current.adjudicated(finalVerdict.label(), finalVerdict.reason(), abstained));
} catch (RuntimeException e) {
LOG.errorf(e, "Judge failed for verdict %s", event.verdictId());
store.put(current.failed("Judge failed: " + safeMessage(e)));
}
}
private String labelName(Sentiment label) {
return label != null ? label.name() : Sentiment.UNCERTAIN.name();
}
private String safeReason(String reason) {
return reason != null && !reason.isBlank() ? reason : "Model returned no reason.";
}
private String safeMessage(Throwable failure) {
return failure.getMessage() == null ? failure.getClass().getSimpleName() : failure.getMessage();
}
}
I expected to put @Blocking on this receiver because the judge call is another blocking Ollama request. But @Blocking is limited to Quarkus entrypoints, and a Signals receiver is not one of them.
@Receives is on the method parameter, which is the Signals receiver shape the guide documents
The store is updated before and after the signal hop, so polling always has a real state to read
If you are coming from CDI events, qualifier semantics are the part to watch. A receiver with no qualifier listens on @Default, not “all signals of this type.” We do not need qualifiers in this sample, but keep that rule in mind when your workflow grows.
Expose the REST endpoints
The REST layer is straightforward:
Create src/main/java/dev/verdictiq/rest/VerdictResource.java:
package dev.verdictiq.rest;
import dev.verdictiq.model.PanelVerdict;
import dev.verdictiq.model.SubmissionAccepted;
import dev.verdictiq.model.SubmitVerdictRequest;
import dev.verdictiq.model.VerdictStatus;
import dev.verdictiq.service.VerdictPanel;
import dev.verdictiq.service.VerdictStore;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;
@Path("/verdict")
@ApplicationScoped
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
public class VerdictResource {
private final VerdictPanel verdictPanel;
private final VerdictStore store;
public VerdictResource(VerdictPanel verdictPanel, VerdictStore store) {
this.verdictPanel = verdictPanel;
this.store = store;
}
@POST
public Response submit(SubmitVerdictRequest request) {
if (request == null || request.text() == null || request.text().isBlank()) {
return Response.status(Response.Status.BAD_REQUEST)
.entity(new ErrorMessage("text is required"))
.build();
}
String id = verdictPanel.submit(request.text());
return Response.accepted(new SubmissionAccepted(id, VerdictStatus.PENDING.name())).build();
}
@GET
@Path("/{id}")
public Response get(@PathParam("id") String id) {
return store.find(id)
.<Response>map(verdict -> Response.ok(verdict).build())
.orElseGet(() -> Response.status(Response.Status.NOT_FOUND).build());
}
public record ErrorMessage(String error) {
}
}
The API surface stays small as well.
Run it once before we talk about tests
Start dev mode:
./mvnw quarkus:dev
Submit one ambiguous sentence:
curl -s -X POST http://localhost:8080/verdict \
-H 'Content-Type: application/json' \
-d '{"text":"I guess the service was fine, not terrible."}' | jq
Expected shape:
{
"id": "a6e10a6d-373e-4246-87a2-94be71e0bb1e",
"status": "PENDING"
}
Poll the verdict. On a disagreement path, the first poll may still show PENDING with both panel opinions filled in while qwen3:4b adjudicates. Keep polling until status leaves PENDING:
curl -s http://localhost:8080/verdict/a6e10a6d-373e-4246-87a2-94be71e0bb1e | jq
On my laptop, that sentence disagreed and the judge abstained:
{
"id": "a6e10a6d-373e-4246-87a2-94be71e0bb1e",
"text": "I guess the service was fine, not terrible.",
"status": "COMPLETE",
"graniteLabel": "UNCERTAIN",
"graniteReason": "The phrase 'I guess' indicates uncertainty about the overall experience.",
"mistralLabel": "NEUTRAL",
"mistralReason": "The text mentions that the service was 'fine' but also notes it wasn't 'terrible', indicating a neutral sentiment.",
"agreement": false,
"finalVerdict": "UNCERTAIN",
"finalReason": "The phrase 'I guess' indicates uncertainty about the service.",
"abstained": true
}
That is what we built for. Granite saw hedging. Mistral saw neutral. The judge returned UNCERTAIN and set abstained = true.
When the panel agrees instead, the same poll shape collapses to consensus: matching graniteLabel and mistralLabel, agreement = true, finalReason = "Panel consensus", and abstained = false.
Run the stub tests before you test against real Ollama models. VerdictResourceTest uses fake AI services from src/test/java/dev/verdictiq/testsupport/ instead of Ollama. Those checks stay fast and give the same result every time.
The default suite covers these cases:
blank input returns 400
unknown verdict IDs return 404
panel consensus completes without the judge
disagreement can end in UNCERTAIN
a panel exception becomes FAILED
judge failure becomes FAILED
Keep that suite green on every change. The Ollama test below checks something else.
Test ambiguous sentences with Ollama
This section is where the sample gets useful. Clean demo text is polite. Ambiguous text tells you more.
Create src/test/resources/ambiguous-texts.json:
[
{ "category": "sarcasm", "text": "Oh great, another Monday." },
{ "category": "sarcasm", "text": "Sure, because that always works out so well." },
{ "category": "double-negative", "text": "I wouldn't say it wasn't without its problems." },
{ "category": "cultural-idiom", "text": "That presentation was sick." },
{ "category": "cultural-idiom", "text": "The food was literally fire." },
{ "category": "dry-humor", "text": "Fantastic. Everything is on fire. Wonderful." },
{ "category": "mixed-signal", "text": "The hotel was conveniently located near the airport, which meant we could hear every plane." },
{ "category": "understatement", "text": "The flight delay was not ideal." },
{ "category": "clearly-positive", "text": "This is the best Java framework I have ever used." },
{ "category": "clearly-negative", "text": "The deployment failed and took down production for six hours." }
]
I keep this in a resource file so I can add more sentences over time. When a local model surprises you, add that sentence here.
Now create src/test/java/dev/verdictiq/AmbiguousText.java:
package dev.verdictiq;
public record AmbiguousText(String category, String text) {
}
and src/test/java/dev/verdictiq/VerdictBatteryTest.java:
package dev.verdictiq;
import static io.restassured.RestAssured.given;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertNotNull;
import java.io.IOException;
import java.io.InputStream;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import org.jboss.logging.Logger;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.condition.EnabledIfSystemProperty;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import dev.verdictiq.model.PanelVerdict;
import dev.verdictiq.model.SubmissionAccepted;
import dev.verdictiq.model.SubmitVerdictRequest;
import dev.verdictiq.model.VerdictStatus;
import io.quarkus.test.junit.QuarkusTest;
import io.restassured.http.ContentType;
import jakarta.inject.Inject;
@QuarkusTest
@EnabledIfSystemProperty(named = "verdictiq.live", matches = "true")
class VerdictBatteryTest {
private static final Logger LOG = Logger.getLogger(VerdictBatteryTest.class);
@Inject
ObjectMapper objectMapper;
@Test
void runsTheAmbiguousBattery() throws Exception {
List<AmbiguousText> samples = loadSamples();
List<PanelVerdict> results = new ArrayList<>();
for (AmbiguousText sample : samples) {
SubmissionAccepted accepted = given()
.contentType(ContentType.JSON)
.body(new SubmitVerdictRequest(sample.text()))
.when()
.post("/verdict")
.then()
.statusCode(202)
.extract()
.as(SubmissionAccepted.class);
PanelVerdict verdict = waitForVerdict(accepted.id());
results.add(verdict);
assertEquals(VerdictStatus.COMPLETE, verdict.status(),
() -> "Verdict did not complete for: " + sample.text() + " (" + verdict.finalReason() + ")");
assertNotNull(verdict.finalVerdict(), () -> "Final label missing for: " + sample.text());
}
logSummary(samples, results);
}
private List<AmbiguousText> loadSamples() throws IOException {
try (InputStream stream = Thread.currentThread().getContextClassLoader().getResourceAsStream("ambiguous-texts.json")) {
if (stream == null) {
throw new IllegalStateException("ambiguous-texts.json was not found.");
}
return objectMapper.readValue(stream, new TypeReference<List<AmbiguousText>>() {
});
}
}
private PanelVerdict waitForVerdict(String id) throws InterruptedException {
Instant deadline = Instant.now().plus(Duration.ofMinutes(2));
while (Instant.now().isBefore(deadline)) {
PanelVerdict verdict = given()
.when()
.get("/verdict/{id}", id)
.then()
.statusCode(200)
.extract()
.as(PanelVerdict.class);
if (verdict.status() != VerdictStatus.PENDING) {
return verdict;
}
Thread.sleep(250);
}
throw new IllegalStateException("Timed out waiting for verdict " + id);
}
private void logSummary(List<AmbiguousText> samples, List<PanelVerdict> results) {
long disagreements = results.stream().filter(result -> !result.agreement()).count();
long abstentions = results.stream().filter(PanelVerdict::abstained).count();
StringBuilder table = new StringBuilder();
table.append(System.lineSeparator());
table.append(String.format("%-18s %-6s %-11s %-11s %-11s %-10s%n",
"category", "agree", "granite", "mistral", "final", "abstained"));
for (int i = 0; i < samples.size(); i++) {
AmbiguousText sample = samples.get(i);
PanelVerdict verdict = results.get(i);
table.append(String.format("%-18s %-6s %-11s %-11s %-11s %-10s%n",
sample.category(),
verdict.agreement(),
verdict.graniteLabel(),
verdict.mistralLabel(),
verdict.finalVerdict(),
verdict.abstained()));
}
table.append(System.lineSeparator());
table.append("Disagreements: ").append(disagreements).append(System.lineSeparator());
table.append("Abstentions: ").append(abstentions);
LOG.info(table.toString());
}
}
This test calls real Ollama models. The stub tests above already check the workflow shape. I only enable it with -Dverdictiq.live=true so CI does not fail when Ollama is not running, a model is missing, or a borderline sentence changes behavior after a model update.
The assertion style is important:
We assert the contract shape and workflow completion
We log disagreements and abstentions instead of pretending that every local model pair will disagree on the same rows forever
We do not assert exact labels for every sample
That last rule keeps the test honest. These are probabilistic systems. If you pin exact wording or exact labels for every borderline sentence, you are not testing robustness. You are testing whether Tuesday behaves exactly like Monday.
Prove it
Start with the deterministic suite:
./mvnw test
That run should stay stable even if you have not pulled the Ollama models yet.
Then run the Ollama integration test:
./mvnw test -Dverdictiq.live=true
On a laptop, that run takes about five to six minutes. Expect the usual Quarkus test output plus a logged table similar to this:
category           agree  granite     mistral     final       abstained
sarcasm            false  NEGATIVE    UNCERTAIN   NEGATIVE    false
sarcasm            false  NEGATIVE    UNCERTAIN   NEGATIVE    false
double-negative    true   UNCERTAIN   UNCERTAIN   UNCERTAIN   false
cultural-idiom     false  POSITIVE    UNCERTAIN   POSITIVE    false
cultural-idiom     false  POSITIVE    UNCERTAIN   POSITIVE    false
dry-humor          true   UNCERTAIN   UNCERTAIN   UNCERTAIN   false
mixed-signal       false  UNCERTAIN   NEGATIVE    NEGATIVE    false
understatement     false  NEGATIVE    UNCERTAIN   NEGATIVE    false
clearly-positive   false  POSITIVE    UNCERTAIN   POSITIVE    false
clearly-negative   false  NEGATIVE    UNCERTAIN   NEGATIVE    false
Disagreements: 8
Abstentions: 0
The exact rows will vary. Look for the pattern:
clear sentences usually converge
ambiguous sentences disagree more often
some of them deserve UNCERTAIN
If you get zero disagreements across these sentences, the input set is too easy, the two panel models are too similar, or both. That result is still useful. It tells you the panel is not buying you much yet.
Make it survive
The happy path works now. Many AI tutorials stop here. Production trouble usually starts here too.
Model latency is your first queue
The panel looks cheap until you count the calls. Two panel calls plus one judge call turn one request into up to three model invocations, and that eats a latency budget very quickly. If you swap in hosted models, it also raises API cost.
The POST endpoint avoids holding the client open, and the virtual-thread executor keeps blocking calls off the request path. It does not make model latency disappear. If you expect real throughput, add explicit concurrency limits, request budgets, and maybe a durable queue before you decide this belongs on a public path.
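One way to make "explicit concurrency limits" concrete is a semaphore around model calls. The following is a minimal sketch, not something the sample project ships: ModelCallLimiter and tryCall are hypothetical names, and the choice to reject immediately instead of waiting is an assumption you may want to change.

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Sketch only: ModelCallLimiter is a hypothetical helper, not part of the project.
class ModelCallLimiter {
    private final Semaphore permits;

    ModelCallLimiter(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    // Runs the call if a permit is free. Otherwise returns empty without waiting,
    // so the caller can reject or defer instead of piling up blocked threads.
    <T> Optional<T> tryCall(Supplier<T> call) {
        if (!permits.tryAcquire()) {
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(call.get());
        } finally {
            // Always hand the permit back, even when the call throws.
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}

class LimiterDemo {
    public static void main(String[] args) {
        ModelCallLimiter limiter = new ModelCallLimiter(1);
        // The outer call holds the only permit, so the nested call is rejected.
        Optional<Boolean> nestedAccepted =
                limiter.tryCall(() -> limiter.tryCall(() -> "inner").isPresent());
        System.out.println(nestedAccepted);
    }
}
```

In the service, a rejected call could map to an HTTP 429 on submit or to a FAILED verdict with a clear reason. Which one fits depends on whether you want back-pressure to be visible to the client.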
UNCERTAIN is not failure
UNCERTAIN means the judge looked at the text and refused fake confidence. FAILED means your system broke. Those two states are operationally different, and your API should say so directly.
If you later add alerts or dashboards, count them separately. A spike in UNCERTAIN may mean your input set changed. A spike in FAILED means your service is sick.
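Counting them separately can start very small. Here is a hedged sketch with plain JDK counters; OutcomeCounters and its Outcome buckets are hypothetical names, and the status and label are passed as strings only to keep the example free of project types. A real setup would more likely feed a metrics library.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Sketch only: OutcomeCounters and Outcome are hypothetical names for illustration.
class OutcomeCounters {
    enum Outcome { DECIDED, UNCERTAIN, FAILED }

    // The map is filled once in the constructor and only read afterwards;
    // the LongAdder values handle concurrent increments.
    private final Map<Outcome, LongAdder> counts = new EnumMap<>(Outcome.class);

    OutcomeCounters() {
        for (Outcome outcome : Outcome.values()) {
            counts.put(outcome, new LongAdder());
        }
    }

    // Maps a terminal verdict to a bucket. FAILED is a system problem,
    // UNCERTAIN is a model judgment, everything else is a decided label.
    static Outcome classify(String status, String finalVerdict) {
        if ("FAILED".equals(status)) {
            return Outcome.FAILED;
        }
        if (!"COMPLETE".equals(status)) {
            throw new IllegalArgumentException("Not a terminal status: " + status);
        }
        return "UNCERTAIN".equals(finalVerdict) ? Outcome.UNCERTAIN : Outcome.DECIDED;
    }

    void record(String status, String finalVerdict) {
        counts.get(classify(status, finalVerdict)).increment();
    }

    long count(Outcome outcome) {
        return counts.get(outcome).sum();
    }
}
```

Two counters that move independently are enough to tell "the inputs got harder" apart from "the service is sick".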
Keep disagreement in-process until you actually need a broker
Signals are a good fit here because the whole workflow belongs to one process. We want type-safe async coordination, not distributed choreography.
If the next step is “open a human review task in another service” or “fan this event into analytics and audit storage,” then moving the disagreement event to Reactive Messaging may be the right call. Right now it would be extra machinery with no extra truth.
In-memory verdicts need expiry
VerdictStore is a ConcurrentHashMap, and it keeps every verdict forever. That works for a tutorial. After a few days of real traffic, it becomes a memory problem.
If you keep the polling contract, add one of these next:
a TTL cleanup job
a max-size policy
a persistent store with explicit retention
Otherwise the nice simple polling API becomes a slow memory leak with better naming.
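For the first option, a TTL cleanup can stay close to the existing store. This is a minimal sketch under a few assumptions: ExpiringStore is a hypothetical generic variant of VerdictStore, the time source is injected as a Supplier<Instant> so it can be tested, and the scheduling of evictExpired() (for example from a periodic job) is left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch only: ExpiringStore is a hypothetical variant of VerdictStore.
class ExpiringStore<V> {
    // Pairs each value with the time it was last written.
    private record Stored<T>(T value, Instant storedAt) {}

    private final ConcurrentHashMap<String, Stored<V>> entries = new ConcurrentHashMap<>();
    private final Duration ttl;
    private final Supplier<Instant> now;

    ExpiringStore(Duration ttl, Supplier<Instant> now) {
        this.ttl = ttl;
        this.now = now;
    }

    void put(String id, V value) {
        entries.put(id, new Stored<>(value, now.get()));
    }

    Optional<V> find(String id) {
        return Optional.ofNullable(entries.get(id)).map(Stored::value);
    }

    // Removes entries written before (now - ttl) and returns how many were dropped.
    int evictExpired() {
        Instant cutoff = now.get().minus(ttl);
        int removed = 0;
        for (Map.Entry<String, Stored<V>> entry : entries.entrySet()) {
            // remove(key, value) skips entries that were overwritten in the meantime.
            if (entry.getValue().storedAt().isBefore(cutoff)
                    && entries.remove(entry.getKey(), entry.getValue())) {
                removed++;
            }
        }
        return removed;
    }
}
```

Because every put() refreshes the timestamp, a verdict that is still moving from PENDING to COMPLETE keeps its slot; only entries that have been idle longer than the TTL are dropped.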
Pre-pull and warm the models
This sample keeps the Ollama base URL explicit and disables Dev Services, so the runtime story stays simple: start Ollama locally, pull the models once, and use the same endpoint everywhere.
For anything beyond the first local run, pre-pull the models and hit the endpoint once before demos or tests. Cold local inference is still cold inference.
Close the loop
We built a Quarkus service that asks twice, disagrees on purpose, and uses Signals to make disagreement visible. The mental shift is simple: a wrong local-model label gets much worse when the system presents it as final.



