Build Your First Real Java RAG Pipeline with Quarkus and Docling
Learn how documents become embeddings, how pgvector retrieval works, and why readiness checks matter when your assistant should answer from real files.
I do not like RAG demos that start with clean Markdown. That is usually where the hard part was quietly deleted.
Open a real enterprise PDF and the problem is obvious: tables, headings, footnotes, and multi-column layout all carry meaning. Plain text extraction treats too much of that as decoration. Strip the structure and retrieval feeds the model fragments without enough context. The answer may still sound confident, which is exactly the annoying part.
Docling keeps structure as Markdown-friendly output. Quarkus wires Docling, Postgres with pgvector, and LangChain4j so we stay in ordinary Java and configuration. Agents only stay useful when the knowledge they pull is current and faithful to the source.
The system we build here is small enough to run locally but shaped like something you can extend: layout-aware conversion with Docling, sentence chunking, embeddings and retrieval in pgvector, local Ollama chat and embedding models, readiness around background indexing, and guardrails around the assistant's inputs and outputs.
This is an update to the original tutorial. It tweaks a few things and brings the code in line with recent API changes.
Prerequisites
You should be comfortable with Java, REST, and running containers locally (Podman or Docker). The steps use the Quarkus CLI, Maven, PostgreSQL via Dev Services, and Ollama on the host.
Java 21+
Maven 3.9+ and Quarkus CLI (optional but used below)
Podman or Docker (for Dev Services: PostgreSQL, Docling)
Ollama installed locally with pull access for the chat and embedding models you configure
Project Setup
This article uses Quarkus 3.34.3 and Java 21. Create the project:
quarkus create app com.ibm:enterprise-rag \
--package-name=com.ibm \
--extensions=rest-jackson,jdbc-postgresql,quarkus-langchain4j-ollama,quarkus-langchain4j-pgvector,quarkus-docling,quarkus-smallrye-health
cd enterprise-rag
Extensions:
rest-jackson: REST endpoints with JSON via Jackson
jdbc-postgresql: JDBC driver and datasource integration for PostgreSQL
quarkus-langchain4j-ollama: Chat and embedding models through Ollama
quarkus-langchain4j-pgvector: Embedding store backed by PostgreSQL pgvector
quarkus-docling (io.quarkiverse.docling:quarkus-docling:1.3.0): Docling REST client and Dev Services for the Docling container. This is a Quarkiverse extension, so we pin it separately.
quarkus-smallrye-health: Readiness and liveness endpoints used to hold traffic until ingestion completes
Embeddings, Vector Size, and pgvector
An embedding is a fixed-length array of numbers produced by an embedding model. Similar text tends to land near other similar text in that space. That lets you retrieve chunks with nearest-neighbor search in Postgres through pgvector, not only keyword search.
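To make "near" concrete, here is a toy sketch of nearest-neighbor ranking over in-memory vectors using cosine similarity. The class and method names are mine, not part of LangChain4j or pgvector, and the real pipeline leaves this work to Postgres. Treat it as an illustration of the idea, not of how pgvector is implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy nearest-neighbor search over in-memory vectors (illustration only). */
final class CosineSearchSketch {

    /** Cosine similarity: dot(a, b) / (|a| * |b|). Assumes non-zero vectors of equal length. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the indexes of the k stored vectors most similar to the query, best match first. */
    static List<Integer> topK(float[] query, List<float[]> stored, int k) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            indexes.add(i);
        }
        // Sort by similarity to the query, highest first
        indexes.sort(Comparator.comparingDouble((Integer i) -> cosine(query, stored.get(i))).reversed());
        return indexes.subList(0, Math.min(k, indexes.size()));
    }
}
```

A brute-force scan like this is fine for a handful of vectors. The reason to push the search into the database is that it can index the vectors instead of comparing the query against every row.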
Dimension is the length of that array. The model fixes it: a given tag always emits the same width. Your database column and quarkus.langchain4j.pgvector.dimension must match that width. If they diverge, the app can fail at startup or when it writes vectors.
At ingest time and at query time you must use the same embedding model so vectors are comparable. If you change the model or its output size, drop or recreate the embedding table and re-ingest.
For Ollama, run ollama show <model> and read embedding length. The default library tag granite-embedding:latest is a compact Granite English model, roughly tens of millions of parameters, with 384 dimensions on typical installs. That is enough for a responsive local loop on a laptop. Larger Granite variants, for example multilingual 278M-class models, often use 768 dimensions and more compute. Use them when you need the extra capacity, and change the pgvector dimension with them.
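If you want the mismatch to fail loudly and early, a small guard like the following can help. This is a hypothetical helper, not something the Quarkus extension provides: expectedDimension stands in for whatever you set in quarkus.langchain4j.pgvector.dimension, and the float array stands in for the vector your embedding model returns.

```java
/** Fails fast when an embedding's width differs from the configured store dimension. */
final class DimensionGuardSketch {

    // Hypothetical helper: expectedDimension mirrors the configured pgvector dimension.
    static void requireDimension(float[] embedding, int expectedDimension) {
        if (expectedDimension <= 0) {
            throw new IllegalStateException("Configured dimension must be > 0 but was " + expectedDimension);
        }
        if (embedding.length != expectedDimension) {
            throw new IllegalStateException("Embedding has " + embedding.length
                    + " dimensions but the store expects " + expectedDimension
                    + ". Recreate the embedding table and re-ingest after changing models.");
        }
    }
}
```

Calling it once with a probe embedding before bulk ingestion turns a confusing database error into a message that names both numbers.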
Chunking uses DocumentBySentenceSplitter with a maximum segment size of 200 and an overlap of 20. Note that the two-argument constructor used in the code measures both values in characters; to size segments by tokens, use the three-argument constructor that also takes a token count estimator. Either way, this is a readable default for sales PDFs. Sentences stay mostly intact, overlap reduces the chance that a boundary cuts a fact in half, and the segment count stays manageable on a laptop. Smaller chunks improve precision for short facts. Longer chunks keep more context but can make retrieval noisier. Adjust this after you inspect real retrieval logs for your corpus.
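The following sketch shows the general shape of sentence packing with overlap. It is not LangChain4j's algorithm: it splits sentences with a naive regex, counts characters, and copies the raw tail of the previous chunk as overlap, so the tail can start mid-word. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Naive sentence packer with character overlap (illustration only). */
final class SentenceChunkerSketch {

    static List<String> chunk(String text, int maxChars, int overlapChars) {
        // Split after sentence-ending punctuation followed by whitespace
        String[] sentences = text.trim().split("(?<=[.!?])\\s+");
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String sentence : sentences) {
            boolean wouldOverflow = current.length() + 1 + sentence.length() > maxChars;
            if (current.length() > 0 && wouldOverflow) {
                chunks.add(current.toString());
                // Seed the next chunk with the tail of the previous one so text near the boundary appears in both
                int tailStart = Math.max(0, current.length() - overlapChars);
                current = new StringBuilder(current.substring(tailStart).strip());
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            // A single sentence longer than maxChars still becomes its own oversized chunk
            current.append(sentence);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```

Running a page of your own corpus through something like this and reading the chunks is a quick way to see whether a size and overlap setting keeps related facts together.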
Quarkus can accept HTTP requests as soon as the core stack is up. Indexing used to block that moment because conversion and embedding ran in a @PostConstruct hook. The flow below separates application ready (the socket is listening) from RAG ready (vectors exist in pgvector). Readiness stays DOWN until the pipeline logs completion. If you bypass health checks and call /bot early, retrieval may still be empty. Which is a very polite way of saying: the bot can answer before it knows anything useful.
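Stripped of Quarkus, the split between "application ready" and "RAG ready" is one flag and one background task. The sketch below uses only the JDK; the class name and the Runnable standing in for conversion and embedding are assumptions for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/** Minimal readiness gate: the caller returns immediately while indexing runs in the background. */
final class ReadinessGateSketch {

    private final AtomicBoolean ragReady = new AtomicBoolean(false);

    /** Starts the (hypothetical) indexing work asynchronously and flips the flag when it completes normally. */
    CompletableFuture<Void> startIndexing(Runnable indexingWork) {
        ragReady.set(false);
        return CompletableFuture.runAsync(indexingWork)
                // thenRun is skipped if indexingWork throws, so the flag stays false on failure in this sketch
                .thenRun(() -> ragReady.set(true));
    }

    boolean isRagReady() {
        return ragReady.get();
    }
}
```

A health endpoint only has to report the flag. Anything that routes traffic then decides whether to respect it.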
Implementation
I split the implementation into four small lanes: startup and readiness, ingestion, retrieval, and the /bot API. The code is longer than the idea, mostly because guardrails and background work need explicit boundaries. That is fine. Invisible magic is rarely where production systems become easier.
IngestionStarter
src/main/java/com/ibm/ingest/IngestionStarter.java keeps startup short. It schedules ingestion after CDI startup, then lets Quarkus open HTTP while Docling and embedding work continue in the background.
package com.ibm.ingest;
import io.quarkus.logging.Log;
import io.quarkus.runtime.StartupEvent;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Observes;
import jakarta.inject.Inject;
/**
* Kicks off background ingestion after CDI startup so Quarkus can open HTTP without waiting for Docling
* conversion and embedding to finish.
*/
@ApplicationScoped
public class IngestionStarter {
@Inject
DocumentLoader documentLoader;
void onStart(@Observes StartupEvent ignored) {
documentLoader.startAsyncIngestion();
Log.info("Background document ingestion scheduled (readiness will turn UP when indexing completes).");
}
}
IndexingState
src/main/java/com/ibm/ingest/IndexingState.java is the small shared flag between ingestion and readiness. The process can be alive before this flag turns true. Traffic should wait until readiness says the index exists.
package com.ibm.ingest;
import java.util.concurrent.atomic.AtomicBoolean;
import jakarta.enterprise.context.ApplicationScoped;
/**
* Tracks whether the initial embedding ingestion has finished. Used for readiness so HTTP traffic
* can wait until pgvector is populated (when health checks are enabled).
*/
@ApplicationScoped
public class IndexingState {
private final AtomicBoolean indexReady = new AtomicBoolean(false);
public boolean isIndexReady() {
return indexReady.get();
}
public void setIndexReady(boolean ready) {
indexReady.set(ready);
}
}
DoclingConverter
src/main/java/com/ibm/ingest/DoclingConverter.java hides the Docling Serve task flow behind one method. We submit the file, poll until Docling finishes, and fetch Markdown from the completed task. Each REST call passes ApiMetadata built from quarkus.docling.api-key so the X-Api-Key header matches what Docling Serve expects. Dev Services can inject this value; a standalone Docling on localhost with auth enabled needs the same key you configured on the server. I keep this separate from the loader because Docling has enough API shape of its own.
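The submit, poll, fetch pattern is easier to see without Mutiny in the way. Here is a blocking sketch of just the polling step. The Status enum and the Supplier are hypothetical stand-ins, not the Docling client API, and the retry cap is my addition: the converter in this article keeps polling until the task succeeds or fails.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Blocking poll loop with a retry cap (illustration only, not the Docling client API). */
final class PollUntilSuccessSketch {

    // Hypothetical stand-in for the task states a conversion service might report
    enum Status { PENDING, SUCCESS, FAILURE }

    /** Polls until SUCCESS and returns the number of polls used; throws on FAILURE or when the cap is hit. */
    static int pollUntilSuccess(Supplier<Status> statusCheck, Duration pause, int maxPolls) {
        for (int poll = 1; poll <= maxPolls; poll++) {
            Status status = statusCheck.get();
            if (status == Status.SUCCESS) {
                return poll;
            }
            if (status == Status.FAILURE) {
                throw new IllegalStateException("Task failed on poll " + poll);
            }
            sleep(pause);
        }
        throw new IllegalStateException("Task still pending after " + maxPolls + " polls");
    }

    private static void sleep(Duration pause) {
        try {
            Thread.sleep(pause.toMillis());
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop waiting
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the task", e);
        }
    }
}
```

Whether you cap the polls is a design choice. A cap turns a hung conversion into a visible error, at the cost of picking a limit that is long enough for your largest documents.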
package com.ibm.ingest;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Base64;
import java.util.Objects;
import ai.docling.serve.api.convert.request.ConvertDocumentRequest;
import ai.docling.serve.api.convert.request.options.ConvertDocumentOptions;
import ai.docling.serve.api.convert.request.options.OutputFormat;
import ai.docling.serve.api.convert.request.source.FileSource;
import ai.docling.serve.api.convert.request.target.InBodyTarget;
import ai.docling.serve.api.convert.response.InBodyConvertDocumentResponse;
import ai.docling.serve.api.task.response.TaskStatus;
import ai.docling.serve.api.task.response.TaskStatusPollResponse;
import io.quarkiverse.docling.runtime.client.ApiMetadata;
import io.quarkiverse.docling.runtime.client.QuarkusDoclingServeClient;
import io.quarkiverse.docling.runtime.config.DoclingRuntimeConfig;
import io.smallrye.mutiny.Uni;
import io.smallrye.mutiny.infrastructure.Infrastructure;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import jakarta.ws.rs.ProcessingException;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.core.Response.Status.Family;
/**
* Converts files to Markdown via Docling Serve using the Quarkus client's async
* task API
* ({@link QuarkusDoclingServeClient#submitConvertSourceAsync}) and polling
* until completion.
*/
@ApplicationScoped
public class DoclingConverter {
private final QuarkusDoclingServeClient doclingClient;
private final ApiMetadata apiMetadata;
@Inject
public DoclingConverter(QuarkusDoclingServeClient doclingClient, DoclingRuntimeConfig doclingConfig) {
this.doclingClient = doclingClient;
ApiMetadata.Builder metadata = ApiMetadata.builder();
doclingConfig.apiKey().ifPresent(metadata::apiKey);
this.apiMetadata = metadata.build();
}
/**
* Converts a file to Markdown asynchronously (Mutiny). Subscription runs on the
* default worker pool
* so polling and JAX-RS client calls do not block the event loop. Read errors
* become a failed {@link Uni}
* so callers can use this from lambdas without handling checked exceptions.
*/
public Uni<String> convertToMarkdownUni(Path filePath) {
final byte[] bytes;
try {
bytes = Files.readAllBytes(filePath);
} catch (IOException e) {
return Uni.createFrom().failure(e);
}
String base64 = Base64.getEncoder().encodeToString(bytes);
String filename = filePath.getFileName().toString();
ConvertDocumentRequest request = ConvertDocumentRequest.builder()
.source(FileSource.builder()
.base64String(base64)
.filename(filename)
.build())
.options(ConvertDocumentOptions.builder()
.toFormat(OutputFormat.MARKDOWN)
.build())
.target(InBodyTarget.builder().build())
.build();
return doclingClient.submitConvertSourceAsync(request, apiMetadata)
.runSubscriptionOn(Infrastructure.getDefaultWorkerPool())
.chain(this::pollUntilSuccess)
.chain(this::fetchMarkdownFromTask);
}
private Uni<TaskStatusPollResponse> pollUntilSuccess(TaskStatusPollResponse status) {
TaskStatus t = status.getTaskStatus();
if (t == TaskStatus.SUCCESS) {
return Uni.createFrom().item(status);
}
if (t == TaskStatus.FAILURE) {
return Uni.createFrom().failure(new IllegalStateException(
"Docling conversion task failed for taskId=" + status.getTaskId()));
}
String taskId = status.getTaskId();
return Uni.createFrom().nullItem()
.onItem().delayIt().by(Duration.ofMillis(200))
.chain(ignored -> Uni.createFrom().item(() -> doclingClient.pollTaskStatus(taskId, 500L, apiMetadata))
.runSubscriptionOn(Infrastructure.getDefaultWorkerPool())
.chain(this::pollUntilSuccess));
}
private Uni<String> fetchMarkdownFromTask(TaskStatusPollResponse completed) {
String taskId = completed.getTaskId();
return Uni.createFrom().item(() -> {
Response response = doclingClient.convertTaskResult(taskId, apiMetadata);
if (response.getStatusInfo().getFamily() != Family.SUCCESSFUL) {
throw new ProcessingException(
"convertTaskResult failed: HTTP " + response.getStatus() + " for taskId=" + taskId);
}
InBodyConvertDocumentResponse inBody = response.readEntity(InBodyConvertDocumentResponse.class);
var document = Objects.requireNonNull(inBody.getDocument(),
"Document conversion returned null document for taskId=" + taskId);
return document.getMarkdownContent();
}).runSubscriptionOn(Infrastructure.getDefaultWorkerPool());
}
}
DocumentLoader
src/main/java/com/ibm/ingest/DocumentLoader.java is the actual ingestion pipeline. It finds supported files, converts them to Markdown, splits the text into sentence-sized chunks, embeds each segment, and writes those vectors to pgvector.
Notice the failure behavior: this demo sets readiness UP even when ingestion fails, so local development does not get stuck forever. In production I would be more suspicious. If the assistant needs the knowledge base to be useful, keeping readiness DOWN can be the more honest failure mode.
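The two failure modes come down to one decision, which you could make explicit as a policy. The enum and method below are hypothetical and not part of the project code; they only spell out the choice the loader hardcodes.

```java
/** Makes the "what happens to readiness when ingestion fails" decision explicit (illustration only). */
final class ReadinessPolicySketch {

    enum OnFailure {
        MARK_READY_ANYWAY, // what this demo does, so local development does not get stuck
        STAY_DOWN          // the stricter option when answers are useless without the index
    }

    /** Returns the readiness flag once ingestion ends; failure is null when ingestion succeeded. */
    static boolean readyAfterIngestion(Throwable failure, OnFailure policy) {
        if (failure == null) {
            return true;
        }
        return policy == OnFailure.MARK_READY_ANYWAY;
    }
}
```

With something like this, the choice could move into configuration per environment instead of living in a log message.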
package com.ibm.ingest;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.data.document.splitter.DocumentBySentenceSplitter;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import io.quarkus.logging.Log;
import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.Uni;
import io.smallrye.mutiny.infrastructure.Infrastructure;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
/**
* Loads documents from {@code documents/}, converts them with Docling (async task API), splits, and
* stores embeddings. Runs in the background after startup; {@link IndexingState} and readiness reflect
* completion.
*/
@ApplicationScoped
public class DocumentLoader {
private static final List<String> ALLOWED_EXTENSIONS = Arrays.asList("txt", "pdf", "pptx", "ppt", "doc", "docx",
"xlsx", "xls", "csv", "json", "xml", "html");
@Inject
EmbeddingStore<TextSegment> store;
@Inject
EmbeddingModel embeddingModel;
@Inject
DoclingConverter doclingConverter;
@Inject
IndexingState indexingState;
public void startAsyncIngestion() {
indexingState.setIndexReady(false);
Log.info("Starting document loading (background)...");
listEligiblePathsUni()
.chain(paths -> {
if (paths.isEmpty()) {
Log.warn("No documents to process. Skipping embedding generation.");
return Uni.createFrom().voidItem();
}
return Multi.createFrom().iterable(paths)
.onItem().transformToUniAndConcatenate(path -> doclingConverter.convertToMarkdownUni(path)
.map(markdown -> toDocument(path, markdown)))
.collect().asList()
.chain(this::embedAllDocuments);
})
.subscribe().with(
ignored -> finishIngestionSuccess(),
this::finishIngestionFailure);
}
private void finishIngestionSuccess() {
indexingState.setIndexReady(true);
Log.info("Document ingestion pipeline finished; readiness is UP.");
}
private void finishIngestionFailure(Throwable failure) {
Log.error("Document ingestion pipeline failed; readiness set UP so the app is not stuck DOWN.", failure);
indexingState.setIndexReady(true);
}
private Uni<List<Path>> listEligiblePathsUni() {
return Uni.createFrom().item(() -> {
Path documentsPath = Path.of("src/main/resources/documents");
List<Path> paths = new ArrayList<>();
if (!Files.isDirectory(documentsPath)) {
Log.warnf("Documents directory not found or not a directory: %s", documentsPath);
return paths;
}
int skippedCount = 0;
try (var stream = Files.list(documentsPath)) {
for (Path filePath : stream.filter(Files::isRegularFile).toList()) {
String fileName = filePath.getFileName().toString();
String extension = fileExtension(fileName);
if (extension.isEmpty() || !ALLOWED_EXTENSIONS.contains(extension)) {
skippedCount++;
Log.debugf("Skipping file '%s' - extension '%s' is not in allowed list",
fileName, extension.isEmpty() ? "(no extension)" : extension);
continue;
}
paths.add(filePath);
}
} catch (IOException e) {
Log.errorf(e, "Failed to list documents in %s", documentsPath);
}
Log.infof("Found %d file(s) to process (%d skipped by extension).", paths.size(), skippedCount);
return paths;
}).runSubscriptionOn(Infrastructure.getDefaultWorkerPool());
}
private static String fileExtension(String fileName) {
int lastDotIndex = fileName.lastIndexOf('.');
if (lastDotIndex > 0 && lastDotIndex < fileName.length() - 1) {
return fileName.substring(lastDotIndex + 1).toLowerCase();
}
return "";
}
private static Document toDocument(Path filePath, String markdown) {
String fileName = filePath.getFileName().toString();
String extension = fileExtension(fileName);
Map<String, String> meta = new HashMap<>();
meta.put("file", fileName);
meta.put("format", extension);
return Document.document(markdown, new Metadata(meta));
}
private Uni<Void> embedAllDocuments(List<Document> docs) {
if (docs.isEmpty()) {
Log.warn("No documents were successfully converted. Skipping embedding generation.");
return Uni.createFrom().voidItem();
}
DocumentBySentenceSplitter splitter = new DocumentBySentenceSplitter(200, 20);
List<TextSegment> segments = splitter.splitAll(docs);
if (segments.isEmpty()) {
Log.warn("No text segments generated from documents. Skipping embedding storage.");
return Uni.createFrom().voidItem();
}
Log.infof("Generating embeddings for %d text segments...", segments.size());
return Uni.createFrom().item(() -> {
embedSegmentsBlocking(segments);
return null;
}).runSubscriptionOn(Infrastructure.getDefaultWorkerPool()).replaceWithVoid();
}
private void embedSegmentsBlocking(List<TextSegment> segments) {
int embeddedCount = 0;
int errorCount = 0;
try {
if (!segments.isEmpty()) {
TextSegment testSegment = segments.get(0);
var testEmbedding = embeddingModel.embed(testSegment).content();
store.add(testEmbedding, testSegment);
Log.infof("Store test successful. Proceeding with bulk embedding...");
embeddedCount = 1;
}
} catch (jakarta.enterprise.inject.CreationException e) {
Throwable cause = e.getCause();
if (cause instanceof IllegalArgumentException
&& cause.getMessage() != null
&& cause.getMessage().contains("indexListSize")
&& cause.getMessage().contains("zero")) {
Log.errorf("PgVector dimension configuration error detected during store initialization.");
Log.errorf("The dimension property 'quarkus.langchain4j.pgvector.dimension' is being read as 0.");
throw new RuntimeException(
"PgVector store initialization failed. Check application.properties and database configuration.",
e);
}
throw e;
} catch (IllegalArgumentException e) {
if (e.getMessage() != null && e.getMessage().contains("indexListSize") && e.getMessage().contains("zero")) {
Log.errorf("PgVector dimension configuration error. The dimension is being read as 0.");
throw new RuntimeException(
"PgVector dimension misconfiguration. Dimension must be > 0. Check application.properties.", e);
}
throw e;
} catch (Exception e) {
Log.errorf(e, "Failed to test embedding store. This might indicate a configuration issue.");
throw new RuntimeException(
"Embedding store test failed. Please check your database and pgvector configuration.", e);
}
int startIndex = embeddedCount > 0 ? 1 : 0;
for (int i = startIndex; i < segments.size(); i++) {
TextSegment segment = segments.get(i);
try {
var embedding = embeddingModel.embed(segment).content();
store.add(embedding, segment);
embeddedCount++;
if (embeddedCount % 10 == 0) {
Log.infof("Progress: embedded %d/%d segments", embeddedCount, segments.size());
}
} catch (Exception e) {
errorCount++;
Log.errorf(e, "Failed to embed and store segment: %s",
segment.text().substring(0, Math.min(50, segment.text().length())));
}
}
Log.infof("Successfully embedded and stored %d out of %d segments (errors: %d)", embeddedCount,
segments.size(), errorCount);
}
}
IngestionReadinessCheck
src/main/java/com/ibm/health/IngestionReadinessCheck.java turns the indexing flag into a standard SmallRye Health readiness signal. This is the line between “the process is running” and “the RAG system can answer with indexed context.”
package com.ibm.health;
import org.eclipse.microprofile.health.HealthCheck;
import org.eclipse.microprofile.health.HealthCheckResponse;
import org.eclipse.microprofile.health.Readiness;
import com.ibm.ingest.IndexingState;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
/**
* Readiness stays {@code DOWN} until background document ingestion and embedding complete.
*/
@Readiness
@ApplicationScoped
public class IngestionReadinessCheck implements HealthCheck {
@Inject
IndexingState indexingState;
@Override
public HealthCheckResponse call() {
if (indexingState.isIndexReady()) {
return HealthCheckResponse.up("ingestion");
}
return HealthCheckResponse.down("ingestion");
}
}
DocumentRetrieverAugmentorSupplier
src/main/java/com/ibm/ai/DocumentRetrieverAugmentorSupplier.java connects the custom retriever to the Quarkus LangChain4j AI service. I like making this explicit. Defaults are nice until you need to debug why the model retrieved absolutely nothing with great confidence.
package com.ibm.ai;
import java.util.function.Supplier;
import com.ibm.retrieval.DocumentRetriever;
import dev.langchain4j.rag.RetrievalAugmentor;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
/**
* Wires the custom {@link RetrievalAugmentor} into the Quarkus LangChain4j AI service.
*/
@ApplicationScoped
public class DocumentRetrieverAugmentorSupplier implements Supplier<RetrievalAugmentor> {
private final DocumentRetriever documentRetriever;
@Inject
public DocumentRetrieverAugmentorSupplier(DocumentRetriever documentRetriever) {
this.documentRetriever = documentRetriever;
}
@Override
public RetrievalAugmentor get() {
return documentRetriever;
}
}
SalesEnablementBot
src/main/java/com/ibm/ai/SalesEnablementBot.java defines the assistant contract. The system message sets the CloudX scope, the retrieval augmentor supplies document context, and the guardrails check both sides of the model call.
package com.ibm.ai;
import com.ibm.guardrails.HallucinationGuardrail;
import com.ibm.guardrails.InputValidationGuardrail;
import com.ibm.guardrails.OutOfScopeGuardrail;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.guardrail.InputGuardrails;
import dev.langchain4j.service.guardrail.OutputGuardrails;
import io.quarkiverse.langchain4j.RegisterAiService;
@RegisterAiService(retrievalAugmentor = DocumentRetrieverAugmentorSupplier.class)
public interface SalesEnablementBot {
@SystemMessage("""
# ROLE AND SCOPE
You are a Sales Enablement Copilot for CloudX Enterprise Platform.
## YOUR ALLOWED TOPICS (ONLY THESE):
- CloudX product features, capabilities, and architecture
- CloudX pricing tiers: Starter ($499), Professional ($1,999), Enterprise ($5,999)
- CloudX competitive positioning vs CompeteCloud, SkyPlatform, TechGiant
- CloudX migration strategies and implementation approaches
- CloudX customer success stories and ROI data
- CloudX technical specifications (multi-cloud, Kubernetes, supported languages)
## STRICT BOUNDARIES - YOU MUST REFUSE:
❌ Questions about competitor internal operations or roadmaps
❌ Questions about non-CloudX IBM products (Watson, DB2, WebSphere Traditional, etc.)
❌ Requests for pricing negotiations or custom contract terms
❌ Questions about unreleased CloudX features or internal roadmaps
❌ Legal, financial, tax, or investment advice
❌ Personal advice or non-business topics
❌ General technology tutorials not related to CloudX
If asked about prohibited topics, respond EXACTLY:
"I specialize in CloudX Enterprise Platform sales enablement. This question is outside my scope. For [topic], please consult [appropriate resource]."
# SOLUTION MAPPING LOGIC
When a user describes a client scenario, map to CloudX solutions:
- Legacy technology risk / End-of-Support → CloudX Support & Maintenance Solutions
- Legacy infrastructure operations → CloudX Migration & Modernization Platform
- Need faster modernization → CloudX Accelerated Migration Tools
- Containerization / microservices → CloudX Cloud-Native Platform
- AI-assisted modernization → CloudX AI-Powered Modernization Assistant
# RESPONSE STRUCTURE
For valid CloudX questions, provide:
1. **Recommended Solution**: Name the CloudX product/tier
2. **Rationale**: Why it fits the client's pain point
3. **Business Outcome**: Expected ROI or benefit
4. **Proof Point**: Reference a specific customer case study from your documents
5. **Discovery Question**: Suggest a follow-up question for the sales rep
# ACCURACY REQUIREMENTS
- Only cite information from your provided CloudX sales enablement documents
- Never speculate or make up features, pricing, or capabilities
- If information is not in your documents, state: "I don't have that specific information in my CloudX sales materials."
""")
@OutputGuardrails({ OutOfScopeGuardrail.class, HallucinationGuardrail.class })
@InputGuardrails({ InputValidationGuardrail.class })
String chat(@UserMessage String userQuestion);
}
DocumentRetriever
src/main/java/com/ibm/retrieval/DocumentRetriever.java embeds the user question, asks pgvector for nearby segments, and passes those segments back as augmentation content. It also logs snippets while you develop. Keep that visibility early; flying blind with retrieval is not character building, it is just slow debugging.
package com.ibm.retrieval;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.rag.AugmentationRequest;
import dev.langchain4j.rag.AugmentationResult;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.content.Content;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.store.embedding.EmbeddingStore;
import io.quarkus.logging.Log;
import jakarta.enterprise.context.ApplicationScoped;
@ApplicationScoped
public class DocumentRetriever implements RetrievalAugmentor {
private final RetrievalAugmentor augmentor;
private static final int SNIPPET_LENGTH = 200;
DocumentRetriever(EmbeddingStore<TextSegment> store, EmbeddingModel model) {
EmbeddingStoreContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
.embeddingModel(model)
.embeddingStore(store)
.maxResults(3)
.build();
augmentor = DefaultRetrievalAugmentor
.builder()
.contentRetriever(contentRetriever)
.build();
}
@Override
public AugmentationResult augment(AugmentationRequest augmentationRequest) {
// Perform the augmentation
AugmentationResult result = augmentor.augment(augmentationRequest);
// Log retrieved content snippets for developer visibility
// This helps developers understand what documents are being retrieved
var contents = result.contents();
Log.infof("DocumentRetriever: Retrieved %d document snippet(s) for augmentation", contents.size());
for (int i = 0; i < contents.size(); i++) {
Content content = contents.get(i);
String text = "";
String sourceInfo = "";
try {
// Content has textSegment() method that returns TextSegment
TextSegment segment = content.textSegment();
if (segment != null) {
text = segment.text();
// Read the source file name stored under the "file" key during ingestion
var meta = segment.metadata();
if (meta != null) {
String file = meta.getString("file");
if (file != null) {
sourceInfo = " (from: " + file + ")";
}
}
}
} catch (Exception e) {
Log.debugf("Could not extract text from content: %s", e.getMessage());
}
// Create a snippet (first SNIPPET_LENGTH chars) for developer visibility
if (!text.isEmpty()) {
String snippet = text.length() > SNIPPET_LENGTH
? text.substring(0, SNIPPET_LENGTH) + "..."
: text;
// Replace newlines with spaces for cleaner log output
snippet = snippet.replace('\n', ' ').replace('\r', ' ');
Log.infof(" [%d] %s%s", i + 1, snippet, sourceInfo);
} else {
Log.infof(" [%d] (content unavailable)%s", i + 1, sourceInfo);
}
}
return result;
}
}
HallucinationGuardrail
src/main/java/com/ibm/guardrails/HallucinationGuardrail.java checks the answer after the model produces it. It looks for uncertainty, generic content, contradictions, and known CloudX fact mistakes, then reprompts when the answer breaks the sales enablement contract.
This is still pattern matching. It catches obvious failures and makes the demo behavior visible. It is not a safety program with a trench coat.
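To see how blunt this kind of check is, consider a minimal phrase matcher. This is a hypothetical helper written for illustration, not the guardrail's actual detection code: it lowercases the response and returns the first listed phrase that appears as a substring.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Substring-based phrase detection (illustration only). */
final class PhraseMatchSketch {

    /** Returns the first phrase found in the response, ignoring case; phrases are expected in lowercase. */
    static Optional<String> firstMatch(String response, List<String> phrases) {
        String lower = response.toLowerCase(Locale.ROOT);
        return phrases.stream().filter(lower::contains).findFirst();
    }
}
```

A response like "I think CloudX supports Azure." matches "i think", as intended. So does "Rossi thinks CloudX fits.", because the substring does not care about word boundaries. That is the kind of false positive to expect from this approach, and a reason to read the guardrail logs before trusting the reprompt rate.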
package com.ibm.guardrails;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.guardrail.OutputGuardrail;
import dev.langchain4j.guardrail.OutputGuardrailResult;
import jakarta.enterprise.context.ApplicationScoped;
import io.quarkus.logging.Log;
/**
* HallucinationGuardrail detects when the LLM generates responses that:
* - Admit lack of knowledge
* - Are too vague or generic
* - Contain contradictory information
* - Make up facts not present in the CloudX sales enablement materials
* - Provide overly confident answers without proper context
*/
@ApplicationScoped
public class HallucinationGuardrail implements OutputGuardrail {
// Phrases indicating the model doesn't have information
private static final String[] UNCERTAINTY_PHRASES = {
"i don't have that information",
"i don't know",
"i'm not sure",
"i cannot find",
"i don't have access to",
"i'm unable to provide",
"i don't have specific information",
"i cannot confirm",
"i'm not aware of",
"i don't have details about"
};
// Phrases indicating potential hallucination or making up information
private static final String[] HALLUCINATION_INDICATORS = {
"as far as i know",
"i believe",
"i think",
"probably",
"it seems like",
"it appears that",
"i assume",
"i would guess",
"most likely",
"presumably"
};
// Contradictory phrases that might indicate confusion
private static final String[] CONTRADICTION_INDICATORS = {
"however, on the other hand",
"but actually",
"or maybe",
"alternatively, it could be",
"i'm not certain, but"
};
// CloudX-specific facts that should be accurate
private static final String[][] CLOUDX_FACTS = {
// Format: {incorrect_value, correct_value, context}
{ "99.9% uptime", "99.99%", "enterprise tier" },
{ "$599", "$499", "starter tier monthly" },
{ "$2,999", "$1,999", "professional tier monthly" },
{ "aws only", "aws, azure, and google cloud", "multi-cloud support" },
{ "competecloud is cheaper", "cloudx is 8% lower for enterprise", "enterprise pricing" }
};
@Override
public OutputGuardrailResult validate(AiMessage responseFromLLM) {
Log.info("HallucinationGuardrail: Validating LLM response");
String content = responseFromLLM.text();
String contentLower = content.toLowerCase();
Log.debug("HallucinationGuardrail: Response content length: " + content.length() + " characters");
// 1. Check for uncertainty phrases (model admitting it doesn't know)
String uncertaintyPhrase = detectUncertaintyPhrase(contentLower);
if (uncertaintyPhrase != null) {
Log.warn("HallucinationGuardrail: Detected uncertainty phrase: '" + uncertaintyPhrase + "'");
return reprompt(
"The response contains uncertainty phrases. ",
"Please provide a confident answer based strictly on the CloudX sales enablement materials. " +
"If the information is not available in the provided documents, clearly state that the information is not in the available materials rather than expressing uncertainty.");
}
// 2. Check for hallucination indicators (hedging language suggesting
// uncertainty)
String hallucinationIndicator = detectHallucinationIndicator(contentLower);
if (hallucinationIndicator != null) {
Log.warn("HallucinationGuardrail: Detected hallucination indicator: '" + hallucinationIndicator + "'");
return reprompt(
"The response contains hedging language that suggests uncertainty. ",
"Please provide a confident, fact-based answer using only information from the CloudX sales enablement materials. "
+
"If the information is not in the documents, clearly state that the information is not available rather than speculating or using uncertain language.");
}
// 3. Check for contradictory statements
String contradictionIndicator = detectContradictionIndicator(contentLower);
if (contradictionIndicator != null) {
Log.warn("HallucinationGuardrail: Detected contradiction indicator: '" + contradictionIndicator + "'");
return reprompt(
"The response contains contradictory or conflicting statements. ",
"Please provide a clear, consistent answer based on the CloudX sales enablement materials. "
+
"Ensure all information is coherent and does not present conflicting details.");
}
// 4. Check for too short/lazy answers
if (content.trim().length() < 20) {
Log.warn("HallucinationGuardrail: Response too short - " + content.trim().length() + " characters");
return reprompt(
"The response is too brief and lacks sufficient detail. ",
"Please provide a comprehensive response with specific details, examples, and concrete information from the CloudX sales enablement materials.");
}
// 5. Check for overly generic responses
if (isOverlyGeneric(contentLower)) {
Log.warn("HallucinationGuardrail: Response is overly generic - lacks CloudX-specific details");
return reprompt(
"The response is too generic and lacks specific CloudX details. ",
"Please provide concrete information about CloudX features, pricing, capabilities, competitive advantages, "
+
"or specific use cases from the sales enablement materials. Include specific product names, pricing tiers, percentages, or technical details where relevant.");
}
// 6. Check for potential factual errors about CloudX
String factualError = detectFactualError(contentLower);
if (factualError != null) {
Log.warn("HallucinationGuardrail: Detected potential factual error: " + factualError);
return reprompt(
"The response may contain a factual error: " + factualError + ". ",
"Please carefully verify all information against the CloudX sales enablement materials and provide accurate, verified details. "
+
"Only include information that is explicitly stated in the provided documents.");
}
// 7. Check for excessive hedging (multiple uncertainty markers)
int hedgingCount = countHedgingPhrases(contentLower);
if (hedgingCount >= 3) {
Log.warn("HallucinationGuardrail: Excessive hedging detected - " + hedgingCount + " hedging phrases found");
return reprompt(
"The response contains excessive hedging language that suggests uncertainty. ",
"Please provide a confident, fact-based answer using information directly from the CloudX sales enablement materials. "
+
"Avoid hedging phrases and present information with confidence when it is supported by the documents.");
}
// All checks passed
Log.info("HallucinationGuardrail: Response validated successfully - no hallucination indicators detected");
return success();
}
private String detectUncertaintyPhrase(String content) {
for (String phrase : UNCERTAINTY_PHRASES) {
if (content.contains(phrase)) {
return phrase;
}
}
return null;
}
private String detectHallucinationIndicator(String content) {
for (String indicator : HALLUCINATION_INDICATORS) {
if (content.contains(indicator)) {
return indicator;
}
}
return null;
}
private String detectContradictionIndicator(String content) {
for (String indicator : CONTRADICTION_INDICATORS) {
if (content.contains(indicator)) {
return indicator;
}
}
return null;
}
private boolean isOverlyGeneric(String content) {
// Check if response lacks specific CloudX details
String[] specificKeywords = {
"cloudx", "starter tier", "professional tier", "enterprise tier",
"$499", "$1,999", "$5,999", "99.99%", "multi-cloud",
"competecloud", "skyplatform", "techgiant",
"kubernetes", "aws", "azure", "google cloud"
};
int specificCount = 0;
for (String keyword : specificKeywords) {
if (content.contains(keyword)) {
specificCount++;
}
}
// If response is longer than 100 chars but has no specific CloudX details, it's
// too generic
return content.length() > 100 && specificCount == 0;
}
private String detectFactualError(String content) {
// Check for common factual errors about CloudX
for (String[] fact : CLOUDX_FACTS) {
String incorrectValue = fact[0];
String correctValue = fact[1];
String context = fact[2];
if (content.contains(incorrectValue)) {
return "Found '" + incorrectValue + "' but the correct value is '" + correctValue + "' for " + context;
}
}
return null;
}
private int countHedgingPhrases(String content) {
int count = 0;
String[] hedgingPhrases = {
"might", "maybe", "perhaps", "possibly", "could be",
"may be", "seems", "appears", "likely", "probably"
};
for (String phrase : hedgingPhrases) {
if (content.contains(phrase)) {
count++;
}
}
return count;
}
}
OutOfScopeGuardrail
src/main/java/com/ibm/guardrails/OutOfScopeGuardrail.java keeps the final answer inside the CloudX sales enablement domain. This matters because a retrieved chunk and a helpful model can still drift into competitor internals, unrelated IBM products, personal advice, or pricing negotiation.
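The order of the checks matters here: the first category that matches decides which reprompt the model gets. A minimal sketch of that first-match-wins rule, assuming an ordered map with a shortened keyword list. `ScopeCategories` and its contents are illustrative names, not code from the repository, and the full listing below uses separate arrays and loops instead:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: first-match-wins category detection over an ordered map. */
final class ScopeCategories {

    // LinkedHashMap iterates in insertion order, so earlier entries take priority.
    private static final Map<String, List<String>> ORDERED = new LinkedHashMap<>();

    static {
        ORDERED.put("confidential", List.of("trade secret", "internal only"));
        ORDERED.put("advice", List.of("legal advice", "tax advice"));
        ORDERED.put("negotiation", List.of("discount my price"));
    }

    /** Returns the key of the first matching category, or null when nothing matches. */
    static String detect(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, List<String>> entry : ORDERED.entrySet()) {
            for (String keyword : entry.getValue()) {
                if (lower.contains(keyword)) {
                    return entry.getKey();
                }
            }
        }
        return null;
    }
}
```

A response that trips two categories is reported under the one registered first, which is why the confidential check sits at the top.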
package com.ibm.guardrails;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.guardrail.OutputGuardrail;
import dev.langchain4j.guardrail.OutputGuardrailResult;
import jakarta.enterprise.context.ApplicationScoped;
import io.quarkus.logging.Log;
/**
* OutOfScopeGuardrail ensures the AI assistant stays within the boundaries of
* CloudX sales enablement content and doesn't provide information outside its
* domain.
*
* Based on the sales enablement resources, the scope includes:
* - CloudX Enterprise Platform features, pricing, and capabilities
* - Competitive analysis and positioning (based on public information)
* - Sales methodology and processes
* - Customer success stories and ROI information
* - Technical architecture and supported technologies
* - Migration strategies and implementation approaches
*
* Out of scope includes:
* - Competitor internal operations or confidential information
* - Non-CloudX IBM products or third-party services (unless in context of
* integration/comparison)
* - Legal, financial, tax, or investment advice
* - Personal or non-business advice
* - Confidential customer information or unreleased features
* - Custom pricing negotiations (should be referred to sales team)
* - General technology tutorials unrelated to CloudX
*/
@ApplicationScoped
public class OutOfScopeGuardrail implements OutputGuardrail {
// Keywords indicating competitor-specific internal information (out of scope)
private static final String[] COMPETITOR_INTERNAL_KEYWORDS = {
"competecloud's internal", "competecloud roadmap", "competecloud strategy",
"skyplatform's internal", "skyplatform roadmap", "skyplatform strategy",
"techgiant's internal", "techgiant roadmap", "techgiant strategy",
"competitor's source code", "competitor's architecture"
};
// Keywords indicating non-CloudX products (out of scope)
private static final String[] NON_CLOUDX_PRODUCTS = {
"watson", "db2", "websphere traditional", "maximo", "cognos",
"spss", "qradar", "guardium", "appscan", "rational",
"aws lambda", "azure functions", "google cloud run",
"heroku", "digitalocean", "linode"
};
// Keywords indicating requests for confidential/inappropriate information
private static final String[] CONFIDENTIAL_KEYWORDS = {
"confidential customer", "internal only", "proprietary information",
"trade secret", "non-disclosure", "customer's private",
"competitor's financials", "unreleased feature", "beta feature"
};
// Keywords indicating legal/financial advice requests (out of scope)
private static final String[] ADVICE_KEYWORDS = {
"legal advice", "tax advice", "investment advice", "financial planning",
"should i invest", "legal opinion", "tax implications",
"securities advice", "compliance advice", "audit advice"
};
// Keywords indicating personal/non-business requests (out of scope)
private static final String[] PERSONAL_KEYWORDS = {
"personal recommendation", "what should i do with my career",
"help me with my resume", "dating advice", "health advice",
"medical advice", "therapy", "counseling"
};
// Keywords indicating requests for custom pricing/negotiations (should be
// referred)
private static final String[] NEGOTIATION_KEYWORDS = {
"negotiate my contract", "get me a better deal", "discount my price",
"override the pricing", "special pricing for me", "custom contract terms"
};
@Override
public OutputGuardrailResult validate(AiMessage responseFromLLM) {
Log.info("OutOfScopeGuardrail: Validating LLM response");
String content = responseFromLLM.text().toLowerCase();
Log.debug("OutOfScopeGuardrail: Response content length: " + content.length() + " characters");
// Check for various out-of-scope categories
String detectedIssue = detectOutOfScopeContent(content);
if (detectedIssue != null) {
Log.warn("OutOfScopeGuardrail: Detected out-of-scope content - Issue type: " + detectedIssue);
return buildOutOfScopeResponse(detectedIssue);
}
// Response is in scope
Log.info("OutOfScopeGuardrail: Response validated successfully - content is in scope");
return success();
}
/**
* Detects if the response contains out-of-scope content.
* Returns a description of the issue if found, null otherwise.
*/
private String detectOutOfScopeContent(String content) {
// Priority order: Check most critical violations first
// 1. Check for confidential information (highest priority)
for (String keyword : CONFIDENTIAL_KEYWORDS) {
if (content.contains(keyword)) {
return "confidential";
}
}
// 2. Check for legal/financial advice
for (String keyword : ADVICE_KEYWORDS) {
if (content.contains(keyword)) {
return "advice";
}
}
// 3. Check for personal requests
for (String keyword : PERSONAL_KEYWORDS) {
if (content.contains(keyword)) {
return "personal";
}
}
// 4. Check for competitor internal information
for (String keyword : COMPETITOR_INTERNAL_KEYWORDS) {
if (content.contains(keyword)) {
return "competitor_internal";
}
}
// 5. Check for non-CloudX products (only if not in CloudX context)
for (String product : NON_CLOUDX_PRODUCTS) {
if (content.contains(product) && !isCloudXContext(content)) {
return "non_cloudx_product";
}
}
// 6. Check for pricing negotiation requests
for (String keyword : NEGOTIATION_KEYWORDS) {
if (content.contains(keyword)) {
return "negotiation";
}
}
// 7. Check if response is about general technology not related to CloudX
if (isGeneralTechnologyQuestion(content)) {
return "general_technology";
}
return null;
}
/**
* Checks if the content is discussing a product in the context of CloudX
* (e.g., integration, comparison, migration from)
*/
private boolean isCloudXContext(String content) {
String[] cloudxContextKeywords = {
"cloudx", "integrate with", "migrate from", "compared to",
"alternative to", "replace", "modernize from"
};
for (String keyword : cloudxContextKeywords) {
if (content.contains(keyword)) {
return true;
}
}
return false;
}
/**
* Checks if the response is about general technology topics not related to
* CloudX
*/
private boolean isGeneralTechnologyQuestion(String content) {
// Check if discussing technology without CloudX context
String[] techKeywords = {
"how to program", "learn programming", "tutorial for",
"what is blockchain", "what is ai", "what is machine learning",
"how does the internet work", "what is a database"
};
boolean hasTechKeyword = false;
for (String keyword : techKeywords) {
if (content.contains(keyword)) {
hasTechKeyword = true;
break;
}
}
// If has tech keyword but no CloudX context, it's out of scope
return hasTechKeyword && !isCloudXContext(content);
}
/**
* Builds an appropriate out-of-scope response based on the detected issue.
* Uses reprompt() to guide the LLM to provide a better, in-scope response.
*/
private OutputGuardrailResult buildOutOfScopeResponse(String issueType) {
Log.info("OutOfScopeGuardrail: Building reprompt response for issue type: " + issueType);
String userMessage;
String repromptMessage;
switch (issueType) {
case "confidential":
userMessage = "The response contains references to confidential or proprietary information. ";
repromptMessage = "Please provide a response that only uses publicly available information from the CloudX sales enablement materials. "
+
"Focus on CloudX features, pricing, competitive positioning, and sales methodology without revealing confidential details.";
break;
case "advice":
userMessage = "The response appears to provide legal, financial, or investment advice. ";
repromptMessage = "Please reframe the response to focus on CloudX's business value, ROI calculations, and pricing structure "
+
"without providing specific legal or financial advice. Suggest consulting appropriate advisors for such matters.";
break;
case "personal":
userMessage = "The response addresses personal or non-business matters.";
repromptMessage = "Please provide a response focused on CloudX sales enablement topics such as product features, "
+
"pricing, competitive analysis, sales methodology, or customer success stories.";
break;
case "competitor_internal":
userMessage = "The response discusses competitors' internal strategies or confidential information.";
repromptMessage = "Please limit the response to publicly available competitive comparisons based on the CloudX sales enablement materials. "
+
"Focus on how CloudX compares to competitors using public information and customer feedback.";
break;
case "non_cloudx_product":
userMessage = "The response discusses products or services outside of CloudX Enterprise Platform. ";
repromptMessage = "Please focus the response on CloudX-specific features, capabilities, and use cases. "
+
"If mentioning other products, only do so in the context of CloudX integration, migration, or comparison.";
break;
case "negotiation":
userMessage = "The response attempts to negotiate specific pricing or contract terms. ";
repromptMessage = "Please provide information about standard CloudX pricing tiers, discount guidelines, and the general pricing framework. "
+
"Indicate that specific negotiations should be handled by the sales manager and deal desk team.";
break;
case "general_technology":
userMessage = "The response discusses general technology topics not related to CloudX. ";
repromptMessage = "Please refocus the response on CloudX Enterprise Platform and its applications. " +
"Connect the technology discussion to CloudX use cases, deployment scenarios, or architecture if relevant.";
break;
default:
userMessage = "The response appears to be outside the scope of CloudX sales enablement. ";
repromptMessage = "Please provide a response focused on CloudX Enterprise Platform features, pricing, competitive analysis, "
+
"sales methodology, or customer success stories based on the available sales enablement materials.";
}
// Use reprompt() with both user message and system reprompt instruction
Log.debug("OutOfScopeGuardrail: Reprompting with user message: " + userMessage);
return reprompt(userMessage, repromptMessage);
}
}
InputValidationGuardrail
src/main/java/com/ibm/guardrails/InputValidationGuardrail.java runs before the model call. It blocks prompt injection patterns, unrelated personal-service requests, malicious strings, and CloudX-adjacent topics that would turn this assistant into a general-purpose chatbot. That is not the job here.
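One caveat about the listing that follows: its checks use `String.contains`, so a short keyword such as "rent" or "car" also matches inside "current" or "scarce". A minimal sketch of whole-word matching, assuming you are free to change the matching rule. `WholeWordMatcher` is a hypothetical helper and not part of the repository:

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Pattern;

/** Illustrative helper: matches keywords only when they are not embedded in a longer word. */
final class WholeWordMatcher {

    private final List<String> keywords;
    private final List<Pattern> patterns;

    WholeWordMatcher(List<String> keywords) {
        this.keywords = List.copyOf(keywords);
        // Lookarounds reject a match that has a letter or digit directly before or after it.
        this.patterns = this.keywords.stream()
                .map(k -> Pattern.compile(
                        "(?<![\\p{L}\\p{N}])"
                                + Pattern.quote(k.toLowerCase(Locale.ROOT))
                                + "(?![\\p{L}\\p{N}])"))
                .toList();
    }

    /** Returns the first keyword found as a whole word, in declaration order. */
    Optional<String> firstMatch(String input) {
        String lower = input.toLowerCase(Locale.ROOT);
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(lower).find()) {
                return Optional.of(keywords.get(i));
            }
        }
        return Optional.empty();
    }
}
```

Whole-word matching only removes the embedded-substring false positives. A keyword that is itself a common word still blocks legitimate questions, so the keyword list deserves the same review as the matching rule.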
package com.ibm.guardrails;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.guardrail.InputGuardrail;
import dev.langchain4j.guardrail.InputGuardrailResult;
import jakarta.enterprise.context.ApplicationScoped;
import io.quarkus.logging.Log;
/**
* InputValidationGuardrail validates user input before it reaches the LLM.
* It detects and blocks:
* 1. Prompt injection attempts
* 2. Off-topic questions outside CloudX sales enablement scope
* 3. Malicious or inappropriate content
*
* Based on CloudX sales enablement materials, valid topics include:
* - CloudX Enterprise Platform features and capabilities
* - Pricing and packaging information
* - Competitive analysis and positioning
* - Sales methodology and processes
* - Customer success stories and ROI
* - Technical architecture (multi-cloud, Kubernetes, supported languages)
* - Migration and implementation strategies
*/
@ApplicationScoped
public class InputValidationGuardrail implements InputGuardrail {
// Prompt injection patterns
private static final String[] PROMPT_INJECTION_PATTERNS = {
"ignore previous instructions",
"ignore all previous",
"disregard previous",
"forget previous instructions",
"new instructions:",
"system:",
"you are now",
"act as",
"pretend you are",
"roleplay as",
"simulate being",
"override your",
"bypass your",
"ignore your guidelines",
"forget your role",
"new role:",
"system prompt:",
"assistant:",
"###instruction:",
"###system:",
"[system]",
"<system>",
"sudo mode",
"developer mode",
"jailbreak",
"dan mode"
};
// Off-topic technology combinations (not supported by CloudX)
private static final String[][] OFF_TOPIC_COMBINATIONS = {
// Format: {technology, unsupported_context, boundary_message}
{"python", "google cloud", "CloudX supports Python on AWS, Azure, and Google Cloud. However, I specialize in CloudX sales enablement. For deployment questions, please refer to CloudX technical documentation."},
{"node.js", "heroku", "CloudX supports Node.js but not Heroku deployment. CloudX works with AWS, Azure, and Google Cloud."},
{".net", "digitalocean", "CloudX supports .NET but not DigitalOcean. CloudX is designed for AWS, Azure, and Google Cloud."},
{"ruby", "linode", "CloudX supports Ruby but not Linode. CloudX operates on AWS, Azure, and Google Cloud."}
};
// Topics completely outside CloudX scope
private static final String[] COMPLETELY_OFF_TOPIC = {
// Food & Dining
"recipe", "cooking", "food", "restaurant", "meal", "dinner", "lunch",
// Entertainment
"movie", "film", "entertainment", "music", "song", "concert", "show",
// Sports
"sports", "football", "basketball", "soccer", "baseball", "tennis",
// Weather & Nature
"weather", "climate", "temperature", "forecast",
// Health & Medical
"health", "medical", "doctor", "medicine", "hospital", "disease",
// Personal Life
"dating", "relationship", "romance", "wedding", "marriage",
// Politics & Government
"politics", "election", "government", "president", "senator",
// Finance (non-business)
"cryptocurrency", "bitcoin", "blockchain", "stock market", "forex",
// Gaming
"gaming", "video game", "playstation", "xbox", "nintendo",
// Travel & Booking
"flight", "hotel", "vacation", "travel", "booking", "reservation",
"airline", "airport", "cruise", "trip", "tourism",
// Shopping (non-software)
"shopping", "buy clothes", "fashion", "shoes", "jewelry",
// Education (non-tech)
"homework", "essay", "school assignment", "college application",
// Real Estate
"house", "apartment", "real estate", "mortgage", "rent",
// Automotive
"car", "vehicle", "automobile", "driving", "traffic"
};
// Action verbs for non-CloudX services
private static final String[] OFF_TOPIC_ACTIONS = {
"book me", "book a", "reserve a", "schedule a",
"order me", "buy me", "purchase a",
"find me a", "get me a",
"recommend a restaurant", "recommend a hotel",
"plan my trip", "plan my vacation"
};
// Non-CloudX products (unless in comparison/migration context)
private static final String[] NON_CLOUDX_PRODUCTS = {
"watson", "db2", "websphere traditional", "maximo",
"cognos", "spss", "qradar", "guardium",
"heroku", "digitalocean", "linode", "netlify",
"vercel", "railway", "render"
};
// Malicious content indicators
private static final String[] MALICIOUS_PATTERNS = {
"sql injection", "drop table", "delete from",
"script>", "<iframe", "javascript:",
"eval(", "exec(", "system(",
"../../../", "etc/passwd", "cmd.exe"
};
@Override
public InputGuardrailResult validate(UserMessage userMessage) {
Log.info("InputValidationGuardrail: Validating user input");
String content = userMessage.singleText();
String contentLower = content.toLowerCase();
Log.debug("InputValidationGuardrail: Input length: " + content.length() + " characters");
// 1. Check for prompt injection attempts (highest priority)
String injectionPattern = detectPromptInjection(contentLower);
if (injectionPattern != null) {
Log.warn("InputValidationGuardrail: BLOCKED - Prompt injection detected: '" + injectionPattern + "'");
return failure(buildPromptInjectionResponse());
}
// 2. Check for malicious content
String maliciousPattern = detectMaliciousContent(contentLower);
if (maliciousPattern != null) {
Log.warn("InputValidationGuardrail: BLOCKED - Malicious content detected: '" + maliciousPattern + "'");
return failure(buildMaliciousContentResponse());
}
// 3. Check for off-topic action requests (e.g., "book me a flight")
String offTopicAction = detectOffTopicAction(contentLower);
if (offTopicAction != null) {
Log.warn("InputValidationGuardrail: BLOCKED - Off-topic action request: '" + offTopicAction + "'");
return failure(buildOffTopicActionResponse(offTopicAction));
}
// 4. Check for completely off-topic questions
String offTopicKeyword = detectCompletelyOffTopic(contentLower);
if (offTopicKeyword != null) {
Log.warn("InputValidationGuardrail: BLOCKED - Completely off-topic question: '" + offTopicKeyword + "'");
return failure(buildCompletelyOffTopicResponse(offTopicKeyword));
}
// 5. Check for off-topic technology combinations
String offTopicCombo = detectOffTopicCombination(contentLower);
if (offTopicCombo != null) {
Log.warn("InputValidationGuardrail: BLOCKED - Off-topic technology combination detected");
return failure(offTopicCombo);
}
// 6. Check for non-CloudX products (unless in valid context)
String nonCloudXProduct = detectNonCloudXProduct(contentLower);
if (nonCloudXProduct != null && !isValidCloudXContext(contentLower)) {
Log.warn("InputValidationGuardrail: BLOCKED - Non-CloudX product without valid context: '" + nonCloudXProduct + "'");
return failure(buildNonCloudXProductResponse(nonCloudXProduct));
}
// Input is valid
Log.info("InputValidationGuardrail: Input validated successfully");
return success();
}
/**
* Detects prompt injection attempts
*/
private String detectPromptInjection(String content) {
for (String pattern : PROMPT_INJECTION_PATTERNS) {
if (content.contains(pattern)) {
return pattern;
}
}
return null;
}
/**
* Detects malicious content patterns
*/
private String detectMaliciousContent(String content) {
for (String pattern : MALICIOUS_PATTERNS) {
if (content.contains(pattern)) {
return pattern;
}
}
return null;
}
/**
* Detects off-topic action requests (e.g., "book me a flight")
*/
private String detectOffTopicAction(String content) {
for (String action : OFF_TOPIC_ACTIONS) {
if (content.contains(action)) {
return action;
}
}
return null;
}
/**
* Detects completely off-topic questions
*/
private String detectCompletelyOffTopic(String content) {
for (String keyword : COMPLETELY_OFF_TOPIC) {
if (content.contains(keyword)) {
return keyword;
}
}
return null;
}
/**
* Detects off-topic technology combinations
*/
private String detectOffTopicCombination(String content) {
for (String[] combo : OFF_TOPIC_COMBINATIONS) {
String tech = combo[0];
String unsupportedContext = combo[1];
String message = combo[2];
if (content.contains(tech) && content.contains(unsupportedContext)) {
return message;
}
}
return null;
}
/**
* Detects non-CloudX products
*/
private String detectNonCloudXProduct(String content) {
for (String product : NON_CLOUDX_PRODUCTS) {
if (content.contains(product)) {
return product;
}
}
return null;
}
/**
* Checks if non-CloudX product is mentioned in valid context
* (comparison, migration, integration)
*/
private boolean isValidCloudXContext(String content) {
String[] validContextKeywords = {
"cloudx", "compare", "comparison", "versus", "vs",
"migrate", "migration", "move from", "switch from",
"integrate", "integration", "alternative to",
"replace", "instead of"
};
for (String keyword : validContextKeywords) {
if (content.contains(keyword)) {
return true;
}
}
return false;
}
/**
* Builds response for prompt injection attempts
*/
private String buildPromptInjectionResponse() {
return "I cannot process this request as it appears to contain instructions that would " +
"compromise my intended function. I'm designed to assist with CloudX Enterprise Platform " +
"sales enablement questions, including product features, pricing, competitive analysis, " +
"and sales methodology. Please ask a question related to these topics.";
}
/**
* Builds response for malicious content
*/
private String buildMaliciousContentResponse() {
return "I cannot process this request as it contains potentially malicious content. " +
"I'm here to help with CloudX Enterprise Platform sales enablement questions. " +
"Please ask about CloudX features, pricing, competitive positioning, or sales strategies.";
}
/**
* Builds response for off-topic action requests
*/
private String buildOffTopicActionResponse(String action) {
return "I cannot assist with personal service requests like '" + action + "'. " +
"I'm a CloudX Enterprise Platform sales enablement assistant. I can help you with:\n\n" +
"• CloudX features, capabilities, and technical architecture\n" +
"• Pricing, packaging, and ROI information\n" +
"• Competitive analysis and positioning\n" +
"• Sales methodology and processes\n" +
"• Customer success stories and case studies\n" +
"• Migration and implementation strategies\n\n" +
"Please ask a question related to CloudX sales enablement.";
}
/**
* Builds response for completely off-topic questions
*/
private String buildCompletelyOffTopicResponse(String keyword) {
return "I specialize in CloudX Enterprise Platform sales enablement and cannot assist with " +
"questions about " + keyword + ". I can help you with:\n\n" +
"• CloudX features, capabilities, and technical architecture\n" +
"• Pricing, packaging, and ROI information\n" +
"• Competitive analysis and positioning\n" +
"• Sales methodology and processes\n" +
"• Customer success stories and case studies\n" +
"• Migration and implementation strategies\n\n" +
"Please ask a question related to CloudX sales enablement.";
}
/**
* Builds response for non-CloudX products without valid context
*/
private String buildNonCloudXProductResponse(String product) {
return "I specialize in CloudX Enterprise Platform sales enablement. " +
"While I can discuss " + product + " in the context of CloudX comparisons, migrations, " +
"or integrations, I cannot provide standalone information about it. " +
"If you're interested in how CloudX compares to or integrates with " + product + ", " +
"please rephrase your question to include CloudX in the context.";
}
}
BotResponse
src/main/java/com/ibm/api/BotResponse.java keeps the HTTP response shape boring. Successful answers and guardrail failures can use the same JSON wrapper, so the client only has one field to read.
package com.ibm.api;
public record BotResponse(String response) {
}
InputGuardrailExceptionMapper
src/main/java/com/ibm/api/InputGuardrailExceptionMapper.java maps blocked input to 400 Bad Request. Without this, a guardrail failure can look like a server problem. That is the wrong kind of drama.
package com.ibm.api;
import dev.langchain4j.guardrail.InputGuardrailException;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.ext.ExceptionMapper;
import jakarta.ws.rs.ext.Provider;
import io.quarkus.logging.Log;
/**
* Exception mapper for InputGuardrailException.
* Maps validation failures from InputValidationGuardrail to structured JSON responses.
*/
@Provider
public class InputGuardrailExceptionMapper implements ExceptionMapper<InputGuardrailException> {
@Override
public Response toResponse(InputGuardrailException exception) {
Log.warn("InputGuardrailException caught: " + exception.getMessage());
// Extract the validation error message from the exception
String errorMessage = exception.getMessage();
if (errorMessage == null || errorMessage.trim().isEmpty()) {
errorMessage = "Input validation failed. Please ensure your question is related to CloudX Enterprise Platform sales enablement.";
}
// Return the error message in the same BotResponse format for consistency
BotResponse errorResponse = new BotResponse(errorMessage);
// Return 400 Bad Request with the structured response
return Response.status(Response.Status.BAD_REQUEST)
.entity(errorResponse)
.type("application/json")
.build();
}
}
SalesEnablementResource
src/main/java/com/ibm/api/SalesEnablementResource.java exposes the demo as GET /bot?q=.... The fallback question keeps the endpoint easy to test from a browser, which is a small thing until you are doing the fifth local run.
package com.ibm.api;
import com.ibm.ai.SalesEnablementBot;
import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.QueryParam;
import jakarta.ws.rs.core.MediaType;
@Path("/bot")
public class SalesEnablementResource {
@Inject
SalesEnablementBot bot;
@GET
@Produces(MediaType.APPLICATION_JSON)
public BotResponse ask(@QueryParam("q") String question) {
if (question == null || question.trim().isEmpty()) {
question = "What is the best solution for a client who is migrating to a microservices architecture?";
}
String botResponse = bot.chat(question);
return new BotResponse(botResponse);
}
}
Configuration
Now wire the runtime pieces in src/main/resources/application.properties. These settings connect the Java classes above to Ollama, pgvector, and Docling:
# ----------------------------------------
# 1. Ollama configuration (local LLM)
# ----------------------------------------
# Chat model (answers)
quarkus.langchain4j.ollama.chat-model.model-name=gpt-oss:20b
# Embedding model (document + query vectors)
# Default Ollama library tag granite-embedding:latest maps to IBM Granite ~30M English (see `ollama show granite-embedding` → embedding length).
# Larger community tags (for example granite-embedding-278m-multilingual) often use 768 dimensions — always match quarkus.langchain4j.pgvector.dimension to `embedding length` from ollama show.
quarkus.langchain4j.ollama.embedding-model.model-name=granite-embedding:latest
# Set a more generous timeout
quarkus.langchain4j.ollama.timeout=60s
# Request/response logging (set to true while debugging)
quarkus.langchain4j.log-requests=false
quarkus.langchain4j.log-responses=false
# ----------------------------------------
# 2. Datasource and pgvector
# ----------------------------------------
quarkus.datasource.db-kind=postgresql
# Use default datasource for pgvector
# Store table name
quarkus.langchain4j.pgvector.table=embeddings
quarkus.langchain4j.pgvector.drop-table-first=true
quarkus.langchain4j.pgvector.create-table=true
# Must equal the embedding model output width (same as `embedding length` from `ollama show <model>`).
# granite-embedding:latest → 384. If you switch to a 768-dim model, set 768 and drop/recreate the table or re-ingest.
quarkus.langchain4j.pgvector.dimension=384
# Optional, but recommended once data grows
quarkus.langchain4j.pgvector.use-index=true
quarkus.langchain4j.pgvector.index-list-size=10
# ----------------------------------------
# 3. Docling
# ----------------------------------------
# The Docling Dev Service starts a container in dev and test mode.
# The extension configures the REST client automatically.
# Enable the Docling UI explicitly.
quarkus.docling.devservices.enable-ui=true
quarkus.docling.timeout=3M
# Docling Serve may require auth (HTTP 401 without it). Sent as X-Api-Key; match DOCLING_SERVE_API_KEY on the server.
# Dev Services can populate this when it starts the container. If Docling is already listening on the default port,
# Quarkus may skip Dev Services—then set this explicitly to the same key your Docling instance expects.
# quarkus.docling.api-key=your-secret-here
# REST client timeout configuration for Docling
# Increase timeouts for large file processing (sync helper and Quarkus async client).
# Quarkus REST client timeouts are in milliseconds: 60000 = 60 s connect, 300000 = 5 min read.
quarkus.rest-client."io.quarkiverse.docling.runtime.client.DoclingService".connect-timeout=60000
quarkus.rest-client."io.quarkiverse.docling.runtime.client.DoclingService".read-timeout=300000
quarkus.rest-client."io.quarkiverse.docling.runtime.client.QuarkusDoclingServeClient".connect-timeout=60000
quarkus.rest-client."io.quarkiverse.docling.runtime.client.QuarkusDoclingServeClient".read-timeout=300000
Notes:
quarkus.langchain4j.ollama.timeout covers slow local models. Increase it if you see client timeouts.
quarkus.langchain4j.pgvector.drop-table-first=true is fine for demos. Turn it off when the table contains data you care about.
REST client keys use the Docling REST client interfaces Quarkus generates: DoclingService for the blocking helper and QuarkusDoclingServeClient for the Mutiny task API. If these fully qualified class names change in a future extension, copy the new names from the Dev UI or extension docs.
quarkus.docling.api-key (or QUARKUS_DOCLING_API_KEY) supplies the X-Api-Key header for Docling Serve. If you see 401 Unauthorized from QuarkusDoclingServeClient, the server is enforcing API key auth and your app must send the matching secret (or align Dev Services with a running container on the default port).
Large PDFs may need higher Docling read timeouts. The quarkus-docling issue tracker discusses gateway timeouts for very large uploads.
Static UI
The demo includes a small HTML client at src/main/resources/META-INF/resources/index.html, which is the standard Quarkus static resource location. It posts questions to /bot and renders Markdown in the browser. Copy it from the repository if you create the project from the CLI. I do not repeat it here because the article is already long enough.
Production Hardening
Timeouts and back-pressure: Ollama and Docling run outside your JVM. Set REST and Ollama timeouts explicitly. Configure both DoclingService and QuarkusDoclingServeClient for long-running conversions. For very large PDFs, increase the read timeout and follow upstream guidance on gateway limits.
Docling auth: When Docling Serve enables API keys, configure quarkus.docling.api-key so async convert/poll/result calls include X-Api-Key. Without it you get HTTP 401 from the client.
Startup vs RAG readiness: Because ingestion runs in the background, the HTTP port opens before the pgvector table is fully populated. Gate production traffic on /q/health/ready (SmallRye Health). The bundled IngestionReadinessCheck stays DOWN until indexing completes. If you call /bot without waiting, answers may have little retrieved context.
Event loop safety: Docling’s Mutiny chain and the embedding loop run on the worker pool, not the Vert.x event loop. Keep blocking LangChain4j calls off the event loop when you extend the pipeline.
Vector store integrity: Changing the embedding model or dimension without recreating the table produces bad retrieval. Treat embedding config like a schema migration. It is less exciting than debugging why every answer is confidently adjacent to the truth.
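One way to make a dimension mismatch fail loudly instead of quietly degrading retrieval is a small guard in front of the store. This is a minimal sketch in plain Java, not code from the sample project: EmbeddingDimensionGuard is a hypothetical helper, and in a real application I would expect the expected width to come from quarkus.langchain4j.pgvector.dimension and the vector from the embedding model's result.

```java
import java.util.Objects;

/** Hypothetical helper (not part of the sample): rejects embeddings whose width differs from the table's. */
final class EmbeddingDimensionGuard {

    private final int expectedDimension;

    EmbeddingDimensionGuard(int expectedDimension) {
        if (expectedDimension <= 0) {
            throw new IllegalArgumentException("expectedDimension must be positive");
        }
        this.expectedDimension = expectedDimension;
    }

    /** Returns the vector unchanged when its length matches; throws otherwise. */
    float[] check(float[] vector) {
        Objects.requireNonNull(vector, "vector");
        if (vector.length != expectedDimension) {
            throw new IllegalStateException(
                    "Embedding has " + vector.length + " dimensions but the store is configured for "
                            + expectedDimension + "; recreate the table or re-ingest before switching models");
        }
        return vector;
    }

    public static void main(String[] args) {
        // 384 matches the granite-embedding:latest width used in the configuration above.
        EmbeddingDimensionGuard guard = new EmbeddingDimensionGuard(384);
        guard.check(new float[384]); // same width: passes silently
        try {
            guard.check(new float[768]); // simulates switching to a 768-dim model
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Calling check once at startup with a single test embedding would be enough to catch a model switch before any document is ingested.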
Guardrails and abuse: The sample uses pattern-based input and output guardrails. They reduce obvious misuse but are not a full safety program. Rate-limit and authenticate any external deployment of /bot.
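The sample itself does not ship a rate limiter. As a rough illustration of the idea, here is a small in-memory token bucket in plain Java. TokenBucket and its parameters are hypothetical names chosen for this sketch, and a real deployment would more likely rely on a gateway or a dedicated library, with one bucket per client key.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

/** Hypothetical in-memory token bucket; illustrative only, not part of the sample project. */
final class TokenBucket {

    private final long capacity;
    private final long nanosPerToken;
    private final LongSupplier nanoClock;
    private long tokens;
    private long lastRefill;

    TokenBucket(long capacity, Duration refillInterval, LongSupplier nanoClock) {
        if (capacity <= 0 || refillInterval.isZero() || refillInterval.isNegative()) {
            throw new IllegalArgumentException("capacity and refillInterval must be positive");
        }
        this.capacity = capacity;
        this.nanosPerToken = refillInterval.toNanos();
        this.nanoClock = nanoClock;
        this.tokens = capacity;              // start with a full bucket
        this.lastRefill = nanoClock.getAsLong();
    }

    /** Returns true if the request may proceed; false means the caller should answer HTTP 429. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        long earned = (now - lastRefill) / nanosPerToken;
        if (earned > 0) {
            tokens = Math.min(capacity, tokens + earned);
            lastRefill += earned * nanosPerToken; // keep the remainder for the next refill
        }
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Burst of 2 requests, then one additional request per second.
        TokenBucket bucket = new TokenBucket(2, Duration.ofSeconds(1), System::nanoTime);
        for (int i = 0; i < 3; i++) {
            System.out.println("request " + i + " allowed: " + bucket.tryAcquire());
        }
    }
}
```

The clock is injected so the refill logic can be tested without sleeping. The integer arithmetic avoids floating-point drift in the token count.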
Observability: Retrieval logging in DocumentRetriever shows which chunks influenced a reply. Keep that in dev, then trim or gate it in production.
Verification
Pull models (example):
ollama pull gpt-oss:20b
ollama pull granite-embedding:latest
From the module root:
./mvnw quarkus:dev
Watch logs for
Document ingestion pipeline finished; readiness is UP.
Optionally poll readiness:
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/q/health/ready
Expect 503 while ingestion runs, then 200 when the index is ready. An empty corpus may flip to UP quickly.
Open http://localhost:8080/ for the bundled UI, or call the API after readiness is 200:
curl -s "http://localhost:8080/bot?q=What%20CloudX%20tier%20fits%20a%20regulated%20industry%20customer?"
Expect JSON {"response":"..."} with content grounded in your src/main/resources/documents/ files. If you curl immediately on a cold start, the model may answer with little retrieved context until indexing completes.
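If you would rather script the same check in Java than in curl, the steps above can be sketched with the JDK's built-in HttpClient. BotSmokeTest and its helper methods are hypothetical names for this example. The base URL, attempt count, and pause are assumptions to adjust, and the sketch assumes /bot decodes a form-encoded q parameter, since URLEncoder emits + for spaces rather than %20.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.function.IntSupplier;

/** Hypothetical smoke test: waits for readiness, then asks /bot one question. */
final class BotSmokeTest {

    /** Polls the probe until it reports HTTP 200 or the attempts run out. */
    static boolean waitUntilReady(IntSupplier statusProbe, int maxAttempts, Duration pause) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (statusProbe.getAsInt() == 200) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pause.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                    return false;
                }
            }
        }
        return false;
    }

    /** Builds the /bot URI with the question URL-encoded as the q parameter. */
    static URI botUri(String baseUrl, String question) {
        return URI.create(baseUrl + "/bot?q=" + URLEncoder.encode(question, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = "http://localhost:8080"; // assumption: default Quarkus dev port
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest ready = HttpRequest.newBuilder(URI.create(baseUrl + "/q/health/ready")).GET().build();

        IntSupplier probe = () -> {
            try {
                return client.send(ready, HttpResponse.BodyHandlers.discarding()).statusCode();
            } catch (IOException e) {
                return -1; // server not reachable yet: treat as not ready
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        };

        // Assumed budget: up to 60 probes, 5 seconds apart.
        if (!waitUntilReady(probe, 60, Duration.ofSeconds(5))) {
            System.err.println("Readiness never reached 200; not calling /bot.");
            return;
        }

        HttpRequest ask = HttpRequest.newBuilder(
                botUri(baseUrl, "What CloudX tier fits a regulated industry customer?")).GET().build();
        System.out.println(client.send(ask, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```

The probe is passed in as an IntSupplier so the waiting logic can be exercised without a running server.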
Conclusion
You now have a single Quarkus module that turns messy PDFs into structured text, stores embeddings in pgvector, and answers through an Ollama-backed model with explicit guardrails. That is enough to start moving toward production agent tooling without changing the basic shape of the stack.
The complete, updated code is available in the enterprise-rag repository.





