Building an AI-Powered PDF Processing Pipeline with Quarkus, Camel, and LangChain4j
Securely scan, extract, and summarize documents using ClamAV, Camel routes, and local LLMs with Ollama.
In this hands-on tutorial, you'll build a complete file processing pipeline in Java using Quarkus, Apache Camel, and LangChain4j. The pipeline accepts uploaded PDFs, scans them for viruses, extracts their content, and summarizes them using a local LLM served by Ollama. Thanks to the camel-langchain4j
component, we can inject AI-powered analysis directly into our Camel route with zero boilerplate and just business logic.
What You’ll Build
This application does the following:
Accepts PDF file uploads using a non-blocking REST endpoint.
Scans each uploaded file using ClamAV to ensure it's virus-free.
Extracts raw text using
camel-pdf
.Sends the text to a Large Language Model using the
camel-langchain4j-chat
extension for summarization.Returns the summary to the user as a JSON response.
By the end, you’ll understand how to fuse traditional file processing with LLM-based automation securely.
Project Setup
Start by bootstrapping your Quarkus project with the right set of extensions.
mvn io.quarkus.platform:quarkus-maven-plugin:create \
-DprojectGroupId=com.example \
-DprojectArtifactId=pdf-processing-pipeline \
-Dextensions="rest-jackson, camel-quarkus-pdf, camel-quarkus-langchain4j,camel-quarkus-direct,camel-quarkus-langchain4j-chat,langchain4j-ollama, quarkus-antivirus"
cd pdf-processing-pipeline
Key Dependencies:
rest-jackson
: Provides fast, non-blocking REST endpoints.camel-quarkus-pdf
: Extracts text from PDF documents.camel-quarkus-langchain4j
: Integrates LangChain4j into Camel routes.camel-quarkus-langchain4j-chat
: Gives Camel access to chat models.camel-quarkus-direct
: Calls another endpoint from the same Camel Context synchronouslylangchain4j-ollama
: Connects LangChain4j to Ollama and its local models.quarkus-antivirus
: Wraps antivirus capabilities, e.g., ClamAV, for file scanning.
And as usual, feel free to grab the full example from my Github repository.
Configuration
Edit src/main/resources/application.properties
:
# LangChain4j Ollama Configuration
quarkus.langchain4j.ollama.chat-model.model-id=gpt-oss:20b
# File Upload Configuration
quarkus.http.limits.max-body-size=50M
# Antivirus Configuration
quarkus.antivirus.clamav.enabled=true
The application.properties
file defines how the application runs: it configures a local Ollama model (gpt-oss:20b
) for AI processing, sets a 50 MB upload limit so large PDF files can be accepted, and enables ClamAV antivirus scanning to check files before processing. Together, these settings ensure that the service can securely handle and analyze sizable PDFs while protecting against malware.
Virus Scanning Bean
Create a new Java class VirusScanner.java
in src/main/java/com/example
. This bean injects the Antivirus
service and provides a simple method we can call from our Camel route.
package com.example;
import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import io.quarkiverse.antivirus.runtime.Antivirus;
import io.quarkiverse.antivirus.runtime.AntivirusScanResult;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import jakarta.inject.Named;
@ApplicationScoped
@Named("virusScanner")
public class VirusScanner {
@Inject
Antivirus antivirus;
public byte[] scan(Path filePath) throws Exception {
// Scan the file
byte[] fileBytes = Files.readAllBytes(filePath);
try (var inputStream = new ByteArrayInputStream(fileBytes)) {
List<AntivirusScanResult> results = antivirus.scan(filePath.toString(), inputStream);
for (AntivirusScanResult result : results) {
if (result.getStatus() != 200) {
throw new Exception("Virus found: " + result.getMessage());
}
}
}
return fileBytes; // Return the file content instead of path
}
}
The VirusScanner
class serves as a security checkpoint in the pipeline: it takes uploaded files, wraps their bytes in a ByteArrayInputStream
, and passes that stream to the ClamAV antivirus scanner. This design follows the Quarkus antivirus API contract, which expects an InputStream
, and lets the scanner process data efficiently in chunks instead of as a single byte array. If a threat is detected, the class raises a Exception
; if not, the clean content is returned for further processing.
We also use the @Named annotation to make the bean available to the Camel route.
Reactive File Upload Endpoint
The FileUploadResource
class is the REST entry point for the PDF pipeline. It exposes a POST /upload
endpoint that accepts multipart form data, validates that a non-null PDF file was provided, and rejects invalid input with a 400 response. Valid files are passed as File
objects into the Camel route via a ProducerTemplate
, targeting the direct:processPdf
endpoint for downstream processing. The class handles errors gracefully: Failures are caught and reported as structured HTTP responses. In the overall architecture, it serves as the gateway, bridging HTTP requests with the Camel-driven PDF processing workflow.
Rename the scaffolded GreetingResource
and add below code that accepts multipart file uploads and passes the file path to Camel.
src/main/java/com/example/FileUploadResource.java
package com.example;
import java.io.File;
import org.apache.camel.ProducerTemplate;
import org.jboss.resteasy.reactive.multipart.FileUpload;
import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;
@Path("/upload")
public class FileUploadResource {
@Inject
private ProducerTemplate producerTemplate;
@POST
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Produces(MediaType.APPLICATION_JSON)
public Response uploadFile(@org.jboss.resteasy.reactive.RestForm FileUpload file) {
if (file == null || !file.fileName().toLowerCase().endsWith(".pdf")) {
return Response.status(Response.Status.BAD_REQUEST)
.entity("{\"error\":\"A PDF file must be uploaded\"}")
.build();
}
try {
// Get the uploaded file directly
File uploadedFile = file.uploadedFile().toFile();
// Send the file path to the Camel route and get the result
String result = producerTemplate.requestBody("direct:processPdf", uploadedFile, String.class);
return Response.ok(result).build();
} catch (Exception e) {
return Response.status(Response.Status.INTERNAL_SERVER_ERROR)
.entity("{\"error\":\"Processing failed: " + e.getMessage() + "\"}")
.build();
}
}
}
Key Features:
Validates that the file is a
.pdf
.Sends the file path to Camel via the
direct:processPdf
endpoint.Handles exceptions such as viruses or LLM failures gracefully.
The Camel Route
This is the core of our pipeline where we define the processing flow. By using camel-langchain4j
, we no longer need to create custom AI service beans. The route becomes simpler and more declarative.
Create the route in PdfProcessingRoute.java
.
package com.example;
import org.apache.camel.LoggingLevel;
import org.apache.camel.builder.RouteBuilder;
import jakarta.enterprise.context.ApplicationScoped;
@ApplicationScoped
public class PdfProcessingRoute extends RouteBuilder {
@Override
public void configure() throws Exception {
// Global exception handling
onException(Exception.class)
.handled(true) // Mark as handled so it doesn't propagate
.log(LoggingLevel.ERROR, "Error processing file: ${exception.message}")
.setBody(constant("Error: File processing failed"))
.stop(); // Stop further processing
from("direct:processPdf")
.log("New PDF received: ${header.CamelMessageBodyType}")
.log("→ Scanning for viruses...")
.bean("virusScanner", "scan")
.log("✔ File is clean.")
.log("→ Extracting text...")
.to("pdf:extractText")
.log("✔ Extraction done. Text: ${body}")
.log("→ Generating summary via LLM...")
.transform(simple(
"Provide a concise, one-paragraph summary of the following text extracted from a PDF file: ${body}"))
.to("langchain4j-chat:default")
.log("✔ Summary complete: ${body}")
.transform(simple("PDF summary: ${body}"))
.log("✔ Basic processing complete: ${header.CamelMessageBodyType}");
}
}
Key Features:
bean("virusScanner", "scan")
: Scans the file.to("pdf:extractText")
: Extracts raw text from PDF.transform(simple(...))
: Crafts a prompt for the LLM.to("langchain4j-chat:default")
: Sends the prompt to the LangChain4j Ollama-backed model.
This is a full declarative AI pipeline embedded directly in your Camel flow and no glue code, no clutter.
Running the App
Start Quarkus in dev mode:
./mvnw quarkus:dev
Then upload a PDF:
curl -v -X POST -F "file=@/path/to/document.pdf" http://localhost:8080/upload
You’ll see detailed logs for each processing stage. If the document is clean, you’ll get a JSON response with the AI-generated summary. If it contains a virus, the system will fail fast and return an error.
Conclusion
This tutorial demonstrates how to build an AI-infused file processing pipeline in Java with Camel. By combining Quarkus with Apache Camel and LangChain4j, you can integrate modern LLM capabilities into your data flows without rewriting your architecture or drowning in boilerplate.
This setup is ideal for intelligent ingestion systems, secure knowledge workflows, or any use case where file uploads meet AI summarization and content automation.
Want to extend it? Add:
Semantic vector storage of extracted content.
Text classification or named entity recognition.
Human-in-the-loop workflows for flagged documents.
The future of enterprise Java isn’t just fast: It’s intelligent.