Fast and Smart: Content Moderation with Java, Bloom Filters, and Local LLMs
Build a blazing-fast moderation API in Quarkus that filters content intelligently using n-grams, probabilistic data structures, and on-device AI.
Modern applications that allow user-generated content need to moderate it. That usually means running everything through an expensive, high-latency AI model or, worse, trusting regex. In this tutorial, we'll build something smarter.
You’ll learn how to combine a Bloom filter with a local LLM to moderate content efficiently. The Bloom filter gives you high-throughput, low-cost screening, while the LLM provides deep inspection only when necessary. This hybrid architecture balances speed and safety.
Let’s dive into how it works and build the whole thing with Quarkus, LangChain4j, and a local Ollama model.
Why Combine Bloom Filters with LLMs?
LLMs are powerful but expensive. Invoking a local or remote LLM for every comment or message in your system adds latency and burns compute.
But most content isn’t harmful. We can optimize by using a probabilistic filter, like a Bloom filter, to fast-track clearly safe content. The Bloom filter holds known problematic n-grams (short sequences of words). If content doesn’t match any of them, we assume it’s safe and skip the LLM.
Only if there’s a potential match do we escalate to the LLM for semantic understanding.
Here’s the logic flow:
Use a Bloom filter pre-loaded with known bad phrases.
Split incoming text into overlapping n-grams (e.g., 3-word phrases).
If any n-gram might match, hand it off to the LLM.
Otherwise, fast-pass the content as safe.
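To make step 2 concrete, here is a standalone sketch of trigram splitting, independent of the Quarkus service we build below (the class name `NgramDemo` is just for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NgramDemo {

    // Split text into overlapping n-grams of the given size.
    static List<String> ngrams(String text, int n) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i <= words.length - n; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("I found secret cheat codes online", 3));
        // [i found secret, found secret cheat, secret cheat codes, cheat codes online]
    }
}
```

Note that every word appears in up to three trigrams, so a bad phrase is caught no matter where it sits in the sentence.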
This is going to be a short, crisp tutorial, but if you prefer, you can start from the GitHub repository.
Now let’s build it.
Prerequisites
Before you begin, ensure your environment is ready:
JDK 17 or newer
Maven 3.8 or newer
Podman (or Docker if you have to) for running Ollama as a Dev Service
Create the Quarkus Project
Generate a new Quarkus project with the necessary extensions:
mvn io.quarkus.platform:quarkus-maven-plugin:create \
    -DprojectGroupId=org.example \
    -DprojectArtifactId=llm-filter-demo \
    -DclassName="org.example.moderation.ModerationResource" \
    -Dpath="/v1/moderate" \
    -Dextensions="quarkus-rest-jackson,quarkus-langchain4j-ollama"
cd llm-filter-demo
Then, add Google Guava for the Bloom filter to your pom.xml:
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>33.4.8-jre</version>
</dependency>
Implement the Bloom Filter Service
This service initializes a Bloom filter with known problematic phrases. Instead of checking entire sentences, it breaks text into 3-word trigrams and checks each one.
Create BloomFilterService.java:
package org.example.moderation;

import java.nio.charset.StandardCharsets;
import java.util.List;

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

import jakarta.annotation.PostConstruct;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class BloomFilterService {

    private static final List<String> PROHIBITED_PHRASES = List.of(
            "buy illegal items",
            "secret cheat codes",
            "prohibited substance",
            "malicious download link");

    private static final int N_GRAM_SIZE = 3;

    private BloomFilter<CharSequence> filter;

    @PostConstruct
    void initialize() {
        filter = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8),
                1000,   // expected insertions
                0.01);  // accepted false-positive probability
        PROHIBITED_PHRASES.forEach(phrase -> filter.put(phrase.toLowerCase()));
    }

    public boolean mightContainProblematicNgram(String content) {
        if (content == null || content.isBlank()) {
            return false;
        }
        String[] words = content.toLowerCase().trim().split("\\s+");
        // Inputs shorter than the n-gram size are checked as a single phrase.
        if (words.length < N_GRAM_SIZE) {
            return filter.mightContain(String.join(" ", words));
        }
        // Slide a window of N_GRAM_SIZE words over the text.
        for (int i = 0; i <= words.length - N_GRAM_SIZE; i++) {
            StringBuilder ngramBuilder = new StringBuilder();
            for (int j = 0; j < N_GRAM_SIZE; j++) {
                ngramBuilder.append(words[i + j]).append(" ");
            }
            String ngram = ngramBuilder.toString().trim();
            if (filter.mightContain(ngram)) {
                return true;
            }
        }
        return false;
    }
}
This gives us a lightweight, fast first-pass filter.
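It is worth knowing how little memory that first pass costs. The filter above reserves room for 1,000 entries at a 1% false-positive rate, and Guava sizes its bit array and hash-function count from the standard Bloom filter formulas. A back-of-the-envelope check in plain Java (this reproduces the textbook formulas, not Guava's exact internals):

```java
public class BloomSizing {
    public static void main(String[] args) {
        int n = 1000;     // expected insertions, as passed to BloomFilter.create
        double p = 0.01;  // accepted false-positive probability
        // Optimal bit count: m = -n * ln(p) / (ln 2)^2
        double bits = -n * Math.log(p) / (Math.log(2) * Math.log(2));
        // Optimal number of hash functions: k = (m / n) * ln 2
        long hashes = Math.round((bits / n) * Math.log(2));
        System.out.printf("~%.0f bits (~%.1f KiB), %d hash functions%n",
                bits, bits / 8.0 / 1024.0, hashes);
        // → ~9585 bits (~1.2 KiB), 7 hash functions
    }
}
```

About a kilobyte for a thousand phrases, with every lookup a handful of hashes: that is why the fast path is essentially free.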
Add the LLM Moderator Interface
Now create an interface that defines our AI moderation behavior. With LangChain4j and Quarkus, all we need is an annotated interface.
Create LLMModerator.java:
package org.example.moderation;

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface LLMModerator {

    @SystemMessage("""
            You are a highly-trained content moderation AI.
            Analyze the following text for harmful, unethical, or inappropriate content.
            Respond with a single word: either 'SAFE' or 'UNSAFE'. Do not provide any other text or explanation.
            """)
    @UserMessage("Content to analyze: {{content}}")
    String moderate(String content);
}
This prompt gives us strict control over the LLM's response: either "SAFE" or "UNSAFE". Nothing else.
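In practice, local models occasionally ignore formatting instructions and reply with a full sentence. If that bites you, a small normalizer can map any reply onto the strict contract and fail closed on surprises. This helper is a hypothetical addition, not part of the tutorial's code:

```java
public class ResponseNormalizer {

    // Map a raw LLM reply onto the strict SAFE/UNSAFE contract.
    // Check "UNSAFE" first, since the string "UNSAFE" also contains "SAFE".
    static String normalize(String raw) {
        if (raw == null || raw.isBlank()) {
            return "UNSAFE"; // fail closed on empty replies
        }
        String upper = raw.trim().toUpperCase();
        if (upper.contains("UNSAFE")) {
            return "UNSAFE";
        }
        return upper.contains("SAFE") ? "SAFE" : "UNSAFE";
    }
}
```

With this in place, even a chatty reply like "The content is SAFE." still resolves to a clean verdict.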
Configure the Application
Point Quarkus to the model it is supposed to use in application.properties:
quarkus.langchain4j.ollama.chat-model.model-name=llama3
quarkus.langchain4j.ollama.timeout=60s
quarkus.langchain4j.ollama.log-requests=true
quarkus.langchain4j.ollama.log-responses=true
You can tweak the timeout if your model takes longer on your system. We also log requests and responses to the model so we can see what is going on.
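Moderation verdicts should ideally be reproducible. Assuming your quarkus-langchain4j version exposes the chat-model temperature property, you can pin the sampling temperature to make the model less creative:

```properties
# Lower temperature = more deterministic SAFE/UNSAFE verdicts
quarkus.langchain4j.ollama.chat-model.temperature=0
```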
Build the API Endpoint
Now put it all together by wiring both services into a REST endpoint.
Edit ModerationResource.java:
package org.example.moderation;

import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/v1/moderate")
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
public class ModerationResource {

    @Inject
    BloomFilterService bloomFilterService;

    @Inject
    LLMModerator llmModerator;

    public record ModerationRequest(String text) {
    }

    public record ModerationResponse(String text, String status, String checkedBy) {
    }

    @POST
    public ModerationResponse moderate(ModerationRequest request) {
        // Fast path: no suspicious n-gram, skip the LLM entirely.
        if (!bloomFilterService.mightContainProblematicNgram(request.text())) {
            return new ModerationResponse(request.text(), "SAFE", "bloom_filter");
        }
        // Slow path: a possible match, let the LLM decide.
        String llmResult = llmModerator.moderate(request.text());
        return new ModerationResponse(request.text(), llmResult.trim(), "llm_model");
    }
}
Now we have a two-layer moderation service, live and ready.
Run and Test the System
Start Quarkus in development mode:
./mvnw quarkus:dev
Test safe content (should pass via Bloom filter):
curl -X POST -H "Content-Type: application/json" \
-d '{"text": "This is a lovely blog post about baking cakes."}' \
http://localhost:8080/v1/moderate
You should see:
{"text":"This is a lovely blog post about baking cakes.","status":"SAFE","checkedBy":"bloom_filter"}
Try content with a known bad phrase:
curl -X POST -H "Content-Type: application/json" \
-d '{"text": "I found some secret cheat codes for my favorite game."}' \
http://localhost:8080/v1/moderate
This will trigger the LLM:
{"text":"I found some secret cheat codes for my favorite game.","status":"SAFE","checkedBy":"llm_model"}
And here’s a tricky one that the Bloom filter won’t catch:
curl -X POST -H "Content-Type: application/json" \
-d '{"text": "I will now explain how to create a phishing website."}' \
http://localhost:8080/v1/moderate
If it passes, that shows the limits of fixed-phrase detection. But the architecture gives you a place to keep evolving your defenses.
Where to Go Next
This pattern gives you an efficient, scalable moderation system that prioritizes speed while maintaining accuracy. But it can do more.
Here are ideas to take it further:
Live updates: Add an admin API to dynamically train the Bloom filter at runtime.
Persistence: Save and restore the Bloom filter from disk or a key-value store.
Custom responses: Expand the LLM's output schema for nuanced moderation actions.
Batch processing: Accept multiple texts per request and run in parallel.
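The batch-processing idea is mostly plumbing. A minimal sketch using plain CompletableFuture, where the moderator function is a stand-in for the Bloom-filter-plus-LLM pipeline above (the class and method names here are hypothetical):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class BatchModeration {

    // Run a moderation function over many texts in parallel and
    // return the verdicts in the original order.
    static List<String> moderateAll(List<String> texts, Function<String, String> moderator) {
        List<CompletableFuture<String>> futures = texts.stream()
                .map(t -> CompletableFuture.supplyAsync(() -> moderator.apply(t)))
                .toList();
        return futures.stream().map(CompletableFuture::join).toList();
    }
}
```

Because most texts short-circuit at the Bloom filter, a batch endpoint would only spend LLM time on the handful of items that need it.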
Moderation doesn’t have to be slow or expensive. With a little design and Quarkus power, you get both performance and intelligence in one flow.