Why Query Rewriting Matters in RAG Systems
A Java-centric guide to controlling retrieval behavior with LangChain4j and Quarkus
Query rewriting is one of the least visible but most impactful parts of a RAG system. When it is missing, retrieval quality degrades in subtle ways. When it is present and well-designed, downstream components behave more predictably without any additional complexity.
This tutorial focuses on query rewriting as an explicit architectural step, implemented using LangChain4j’s QueryTransformer abstraction and integrated into Quarkus Easy RAG.
The goal is not to introduce another optimization, but to make retrieval behavior intentional and observable.
Why Query Transformers Exist
Retrieval works best when queries are:
explicit
self-contained
aligned with the structure of the underlying documents
User input rarely meets those criteria.
In real applications, queries often contain:
pronouns without a clear referent
shorthand that only makes sense in conversation
missing domain terms that exist in the knowledge base
Vector search will still return results in these cases, but the results tend to be broad, noisy, or misleading. This is not a failure of embeddings. It is a mismatch between how users ask questions and how documents are indexed.
Query rewriting exists to bridge that gap.
The LangChain4j QueryTransformer Model
LangChain4j models query rewriting through the QueryTransformer interface. This is a deliberate design choice.
A QueryTransformer:
receives a query
returns a transformed query
has no access to retrieval results
does not generate an answer
This keeps query rewriting:
deterministic
testable
independent of retrieval or generation concerns
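To make the contract concrete, here is a minimal, purely rule-based sketch. The class name NormalizingQueryTransformer and the whitespace rule are illustrative and not part of LangChain4j or this article's project; the point is how small the interface surface is and that no LLM is required.
package com.example.rag;

import static java.util.Collections.singletonList;

import java.util.Collection;

import dev.langchain4j.rag.query.Query;
import dev.langchain4j.rag.query.transformer.QueryTransformer;

// Illustrative example only: a deterministic transformer with no LLM involved.
public class NormalizingQueryTransformer implements QueryTransformer {

    @Override
    public Collection<Query> transform(Query query) {
        // Collapse whitespace; keep the original metadata so downstream steps still see it.
        String normalized = query.text().trim().replaceAll("\\s+", " ");
        Query rewritten = query.metadata() == null
                ? Query.from(normalized)
                : Query.from(normalized, query.metadata());
        return singletonList(rewritten);
    }
}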
In a Quarkus Easy RAG pipeline, the transformer runs before embeddings are created, which is exactly where rewriting belongs.
Original Query → Transformed Query → Retrieval
Understanding CompressingQueryTransformer (Conceptually)
The CompressingQueryTransformer in LangChain4j exists to:
Remove unnecessary verbosity
Preserve semantic intent
Produce retrieval-efficient queries
Conceptually, it answers:
“What is the shortest query that still expresses the user’s intent?”
We will extend this idea slightly for RAG-specific use cases:
Ambiguity removal
Intent clarification
Retrieval friendliness
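If plain compression is all you need, the built-in class can be wired in directly. A minimal sketch, assuming your LangChain4j version exposes CompressingQueryTransformer with a chat-model constructor; the small factory class around it is illustrative.
package com.example.rag;

import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.rag.query.transformer.CompressingQueryTransformer;
import dev.langchain4j.rag.query.transformer.QueryTransformer;

// Sketch: reuse the built-in transformer around an already configured chat model.
// Assumes this constructor is available in your LangChain4j version.
public class CompressingTransformerFactory {

    public QueryTransformer create(ChatModel chatModel) {
        return new CompressingQueryTransformer(chatModel);
    }
}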
Where Query Transformers Fit in Easy RAG
Easy RAG internally does:
Query → Embedding → Vector Search → Context → LLM
We want:
Query → QueryTransformer → Embedding → Vector Search → Context → LLM
This is exactly what QueryTransformers are designed for.
No hacks. No interception.
Creating the Quarkus Project
We use Quarkus Easy RAG so ingestion is not a separate service or script. The complete source code is available in the accompanying GitHub repository.
mvn io.quarkus:quarkus-maven-plugin:create \
-DprojectGroupId=com.example \
-DprojectArtifactId=rag-query-transformer \
-Dextensions="quarkus-langchain4j-easy-rag,quarkus-langchain4j-ollama,quarkus-rest-jackson" \
-DclassName="com.example.ChatbotResource" \
-Dpath="/chat"
cd rag-query-transformer
Easy RAG is intentionally opinionated. That is a feature, not a limitation.
Preparing Example Documents for Query Rewriting Tests
To evaluate whether a query transformer is doing useful work, the document set must contain overlapping concepts with different scopes. If every document is perfectly distinct, rewriting appears unnecessary. If everything overlaps, retrieval quality becomes hard to reason about.
The following three documents are deliberately structured to create:
ambiguity around “this” and “it”
overlap between product features
clear signals that rewriting should improve retrieval precision
Document Structure
Create the following directory:
mkdir -p src/main/resources/documents
Easy RAG will ingest these files automatically on startup.
Document 1: Standard Savings Account
File: standard-savings-account.txt
Standard Savings Account
The Standard Savings Account is designed for customers who want a simple way to save money.
Features:
- Annual interest rate of 2.5%
- No monthly maintenance fees
- No minimum balance requirement
- FDIC insured up to $250,000
- Online and mobile banking access
This account does not include travel benefits or international usage features.
It is intended primarily for domestic use.
This document explicitly excludes international usage. It allows us to verify whether rewritten queries correctly avoid retrieving it for questions about travel or international use.
Document 2: Premium Checking Account
File: premium-checking-account.txt
Premium Checking Account
The Premium Checking Account is intended for customers who need flexibility and additional services.
Features:
- Worldwide ATM access with no ATM fees
- Debit card usable internationally
- Travel insurance coverage for eligible purchases
- Priority customer support
- Monthly fee of $15, waived with a minimum balance of $5,000
This account is suitable for customers who travel frequently or spend money abroad.
This is the primary target for queries such as:
“Can I use this abroad?”
“Does it include travel benefits?”
Without rewriting, retrieval may incorrectly include savings account content.
Document 3: Personal Loan Products
File: personal-loans.txt
Personal Loan Products
The bank offers several personal loan options for different needs.
Available loans include:
- Home improvement loans with rates starting at 5.99% APR
- Auto loans with rates starting at 3.99% APR
- Loan terms ranging from 12 to 84 months
- No prepayment penalties
Personal loans are separate from checking and savings accounts.
Loan products do not include banking or card-related features.
This document introduces topic overlap without feature overlap. It helps demonstrate whether rewritten queries remain focused on accounts rather than drifting toward unrelated financial products.
Easy RAG Configuration
quarkus.langchain4j.timeout=60s
quarkus.langchain4j.easy-rag.path=src/main/resources/documents
quarkus.langchain4j.easy-rag.path-type=FILESYSTEM
quarkus.langchain4j.easy-rag.path-matcher=glob:**.txt
quarkus.langchain4j.easy-rag.max-segment-size=400
quarkus.langchain4j.easy-rag.max-overlap-size=50
quarkus.langchain4j.easy-rag.max-results=3
quarkus.langchain4j.easy-rag.reuse-embeddings.enabled=true
quarkus.langchain4j.easy-rag.reuse-embeddings.file=banking-embeddings.json
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.embedding-model.model-id=nomic-embed-text
Defining a Query Transformer
The most common rewriting task is to turn a conversational question into a standalone one.
The transformer does not need to understand the full domain. Its responsibility is narrower:
remove ambiguity
make implicit references explicit
preserve intent
The implementation uses a language model, but the output is treated as data, not an answer.
Important constraints applied in the prompt:
do not answer the question
do not add explanations
rewrite only
This keeps the transformer aligned with its role in the pipeline.
This is intentionally low-level and explicit.
package com.example.rag;

import static dev.langchain4j.internal.Utils.getOrDefault;
import static dev.langchain4j.internal.ValidationUtils.ensureNotNull;
import static java.util.Collections.singletonList;
import static java.util.stream.Collectors.joining;

import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.input.Prompt;
import dev.langchain4j.model.input.PromptTemplate;
import dev.langchain4j.rag.query.Query;
import dev.langchain4j.rag.query.transformer.QueryTransformer;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import org.jboss.logging.Logger;

@ApplicationScoped
public class StandaloneQueryTransformer implements QueryTransformer {

    private static final Logger LOG = Logger.getLogger(StandaloneQueryTransformer.class);

    public static final PromptTemplate DEFAULT_PROMPT_TEMPLATE = PromptTemplate.from(
            """
            Rewrite the following question so it is fully self-contained.
            Preserve the original intent.
            Do not answer the question.
            Do not add explanations.
            Conversation:
            {{chatMemory}}
            User query: {{query}}
            Rewritten question:
            """);

    protected final PromptTemplate promptTemplate;
    protected final ChatModel chatModel;

    @Inject
    public StandaloneQueryTransformer(ChatModel chatModel) {
        this(chatModel, DEFAULT_PROMPT_TEMPLATE);
    }

    public StandaloneQueryTransformer(ChatModel chatModel, PromptTemplate promptTemplate) {
        this.chatModel = ensureNotNull(chatModel, "chatModel");
        this.promptTemplate = getOrDefault(promptTemplate, DEFAULT_PROMPT_TEMPLATE);
    }

    @Override
    public Collection<Query> transform(Query query) {
        String originalQuery = query.text();
        LOG.infof("Transforming query (before): %s", originalQuery);
        long startTime = System.currentTimeMillis();

        List<ChatMessage> chatMemory = query.metadata() != null ? query.metadata().chatMemory() : null;
        if (chatMemory == null || chatMemory.isEmpty()) {
            // no need to compress if there are no previous messages
            long duration = System.currentTimeMillis() - startTime;
            LOG.infof("Query unchanged (no chat memory), took %d ms", duration);
            return singletonList(query);
        }

        Prompt prompt = createPrompt(query, format(chatMemory));
        String compressedQueryText = chatModel.chat(prompt.text());
        Query compressedQuery = query.metadata() == null
                ? Query.from(compressedQueryText)
                : Query.from(compressedQueryText, query.metadata());

        long duration = System.currentTimeMillis() - startTime;
        LOG.infof("Transforming query (after): %s", compressedQueryText);
        LOG.infof("Query transformation took %d ms", duration);
        return singletonList(compressedQuery);
    }

    protected String format(List<ChatMessage> chatMemory) {
        return chatMemory.stream()
                .map(this::format)
                .filter(Objects::nonNull)
                .collect(joining("\n"));
    }

    protected String format(ChatMessage message) {
        if (message instanceof UserMessage userMessage) {
            return "User: " + userMessage.singleText();
        } else if (message instanceof AiMessage aiMessage) {
            if (aiMessage.hasToolExecutionRequests()) {
                return null;
            }
            return "AI: " + aiMessage.text();
        } else {
            return null;
        }
    }

    protected Prompt createPrompt(Query query, String chatMemory) {
        Map<String, Object> variables = new HashMap<>();
        variables.put("query", query.text());
        variables.put("chatMemory", chatMemory);
        return promptTemplate.apply(variables);
    }
}
The transformer implements the QueryTransformer interface and follows this flow:
Context Detection: Checks if the query includes chat history in its metadata. If not, it returns the original query.
Conversation Formatting: Formats the chat history into a readable string:
User messages: “User: [text]”
AI messages: “AI: [text]” (excluding tool execution requests)
Filters out null entries
Query Rewriting: Uses a prompt template to instruct an LLM to rewrite the query to be self-contained while preserving intent. The template includes:
The conversation history
The current user query
Instructions to make it standalone without answering or explaining
Result Preservation: Returns the rewritten query with the original metadata intact, maintaining context for downstream processing.
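Because the transformer's only collaborator is the ChatModel, it can be covered by a plain unit test. A minimal sketch using JUnit 5 and Mockito (assumed to be on the test classpath) that exercises the no-chat-memory shortcut:
package com.example.rag;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verifyNoInteractions;

import java.util.Collection;

import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.rag.query.Query;
import org.junit.jupiter.api.Test;

// Sketch: requires JUnit 5 and Mockito as test dependencies.
class StandaloneQueryTransformerTest {

    @Test
    void queryWithoutChatMemoryIsReturnedUnchanged() {
        // The model is mocked and must never be called on the no-memory path.
        ChatModel chatModel = mock(ChatModel.class);
        StandaloneQueryTransformer transformer = new StandaloneQueryTransformer(chatModel);

        Collection<Query> result = transformer.transform(Query.from("Can I use this abroad?"));

        assertEquals(1, result.size());
        assertEquals("Can I use this abroad?", result.iterator().next().text());
        verifyNoInteractions(chatModel);
    }
}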
Defining the AI Service
The AI service is intentionally boring.
package com.example;

import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import jakarta.enterprise.context.ApplicationScoped;

@RegisterAiService() // no need to declare a retrieval augmentor here, it is automatically generated and discovered
@ApplicationScoped
public interface BankingChatbot {

    @SystemMessage("""
            You are a banking assistant.
            Answer only using the provided context.
            If the answer is not present, say so.
            """)
    String chat(@MemoryId String sessionId, @UserMessage String question);
}
NOTE: The AI service interface is annotated with @ApplicationScoped. If we left that out, the service would default to @RequestScoped and the chat memory would be lost with every request.
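The @MemoryId parameter is what ties requests into one conversation: calls that reuse the same id share chat memory, which the transformer later depends on. A short illustrative usage (the session id value is arbitrary):
// Same sessionId → same chat memory → the follow-up can be rewritten against the first question.
String first = bankingChatbot.chat("session-42", "What does the checking account offer?");
String second = bankingChatbot.chat("session-42", "Can I use this abroad?");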
Registering the Query Transformer in Easy RAG
Now we need to extend the simple easy-rag approach with our customization.
package com.example.rag;

import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Produces;
import jakarta.inject.Inject;

@ApplicationScoped
public class RetrievalAugmentorProducer {

    @Inject
    ContentRetriever contentRetriever;

    @Inject
    StandaloneQueryTransformer queryTransformer;

    @Produces
    @ApplicationScoped
    public RetrievalAugmentor createRetrievalAugmentor() {
        return DefaultRetrievalAugmentor.builder()
                .queryTransformer(queryTransformer)
                .contentRetriever(contentRetriever)
                .build();
    }
}
The producer:
Injects ContentRetriever — created by Easy RAG from your documents and the Ollama embedding model (nomic-embed-text)
Injects StandaloneQueryTransformer — the custom transformer defined above
Builds RetrievalAugmentor — combines the transformer with the Easy RAG retriever, so every query is rewritten before embedding and search
This is exactly how LangChain4j intends it.
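If you later need more control than the injected ContentRetriever gives you, the retriever can also be built explicitly. A sketch, assuming the Easy RAG extension exposes EmbeddingStore<TextSegment> and EmbeddingModel as injectable beans; the method would live inside a producer like the one above instead of the @Inject ContentRetriever field.
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.store.embedding.EmbeddingStore;

// Sketch: build the retriever explicitly instead of injecting the Easy RAG default.
ContentRetriever buildRetriever(EmbeddingStore<TextSegment> store, EmbeddingModel embeddingModel) {
    return EmbeddingStoreContentRetriever.builder()
            .embeddingStore(store)
            .embeddingModel(embeddingModel)
            .maxResults(3) // mirrors quarkus.langchain4j.easy-rag.max-results
            .build();
}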
REST Endpoint
The API surface here stays super small.
package com.example;

import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/chat")
public class ChatbotResource {

    @Inject
    BankingChatbot bankingChatbot;

    @POST
    @Consumes(MediaType.TEXT_PLAIN)
    @Produces(MediaType.APPLICATION_JSON)
    public ChatResponse chat(String question) {
        // A fixed memory id keeps the demo simple; real applications derive it from the caller.
        return new ChatResponse(bankingChatbot.chat("default-session", question));
    }

    public record ChatResponse(String answer) {
    }
}
You can wrap this with auth, rate limits, or tracing later.
The core stays stable.
Testing the Impact
Input
curl -X POST "http://localhost:8080/chat" \
-H "Content-Type: text/plain" \
-d "What does the checking account offer?" | jq
Log output:
[com.example.rag.StandaloneQueryTransformer] (executor-thread-1) Transforming query (before): What does the checking account offer?
[com.example.rag.StandaloneQueryTransformer] (executor-thread-1) Query unchanged (no chat memory), took 0 ms
Response:
{
"answer": "The checking account that is described in the information does not have an explicit name mentioned (e.g., Premium Checking Account). However, it appears to be designed primarily for domestic use and includes features such as online and mobile banking access, no monthly maintenance fees, and no minimum balance requirement."
}
Because this is the initial request, there is no chat memory and nothing to resolve the question against. Let's try an ambiguous follow-up question:
curl -X POST "http://localhost:8080/chat" \
-H "Content-Type: text/plain" \
-d "Can I use this abroad?" | jq Log-files:
[com.example.rag.StandaloneQueryTransformer] (executor-thread-1) Transforming query (before): Can I use this abroad?
[com.example.rag.StandaloneQueryTransformer] (executor-thread-1) Transforming query (after): Is the checking account suitable for international usage?
[com.example.rag.StandaloneQueryTransformer] (executor-thread-1) Query transformation took 420 ms
And the response:
{
"answer": "Yes, you can use the Premium Checking Account abroad due to its features:\n\n* Worldwide ATM access with no ATM fees\n* Debit card usable internationally\n* Travel insurance coverage for eligible purchases\n\nHowever, there is no information provided about the Standard Savings Account being suitable for international use or offering any benefits related to traveling."
}
Congratulations, you just created a working QueryTransformer.
Common Failure Modes
Transformer answers the question
→ Fix prompt. Add stronger constraints.
Transformer adds assumptions
→ Reduce creativity. Lower temperature.
Transformer over-expands
→ Add compression rules.
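For the second failure mode, lowering temperature is a configuration change rather than a code change. A sketch for application.properties, assuming the transformer shares the default Ollama chat model configured earlier and that your quarkus-langchain4j version exposes this property:
# Assumption: property name per quarkus-langchain4j-ollama; verify against your version.
# A lower temperature makes rewrites (and answers) more deterministic.
quarkus.langchain4j.ollama.chat-model.temperature=0.1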
Best Practices from Mature RAG Systems
Always rewrite conversational queries
Rewrite before embeddings
Keep transformers deterministic
Log transformed queries
Version transformer prompts
Python teams learned this the hard way.
Java teams can skip that phase.
Where to Go Next
Once QueryTransformers are in place, the next steps usually are:
Multi-query expansion transformers
Metadata-aware transformers
Evaluation of rewritten vs original queries
Conditional rewriting (only when ambiguity is detected)
Each of those builds cleanly on this foundation.
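As a taste of the first item, a multi-query expansion transformer is just another QueryTransformer that returns several queries; DefaultRetrievalAugmentor retrieves for each and aggregates the results. A hand-rolled sketch (the appended wording is an illustrative assumption; LangChain4j also ships an ExpandingQueryTransformer you may prefer):
package com.example.rag;

import java.util.Collection;
import java.util.List;

import dev.langchain4j.rag.query.Query;
import dev.langchain4j.rag.query.transformer.QueryTransformer;

// Sketch: retrieve with the original query plus a domain-anchored variant.
// The extra wording ("bank account features") is illustrative, not taken from the article.
public class ExpandingBankingQueryTransformer implements QueryTransformer {

    @Override
    public Collection<Query> transform(Query query) {
        Query variant = query.metadata() == null
                ? Query.from(query.text() + " bank account features")
                : Query.from(query.text() + " bank account features", query.metadata());
        return List.of(query, variant);
    }
}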
Query rewriting is not a trick.
It is an important AI-infrastructure element. LangChain4j’s QueryTransformer SPI gives Java developers a clean, explicit, testable hook to implement it properly.
Easy RAG handles ingestion and retrieval.
Query Transformers give you control over meaning.
That is the difference between a demo and a system you can trust.


