Why Query Rewriting Matters in RAG Systems
A Java-centric guide to controlling retrieval behavior with LangChain4j and Quarkus
Query rewriting is one of the least visible but most impactful parts of a RAG system. When it is missing, retrieval quality degrades in subtle ways. When it is present and well-designed, downstream components behave more predictably without any additional complexity.
This tutorial focuses on query rewriting as an explicit architectural step, implemented using LangChain4j’s QueryTransformer abstraction and integrated into Quarkus Easy RAG.
The goal is not to introduce another optimization, but to make retrieval behavior intentional and observable.
Why Query Transformers Exist
Retrieval works best when queries are:
explicit
self-contained
aligned with the structure of the underlying documents
User input rarely meets those criteria.
In real applications, queries often contain:
pronouns without a clear referent
shorthand that only makes sense in conversation
missing domain terms that exist in the knowledge base
Vector search will still return results in these cases, but the results tend to be broad, noisy, or misleading. This is not a failure of embeddings. It is a mismatch between how users ask questions and how documents are indexed.
Query rewriting exists to bridge that gap.
The LangChain4j QueryTransformer Model
LangChain4j models query rewriting through the QueryTransformer interface. This is a deliberate design choice.
A QueryTransformer:
receives a query
returns a transformed query
has no access to retrieval results
does not generate an answer
This keeps query rewriting:
deterministic
testable
independent of retrieval or generation concerns
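To make the contract concrete, here is a minimal, purely rule-based sketch. The class name NormalizingQueryTransformer and the whitespace rule are illustrative and not part of LangChain4j or this article's project; the point is how small the interface surface is and that no LLM is required.
package com.example.rag;

import static java.util.Collections.singletonList;

import java.util.Collection;

import dev.langchain4j.rag.query.Query;
import dev.langchain4j.rag.query.transformer.QueryTransformer;

// Illustrative example only: a deterministic transformer with no LLM involved.
public class NormalizingQueryTransformer implements QueryTransformer {

    @Override
    public Collection<Query> transform(Query query) {
        // Collapse whitespace; keep the original metadata so downstream steps still see it.
        String normalized = query.text().trim().replaceAll("\\s+", " ");
        Query rewritten = query.metadata() == null
                ? Query.from(normalized)
                : Query.from(normalized, query.metadata());
        return singletonList(rewritten);
    }
}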
In a Quarkus Easy RAG pipeline, the transformer runs before embeddings are created, which is exactly where rewriting belongs.
Original Query → Transformed Query → Retrieval
Understanding CompressingQueryTransformer (Conceptually)
The CompressingQueryTransformer in LangChain4j exists to:
Remove unnecessary verbosity
Preserve semantic intent
Produce retrieval-efficient queries
Conceptually, it answers:
“What is the shortest query that still expresses the user’s intent?”
We will extend this idea slightly for RAG-specific use cases:
Ambiguity removal
Intent clarification
Retrieval friendliness
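If plain compression is all you need, the built-in class can be wired in directly. A minimal sketch, assuming your LangChain4j version exposes CompressingQueryTransformer with a chat-model constructor; the small factory class around it is illustrative.
package com.example.rag;

import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.rag.query.transformer.CompressingQueryTransformer;
import dev.langchain4j.rag.query.transformer.QueryTransformer;

// Sketch: reuse the built-in transformer around an already configured chat model.
// Assumes this constructor is available in your LangChain4j version.
public class CompressingTransformerFactory {

    public QueryTransformer create(ChatModel chatModel) {
        return new CompressingQueryTransformer(chatModel);
    }
}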
Where Query Transformers Fit in Easy RAG
Easy RAG internally does:
Query → Embedding → Vector Search → Context → LLM
We want:
Query → QueryTransformer → Embedding → Vector Search → Context → LLM
This is exactly what QueryTransformers are designed for.
No hacks. No interception.
Creating the Quarkus Project
We use Quarkus Easy RAG so ingestion is not a separate service or script. The complete source code is available in the accompanying GitHub repository.
mvn io.quarkus:quarkus-maven-plugin:create \
-DprojectGroupId=com.example \
-DprojectArtifactId=rag-query-transformer \
-Dextensions="quarkus-langchain4j-easy-rag,quarkus-langchain4j-ollama,quarkus-rest-jackson" \
-DclassName="com.example.ChatbotResource" \
-Dpath="/chat"
cd rag-query-transformer
Easy RAG is intentionally opinionated. That is a feature, not a limitation.
Preparing Example Documents for Query Rewriting Tests
To evaluate whether a query transformer is doing useful work, the document set must contain overlapping concepts with different scopes. If every document is perfectly distinct, rewriting appears unnecessary. If everything overlaps, retrieval quality becomes hard to reason about.
The following three documents are deliberately structured to create:
ambiguity around “this” and “it”
overlap between product features
clear signals that rewriting should improve retrieval precision
Document Structure
Create the following directory:
mkdir -p src/main/resources/documents
Easy RAG will ingest these files automatically on startup.
Document 1: Standard Savings Account
File: standard-savings-account.txt
Standard Savings Account
The Standard Savings Account is designed for customers who want a simple way to save money.
Features:
- Annual interest rate of 2.5%
- No monthly maintenance fees
- No minimum balance requirement
- FDIC insured up to $250,000
- Online and mobile banking access
This account does not include travel benefits or international usage features.
It is intended primarily for domestic use.
This document explicitly excludes international usage. It allows us to verify whether rewritten queries correctly avoid retrieving it for questions about travel or international use.
Document 2: Premium Checking Account
File: premium-checking-account.txt
Premium Checking Account
The Premium Checking Account is intended for customers who need flexibility and additional services.
Features:
- Worldwide ATM access with no ATM fees
- Debit card usable internationally
- Travel insurance coverage for eligible purchases
- Priority customer support
- Monthly fee of $15, waived with a minimum balance of $5,000
This account is suitable for customers who travel frequently or spend money abroad.
This is the primary target for queries such as:
“Can I use this abroad?”
“Does it include travel benefits?”
Without rewriting, retrieval may incorrectly include savings account content.
Document 3: Personal Loan Products
File: personal-loans.txt
Personal Loan Products
The bank offers several personal loan options for different needs.
Available loans include:
- Home improvement loans with rates starting at 5.99% APR
- Auto loans with rates starting at 3.99% APR
- Loan terms ranging from 12 to 84 months
- No prepayment penalties
Personal loans are separate from checking and savings accounts.
Loan products do not include banking or card-related features.
This document introduces topic overlap without feature overlap. It helps demonstrate whether rewritten queries remain focused on accounts rather than drifting toward unrelated financial products.
Easy RAG Configuration
quarkus.langchain4j.timeout=60s
quarkus.langchain4j.easy-rag.path=src/main/resources/documents
quarkus.langchain4j.easy-rag.path-type=FILESYSTEM
quarkus.langchain4j.easy-rag.path-matcher=glob:**.txt
quarkus.langchain4j.easy-rag.max-segment-size=400
quarkus.langchain4j.easy-rag.max-overlap-size=50
quarkus.langchain4j.easy-rag.max-results=3
quarkus.langchain4j.easy-rag.reuse-embeddings.enabled=true
quarkus.langchain4j.easy-rag.reuse-embeddings.file=banking-embeddings.json
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.embedding-model.model-id=nomic-embed-text
Defining a Query Transformer
The most common rewriting task is to turn a conversational question into a standalone one.
The transformer does not need to understand the full domain. Its responsibility is narrower:
remove ambiguity
make implicit references explicit
preserve intent
The implementation uses a language model, but the output is treated as data, not an answer.
Important constraints applied in the prompt:
do not answer the question
do not add explanations
rewrite only
This keeps the transformer aligned with its role in the pipeline.
This is intentionally low-level and explicit.
package com.example.rag;

import static dev.langchain4j.internal.Utils.getOrDefault;
import static dev.langchain4j.internal.ValidationUtils.ensureNotNull;
import static java.util.Collections.singletonList;
import static java.util.stream.Collectors.joining;

import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.input.Prompt;
import dev.langchain4j.model.input.PromptTemplate;
import dev.langchain4j.rag.query.Query;
import dev.langchain4j.rag.query.transformer.QueryTransformer;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import org.jboss.logging.Logger;

@ApplicationScoped
public class StandaloneQueryTransformer implements QueryTransformer {

    private static final Logger LOG = Logger.getLogger(StandaloneQueryTransformer.class);

    public static final PromptTemplate DEFAULT_PROMPT_TEMPLATE = PromptTemplate.from(
            """
            Rewrite the following question so it is fully self-contained.
            Preserve the original intent.
            Do not answer the question.
            Do not add explanations.
            Conversation:
            {{chatMemory}}
            User query: {{query}}
            Rewritten question:
            """);

    protected final PromptTemplate promptTemplate;
    protected final ChatModel chatModel;

    @Inject
    public StandaloneQueryTransformer(ChatModel chatModel) {
        this(chatModel, DEFAULT_PROMPT_TEMPLATE);
    }

    public StandaloneQueryTransformer(ChatModel chatModel, PromptTemplate promptTemplate) {
        this.chatModel = ensureNotNull(chatModel, "chatModel");
        this.promptTemplate = getOrDefault(promptTemplate, DEFAULT_PROMPT_TEMPLATE);
    }

    @Override
    public Collection<Query> transform(Query query) {
        String originalQuery = query.text();
        LOG.infof("Transforming query (before): %s", originalQuery);
        long startTime = System.currentTimeMillis();

        List<ChatMessage> chatMemory = query.metadata() != null ? query.metadata().chatMemory() : null;
        if (chatMemory == null || chatMemory.isEmpty()) {
            // no need to compress if there are no previous messages
            long duration = System.currentTimeMillis() - startTime;
            LOG.infof("Query unchanged (no chat memory), took %d ms", duration);
            return singletonList(query);
        }

        Prompt prompt = createPrompt(query, format(chatMemory));
        String compressedQueryText = chatModel.chat(prompt.text());
        Query compressedQuery = query.metadata() == null
                ? Query.from(compressedQueryText)
                : Query.from(compressedQueryText, query.metadata());

        long duration = System.currentTimeMillis() - startTime;
        LOG.infof("Transforming query (after): %s", compressedQueryText);
        LOG.infof("Query transformation took %d ms", duration);
        return singletonList(compressedQuery);
    }

    protected String format(List<ChatMessage> chatMemory) {
        return chatMemory.stream()
                .map(this::format)
                .filter(Objects::nonNull)
                .collect(joining("\n"));
    }

    protected String format(ChatMessage message) {
        if (message instanceof UserMessage userMessage) {
            return "User: " + userMessage.singleText();
        } else if (message instanceof AiMessage aiMessage) {
            if (aiMessage.hasToolExecutionRequests()) {
                return null;
            }
            return "AI: " + aiMessage.text();
        } else {
            return null;
        }
    }

    protected Prompt createPrompt(Query query, String chatMemory) {
        Map<String, Object> variables = new HashMap<>();
        variables.put("query", query.text());
        variables.put("chatMemory", chatMemory);
        return promptTemplate.apply(variables);
    }
}
The transformer implements the QueryTransformer interface and follows this flow:
Context Detection: Checks if the query includes chat history in its metadata. If not, it returns the original query.
Conversation Formatting: Formats the chat history into a readable string:
User messages: “User: [text]”
AI messages: “AI: [text]” (excluding tool execution requests)
Filters out null entries
Query Rewriting: Uses a prompt template to instruct an LLM to rewrite the query to be self-contained while preserving intent. The template includes:
The conversation history
The current user query
Instructions to make it standalone without answering or explaining
Result Preservation: Returns the rewritten query with the original metadata intact, maintaining context for downstream processing.
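Because the transformer's only collaborator is the ChatModel, it can be covered by a plain unit test. A minimal sketch using JUnit 5 and Mockito (assumed to be on the test classpath) that exercises the no-chat-memory shortcut:
package com.example.rag;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verifyNoInteractions;

import java.util.Collection;

import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.rag.query.Query;
import org.junit.jupiter.api.Test;

// Sketch: requires JUnit 5 and Mockito as test dependencies.
class StandaloneQueryTransformerTest {

    @Test
    void queryWithoutChatMemoryIsReturnedUnchanged() {
        // The model is mocked and must never be called on the no-memory path.
        ChatModel chatModel = mock(ChatModel.class);
        StandaloneQueryTransformer transformer = new StandaloneQueryTransformer(chatModel);

        Collection<Query> result = transformer.transform(Query.from("Can I use this abroad?"));

        assertEquals(1, result.size());
        assertEquals("Can I use this abroad?", result.iterator().next().text());
        verifyNoInteractions(chatModel);
    }
}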
Defining the AI Service
The AI service is intentionally boring.
package com.example;

import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import jakarta.enterprise.context.ApplicationScoped;

@RegisterAiService() // no need to declare a retrieval augmentor here, it is automatically generated and discovered
@ApplicationScoped
public interface BankingChatbot {

    @SystemMessage("""
            You are a banking assistant.
            Answer only using the provided context.
            If the answer is not present, say so.
            """)
    String chat(@MemoryId String sessionId, @UserMessage String question);
}
NOTE: The AI service interface is annotated with @ApplicationScoped. If we left that out, the service would default to @RequestScoped and the chat memory would be lost with every request.
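The @MemoryId parameter is what ties requests into one conversation: calls that reuse the same id share chat memory, which the transformer later depends on. A short illustrative usage (the session id value is arbitrary):
// Same sessionId → same chat memory → the follow-up can be rewritten against the first question.
String first = bankingChatbot.chat("session-42", "What does the checking account offer?");
String second = bankingChatbot.chat("session-42", "Can I use this abroad?");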
Registering the Query Transformer in Easy RAG
Now we need to extend the simple easy-rag approach with our customization.
package com.example.rag;

import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Produces;
import jakarta.inject.Inject;

@ApplicationScoped
public class RetrievalAugmentorProducer {

    @Inject
    ContentRetriever contentRetriever;

    @Inject
    StandaloneQueryTransformer queryTransformer;

    @Produces
    @ApplicationScoped
    public RetrievalAugmentor createRetrievalAugmentor() {
        return DefaultRetrievalAugmentor.builder()
                .queryTransformer(queryTransformer)
                .contentRetriever(contentRetriever)
                .build();
    }
}
The producer:
Injects ContentRetriever — created by Easy RAG from your documents and the Ollama embedding model (nomic-embed-text)
Injects StandaloneQueryTransformer — the custom transformer defined above
Builds RetrievalAugmentor — combines the transformer with the Easy RAG retriever, so every query is rewritten before embedding and search
This is exactly how LangChain4j intends it.
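If you later need more control than the injected ContentRetriever gives you, the retriever can also be built explicitly. A sketch, assuming the Easy RAG extension exposes EmbeddingStore<TextSegment> and EmbeddingModel as injectable beans; the method would live inside a producer like the one above instead of the @Inject ContentRetriever field.
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.store.embedding.EmbeddingStore;

// Sketch: build the retriever explicitly instead of injecting the Easy RAG default.
ContentRetriever buildRetriever(EmbeddingStore<TextSegment> store, EmbeddingModel embeddingModel) {
    return EmbeddingStoreContentRetriever.builder()
            .embeddingStore(store)
            .embeddingModel(embeddingModel)
            .maxResults(3) // mirrors quarkus.langchain4j.easy-rag.max-results
            .build();
}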
REST Endpoint
The API surface here stays super small.
package com.example;

import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/chat")
public class ChatbotResource {

    @Inject
    BankingChatbot bankingChatbot;

    @POST
    @Consumes(MediaType.TEXT_PLAIN)
    @Produces(MediaType.APPLICATION_JSON)
    public ChatResponse chat(String question) {
        // A fixed memory id keeps the demo simple; real applications derive it from the caller.
        return new ChatResponse(bankingChatbot.chat("default-session", question));
    }

    public record ChatResponse(String answer) {
    }
}
You can wrap this with auth, rate limits, or tracing later.
The core stays stable.
Testing the Impact
Input
curl -X POST "http://localhost:8080/chat" \
-H "Content-Type: text/plain" \
-d "What does the checking account offer?" | jq
Log output:
[com.example.rag.StandaloneQueryTransformer] (executor-thread-1) Transforming query (before): What does the checking account offer?
[com.example.rag.StandaloneQueryTransformer] (executor-thread-1) Query unchanged (no chat memory), took 0 ms
Response:
{
"answer": "The checking account that is described in the information does not have an explicit name mentioned (e.g., Premium Checking Account). However, it appears to be designed primarily for domestic use and includes features such as online and mobile banking access, no monthly maintenance fees, and no minimum balance requirement."
}
Because this is the initial request, there is no chat memory and nothing to resolve the question against. Let's try an ambiguous follow-up question:
curl -X POST "http://localhost:8080/chat" \
-H "Content-Type: text/plain" \
-d "Can I use this abroad?" | jq Log-files:
[com.example.rag.StandaloneQueryTransformer] (executor-thread-1) Transforming query (before): Can I use this abroad?
[com.example.rag.StandaloneQueryTransformer] (executor-thread-1) Transforming query (after): Is the checking account suitable for international usage?
[com.example.rag.StandaloneQueryTransformer] (executor-thread-1) Query transformation took 420 ms
And the response:
{
"answer": "Yes, you can use the Premium Checking Account abroad due to its features:\n\n* Worldwide ATM access with no ATM fees\n* Debit card usable internationally\n* Travel insurance coverage for eligible purchases\n\nHowever, there is no information provided about the Standard Savings Account being suitable for international use or offering any benefits related to traveling."
}
Congratulations, you just created a working QueryTransformer.
Common Failure Modes
Transformer answers the question
→ Fix prompt. Add stronger constraints.
Transformer adds assumptions
→ Reduce creativity. Lower temperature.
Transformer over-expands
→ Add compression rules.
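For the second failure mode, lowering temperature is a configuration change rather than a code change. A sketch for application.properties, assuming the transformer shares the default Ollama chat model configured earlier and that your quarkus-langchain4j version exposes this property:
# Assumption: property name per quarkus-langchain4j-ollama; verify against your version.
# A lower temperature makes rewrites (and answers) more deterministic.
quarkus.langchain4j.ollama.chat-model.temperature=0.1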
Best Practices from Mature RAG Systems
Always rewrite conversational queries
Rewrite before embeddings
Keep transformers deterministic
Log transformed queries
Version transformer prompts
Python teams learned this the hard way.
Java teams can skip that phase.
Where to Go Next
Once QueryTransformers are in place, the next steps usually are:
Multi-query expansion transformers
Metadata-aware transformers
Evaluation of rewritten vs original queries
Conditional rewriting (only when ambiguity is detected)
Each of those builds cleanly on this foundation.
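As a taste of the first item, a multi-query expansion transformer is just another QueryTransformer that returns several queries; DefaultRetrievalAugmentor retrieves for each and aggregates the results. A hand-rolled sketch (the appended wording is an illustrative assumption; LangChain4j also ships an ExpandingQueryTransformer you may prefer):
package com.example.rag;

import java.util.Collection;
import java.util.List;

import dev.langchain4j.rag.query.Query;
import dev.langchain4j.rag.query.transformer.QueryTransformer;

// Sketch: retrieve with the original query plus a domain-anchored variant.
// The extra wording ("bank account features") is illustrative, not taken from the article.
public class ExpandingBankingQueryTransformer implements QueryTransformer {

    @Override
    public Collection<Query> transform(Query query) {
        Query variant = query.metadata() == null
                ? Query.from(query.text() + " bank account features")
                : Query.from(query.text() + " bank account features", query.metadata());
        return List.of(query, variant);
    }
}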
Query rewriting is not a trick.
It is an important AI-infrastructure element. LangChain4j’s QueryTransformer SPI gives Java developers a clean, explicit, testable hook to implement it properly.
Easy RAG handles ingestion and retrieval.
Query Transformers give you control over meaning.
That is the difference between a demo and a system you can trust.


