How to Build a Stateful AI Chat System in Java with Quarkus and LangChain4j

Learn how to implement sliding-window memory, JPA persistence, and multi-session chat for reliable LLM applications.

Dec 01, 2025

Large Language Models are stateless, but real applications depend on state.
Whether you build a support bot, a domain assistant, a tool-calling agent, or a multi-step workflow, your system needs a reliable memory strategy. Without it, even the best model will forget context, repeat itself, or hallucinate missing details.

Quarkus and LangChain4j give you powerful APIs for message handling and chat memory, and the defaults are easy to get started with. In multi-user or multi-pod environments, however, the wrong memory strategy leads to token explosions, latency spikes, and context leakage across users. On top of that, you have to meet regulatory and compliance requirements for stored conversations.

This tutorial walks you through a complete implementation:

Custom JPA chat memory store
Sliding window contextual memory
Per-session memory isolation
Full persistence in PostgreSQL
Complete REST API + debug endpoint
Verified, real code running on Quarkus 3.29.2 and Java 21

You’ll see how the pieces fit together and how to adapt the design for your own applications.

Prerequisites

You need:

Java 21+
Quarkus CLI
Podman or Docker (for Dev Services)
Ollama (running locally or as a Dev Service)
A model such as llama3.1

Project Setup

You can follow along and implement it yourself, or grab the code from my GitHub repository. Create the project:

quarkus create app com.example:chat-memory-tutorial \
  --extension=quarkus-langchain4j-ollama \
  --extension=quarkus-jdbc-postgresql \
  --extension=quarkus-rest-jackson \
  --extension=hibernate-orm-panache
cd chat-memory-tutorial

This adds:

quarkus-langchain4j-ollama: LangChain4j integration with Ollama
hibernate-orm-panache: Simplified JPA with Panache
jdbc-postgresql: PostgreSQL driver
rest-jackson: REST endpoints with JSON support

Configure Ollama and the chat memory in src/main/resources/application.properties:

# Ollama Dev Service
quarkus.langchain4j.ollama.chat-model.model-name=llama3.1
quarkus.langchain4j.ollama.log-requests=true
quarkus.langchain4j.ollama.log-responses=true

# Chat memory max messages
chat-memory.max-messages=20

# Enable HQL Dev UI Console
%dev.quarkus.datasource.dev-ui.allow-sql=true

The chat-memory.max-messages property controls the sliding window size—how many recent messages to send to the LLM.

Understanding Memory in Quarkus + LangChain4j

LangChain4j automatically manages memory for each AI service.
The default is an in-memory message window that keeps 20 recent messages.

Memory Configuration in AI Services

Every @RegisterAiService keeps track of recent messages so that the model receives relevant context on each request.

@RegisterAiService
public interface MyAssistant {
    String chat(String message);
}

By default, the AI service retains conversation state and injects it into each prompt. How long that state survives depends on the AI service’s CDI scope (@ApplicationScoped, @RequestScoped, etc.).
It uses an internal ChatMemory instance stored in application memory, which accumulates messages and evicts old ones when the limit is reached (20 messages by default).
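The sketch below makes the eviction behavior concrete with a minimal, framework-free message window. It is a simplified model, not LangChain4j's MessageWindowChatMemory. The names MessageWindow, Message, add, and messages are hypothetical, and messages are plain strings tagged with a role. The sketch also assumes, modeled on LangChain4j's documented behavior, that a single system message is always kept while the oldest other messages are evicted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a message-window chat memory (not the LangChain4j class).
final class MessageWindow {

    record Message(String role, String text) { }

    private final int maxMessages;
    private final List<Message> messages = new ArrayList<>();

    MessageWindow(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be at least 1");
        }
        this.maxMessages = maxMessages;
    }

    void add(Message message) {
        if (message.role().equals("SYSTEM")) {
            // Assumption of this sketch: at most one system message, always kept at the front
            messages.removeIf(m -> m.role().equals("SYSTEM"));
            messages.add(0, message);
        } else {
            messages.add(message);
        }
        // Evict the oldest non-system messages until the window fits
        while (messages.size() > maxMessages) {
            int oldest = messages.get(0).role().equals("SYSTEM") ? 1 : 0;
            messages.remove(oldest);
        }
    }

    List<Message> messages() {
        return List.copyOf(messages);
    }
}
```

Because the window counts messages rather than tokens, a few very long messages can still produce a large prompt. That is the trade-off the token-based window discussed later addresses.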

You can customize the default size in your application.properties file:

quarkus.langchain4j.chat-memory.memory-window.max-messages=20

This is a global setting for all AI services that use the default message-window memory.
For production or multi-session scenarios, you’ll usually define explicit memory IDs and custom providers, as you will do in the following sections.

Memory IDs: The key to multi-user isolation

While the default memory is tied to a single service instance, real-world applications need multiple, isolated memories, usually one per user, session, or tenant.
Quarkus doesn’t make that choice for you; you must define it through the @MemoryId annotation.

The official documentation states:

“The application code is responsible for providing a unique memory ID for each user or session.”

Your memory ID determines how conversation history is routed and persisted.
It controls whether users share context or operate independently.

Common strategies include:

User-based: memoryId = userId
Session-based: memoryId = sessionId
User + Session (recommended): memoryId = userId + ":" + sessionId
Tenant + User + Session: for multi-tenant SaaS systems
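The strategies above can be captured in a small helper. This is a hypothetical sketch: the MemoryIds class and its methods are not part of Quarkus or LangChain4j. It assumes ':' is reserved as a separator and rejects parts that contain it. That rule prevents two different user/session pairs from ever producing the same key.

```java
import java.util.Objects;

// Hypothetical helper that builds memory IDs for the strategies listed above.
final class MemoryIds {

    private MemoryIds() { }

    // Rejects blank parts and parts containing ':' so "a:b" + "c" can never collide with "a" + "b:c"
    private static String part(String name, String value) {
        Objects.requireNonNull(value, name + " must not be null");
        if (value.isBlank() || value.contains(":")) {
            throw new IllegalArgumentException(name + " must be non-blank and must not contain ':'");
        }
        return value;
    }

    static String userBased(String userId) {
        return part("userId", userId);
    }

    static String sessionBased(String sessionId) {
        return part("sessionId", sessionId);
    }

    static String userAndSession(String userId, String sessionId) {
        return part("userId", userId) + ":" + part("sessionId", sessionId);
    }

    static String tenantUserSession(String tenantId, String userId, String sessionId) {
        return part("tenantId", tenantId) + "::" + userAndSession(userId, sessionId);
    }
}
```

Some identity providers use URN-style subjects that legitimately contain ':'. If yours does, encode the parts or pick a different separator instead of rejecting them.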

Passing the Memory ID in Your API Layer

Since REST and reactive applications are stateless, you can’t rely on in-memory sessions like in older servlet applications.
Instead, you generate or extract the ID on every request and pass it to your AI service.

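import jakarta.inject.Inject;
import jakarta.ws.rs.BadRequestException;
import jakarta.ws.rs.HeaderParam;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.QueryParam;

@Path("/chat")
public class ChatResource {

    @Inject
    SupportBot bot;

    @POST
    public String ask(
        @HeaderParam("X-User-Id") String userId,
        @QueryParam("session") String sessionId,
        String message
    ) {
        // Reject requests without both parts; otherwise they would all share the "null:null" memory
        if (userId == null || userId.isBlank() || sessionId == null || sessionId.isBlank()) {
            throw new BadRequestException("X-User-Id header and session query parameter are required");
        }
        // One memory ID per user/session pair keeps conversations isolated
        String memoryId = userId + ":" + sessionId;
        return bot.chat(memoryId, message);
    }
}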

And the corresponding AI service:

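import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface SupportBot {
    String chat(@MemoryId String memoryId, @UserMessage String message);
}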

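Each user/session pair now maintains its own conversation thread. With the default in-memory store, however, that thread lives only in the JVM that handled the request. It is lost on restart and is not visible to other pods, and its lifetime also depends on the AI service’s CDI scope, as described above. To keep conversations across restarts and multiple pods, back the memory with a persistent store, as this tutorial does with PostgreSQL.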
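A simplified picture of an in-memory store keyed by memory ID shows both the isolation and its limits. This is a hypothetical model, not Quarkus or LangChain4j code. InMemoryConversations and its methods are invented for illustration, and a ConcurrentHashMap stands in for the framework's internal storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of per-memory-ID isolation inside a single JVM.
final class InMemoryConversations {

    private final Map<String, List<String>> byMemoryId = new ConcurrentHashMap<>();

    void append(String memoryId, String message) {
        // compute() is atomic per key; copy-on-write means readers never see a list mid-update
        byMemoryId.compute(memoryId, (id, history) -> {
            List<String> updated = history == null ? new ArrayList<>() : new ArrayList<>(history);
            updated.add(message);
            return List.copyOf(updated);
        });
    }

    List<String> history(String memoryId) {
        // Unknown IDs start with an empty history; other IDs are never read
        return byMemoryId.getOrDefault(memoryId, List.of());
    }
}
```

The map lives in one JVM. A second pod has its own empty map, and a restart clears it. A database-backed store closes exactly that gap.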

Which Strategy Should I Use to Generate Memory IDs?

There is no single right answer. The best approach depends on how your application authenticates users and manages sessions.
Here are several common patterns you can apply in Quarkus:

HTTP Header or API Token

For APIs, use a unique header such as X-User-Id, or the subject (sub) claim of a bearer token:

@HeaderParam("X-User-Id") String userId;

or, if using JWT:

@Inject
JsonWebToken jwt;

String userId = jwt.getSubject();

This works well for REST APIs, mobile clients, or frontends that already include identity tokens.

Cookie or Session Identifier

When serving browser users, store a generated session token as a cookie:

@CookieParam("session-id") String sessionId;

Combine it with the authenticated user ID:

String memoryId = userId + ":" + sessionId;

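This gives each browser session its own memory without server-side HTTP sessions. Because cookies are shared across tabs, use a separate per-conversation ID, such as the client-side approach below, if each tab needs its own thread.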

OAuth2 or OpenID Connect Principal

If your Quarkus app integrates with Keycloak or another OIDC provider, you can inject the current user directly:

@Inject
SecurityIdentity identity;

String userId = identity.getPrincipal().getName();

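When the user starts a new conversation, derive a memory key once:

String memoryId = userId + ":" + UUID.randomUUID();

Return this ID to the client and have it send the ID back on every follow-up request. Generating a fresh UUID on each request would give every message a new, empty memory. Used this way, the ID isolates separate chat sessions of the same authenticated account.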
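When the client sends the ID back, the server should check that it belongs to the authenticated user before using it as @MemoryId. The following is a hypothetical sketch: ConversationIds, start, and isOwnedBy are invented names. It assumes the userId:uuid format shown above and that user IDs contain no ':'.

```java
import java.util.UUID;

// Hypothetical helper: create a conversation ID once, then verify ownership on each request.
final class ConversationIds {

    private ConversationIds() { }

    // Called once when a new conversation starts; the result is returned to the client
    static String start(String userId) {
        return userId + ":" + UUID.randomUUID();
    }

    // Called on every later request before the ID is passed to the AI service
    static boolean isOwnedBy(String memoryId, String userId) {
        String prefix = userId + ":";
        return memoryId != null
                && memoryId.startsWith(prefix)
                && memoryId.length() > prefix.length();
    }
}
```

The trailing ':' in the prefix matters: without it, user "al" would pass the check for a conversation owned by "alice". Reject a request that fails the check, for example with 403, rather than silently starting a new memory.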

Correlation ID or Request Context

For short-lived contexts such as workflow or tracing, use a correlation ID header:

@HeaderParam("X-Correlation-Id") String correlationId;

This is useful when you want the model to maintain short-term continuity across a set of related requests without tying it to a user account.

Client-Side Generated Session ID

The frontend can create a UUID once and reuse it throughout the session:

const sessionId = localStorage.getItem("sessionId") || crypto.randomUUID();
localStorage.setItem("sessionId", sessionId);

Then include it in every request:

POST /chat?session=abc-123

This keeps the server fully stateless while maintaining continuity for the user.
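A client-generated ID is untrusted input, so it is worth validating on the server before it becomes part of a memory ID. The sketch below is hypothetical (SessionIds and parse are invented names). It accepts only the canonical 8-4-4-4-12 hex form that crypto.randomUUID() produces, so a short placeholder such as abc-123 would be rejected. It validates format only and is not authentication, which is why it pairs naturally with a user ID.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.regex.Pattern;

// Hypothetical server-side check for client-generated session IDs.
final class SessionIds {

    // Canonical 8-4-4-4-12 hex form, as produced by crypto.randomUUID()
    private static final Pattern CANONICAL_UUID = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    private SessionIds() { }

    static Optional<UUID> parse(String raw) {
        if (raw == null || !CANONICAL_UUID.matcher(raw).matches()) {
            return Optional.empty();
        }
        return Optional.of(UUID.fromString(raw));
    }
}
```

UUID.toString() always returns lowercase, so using the parsed value when building the memory ID maps upper- and lowercase variants of the same UUID to one conversation.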

Multi-Tenant Context

In SaaS systems, prefix the memory ID with the tenant identifier:

String memoryId = tenantId + "::" + userId + ":" + sessionId;

This allows safe separation of conversation data between organizations and simplifies key management in Redis or a shared database.
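The tenant prefix also makes tenant-wide operations simple, such as deleting all conversations when a customer offboards. This hypothetical sketch (TenantKeys and its methods are invented names) assumes the tenant::user:session format above and tenant IDs without ':'.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helpers for tenant-prefixed memory IDs of the form tenant::user:session.
final class TenantKeys {

    private static final String TENANT_SEPARATOR = "::";

    private TenantKeys() { }

    static Optional<String> tenantOf(String memoryId) {
        int idx = memoryId.indexOf(TENANT_SEPARATOR);
        return idx > 0 ? Optional.of(memoryId.substring(0, idx)) : Optional.empty();
    }

    // Selects every memory ID of one tenant, e.g. to delete its data on offboarding
    static List<String> belongingTo(String tenantId, List<String> memoryIds) {
        String prefix = tenantId + TENANT_SEPARATOR;
        return memoryIds.stream().filter(id -> id.startsWith(prefix)).toList();
    }
}
```

In a database-backed store, the same prefix test could become a LIKE 'tenant::%' query on the memory ID column. That is an assumption; the repository in this tutorial does not include such a method.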

How Chat Memory Works in Quarkus (Architecture Overview)

Before we implement providers or stores, you need a mental model of how the system flows.

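@MemoryId determines the scope: one conversation per ID
The ChatMemoryProvider creates the ChatMemory for each ID and so decides how memory is trimmed
The ChatMemoryStore decides where messages are persisted
ChatMemory is the runtime buffer the AI service reads from and writes to
Durable storage behind the store survives restarts and is shared across pods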
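The roles above can be modeled in a few lines of plain Java. This is a conceptual sketch only. Store, MapStore, Memory, and provider are hypothetical names that mirror, but are not, LangChain4j's ChatMemoryStore, ChatMemory, and ChatMemoryProvider. A HashMap stands in for the database, and messages are plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, framework-free model of the roles: memory ID, provider, memory, store.
final class MemoryArchitecture {

    // Store: decides WHERE messages live (a HashMap here, a database in production)
    interface Store {
        List<String> load(String memoryId);
        void save(String memoryId, List<String> messages);
    }

    static final class MapStore implements Store {
        private final Map<String, List<String>> data = new HashMap<>();
        public List<String> load(String memoryId) { return data.getOrDefault(memoryId, List.of()); }
        public void save(String memoryId, List<String> messages) { data.put(memoryId, List.copyOf(messages)); }
    }

    // Memory: the runtime buffer for one conversation, trimmed on every add
    static final class Memory {
        private final String memoryId;
        private final int maxMessages;
        private final Store store;

        Memory(String memoryId, int maxMessages, Store store) {
            this.memoryId = memoryId;
            this.maxMessages = maxMessages;
            this.store = store;
        }

        void add(String message) {
            List<String> messages = new ArrayList<>(store.load(memoryId));
            messages.add(message);
            while (messages.size() > maxMessages) {
                messages.remove(0); // drop the oldest message
            }
            store.save(memoryId, messages);
        }

        List<String> messages() { return store.load(memoryId); }
    }

    // Provider: decides HOW memory is trimmed by choosing the Memory type and its limit per ID
    static Function<String, Memory> provider(int maxMessages, Store store) {
        return memoryId -> new Memory(memoryId, maxMessages, store);
    }
}
```

Because each Memory reads and writes through the Store, a brand-new Memory instance for the same ID still sees the earlier messages. That property lets a database-backed store survive restarts and serve several pods.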

With the architecture clear, we can start implementing.

Configuring Memory in Quarkus

Once you understand how memory is scoped and managed inside each AI service, the next step is configuring how much context to keep and how Quarkus decides what to send to the model.

Quarkus provides two layers of configuration:

Global defaults – applied to all AI services
Per-service overrides – declared via annotations or custom suppliers

Global Configuration

You can control the global memory behavior using properties in application.properties.

# Use the simple message window memory
quarkus.langchain4j.chat-memory.type=MESSAGE_WINDOW

# Maximum number of messages to keep in memory
quarkus.langchain4j.chat-memory.memory-window.max-messages=20

If you need to bound prompt size by tokens rather than by message count, for example to stay within the model's context window, switch to token-based trimming:

# Use token-aware window memory
quarkus.langchain4j.chat-memory.type=TOKEN_WINDOW

# Limit the number of tokens instead of messages
quarkus.langchain4j.chat-memory.token-window.max-tokens=2000
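The sketch below shows how token-based trimming differs from counting messages. Real token counts come from a model-specific tokenizer. This sketch assumes a crude estimate of about four characters per token. The TokenWindow class and its methods are hypothetical and not the LangChain4j implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of token-window trimming with a crude token estimate.
final class TokenWindow {

    private TokenWindow() { }

    // Assumption: roughly 4 characters per token, rounded up; real tokenizers differ per model
    static int estimateTokens(String text) {
        return (text.length() + 3) / 4;
    }

    // Drops the oldest messages until the estimated total fits into maxTokens
    static List<String> trim(List<String> messages, int maxTokens) {
        List<String> kept = new ArrayList<>(messages);
        int total = kept.stream().mapToInt(TokenWindow::estimateTokens).sum();
        while (!kept.isEmpty() && total > maxTokens) {
            total -= estimateTokens(kept.remove(0));
        }
        return kept;
    }
}
```

Unlike a message window, one very long message can evict several short ones. In exchange, the prompt size stays bounded regardless of message length.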

These settings apply to every @RegisterAiService in your application that doesn’t define its own provider.

Per-Service Overrides

Each AI service can customize memory independently, for example to use a smaller window for lightweight agents and a summarizing strategy for long-running ones.

You do this with the chatMemoryProviderSupplier parameter of @RegisterAiService.

@RegisterAiService(chatMemoryProviderSupplier = FixedWindowMemorySupplier.class)
public interface SupportBot {
    String chat(@MemoryId String memoryId, @UserMessage String message);
}

The class FixedWindowMemorySupplier is a CDI bean implementing Supplier<ChatMemoryProvider>; it returns a configured ChatMemoryProvider.
You’ll create a JPAChatMemoryProvider yourself later in this tutorial.

How Configuration Works Internally

Here’s how Quarkus merges these settings at runtime:

Global defaults come from application.properties.
Per-service providers override those defaults when present.
The resulting ChatMemoryProvider instance is injected into your AI service.
Each call uses the @MemoryId value to select or create a ChatMemory.
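The precedence rule can be expressed as a tiny conceptual model. This is not how the Quarkus extension wires providers internally. ProviderResolution, resolve, and the string stand-ins for providers are hypothetical. The sketch only shows that a per-service supplier wins over the global default.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of how a per-service provider overrides the global default.
final class ProviderResolution {

    private ProviderResolution() { }

    static <P> P resolve(String serviceName,
                         Map<String, Supplier<P>> perServiceSuppliers,
                         Supplier<P> globalDefault) {
        // A per-service supplier wins when one is registered; otherwise use the global default
        return perServiceSuppliers.getOrDefault(serviceName, globalDefault).get();
    }
}
```

In the test scenario, SupportBot would get its dedicated provider, while MyAssistant, which declares no supplier, falls back to the window configured in application.properties.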

Implementing a JPA Memory Provider

This implementation uses:

A JPA entity for stored messages
A Panache repository
A ChatMemoryStore that serializes all messages
A sliding-window ChatMemory wrapper
A CDI-based supplier that wires everything together

Let’s walk through each part.

Create the JPA Entity

ChatMessageEntity stores each message in PostgreSQL.

Create: src/main/java/com/example/memory/entity/ChatMessageEntity.java

package com.example.memory.entity;

import java.time.Instant;

import io.quarkus.hibernate.orm.panache.PanacheEntityBase;
import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.Index;
import jakarta.persistence.Lob;
import jakarta.persistence.Table;

@Entity
@Table(name = "chat_message", indexes = @Index(name = "idx_memory_id", columnList = "memory_id"))
public class ChatMessageEntity extends PanacheEntityBase {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    public Long id; // Auto-generated ID

    @Column(name = "memory_id") // Indexed for fast lookups
    public String memoryId;

    public String type; // "USER", "AI", "SYSTEM"

    @Lob
    public String text; // The message content

    public Instant createdAt;
}

Key points:

Indexed memoryId for fast retrieval
LOB storage for message JSON
Strict ordering via createdAt
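Ordering by createdAt alone can tie, for example if two rows receive the same timestamp. A defensive option is to break ties with the identity column. That option is an assumption here, not taken from the tutorial's code; in Panache it would read "ORDER BY createdAt, id". The stdlib sketch below uses a hypothetical Row record as a stand-in for ChatMessageEntity to show the comparison.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for ChatMessageEntity rows, used to show ordering with a tie-breaker.
final class MessageOrdering {

    record Row(Long id, Instant createdAt, String text) { }

    // Chronological first; the auto-generated id decides when timestamps are equal
    static final Comparator<Row> CHRONOLOGICAL =
            Comparator.comparing(Row::createdAt).thenComparing(Row::id);

    private MessageOrdering() { }

    static List<Row> sorted(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(CHRONOLOGICAL);
        return copy;
    }
}
```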

Create the Repository

The ChatMemoryRepository provides type-safe database operations for chat messages. It encapsulates the queries and offers three core operations: finding messages by memory ID (in chronological order), saving new messages, and deleting entire conversations. By implementing PanacheRepository, you get Panache’s simplified query syntax, while the MANDATORY transaction type leaves the transaction boundaries to the caller.

Create src/main/java/com/example/memory/entity/ChatMemoryRepository.java:

package com.example.memory.entity;

import java.util.List;

import io.quarkus.hibernate.orm.panache.PanacheRepository;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.transaction.Transactional;

@ApplicationScoped
public class ChatMemoryRepository implements PanacheRepository<ChatMessageEntity> {

    @Transactional(Transactional.TxType.MANDATORY)
    public List<ChatMessageEntity> findByMemoryId(String memoryId) {
        return find("memoryId = ?1 ORDER BY createdAt", memoryId).list();
    }

    @Transactional(Transactional.TxType.MANDATORY)
    public void save(ChatMessageEntity entity) {
        if (entity.createdAt == null) {
            entity.createdAt = java.time.Instant.now();
        }
        // Use merge which works for both insert and update
        getEntityManager().merge(entity);
        getEntityManager().flush();
    }

    @Transactional(Transactional.TxType.MANDATORY)
    public void deleteByMemoryId(String memoryId) {
        delete("memoryId", memoryId);
    }
}

Key points:

MANDATORY transaction type: every method must run inside an existing transaction, here the one started by the @Transactional methods of the store. Calling a method without one fails with a TransactionalException.
Automatic timestamp handling: The save() method sets createdAt if null, ensuring every message has a timestamp even if the caller forgets to set it.
merge() with flush(): Uses JPA’s merge() operation (works for both insert and update) followed by flush() to immediately synchronize changes with the database, making them visible to subsequent queries in the same transaction.
Ordered retrieval: The findByMemoryId() query includes ORDER BY createdAt to guarantee messages are returned in chronological order, which is essential for conversation reconstruction.

Implement the ChatMemoryStore

The JPAChatMemoryStore bridges LangChain4j’s memory abstraction with your JPA persistence layer. It implements the ChatMemoryStore interface, handling serialization of LangChain4j’s polymorphic message types (UserMessage, AiMessage, SystemMessage) to JSON and back. This store manages the complete lifecycle of conversation storage: updating messages (delete-then-insert pattern), retrieving them with proper deserialization, and cleaning up conversations when needed.

Create src/main/java/com/example/memory/store/jpa/JPAChatMemoryStore.java:

package com.example.memory.store.jpa;

import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

import com.example.memory.entity.ChatMessageEntity;
import com.example.memory.entity.ChatMemoryRepository;
import com.fasterxml.jackson.databind.ObjectMapper;

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.store.memory.chat.ChatMemoryStore;
import io.quarkus.arc.Unremovable;
import io.quarkus.logging.Log;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Typed;
import jakarta.inject.Inject;
import jakarta.transaction.Transactional;

@ApplicationScoped
@Typed(JPAChatMemoryStore.class)
@Unremovable
public class JPAChatMemoryStore implements ChatMemoryStore {

    /**
     * The repository used to persist and retrieve chat messages from the database.
     */
    private final ChatMemoryRepository repository;

    /**
     * The ObjectMapper used for serializing and deserializing ChatMessage objects.
     */
    private final ObjectMapper objectMapper;

    @Inject
    public JPAChatMemoryStore(ChatMemoryRepository repository, ObjectMapper objectMapper) {
        this.repository = repository;
        this.objectMapper = objectMapper;
    }

    @Override
    @Transactional
    public void updateMessages(Object memoryId, List<ChatMessage> messages) {
        if (messages == null || messages.isEmpty()) {
            Log.warnf("No messages to update for memory ID: %s", memoryId);
            return;
        }

        String memoryIdString = memoryId.toString();
        try {
            // Delete existing messages for this memory ID
            repository.deleteByMemoryId(memoryIdString);

            // Save each message as a separate entity
            // Use incremental timestamps to preserve message order
            Instant baseTime = Instant.now();
            for (int i = 0; i < messages.size(); i++) {
                ChatMessage message = messages.get(i);
                ChatMessageEntity entity = new ChatMessageEntity();
                entity.memoryId = memoryIdString;
                entity.type = message.type().toString();
                // Serialize the full ChatMessage to JSON for the text field
                entity.text = objectMapper.writeValueAsString(message);
                // Use incremental timestamps to maintain order (millisecond precision)
                entity.createdAt = baseTime.plusMillis(i);
                repository.save(entity);
            }

            Log.infof("Updated messages for memory ID: %s with %d messages", memoryIdString, messages.size());
        } catch (Exception e) {
            Log.errorf(e, "Failed to update messages for memory ID: %s", memoryIdString);
            throw new RuntimeException("Failed to update messages", e);
        }
    }

    @Override
    @Transactional
    public List<ChatMessage> getMessages(Object memoryId) {
        String memoryIdString = memoryId.toString();
        try {
            List<ChatMessageEntity> entities = repository.findByMemoryId(memoryIdString);
            if (entities == null || entities.isEmpty()) {
                Log.debugf("No messages found for memory ID: %s", memoryIdString);
                return new ArrayList<>();
            }

            // Deserialize each entity back to ChatMessage
            List<ChatMessage> messages = entities.stream()
                    .map(entity -> {
                        try {
                            return objectMapper.readValue(entity.text, ChatMessage.class);
                        } catch (Exception e) {
                            Log.errorf(e, "Failed to deserialize message entity with ID: %d", entity.id);
                            throw new RuntimeException("Failed to deserialize message", e);
                        }
                    })
                    .collect(Collectors.toList());

            Log.debugf("Retrieved %d messages for memory ID: %s", messages.size(), memoryIdString);
            return messages;
        } catch (Exception e) {
            Log.errorf(e, "Failed to get messages for memory ID: %s", memoryIdString);
            throw new RuntimeException("Failed to get messages", e);
        }
    }

    @Override
    @Transactional
    public void deleteMessages(Object memoryId) {
        String memoryIdString = memoryId.toString();
        try {
            repository.deleteByMemoryId(memoryIdString);
            Log.infof("Deleted messages for memory ID: %s", memoryIdString);
        } catch (IllegalStateException e) {
            // EntityManagerFactory might be closed during shutdown - this is expected
            if (e.getMessage() != null && e.getMessage().contains("EntityManagerFactory is closed")) {
                Log.debugf("Skipping delete for memory ID: %s - EntityManagerFactory is closed (shutdown in progress)", memoryIdString);
                return;
            }
            // Re-throw if it's a different IllegalStateException
            Log.errorf(e, "Failed to delete messages for memory ID: %s", memoryIdString);
            throw new RuntimeException("Failed to delete messages", e);
        } catch (Exception e) {
            Log.errorf(e, "Failed to delete messages for memory ID: %s", memoryIdString);
            throw new RuntimeException("Failed to delete messages", e);
        }
    }
}
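The delete-then-insert pattern works because LangChain4j passes updateMessages the current state of the chat memory, that is, the complete (already trimmed) list of messages for that ID rather than only the new ones. The following hypothetical in-memory model (ReplaceOnUpdateStore and StoredMessage are invented names, with strings instead of ChatMessage) isolates that replace semantics. There is one deliberate difference from the JPA store: an empty list here clears the conversation, whereas the JPA store logs a warning and keeps the existing rows.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of the delete-then-insert strategy used by JPAChatMemoryStore.
final class ReplaceOnUpdateStore {

    record StoredMessage(String memoryId, String text, Instant createdAt) { }

    private final Map<String, List<StoredMessage>> rows = new ConcurrentHashMap<>();

    void updateMessages(String memoryId, List<String> messages) {
        // The caller passes the complete current window, so old rows are replaced, not merged
        Instant base = Instant.now();
        List<StoredMessage> replacement = new ArrayList<>();
        for (int i = 0; i < messages.size(); i++) {
            // Incremental timestamps preserve the order of the window
            replacement.add(new StoredMessage(memoryId, messages.get(i), base.plusMillis(i)));
        }
        // A single put swaps the whole conversation atomically for this key
        rows.put(memoryId, List.copyOf(replacement));
    }

    List<String> getMessages(String memoryId) {
        return rows.getOrDefault(memoryId, List.of()).stream()
                .map(StoredMessage::text)
                .toList();
    }

    void deleteMessages(String memoryId) {
        rows.remove(memoryId);
    }
}
```

The second update in the test scenario drops m1 because the window moved on. No explicit eviction logic is needed in the store, because the ChatMemory already decided what to keep.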