The LangChain4j Proficiency Exam: 50 Questions for Senior Java Devs (2026 Edition)
Here is a straight checklist of what shows up in interviews when teams hire for LangChain4j and JVM LLM work, for the interviewer and for you if you are the candidate.
LangChain4j went from “interesting experiment” to something people actually run in production on Java. So it shows up in interviews. The questions are usually more than trivia. Interviewers use them to see whether you understand how an LLM behaves inside a real service. Knowing how to call an HTTP API is not enough on its own.
This is partly a cheat sheet. It is also a map of the mental model. Each block matches how questions tend to cluster, and what the person across the table is usually trying to learn.
If you can explain this stuff in plain language, you are past the “I tried AI in Java” phase.
What this edition assumes (2026): Maven artifacts sit under dev.langchain4j, on whatever 1.x line you are on when you read this (I wrote it against the 1.12.x era). The chat side is ChatModel plus ChatRequest / ChatResponse. Old blog posts still say ChatLanguageModel or generate(). Same idea, new names. On Spring Boot you normally pull in a provider starter (OpenAI, etc.) and langchain4j-spring-boot-starter if you want declarative @AiService beans. On Quarkus, many teams use Quarkus LangChain4j from Quarkiverse on top of the same library.
Essential resources
Before you work through the list, bookmark these:
Beginner journey (The Main Thread)
Part 1: The landscape and basics
What they want to hear: you know what the library is for. Not “it is a thin wrapper around OpenAI.”
1. What is LangChain4j?
It is a Java library for putting LLMs into normal applications. One surface for several providers (OpenAI, Vertex, Bedrock, …). It also carries the boring glue for memory, RAG, tools, and agents on the JVM.
2. How does LangChain4j differ from the Python LangChain library?
They share ideas with Python LangChain, but LangChain4j is not a line-by-line port. It feels like Java: strong types, “AI Services” as interfaces instead of generic “chains,” and it plugs into Quarkus and Spring Boot the way normal Java apps do. There is Kotlin on the side if you like a shorter syntax. The center of gravity is still Java.
3. What are the main high-level components?
The “Big Three” are Models (Chat, Embedding), Stores (Vector DBs, Chat Memory), and Services (AI Services, RAG Retrievers). Supporting components include Prompt Templates, Document Loaders, and Tools.
4. Which LLM providers are supported?
It is provider-agnostic. It supports cloud giants (OpenAI, Azure, Google Vertex AI, Amazon Bedrock, Anthropic) and local/open implementations (Ollama, LocalAI, HuggingFace).
5. What is the standard interface for chatting?
ChatModel. It wraps the provider’s HTTP or SDK behind Java types. In real code you usually build a ChatRequest: messages, ChatRequestParameters (temperature, maxOutputTokens, tools, responseFormat, …). You get back a ChatResponse with AiMessage and ChatResponseMetadata (token counts live there). You can still call chat(String) when you want a quick test.
Learn more: Chat and language models
6. What is the difference between ChatModel and StreamingChatModel?
ChatModel waits until the full answer is there. StreamingChatModel pushes pieces as they arrive. With AI Services you often get a TokenStream, register callbacks (onPartialResponse, and on newer stacks also partial tool calls or “thinking” chunks when the provider sends them), and keep the UI from freezing. On Spring Boot you can return Flux<String> if you add the Reactor integration.
Learn more: Response streaming
7. What is an AiMessage vs UserMessage?
These represent the dialogue roles. UserMessage is input from the human; AiMessage is the response from the model. There is also SystemMessage (instruction) and ToolExecutionResultMessage (function output).
UserMessage userMsg = UserMessage.from("What is the weather?");
SystemMessage systemMsg = SystemMessage.from("You are a helpful weather assistant");
AiMessage aiResponse = AiMessage.from("I'll help you check the weather");
Part 2: The Java superpower (AI Services)
For Java developers this is the important block. If AI Services are new to you, you are probably fighting the library instead of using it.
8. What are “AI Services” in LangChain4j?
This is the main feature most teams actually use. You write a Java interface with methods. LangChain4j builds a proxy at runtime. Prompt wiring and parsing stay behind that interface, so you work with types instead of raw strings everywhere.
Learn more: AI Services
Example:
interface Assistant {
    String chat(String userMessage);
}
Assistant assistant = AiServices.create(Assistant.class, chatModel);
String response = assistant.chat("Tell me a joke");
9. How do you handle structured outputs (e.g. JSON) using AI Services?
Put the type you want on the method, for example Person extractPerson(String text). LangChain4j steers the model toward JSON and parses into your POJO or record (Jackson is the usual story).
10. How do you integrate LangChain4j with Spring Boot?
You need two Maven pieces. First, a provider starter such as langchain4j-open-ai-spring-boot-starter. That wires ChatModel / StreamingChatModel from application.properties. Second, langchain4j-spring-boot-starter. That finds @AiService interfaces and registers implementations as beans. Keys look like langchain4j.open-ai.chat-model.api-key and langchain4j.open-ai.chat-model.model-name.
Learn more: Spring Boot integration
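Assuming the OpenAI starter from above, a minimal application.properties could look like this. The api-key and model-name keys come from the text; the model name value and the temperature key are illustrative, so check your starter's documentation for the exact property set:

```properties
# Provider starter configuration (OpenAI starter assumed on the classpath)
langchain4j.open-ai.chat-model.api-key=${OPENAI_API_KEY}
# Model name is an example; pick whatever your provider offers
langchain4j.open-ai.chat-model.model-name=gpt-4o-mini
# Optional tuning (key name assumed to follow the same pattern)
langchain4j.open-ai.chat-model.temperature=0.2
```

Keep the key itself in an environment variable, never in the committed file.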
11. What is a PromptTemplate?
A template string with holes, e.g. {{name}}. Same shape every time you call the model, and fewer random string literals in the middle of your code.
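A template is easy to picture as plain string substitution. The sketch below is not the library's PromptTemplate class, just the idea of filling {{holes}} with values:

```java
import java.util.Map;

// Conceptual sketch of what a prompt template does: not the LangChain4j
// PromptTemplate class, only the idea of filling {{holes}} in a string.
public class TemplateSketch {
    public static String render(String template, Map<String, String> vars) {
        String result = template;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            result = result.replace("{{" + e.getKey() + "}}", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String prompt = render("Hello {{name}}, welcome to {{place}}!",
                Map.of("name", "Ada", "place", "Turin"));
        System.out.println(prompt); // Hello Ada, welcome to Turin!
    }
}
```

The real class adds validation and escaping, but the mental model is the same: one template, many renders, no string concatenation scattered through your code.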
12. How are prompt variables injected in an AI Service?
You use the @V annotation on method parameters. For example: String chat(@V("name") String name). This maps the Java argument to the {{name}} variable in the prompt template.
Example:
interface Greeter {
    @UserMessage("Hello {{name}}, welcome to {{place}}!")
    String greet(@V("name") String name, @V("place") String place);
}
13. What is the @SystemMessage annotation used for?
It is placed on an AI Service interface (or method) to define the “persona” or rules for the LLM (e.g. “You are a helpful coding assistant who only speaks JSON”).
14. Why would you prefer AI Services over “Chains”?
Python tutorials love “chains.” In Java, AI Services are the natural fit: you get types, you can test boundaries, and it looks like the rest of your service layer and DI setup.
Part 3: Memory and context
They want to know you can keep state. Can the bot remember what I said five seconds ago?
15. What is ChatMemory?
It holds the conversation history. No memory means every call to the model is a fresh island. Nothing carries over.
Learn more: Chat memory
16. How does MessageWindowChatMemory work?
It acts as a sliding window, keeping only the last N messages. As new messages arrive, the oldest ones drop off to ensure the context window doesn’t overflow.
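The eviction behavior can be sketched in a few lines of plain Java. This is a conceptual model, not the library implementation (the real class, for instance, keeps the SystemMessage pinned while trimming the rest):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Conceptual sketch of a message-window memory (not the library class):
// keep at most maxMessages entries, evicting the oldest first.
public class WindowMemorySketch {
    private final Deque<String> messages = new ArrayDeque<>();
    private final int maxMessages;

    public WindowMemorySketch(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    public void add(String message) {
        messages.addLast(message);
        while (messages.size() > maxMessages) {
            messages.removeFirst(); // oldest message falls out of the window
        }
    }

    public List<String> messages() {
        return List.copyOf(messages);
    }

    public static void main(String[] args) {
        WindowMemorySketch memory = new WindowMemorySketch(2);
        memory.add("user: hi");
        memory.add("ai: hello");
        memory.add("user: weather?");
        System.out.println(memory.messages()); // [ai: hello, user: weather?]
    }
}
```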
17. What is TokenWindowChatMemory?
Same idea as a window, but you count tokens, not messages. You squeeze more useful text into the limit before the model refuses or truncates.
18. Where is ChatMemory stored?
By default, it is in-memory (volatile). For production, you must implement a persistent store (using ChatMemoryStore) to save history to a database (Redis, SQL) keyed by the memoryId (usually user ID).
19. What is a SystemMessage and why is it distinct?
A SystemMessage is the rules and persona. Normal memory trimming usually keeps it at the top. User and assistant lines get dropped first when the window fills.
Part 4: RAG (retrieval-augmented generation)
RAG is still the use case you see most in serious Java shops. Interviewers go deep here. If you follow my RAG tutorials on The Main Thread, they match this section.
20. What is RAG?
Retrieval Augmented Generation. It is a pattern where you fetch private data (that the model wasn’t trained on) and inject it into the prompt context before the model generates an answer.
Learn more: RAG (retrieval-augmented generation)
21. What is an Embedding?
A vector (list of numbers) representing the semantic meaning of text. Similar concepts (like “Dog” and “Puppy”) will be mathematically close to each other in this vector space.
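"Mathematically close" usually means cosine similarity. A minimal sketch with made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions, produced by an EmbeddingModel):

```java
// Similarity between embeddings is usually cosine similarity: the closer
// to 1.0, the more similar the meaning. The toy vectors below are invented.
public class CosineSketch {
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] dog   = {0.9, 0.1, 0.0}; // pretend embedding of "Dog"
        double[] puppy = {0.8, 0.2, 0.0}; // pretend embedding of "Puppy"
        double[] car   = {0.0, 0.1, 0.9}; // pretend embedding of "Car"
        // "Dog" lands much closer to "Puppy" than to "Car"
        System.out.printf("dog~puppy=%.3f dog~car=%.3f%n",
                cosine(dog, puppy), cosine(dog, car));
    }
}
```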
22. What is an EmbeddingStore?
A specialized database (Vector DB) designed to store and search embeddings. Examples supported include Pinecone, Weaviate, Milvus, and PGVector.
23. What is the role of the EmbeddingStoreIngestor?
EmbeddingStoreIngestor is the batch side: load documents, split, embed, write to the store. You run it offline or whenever new content lands. ingest() is the single entry point that runs that pipeline.
24. What is a DocumentSplitter and why is it vital?
LLMs have input limits. You cannot embed a 500-page PDF at once. A splitter chunks the document into smaller pieces (e.g. paragraphs) while trying to preserve context (e.g. not cutting a sentence in half).
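A character-based splitter with overlap can be sketched like this. The library's splitters are smarter about sentence and paragraph boundaries; this only shows the chunk-plus-overlap mechanics:

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of a character-based splitter with overlap (real
// DocumentSplitter implementations respect sentence boundaries).
public class SplitterSketch {
    public static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= overlap) {
            throw new IllegalArgumentException("overlap must be smaller than chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(split("abcdefghij", 4, 1));
        // [abcd, defg, ghij] -- each chunk overlaps its neighbor by one char
    }
}
```

The overlap exists so that a sentence cut at a chunk border still appears whole in at least one chunk.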
25. What is a ContentRetriever?
The runtime interface that fetches data. When a user asks a question, the ContentRetriever queries the EmbeddingStore to find relevant text chunks.
26. What is semantic search vs keyword search?
Keyword search matches exact words. Semantic search (using embeddings) matches meaning. “My car is broken” will match “Auto repair shops” semantically, even though they share no keywords.
27. What is re-ranking (using a ScoringModel)?
Vector search is cheap and fast. It is also wrong often enough to hurt RAG. A simple fix: pull more chunks than you need, then let a ScoringModel rerank (Jina, Vertex ranking API, watsonx, etc.). In LangChain4j that usually means a RetrievalAugmentor and a ContentAggregator such as ReRankingContentAggregator around your ScoringModel.
28. How do you implement “naive RAG” in LangChain4j?
Wire a ContentRetriever (often an EmbeddingStoreContentRetriever backed by an EmbeddingStore and EmbeddingModel) into AiServices. The framework retrieves segments, injects them into the prompt, and returns the model answer. For more control (multiple retrievers, reranking, query transformation), configure a RetrievalAugmentor instead of only a single retriever.
29. What is “metadata” in a Document?
Key-value pairs attached to the text (e.g. author, date, access_level). This allows for metadata filtering—for example, “Only search documents where year > 2023.”
30. How do you handle “hybrid search”?
How you do hybrid search depends on the store. PGVector can mix vector + full-text with SearchMode.HYBRID and RRF. You normally pass both the query embedding and the raw query string. Other databases have their own switches or a second index. Same goal everywhere: semantics for paraphrase, keywords for SKUs and acronyms.
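The RRF merge step mentioned above can be sketched independently of any store: each document earns 1/(k + rank) per result list (rank starting at 1), and the summed scores decide the final order. k = 60 is a common default; the document IDs here are invented:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of Reciprocal Rank Fusion (RRF), the merge step many hybrid-search
// stores use: each document scores 1/(k + rank) per result list, summed.
public class RrfSketch {
    public static Map<String, Double> fuse(int k, List<List<String>> rankings) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1; // ranks are 1-based in the RRF formula
                scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> vector  = List.of("docA", "docB", "docC"); // semantic hits
        List<String> keyword = List.of("docB", "docD");         // full-text hits
        System.out.println(fuse(60, List.of(vector, keyword)));
        // docB appears in both lists, so it gets the highest fused score
    }
}
```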
Part 5: Tools and agents
After RAG, interviews often move here: tools, function calling, agents.
31. What are “Tools” in LangChain4j?
Tools are Java methods annotated with @Tool. They allow the LLM to interact with the outside world (APIs, Databases, Calculators).
Learn more: Tools (function calling)
32. How does “function calling” actually work?
The LLM does not execute the code. It analyzes the prompt, realizes it needs data, and returns a structured request: “Please call function getWeather with argument London.” LangChain4j intercepts this, runs the Java method, and feeds the result back to the LLM.
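The round trip can be sketched as a plain dispatch table. The tool name and the pretend model output are invented for illustration; in LangChain4j the @Tool machinery does this dispatching for you:

```java
import java.util.Map;
import java.util.function.Function;

// Sketch of the function-calling round trip: the model never runs code,
// it only names a tool and its arguments; the framework dispatches the
// call and feeds the result back as a tool-result message.
public class ToolDispatchSketch {
    static final Map<String, Function<String, String>> TOOLS = Map.of(
            "getWeather", city -> "Sunny in " + city // a real tool would call an API
    );

    public static String dispatch(String toolName, String argument) {
        Function<String, String> tool = TOOLS.get(toolName);
        if (tool == null) {
            throw new IllegalArgumentException("Unknown tool: " + toolName);
        }
        // The return value goes back to the model as the "observation"
        return tool.apply(argument);
    }

    public static void main(String[] args) {
        // Pretend the model answered: "call getWeather with argument London"
        String observation = dispatch("getWeather", "London");
        System.out.println(observation); // Sunny in London
    }
}
```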
33. What is an Agent?
An Agent is an LLM that acts autonomously. It is given a goal and a set of tools, and it loops through a cycle of “Thought → Action (tool call) → Observation” until the goal is met.
Learn more: Agents and agentic AI
34. How do you give a Tool to an AI Service?
You register the bean containing @Tool methods when building the AI Service. The library automatically extracts the method signature and Javadoc description to create the tool definition for the LLM.
35. Why are Javadocs important for Tools?
The LLM reads the Javadoc to understand when and how to use the tool. If your Javadoc is ambiguous, the model will fail to call the tool correctly.
Part 6: Production and architecture
Harder questions: limits, safety, ops, cost.
36. How do you handle token limits?
Use TokenWindowChatMemory, summarize old turns with another model call, or stop pasting the whole chat into every request. RAG helps because you pull only the chunks that match the question.
37. How do you handle LLM hallucinations?
RAG gives the model text it can lean on. Low temperature makes answers more boring and often more faithful. Ask for citations or chunk IDs when a human must check the answer. Output guardrails in LangChain4j can reject bad JSON or policy violations and retry before the user sees anything.
38. What is ImageContent?
Multimodal models want more than text. ImageContent is how you attach an image next to the user message (GPT-4o-style models, and similar).
39. How do you implement streaming with an AI Service?
Declare the method return type as TokenStream, build the service from a StreamingChatModel, then call start() after registering callbacks for partial text, completion, and errors. On Spring Boot, you can alternatively return Flux<String> from an @AiService method when the Reactor integration is on the classpath.
40. What is a ModerationModel?
A separate model or API that scores text for policy (hate, violence, etc.) before you send a prompt upstream or return an answer to the client.
41. How do you handle errors/retries?
LangChain4j surfaces failures as normal Java exceptions, so standard try/catch and resilience patterns apply. Many ChatModel builders also let you configure retries for transient errors such as HTTP 429 rate limits.
42. How do you switch LLM providers?
You depend on ChatModel, not one vendor class. Swap the builder (OpenAiChatModel vs VertexAiGeminiChatModel), fix config and keys, run your tests again.
43. What is “observability” in this context?
Tracking inputs, outputs, latency, failures, and token usage per call. Implement ChatModelListener (onRequest / onResponse / onError) to inspect ChatRequest, ChatResponse, and TokenUsage in ChatResponseMetadata, or attach MicrometerMetricsChatModelListener for metrics. EmbeddingStoreListener covers vector store operations. Spring Boot can expose these as beans for centralized logging and tracing.
44. What are the security risks of Tools?
Prompt injection. A user might trick the LLM into using a tool destructively (e.g. “Delete all users”). You must strictly scope what tools can do and validate all inputs inside the tool method.
45. What is semantic caching?
Semantic caching here means: embed the question, search a cache of past questions by similarity, and if you find a close hit, return the old answer. Same intent, different words. Provider prompt caching is a different thing: the vendor caches a long fixed prefix (system prompt, big doc) so repeat calls cost less. Both get called “caching” in conversation. Ask which one they mean.
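The first variant (embed, compare, reuse) fits in a short sketch. The toy two-dimensional vectors stand in for a real EmbeddingModel call, and the 0.95 threshold is an arbitrary choice:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of semantic caching: embed the question, compare against cached
// questions by cosine similarity, and reuse the old answer above a threshold.
// The embeddings here are toy vectors; a real app would call an EmbeddingModel.
public class SemanticCacheSketch {
    record Entry(double[] embedding, String answer) {}

    private final List<Entry> cache = new ArrayList<>();
    private final double threshold;

    public SemanticCacheSketch(double threshold) {
        this.threshold = threshold;
    }

    public void put(double[] embedding, String answer) {
        cache.add(new Entry(embedding, answer));
    }

    public String lookup(double[] queryEmbedding) {
        for (Entry entry : cache) {
            if (cosine(entry.embedding(), queryEmbedding) >= threshold) {
                return entry.answer(); // close enough: same intent, different words
            }
        }
        return null; // cache miss: call the model, then put() the new pair
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        SemanticCacheSketch cache = new SemanticCacheSketch(0.95);
        cache.put(new double[]{1.0, 0.0}, "42");
        System.out.println(cache.lookup(new double[]{0.99, 0.05})); // hit: similar question
        System.out.println(cache.lookup(new double[]{0.0, 1.0}));   // null: different intent
    }
}
```

In production you would back this with a vector store rather than a linear scan, but the decision logic is the same.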
46. How do you test an LLM application?
Unit tests on free text are painful because the model does not give you the same string twice. Many teams keep a set of golden questions and use a stronger model as a grader, or they assert on structure and tools instead of exact prose.
47. What are Document Loaders and how do you customize them?
LangChain4j ships document loaders such as FileSystemDocumentLoader, which offers loadDocuments() and loadDocumentsRecursively() with optional path matchers for filtering. For proprietary formats (e.g. a legacy bank mainframe export), implement the DocumentLoader interface yourself and parse the format into LangChain4j Document objects.
48. How do you manage complexity in prompts?
Split templates and nest them. When the prompt gets fuzzy, push logic into Tools and keep the prompt short. Code wins over five pages of instructions most of the time.
49. What is the biggest performance bottleneck in RAG?
Usually the embedding generation or the vector store retrieval latency. Or, if you retrieve too many chunks, the input token processing time (time to first token) increases.
50. What is the future of LangChain4j?
More agent-style flows (tools, planning, multi-hop retrieval). Guardrails as a normal part of the stack (@InputGuardrails, @OutputGuardrails). More built-in observability (listeners, Micrometer). On Quarkus, the extension talks a lot about MCP and agent-to-agent style wiring while keeping declarative AI services. On the model side, people run smaller or local models (Ollama, internal endpoints) next to the big cloud APIs to control cost and data residency.
You’ve done it
That is the core map for LangChain4j on the JVM in production. If you know this list, you can hold your own in most senior interviews. You can also build services that do not fall apart the first week.
The short version: LangChain4j is not “HTTP to OpenAI.” It is the wiring for memory, RAG, tools, types, and ops so a Java team can own an LLM feature like any other service.
Common beginner mistakes to avoid
Not setting API keys properly — Use environment variables; never hardcode secrets.
Ignoring token limits — Models have context window limits.
Forgetting to handle exceptions — API calls can fail (rate limits, network issues).
Not using ChatMemory — Your bot won’t remember previous messages without it.
Skipping AI Services — Using low-level APIs directly is harder to maintain.
Ignoring costs — Every API call costs money; use caching and optimize prompts.
Not testing with different models — Different models have different capabilities.
Forgetting about streaming — Users expect real-time responses in chat applications.
Beginner’s checklist
Before your interview or first project, make sure you can:
Set up a basic LangChain4j project with Maven/Gradle (dev.langchain4j artifacts).
Configure an API key securely (environment variables or your framework’s config).
Create a ChatModel and call it via ChatRequest / ChatResponse (or the chat(String) shortcut).
Define an AI Service interface (AiServices or Spring @AiService / Quarkus @RegisterAiService).
Use @UserMessage and @SystemMessage annotations.
Implement basic ChatMemory.
Load and split a document for RAG.
Create a simple @Tool method.
Handle streaming (TokenStream or Spring Flux<String>).
Attach at least one ChatModelListener or metrics listener and know where TokenUsage appears.
Know how @InputGuardrails / @OutputGuardrails fit into a service.
Switch between different LLM providers.
Interview preparation tips
Hands-on practice: Clone the examples repos (ejq and main-thread) and run every example.
Understand the “why”: Memorization alone is weak. Be able to say why each pattern exists.
Know the tradeoffs: Be ready to discuss pros and cons of different approaches.
Real-world scenarios: Think about how you’d use these in production.
Cost awareness: Be able to discuss token usage and optimization strategies.
Security mindset: Understand prompt injection and other risks.
Additional learning resources
Official documentation and tutorials
Hands-on examples
Examples repository (Spring, RAG, agents, tools)
Framework integrations
Quarkus LangChain4j (Quarkiverse) — declarative AI services, Dev UI, Dev Services
LangChain4j’s Quarkus page (library-level)
Community and support
Advanced topics
Pro tip: Clone the examples repo and run a few modules until they make sense. Reading this file twice does not replace that.



For question #2, LangChain4j-CDI (a LangChain4j project) was built to integrate LangChain4j with the Jakarta EE CDI ecosystem. So it is not only a Quarkus and Spring Boot story: LangChain4j can be used with any Java enterprise framework that supports CDI.