Quarkus JFR: Find Performance Bugs Before Users Do
Use Java Flight Recorder on a small Quarkus service to spot blocking requests, allocation spikes, and ugly startup before they turn into production latency.
CPU charts do their job. They show CPU. The problem starts when we ask them why a request feels bad.
An endpoint can feel terrible to users while CPU still looks calm. A request might be waiting on a lock, sleeping inside a fake refresh, or burning memory on temporary objects that die young. Logs usually tell you the request was slow. Metrics tell you the average or the percentile moved. Neither tells you what shape the slowdown had.
That is where Java Flight Recorder helps. It is built into the JDK, Quarkus has a JFR extension, and one recording lets you line up request timing, thread behavior, allocations, GC, and startup. After that, you can stop guessing whether the problem was CPU, waiting, or memory churn.
So we build a deliberately suspicious Quarkus service and use JFR to answer a very normal production question: why an endpoint feels slow even though CPU still looks fine.
What we build
We build requestwatch-jfr, a small Quarkus 3.36.1 service on Java 25 with four interesting paths:
GET /requests/fast as the boring control case
GET /requests/blocking as the bad path that holds a global lock while it waits
GET /requests/blocking-fixed as the same work with the slow part moved outside the lock
GET /requests/allocating as the path that creates too many temporary buffers
The app also does one bad thing at startup on purpose, because I want a recording from JVM boot to show more than steady-state traffic.
When you finish, you will know what to look for in JFR when:
latency is high but CPU is not
one request path allocates far more than its neighbors
startup hides work before the first real request
This is not a JVM internals lecture. We are going to look for a few useful signals and stop there.
What you need
You want a recent JDK, Maven, and enough Quarkus comfort to read a REST resource and a small service class. The sample is tiny on purpose. The point is runtime shape, not framework archaeology.
JDK 25
Maven 3.9+
JDK Mission Control if you want the visual recording walkthrough
Podman if you want to follow the container section
About ☕️☕️☕️
Create the project
Start with a plain Quarkus app that has REST, JFR, and the Podman container image extension:
mvn io.quarkus.platform:quarkus-maven-plugin:create \
-DprojectGroupId=com.requestwatch \
-DprojectArtifactId=requestwatch-jfr \
-Dextensions="rest-jackson,jfr,container-image-podman" \
-DnoCode
cd requestwatch-jfr
The generated project I got here uses Quarkus 3.36.1 and maven.compiler.release 25.
Why these extensions:
quarkus-rest-jackson gives us JSON endpoints on Quarkus REST
quarkus-jfr adds Quarkus events to the recording so request and startup activity are easier to place
quarkus-container-image-podman gives us a direct path to a Podman-built JVM image through the container image guide
The code stays small on purpose. I want the recording to do the heavy lifting here.
Build the sample
The bad behavior is deliberate. I am not trying to build a realistic service in every detail. I want a small app that produces the same JFR shapes you get from real mistakes.
Add typed config
First, add a typed config mapping in src/main/java/com/requestwatch/RequestWatchConfig.java:
package com.requestwatch;
import io.smallrye.config.ConfigMapping;
import io.smallrye.config.WithDefault;
@ConfigMapping(prefix = "requestwatch")
public interface RequestWatchConfig {
Blocking blocking();
Allocation allocation();
Startup startup();
interface Blocking {
@WithDefault("150")
long delayMillis();
}
interface Allocation {
@WithDefault("768")
int buffers();
@WithDefault("8192")
int bufferSizeBytes();
}
interface Startup {
@WithDefault("true")
boolean enabled();
@WithDefault("400")
long delayMillis();
@WithDefault("256")
int buffers();
@WithDefault("16384")
int bufferSizeBytes();
}
}
I use @ConfigMapping because it keeps the demo honest. The numbers that drive blocking, allocation, and startup behavior stay in one place instead of hiding in magic constants.
Keep the response types boring
Add these three records under src/main/java/com/requestwatch/, one file per record:
package com.requestwatch;
public record FastResponse(String endpoint, String threadName, String quoteVersion, int priceCents) {
}
package com.requestwatch;
public record BlockingResponse(
String endpoint,
String threadName,
String quoteVersion,
int priceCents,
long simulatedDelayMs,
long elapsedMs) {
}
package com.requestwatch;
public record AllocationResponse(
String endpoint,
String threadName,
int bufferCount,
int bufferSizeBytes,
long allocatedBytes,
long checksum,
long elapsedMs) {
}
The DTOs are plain on purpose. If the response model starts becoming the story, we already drifted away from the real problem.
Add the blocking and fixed paths
Now add the part we actually care about in src/main/java/com/requestwatch/PricingService.java:
package com.requestwatch;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
@ApplicationScoped
class PricingService {
private final RequestWatchConfig config;
private final Object refreshLock = new Object();
private volatile SupplierQuote latestQuote = new SupplierQuote("startup", 1299);
@Inject
PricingService(RequestWatchConfig config) {
this.config = config;
}
FastResponse fast() {
SupplierQuote quote = latestQuote;
return new FastResponse("fast", Thread.currentThread().getName(), quote.version(), quote.priceCents());
}
BlockingResponse blocking() {
long start = System.nanoTime();
SupplierQuote quote;
synchronized (refreshLock) {
quote = refreshQuote();
latestQuote = quote;
}
return new BlockingResponse(
"blocking",
Thread.currentThread().getName(),
quote.version(),
quote.priceCents(),
config.blocking().delayMillis(),
elapsedMillis(start));
}
BlockingResponse blockingFixed() {
long start = System.nanoTime();
SupplierQuote quote = refreshQuote();
synchronized (refreshLock) {
latestQuote = quote;
}
return new BlockingResponse(
"blocking-fixed",
Thread.currentThread().getName(),
quote.version(),
quote.priceCents(),
config.blocking().delayMillis(),
elapsedMillis(start));
}
private SupplierQuote refreshQuote() {
sleep(config.blocking().delayMillis());
return new SupplierQuote("quote-" + System.nanoTime(), 1299);
}
private static long elapsedMillis(long start) {
return (System.nanoTime() - start) / 1_000_000L;
}
private static void sleep(long delayMillis) {
try {
Thread.sleep(delayMillis);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new IllegalStateException("Quote refresh interrupted", e);
}
}
private record SupplierQuote(String version, int priceCents) {
}
}
This is the first thing I want to show. The bad path is slow, but not because it computes much. It holds a global lock and then sleeps. Real code does this with downstream HTTP calls, JDBC work, cache refreshes, or synchronized wrappers that looked harmless in review.
The fixed path does the same slow work, but it does not hold the lock during that wait. Under parallel traffic, that difference matters far more than the method names.
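If you want to see that difference without Quarkus in the way, here is a minimal standalone sketch using only JDK threads. The class LockShapeDemo and its method names are my own placeholders, not part of requestwatch-jfr. With four callers and a 150 ms delay, the lock-holding variant should approach 4 x 150 = 600 ms for the last caller, while the other variant should stay close to a single delay.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical standalone demo, not part of the requestwatch-jfr sources.
public class LockShapeDemo {
    private static final Object LOCK = new Object();
    private static volatile long latest;

    /** Releases all callers at once and returns the slowest elapsed time in milliseconds. */
    static long slowestMillis(int callers, Runnable task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(callers);
        CountDownLatch startGate = new CountDownLatch(1);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < callers; i++) {
                results.add(pool.submit(() -> {
                    startGate.await(); // every caller starts timing at the same moment
                    long start = System.nanoTime();
                    task.run();
                    return (System.nanoTime() - start) / 1_000_000L;
                }));
            }
            startGate.countDown();
            long slowest = 0;
            for (Future<Long> result : results) {
                slowest = Math.max(slowest, result.get());
            }
            return slowest;
        } finally {
            pool.shutdownNow();
        }
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Demo sleep interrupted", e);
        }
    }

    /** Bad shape: the wait happens while the lock is held, so callers queue up. */
    static long slowestHoldingLock(int callers, long delayMillis) throws Exception {
        return slowestMillis(callers, () -> {
            synchronized (LOCK) {
                sleep(delayMillis);
                latest = System.nanoTime();
            }
        });
    }

    /** Fixed shape: the wait happens first, the lock only guards the state update. */
    static long slowestOutsideLock(int callers, long delayMillis) throws Exception {
        return slowestMillis(callers, () -> {
            sleep(delayMillis);
            synchronized (LOCK) {
                latest = System.nanoTime();
            }
        });
    }

    public static void main(String[] args) throws Exception {
        System.out.println("holding lock: " + slowestHoldingLock(4, 150) + " ms");
        System.out.println("outside lock: " + slowestOutsideLock(4, 150) + " ms");
    }
}
```

Exact numbers depend on the machine and scheduler. The gap between the two lines is the part to look at.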
These endpoints return plain objects, so Quarkus REST treats them as blocking by signature and runs them off the IO thread by default. That is fine for this demo because the whole point is to make the waiting visible.
Add the allocation-heavy path
We also want one path that burns too much memory. Add src/main/java/com/requestwatch/AllocationService.java:
package com.requestwatch;
import java.util.ArrayList;
import java.util.List;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
@ApplicationScoped
class AllocationService {
private final RequestWatchConfig config;
@Inject
AllocationService(RequestWatchConfig config) {
this.config = config;
}
AllocationResponse allocating() {
long start = System.nanoTime();
List<byte[]> buffers = new ArrayList<>(config.allocation().buffers());
long checksum = 0;
long allocatedBytes = 0;
for (int i = 0; i < config.allocation().buffers(); i++) {
byte[] buffer = new byte[config.allocation().bufferSizeBytes()];
buffer[0] = (byte) i;
buffer[buffer.length - 1] = (byte) (i * 31);
buffers.add(buffer);
checksum += Byte.toUnsignedLong(buffer[0]) + Byte.toUnsignedLong(buffer[buffer.length - 1]);
allocatedBytes += buffer.length;
}
return new AllocationResponse(
"allocating",
Thread.currentThread().getName(),
buffers.size(),
config.allocation().bufferSizeBytes(),
allocatedBytes,
checksum,
elapsedMillis(start));
}
private static long elapsedMillis(long start) {
return (System.nanoTime() - start) / 1_000_000L;
}
}
This one allocates 768 x 8192 bytes, so a single request creates 6,291,456 bytes of temporary heap data. That is enough to show up clearly in allocation events without turning the tutorial into an OOM story.
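The arithmetic is worth checking once, because the same numbers come back in the tests and in the recording. This is a small sketch with a hypothetical helper class, AllocationMath. It counts payload bytes only and ignores array headers, which add a little on top. It uses the defaults from RequestWatchConfig: 768 x 8192 for a request and 256 x 16384 for the startup warmup.

```java
// Hypothetical helper for checking the numbers, not part of the requestwatch-jfr sources.
public class AllocationMath {

    /** Payload bytes for a number of equally sized buffers (array headers not counted). */
    static long payloadBytes(int buffers, int bufferSizeBytes) {
        return Math.multiplyExact((long) buffers, (long) bufferSizeBytes);
    }

    /** Converts bytes to mebibytes (1 MiB = 1,048,576 bytes). */
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long perRequest = payloadBytes(768, 8192);  // 6,291,456 bytes
        long atStartup = payloadBytes(256, 16384);  // 4,194,304 bytes
        System.out.println("per request: " + perRequest + " bytes, " + toMiB(perRequest) + " MiB");
        System.out.println("at startup:  " + atStartup + " bytes, " + toMiB(atStartup) + " MiB");
    }
}
```

So one request to /requests/allocating is 6 MiB of short-lived byte[] payload, and the startup warmup is 4 MiB.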
Add one ugly startup path
Startup deserves one bad example too. Add src/main/java/com/requestwatch/StartupWarmup.java:
package com.requestwatch;
import java.util.ArrayList;
import java.util.List;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Observes;
import jakarta.inject.Inject;
import org.jboss.logging.Logger;
import io.quarkus.runtime.StartupEvent;
@ApplicationScoped
class StartupWarmup {
private static final Logger LOG = Logger.getLogger(StartupWarmup.class);
private final RequestWatchConfig config;
@Inject
StartupWarmup(RequestWatchConfig config) {
this.config = config;
}
void onStart(@Observes StartupEvent ignored) {
if (!config.startup().enabled()) {
return;
}
List<byte[]> buffers = new ArrayList<>(config.startup().buffers());
long allocatedBytes = 0;
for (int i = 0; i < config.startup().buffers(); i++) {
byte[] buffer = new byte[config.startup().bufferSizeBytes()];
buffer[0] = (byte) i;
buffers.add(buffer);
allocatedBytes += buffer.length;
}
sleep(config.startup().delayMillis());
LOG.infof("Startup warmup allocated %,d bytes across %d buffers", allocatedBytes, buffers.size());
}
private static void sleep(long delayMillis) {
try {
Thread.sleep(delayMillis);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new IllegalStateException("Startup warmup interrupted", e);
}
}
}
I like having one startup smell in the same sample because people often think only about request latency. JFR is also good at showing work that happens before the first request exists.
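The sample relies on the built-in JDK events and the Quarkus events for this. If you wanted the warmup to show up under its own name, the JDK's jdk.jfr.Event API lets you define a custom event. The following is only a sketch under that assumption: the class WarmupEventSketch, the event name requestwatch.StartupWarmup, and the field names are all made up for illustration and are not in the sample project.

```java
import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Hypothetical sketch, not part of the requestwatch-jfr sources.
public class WarmupEventSketch {

    @Name("requestwatch.StartupWarmup") // assumed event name, chosen for this example
    @Label("Startup Warmup")
    @Category("RequestWatch")
    static class StartupWarmupEvent extends Event {
        @Label("Allocated Bytes")
        long allocatedBytes;

        @Label("Buffers")
        int buffers;
    }

    /** Allocates the buffers and wraps the work in a custom JFR event. */
    static long warmup(int bufferCount, int bufferSizeBytes) {
        StartupWarmupEvent event = new StartupWarmupEvent();
        event.begin(); // start of the event's duration
        byte[][] buffers = new byte[bufferCount][];
        long allocatedBytes = 0;
        for (int i = 0; i < bufferCount; i++) {
            buffers[i] = new byte[bufferSizeBytes];
            allocatedBytes += buffers[i].length;
        }
        event.allocatedBytes = allocatedBytes;
        event.buffers = bufferCount;
        event.commit(); // does nothing when no recording has the event enabled
        return allocatedBytes;
    }

    public static void main(String[] args) {
        System.out.println("warmup allocated " + warmup(256, 16384) + " bytes");
    }
}
```

With a recording running, an event like this could then be filtered by its name with jfr print --events, the same way as the JDK events later in this article.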
Expose the endpoints
Finally, expose the paths in src/main/java/com/requestwatch/RequestWatchResource.java:
package com.requestwatch;
import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
@Path("/requests")
@Produces(MediaType.APPLICATION_JSON)
public class RequestWatchResource {
private final PricingService pricingService;
private final AllocationService allocationService;
@Inject
public RequestWatchResource(PricingService pricingService, AllocationService allocationService) {
this.pricingService = pricingService;
this.allocationService = allocationService;
}
@GET
@Path("/fast")
public FastResponse fast() {
return pricingService.fast();
}
@GET
@Path("/blocking")
public BlockingResponse blocking() {
return pricingService.blocking();
}
@GET
@Path("/blocking-fixed")
public BlockingResponse blockingFixed() {
return pricingService.blockingFixed();
}
@GET
@Path("/allocating")
public AllocationResponse allocating() {
return allocationService.allocating();
}
}
The extra /blocking-fixed endpoint is not how I would ship a real feature, obviously. It is there so we can compare the bad and good shapes in the same sample without turning the article into a branch-management exercise.
Configure the sample
Set the sample values in src/main/resources/application.properties:
requestwatch.blocking.delay-millis=150
requestwatch.allocation.buffers=768
requestwatch.allocation.buffer-size-bytes=8192
requestwatch.startup.enabled=true
requestwatch.startup.delay-millis=400
requestwatch.startup.buffers=256
requestwatch.startup.buffer-size-bytes=16384
quarkus.container-image.group=requestwatch
quarkus.container-image.name=requestwatch-jfr
quarkus.container-image.tag=1.0.0
The startup warmup allocates 4,194,304 bytes (256 x 16384, about 4.2 MB) and waits 400 ms. The blocking delay is 150 ms. Those numbers are big enough to show up clearly in a recording without making local feedback painful.
Set requestwatch.startup.enabled=false when you only care about request paths and want faster test boots. Just remember that choice removes the startup story from the recording entirely, which is an easy miss when you are trying to explain slow first boots.
Prove it with tests
Then prove the shape with src/test/java/com/requestwatch/RequestWatchResourceTest.java:
package com.requestwatch;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;
import org.junit.jupiter.api.Test;
import io.quarkus.test.common.http.TestHTTPEndpoint;
import io.quarkus.test.common.http.TestHTTPResource;
import io.quarkus.test.junit.QuarkusTest;
import io.restassured.RestAssured;
import io.restassured.path.json.JsonPath;
@QuarkusTest
@TestHTTPEndpoint(RequestWatchResource.class)
class RequestWatchResourceTest {
@TestHTTPResource
URI baseUri;
@Test
void fastEndpointReturnsCachedQuote() {
RestAssured.get("fast")
.then()
.statusCode(200)
.body("endpoint", org.hamcrest.Matchers.is("fast"))
.body("priceCents", org.hamcrest.Matchers.is(1299));
}
@Test
void allocatingEndpointReportsTemporaryBuffers() {
RestAssured.get("allocating")
.then()
.statusCode(200)
.body("endpoint", org.hamcrest.Matchers.is("allocating"))
.body("bufferCount", org.hamcrest.Matchers.is(768))
.body("bufferSizeBytes", org.hamcrest.Matchers.is(8192))
.body("allocatedBytes", org.hamcrest.Matchers.is(6_291_456));
}
@Test
void fixedEndpointAvoidsSerializedLatency() throws Exception {
List<Long> blockingTimes = sampleElapsedTimes("blocking");
List<Long> fixedTimes = sampleElapsedTimes("blocking-fixed");
long slowestBlocking = blockingTimes.stream().mapToLong(Long::longValue).max().orElseThrow();
long slowestFixed = fixedTimes.stream().mapToLong(Long::longValue).max().orElseThrow();
assertTrue(slowestBlocking >= 500, "blocking endpoint should serialize parallel requests");
assertTrue(slowestFixed < 350, "fixed endpoint should avoid serialized latency");
assertTrue(slowestBlocking - slowestFixed >= 200, "fixed endpoint should be meaningfully faster");
}
private List<Long> sampleElapsedTimes(String path) throws Exception {
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
HttpClient client = HttpClient.newBuilder()
.connectTimeout(Duration.ofSeconds(2))
.executor(executor)
.build();
URI requestUri = URI.create(baseUri.toString() + "/" + path);
List<CompletableFuture<Long>> futures = IntStream.range(0, 4)
.mapToObj(ignored -> send(client, requestUri))
.toList();
return futures.stream().map(CompletableFuture::join).toList();
}
}
private CompletableFuture<Long> send(HttpClient client, URI requestUri) {
HttpRequest request = HttpRequest.newBuilder(requestUri)
.timeout(Duration.ofSeconds(5))
.GET()
.build();
return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
.thenApply(response -> {
assertEquals(200, response.statusCode());
return JsonPath.from(response.body()).getLong("elapsedMs");
});
}
}
Run the suite:
./mvnw test
On this sample, the bad path stays above 500 ms for the slowest request when four calls hit it together, while the fixed path stays below 350 ms. Your exact numbers will move. The shape should stay the same.
Record the app in dev mode
The Quarkus JFR guide shows the key startup flag. We use the same pattern here, but I prefer settings=profile so the recording carries enough allocation and lock detail to be useful.
Start dev mode with JFR enabled from JVM startup:
./mvnw quarkus:dev \
-Djvm.args="-XX:StartFlightRecording=name=requestwatch,settings=profile,dumponexit=true,filename=target/requestwatch-dev.jfr"
Because JFR starts with the JVM, the recording sees both the startup warmup and the later HTTP traffic. That matters. If you attach too late, you miss the bad startup story entirely.
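For contrast, this is roughly what a recording started from code looks like with the JDK's jdk.jfr API. It is a sketch, not something the sample does: the class InProcessRecording, the recording name, and the temp file are placeholders. It uses the same built-in "profile" settings, but it can only see what happens after start() is called, so it cannot capture JVM startup the way the flag does.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

// Hypothetical sketch, not part of the requestwatch-jfr sources.
public class InProcessRecording {

    /** Records while the given work runs, then writes the recording to the target file. */
    static Path recordAround(Runnable work, Path target) throws Exception {
        // "profile" is one of the settings files that ship with the JDK.
        Configuration profile = Configuration.getConfiguration("profile");
        try (Recording recording = new Recording(profile)) {
            recording.setName("requestwatch-inprocess"); // assumed name for this example
            recording.start();
            work.run();
            recording.stop();
            recording.dump(target);
        }
        return target;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("requestwatch-inprocess", ".jfr");
        recordAround(() -> {
            // Stand-in workload: a handful of short-lived buffers.
            byte[][] buffers = new byte[64][];
            for (int i = 0; i < buffers.length; i++) {
                buffers[i] = new byte[8192];
            }
        }, file);
        System.out.println("wrote " + file + " (" + Files.size(file) + " bytes)");
    }
}
```

For this article the startup flag stays the better choice, because the startup warmup is part of the story.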
In another terminal, trigger the paths:
curl -s http://localhost:8080/requests/fast
for i in 1 2 3 4; do
curl -s http://localhost:8080/requests/blocking &
done
wait
curl -s http://localhost:8080/requests/allocating
for i in 1 2 3 4; do
curl -s http://localhost:8080/requests/blocking-fixed &
done
wait
Then stop the app with q or Ctrl+C. The recording lands in target/requestwatch-dev.jfr.
Read the recording without guessing
I start with the jfr CLI. It is faster than opening a UI when you only want to confirm that the recording captured something useful.
Get the high-level picture:
jfr summary target/requestwatch-dev.jfr
See the Quarkus-specific events the extension added:
jfr print --categories quarkus target/requestwatch-dev.jfr
Look directly at the waiting side of the blocking path:
jfr print \
--events jdk.ThreadSleep,jdk.JavaMonitorEnter \
target/requestwatch-dev.jfr
Look at the allocation side:
jfr print \
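--events jdk.ObjectAllocationSample,jdk.ObjectAllocationInNewTLAB,jdk.ObjectAllocationOutsideTLAB \
target/requestwatch-dev.jfr
The filter includes jdk.ObjectAllocationSample because on recent JDKs the sampled allocation event is usually the one that carries data, while the two TLAB events may be switched off in the shipped settings.
That gives you the text view. After that, open the same file in JDK Mission Control for the visual pass.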
The Quarkus JFR guide points out one easy miss: the default thread view does not show Quarkus lanes. In JMC, open the Threads view, edit the thread activity lanes, and add a Quarkus lane. After that, request and startup activity stop blending into generic JVM noise.
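If you would rather total up the waiting than read it event by event, the JDK also ships a small parsing API in jdk.jfr.consumer. Here is a hedged sketch that sums the duration of jdk.ThreadSleep and jdk.JavaMonitorEnter events, the same two events used in the jfr print filter above. The class WaitingSummary is a placeholder of mine. When no file is passed, it records one short sleep into a temp file so it can run on its own.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical sketch, not part of the requestwatch-jfr sources.
public class WaitingSummary {

    /** Sums event durations per event type for the two waiting-related events. */
    static Map<String, Duration> waitingByEventType(Path recordingFile) throws Exception {
        Map<String, Duration> totals = new TreeMap<>();
        for (RecordedEvent event : RecordingFile.readAllEvents(recordingFile)) {
            String name = event.getEventType().getName();
            if (name.equals("jdk.ThreadSleep") || name.equals("jdk.JavaMonitorEnter")) {
                totals.merge(name, event.getDuration(), Duration::plus);
            }
        }
        return totals;
    }

    /** Builds a tiny throwaway recording with one sleep so the sketch is self-contained. */
    static Path sampleRecording() throws Exception {
        Path file = Files.createTempFile("waiting-sample", ".jfr");
        try (Recording recording = new Recording()) {
            recording.enable("jdk.ThreadSleep").withoutThreshold();
            recording.start();
            Thread.sleep(50);
            recording.stop();
            recording.dump(file);
        }
        return file;
    }

    public static void main(String[] args) throws Exception {
        // Pass a recording path such as target/requestwatch-dev.jfr, or run without arguments.
        Path file = args.length > 0 ? Path.of(args[0]) : sampleRecording();
        waitingByEventType(file).forEach((name, total) ->
                System.out.println(name + " total " + total.toMillis() + " ms"));
    }
}
```

Run against the dev-mode recording, I would expect the sleep and monitor totals to be dominated by the four /requests/blocking calls and the startup warmup, but treat that as something to check rather than a promise.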
Fast endpoint
This is the control case. It should be dull. Short request activity, no obvious allocation spike, no queue of waiting threads. Boring is the point. You need one path that does not smell.
Blocking endpoint
This path answers the opening question.
You should see several requests overlap in time, but one worker thread keeps the lock while it sleeps. The others stack up behind it. CPU is not the star here because the time is mostly waiting time. In the thread view and event list, the interesting words are things like sleep, monitor enter, and longer request duration, not hot arithmetic.
That is the payoff JFR gives you that a latency metric alone does not: the request is not slow because it computes too much. It is slow because it waits while holding the wrong lock.
Allocation-heavy endpoint
This one should jump out in the allocation views. byte[] allocations dominate, and you may see GC activity tighten around the request. The endpoint code is still simple, which is useful. You do not need a large system to learn the difference between CPU work and allocation pressure.
Startup
At the beginning of the recording, before traffic matters, StartupWarmup.onStart shows up with both allocation and sleep. This is the part teams often miss because the app “eventually starts fine.” JFR makes startup cost concrete instead of anecdotal.
If you only remember one reading rule from this article, make it this one:
When latency is bad and CPU is calm, start by looking for waiting and contention before you start arguing about optimization.
Fix one issue and compare before and after
The bad method keeps the slow refresh inside the synchronized block:
BlockingResponse blocking() {
long start = System.nanoTime();
SupplierQuote quote;
synchronized (refreshLock) {
quote = refreshQuote();
latestQuote = quote;
}
return new BlockingResponse(
"blocking",
Thread.currentThread().getName(),
quote.version(),
quote.priceCents(),
config.blocking().delayMillis(),
elapsedMillis(start));
}
The fixed method does the wait first and locks only for the state update:
BlockingResponse blockingFixed() {
long start = System.nanoTime();
SupplierQuote quote = refreshQuote();
synchronized (refreshLock) {
latestQuote = quote;
}
return new BlockingResponse(
"blocking-fixed",
Thread.currentThread().getName(),
quote.version(),
quote.priceCents(),
config.blocking().delayMillis(),
elapsedMillis(start));
}
That is not a big fix, but it is enough to make a real difference. Many real contention problems look like this.
Run the same parallel traffic against both endpoints and compare the returned elapsedMs values:
for i in 1 2 3 4; do
curl -s http://localhost:8080/requests/blocking &
done
wait
for i in 1 2 3 4; do
curl -s http://localhost:8080/requests/blocking-fixed &
done
wait
On this sample, the bad path crosses 500 ms for the slowest concurrent request. The fixed path stays below 350 ms. In JFR, the stacked waiting shape shrinks with it.
That is the before-and-after loop I want from a tutorial like this. We did not just say “JFR is useful.” We used it to point at one mistake, changed the code, and confirmed the shape changed.
Run the same workflow in a Podman container
The Quarkus container image guide says to build with quarkus.container-image.build=true. With quarkus-container-image-podman on the classpath, that build path uses Podman.
Build the image:
./mvnw install -Dquarkus.container-image.build=true
Now run it and write the recording to a host-mounted directory:
mkdir -p recordings
podman run --rm -p 8080:8080 \
-v "$(pwd)/recordings:/recordings" \
-e JAVA_OPTS_APPEND="-XX:StartFlightRecording=name=requestwatch,settings=profile,dumponexit=true,filename=/recordings/requestwatch-container.jfr" \
requestwatch/requestwatch-jfr:1.0.0
That JAVA_OPTS_APPEND trick works because the generated JVM Dockerfile uses Red Hat’s run-java.sh launcher. It appends your extra JVM flags instead of making you replace the whole startup command.
Drive the same endpoints, stop the container, and open recordings/requestwatch-container.jfr in JMC or with the jfr CLI. The workflow stays the same. That is the nice part.
Close the loop
I do not think the lesson here is “install one more observability tool.” The useful part is seeing that performance problems have different shapes, and logs plus coarse metrics often flatten those shapes into one vague latency number. In one Quarkus recording we lined up a fast control path, lock-bound waiting, allocation pressure, and startup work that should not hide from you. That is enough to make JFR practical. You do not need to turn into a JVM specialist. You need to know what you are looking for when a request is slow and CPU still looks calm.
The full source for requestwatch-jfr is on my GitHub.


