Bulletproof APIs with Quarkus: Caching, Rate Limiting, and Fault Tolerance Made Simple

Learn how to deliver APIs that stay fast, fair, and resilient using ETags, Bucket4j, and MicroProfile Fault Tolerance in Quarkus.

Sep 26, 2025

When you run an API in production, it doesn’t matter how elegant your code is if the service feels slow, collapses under bursts of traffic, or crumbles when a dependency stumbles. Users expect instant responses, operators expect predictable load, and architects expect systems that bend without breaking. That means you need more than just endpoints. You need cacheability, fairness, and resilience baked into the design.

In this tutorial, we’ll take a practical path with Quarkus: adding caching headers and strong ETags so clients don’t fetch what hasn’t changed, enforcing fair-use policies with Bucket4j to keep traffic under control, and applying timeouts, retries, and circuit breakers with MicroProfile Fault Tolerance so downstream failures don’t take you down.

Prerequisites

Java 21
Maven 3.9+
Quarkus CLI
cURL or HTTPie for verification

Bootstrap the project

quarkus create app org.acme:api-perf-resilience:1.0.0 \
  -x rest-jackson,smallrye-fault-tolerance,io.quarkiverse.bucket4j:quarkus-bucket4j
cd api-perf-resilience

rest-jackson is Quarkus REST with Jackson for JSON serialization.
smallrye-fault-tolerance implements MicroProfile Fault Tolerance.
quarkus-bucket4j adds a dead-simple @RateLimited annotation with flexible configuration.

If you don’t want to start from scratch, grab the project from my GitHub repository.

Configuration

We define a shared “api” bucket with two limits: fast burst protection and a longer fair-use window. The built-in IpResolver makes it per-client-IP; swap in your own resolver for API keys or tenants. Add the following to application.properties:

# Bucket4j: two layered limits for bucket "api"
quarkus.rate-limiter.buckets.api.shared=true
quarkus.rate-limiter.buckets.api.limits[0].permitted-uses=10
quarkus.rate-limiter.buckets.api.limits[0].period=1S
quarkus.rate-limiter.buckets.api.limits[1].permitted-uses=100
quarkus.rate-limiter.buckets.api.limits[1].period=5M

# Optional: keep buckets around briefly after refill
quarkus.rate-limiter.keep-after-refill=15M
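
To make the effect of two layered limits concrete, here is a minimal plain-Java sketch of the idea. It is a simplified model, not the Bucket4j implementation: the class name LayeredLimitSketch is made up for illustration, time is passed in explicitly so the behavior is deterministic, and continuous refill is assumed.

```java
import java.util.concurrent.TimeUnit;

/** Simplified model of one bucket guarded by two limits (not the Bucket4j code). */
final class LayeredLimitSketch {

    /** One limit: `capacity` tokens, refilled continuously over `periodNanos`. */
    static final class Limit {
        final long capacity;
        final long periodNanos;
        double tokens;
        long lastRefill;

        Limit(long capacity, long periodNanos, long now) {
            this.capacity = capacity;
            this.periodNanos = periodNanos;
            this.tokens = capacity;
            this.lastRefill = now;
        }

        void refill(long now) {
            double gained = (double) (now - lastRefill) * capacity / periodNanos;
            tokens = Math.min(capacity, tokens + gained);
            lastRefill = now;
        }
    }

    private final Limit burst;
    private final Limit sustained;

    LayeredLimitSketch(long now) {
        burst = new Limit(10, TimeUnit.SECONDS.toNanos(1), now);      // mirrors limits[0]
        sustained = new Limit(100, TimeUnit.MINUTES.toNanos(5), now); // mirrors limits[1]
    }

    /** A request passes only if BOTH limits still have a token left. */
    boolean tryConsume(long now) {
        burst.refill(now);
        sustained.refill(now);
        if (burst.tokens < 1 || sustained.tokens < 1) {
            return false;
        }
        burst.tokens -= 1;
        sustained.tokens -= 1;
        return true;
    }
}
```

Fifteen calls at the same instant let ten through and reject five. A client that keeps sending ten requests every second eventually hits the sustained limit, even though each individual second stays within the burst limit.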

Core implementation

We’ll build three pieces:

A ProductResource that returns resources with strong ETags and proper cache control, and supports conditional GETs.
Rate limiting on selected endpoints.
A PriceService that simulates a flaky downstream call, protected with fault-tolerance annotations.

Domain model

package org.acme.domain;

import java.time.Instant;

public record Product(String id, String name, int stock, Instant lastUpdated) {}

Simple in-memory repository

package org.acme.repo;

import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.acme.domain.Product;

import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class ProductRepo {
    private final Map<String, Product> data = new ConcurrentHashMap<>();

    public ProductRepo() {
        var now = Instant.now();
        data.put("1", new Product("1", "Coffee Beans", 42, now));
        data.put("2", new Product("2", "Espresso Machine", 5, now));
    }

    public Product find(String id) {
        return data.get(id);
    }

    public Product updateStock(String id, int stock) {
        var existing = data.get(id);
        if (existing == null)
            return null;
        var updated = new Product(id, existing.name(), stock, Instant.now());
        data.put(id, updated);
        return updated;
    }
}

ETag + Cache-Control + Conditional GET

ETag stands for entity tag. It is an HTTP response header originally defined in RFC 7232, which has since been folded into RFC 9110. An ETag is an identifier for a specific version of a resource. Think of it as a fingerprint of the resource at a given point in time.

Where do ETags come from?
Servers generate ETags whenever they send back a representation of a resource. The value can be:

A hash of the content (e.g. SHA-256 of the JSON payload).
A version or timestamp from the database (lastUpdated).
A simple incrementing version number.

In Quarkus and Jakarta REST, you can create ETags explicitly with EntityTag. There is no automatic generation, because only you know which fields define when a resource has “changed”.

How are they used?
ETags work together with conditional request headers:

The server includes an ETag header in the response:

ETag: "abc123"

The client caches the response and, on the next request, sends it back in an If-None-Match header:

If-None-Match: "abc123"

The server compares the provided ETag with the current one.
- If they match, the resource has not changed → server returns 304 Not Modified, no body.
- If they differ, the server returns 200 OK with the new resource and a new ETag.
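
The comparison step can be sketched in a few lines of plain Java. This is an illustration of the HTTP rule, not what Quarkus runs internally: IfNoneMatchSketch and its methods are hypothetical names, and the header parsing is deliberately naive. If-None-Match uses weak comparison, so a W/ prefix is ignored when matching.

```java
import java.util.Arrays;

/** Illustrative If-None-Match evaluation for GET requests. */
final class IfNoneMatchSketch {

    /** Strips the weak prefix so W/"abc" and "abc" compare equal (weak comparison). */
    static String opaque(String tag) {
        String t = tag.trim();
        return t.startsWith("W/") ? t.substring(2) : t;
    }

    /** Returns true when the server should answer 304 Not Modified. */
    static boolean notModified(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null || ifNoneMatch.isBlank()) {
            return false; // unconditional request: send the full response
        }
        if (ifNoneMatch.trim().equals("*")) {
            return true;  // "*" matches any existing representation
        }
        String current = opaque(currentEtag);
        // Naive split: assumes the tags themselves contain no commas.
        return Arrays.stream(ifNoneMatch.split(","))
                .map(IfNoneMatchSketch::opaque)
                .anyMatch(current::equals);
    }
}
```

A client may send several tags at once, for example If-None-Match: "old", "abc123". One match is enough for a 304.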

Why does this matter?

Performance: Clients don’t download unchanged data. Saves bandwidth and reduces server load.
Consistency: ETags allow clients to detect if their cached version is stale.
Concurrency control: For updates, clients can send If-Match with the ETag to prevent overwriting a resource that changed in the meantime.
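
The concurrency-control case is not part of the example project, so here is a small hedged sketch of the decision a server makes for an update guarded by If-Match. The class and method names are invented for illustration. Unlike If-None-Match, If-Match uses strong comparison: weak tags never match.

```java
/** Illustrative If-Match check for updates (optimistic concurrency). */
final class IfMatchSketch {

    static final int OK = 200;
    static final int PRECONDITION_FAILED = 412;

    /** Returns the status an update request should get for the given If-Match header. */
    static int checkUpdate(String ifMatch, String currentEtag) {
        if (ifMatch == null || ifMatch.isBlank()) {
            // Unconditional write. A stricter API could reject this instead,
            // for example with 428 Precondition Required.
            return OK;
        }
        if (ifMatch.trim().equals("*")) {
            return OK; // any current representation is acceptable
        }
        if (currentEtag.startsWith("W/")) {
            return PRECONDITION_FAILED; // strong comparison never matches a weak tag
        }
        for (String candidate : ifMatch.split(",")) {
            String c = candidate.trim();
            if (!c.startsWith("W/") && c.equals(currentEtag)) {
                return OK; // client edited the version the server still has
            }
        }
        return PRECONDITION_FAILED; // resource changed since the client read it
    }
}
```

In a Jakarta REST resource you would not hand-roll this: Request.evaluatePreconditions(etag) also evaluates If-Match and returns a builder for the 412 response when the precondition fails.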

Strong vs. Weak ETags

Strong ETags change on any modification of the resource. They’re reliable for concurrency control.
Weak ETags (prefixed with W/) change only on semantically significant modifications. They’re cheaper to compute but less precise.

In our example, we generate a strong ETag using SHA-256 over the product’s id, name, stock, and lastUpdated timestamp. Any change to one of these fields results in a new tag.

package org.acme.api;

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.time.Duration;
import java.util.HexFormat;

import org.acme.domain.Product;
import org.acme.repo.ProductRepo;

import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.NotFoundException;
import jakarta.ws.rs.PUT;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.CacheControl;
import jakarta.ws.rs.core.Context;
import jakarta.ws.rs.core.EntityTag;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Request;
import jakarta.ws.rs.core.Response;

@Path("/products")
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
public class ProductResource {

    @Inject
    ProductRepo repo;

    @Context
    Request request;

    @GET
    @Path("{id}")
    public Response get(@PathParam("id") String id) {
        Product p = repo.find(id);
        if (p == null)
            throw new NotFoundException("No product " + id);

        EntityTag etag = new EntityTag(strongEtag(p));
        // Handle conditional GET: If-None-Match -> 304
        Response.ResponseBuilder precond = request.evaluatePreconditions(etag);
        if (precond != null) {
            return precond
                    .cacheControl(cacheOneMinute())
                    .build(); // 304
        }

        return Response.ok(p)
                .tag(etag)
                .cacheControl(cacheOneMinute())
                .build();
    }

    @PUT
    @Path("{id}/stock/{qty}")
    public Response updateStock(@PathParam("id") String id, @PathParam("qty") int qty) {
        Product updated = repo.updateStock(id, qty);
        if (updated == null)
            throw new NotFoundException();
        EntityTag etag = new EntityTag(strongEtag(updated));
        return Response.ok(updated)
                .tag(etag)
                .cacheControl(noStore()) // writes are not cacheable
                .build();
    }

    private static CacheControl cacheOneMinute() {
        CacheControl cc = new CacheControl();
        cc.setMaxAge((int) Duration.ofMinutes(1).getSeconds());
        cc.setPrivate(false);
        return cc;
    }

    private static CacheControl noStore() {
        CacheControl cc = new CacheControl();
        cc.setNoStore(true);
        return cc;
    }

    // Strong ETag derived from all fields of the immutable record, so every update yields a new tag
    private static String strongEtag(Product p) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            String payload = p.id() + "|" + p.name() + "|" + p.stock() + "|" + p.lastUpdated().toEpochMilli();
            byte[] digest = md.digest(payload.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

Generate a strong ETag that changes whenever the resource changes.
Send Cache-Control: no-transform, max-age=60 to allow short-term caching. JAX-RS emits no-transform by default, and setPrivate(false) does not add a public directive.
On GET, evaluate If-None-Match using JAX-RS Request.evaluatePreconditions. Return 304 Not Modified if the ETag matches.

Rate limiting with Bucket4j

Add @RateLimited(bucket = "api", identityResolver = io.quarkiverse.bucket4j.runtime.resolver.IpResolver.class) to any CDI bean method (resource method or service). The configured bucket rules apply.

We’ll protect a separate GET /limited-products/{id} endpoint:

package org.acme.api;

import io.quarkiverse.bucket4j.runtime.RateLimited;
import io.quarkiverse.bucket4j.runtime.resolver.IpResolver;
import jakarta.ws.rs.*;
import jakarta.ws.rs.core.*;
import jakarta.inject.Inject;
import org.acme.domain.Product;
import org.acme.repo.ProductRepo;

@Path("/limited-products")
@Produces(MediaType.APPLICATION_JSON)
public class RateLimitedProductResource {

    @Inject ProductRepo repo;

    @Context Request request;

    @GET
    @Path("{id}")
    @RateLimited(bucket = "api", identityResolver = IpResolver.class)
    public Response getLimited(@PathParam("id") String id) {
        Product p = repo.find(id);
        if (p == null) throw new NotFoundException();
        EntityTag etag = new EntityTag(Integer.toHexString(p.hashCode()));
        Response.ResponseBuilder precond = request.evaluatePreconditions(etag);
        if (precond != null) return precond.build();
        return Response.ok(p).tag(etag).build();
    }
}

When a bucket is exhausted, the extension throws a RateLimitException. Map it cleanly to HTTP 429 Too Many Requests with a helpful Retry-After header:

package org.acme.api;

import io.quarkiverse.bucket4j.runtime.RateLimitException;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.ext.ExceptionMapper;
import jakarta.ws.rs.ext.Provider;

@Provider
public class RateLimitExceptionMapper implements ExceptionMapper<RateLimitException> {
    @Override
    public Response toResponse(RateLimitException ex) {
        // A conservative 1-second retry hint; adapt with ex info if you expose it
        return Response.status(429)
                .header("Retry-After", "1")
                .entity("{\"error\":\"Too Many Requests\"}")
                .build();
    }
}

Buckets and @RateLimited come from the Quarkiverse extension. The config model and identity resolvers are documented there.
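
If you want a more accurate Retry-After than the hard-coded "1", the remaining piece is turning a wait time into whole seconds. The helper below is a small sketch of that conversion only. RetryAfterSketch is a hypothetical name, and how you obtain the wait duration from the exception is left open here, so check the extension documentation for what it exposes.

```java
import java.time.Duration;

/** Converts a wait time into a Retry-After header value in whole seconds. */
final class RetryAfterSketch {

    /** Rounds up so clients never retry too early; never returns less than 1. */
    static String retryAfterSeconds(Duration wait) {
        long millis = Math.max(0, wait.toMillis());
        long seconds = (millis + 999) / 1000; // ceiling division
        return Long.toString(Math.max(1, seconds));
    }
}
```

Rounding up matters: telling a client to retry after 1 second when the bucket needs 1.2 seconds just produces another 429.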

Fault tolerance with MicroProfile Fault Tolerance

We’ll simulate a flaky downstream “price” call and protect it with MicroProfile FT annotations.

package org.acme.service;

import java.util.Random;

import org.eclipse.microprofile.faulttolerance.CircuitBreaker;
import org.eclipse.microprofile.faulttolerance.Fallback;
import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.faulttolerance.Timeout;
import org.jboss.logging.Logger;

import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class PriceService {

    private static final Logger LOG = Logger.getLogger(PriceService.class);
    private final Random rnd = new Random();

    // Simulate unstable downstream dependency
    private String callRemote(String productId) throws InterruptedException {
        LOG.infof("Calling remote service for product %s", productId);
        // random latency 100..800 ms
        long delay = 100 + rnd.nextInt(700);
        Thread.sleep(delay);

        // 30% chance of failure
        if (rnd.nextDouble() < 0.3) {
            LOG.warnf("Remote service failed for product %s", productId);
            throw new RuntimeException("Downstream error");
        }
        String price = switch (productId) {
            case "1" -> "9.99";
            case "2" -> "399.00";
            default -> "0.00";
        };
        LOG.infof("Remote service returned price %s for product %s", price, productId);
        return price;
    }

    @Timeout(500) // ms: bound the latency
    @Retry(maxRetries = 2, delay = 200) // quick retries for transient failures
    @CircuitBreaker(requestVolumeThreshold = 6, // rolling window size
            failureRatio = 0.5, // open if at least 50% of the window fails
            delay = 2000 // ms the circuit stays open
    )
    @Fallback(fallbackMethod = "fallbackPrice")
    public String price(String productId) throws InterruptedException {
        LOG.infof("Attempting to get price for product %s", productId);
        try {
            String result = callRemote(productId);
            LOG.infof("Successfully got price %s for product %s", result, productId);
            return result;
        } catch (Exception e) {
            LOG.warnf("Failed to get price for product %s: %s", productId, e.getMessage());
            throw e;
        }
    }

    public String fallbackPrice(String productId) {
        LOG.warnf("Using fallback price for product %s (retries exhausted or circuit open)", productId);
        // conservative default; in real apps return last-known-good or tiered default
        return "0.00";
    }
}

@Timeout(500) - Fails any single attempt that takes longer than 500 ms

@Retry(maxRetries = 2, delay = 200) - Retries a failed call up to 2 more times, waiting roughly 200 ms between attempts

@CircuitBreaker(requestVolumeThreshold = 6, failureRatio = 0.5, delay = 2000) - Opens the circuit when at least 50% of the last 6 calls fail, and keeps it open for 2 seconds

@Fallback(fallbackMethod = "fallbackPrice") - Calls fallbackPrice when the other strategies give up, either because retries are exhausted or because the circuit is open
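
To build intuition for the circuit breaker numbers, here is a simplified plain-Java model of the rolling window. It is a sketch under stated assumptions, not the SmallRye implementation: BreakerWindowSketch is a made-up class, the half-open state and the open delay are left out, and the ratio is only evaluated once the window is full.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified rolling-window model of when a circuit breaker opens. */
final class BreakerWindowSketch {

    private final int requestVolumeThreshold;
    private final double failureRatio;
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private boolean open;

    BreakerWindowSketch(int requestVolumeThreshold, double failureRatio) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.failureRatio = failureRatio;
    }

    /** Records one call outcome and returns whether the breaker is open afterwards. */
    boolean record(boolean failed) {
        if (open) {
            return true; // this sketch never closes again; the real breaker half-opens after the delay
        }
        window.addLast(failed);
        if (window.size() > requestVolumeThreshold) {
            window.removeFirst(); // keep only the most recent calls
        }
        if (window.size() == requestVolumeThreshold) {
            long failures = window.stream().filter(f -> f).count();
            open = (double) failures / requestVolumeThreshold >= failureRatio;
        }
        return open;
    }

    boolean isOpen() {
        return open;
    }
}
```

With a window of 6 and a ratio of 0.5, three failures followed by three successes open the circuit on the sixth call, while two failures among six calls leave it closed.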

Expose it via a resource:

package org.acme.api;

import org.acme.service.PriceService;

import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/prices")
@Produces(MediaType.TEXT_PLAIN)
public class PriceResource {

    @Inject
    PriceService svc;

    @GET
    @Path("{id}")
    public String get(@PathParam("id") String id) throws InterruptedException {
        return svc.price(id);
    }
}

Run and verify

Start dev mode:

quarkus dev

First GET returns a 200 with an ETag

curl -i http://localhost:8080/products/1

Result:

HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
content-length: 87
Cache-Control: no-transform, max-age=60
ETag: "9cad3f4b0150475054a1533682d93819e440e6ab9466bbc7c421a21e65e09829"

{"id":"1","name":"Coffee Beans","stock":42,"lastUpdated":"2025-09-11T04:35:14.772244Z"}

The server returns the product JSON and attaches a Cache-Control header plus an ETag. The ETag uniquely identifies this version of the resource.

Use the ETag for a conditional GET

Extract the ETag from the headers:

ETAG=$(curl -sI http://localhost:8080/products/1 \
  | awk -F': ' 'tolower($1)=="etag"{print $2}' \
  | tr -d '\r')

Send it back in an If-None-Match header:

curl -i -H "If-None-Match: $ETAG" http://localhost:8080/products/1

Result:

HTTP/1.1 304 Not Modified
Cache-Control: no-transform, max-age=60
ETag: "9cad3f4b0150475054a1533682d93819e440e6ab9466bbc7c421a21e65e09829"

Because the ETag matches the server’s current version, the server responds with 304 Not Modified and no body. The client can safely reuse its cached copy.

Update the resource to force a new ETag

curl -i -X PUT http://localhost:8080/products/1/stock/41

Result:

HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
content-length: 87
Cache-Control: no-transform, no-store
ETag: "4c4eaed028ea11104fec730fecd193313f3554f1a62b2ebfbf79ef42dddf8927"

{"id":"1","name":"Coffee Beans","stock":41,"lastUpdated":"2025-09-11T04:37:30.775020Z"}

The stock count changed, and the server calculated a new ETag. Notice that the response now sets Cache-Control: no-store, because writes are not cacheable.

Try the old ETag again

curl -i -H "If-None-Match: $ETAG" http://localhost:8080/products/1

Result:

HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
content-length: 87
Cache-Control: no-transform, max-age=60
ETag: "4c4eaed028ea11104fec730fecd193313f3554f1a62b2ebfbf79ef42dddf8927"

{"id":"1","name":"Coffee Beans","stock":41,"lastUpdated":"2025-09-11T04:37:30.775020Z"}

Because the client sent an outdated ETag, the server responds with 200 OK and the new resource, including the updated ETag. The cache can now replace the stale copy.

This flow illustrates conditional requests:

Clients avoid downloading unchanged payloads.
Servers save CPU and network bandwidth.
Applications get a consistent mechanism for cache validation and concurrency control.

The behavior is not Quarkus-specific. It’s defined by standard HTTP semantics (ETag, If-None-Match, 304 Not Modified) and works across all compliant clients and intermediaries.

Rate limiting

Hit the rate-limited endpoint quickly:

# Burst limit: 10 calls per second per IP. The first calls return 200; repeat the loop if the bucket isn't exhausted yet.

for i in $(seq 1 15); do
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/limited-products/1
done

# Expect 429 once the bucket is exhausted; tokens refill as each time window elapses.

The limits and IpResolver behavior are defined by the Bucket4j config.

Fault tolerance

Call the price endpoint multiple times to observe timeouts, retries, and the circuit breaker opening:

# Run a quick loop and watch the logs for retries, timeouts, and fallback

for i in $(seq 1 20); do
  printf "%02d: " $i
  curl -s http://localhost:8080/prices/2; echo
done

See the Quarkus guide for how each annotation contributes to resilience and how chaining works.

Production notes

Caching
- Strong ETags plus short max-age reduce bandwidth while keeping data fresh. For public APIs, document cache behavior and set Vary where applicable.
- If you render from templates, hash a stable representation (id + version + lastUpdated), not the raw JSON, to avoid accidental ETag churn. MDN’s header docs are a good reference; RFC 9110 is the normative source.
Rate limiting
- Use per-tenant or per-API key resolvers in multi-tenant systems. The Bucket4j extension supports custom identity resolvers and multiple limits per bucket (burst + sustained).
- Expose X-RateLimit-* headers from an exception mapper if you want client-friendly quotas. Persist bucket state in a distributed cache for multi-instance deployments.
Fault tolerance
- Timeouts are mandatory on any remote call. Retries should be bounded and use jitter. Circuit breakers prevent cascading failure. The MP FT guide and spec describe interactions and tuning.
- Log and observe retry counts and breaker states. Forward metrics to your platform.
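
Bounded retries with jitter also give you a latency budget you can reason about. The sketch below is plain Java with hypothetical names: it models a delay that varies uniformly by plus or minus the jitter and computes the worst-case time a caller waits before the fallback kicks in. As far as I know, @Retry applies a default jitter of 200 ms unless you override it, so treat that number as an assumption and check the spec.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Back-of-the-envelope latency budget for timeout plus retries. */
final class RetryBudgetSketch {

    /** Uniform delay in [delay - jitter, delay + jitter], clamped at zero. */
    static long jitteredDelayMillis(long delay, long jitter) {
        long offset = jitter == 0
                ? 0
                : ThreadLocalRandom.current().nextLong(-jitter, jitter + 1);
        return Math.max(0, delay + offset);
    }

    /** Worst case: every attempt times out and every pause takes the longest delay. */
    static long worstCaseMillis(long timeout, int maxRetries, long delay, long jitter) {
        long attempts = maxRetries + 1L; // first call plus retries
        return attempts * timeout + maxRetries * (delay + jitter);
    }
}
```

For the PriceService settings (500 ms timeout, 2 retries, 200 ms delay), worstCaseMillis(500, 2, 200, 0) is 1900 ms without jitter and worstCaseMillis(500, 2, 200, 200) is 2300 ms with it. If callers cannot wait that long, lower the timeout or the retry count.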

What-ifs and variations

Server-side response caching: Pair HTTP caching with quarkus-cache for expensive computations. It’s complementary to client caching.
Tenant-aware rate limits: Implement a custom IdentityResolver that extracts tenant from JWT claims or API keys; combine with shared buckets for global caps.
Async I/O: For highly concurrent services, keep endpoints non-blocking and move blocking I/O to worker threads. Quarkus REST (formerly RESTEasy Reactive) handles this gracefully.
Spec-only portability: The MP FT annotations are portable across runtimes that implement MicroProfile.

When you combine caching, rate limits, and fault-tolerant design, you move beyond raw functionality and deliver APIs that respect both your infrastructure and your users. Quarkus gives you the tools to make those choices explicit instead of accidental, so your services stay quick when everything works, protect themselves under pressure, and recover cleanly when dependencies fail. That’s the standard modern teams should aim for.
