The Quarkus Nightclub: A Hands-On Guide to Load Shedding Under Heavy Traffic
Learn how to protect your Java microservices from overload with the Quarkus Load Shedding extension - the smart bouncer that keeps your APIs stable when the crowd goes wild.
The Nightclub Problem
Imagine your Quarkus application is a trendy nightclub. On a normal Tuesday evening, everything’s chill. People walk in, get their drinks, and everyone’s happy. But it’s Saturday night, and suddenly there’s a line around the block. What happens if you let everyone in?
Chaos.
The bartenders are overwhelmed, the bathroom lines are insane, and instead of a fun night out, everyone has a miserable experience. Some people might even pass out from waiting too long (timeout errors, anyone?).
This is exactly what happens to your service under heavy load. Load shedding is like hiring a bouncer who politely (or not so politely) turns people away when the club is at capacity, ensuring the people inside have a great experience.
What is Load Shedding?
Load shedding detects when your service is overloaded and strategically rejects requests before they make things worse. It’s better to say “Sorry, we’re full” than to let everyone in and have the whole system collapse.
The Quarkus Load Shedding extension uses an adaptation of TCP Vegas (yes, the algorithm has a cool name) to detect overload and optionally uses priority-based rejection to ensure VIPs get in first.
When Should You Use Load Shedding?
Perfect Scenarios:
Public-facing APIs - When you have unpredictable traffic spikes (Black Friday sales, viral tweets about your product)
Services with expensive operations - Database queries, external API calls, ML inference
Microservices architectures - Prevent cascade failures when one service gets hammered
Services with SLAs - Better to reject 10% of requests fast than timeout 100% of them slowly
Resource-constrained environments - Running on limited CPU/memory? Don’t let it crash!
When NOT to use it:
Critical transaction systems where every request MUST be processed (payment processing)
Very low traffic services - unnecessary overhead
Services with already-perfect auto-scaling - but let’s be honest, does that exist?
Let’s Build a Load Shedding Demo!
Create Your Quarkus Project
quarkus create app com.nightclub:bouncer-demo \
-x=rest,load-shedding,rest-jackson
cd bouncer-demo
That's it! The extension works out of the box with sensible defaults. No configuration needed.
Create Your Nightclub API
Rename GreetingResource.java to src/main/java/com/nightclub/NightclubResource.java and replace its content with:
package com.nightclub;

import java.time.Duration;
import java.util.Random;

import io.smallrye.mutiny.Uni;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/club")
@Produces(MediaType.APPLICATION_JSON)
public class NightclubResource {

    private final Random random = new Random();

    @GET
    @Path("/enter")
    public Uni<ClubResponse> enterClub() {
        // Simulate some work (checking ID, taking coats, etc.) - keep it slow
        int delay = 500 + random.nextInt(200); // 500-700ms
        return Uni.createFrom().item(new ClubResponse("Welcome to the club!"))
                .onItem().delayIt().by(Duration.ofMillis(delay));
    }

    @GET
    @Path("/vip-entry")
    public Uni<ClubResponse> vipEntry() {
        // VIP lane is still slow, but we'll prioritize these requests
        int delay = 150 + random.nextInt(150); // 150-300ms
        return Uni.createFrom().item(new ClubResponse("VIP lane! Skip the line!"))
                .onItem().delayIt().by(Duration.ofMillis(delay));
    }

    @GET
    @Path("/bathroom")
    public Uni<ClubResponse> useBathroom() {
        // This takes forever at a nightclub
        int delay = 300 + random.nextInt(500); // 300-800ms
        return Uni.createFrom().item(new ClubResponse("Finally..."))
                .onItem().delayIt().by(Duration.ofMillis(delay));
    }

    public record ClubResponse(String status) {
    }
}
Configure Load Shedding
Modify src/main/resources/application.properties:
# Enable load shedding (default: true)
quarkus.load-shedding.enabled=true
# Maximum concurrent requests (default: 1000) - Set very low to test
quarkus.load-shedding.max-limit=5
# Initial limit (default: 100) - Set very low to test
quarkus.load-shedding.initial-limit=2
# Alpha factor for increasing limit (default: 3)
quarkus.load-shedding.alpha-factor=1
# Beta factor for decreasing limit (default: 6)
quarkus.load-shedding.beta-factor=2
# Probe factor - how often to reset the baseline (default: 30.0)
quarkus.load-shedding.probe-factor=5.0
# Disable priority-based rejection to force rejection of all requests when overloaded
quarkus.load-shedding.priority.enabled=false
Add Custom Priority (The VIP List!)
Create src/main/java/com/nightclub/VIPPrioritizer.java:
package com.nightclub;

import io.quarkus.load.shedding.RequestPrioritizer;
import io.quarkus.load.shedding.RequestPriority;
import io.vertx.core.http.HttpServerRequest;
import jakarta.enterprise.context.ApplicationScoped;
import org.jboss.logging.Logger;

@ApplicationScoped
public class VIPPrioritizer implements RequestPrioritizer<HttpServerRequest> {

    private static final Logger LOG = Logger.getLogger(VIPPrioritizer.class);

    @Override
    public boolean appliesTo(Object request) {
        return true; // This prioritizer applies to all requests
    }

    @Override
    public RequestPriority priority(HttpServerRequest request) {
        String path = request.path();
        LOG.debugf("VIPPrioritizer called for path: %s", path);
        // VIP endpoints get CRITICAL priority
        if (path.contains("/vip-")) {
            return RequestPriority.CRITICAL;
        }
        // Regular entry is NORMAL
        if (path.contains("/enter")) {
            return RequestPriority.NORMAL;
        }
        // Bathroom can wait (sorry!)
        if (path.contains("/bathroom")) {
            return RequestPriority.BACKGROUND;
        }
        return RequestPriority.NORMAL;
    }
}
Add a Custom Classifier (Cohort Assignment)
Create src/main/java/com/nightclub/RegularCustomerClassifier.java:
package com.nightclub;

import io.quarkus.load.shedding.RequestClassifier;
import io.vertx.core.http.HttpServerRequest;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class RegularCustomerClassifier implements RequestClassifier<HttpServerRequest> {

    @Override
    public boolean appliesTo(Object request) {
        return true; // This classifier applies to all requests
    }

    @Override
    public int cohort(HttpServerRequest request) {
        // Check if they're a "regular" (via header)
        String customerType = request.getHeader("X-Customer-Type");
        if ("regular".equalsIgnoreCase(customerType)) {
            // Regular customers get a better cohort (lower number)
            return 10;
        } else if ("premium".equalsIgnoreCase(customerType)) {
            // Premium customers get even better treatment
            return 5;
        }
        // Everyone else lands in a mid-range cohort
        return 64; // Middle of the road
    }
}
Test It!
Start your application:
quarkus dev
Test Normal Traffic:
# Single request - should work fine
curl http://localhost:8080/club/enter
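With the club empty, this should come back with HTTP 200 after the simulated 500-700ms delay and a JSON body along these lines (the field name comes from the ClubResponse record, serialized by rest-jackson):
{"status":"Welcome to the club!"}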
Simulate a Stampede! 🏃‍♂️🏃‍♀️
Use hey or any load testing tool:
# Install
brew install hey
# Overwhelm the club with 1000 requests, 200 at a time
hey -n 1000 -c 200 http://localhost:8080/club/enter
Watch as the bouncer (load shedding) starts rejecting requests with HTTP 503 Service Unavailable!
Summary:
Total: 3.1961 secs
Slowest: 0.7329 secs
Fastest: 0.0016 secs
Average: 0.0283 secs
Requests/sec: 312.8794
Total data: 330 bytes
Size/request: 0 bytes
Response time histogram:
0.002 [1] |
0.075 [962] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.148 [27] |■
0.221 [0] |
0.294 [0] |
0.367 [0] |
0.440 [0] |
0.514 [1] |
0.587 [1] |
0.660 [6] |
0.733 [2] |
Latency distribution:
10% in 0.0056 secs
25% in 0.0071 secs
50% in 0.0114 secs
75% in 0.0233 secs
90% in 0.0682 secs
95% in 0.0739 secs
99% in 0.5082 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0037 secs, 0.0016 secs, 0.7329 secs
DNS-lookup: 0.0009 secs, 0.0000 secs, 0.0082 secs
req write: 0.0001 secs, 0.0000 secs, 0.0043 secs
resp wait: 0.0241 secs, 0.0003 secs, 0.7288 secs
resp read: 0.0002 secs, 0.0000 secs, 0.0016 secs
Status code distribution:
[200] 10 responses
[503] 990 responses
Looking at the results:
10 requests got through with 200 status codes
990 requests were rejected with 503 status codes
Total data: 330 bytes
Average response time: 0.0283 secs
This is exactly what we wanted to see! The bouncer is working as intended:
Load shedding is active - Most requests (990/1000) are being rejected
Only a few requests get through - the 10 successes are roughly what our tiny concurrency limit (initial 2, max 5) allows during the ~3-second run
Fast rejection - 503 responses are returned quickly (0.028s average) instead of waiting for the full processing time
System protection - The nightclub is protected from overload
Test VIP Treatment:
First, flip quarkus.load-shedding.priority.enabled back to true in application.properties (we disabled it earlier to force blanket rejection) and let dev mode reload.
# Regular entry during overload - might get rejected
hey -n 100 -c 50 http://localhost:8080/club/enter
# VIP entry during overload - higher priority!
hey -n 500 -c 50 http://localhost:8080/club/vip-entry
You should see VIP requests have a higher success rate!
Summary:
Total: 2.3453 secs
Slowest: 0.2883 secs
Fastest: 0.0003 secs
Average: 0.0213 secs
Requests/sec: 213.1918
Total data: 1517 bytes
Size/request: 3 bytes
Response time histogram:
0.000 [1] |
0.029 [458] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.058 [0] |
0.087 [0] |
0.116 [0] |
0.144 [0] |
0.173 [3] |
0.202 [12] |■
0.231 [10] |■
0.259 [7] |■
0.288 [9] |■
Latency distribution:
10% in 0.0020 secs
25% in 0.0025 secs
50% in 0.0032 secs
75% in 0.0044 secs
90% in 0.0073 secs
95% in 0.2180 secs
99% in 0.2738 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0022 secs, 0.0003 secs, 0.2883 secs
DNS-lookup: 0.0005 secs, 0.0000 secs, 0.0030 secs
req write: 0.0002 secs, 0.0000 secs, 0.0030 secs
resp wait: 0.0187 secs, 0.0001 secs, 0.2881 secs
resp read: 0.0002 secs, 0.0000 secs, 0.0018 secs
Status code distribution:
[200] 41 responses
[503] 459 responses
Test with Customer Headers:
# Premium customer
curl -H "X-Customer-Type: premium" http://localhost:8080/club/enter
# Regular customer
curl -H "X-Customer-Type: regular" http://localhost:8080/club/enter
The RegularCustomerClassifier reads the `X-Customer-Type` header from incoming requests and assigns customers to different “cohorts” (groups) based on their type (lower = better):
Premium customers (`X-Customer-Type: premium`) → Cohort 5 (best treatment)
Regular customers (`X-Customer-Type: regular`) → Cohort 10 (good treatment)
Unknown/No header → Cohort 64 (default treatment)
When the system is overloaded, the load shedding algorithm uses these cohorts to decide who gets in (group = priority * num_cohorts + cohort):
Lower cohort numbers = Higher priority
Premium (5) > Regular (10) > Unknown (64)
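For instance, if we assume the five priorities map to ordinals 0 (CRITICAL) through 4 (DEGRADED) and the extension's 128 cohorts (see below), a NORMAL-priority request from a premium customer lands in group 2 × 128 + 5 = 261, while the same request with no header lands in group 2 × 128 + 64 = 320 - so under pressure, the premium request is shed later.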
Understanding the Algorithm
The Vegas Algorithm
Start with a limit: Default is 100 concurrent requests
Track request duration: Keep track of the fastest request
Estimate queue size: Compare current duration to the fastest
Adjust the limit:
Queue small (< alpha)? Increase limit
Queue large (> beta)? Decrease limit
Queue just right? Keep limit
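To make that concrete, here is a minimal, hypothetical sketch of a Vegas-style limit update in Java - just the shape of the feedback loop, not the extension's actual implementation:
// Hypothetical sketch of a Vegas-style limit update - not the extension's source.
class VegasSketch {
    /** Returns the new concurrency limit given the fastest and most recent RTTs. */
    static int updateLimit(int limit, long fastestRttNanos, long currentRttNanos,
                           int alpha, int beta, int maxLimit) {
        // Estimated queue length: how many in-flight requests are waiting
        // beyond the best-case round-trip time.
        double queue = limit * (1.0 - (double) fastestRttNanos / currentRttNanos);
        if (queue < alpha) {
            return Math.min(limit + 1, maxLimit); // little queueing: probe for more capacity
        }
        if (queue > beta) {
            return Math.max(limit - 1, 1);        // queue building up: back off
        }
        return limit;                             // in the sweet spot: hold steady
    }
}
With the demo settings above (alpha-factor=1, beta-factor=2), the limiter reacts aggressively, which is why rejections kick in almost immediately under load.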
Priority Load Shedding
When overload is detected, the algorithm considers:
5 Priority Levels: CRITICAL > IMPORTANT > NORMAL > BACKGROUND > DEGRADED
128 Cohorts: Groups for similar requests
640 Total Groups: priority × cohorts
The rejection formula:
reject if: group_number > total_groups × (1 - cpu_load³)
Translation: As CPU load increases, only higher-priority groups get through!
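For example, at 80% CPU load, 1 - 0.8³ = 0.488, so only the lowest-numbered ~312 of the 640 groups are admitted; at 95% load that shrinks to roughly 91 groups.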
Real-World Scenarios
Scenario 1: E-commerce Flash Sale
CRITICAL: Checkout API
IMPORTANT: Product search
NORMAL: Browse catalog
BACKGROUND: Analytics tracking
Scenario 2: Video Streaming Service
CRITICAL: Video playback
IMPORTANT: User authentication
NORMAL: Browse recommendations
BACKGROUND: View history updates
Scenario 3: Banking API
CRITICAL: Transaction endpoints
IMPORTANT: Balance inquiries
NORMAL: Transaction history
DEGRADED: Marketing content
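As a sketch, Scenario 1 might map onto a prioritizer like this (the paths are hypothetical, and the imports are the same as in VIPPrioritizer above):
@ApplicationScoped
public class FlashSalePrioritizer implements RequestPrioritizer<HttpServerRequest> {

    @Override
    public boolean appliesTo(Object request) {
        return true;
    }

    @Override
    public RequestPriority priority(HttpServerRequest request) {
        String path = request.path();
        if (path.startsWith("/checkout")) {
            return RequestPriority.CRITICAL;    // revenue first
        }
        if (path.startsWith("/search")) {
            return RequestPriority.IMPORTANT;   // keep discovery alive
        }
        if (path.startsWith("/analytics")) {
            return RequestPriority.BACKGROUND;  // can wait
        }
        return RequestPriority.NORMAL;          // browsing and everything else
    }
}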
Advanced Tricks
1. Geographic Load Balancing
@ApplicationScoped
public class GeoClassifier implements RequestClassifier<HttpServerRequest> {

    @Override
    public boolean appliesTo(Object request) {
        return true;
    }

    @Override
    public int cohort(HttpServerRequest request) {
        String region = request.getHeader("CloudFront-Viewer-Country");
        // Prioritize local traffic
        return "US".equals(region) ? 20 : 80;
    }
}
2. Tenant-Based Prioritization
@ApplicationScoped
public class TenantPrioritizer implements RequestPrioritizer<HttpServerRequest> {

    @Override
    public boolean appliesTo(Object request) {
        return true;
    }

    @Override
    public RequestPriority priority(HttpServerRequest request) {
        // extractTenant/isPremiumTenant are placeholders for your own tenant lookup
        String tenant = extractTenant(request);
        return isPremiumTenant(tenant)
                ? RequestPriority.IMPORTANT
                : RequestPriority.NORMAL;
    }
}
3. Time-Based Adjustment
@ApplicationScoped
public class TimeBasedClassifier implements RequestClassifier<HttpServerRequest> {

    @Override
    public boolean appliesTo(Object request) {
        return true;
    }

    @Override
    public int cohort(HttpServerRequest request) {
        int hour = LocalTime.now().getHour(); // requires java.time.LocalTime
        // Business hours get better treatment
        return (hour >= 9 && hour <= 17) ? 30 : 90;
    }
}
Important Gotchas
Only works for HTTP: gRPC, WebSocket, messaging are not supported (yet)
Experimental status: This is bleeding edge stuff! Test thoroughly
Fast rejections are good: A 503 after 1ms is better than a timeout after 30s
Monitor CPU usage: Priority load shedding needs CPU metrics
Tune for your workload: Default settings might not fit your use case
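On that last point: the demo's tiny limits exist only to make shedding easy to trigger. As a hypothetical starting point for real traffic, the documented defaults (which this config simply restates) are far more reasonable - measure under load before changing them:
# Illustrative starting values (the extension's defaults) - tune against measured latency
quarkus.load-shedding.max-limit=1000
quarkus.load-shedding.initial-limit=100
quarkus.load-shedding.alpha-factor=3
quarkus.load-shedding.beta-factor=6
quarkus.load-shedding.probe-factor=30.0
quarkus.load-shedding.priority.enabled=true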
Conclusion
Load shedding is like having a smart bouncer for your service. It won’t prevent all problems, but it’ll keep your nightclub (service) from turning into a disaster zone when things get crowded.
Remember:
Better to reject some requests fast than fail all requests slowly
Prioritize critical operations
Monitor and tune based on your actual traffic patterns
Test under realistic load conditions
Now go forth and protect your services from the stampede!
Happy load shedding!