The Quarkus Nightclub: A Hands-On Guide to Load Shedding Under Heavy Traffic
Learn how to protect your Java microservices from overload with the Quarkus Load Shedding extension - the smart bouncer that keeps your APIs stable when the crowd goes wild.
The Nightclub Problem
Imagine your Quarkus application is a trendy nightclub. On a normal Tuesday evening, everything’s chill. People walk in, get their drinks, and everyone’s happy. But it’s Saturday night, and suddenly there’s a line around the block. What happens if you let everyone in?
Chaos.
The bartenders are overwhelmed, the bathroom lines are insane, and instead of a fun night out, everyone has a miserable experience. Some people might even pass out from waiting too long (timeout errors, anyone?).
This is exactly what happens to your service under heavy load. Load shedding is like hiring a bouncer who politely (or not so politely) turns people away when the club is at capacity, ensuring the people inside have a great experience.
What is Load Shedding?
Load shedding detects when your service is overloaded and strategically rejects requests before they make things worse. It’s better to say “Sorry, we’re full” than to let everyone in and have the whole system collapse.
The Quarkus Load Shedding extension uses an adaptation of TCP Vegas (yes, the algorithm has a cool name) to detect overload and optionally uses priority-based rejection to ensure VIPs get in first.
When Should You Use Load Shedding?
Perfect Scenarios:
Public-facing APIs - When you have unpredictable traffic spikes (Black Friday sales, viral tweets about your product)
Services with expensive operations - Database queries, external API calls, ML inference
Microservices architectures - Prevent cascade failures when one service gets hammered
Services with SLAs - Better to reject 10% of requests fast than timeout 100% of them slowly
Resource-constrained environments - Running on limited CPU/memory? Don’t let it crash!
When NOT to use it:
Critical transaction systems where every request MUST be processed (payment processing)
Very low traffic services - unnecessary overhead
Services with already-perfect auto-scaling - but let’s be honest, does that exist?
Let’s Build a Load Shedding Demo!
Create Your Quarkus Project
quarkus create app com.nightclub:bouncer-demo \
-x=rest,load-shedding,rest-jackson
cd bouncer-demo
That's it! The extension works out of the box with sensible defaults. No configuration needed.
Create Your Nightclub API
Rename GreetingResource.java to src/main/java/com/nightclub/NightclubResource.java and replace its content with:
package com.nightclub;

import java.time.Duration;
import java.util.Random;

import io.smallrye.mutiny.Uni;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/club")
@Produces(MediaType.APPLICATION_JSON)
public class NightclubResource {

    private final Random random = new Random();

    @GET
    @Path("/enter")
    public Uni<ClubResponse> enterClub() {
        // Simulate some work (checking ID, taking coats, etc.) - keep it slow
        int delay = 500 + random.nextInt(200); // 500-700ms
        return Uni.createFrom().item(new ClubResponse("Welcome to the club!"))
                .onItem().delayIt().by(Duration.ofMillis(delay));
    }

    @GET
    @Path("/vip-entry")
    public Uni<ClubResponse> vipEntry() {
        // VIP lane is still slow, but we'll prioritize these requests
        int delay = 150 + random.nextInt(150); // 150-300ms
        return Uni.createFrom().item(new ClubResponse("VIP lane! Skip the line!"))
                .onItem().delayIt().by(Duration.ofMillis(delay));
    }

    @GET
    @Path("/bathroom")
    public Uni<ClubResponse> useBathroom() {
        // This takes forever at a nightclub
        int delay = 300 + random.nextInt(500); // 300-800ms
        return Uni.createFrom().item(new ClubResponse("Finally..."))
                .onItem().delayIt().by(Duration.ofMillis(delay));
    }

    public record ClubResponse(String status) {
    }
}
Configure Load Shedding
Modify src/main/resources/application.properties:
# Enable load shedding (default: true)
quarkus.load-shedding.enabled=true
# Maximum concurrent requests (default: 1000) - Set very low to test
quarkus.load-shedding.max-limit=5
# Initial limit (default: 100) - Set very low to test
quarkus.load-shedding.initial-limit=2
# Alpha factor for increasing limit (default: 3)
quarkus.load-shedding.alpha-factor=1
# Beta factor for decreasing limit (default: 6)
quarkus.load-shedding.beta-factor=2
# Probe factor - how often to reset the baseline (default: 30.0)
quarkus.load-shedding.probe-factor=5.0
# Disable priority-based rejection to force rejection of all requests when overloaded
quarkus.load-shedding.priority.enabled=false
Add Custom Priority (The VIP List!)
Create src/main/java/com/nightclub/VIPPrioritizer.java:
package com.nightclub;

import io.quarkus.load.shedding.RequestPrioritizer;
import io.quarkus.load.shedding.RequestPriority;
import io.vertx.core.http.HttpServerRequest;
import jakarta.enterprise.context.ApplicationScoped;
import org.jboss.logging.Logger;

@ApplicationScoped
public class VIPPrioritizer implements RequestPrioritizer<HttpServerRequest> {

    private static final Logger LOG = Logger.getLogger(VIPPrioritizer.class);

    @Override
    public boolean appliesTo(Object request) {
        return true; // This prioritizer applies to all requests
    }

    @Override
    public RequestPriority priority(HttpServerRequest request) {
        String path = request.path();
        LOG.debugf("VIPPrioritizer called for path: %s", path);
        // VIP endpoints get CRITICAL priority
        if (path.contains("/vip-")) {
            return RequestPriority.CRITICAL;
        }
        // Regular entry is NORMAL
        if (path.contains("/enter")) {
            return RequestPriority.NORMAL;
        }
        // Bathroom can wait (sorry!)
        if (path.contains("/bathroom")) {
            return RequestPriority.BACKGROUND;
        }
        return RequestPriority.NORMAL;
    }
}
Add a Custom Classifier (Cohort Assignment)
Create src/main/java/com/nightclub/RegularCustomerClassifier.java:
package com.nightclub;

import io.quarkus.load.shedding.RequestClassifier;
import io.vertx.core.http.HttpServerRequest;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class RegularCustomerClassifier implements RequestClassifier<HttpServerRequest> {

    @Override
    public boolean appliesTo(Object request) {
        return true; // This classifier applies to all requests
    }

    @Override
    public int cohort(HttpServerRequest request) {
        // Check if they're a "regular" (via header)
        String customerType = request.getHeader("X-Customer-Type");
        if ("regular".equalsIgnoreCase(customerType)) {
            // Regular customers get a better cohort (lower number)
            return 10;
        } else if ("premium".equalsIgnoreCase(customerType)) {
            // Premium customers get even better treatment
            return 5;
        }
        // Everyone else lands in a mid-range cohort
        return 64; // Middle of the road
    }
}
Test It!
Start your application:
quarkus dev
Test Normal Traffic:
# Single request - should work fine
curl http://localhost:8080/club/enter
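With the club empty, this should come back with HTTP 200 after the simulated 500-700ms delay and a JSON body along these lines (the field name comes from the ClubResponse record, serialized by rest-jackson):
{"status":"Welcome to the club!"}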
Simulate a Stampede! 🏃‍♂️🏃‍♀️
Use hey or any load testing tool:
# Install
brew install hey
# Overwhelm the club with 1000 requests, 200 at a time
hey -n 1000 -c 200 http://localhost:8080/club/enter
Watch as the bouncer (load shedding) starts rejecting requests with HTTP 503 Service Unavailable!
Summary:
Total: 3.1961 secs
Slowest: 0.7329 secs
Fastest: 0.0016 secs
Average: 0.0283 secs
Requests/sec: 312.8794
Total data: 330 bytes
Size/request: 0 bytes
Response time histogram:
0.002 [1] |
0.075 [962] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.148 [27] |■
0.221 [0] |
0.294 [0] |
0.367 [0] |
0.440 [0] |
0.514 [1] |
0.587 [1] |
0.660 [6] |
0.733 [2] |
Latency distribution:
10% in 0.0056 secs
25% in 0.0071 secs
50% in 0.0114 secs
75% in 0.0233 secs
90% in 0.0682 secs
95% in 0.0739 secs
99% in 0.5082 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0037 secs, 0.0016 secs, 0.7329 secs
DNS-lookup: 0.0009 secs, 0.0000 secs, 0.0082 secs
req write: 0.0001 secs, 0.0000 secs, 0.0043 secs
resp wait: 0.0241 secs, 0.0003 secs, 0.7288 secs
resp read: 0.0002 secs, 0.0000 secs, 0.0016 secs
Status code distribution:
[200] 10 responses
[503] 990 responses
Looking at the results:
10 requests got through with 200 status codes
990 requests were rejected with 503 status codes
Total data: 330 bytes
Average response time: 0.0283 secs
This is exactly what we wanted to see! The bouncer is working as intended:
Load shedding is active - Most requests (990/1000) are being rejected
Only a few requests get through - the 10 successes are roughly what our tiny concurrency limit (initial 2, max 5) allows during the ~3-second run
Fast rejection - 503 responses are returned quickly (0.028s average) instead of waiting for the full processing time
System protection - The nightclub is protected from overload
Test VIP Treatment:
First, flip quarkus.load-shedding.priority.enabled back to true in application.properties (we disabled it earlier to force blanket rejection) and let dev mode reload.
# Regular entry during overload - might get rejected
hey -n 100 -c 50 http://localhost:8080/club/enter
# VIP entry during overload - higher priority!
hey -n 500 -c 50 http://localhost:8080/club/vip-entry
You should see VIP requests have a higher success rate!
Summary:
Total: 2.3453 secs
Slowest: 0.2883 secs
Fastest: 0.0003 secs
Average: 0.0213 secs
Requests/sec: 213.1918
Total data: 1517 bytes
Size/request: 3 bytes
Response time histogram:
0.000 [1] |
0.029 [458] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.058 [0] |
0.087 [0] |
0.116 [0] |
0.144 [0] |
0.173 [3] |
0.202 [12] |■
0.231 [10] |■
0.259 [7] |■
0.288 [9] |■
Latency distribution:
10% in 0.0020 secs
25% in 0.0025 secs
50% in 0.0032 secs
75% in 0.0044 secs
90% in 0.0073 secs
95% in 0.2180 secs
99% in 0.2738 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0022 secs, 0.0003 secs, 0.2883 secs
DNS-lookup: 0.0005 secs, 0.0000 secs, 0.0030 secs
req write: 0.0002 secs, 0.0000 secs, 0.0030 secs
resp wait: 0.0187 secs, 0.0001 secs, 0.2881 secs
resp read: 0.0002 secs, 0.0000 secs, 0.0018 secs
Status code distribution:
[200] 41 responses
[503] 459 responses
Test with Customer Headers:
# Premium customer
curl -H "X-Customer-Type: premium" http://localhost:8080/club/enter
# Regular customer
curl -H "X-Customer-Type: regular" http://localhost:8080/club/enter
The RegularCustomerClassifier reads the `X-Customer-Type` header from incoming requests and assigns customers to different “cohorts” (groups) based on their type (lower = better):
Premium customers (`X-Customer-Type: premium`) → Cohort 5 (best treatment)
Regular customers (`X-Customer-Type: regular`) → Cohort 10 (good treatment)
Unknown/No header → Cohort 64 (default treatment)
When the system is overloaded, the load shedding algorithm uses these cohorts to decide who gets in (group = priority * num_cohorts + cohort):
Lower cohort numbers = Higher priority
Premium (5) > Regular (10) > Unknown (64)
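For instance, if we assume the five priorities map to ordinals 0 (CRITICAL) through 4 (DEGRADED) and the extension's 128 cohorts (see below), a NORMAL-priority request from a premium customer lands in group 2 × 128 + 5 = 261, while the same request with no header lands in group 2 × 128 + 64 = 320 - so under pressure, the premium request is shed later.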
Understanding the Algorithm
The Vegas Algorithm
Start with a limit: Default is 100 concurrent requests
Track request duration: Keep track of the fastest request
Estimate queue size: Compare current duration to the fastest
Adjust the limit:
Queue small (< alpha)? Increase limit
Queue large (> beta)? Decrease limit
Queue just right? Keep limit
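To make that concrete, here is a minimal, hypothetical sketch of a Vegas-style limit update in Java - just the shape of the feedback loop, not the extension's actual implementation:
// Hypothetical sketch of a Vegas-style limit update - not the extension's source.
class VegasSketch {
    /** Returns the new concurrency limit given the fastest and most recent RTTs. */
    static int updateLimit(int limit, long fastestRttNanos, long currentRttNanos,
                           int alpha, int beta, int maxLimit) {
        // Estimated queue length: how many in-flight requests are waiting
        // beyond the best-case round-trip time.
        double queue = limit * (1.0 - (double) fastestRttNanos / currentRttNanos);
        if (queue < alpha) {
            return Math.min(limit + 1, maxLimit); // little queueing: probe for more capacity
        }
        if (queue > beta) {
            return Math.max(limit - 1, 1);        // queue building up: back off
        }
        return limit;                             // in the sweet spot: hold steady
    }
}
With the demo settings above (alpha-factor=1, beta-factor=2), the limiter reacts aggressively, which is why rejections kick in almost immediately under load.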
Priority Load Shedding
When overload is detected, the algorithm considers:
5 Priority Levels: CRITICAL > IMPORTANT > NORMAL > BACKGROUND > DEGRADED
128 Cohorts: Groups for similar requests
640 Total Groups: priority × cohorts
The rejection formula:
reject if: group_number > total_groups × (1 - cpu_load³)
Translation: As CPU load increases, only higher-priority groups get through!
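For example, at 80% CPU load, 1 - 0.8³ = 0.488, so only the lowest-numbered ~312 of the 640 groups are admitted; at 95% load that shrinks to roughly 91 groups.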
Real-World Scenarios
Scenario 1: E-commerce Flash Sale
CRITICAL: Checkout API
IMPORTANT: Product search
NORMAL: Browse catalog
BACKGROUND: Analytics tracking
Scenario 2: Video Streaming Service
CRITICAL: Video playback
IMPORTANT: User authentication
NORMAL: Browse recommendations
BACKGROUND: View history updates
Scenario 3: Banking API
CRITICAL: Transaction endpoints
IMPORTANT: Balance inquiries
NORMAL: Transaction history
DEGRADED: Marketing content
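As a sketch, Scenario 1 might map onto a prioritizer like this (the paths are hypothetical, and the imports are the same as in VIPPrioritizer above):
@ApplicationScoped
public class FlashSalePrioritizer implements RequestPrioritizer<HttpServerRequest> {

    @Override
    public boolean appliesTo(Object request) {
        return true;
    }

    @Override
    public RequestPriority priority(HttpServerRequest request) {
        String path = request.path();
        if (path.startsWith("/checkout")) {
            return RequestPriority.CRITICAL;    // revenue first
        }
        if (path.startsWith("/search")) {
            return RequestPriority.IMPORTANT;   // keep discovery alive
        }
        if (path.startsWith("/analytics")) {
            return RequestPriority.BACKGROUND;  // can wait
        }
        return RequestPriority.NORMAL;          // browsing and everything else
    }
}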
Advanced Tricks
1. Geographic Load Balancing
@ApplicationScoped
public class GeoClassifier implements RequestClassifier<HttpServerRequest> {

    @Override
    public boolean appliesTo(Object request) {
        return true;
    }

    @Override
    public int cohort(HttpServerRequest request) {
        String region = request.getHeader("CloudFront-Viewer-Country");
        // Prioritize local traffic
        return "US".equals(region) ? 20 : 80;
    }
}
2. Tenant-Based Prioritization
@ApplicationScoped
public class TenantPrioritizer implements RequestPrioritizer<HttpServerRequest> {

    @Override
    public boolean appliesTo(Object request) {
        return true;
    }

    @Override
    public RequestPriority priority(HttpServerRequest request) {
        // extractTenant/isPremiumTenant are placeholders for your own tenant lookup
        String tenant = extractTenant(request);
        return isPremiumTenant(tenant)
                ? RequestPriority.IMPORTANT
                : RequestPriority.NORMAL;
    }
}
3. Time-Based Adjustment
@ApplicationScoped
public class TimeBasedClassifier implements RequestClassifier<HttpServerRequest> {

    @Override
    public boolean appliesTo(Object request) {
        return true;
    }

    @Override
    public int cohort(HttpServerRequest request) {
        int hour = LocalTime.now().getHour(); // requires java.time.LocalTime
        // Business hours get better treatment
        return (hour >= 9 && hour <= 17) ? 30 : 90;
    }
}
Important Gotchas
Only works for HTTP: gRPC, WebSocket, messaging are not supported (yet)
Experimental status: This is bleeding edge stuff! Test thoroughly
Fast rejections are good: A 503 after 1ms is better than a timeout after 30s
Monitor CPU usage: Priority load shedding needs CPU metrics
Tune for your workload: Default settings might not fit your use case
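On that last point: the demo's tiny limits exist only to make shedding easy to trigger. As a hypothetical starting point for real traffic, the documented defaults (which this config simply restates) are far more reasonable - measure under load before changing them:
# Illustrative starting values (the extension's defaults) - tune against measured latency
quarkus.load-shedding.max-limit=1000
quarkus.load-shedding.initial-limit=100
quarkus.load-shedding.alpha-factor=3
quarkus.load-shedding.beta-factor=6
quarkus.load-shedding.probe-factor=30.0
quarkus.load-shedding.priority.enabled=true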
Conclusion
Load shedding is like having a smart bouncer for your service. It won’t prevent all problems, but it’ll keep your nightclub (service) from turning into a disaster zone when things get crowded.
Remember:
Better to reject some requests fast than fail all requests slowly
Prioritize critical operations
Monitor and tune based on your actual traffic patterns
Test under realistic load conditions
Now go forth and protect your services from the stampede!
Happy load shedding!