Local Image Generation in Java: Quarkus, FFM, and Native AI
A deep, hands-on guide to running modern image models inside the JVM using Project Panama and native shared libraries.
My interest in image generation started, like for many developers, with tools such as Midjourney. The quality jump over the last few years has been obvious, and the creative potential is undeniable. From a technical perspective, however, most of these systems are black boxes. You send a prompt over the network, wait, and get an image back. The internals remain opaque.
Trying to reproduce similar capabilities locally quickly exposes the real problem. Modern image generation stacks are complex. They often assume Python-first workflows, CUDA-capable GPUs, and large dependency graphs that are hard to reason about and harder to operate long-term. Even when local inference is possible, setup tends to be fragile, poorly documented, and tightly coupled to specific hardware configurations.
This becomes more obvious on non-x86 systems. On Apple Silicon and other ARM-based machines, many inference paths are either unavailable or second-class. GPU acceleration is inconsistent, and CPU-based inference is often dismissed as impractical, even when the workload characteristics would allow it. As a result, there are missed opportunities for local, predictable inference on hardware that developers already use every day.
From a Java perspective, the situation is even more constrained. Native image models are typically treated as external services, accessed through REST APIs or sidecar processes. This adds latency, operational overhead, and another failure boundary. More importantly, it prevents the JVM from having any real ownership over model lifecycle, memory usage, or concurrency behavior.
This tutorial explores a different approach. We integrate a native image generation model directly into a Java application using Quarkus and the Java Foreign Function and Memory API from Project Panama. The focus is on making native inference a first-class part of a Java service, with explicit memory management and well-defined boundaries.
Prerequisites
This setup crosses the Java and native boundary. You need a working toolchain on both sides.
Java 25 (FFM is preview in 21 and stable from 22)
jextract version compatible with your selected Java version (!)
Apache Maven 3.8 or newer
GCC or compatible C compiler
Basic understanding of C pointers and memory ownership
Eight gigabytes of RAM or more for image generation
Approximately forty-five minutes of focused time
Project setup
We build a Quarkus-based REST service that loads a native Flux2 model at startup and exposes a synchronous image generation endpoint. The service runs entirely locally and performs inference in-process.
Create the project or follow along with the source-code repository:
mvn io.quarkus:quarkus-maven-plugin:create \
-DprojectGroupId=com.example \
-DprojectArtifactId=flux2-quarkus \
-DnoCode=true \
-DjavaVersion=25
cd flux2-quarkus
Enable preview features in pom.xml:
<build>
<plugins>
<plugin>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-maven-plugin</artifactId>
<configuration>
<compilerArgs>
<arg>--enable-preview</arg>
</compilerArgs>
</configuration>
</plugin>
</plugins>
</build>
Without this flag, the FFM API compiles but fails at runtime.
One small tweak in application.properties: point it to your model directory. We will download the models into it later:
# Flux2Native Configuration
flux.model.path=/Users/youruser/flux2-quarkus/src/main/resources/models
Building the native Flux2 library
flux2.c is a compact, pure C implementation of FLUX.2-klein-4B image generation. Before Java can call it, the native code must be compiled as a shared library.
The project documentation recommends building the Apple Silicon backend using:
make mps
This produces a fast, native executable intended for interactive or batch use from the command line. For Java FFM integration, however, this build target is not directly usable. The Foreign Function and Memory API requires a different artifact and a different mental model.
Why FFM requires a shared library
Java’s Foreign Function and Memory API allows Java code to call native C functions inside the same process as the JVM. There is no process boundary and no command-line invocation. The JVM loads native machine code into its own address space and invokes functions via stable ABI symbols.
To do this, the JVM must load a shared library:
.dylib on macOS
.so on Linux
Executables cannot be loaded this way. They are designed to be launched by the operating system as independent processes with their own main() function, stack, and lifecycle.
Here is how the three common native artifact types compare:
Executables (./flux)
These are entry-point driven programs. They start, parse arguments, perform work, and exit. Java would have to invoke them using ProcessBuilder, serialize data through files or pipes, and reload the model on every invocation.
Static libraries (.a)
These are linked at compile time into another native binary. Java does not perform native linking during compilation, so static libraries cannot be consumed by the JVM.
Shared libraries (.dylib, .so)
These are explicitly designed to be loaded dynamically by a running process. The JVM can resolve symbols, call functions, and keep native state alive for as long as the process runs.
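Under the hood, this is exactly what the jextract-generated bindings will do for us later. A minimal sketch of the mechanism, assuming a libflux.dylib in the working directory and the flux_wrapper_init function we define further down; the model path is illustrative:
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class ManualBinding {
    public static void main(String[] args) throws Throwable {
        // Load the shared library into the JVM's own address space.
        Arena libraryArena = Arena.ofShared();
        SymbolLookup flux = SymbolLookup.libraryLookup("libflux.dylib", libraryArena);

        // Resolve a symbol and bind it to a Java MethodHandle.
        // C signature: flux_ctx* flux_wrapper_init(const char* model_path)
        MethodHandle init = Linker.nativeLinker().downcallHandle(
                flux.find("flux_wrapper_init").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.ADDRESS, ValueLayout.ADDRESS));

        try (Arena call = Arena.ofConfined()) {
            MemorySegment ctx = (MemorySegment) init.invoke(call.allocateFrom("/path/to/models"));
            System.out.println("native context: " + ctx);
        }
    }
}
jextract generates this plumbing for every function in the header, which is why we only hand it a small wrapper header later.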
How this differs from make mps
The make mps target produces a standalone CLI executable. Conceptually, it looks like this:
Contains a main() function
Parses command-line arguments
Loads the model
Generates one image
Terminates
This is ideal for experimentation or scripting, but it forces a very different execution model. Every invocation is isolated. The model is loaded and unloaded repeatedly. Communication happens through files or standard streams. Error handling is coarse-grained.
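For comparison, driving the CLI build from Java would look roughly like this. This is a sketch only; the flag names are illustrative and not the actual flux2.c command-line interface:
import java.io.IOException;
import java.nio.file.Path;

public class CliInvocation {
    // Every call pays the full model-load cost and exchanges data through files.
    static Path generateViaProcess(String prompt, Path output) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                "./flux",                      // standalone executable from `make mps`
                "--prompt", prompt,            // illustrative flag, not the real flux2.c CLI
                "--output", output.toString()) // illustrative flag, not the real flux2.c CLI
                .inheritIO()
                .start();
        if (process.waitFor() != 0) {
            throw new IOException("flux exited with code " + process.exitValue());
        }
        return output;
    }
}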
In contrast, the shared library build produces something fundamentally different:
No main() function
Exposes callable functions such as flux2_init, flux2_generate, and flux2_free
Can be loaded once and reused for the lifetime of the JVM
Keeps model weights resident in memory
Allows direct parameter passing without intermediate serialization
In this setup, the Java application becomes the “main” program. Quarkus controls startup, shutdown, concurrency, and resource management. Native code becomes an implementation detail rather than the controlling process.
What we are building instead
Rather than invoking Flux2 as a program, we compile it into a shared library:
git clone https://github.com/antirez/flux2.c.git
cd flux2.c
clang -dynamiclib -o libflux.dylib \
flux.c flux_kernels.c flux_tokenizer.c flux_vae.c \
flux_transformer.c flux_sample.c flux_image.c \
flux_safetensors.c flux_qwen3.c flux_qwen3_tokenizer.c \
flux_metal.m \
-Wall -Wextra -O3 -march=native -ffast-math \
-DUSE_BLAS -DUSE_METAL -DACCELERATE_NEW_LAPACK \
-fobjc-arc \
-framework Accelerate -framework Metal -framework MetalPerformanceShaders -framework Foundation
Defining a stable native boundary
Rather than binding against the entire implementation, we are going to define a minimal header that expresses only what Java needs. Using flux.h directly with Java FFM is suboptimal because:
Missing Initialization: flux.h does not expose flux_metal_init(), which is required for GPU acceleration (it’s in flux_metal.h).
Struct Complexity: Passing flux_params by value or pointer requires mirroring that struct’s layout in Java. A flat API is much easier to bind.
Memory Management: flux_generate returns a flux_image* that must be freed manually. A “fire-and-forget” function that saves directly to disk means fewer FFM calls and no image pointer to manage from Java.
Split Headers: Important globals like flux_step_callback are hidden in flux_kernels.h.
Create: flux_wrapper.h
#ifndef FLUX_WRAPPER_H
#define FLUX_WRAPPER_H
#ifdef __cplusplus
extern "C" {
#endif
typedef struct flux_ctx flux_ctx;
// 1. Initialize Metal (if compiled with it) AND load the model
flux_ctx* flux_wrapper_init(const char* model_path);
// 2. Generate and save directly to file (simplest for Java)
// Returns 0 on success, non-zero on failure.
int flux_wrapper_generate(
flux_ctx* ctx,
const char* prompt,
const char* output_path,
int width,
int height,
int steps,
float guidance,
long seed
);
// 3. Cleanup
void flux_wrapper_free(flux_ctx* ctx);
#ifdef __cplusplus
}
#endif
#endif
Create: flux_wrapper.c
#include "flux_wrapper.h"
#include "flux.h"
#include "flux_metal.h" // Needed for flux_metal_init
#include <stddef.h>
flux_ctx* flux_wrapper_init(const char* model_path) {
#ifdef USE_METAL
flux_metal_init();
#endif
return flux_load_dir(model_path);
}
int flux_wrapper_generate(
flux_ctx* ctx,
const char* prompt,
const char* output_path,
int width,
int height,
int steps,
float guidance,
long seed
) {
if (!ctx || !prompt || !output_path) return -1;
flux_params params = FLUX_PARAMS_DEFAULT;
params.width = width;
params.height = height;
params.num_steps = steps;
params.guidance_scale = guidance;
params.seed = seed;
flux_image* img = flux_generate(ctx, prompt, &params);
if (!img) return -2;
int res = flux_image_save(img, output_path);
flux_image_free(img);
return res;
}
void flux_wrapper_free(flux_ctx* ctx) {
flux_free(ctx);
// Note: flux_metal_cleanup() could be called here if you want strict cleanup
}
This isolates all the C-specific headers (flux.h, flux_metal.h) from your Java world. You only need to generate bindings for flux_wrapper.h.
While we are here, make sure to also build the Metal shaders:
# download toolchain if you have not (>700MB)
xcodebuild -downloadComponent MetalToolchain
# Build the metal shaders
xcrun -sdk macosx metal -o default.metallib flux_shaders.metal
There is a small hiccup in flux_metal.m: it assumes the application bundle contains the shader resource. In a Java environment there is no such bundle, the lookup returns nil, and the array creation crashes. So we need to fix this.
File: flux2.c/flux_metal.m, lines 1506-1510. Replace the shader lookup with this nil-safe version:
/* Try to find the shader file in various locations */
NSString *shaderPath = nil;
NSMutableArray *searchPaths = [NSMutableArray arrayWithObjects:
@"flux_shaders.metal",
@"./flux_shaders.metal",
nil];
NSString *bundlePath = [[NSBundle mainBundle] pathForResource:@"flux_shaders" ofType:@"metal"];
if (bundlePath) {
[searchPaths addObject:bundlePath];
}
Now we need to recompile the dynamic library. The build command is the same as before, extended to include flux_wrapper.c alongside the other source files:
clang -dynamiclib -o libflux.dylib \
flux_wrapper.c flux.c flux_kernels.c flux_tokenizer.c flux_vae.c \
flux_transformer.c flux_sample.c flux_image.c \
flux_safetensors.c flux_qwen3.c flux_qwen3_tokenizer.c \
flux_metal.m \
-Wall -Wextra -O3 -march=native -ffast-math \
-DUSE_BLAS -DUSE_METAL -DACCELERATE_NEW_LAPACK \
-fobjc-arc \
-framework Accelerate -framework Metal -framework MetalPerformanceShaders -framework Foundation
And now, finally, back to Java. Let’s generate the bindings with jextract:
jextract --output src \
--target-package com.forest.flux \
--library flux \
--header-class-name FluxLib \
flux_wrapper.h
The resulting bindings land in src/com/forest/flux/FluxLib.java (plus related files). Copy them into the Quarkus project you created earlier.
⚠️ I have written before about the challenges of macOS library loading and how I patch demos together to work around them.
⚠️ Because of this, make sure to patch the library loading accordingly so that it points to your libflux.dylib (in FluxLib, roughly around line 23ff).
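One way to do this, as a sketch: the exact shape of the generated class depends on your jextract version, and the path below is an assumption you must adjust to your build output.
// Inside the generated FluxLib class (structure varies by jextract version).
static final SymbolLookup SYMBOL_LOOKUP;

static {
    // Load the dylib explicitly from an absolute path instead of relying on java.library.path.
    System.load("/Users/youruser/flux2-quarkus/libflux.dylib"); // adjust to where you built libflux.dylib
    SYMBOL_LOOKUP = SymbolLookup.loaderLookup()
            .or(Linker.nativeLinker().defaultLookup());
}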
Downloading the model files
The next big step is to download the relevant model files. You can either do this via the download_model.py script or use the following little bash script:
#!/bin/bash
# Target directory in your Quarkus project
TARGET_DIR="/Users/meisele/Projects/the-main-thread/flux2-quarkus/src/main/resources/models"
# Or use relative path if running from project root:
# TARGET_DIR="./src/main/resources/models"
REPO_URL="https://huggingface.co/black-forest-labs/FLUX.2-klein-4B/resolve/main"
echo "Downloading FLUX.2 Klein models to $TARGET_DIR..."
# Create directories
mkdir -p "$TARGET_DIR/vae"
mkdir -p "$TARGET_DIR/transformer"
mkdir -p "$TARGET_DIR/text_encoder"
mkdir -p "$TARGET_DIR/tokenizer"
mkdir -p "$TARGET_DIR/scheduler"
# Helper function
download_file() {
local path=$1
local url="$REPO_URL/$path"
local output="$TARGET_DIR/$path"
if [ -f "$output" ]; then
echo " [SKIP] $path (already exists)"
else
echo " [DOWN] $path"
# -L follows redirects, -f fails on 404/server errors
curl -L -f -o "$output" "$url"
if [ $? -ne 0 ]; then
echo " [ERROR] Failed to download $path"
rm -f "$output"
exit 1
fi
fi
}
echo "1. Downloading VAE..."
download_file "vae/diffusion_pytorch_model.safetensors"
download_file "vae/config.json"
echo "2. Downloading Transformer..."
download_file "transformer/diffusion_pytorch_model.safetensors"
download_file "transformer/config.json"
echo "3. Downloading Text Encoder (Qwen3-4B)..."
# Confirmed: This model uses sharded weights
download_file "text_encoder/model-00001-of-00002.safetensors"
download_file "text_encoder/model-00002-of-00002.safetensors"
download_file "text_encoder/config.json"
echo "4. Downloading Tokenizer..."
download_file "tokenizer/tokenizer.json"
download_file "tokenizer/tokenizer_config.json"
download_file "tokenizer/vocab.json"
download_file "tokenizer/merges.txt"
download_file "tokenizer/special_tokens_map.json"
echo "5. Downloading Configs..."
download_file "model_index.json"
echo ""
echo "Success! Models ready in $TARGET_DIR"⚠️ Warning: it downloads ~16GB!
Implementing the Java FFM binding
The FFM layer is where most failures occur if boundaries are unclear. Every allocation, pointer, and lifetime decision must be explicit.
package com.example;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import com.forest.flux.FluxLib;
// Thin static facade over the jextract-generated FluxLib bindings.
// Configuration (model path, output dir) is injected in the service layer.
public class Flux2Native {
public static MemorySegment initialize(String modelPath) {
try (Arena arena = Arena.ofConfined()) {
MemorySegment modelPathNative = arena.allocateFrom(modelPath);
return FluxLib.flux_wrapper_init(modelPathNative);
}
}
public static int generate(MemorySegment context, String prompt, String path, int width, int height, int steps) {
float guidance = 3.5f;
long seed = System.currentTimeMillis();
System.out.printf("Params: %dx%d, steps=%d, guidance=%.2f, seed=%d%n", width, height, steps, guidance, seed);
try (Arena arena = Arena.ofConfined()) {
MemorySegment promptNative = arena.allocateFrom(prompt);
MemorySegment outputPathNative = arena.allocateFrom(path);
return FluxLib.flux_wrapper_generate(context, promptNative, outputPathNative, width, height, steps,
guidance, seed);
}
}
public static void free(MemorySegment context) {
FluxLib.flux_wrapper_free(context);
}
}
Temporary strings are allocated in confined arenas and released immediately. The native context itself remains valid until explicitly freed.
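It helps to be explicit about what that context actually is on the Java side. A short sketch of the lifetime rules; the model path is illustrative:
// The context returned by flux_wrapper_init is an opaque native pointer.
// Closing the confined arena inside initialize() only frees the temporary
// UTF-8 copy of the model path; the model itself is owned by the C library.
MemorySegment ctx = Flux2Native.initialize("/path/to/models"); // illustrative path

// Pointers returned from downcalls are zero-length segments: Java can hand
// them back to native code but cannot dereference them.
assert ctx.byteSize() == 0;

// The only way to release the native model is the explicit free call.
Flux2Native.free(ctx);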
Service layer and lifecycle management
Model initialization is expensive and must not happen per request.
package com.example;
import java.lang.foreign.MemorySegment;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;
import org.eclipse.microprofile.config.inject.ConfigProperty;
import jakarta.annotation.PostConstruct;
import jakarta.annotation.PreDestroy;
import jakarta.enterprise.context.ApplicationScoped;
@ApplicationScoped
public class Flux2Service {
@ConfigProperty(name = "flux2.model.path")
String modelPath;
@ConfigProperty(name = "flux2.output.dir", defaultValue = "/tmp/flux2-output")
String outputDir;
private MemorySegment context;
@PostConstruct
void init() {
try {
Files.createDirectories(Path.of(outputDir));
} catch (Exception e) {
throw new IllegalStateException("Failed to create output directory", e);
}
context = Flux2Native.initialize(modelPath);
if (context == null || context.address() == 0) {
throw new IllegalStateException("Flux2 returned a null context");
}
}
@PreDestroy
void shutdown() {
if (context != null && context.address() != 0) {
Flux2Native.free(context);
}
}
public GenerationResult generate(String prompt, int width, int height, int steps) {
String filename = UUID.randomUUID() + ".png";
String path = Path.of(outputDir, filename).toString();
int result = Flux2Native.generate(context, prompt, path, width, height, steps);
if (result != 0) {
throw new IllegalStateException("Flux2 failed with error code " + result);
}
return new GenerationResult(filename, path);
}
public record GenerationResult(String filename, String fullPath) {
}
}
This design assumes single-threaded access to the native context. If concurrency is required, a bounded pool of contexts is the correct next step.
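Until such a pool exists, the simplest safe option is to serialize calls into the native context explicitly. A sketch of that guard inside Flux2Service, reusing the method body from above:
import java.util.concurrent.locks.ReentrantLock;

// Serializes access to the single native context; callers beyond the first
// wait here instead of racing into the C library concurrently.
private final ReentrantLock nativeLock = new ReentrantLock();

public GenerationResult generate(String prompt, int width, int height, int steps) {
    nativeLock.lock();
    try {
        String filename = UUID.randomUUID() + ".png";
        String path = Path.of(outputDir, filename).toString();
        int result = Flux2Native.generate(context, prompt, path, width, height, steps);
        if (result != 0) {
            throw new IllegalStateException("Flux2 failed with error code " + result);
        }
        return new GenerationResult(filename, path);
    } finally {
        nativeLock.unlock();
    }
}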
GenerateRequest: validating input before native execution
Native image generation does not tolerate sloppy input. If invalid dimensions or empty prompts reach the C layer, you do not get a clean exception — you get undefined behavior.
For that reason, request validation must happen before calling Flux2Service.
Create src/main/java/com/example/GenerateRequest.java:
package com.example;
public record GenerateRequest(
String prompt,
Integer width,
Integer height,
Integer steps
) {
public GenerateRequest {
// Default values
if (width == null) {
width = 512;
}
if (height == null) {
height = 512;
}
if (steps == null) {
steps = 20;
}
// Validation
if (prompt == null || prompt.isBlank()) {
throw new IllegalArgumentException("Prompt must not be empty");
}
if (width < 256 || width > 2048) {
throw new IllegalArgumentException("Width must be between 256 and 2048");
}
if (height < 256 || height > 2048) {
throw new IllegalArgumentException("Height must be between 256 and 2048");
}
if (steps < 1 || steps > 100) {
throw new IllegalArgumentException("Steps must be between 1 and 100");
}
}
}
This record serves three purposes:
Boundary enforcement
All untrusted input is normalized and validated before it ever reaches the FFM layer. Native code is never asked to defend itself against bad parameters.
Stable defaults
The service and native bindings can assume that width, height, and steps are always present and within expected bounds. No defensive checks are needed downstream.
Clear ownership of failure
If validation fails, the error is clearly an API contract violation, not a native execution problem. This keeps error handling predictable and debuggable.
Why validation is not done in the native layer
Flux2 is optimized C code. It assumes inputs are sane. Adding defensive checks there would complicate the code and still not protect the JVM from misuse.
In a Java + FFM setup, the rule of thumb is simple:
Java validates. Native code executes.
Once a call crosses the FFM boundary, it must already be correct.
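One practical consequence: the record's canonical constructor throws IllegalArgumentException, and depending on how your JSON provider surfaces constructor failures these may come back as 500s. A small JAX-RS exception mapper turns them into 400 responses. This is a sketch: the class name is mine, and you may need to adapt it if your JSON library wraps the exception.
package com.example;

import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.ext.ExceptionMapper;
import jakarta.ws.rs.ext.Provider;

// Maps validation failures from GenerateRequest to HTTP 400 instead of 500.
@Provider
public class ValidationExceptionMapper implements ExceptionMapper<IllegalArgumentException> {

    @Override
    public Response toResponse(IllegalArgumentException e) {
        return Response.status(Response.Status.BAD_REQUEST)
                .entity(new ErrorBody(e.getMessage()))
                .build();
    }

    record ErrorBody(String error) {
    }
}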
REST API
The REST layer remains intentionally simple. Validation happens in the GenerateRequest constructor during deserialization, so the resource only delegates to the service.
package com.example;
import java.nio.file.Files;
import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;
@Path("/api/generate")
public class ImageGeneratorResource {
@Inject
Flux2Service service;
@POST
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
public Response generate(GenerateRequest request) {
var result = service.generate(
request.prompt(),
request.width(),
request.height(),
request.steps());
return Response.ok(
new GenerateResponse(
result.filename(),
"/api/generate/image/" + result.filename(),
"success"))
.build();
}
@GET
@Path("/image/{filename}")
@Produces("image/png")
public Response image(@PathParam("filename") String filename) {
// Reject anything that is not a plain file name (prevents path traversal).
if (filename.contains("..") || filename.contains("/") || filename.contains("\\")) {
    return Response.status(Response.Status.BAD_REQUEST).build();
}
java.nio.file.Path path = java.nio.file.Path.of(service.outputDir, filename);
if (!Files.exists(path)) {
return Response.status(Response.Status.NOT_FOUND).build();
}
return Response.ok(path.toFile()).build();
}
record GenerateResponse(String filename, String url, String status) {
}
}
Production considerations
Load behavior
Each request performs synchronous native inference. Requests queue on the HTTP worker threads. This avoids unsafe concurrent access to the native context and makes throughput predictable.
Failure modes
A native segmentation fault terminates the JVM immediately. This is expected behavior. If the process survives a native memory violation, the system is already corrupted.
Platform considerations
CPU-based inference on ARM is viable for many workloads when the model and resolution are chosen carefully. This setup makes those trade-offs explicit instead of hiding them behind opaque services.
Verification
Run the application:
mvn quarkus:dev
Generate an image:
curl -X POST http://localhost:8080/api/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "A quiet mountain landscape at sunset",
"width": 768,
"height": 512,
"steps": 25
}'
You can see the processing happening in the logfile:
Params: 768x512, steps=25, guidance=3.50, seed=1768930717244
Qwen3 tokenizer loaded (151669 vocab)......Using bf16 weights for GPU acceleration
Denoising timing breakdown:
Step 1: 12839.6 ms
Step 2: 6143.4 ms
Step 3: 6025.7 ms
Step 4: 6032.5 ms
Step 5: 6034.5 ms
Step 6: 6060.5 ms
Step 7: 6034.5 ms
Step 8: 6058.9 ms
Step 9: 6071.5 ms
Step 10: 6103.8 ms
Step 11: 6062.9 ms
Step 12: 6127.1 ms
Step 13: 6080.2 ms
Step 14: 6079.9 ms
Step 15: 6115.5 ms
Step 16: 6123.1 ms
Step 17: 6105.0 ms
Step 18: 6068.6 ms
Step 19: 6088.5 ms
Step 20: 6108.0 ms
Step 21: 6087.4 ms
Step 22: 5872.8 ms
Step 23: 5999.1 ms
Step 24: 6089.8 ms
Step 25: 6086.0 ms
Total denoising: 158498.8 ms (158.50 s)
Transformer breakdown:
Double blocks: 34578.5 ms (21.8%)
Single blocks: 123630.7 ms (78.1%)
Final layer: 84.2 ms (0.1%)
Total: 158293.5 ms
And you get the following answer:
{"filename":"<filename>","url":"/api/generate/image/<filename>.png","status":"success"}% Retrieve the result:
curl http://localhost:8080/api/generate/image/<filename>.png -o image.png
WOW! This took a while, but it finally worked. And all through Java!
Conclusion
This tutorial demonstrates how to integrate a native image generation model directly into a Java application using Quarkus and the Java FFM API. The result is a local inference service with explicit memory ownership, predictable behavior, and minimal infrastructure dependencies.
For Java developers interested in local AI workloads, this approach provides a viable alternative to external inference stacks and service-based integrations.




