Local Image Generation in Java: Quarkus, FFM, and Native AI
A deep, hands-on guide to running modern image models inside the JVM using Project Panama and native shared libraries.
My interest in image generation started, like for many developers, with tools such as Midjourney. The quality jump over the last few years has been obvious, and the creative potential is undeniable. From a technical perspective, however, most of these systems are black boxes. You send a prompt over the network, wait, and get an image back. The internals remain opaque.
Trying to reproduce similar capabilities locally quickly exposes the real problem. Modern image generation stacks are complex. They often assume Python-first workflows, CUDA-capable GPUs, and large dependency graphs that are hard to reason about and harder to operate long-term. Even when local inference is possible, setup tends to be fragile, poorly documented, and tightly coupled to specific hardware configurations.
This becomes more obvious on non-x86 systems. On Apple Silicon and other ARM-based machines, many inference paths are either unavailable or second-class. GPU acceleration is inconsistent, and CPU-based inference is often dismissed as impractical, even when the workload characteristics would allow it. As a result, there are missed opportunities for local, predictable inference on hardware that developers already use every day.
From a Java perspective, the situation is even more constrained. Native image models are typically treated as external services, accessed through REST APIs or sidecar processes. This adds latency, operational overhead, and another failure boundary. More importantly, it prevents the JVM from having any real ownership over model lifecycle, memory usage, or concurrency behavior.
This tutorial explores a different approach. We integrate a native image generation model directly into a Java application using Quarkus and the Java Foreign Function and Memory API from Project Panama. The focus is on making native inference a first-class part of a Java service, with explicit memory management and well-defined boundaries.
Prerequisites
This setup crosses the Java and native boundary. You need a working toolchain on both sides.
Java 25 (FFM is preview in 21 and stable from 22)
jextract version compatible with your selected Java version (!)
Apache Maven 3.8 or newer
GCC or compatible C compiler
Basic understanding of C pointers and memory ownership
Eight gigabytes of RAM or more for image generation
Approximately forty-five minutes of focused time
Project setup
We build a Quarkus-based REST service that loads a native Flux2 model at startup and exposes a synchronous image generation endpoint. The service runs entirely locally and performs inference in-process.
Create the project or follow along with the source-code repository:
mvn io.quarkus:quarkus-maven-plugin:create \
-DprojectGroupId=com.example \
-DprojectArtifactId=flux2-quarkus \
-DnoCode=true \
-DjavaVersion=25
cd flux2-quarkus
Enable preview features in pom.xml:
<build>
<plugins>
<plugin>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-maven-plugin</artifactId>
<configuration>
<compilerArgs>
<arg>--enable-preview</arg>
</compilerArgs>
</configuration>
</plugin>
</plugins>
</build>
Without this flag, the FFM API compiles but fails at runtime.
One small tweak in application.properties: point it to your model directory. We will download the models into it later:
# Flux2Native Configuration
flux.model.path=/Users/youruser/flux2-quarkus/src/main/resources/models
Building the native Flux2 library
flux2.c is a compact, pure C implementation of FLUX.2-klein-4B image generation. Before Java can call it, the native code must be compiled as a shared library.
The project documentation recommends building the Apple Silicon backend using:
make mps
This produces a fast, native executable intended for interactive or batch use from the command line. For Java FFM integration, however, this build target is not directly usable. The Foreign Function and Memory API requires a different artifact and a different mental model.
Why FFM requires a shared library
Java’s Foreign Function and Memory API allows Java code to call native C functions inside the same process as the JVM. There is no process boundary and no command-line invocation. The JVM loads native machine code into its own address space and invokes functions via stable ABI symbols.
To do this, the JVM must load a shared library:
.dylib on macOS
.so on Linux
Executables cannot be loaded this way. They are designed to be launched by the operating system as independent processes with their own main() function, stack, and lifecycle.
Here is how the three common native artifact types compare:
Executables (./flux)
These are entry-point driven programs. They start, parse arguments, perform work, and exit. Java would have to invoke them using ProcessBuilder, serialize data through files or pipes, and reload the model on every invocation.
Static libraries (.a)
These are linked at compile time into another native binary. Java does not perform native linking during compilation, so static libraries cannot be consumed by the JVM.
Shared libraries (.dylib, .so)
These are explicitly designed to be loaded dynamically by a running process. The JVM can resolve symbols, call functions, and keep native state alive for as long as the process runs.
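Under the hood, this is exactly what the jextract-generated bindings will do for us later. A minimal sketch of the mechanism, assuming a libflux.dylib in the working directory and the flux_wrapper_init function we define further down; the model path is illustrative:
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class ManualBinding {
    public static void main(String[] args) throws Throwable {
        // Load the shared library into the JVM's own address space.
        Arena libraryArena = Arena.ofShared();
        SymbolLookup flux = SymbolLookup.libraryLookup("libflux.dylib", libraryArena);

        // Resolve a symbol and bind it to a Java MethodHandle.
        // C signature: flux_ctx* flux_wrapper_init(const char* model_path)
        MethodHandle init = Linker.nativeLinker().downcallHandle(
                flux.find("flux_wrapper_init").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.ADDRESS, ValueLayout.ADDRESS));

        try (Arena call = Arena.ofConfined()) {
            MemorySegment ctx = (MemorySegment) init.invoke(call.allocateFrom("/path/to/models"));
            System.out.println("native context: " + ctx);
        }
    }
}
jextract generates this plumbing for every function in the header, which is why we only hand it a small wrapper header later.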
How this differs from make mps
The make mps target produces a standalone CLI executable. Conceptually, it looks like this:
Contains a main() function
Parses command-line arguments
Loads the model
Generates one image
Terminates
This is ideal for experimentation or scripting, but it forces a very different execution model. Every invocation is isolated. The model is loaded and unloaded repeatedly. Communication happens through files or standard streams. Error handling is coarse-grained.
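For comparison, driving the CLI build from Java would look roughly like this. This is a sketch only; the flag names are illustrative and not the actual flux2.c command-line interface:
import java.io.IOException;
import java.nio.file.Path;

public class CliInvocation {
    // Every call pays the full model-load cost and exchanges data through files.
    static Path generateViaProcess(String prompt, Path output) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                "./flux",                      // standalone executable from `make mps`
                "--prompt", prompt,            // illustrative flag, not the real flux2.c CLI
                "--output", output.toString()) // illustrative flag, not the real flux2.c CLI
                .inheritIO()
                .start();
        if (process.waitFor() != 0) {
            throw new IOException("flux exited with code " + process.exitValue());
        }
        return output;
    }
}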
In contrast, the shared library build produces something fundamentally different:
No main() function
Exposes callable functions such as flux2_init, flux2_generate, and flux2_free
Can be loaded once and reused for the lifetime of the JVM
Keeps model weights resident in memory
Allows direct parameter passing without intermediate serialization
In this setup, the Java application becomes the “main” program. Quarkus controls startup, shutdown, concurrency, and resource management. Native code becomes an implementation detail rather than the controlling process.
What we are building instead
Rather than invoking Flux2 as a program, we compile it into a shared library:
git clone https://github.com/antirez/flux2.c.git
cd flux2.c
clang -dynamiclib -o libflux.dylib \
flux.c flux_kernels.c flux_tokenizer.c flux_vae.c \
flux_transformer.c flux_sample.c flux_image.c \
flux_safetensors.c flux_qwen3.c flux_qwen3_tokenizer.c \
flux_metal.m \
-Wall -Wextra -O3 -march=native -ffast-math \
-DUSE_BLAS -DUSE_METAL -DACCELERATE_NEW_LAPACK \
-fobjc-arc \
-framework Accelerate -framework Metal -framework MetalPerformanceShaders -framework Foundation
Defining a stable native boundary
Rather than binding against the entire implementation, we are going to define a minimal header that expresses only what Java needs. Using flux.h directly with Java FFM is suboptimal because:
Missing Initialization: flux.h does not expose flux_metal_init(), which is required for GPU acceleration (it’s in flux_metal.h).
Struct Complexity: Passing flux_params by value or pointer requires mirroring that struct’s layout in Java. A flat API is much easier to bind.
Memory Management: flux_generate returns a flux_image* that must be freed manually. A “fire-and-forget” function that saves directly to disk means fewer FFM calls and no image pointer to manage from Java.
Split Headers: Important globals like flux_step_callback are hidden in flux_kernels.h.
Create: flux_wrapper.h
#ifndef FLUX_WRAPPER_H
#define FLUX_WRAPPER_H
#ifdef __cplusplus
extern "C" {
#endif
typedef struct flux_ctx flux_ctx;
// 1. Initialize Metal (if compiled with it) AND load the model
flux_ctx* flux_wrapper_init(const char* model_path);
// 2. Generate and save directly to file (simplest for Java)
// Returns 0 on success, non-zero on failure.
int flux_wrapper_generate(
flux_ctx* ctx,
const char* prompt,
const char* output_path,
int width,
int height,
int steps,
float guidance,
long seed
);
// 3. Cleanup
void flux_wrapper_free(flux_ctx* ctx);
#ifdef __cplusplus
}
#endif
#endif
Create: flux_wrapper.c
#include "flux_wrapper.h"
#include "flux.h"
#include "flux_metal.h" // Needed for flux_metal_init
#include <stddef.h>
flux_ctx* flux_wrapper_init(const char* model_path) {
#ifdef USE_METAL
flux_metal_init();
#endif
return flux_load_dir(model_path);
}
int flux_wrapper_generate(
flux_ctx* ctx,
const char* prompt,
const char* output_path,
int width,
int height,
int steps,
float guidance,
long seed
) {
if (!ctx || !prompt || !output_path) return -1;
flux_params params = FLUX_PARAMS_DEFAULT;
params.width = width;
params.height = height;
params.num_steps = steps;
params.guidance_scale = guidance;
params.seed = seed;
flux_image* img = flux_generate(ctx, prompt, &params);
if (!img) return -2;
int res = flux_image_save(img, output_path);
flux_image_free(img);
return res;
}
void flux_wrapper_free(flux_ctx* ctx) {
flux_free(ctx);
// Note: flux_metal_cleanup() could be called here if you want strict cleanup
}
This isolates all the C-specific headers (flux.h, flux_metal.h) from your Java world. You only need to generate bindings for flux_wrapper.h.
While we are here, make sure to also build the Metal shaders:
# download toolchain if you have not (>700MB)
xcodebuild -downloadComponent MetalToolchain
# Build the metal shaders
xcrun -sdk macosx metal -o default.metallib flux_shaders.metal
There is a small hiccup in flux_metal.m: it assumes the application bundle contains the shader resource. In a Java environment there is no such bundle, the lookup returns nil, and the array creation crashes. So we need to fix this.
File: flux2.c/flux_metal.m, lines 1506-1510. Replace the shader lookup with this nil-safe version:
/* Try to find the shader file in various locations */
NSString *shaderPath = nil;
NSMutableArray *searchPaths = [NSMutableArray arrayWithObjects:
@"flux_shaders.metal",
@"./flux_shaders.metal",
nil];
NSString *bundlePath = [[NSBundle mainBundle] pathForResource:@"flux_shaders" ofType:@"metal"];
if (bundlePath) {
[searchPaths addObject:bundlePath];
}
Now we need to recompile the dynamic library. The build command is the same as before, extended to include flux_wrapper.c alongside the other source files:
clang -dynamiclib -o libflux.dylib \
flux_wrapper.c flux.c flux_kernels.c flux_tokenizer.c flux_vae.c \
flux_transformer.c flux_sample.c flux_image.c \
flux_safetensors.c flux_qwen3.c flux_qwen3_tokenizer.c \
flux_metal.m \
-Wall -Wextra -O3 -march=native -ffast-math \
-DUSE_BLAS -DUSE_METAL -DACCELERATE_NEW_LAPACK \
-fobjc-arc \
-framework Accelerate -framework Metal -framework MetalPerformanceShaders -framework Foundation
And now, finally, back to Java. Let’s generate the bindings with jextract:
jextract --output src \
--target-package com.forest.flux \
--library flux \
--header-class-name FluxLib \
flux_wrapper.h
The resulting bindings land in src/com/forest/flux/FluxLib.java (plus related files). Copy them into the Quarkus project you created earlier.
⚠️ I have written before about the challenges of macOS library loading and how I patch demos together to work around them.
⚠️ Because of this, make sure to patch the library loading accordingly so that it points to your libflux.dylib (in FluxLib, roughly around line 23ff).
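One way to do this, as a sketch: the exact shape of the generated class depends on your jextract version, and the path below is an assumption you must adjust to your build output.
// Inside the generated FluxLib class (structure varies by jextract version).
static final SymbolLookup SYMBOL_LOOKUP;

static {
    // Load the dylib explicitly from an absolute path instead of relying on java.library.path.
    System.load("/Users/youruser/flux2-quarkus/libflux.dylib"); // adjust to where you built libflux.dylib
    SYMBOL_LOOKUP = SymbolLookup.loaderLookup()
            .or(Linker.nativeLinker().defaultLookup());
}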
Downloading the model files
The next big step is to download the relevant model files. You can either do this via the download_model.py script or use the following little bash script:
#!/bin/bash
# Target directory in your Quarkus project
TARGET_DIR="/Users/meisele/Projects/the-main-thread/flux2-quarkus/src/main/resources/models"
# Or use relative path if running from project root:
# TARGET_DIR="./src/main/resources/models"
REPO_URL="https://huggingface.co/black-forest-labs/FLUX.2-klein-4B/resolve/main"
echo "Downloading FLUX.2 Klein models to $TARGET_DIR..."
# Create directories
mkdir -p "$TARGET_DIR/vae"
mkdir -p "$TARGET_DIR/transformer"
mkdir -p "$TARGET_DIR/text_encoder"
mkdir -p "$TARGET_DIR/tokenizer"
mkdir -p "$TARGET_DIR/scheduler"
# Helper function
download_file() {
local path=$1
local url="$REPO_URL/$path"
local output="$TARGET_DIR/$path"
if [ -f "$output" ]; then
echo " [SKIP] $path (already exists)"
else
echo " [DOWN] $path"
# -L follows redirects, -f fails on 404/server errors
curl -L -f -o "$output" "$url"
if [ $? -ne 0 ]; then
echo " [ERROR] Failed to download $path"
rm -f "$output"
exit 1
fi
fi
}
echo "1. Downloading VAE..."
download_file "vae/diffusion_pytorch_model.safetensors"
download_file "vae/config.json"
echo "2. Downloading Transformer..."
download_file "transformer/diffusion_pytorch_model.safetensors"
download_file "transformer/config.json"
echo "3. Downloading Text Encoder (Qwen3-4B)..."
# Confirmed: This model uses sharded weights
download_file "text_encoder/model-00001-of-00002.safetensors"
download_file "text_encoder/model-00002-of-00002.safetensors"
download_file "text_encoder/config.json"
echo "4. Downloading Tokenizer..."
download_file "tokenizer/tokenizer.json"
download_file "tokenizer/tokenizer_config.json"
download_file "tokenizer/vocab.json"
download_file "tokenizer/merges.txt"
download_file "tokenizer/special_tokens_map.json"
echo "5. Downloading Configs..."
download_file "model_index.json"
echo ""
echo "Success! Models ready in $TARGET_DIR"⚠️ Warning: it downloads ~16GB!
Implementing the Java FFM binding
The FFM layer is where most failures occur if boundaries are unclear. Every allocation, pointer, and lifetime decision must be explicit.
package com.example;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import com.forest.flux.FluxLib;
// Thin static facade over the jextract-generated FluxLib bindings.
// Configuration (model path, output dir) is injected in the service layer.
public class Flux2Native {
public static MemorySegment initialize(String modelPath) {
try (Arena arena = Arena.ofConfined()) {
MemorySegment modelPathNative = arena.allocateFrom(modelPath);
return FluxLib.flux_wrapper_init(modelPathNative);
}
}
public static int generate(MemorySegment context, String prompt, String path, int width, int height, int steps) {
float guidance = 3.5f;
long seed = System.currentTimeMillis();
System.out.printf("Params: %dx%d, steps=%d, guidance=%.2f, seed=%d%n", width, height, steps, guidance, seed);
try (Arena arena = Arena.ofConfined()) {
MemorySegment promptNative = arena.allocateFrom(prompt);
MemorySegment outputPathNative = arena.allocateFrom(path);
return FluxLib.flux_wrapper_generate(context, promptNative, outputPathNative, width, height, steps,
guidance, seed);
}
}
public static void free(MemorySegment context) {
FluxLib.flux_wrapper_free(context);
}
}
Temporary strings are allocated in confined arenas and released immediately. The native context itself remains valid until explicitly freed.
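It helps to be explicit about what that context actually is on the Java side. A short sketch of the lifetime rules; the model path is illustrative:
// The context returned by flux_wrapper_init is an opaque native pointer.
// Closing the confined arena inside initialize() only frees the temporary
// UTF-8 copy of the model path; the model itself is owned by the C library.
MemorySegment ctx = Flux2Native.initialize("/path/to/models"); // illustrative path

// Pointers returned from downcalls are zero-length segments: Java can hand
// them back to native code but cannot dereference them.
assert ctx.byteSize() == 0;

// The only way to release the native model is the explicit free call.
Flux2Native.free(ctx);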
Service layer and lifecycle management
Model initialization is expensive and must not happen per request.
package com.example;
import java.lang.foreign.MemorySegment;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;
import org.eclipse.microprofile.config.inject.ConfigProperty;
import jakarta.annotation.PostConstruct;
import jakarta.annotation.PreDestroy;
import jakarta.enterprise.context.ApplicationScoped;
@ApplicationScoped
public class Flux2Service {
@ConfigProperty(name = "flux2.model.path")
String modelPath;
@ConfigProperty(name = "flux2.output.dir", defaultValue = "/tmp/flux2-output")
String outputDir;
private MemorySegment context;
@PostConstruct
void init() {
try {
Files.createDirectories(Path.of(outputDir));
} catch (Exception e) {
throw new IllegalStateException("Failed to create output directory", e);
}
context = Flux2Native.initialize(modelPath);
if (context == null || context.address() == 0) {
throw new IllegalStateException("Flux2 returned a null context");
}
}
@PreDestroy
void shutdown() {
if (context != null && context.address() != 0) {
Flux2Native.free(context);
}
}
public GenerationResult generate(String prompt, int width, int height, int steps) {
String filename = UUID.randomUUID() + ".png";
String path = Path.of(outputDir, filename).toString();
int result = Flux2Native.generate(context, prompt, path, width, height, steps);
if (result != 0) {
throw new IllegalStateException("Flux2 failed with error code " + result);
}
return new GenerationResult(filename, path);
}
public record GenerationResult(String filename, String fullPath) {
}
}
This design assumes single-threaded access to the native context. If concurrency is required, a bounded pool of contexts is the correct next step.
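Until such a pool exists, the simplest safe option is to serialize calls into the native context explicitly. A sketch of that guard inside Flux2Service, reusing the method body from above:
import java.util.concurrent.locks.ReentrantLock;

// Serializes access to the single native context; callers beyond the first
// wait here instead of racing into the C library concurrently.
private final ReentrantLock nativeLock = new ReentrantLock();

public GenerationResult generate(String prompt, int width, int height, int steps) {
    nativeLock.lock();
    try {
        String filename = UUID.randomUUID() + ".png";
        String path = Path.of(outputDir, filename).toString();
        int result = Flux2Native.generate(context, prompt, path, width, height, steps);
        if (result != 0) {
            throw new IllegalStateException("Flux2 failed with error code " + result);
        }
        return new GenerationResult(filename, path);
    } finally {
        nativeLock.unlock();
    }
}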
GenerateRequest: validating input before native execution
Native image generation does not tolerate sloppy input. If invalid dimensions or empty prompts reach the C layer, you do not get a clean exception — you get undefined behavior.
For that reason, request validation must happen before calling Flux2Service.
Create src/main/java/com/example/GenerateRequest.java:
package com.example;
public record GenerateRequest(
String prompt,
Integer width,
Integer height,
Integer steps
) {
public GenerateRequest {
// Default values
if (width == null) {
width = 512;
}
if (height == null) {
height = 512;
}
if (steps == null) {
steps = 20;
}
// Validation
if (prompt == null || prompt.isBlank()) {
throw new IllegalArgumentException("Prompt must not be empty");
}
if (width < 256 || width > 2048) {
throw new IllegalArgumentException("Width must be between 256 and 2048");
}
if (height < 256 || height > 2048) {
throw new IllegalArgumentException("Height must be between 256 and 2048");
}
if (steps < 1 || steps > 100) {
throw new IllegalArgumentException("Steps must be between 1 and 100");
}
}
}
This record serves three purposes:
Boundary enforcement
All untrusted input is normalized and validated before it ever reaches the FFM layer. Native code is never asked to defend itself against bad parameters.
Stable defaults
The service and native bindings can assume that width, height, and steps are always present and within expected bounds. No defensive checks are needed downstream.
Clear ownership of failure
If validation fails, the error is clearly an API contract violation, not a native execution problem. This keeps error handling predictable and debuggable.
Why validation is not done in the native layer
Flux2 is optimized C code. It assumes inputs are sane. Adding defensive checks there would complicate the code and still not protect the JVM from misuse.
In a Java + FFM setup, the rule of thumb is simple:
Java validates. Native code executes.
Once a call crosses the FFM boundary, it must already be correct.
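One practical consequence: the record's canonical constructor throws IllegalArgumentException, and depending on how your JSON provider surfaces constructor failures these may come back as 500s. A small JAX-RS exception mapper turns them into 400 responses. This is a sketch: the class name is mine, and you may need to adapt it if your JSON library wraps the exception.
package com.example;

import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.ext.ExceptionMapper;
import jakarta.ws.rs.ext.Provider;

// Maps validation failures from GenerateRequest to HTTP 400 instead of 500.
@Provider
public class ValidationExceptionMapper implements ExceptionMapper<IllegalArgumentException> {

    @Override
    public Response toResponse(IllegalArgumentException e) {
        return Response.status(Response.Status.BAD_REQUEST)
                .entity(new ErrorBody(e.getMessage()))
                .build();
    }

    record ErrorBody(String error) {
    }
}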
REST API
The REST layer remains intentionally simple. Validation happens in the GenerateRequest constructor during deserialization, so the resource only delegates to the service.
package com.example;
import java.nio.file.Files;
import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;
@Path("/api/generate")
public class ImageGeneratorResource {
@Inject
Flux2Service service;
@POST
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
public Response generate(GenerateRequest request) {
var result = service.generate(
request.prompt(),
request.width(),
request.height(),
request.steps());
return Response.ok(
new GenerateResponse(
result.filename(),
"/api/generate/image/" + result.filename(),
"success"))
.build();
}
@GET
@Path("/image/{filename}")
@Produces("image/png")
public Response image(@PathParam("filename") String filename) {
// Reject anything that is not a plain file name (prevents path traversal).
if (filename.contains("..") || filename.contains("/") || filename.contains("\\")) {
    return Response.status(Response.Status.BAD_REQUEST).build();
}
java.nio.file.Path path = java.nio.file.Path.of(service.outputDir, filename);
if (!Files.exists(path)) {
return Response.status(Response.Status.NOT_FOUND).build();
}
return Response.ok(path.toFile()).build();
}
record GenerateResponse(String filename, String url, String status) {
}
}
Production considerations
Load behavior
Each request performs synchronous native inference. Requests queue on the HTTP worker threads. This avoids unsafe concurrent access to the native context and makes throughput predictable.
Failure modes
A native segmentation fault terminates the JVM immediately. This is expected behavior. If the process survives a native memory violation, the system is already corrupted.
Platform considerations
CPU-based inference on ARM is viable for many workloads when the model and resolution are chosen carefully. This setup makes those trade-offs explicit instead of hiding them behind opaque services.
Verification
Run the application:
mvn quarkus:dev
Generate an image:
curl -X POST http://localhost:8080/api/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "A quiet mountain landscape at sunset",
"width": 768,
"height": 512,
"steps": 25
}'
You can see the processing happening in the logfile:
Params: 768x512, steps=25, guidance=3.50, seed=1768930717244
Qwen3 tokenizer loaded (151669 vocab)......Using bf16 weights for GPU acceleration
Denoising timing breakdown:
Step 1: 12839.6 ms
Step 2: 6143.4 ms
Step 3: 6025.7 ms
Step 4: 6032.5 ms
Step 5: 6034.5 ms
Step 6: 6060.5 ms
Step 7: 6034.5 ms
Step 8: 6058.9 ms
Step 9: 6071.5 ms
Step 10: 6103.8 ms
Step 11: 6062.9 ms
Step 12: 6127.1 ms
Step 13: 6080.2 ms
Step 14: 6079.9 ms
Step 15: 6115.5 ms
Step 16: 6123.1 ms
Step 17: 6105.0 ms
Step 18: 6068.6 ms
Step 19: 6088.5 ms
Step 20: 6108.0 ms
Step 21: 6087.4 ms
Step 22: 5872.8 ms
Step 23: 5999.1 ms
Step 24: 6089.8 ms
Step 25: 6086.0 ms
Total denoising: 158498.8 ms (158.50 s)
Transformer breakdown:
Double blocks: 34578.5 ms (21.8%)
Single blocks: 123630.7 ms (78.1%)
Final layer: 84.2 ms (0.1%)
Total: 158293.5 ms
And you get the following answer:
{"filename":"<filename>","url":"/api/generate/image/<filename>.png","status":"success"}% Retrieve the result:
curl http://localhost:8080/api/generate/image/<filename>.png -o image.png
WOW! This took a while, but it finally worked. And all through Java!
Conclusion
This tutorial demonstrates how to integrate a native image generation model directly into a Java application using Quarkus and the Java FFM API. The result is a local inference service with explicit memory ownership, predictable behavior, and minimal infrastructure dependencies.
For Java developers interested in local AI workloads, this approach provides a viable alternative to external inference stacks and service-based integrations.




