Your AI Model Isn’t Wrong — Your Tokenizer Is
A hands-on Java and Quarkus tutorial on building an emotion detection API and uncovering the most common silent failure in NLP systems
I have two teenagers. They are smart, funny, opinionated, and endlessly impressive human beings. They are also living proof that text is one of the worst mediums we have for understanding emotional state.
A message like “ok” can mean agreement, exhaustion, irritation, or a silent escalation. “Sure.” might be genuine cooperation or passive resistance. And “I’m fine” almost never means fine. If you’ve ever stared at your phone trying to decide whether to step in, wait, or brace for impact, you know the feeling.
The difficulty isn’t teenage moodiness. It’s ambiguity. The signal is weak, the context is missing, and the cost of misreading it is real. Push when you should pause and you escalate. Ignore subtle warning signs and you miss the moment entirely. Over time, you build instincts. Pattern recognition. A mental model of tone that goes far beyond the literal words.
That instinct is learned. It’s contextual. And it’s fragile.
The Same Problem Exists in Enterprise Software
Now replace family chat with a business inbox.
Customer support tickets. Slack messages during an incident. Sales emails. Feedback forms. Postmortems written under stress. Every system we build treats text as neutral, but everyone who has worked in operations knows tone matters long before facts do.
A calm sentence can hide deep frustration. A polite reply can signal resignation. A short message during an outage might mean panic, not efficiency. Systems miss this constantly, not because teams don’t care, but because software is blind to subtext.
The consequences aren’t emotional. They’re operational.
Tickets get routed incorrectly. Escalations happen too late. Customers churn not because of bugs, but because they felt unheard. Automation pipelines handle human language like structured data and quietly get it wrong.
We’ve built highly sophisticated systems that still read text like it’s 1999.
Why This Is an Engineering Problem
Humans read tone because we carry context. Machines don’t, unless we teach them how to approximate it.
Modern language models don’t “understand emotions,” but they are extremely good at recognizing statistical patterns in how people express them. Word choice, punctuation, contractions, casing, rhythm. These patterns correlate strongly with emotional states.
But there’s a trap.
If you feed these models sloppy inputs, they return confident nonsense. If you cut corners on text preparation, the system looks fine until it matters. That’s how AI systems fail quietly.
This tutorial exists to avoid that failure.
What We’re Building
We’re building a small but real service that answers a simple question:
What is the emotional vibe of this text?
It’s a REST API built with Quarkus, running a RoBERTa-based emotion model locally using ONNX Runtime. We’ll start with a simplified tokenizer to understand the mechanics, then deliberately replace it with a production-grade tokenizer so readers see exactly where correctness lives.
This is not a demo that pretends AI is magic. It’s an engineering system.
Prerequisites
You need Java 21 or newer, Maven, and a machine with at least 2 GB of RAM. No ML background is required, but you do need curiosity and patience.
Bootstrapping the Project
We start with a clean Quarkus REST application. You can either follow along or grab a copy from my GitHub repository.
quarkus create app com.vibecheck:vibe-api \
--extension=quarkus-rest-jackson
cd vibe-api
This gives us a fast-startup REST stack with Jackson support. The goal is clarity around inference, not complexity.
Adding ONNX Runtime
ONNX Runtime is the engine that executes our model inside the JVM.
Add this dependency:
<dependency>
<groupId>com.microsoft.onnxruntime</groupId>
<artifactId>onnxruntime</artifactId>
<version>1.23.2</version>
</dependency>ONNX exists to decouple training from execution. The model we’ll run was trained in Python, exported once, and now lives inside a Java service without Python in production. That separation matters.
Downloading the Model and Tokenizer Assets
Create the model directory.
mkdir -p src/main/resources/model
cd src/main/resources/model
Download the quantized GoEmotions model.
curl -L -o model.onnx \
https://huggingface.co/SamLowe/roberta-base-go_emotions-onnx/resolve/main/onnx/model_quantized.onnx
Now download the tokenizer artifacts.
curl -L -o vocab.json \
https://huggingface.co/roberta-base/resolve/main/vocab.json
curl -L -o merges.txt \
https://huggingface.co/roberta-base/resolve/main/merges.txt
These files are not optional. If the tokenizer does not match the model exactly, your inference results are undefined. Not slightly worse. Undefined.
Why Tokenization Is the Real Model
Transformer models don’t see text. They see token IDs.
The mapping from text to IDs defines the semantic space the model operates in. RoBERTa uses byte-level Byte Pair Encoding. That means punctuation, casing, whitespace, and Unicode characters all matter.
“I’m fine” and “im fine” are not the same input.
A naive tokenizer destroys this structure. The model still runs, but it’s now operating on distorted embeddings. This is how you get confident but wrong results.
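The gap is visible before any model runs. A minimal pure-Java check (the two strings are just for illustration) shows that a single curly apostrophe already changes the raw byte sequence the byte-level BPE has to map:

```java
import java.nio.charset.StandardCharsets;

public class ByteDiff {
    public static void main(String[] args) {
        String curly = "I\u2019m fine"; // the apostrophe a phone keyboard inserts
        String plain = "im fine";      // what a rushed reply often looks like
        // U+2019 alone encodes to three bytes in UTF-8, so the byte-level
        // tokenizer sees two very different sequences for these messages.
        System.out.println(curly.getBytes(StandardCharsets.UTF_8).length); // 10
        System.out.println(plain.getBytes(StandardCharsets.UTF_8).length); // 7
    }
}
```

A tokenizer that silently drops or mangles those bytes hands the model a different sentence than the user wrote.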
We will create two different paths for tokenizers. Let’s start with a simple interface:
package com.vibecheck.tokenizer;
public interface Tokenizer {
long[] encode(String text, int maxLength);
}
and a tokenizer CDI qualifier:
package com.vibecheck.tokenizer;
import static java.lang.annotation.ElementType.FIELD;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.ElementType.PARAMETER;
import static java.lang.annotation.ElementType.TYPE;
import static java.lang.annotation.RetentionPolicy.RUNTIME;
import java.lang.annotation.Retention;
import java.lang.annotation.Target;
import jakarta.inject.Qualifier;
@Qualifier
@Retention(RUNTIME)
@Target({ METHOD, FIELD, PARAMETER, TYPE })
public @interface TokenizerType {
public enum Type {
SIMPLE, HUGGINGFACE
}
Type value();
}
Path A: A Simplified but Honest Tokenizer
We start with a simplified tokenizer for learning purposes. It preserves structure but explicitly does not claim production correctness.
Create SimpleRobertaTokenizer.java.
package com.vibecheck.tokenizer;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import com.fasterxml.jackson.databind.ObjectMapper;
import jakarta.enterprise.context.ApplicationScoped;
import com.vibecheck.tokenizer.TokenizerType.Type;
@ApplicationScoped
@TokenizerType(Type.SIMPLE)
public class SimpleRobertaTokenizer implements Tokenizer {
private static final int CLS = 0;
private static final int PAD = 1;
private static final int SEP = 2;
private final Map<String, Integer> vocab;
private final Map<String, Integer> bpeRanks;
private static final Pattern TOKEN_PATTERN = Pattern
.compile("'s|'t|'re|'ve|'m|'ll|'d|\\p{L}+|\\p{N}+|[^\\s\\p{L}\\p{N}]+|\\s+");
public SimpleRobertaTokenizer() throws IOException {
this.vocab = loadVocab();
this.bpeRanks = loadBpeRanks();
}
@Override
public long[] encode(String text, int maxLength) {
List<Integer> tokenIds = new ArrayList<>();
tokenIds.add(CLS);
List<String> tokens = tokenize(text);
List<String> bpeTokens = bpe(tokens);
for (String token : bpeTokens) {
tokenIds.add(vocab.getOrDefault(token, vocab.get("<unk>")));
}
tokenIds.add(SEP);
if (tokenIds.size() > maxLength) {
tokenIds = tokenIds.subList(0, maxLength);
tokenIds.set(maxLength - 1, SEP);
}
while (tokenIds.size() < maxLength) {
tokenIds.add(PAD);
}
return tokenIds.stream().mapToLong(Integer::longValue).toArray();
}
private List<String> tokenize(String text) {
List<String> tokens = new ArrayList<>();
var matcher = TOKEN_PATTERN.matcher(text);
while (matcher.find()) {
tokens.add(matcher.group());
}
return tokens;
}
private List<String> bpe(List<String> tokens) {
List<String> result = new ArrayList<>();
for (String token : tokens) {
if (token.trim().isEmpty())
continue;
List<String> word = new ArrayList<>();
for (char c : token.toCharArray()) {
word.add(String.valueOf(c));
}
while (word.size() > 1) {
String bestPair = null;
int bestRank = Integer.MAX_VALUE;
for (int i = 0; i < word.size() - 1; i++) {
String pair = word.get(i) + " " + word.get(i + 1);
Integer rank = bpeRanks.get(pair);
if (rank != null && rank < bestRank) {
bestRank = rank;
bestPair = pair;
}
}
if (bestPair == null)
break;
String[] parts = bestPair.split(" ");
List<String> newWord = new ArrayList<>();
int i = 0;
while (i < word.size()) {
if (i < word.size() - 1 &&
word.get(i).equals(parts[0]) &&
word.get(i + 1).equals(parts[1])) {
newWord.add(parts[0] + parts[1]);
i += 2;
} else {
newWord.add(word.get(i));
i++;
}
}
word = newWord;
}
result.addAll(word);
}
return result;
}
private Map<String, Integer> loadVocab() throws IOException {
var json = Files.readString(
Path.of("src/main/resources/model/vocab.json"),
StandardCharsets.UTF_8);
return new ObjectMapper().readValue(json,
new com.fasterxml.jackson.core.type.TypeReference<Map<String, Integer>>() {
});
}
private Map<String, Integer> loadBpeRanks() throws IOException {
var lines = Files.readAllLines(
Path.of("src/main/resources/model/merges.txt"),
StandardCharsets.UTF_8);
Map<String, Integer> ranks = new HashMap<>();
int rank = 0;
for (String line : lines) {
// skip the "#version" header and any blank trailing lines
if (line.isBlank() || line.startsWith("#"))
continue;
ranks.put(line, rank++);
}
return ranks;
}
}
This tokenizer is intentionally incomplete. It preserves structure but skips byte-level encoding. It’s good enough to demonstrate behavior differences, but it will fail silently with Unicode, emojis, and copied text.
That’s the point.
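To see the greedy merge loop from bpe() in isolation, here is a standalone sketch with a tiny hand-made rank table. The ranks below are hypothetical; the real ones come from merges.txt:

```java
import java.util.*;

public class BpeDemo {
    // Greedy lowest-rank merging, the same loop as SimpleRobertaTokenizer.bpe().
    static List<String> merge(List<String> word, Map<String, Integer> ranks) {
        while (word.size() > 1) {
            String best = null;
            int bestRank = Integer.MAX_VALUE;
            // find the adjacent pair with the lowest (earliest-learned) rank
            for (int i = 0; i < word.size() - 1; i++) {
                Integer r = ranks.get(word.get(i) + " " + word.get(i + 1));
                if (r != null && r < bestRank) {
                    bestRank = r;
                    best = word.get(i) + " " + word.get(i + 1);
                }
            }
            if (best == null) break; // no mergeable pair left
            String[] p = best.split(" ");
            List<String> merged = new ArrayList<>();
            for (int i = 0; i < word.size(); ) {
                if (i < word.size() - 1 && word.get(i).equals(p[0]) && word.get(i + 1).equals(p[1])) {
                    merged.add(p[0] + p[1]);
                    i += 2;
                } else {
                    merged.add(word.get(i++));
                }
            }
            word = merged;
        }
        return word;
    }

    public static void main(String[] args) {
        // Hypothetical ranks -- a lower rank merges first.
        Map<String, Integer> ranks = Map.of("f i", 0, "n e", 1, "fi ne", 2);
        System.out.println(merge(new ArrayList<>(List.of("f", "i", "n", "e")), ranks)); // [fine]
    }
}
```

Three merges fire in rank order: f+i, n+e, then fi+ne, collapsing four characters into one token. Run the same loop with a rank table that was built for a different model and you get different tokens, which is exactly the mismatch this tutorial is about.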
Some DTOs first
Let’s be explicit here and use two Records:
package com.vibecheck;
public record VibeRequest(String text) {
}
package com.vibecheck;
import java.util.Map;
public record VibeResponse(String text, String topEmotion, float confidence, Map<String, Float> allEmotions,
String vibeCheck) {
}
Building the Inference Service
Now we wire the tokenizer into a real inference pipeline.
package com.vibecheck;
import java.nio.FloatBuffer;
import java.nio.LongBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import com.vibecheck.tokenizer.Tokenizer;
import com.vibecheck.tokenizer.TokenizerType;
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtSession;
import jakarta.annotation.PostConstruct;
import jakarta.annotation.PreDestroy;
import jakarta.enterprise.context.ApplicationScoped;
@ApplicationScoped
public class VibeService {
private OrtEnvironment env;
private OrtSession session;
@jakarta.inject.Inject
@TokenizerType(TokenizerType.Type.SIMPLE)
Tokenizer tokenizer; // package-private: Quarkus discourages private injection fields
private static final String[] EMOTIONS = {
"admiration", "amusement", "anger", "annoyance", "approval", "caring",
"confusion", "curiosity", "desire", "disappointment", "disapproval",
"disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
"joy", "love", "nervousness", "optimism", "pride", "realization", "relief",
"remorse", "sadness", "surprise", "neutral"
};
@PostConstruct
void init() throws Exception {
env = OrtEnvironment.getEnvironment();
// A filesystem path keeps the example simple; a packaged jar would need
// to read the model from the classpath instead.
session = env.createSession(
"src/main/resources/model/model.onnx",
new OrtSession.SessionOptions());
}
public VibeResponse checkVibe(String text) throws Exception {
long[] inputIds = tokenizer.encode(text, 128);
long[] attention = Arrays.stream(inputIds).map(i -> i == 1 ? 0 : 1).toArray();
try (
var idsTensor = OnnxTensor.createTensor(env, LongBuffer.wrap(inputIds), new long[] { 1, 128 });
var maskTensor = OnnxTensor.createTensor(env, LongBuffer.wrap(attention), new long[] { 1, 128 });
var results = session.run(Map.of(
"input_ids", idsTensor,
"attention_mask", maskTensor))) {
FloatBuffer scores = ((OnnxTensor) results.get(0)).getFloatBuffer();
return buildResponse(text, scores);
}
}
private VibeResponse buildResponse(String text, FloatBuffer scores) {
Map<String, Float> all = new HashMap<>();
float max = Float.NEGATIVE_INFINITY;
String top = "";
for (int i = 0; i < EMOTIONS.length; i++) {
// GoEmotions is multi-label, so each logit gets an independent sigmoid
// rather than a softmax across all 28 classes.
float v = sigmoid(scores.get(i));
all.put(EMOTIONS[i], v);
if (v > max) {
max = v;
top = EMOTIONS[i];
}
}
return new VibeResponse(text, top, max, all, vibe(top, max));
}
private float sigmoid(float x) {
return (float) (1 / (1 + Math.exp(-x)));
}
private String vibe(String emotion, float confidence) {
if (confidence < 0.5)
return "🤷 Mixed signals";
return switch (emotion) {
case "joy", "amusement", "excitement" -> "✨ Positive vibes";
case "anger", "annoyance", "disgust" -> "🔥 Not great";
case "sadness", "grief" -> "😢 Heavy mood";
case "neutral" -> "😐 Neutral";
default -> "🎭 Complex emotions";
};
}
@PreDestroy
void shutdown() throws Exception {
session.close();
env.close();
}
}
REST Endpoint
package com.vibecheck;
import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;
@Path("/vibe-check")
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
public class VibeResource {
@Inject
VibeService service;
@POST
public Response check(VibeRequest request) throws Exception {
return Response.ok(service.checkVibe(request.text())).build();
}
}
Running and Testing
quarkus dev
Then try:
curl -X POST http://localhost:8080/vibe-check \
-H "Content-Type: application/json" \
-d '{"text":"im fine"}' | jq
Result:
{
"text": "im fine",
"topEmotion": "neutral",
"confidence": 0.96150255,
"allEmotions": {
"love": 0.0026431507,
"optimism": 0.0016650114,
"annoyance": 0.0056820507,
"disapproval": 0.0029068636,
"sadness": 0.0025263545,
"nervousness": 0.00039797724,
"anger": 0.0023700353,
"disappointment": 0.0035882709,
"joy": 0.0030811338,
"relief": 0.00051024696,
"remorse": 0.00047306655,
"fear": 0.001768196,
"grief": 0.0005620608,
"surprise": 0.0013851494,
"desire": 0.0013457239,
"curiosity": 0.0011385276,
"approval": 0.017554253,
"neutral": 0.96150255,
"excitement": 0.0037220959,
"embarrassment": 0.00088763324,
"realization": 0.00573335,
"admiration": 0.007823096,
"caring": 0.0009268841,
"gratitude": 0.0014208292,
"pride": 0.0006876686,
"disgust": 0.0029038852,
"confusion": 0.0018910671,
"amusement": 0.0019244894
},
"vibeCheck": "😐 Neutral"
}
Now repeat with:
curl -X POST http://localhost:8080/vibe-check \
-H "Content-Type: application/json" \
-d '{"text":"I’m fine."}' | jq
Result:
{
"text": "I’m fine.",
"topEmotion": "approval",
"confidence": 0.6140154,
"allEmotions": {
"love": 0.0023572529,
"optimism": 0.009997365,
"annoyance": 0.005406595,
"disapproval": 0.0044844854,
"sadness": 0.0021254062,
"nervousness": 0.0009044541,
"anger": 0.001010634,
"disappointment": 0.0013499215,
"joy": 0.010682897,
"relief": 0.009010835,
"remorse": 0.0005849007,
"fear": 0.0009088966,
"grief": 0.0004701472,
"surprise": 0.00068676873,
"desire": 0.0016231633,
"curiosity": 0.0011422085,
"approval": 0.6140154,
"neutral": 0.29402748,
"excitement": 0.0030360494,
"embarrassment": 0.00043324908,
"realization": 0.012291717,
"admiration": 0.0073720026,
"caring": 0.021164028,
"gratitude": 0.0029004868,
"pride": 0.0017802336,
"disgust": 0.0008796263,
"confusion": 0.0013001784,
"amusement": 0.00076951954
},
"vibeCheck": "🎭 Complex emotions"
}
The differences you see are the entire point of this tutorial.
Leveling Up: The Production Tokenizer
When this system matters, you don’t maintain tokenization yourself.
Add:
<dependency>
<groupId>ai.djl.huggingface</groupId>
<artifactId>tokenizers</artifactId>
<version>0.36.0</version>
</dependency>Then create a HuggingFaceTokenizer.java:
package com.vibecheck.tokenizer;
import java.io.IOException;
import java.util.Arrays;
import com.vibecheck.tokenizer.TokenizerType.Type;
import jakarta.enterprise.context.ApplicationScoped;
@ApplicationScoped
@TokenizerType(Type.HUGGINGFACE)
public class HuggingFaceTokenizer implements Tokenizer {
private final ai.djl.huggingface.tokenizers.HuggingFaceTokenizer tokenizer;
private static final int PAD_TOKEN_ID = 1; // RoBERTa convention
public HuggingFaceTokenizer() {
try {
this.tokenizer = ai.djl.huggingface.tokenizers.HuggingFaceTokenizer.builder()
.optTokenizerName("roberta-base")
.build();
} catch (IOException e) {
throw new RuntimeException("Failed to load tokenizer: roberta-base", e);
}
}
public static HuggingFaceTokenizer newInstance(String modelId) {
try {
ai.djl.huggingface.tokenizers.HuggingFaceTokenizer tokenizer = ai.djl.huggingface.tokenizers.HuggingFaceTokenizer
.builder()
.optTokenizerName(modelId)
.build();
return new HuggingFaceTokenizer(tokenizer);
} catch (IOException e) {
throw new RuntimeException("Failed to load tokenizer: " + modelId, e);
}
}
private HuggingFaceTokenizer(ai.djl.huggingface.tokenizers.HuggingFaceTokenizer tokenizer) {
this.tokenizer = tokenizer;
}
@Override
public long[] encode(String text, int maxLength) {
ai.djl.huggingface.tokenizers.Encoding djlEncoding = tokenizer.encode(text);
long[] ids = djlEncoding.getIds();
if (ids.length > maxLength) {
long[] truncated = Arrays.copyOf(ids, maxLength); // truncate
truncated[maxLength - 1] = 2; // keep the </s> separator, as the simple tokenizer does
return truncated;
} else if (ids.length < maxLength) {
long[] padded = Arrays.copyOf(ids, maxLength);
Arrays.fill(padded, ids.length, maxLength, PAD_TOKEN_ID); // pad
return padded;
}
return ids;
}
public Encoding encode(String text) {
ai.djl.huggingface.tokenizers.Encoding djlEncoding = tokenizer.encode(text);
return new Encoding(djlEncoding);
}
public static class Encoding {
private final ai.djl.huggingface.tokenizers.Encoding delegate;
Encoding(ai.djl.huggingface.tokenizers.Encoding delegate) {
this.delegate = delegate;
}
public long[] getIds() {
return delegate.getIds();
}
public long[] getAttentionMask() {
return delegate.getAttentionMask();
}
}
And switch the qualifier in VibeService to
@TokenizerType(TokenizerType.Type.HUGGINGFACE)
Nothing else changes. The system gets more truthful overnight.
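If you want to sanity-check that padding and masking still line up after the swap, the mask rule from VibeService can be exercised on its own. The token ids below are made up; only the PAD id of 1 is the real RoBERTa convention:

```java
import java.util.Arrays;

public class MaskDemo {
    // Same rule as VibeService: PAD (id 1) gets mask 0, everything else mask 1.
    static long[] attentionMask(long[] inputIds) {
        return Arrays.stream(inputIds).map(id -> id == 1 ? 0 : 1).toArray();
    }

    public static void main(String[] args) {
        // Hypothetical ids: <s>=0, two word tokens, </s>=2, then <pad>=1 repeated.
        long[] inputIds = { 0, 713, 16, 2, 1, 1, 1, 1 };
        System.out.println(Arrays.toString(attentionMask(inputIds))); // [1, 1, 1, 1, 0, 0, 0, 0]
    }
}
```

The mask is what tells the model to ignore padding, so whichever tokenizer produces the ids, every PAD position must map to 0, including the edge case where `<s>` has id 0 but still gets mask 1.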
Why All This Is Important
When this API returns a label and a confidence score, it’s tempting to treat the problem as solved. The model ran, the numbers look reasonable, and the system behaves predictably under load. From a purely technical perspective, that’s often where teams stop.
But the reason this work matters has very little to do with whether the code compiles or the inference latency is low.
When I think about my kids and those short, ambiguous messages that land on my phone, what’s hard isn’t the lack of information. It’s the absence of context. As a parent, I don’t react to the word “fine” in isolation. I react based on history, tone, timing, and everything I know about the person on the other end. Over time, I’ve learned when “ok” means genuinely ok and when it means “please don’t push right now.”
Software systems don’t have that luxury. All they get is the text.
That’s why details like tokenization matter so much more than they appear to at first glance. They are the difference between a system that at least approximates context and one that confidently misreads it. A bad tokenizer doesn’t crash your service. It quietly erodes trust by nudging decisions in the wrong direction, one message at a time.
In business systems, the cost of that erosion shows up slowly. A support ticket escalated too late. A frustrated customer who stops replying. A team that feels unheard because their tone never made it through the pipeline. These failures rarely trigger alerts, but they accumulate.
What we built here won’t understand people the way a parent does, and it shouldn’t pretend to. But it can give software better instincts. It can help systems pause when something feels off, route messages with more care, and surface signals that deserve human attention sooner rather than later.
That’s the real point of this exercise.
Not to replace judgment, but to support it. Not to automate empathy, but to stop throwing it away at the first parsing step.
And if this API helps a system respond a little more thoughtfully, whether to a customer, a colleague, or a teenager on the other end of a text message, then the extra care we took along the way was worth it.