Hybrid Search in Quarkus: Full-Text and Vector Together
A hands-on Java tutorial that shows when BM25 fails, where semantic search helps, and why hybrid search is the safer production default.
Most developers add search late. You ship a text box. Maybe a LIKE query. Maybe PostgreSQL full-text when the complaints get loud.
That works until the words diverge. The user types “comfortable running shoes.” The catalog says “ergonomic athletic footwear.” The rows exist. The vocabulary does not match.
What happens next? Many teams picture a big stack: a hosted vector database, a separate search cluster, a cloud embedding API, and weeks of glue. What we build instead is leaner but still concrete: Quarkus, PostgreSQL with pgvector (catalog rows and vector columns via Hibernate ORM and Panache), Hibernate Search on Elasticsearch for lexical and kNN search in the index, and Quarkus LangChain4j with a local ONNX model so embeddings never leave the process. In dev, Quarkus Dev Services typically gives you both PostgreSQL and Elasticsearch. You still run two data stores, but not a separate search platform project on top.
We connect all three search styles in one app and keep an eye on where each one breaks. Full-text search is fast and deterministic. It struggles with synonyms and paraphrases. Vector search embeds the query and asks the index for the k closest document vectors by distance in embedding space (kNN, k-nearest neighbors). You rely on that when literal term overlap is not enough. It is still weak on product codes, short jargon, and anything that only works as an exact string match. Hybrid search mixes lexical scoring with that vector signal. You pay for embedding work on every vector or hybrid query.
Why does this matter in production? If user language and catalog language do not match, results look random. The implementation can still be “correct.” Search issues hurt because they look like bad content, bad relevance, and bad UX at the same time. Users rarely say “fix the ranker.” They stop trusting the search box.
We implement full-text, vector, and hybrid as three REST endpoints in the same service so you can compare behavior without maintaining three demos. When you finish the steps, you have a working catalog search and a simple way to pick a pattern for a given query style.
Prerequisites
You need a recent Java and Quarkus setup, and you should already be comfortable reading a Panache entity, a REST resource, and basic Hibernate annotations. We are not spending time on Java installation or IDE setup. We are using Podman-friendly Dev Services, a local embedding model, and plain PostgreSQL.
Java 21 or newer
Maven 3.9.6 or newer
Podman or Docker for Dev Services
Basic understanding of JPA and REST endpoints
Basic understanding of PostgreSQL
Project Setup
Create the project or grab the working example from my GitHub repository:
mvn io.quarkus.platform:quarkus-maven-plugin:create \
-DprojectGroupId=org.acme \
-DprojectArtifactId=product-search \
-Dextensions="hibernate-orm-panache,jdbc-postgresql,rest-jackson,quarkus-langchain4j-core,quarkus-caffeine" \
-DnoCode
cd product-search
Add the search and vector dependencies to pom.xml:
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-hibernate-search-orm-elasticsearch</artifactId>
</dependency>
<dependency>
<groupId>org.hibernate.orm</groupId>
<artifactId>hibernate-vector</artifactId>
</dependency>
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-embeddings-bge-small-en-q</artifactId>
</dependency>
hibernate-processor generates the JPA static metamodel (Product_) at compile time. Search code then uses a small ProductIndexFields class: most field names reuse the generated constants from Product_ (for example Product_.NAME), but the extra Elasticsearch sort field name_sort stays a plain string that must match @KeywordField(name = "name_sort") on Product.name. That way the REST resource does not scatter raw index paths, and renames show up when you recompile.
Add a property for the processor version (keep it aligned with Hibernate ORM in the Quarkus BOM when you upgrade the platform) and register the processor on maven-compiler-plugin:
<properties>
<!-- Keep aligned with Hibernate ORM version from quarkus-bom (see dependencyManagement). -->
<hibernate.orm.version>7.2.6.Final</hibernate.orm.version>
</properties>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>${compiler-plugin.version}</version>
<configuration>
<parameters>true</parameters>
<annotationProcessorPaths>
<path>
<groupId>org.hibernate.orm</groupId>
<artifactId>hibernate-processor</artifactId>
<version>${hibernate.orm.version}</version>
</path>
</annotationProcessorPaths>
</configuration>
</plugin>
For automated checks, add test dependencies such as quarkus-junit5 and rest-assured (test scope).
What we get from each dependency:
quarkus-hibernate-orm-panache gives us the entity model and simple persistence
quarkus-jdbc-postgresql gives us PostgreSQL connectivity and Dev Services
quarkus-rest-jackson gives us JSON REST endpoints
quarkus-hibernate-search-orm-elasticsearch gives us full-text indexing and kNN against the Elasticsearch-backed Hibernate Search index
hibernate-vector maps PostgreSQL vector columns through Hibernate ORM
io.quarkiverse.langchain4j:quarkus-langchain4j-core plus dev.langchain4j:langchain4j-embeddings-bge-small-en-q integrate LangChain4j and ship a small quantized ONNX embedding model that runs in process without remote API calls
quarkus-caffeine integrates the Caffeine in-memory cache library for CDI and configuration
hibernate-processor (provided) runs at compile time and generates the JPA static metamodel (Entity_ classes) for type-safe queries and tooling
Elasticsearch handles lexical and vector queries in the Hibernate Search layer. PostgreSQL still holds the canonical vector column for ORM persistence. Those two engines cooperate in one application.
Implementation
Put configuration first: everything that follows assumes PostgreSQL, Elasticsearch, and the in-process embedding model are wired the same in dev, test, and whatever you deploy to.
PostgreSQL only understands vector columns after the pgvector extension is installed. Dev Services runs an init script as soon as the container starts, before Hibernate ORM applies schema management, so the type exists when DDL refers to it. If you skip that ordering, table creation fails with an unknown type, not a mysterious Hibernate bug.
Hibernate Search talks to Elasticsearch over HTTP. You pin the Elasticsearch major version in configuration so the client and the index schema Hibernate Search generates match the server (here, the Elasticsearch instance Dev Services starts in dev and test). For embeddings we stay on the JVM: a packaged ONNX model runs in process, and you point application.properties at the LangChain4j EmbeddingModel implementation class so Quarkus can construct the bean the same way it would any other injectable type.
Create src/main/resources/vector-init.sql (it lives on the classpath, so init-script-path resolves it by name):
CREATE EXTENSION IF NOT EXISTS vector;
Configure src/main/resources/application.properties:
# PostgreSQL with pgvector (entity storage for vectors; kNN is served by Hibernate Search backend)
quarkus.datasource.db-kind=postgresql
quarkus.datasource.devservices.image-name=docker.io/pgvector/pgvector:pg18
quarkus.datasource.devservices.init-script-path=vector-init.sql
# Hibernate ORM
quarkus.hibernate-orm.schema-management.strategy=drop-and-create
quarkus.hibernate-orm.log.sql=false
# Hibernate Search: Elasticsearch Dev Services in dev/test (Quarkus does not ship a Lucene ORM extension)
quarkus.hibernate-search-orm.elasticsearch.version=9
quarkus.hibernate-search-orm.schema-management.strategy=drop-and-create-and-drop
quarkus.hibernate-search-orm.indexing.plan.synchronization.strategy=sync
# Local embedding model (in-process ONNX via LangChain4j)
quarkus.langchain4j.embedding-model.provider=dev.langchain4j.model.embedding.onnx.bgesmallenq.BgeSmallEnQuantizedEmbeddingModel
Together, drop-and-create on Hibernate ORM and drop-and-create-and-drop on Hibernate Search tear down and recreate PostgreSQL tables and the Elasticsearch-backed index whenever the app starts. That makes local runs repeatable and saves you from half-stale mappings while you edit entities. It also throws away data on every restart, which is wrong for a real catalog. For production, move PostgreSQL changes through migration tooling and switch Hibernate Search to something non-destructive for routine deploys, for example create-or-validate, unless you deliberately accept wiping the index on startup.
Define the entity next. Lexical fields, keyword filters, and the embedding vector all live on the same Product type.
Create src/main/java/org/acme/search/model/Product.java:
package org.acme.search.model;
import org.hibernate.annotations.Array;
import org.hibernate.annotations.JdbcTypeCode;
import org.hibernate.search.engine.backend.types.Sortable;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.KeywordField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.VectorField;
import org.hibernate.type.SqlTypes;
import com.fasterxml.jackson.annotation.JsonIgnore;
import io.quarkus.hibernate.orm.panache.PanacheEntity;
import jakarta.persistence.Column;
import jakarta.persistence.Entity;
@Entity
@Indexed
public class Product extends PanacheEntity {
@FullTextField(analyzer = "english")
@KeywordField(name = "name_sort", sortable = Sortable.YES, normalizer = "lowercase")
public String name;
@FullTextField(analyzer = "english")
@Column(columnDefinition = "text")
public String description;
@KeywordField
public String category;
@JsonIgnore
@VectorField(dimension = 384)
@JdbcTypeCode(SqlTypes.VECTOR)
@Array(length = 384)
public float[] descriptionEmbedding;
public Product() {
}
public Product(String name, String description, String category) {
this.name = name;
this.description = description;
this.category = category;
}
}
Product carries three search behaviors at once. name and description go to Elasticsearch for full-text. category is a keyword field for exact filters. descriptionEmbedding is both a PostgreSQL vector(384) column and an Elasticsearch vector field for kNN. @JsonIgnore keeps big float arrays out of JSON (the verification curl examples show descriptionEmbedding as null or omit the field).
One hard rule: vector dimension must match the embedding model. This stack uses bge-small-en-q, which outputs 384 dimensions. If you swap models and the size changes, schema and index mapping must change too.
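One way to catch a mismatch early, at startup rather than deep inside an indexing failure, is a plain guard called from a startup observer with a probe vector obtained via embeddingModel.embed(...).content().vector(). This is a sketch; the class and method names are illustrative, not part of the stack above:

```java
public final class EmbeddingDimensionGuard {

    // Must match @VectorField(dimension = ...) and @Array(length = ...) on Product.
    public static final int EXPECTED_DIMENSION = 384;

    private EmbeddingDimensionGuard() {
    }

    /**
     * Throws if a probe embedding does not have the dimension the schema
     * and the Elasticsearch index mapping were created with.
     */
    public static void requireDimension(float[] probeVector) {
        if (probeVector.length != EXPECTED_DIMENSION) {
            throw new IllegalStateException(
                    "Embedding model produced " + probeVector.length
                            + " dimensions, schema expects " + EXPECTED_DIMENSION
                            + ". Align @VectorField/@Array and reindex.");
        }
    }
}
```

Failing fast here turns a model swap into a clear startup error instead of a half-broken index.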
Add a dedicated service for embeddings on the write path. Do not hide that inside the JAX-RS resource: imports, admin tasks, and tests also create rows, and one place for embed → persist keeps behavior obvious.
Create src/main/java/org/acme/search/service/ProductService.java:
package org.acme.search.service;
import org.acme.search.model.Product;
import dev.langchain4j.model.embedding.EmbeddingModel;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import jakarta.transaction.Transactional;
@ApplicationScoped
public class ProductService {
@Inject
EmbeddingModel embeddingModel;
@Transactional
public void createProduct(String name, String description, String category) {
Product product = new Product(name, description, category);
product.descriptionEmbedding = embeddingModel
.embed(description)
.content()
.vector();
product.persist();
}
}
On each save we embed the description once and store the vector on the row before anyone searches. Reads stay cheap; writes do more work. For a catalog that pattern is normal: far more searches than inserts.
Query paths should not pay full embedding cost on every identical string. Add a small cache that wraps EmbeddingModel and returns copies of the float[] so callers cannot mutate vectors sitting in the cache.
Create src/main/java/org/acme/search/service/QueryEmbeddingService.java:
package org.acme.search.service;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import dev.langchain4j.model.embedding.EmbeddingModel;
import jakarta.annotation.PostConstruct;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
@ApplicationScoped
public class QueryEmbeddingService {
@Inject
EmbeddingModel embeddingModel;
private Cache<String, float[]> cache;
@PostConstruct
void init() {
cache = Caffeine.newBuilder()
.maximumSize(1000)
.expireAfterWrite(30, TimeUnit.MINUTES)
.build();
}
public float[] embed(String query) {
float[] stored = cache.get(query, key -> {
float[] vector = embeddingModel.embed(key).content().vector();
return Arrays.copyOf(vector, vector.length);
});
return Arrays.copyOf(stored, stored.length);
}
}
Configure how Elasticsearch tokenizes catalog text. Normalization folds text into comparable tokens (ASCII folding and lowercasing so Café and cafe are not different keys). Stemming trims suffixes so related forms share one stem (Porter in the snippet below, so running and run can hit the same postings). Without that chain, full-text is only slightly better than LIKE.
Create src/main/java/org/acme/search/config/SearchAnalysisConfig.java using the Quarkus qualifier io.quarkus.hibernate.search.orm.elasticsearch.SearchExtension:
package org.acme.search.config;
import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurationContext;
import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurer;
import io.quarkus.hibernate.search.orm.elasticsearch.SearchExtension;
@SearchExtension
public class SearchAnalysisConfig implements ElasticsearchAnalysisConfigurer {
@Override
public void configure(ElasticsearchAnalysisConfigurationContext context) {
context.analyzer("english").custom()
.tokenizer("standard")
.tokenFilters("asciifolding", "lowercase", "porter_stem");
context.normalizer("lowercase").custom()
.tokenFilters("asciifolding", "lowercase");
}
}That setup lowercases, normalizes, and runs Porter stemming. So shoes can match shoe and running can match run. It still does not know that footwear and shoes mean the same thing in the world. That is why the entity also keeps vectors.
SearchResource exposes /search/fulltext, /search/vector, and /search/hybrid next to each other and injects QueryEmbeddingService for the two vector paths.
Startup and mass indexing: do not mark the StartupEvent observer @Transactional if it ends with massIndexer().startAndWait(). When the whole observer runs in one transaction, seed inserts are not yet committed, so the mass indexer can see zero entities and build an empty index. Either drop @Transactional on the observer (each createProduct still runs in its own transaction) or reindex after commit.
Create src/main/java/org/acme/search/model/ProductIndexFields.java:
package org.acme.search.model;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.KeywordField;
/**
* Hibernate Search index field paths for {@link Product}. Property-backed names
* are delegated to
* string constants generated on {@link Product_} (Hibernate processor);
* {@link #NAME_SORT} must
* stay aligned with {@link KeywordField#name()} on {@link Product#name}.
*/
public final class ProductIndexFields {
private ProductIndexFields() {
}
public static final String NAME = Product_.NAME;
public static final String DESCRIPTION = Product_.DESCRIPTION;
public static final String CATEGORY = Product_.CATEGORY;
public static final String DESCRIPTION_EMBEDDING = Product_.DESCRIPTION_EMBEDDING;
public static final String NAME_SORT = "name_sort";
}
Create src/main/java/org/acme/search/SearchResource.java:
package org.acme.search;
import java.util.List;
import org.acme.search.model.Product;
import org.acme.search.model.ProductIndexFields;
import org.acme.search.service.ProductService;
import org.acme.search.service.QueryEmbeddingService;
import org.hibernate.search.mapper.orm.mapping.SearchMapping;
import org.hibernate.search.mapper.orm.session.SearchSession;
import org.jboss.resteasy.reactive.RestQuery;
import io.quarkus.runtime.StartupEvent;
import jakarta.enterprise.event.Observes;
import jakarta.inject.Inject;
import jakarta.transaction.Transactional;
import jakarta.ws.rs.DefaultValue;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
@Path("/search")
@Produces(MediaType.APPLICATION_JSON)
public class SearchResource {
@Inject
SearchSession searchSession;
@Inject
SearchMapping searchMapping;
@Inject
QueryEmbeddingService queryEmbeddingService;
@Inject
ProductService productService;
@GET
@Path("/fulltext")
@Transactional
public List<Product> fulltext(@RestQuery String q, @RestQuery @DefaultValue("10") int size) {
return searchSession.search(Product.class)
.where(f -> q == null || q.isBlank()
? f.matchAll()
: f.simpleQueryString()
.fields(ProductIndexFields.NAME,
ProductIndexFields.DESCRIPTION)
.matching(q))
.sort(f -> f.field(ProductIndexFields.NAME_SORT).asc())
.fetchHits(size);
}
@GET
@Path("/vector")
@Transactional
public List<Product> vector(@RestQuery String q, @RestQuery @DefaultValue("5") int k) {
if (q == null || q.isBlank()) {
return List.of();
}
float[] queryVector = queryEmbeddingService.embed(q);
return searchSession.search(Product.class)
.where(f -> f.knn(k).field(ProductIndexFields.DESCRIPTION_EMBEDDING)
.matching(queryVector))
.fetchHits(k);
}
@GET
@Path("/hybrid")
@Transactional
public List<Product> hybrid(@RestQuery String q,
@RestQuery @DefaultValue("10") int size,
@RestQuery @DefaultValue("5") int k) {
if (q == null || q.isBlank()) {
return List.of();
}
float[] queryVector = queryEmbeddingService.embed(q);
return searchSession.search(Product.class)
.where(f -> f.bool()
.should(f.simpleQueryString()
.fields(ProductIndexFields.NAME,
ProductIndexFields.DESCRIPTION)
.matching(q))
.should(f.knn(k).field(ProductIndexFields.DESCRIPTION_EMBEDDING)
.matching(queryVector)))
.fetchHits(size);
}
@GET
@Path("/hybrid/filtered")
@Transactional
public List<Product> hybridFiltered(@RestQuery String q,
@RestQuery String category,
@RestQuery @DefaultValue("10") int size,
@RestQuery @DefaultValue("5") int k) {
if (q == null || q.isBlank() || category == null || category.isBlank()) {
return List.of();
}
float[] queryVector = queryEmbeddingService.embed(q);
return searchSession.search(Product.class)
.where(f -> f.bool()
.must(f.match().field(ProductIndexFields.CATEGORY).matching(category))
.should(f.simpleQueryString()
.fields(ProductIndexFields.NAME,
ProductIndexFields.DESCRIPTION)
.matching(q))
.should(f.knn(k).field(ProductIndexFields.DESCRIPTION_EMBEDDING)
.matching(queryVector)))
.fetchHits(size);
}
void onStart(@Observes StartupEvent event) throws InterruptedException {
if (Product.count() == 0) {
seedProducts();
}
searchMapping.scope(Product.class)
.massIndexer()
.startAndWait();
}
private void seedProducts() {
productService.createProduct(
"Trail Running Shoe",
"Lightweight athletic footwear designed for off-road running on dirt and gravel. Aggressive grip, breathable mesh upper, cushioned midsole.",
"footwear");
productService.createProduct(
"Leather Oxford",
"Classic formal shoe in full-grain leather. Brogue detailing, leather sole, Goodyear welt construction.",
"footwear");
productService.createProduct(
"Waterproof Hiking Boot",
"Ankle-height boot with waterproof membrane, vibram outsole, and padded collar. Built for multi-day trekking.",
"footwear");
productService.createProduct(
"Canvas Sneaker",
"Casual low-top sneaker in cotton canvas. Rubber vulcanized sole, available in twelve colors.",
"footwear");
productService.createProduct(
"Noise-Cancelling Headphones",
"Over-ear headphones with active noise cancellation, 30-hour battery life, and foldable design for travel.",
"electronics");
productService.createProduct(
"Mechanical Keyboard",
"Tenkeyless keyboard with Cherry MX Brown switches. PBT keycaps, USB-C detachable cable, per-key RGB lighting.",
"electronics");
productService.createProduct(
"Portable Charger",
"20,000 mAh power bank with 65W USB-C Power Delivery. Charges a laptop from 0 to 80 percent in under an hour.",
"electronics");
productService.createProduct(
"Ultralight Backpack",
"35-litre hiking pack weighing 680 grams. Frameless design, roll-top closure, hipbelt with small pockets.",
"outdoor");
productService.createProduct(
"Sleeping Bag",
"Down-filled mummy bag rated to minus ten Celsius. 850-fill power, YKK zip, water-resistant outer shell.",
"outdoor");
productService.createProduct(
"Trekking Poles",
"Aluminium collapsible poles with cork grips and carbide tips. Folds to 38 cm for pack attachment.",
"outdoor");
productService.createProduct(
"Cast Iron Skillet",
"Pre-seasoned 12-inch cast iron pan. Suitable for induction, gas, electric, and open fire. Oven-safe to 260 Celsius.",
"kitchen");
productService.createProduct(
"Pour-Over Coffee Dripper",
"Ceramic cone dripper for manual filter coffee. Compatible with Melitta No.4 filters. Sits directly on a mug or carafe.",
"kitchen");
productService.createProduct(
"Chef's Knife",
"8-inch high-carbon stainless steel knife. Full tang, triple-riveted handle, 58 HRC hardness. Suitable for chopping, slicing, and dicing.",
"kitchen");
}
}
On hybrid endpoints, size controls how many hits Hibernate Search returns after the bool query, while k (query parameter, default 5) controls the kNN neighbor count inside the vector should clause, with the same default as /search/vector. You can override k per request (for example .../hybrid?q=...&size=10&k=8) when you want more vector candidates without changing the final hit count.
/search/fulltext is classic lexical search: tokenize q, match name and description, score by term relevance. It is easy to reason about when user words and catalog words overlap. If q is empty, the handler returns up to size rows sorted by ProductIndexFields.NAME_SORT (name_sort in the index) using matchAll(), which is handy for smoke tests.
Mass indexing uses searchMapping.scope(Product.class) so only the Product index is rebuilt; scope(Object.class) would index every mapped @Indexed type and is easy to misuse as the model grows.
Each /search/vector call embeds q and Hibernate Search runs kNN on the vectors in Elasticsearch. That is why camping+gear can return Sleeping Bag or Trekking Poles even when that phrase is missing from the stored text. Each request pays for inference, and short jargon or SKUs can still lose to a strong keyword hit.
/search/hybrid keeps full-text and kNN in the same bool query as two should clauses, so keyword strength and embedding neighbors influence one ranked list. You are not forced to bet the whole product on BM25-only or vector-only. They fail in different corners. Combining them is usually what a catalog search needs, even if the blend is messier to balance.
The seed list is written so shopper wording and product copy rarely use the same tokens for the same SKU. The verification curls below should not return the same ordering for every query across the three modes, which is the reason to keep all endpoints in one small service.
Configuration
The application.properties from Implementation wipes PostgreSQL and the Elasticsearch-backed index whenever the process starts. This section contrasts that with settings where a real catalog keeps data across restarts. It also covers how Elasticsearch scales vector search, optional @VectorField graph attributes, and a sample production-style property list.
You already store descriptionEmbedding with @VectorField(dimension = 384), and /search/vector and /search/hybrid call kNN through Hibernate Search on Elasticsearch. With only the seed rows it can still feel like the engine compares the query vector to every stored vector. When the catalog grows, that gets too slow, so Elasticsearch keeps an approximate nearest-neighbor structure over the document vectors instead of scanning everything on each query. Docs usually call that graph style HNSW (Hierarchical Navigable Small World): links between vectors so search skips most points and returns neighbors fast, sometimes missing the single closest point. Hibernate Search can map some graph-related attributes on @VectorField when Elasticsearch supports them.
Product still maps @VectorField with dimension only. When your stack exposes them, you can add attributes such as m and efConstruction (verify names and support for your Hibernate Search and Elasticsearch releases):
@VectorField(
dimension = 384,
m = 24,
efConstruction = 200
)
@JdbcTypeCode(SqlTypes.VECTOR)
@Array(length = 384)
public float[] descriptionEmbedding;
m and efConstruction matter where the backend builds an HNSW graph. On PostgreSQL they matter when you define an HNSW index in SQL over pgvector. Here, /search/vector and /search/hybrid resolve kNN in Elasticsearch through Hibernate Search, not through PostgreSQL's vector operators, so the Java snippet is optional tuning on the Elasticsearch side, not something you need for the earlier steps.
A production-style application.properties, where data survives restarts, could look like this:
quarkus.datasource.db-kind=postgresql
quarkus.datasource.devservices.image-name=docker.io/pgvector/pgvector:pg18
quarkus.datasource.devservices.init-script-path=vector-init.sql
quarkus.hibernate-orm.schema-management.strategy=validate
quarkus.hibernate-orm.log.sql=false
quarkus.hibernate-search-orm.elasticsearch.version=9
quarkus.hibernate-search-orm.schema-management.strategy=create-or-validate
quarkus.hibernate-search-orm.indexing.plan.synchronization.strategy=sync
quarkus.langchain4j.embedding-model.provider=dev.langchain4j.model.embedding.onnx.bgesmallenq.BgeSmallEnQuantizedEmbeddingModel
validate for ORM and create-or-validate for Hibernate Search mean a normal restart does not drop PostgreSQL tables or throw away the Elasticsearch index. The first property block rebuilt schema and index on every boot so you could iterate from a clean slate; when the catalog must persist, you move toward values like these.
PostgreSQL hnsw.ef_search: if you run kNN directly in PostgreSQL (native SQL over pgvector), you can adjust recall with SET LOCAL hnsw.ef_search = 120 on the JDBC connection before the query. /search/vector and /search/hybrid do not use that path: Hibernate Search sends vector predicates to Elasticsearch, so that PostgreSQL session setting does nothing for those endpoints. Configure kNN where the queries actually run (here, Elasticsearch), or change the architecture if you want kNN inside the database.
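If you do take that in-database route, a sketch of the native path could look like this. Table and column names are assumptions about how the Product entity maps (PostgreSQL folds unquoted identifiers to lowercase), <-> is pgvector's L2 distance operator, and the short literal stands in for a real 384-dimension query vector:

```sql
BEGIN;
-- SET LOCAL affects only this transaction: it sizes the HNSW candidate
-- list (higher = better recall, slower queries).
SET LOCAL hnsw.ef_search = 120;
-- k nearest rows to the query vector, by L2 distance.
SELECT id, name, category
FROM product
ORDER BY descriptionembedding <-> '[0.12, -0.03, 0.48]'::vector
LIMIT 5;
COMMIT;
```

The point stands either way: tune the engine that actually serves the kNN query.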
Production Hardening
What happens under load
Vector and hybrid queries hit the database and, on each request, run query-time embedding with the local ONNX model. If the search box turns into a high-volume typeahead endpoint, that work adds up.
The QueryEmbeddingService from Implementation caches query strings so identical text does not re-run the ONNX model. It does not fix rare phrasing or index work, but real search traffic repeats enough that the cache often helps a lot.
If caching is not enough, you handle search like any other hot read path: rate limits, async fan-out, or a dedicated embedding service. You can leave that out while you experiment; live traffic usually cannot.
Concurrency and correctness guarantees
Ranking scores can be fuzzy; access rules cannot. Category filters, tenants, visibility, and soft deletes need hard edges. Teams often over-focus on hybrid relevance and forget the strict side: category is a @KeywordField, and the filtered hybrid route puts ProductIndexFields.CATEGORY in a must clause so the filter stays exact while the should clauses handle score. Add tenant IDs or publication flags the same way. Semantic similarity should not decide whether a row is allowed to show at all.
Operational failure modes
First boot downloads the ONNX model and builds vectors for the seed rows. That is acceptable on a laptop. In production, slow startup because of model download, vector generation, and index build makes deploys hard to reason about.
Ship the model with the app or bake it into the image. Compute document vectors on ingest, not on the first customer query. Plan a reindex when the embedding model changes. Search follows the same rule as the rest of the system: keep heavy one-time work off the hot request path.
Security considerations
Search endpoints are easy to abuse because they look harmless. A single long natural language query that triggers embedding inference and kNN (k-nearest neighbors) lookup is more expensive than a normal keyword query. A flood of those requests becomes a resource exhaustion problem.
Put reasonable limits on query length. Add rate limiting if the endpoint is public. Log slow queries. Don’t feed raw user input into custom query syntax unless you understand exactly how that parser behaves. simpleQueryString() is a good default because it is intentionally safer than more permissive query parsers. You still need input length checks and abuse controls.
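As a concrete shape for those input checks, a small helper the resource can call before any embedding work; the class name and limit value are illustrative, not tuned recommendations:

```java
public final class QueryGuard {

    // Illustrative cap: long enough for natural language,
    // short enough to bound embedding cost. Tune for your traffic.
    public static final int MAX_QUERY_LENGTH = 200;

    private QueryGuard() {
    }

    /**
     * Returns a trimmed, whitespace-normalized query, or null when the
     * input should be rejected before any embedding or index work happens.
     */
    public static String sanitize(String q) {
        if (q == null) {
            return null;
        }
        String trimmed = q.trim();
        if (trimmed.isEmpty() || trimmed.length() > MAX_QUERY_LENGTH) {
            return null;
        }
        // Collapse whitespace runs so cache keys and analysis stay stable.
        return trimmed.replaceAll("\\s+", " ");
    }
}
```

A null return maps naturally onto the existing empty-result early returns in SearchResource, and normalized strings also improve the hit rate of the query embedding cache.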
Verification
Start the application:
./mvnw quarkus:dev
On first startup, Dev Services pulls the PostgreSQL (pgvector) image and an Elasticsearch image. Expect a few minutes the first time: image pulls, ONNX model download, indexing. Hibernate ORM creates the schema (after vector-init.sql enables the extension), the local embedding model loads, seed data is inserted, embeddings are generated, and Hibernate Search builds or refreshes its index.
Check all three search modes.
Query one: lexical match
curl "http://localhost:8080/search/fulltext?q=shoes"
Expected behavior: you get footwear products whose indexed fields contain shoe or stemmed variants.
Typical result shape:
[
{
"id": 2,
"name": "Leather Oxford",
"description": "Classic formal shoe in full-grain leather. Brogue detailing, leather sole, Goodyear welt construction.",
"category": "footwear"
},
{
"id": 1,
"name": "Trail Running Shoe",
"description": "Lightweight athletic footwear designed for off-road running on dirt and gravel. Aggressive grip, breathable mesh upper, cushioned midsole.",
"category": "footwear"
}
]
You should see stemming and analysis at work. Full-text works best when query words and catalog words overlap.
Query two: semantic language
curl "http://localhost:8080/search/vector?q=comfortable+footwear+for+long+walks"
Expected behavior: you get hiking- and walking-related footwear even when those exact words are missing from the descriptions.
Typical result shape:
[
{
"id": 2,
"name": "Leather Oxford",
"description": "Classic formal shoe in full-grain leather. Brogue detailing, leather sole, Goodyear welt construction.",
"category": "footwear"
},
{
"id": 1,
"name": "Trail Running Shoe",
"description": "Lightweight athletic footwear designed for off-road running on dirt and gravel. Aggressive grip, breathable mesh upper, cushioned midsole.",
"category": "footwear"
},
{
"id": 5,
"name": "Noise-Cancelling Headphones",
"description": "Over-ear headphones with active noise cancellation, 30-hour battery life, and foldable design for travel.",
"category": "electronics"
}
]
You should see meaning, not literal string overlap, drive the ranking.
Query three: exact technical term
curl "http://localhost:8080/search/hybrid?q=MX+Brown"
Expected behavior: Mechanical Keyboard appears at or near the top because the lexical match is strong and the hybrid query preserves that signal. You can append &k=… to change the kNN neighbor count (default 5); size still caps how many hits are returned.
Typical result shape:
[
{
"id": 6,
"name": "Mechanical Keyboard",
"description": "Tenkeyless keyboard with Cherry MX Brown switches. PBT keycaps, USB-C detachable cable, per-key RGB lighting.",
"category": "electronics"
},
{
"id": 2,
"name": "Leather Oxford",
"description": "Classic formal shoe in full-grain leather. Brogue detailing, leather sole, Goodyear welt construction.",
"category": "footwear"
}
]
That case is where vector-only setups often miss: jargon and exact product tokens still need the lexical side.
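Hibernate Search does the score mixing inside the engine, but it helps to see the idea in isolation. The sketch below implements reciprocal rank fusion (RRF), one common way to merge a lexical ranking with a vector ranking without comparing raw scores, which live on different scales for BM25 and vector distance. Note this is an illustrative standalone example, not the tutorial's actual fusion logic, and all class and method names here are hypothetical.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: reciprocal rank fusion merges two ranked lists
// using only ranks, so BM25 scores and vector distances never have to
// be normalized against each other. Not part of the tutorial's app code.
public class RrfFusion {

    // score(d) = sum over lists of 1 / (k + rank(d)), with rank starting at 1.
    public static Map<String, Double> fuse(List<String> lexical,
                                           List<String> vector,
                                           int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        accumulate(scores, lexical, k);
        accumulate(scores, vector, k);
        return scores;
    }

    private static void accumulate(Map<String, Double> scores,
                                   List<String> ranking, int k) {
        for (int i = 0; i < ranking.size(); i++) {
            scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
        }
    }

    public static List<String> topIds(Map<String, Double> scores) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        // "keyboard" is first lexically and second in the vector list;
        // after fusion it stays on top even though the lists disagree.
        List<String> lexical = List.of("keyboard", "oxford", "trail");
        List<String> vector = List.of("trail", "keyboard", "oxford");
        System.out.println(topIds(fuse(lexical, vector, 60)));
        // prints [keyboard, trail, oxford]
    }
}
```

The constant k (60 is a conventional choice) dampens how much a single first-place rank can dominate; a document that appears in both lists usually beats one that appears in only one, which is exactly the behavior the MX Brown example relies on.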
Query four: concept with no lexical overlap
curl "http://localhost:8080/search/vector?q=camping+gear"
Expected behavior: outdoor products such as Sleeping Bag, Ultralight Backpack, and Trekking Poles appear even though the phrase camping gear does not exist in the stored content.
That request shows the biggest gap between vector recall and BM25-style full-text.
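The mechanics behind that result can be shown in a few lines: embed query and documents into the same space, then rank documents by cosine similarity to the query vector. Real embeddings have hundreds of dimensions; the tiny hand-made vectors below only illustrate the kNN idea, and all names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Toy sketch of kNN ranking by cosine similarity. The real app gets its
// vectors from a local ONNX embedding model; these are made up by hand.
public class KnnSketch {

    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Return the names of the k documents closest to the query vector.
    public static List<String> nearest(double[] query,
                                       Map<String, double[]> docs, int k) {
        return docs.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> -cosine(query, e.getValue())))
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        // Toy 3-d space: dimensions loosely "outdoor", "footwear", "electronics".
        Map<String, double[]> docs = Map.of(
                "Sleeping Bag", new double[]{0.9, 0.1, 0.0},
                "Trail Running Shoe", new double[]{0.5, 0.8, 0.0},
                "Noise-Cancelling Headphones", new double[]{0.0, 0.0, 1.0});
        // "camping gear" never matches any stored token; only its position
        // in embedding space matters.
        double[] campingGearQuery = {1.0, 0.2, 0.0};
        System.out.println(nearest(campingGearQuery, docs, 2));
        // prints [Sleeping Bag, Trail Running Shoe]
    }
}
```

This also shows why vector search can surface an off-topic hit like the headphones in query two: kNN always returns the k nearest neighbors, even when "nearest" is not very near.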
Filtered hybrid search
curl "http://localhost:8080/search/hybrid/filtered?q=lightweight&category=outdoor"
Expected behavior: only outdoor products are considered, and within that set the most relevant ones rank highest.
The point is the filter: category=outdoor is strict; ranking only runs inside that slice.
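That contract, filter first as a hard predicate, then rank only inside the remaining slice, can be stated as a minimal sketch. The record and method names below are illustrative, not from the tutorial's code, and the relevance scores are stand-ins for whatever the hybrid query computes.

```java
import java.util.Comparator;
import java.util.List;

// Sketch of the filtered-search contract: the category filter is a hard
// boundary applied before scoring, so ranking can reorder results but
// never widen them. Names and scores here are hypothetical.
public class FilteredSearchSketch {

    public record Product(String name, String category, double relevance) {}

    // Filter first (strict), then rank inside the remaining slice.
    public static List<String> search(List<Product> catalog, String category) {
        return catalog.stream()
                .filter(p -> p.category().equals(category))   // hard boundary
                .sorted(Comparator.comparingDouble(Product::relevance).reversed())
                .map(Product::name)
                .toList();
    }

    public static void main(String[] args) {
        List<Product> catalog = List.of(
                new Product("Ultralight Backpack", "outdoor", 0.9),
                new Product("Mechanical Keyboard", "electronics", 0.95),
                new Product("Trekking Poles", "outdoor", 0.7));
        // The keyboard has the highest score of all, but it is outside the
        // slice, so it must never appear in the results.
        System.out.println(search(catalog, "outdoor"));
        // prints [Ultralight Backpack, Trekking Poles]
    }
}
```

If the filter were instead folded into the score as a soft boost, a sufficiently relevant electronics product could leak into the outdoor results, which is exactly the failure mode the automated tests below guard against.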
Automated Testing
The curl commands from the verification section are useful when you write the code. They are not enough once you change mappings, switch embedding models, or tune hybrid queries. Search breaks in subtle ways. The endpoint still returns 200, but the wrong product moves to the top, the category filter stops being strict, or an empty query suddenly triggers expensive work.
For this kind of system, the safest test strategy is layered. Keep a few lightweight integration tests that hit the real HTTP endpoints. Then make the assertions focus on behavior that should remain true even when scores and exact ordering move a little.
Add the test dependencies in pom.xml if they are not there already:
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-junit5</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>io.rest-assured</groupId>
    <artifactId>rest-assured</artifactId>
    <scope>test</scope>
</dependency>
Now create src/test/java/org/acme/search/SearchResourceTest.java:
package org.acme.search;

import io.quarkus.test.junit.QuarkusTest;
import org.junit.jupiter.api.Test;

import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.containsString;
import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.everyItem;
import static org.hamcrest.Matchers.greaterThanOrEqualTo;
import static org.hamcrest.Matchers.hasItem;
import static org.hamcrest.Matchers.hasSize;

@QuarkusTest
class SearchResourceTest {

    @Test
    void fulltextFindsShoeStemming() {
        given()
            .when().get("/search/fulltext?q=shoes")
            .then()
            .statusCode(200)
            .body("$", hasSize(greaterThanOrEqualTo(1)))
            // Stemming maps "shoes" to "shoe"; at least one shoe product must match.
            .body("name", hasItem(containsString("Shoe")));
    }

    @Test
    void vectorFindsSemanticFootwearQuery() {
        given()
            .when().get("/search/vector?q=comfortable+footwear+for+long+walks")
            .then()
            .statusCode(200)
            .body("$", hasSize(greaterThanOrEqualTo(1)))
            // Assert relevance, not exact order: footwear must be in the hit set.
            .body("category", hasItem("footwear"));
    }

    @Test
    void campingGearSemanticQueryFindsOutdoorProducts() {
        given()
            .when().get("/search/vector?q=camping+gear")
            .then()
            .statusCode(200)
            // "camping gear" appears nowhere in the data; meaning must carry the match.
            .body("category", hasItem("outdoor"));
    }

    @Test
    void hybridFindsMxBrownKeyboard() {
        given()
            .when().get("/search/hybrid?q=MX+Brown")
            .then()
            .statusCode(200)
            // Exact jargon: the lexical signal should keep the keyboard on top.
            .body("[0].name", containsString("Keyboard"));
    }

    @Test
    void hybridFilteredRestrictsCategory() {
        given()
            .when().get("/search/hybrid/filtered?q=lightweight&category=outdoor")
            .then()
            .statusCode(200)
            .body("$", hasSize(greaterThanOrEqualTo(1)))
            // The filter is strict: no other category may leak in.
            .body("category", everyItem(equalTo("outdoor")));
    }
}
Run the tests with:
./mvnw test
The fulltextFindsShoeStemming() test checks lexical behavior instead of only response size. We do not require one exact order because analyzers and seed data can shift that a bit, but we do require that at least one shoe-related product is present.
The vector tests need a different strategy. Semantic search is not deterministic in the same way exact keyword search is. You should not assert the full result list or a fragile score order. What you can assert is that clearly relevant products appear in the hit set for a meaning-based query. That is why vectorFindsSemanticFootwearQuery() and campingGearSemanticQueryFindsOutdoorProducts() check for expected relevance without pretending the ranking is mathematically fixed.
The hybrid test is stricter on purpose. MX Brown is an exact technical term in the catalog. This is the kind of case where lexical strength should dominate. If Mechanical Keyboard drops from the top result after a refactor, that is worth catching.
The filtered test is even more important than ranking checks. Search relevance can be fuzzy. Filters cannot. If category=outdoor allows electronics products to leak into the result, the feature is wrong even if the scores look plausible. This is exactly the kind of bug that slips through when teams only test happy-path search quality.
A useful next step is to widen testing beyond endpoint smoke checks and treat search quality as something you verify from several angles. Keep the HTTP integration tests from this tutorial for end-to-end behavior, but add small unit tests for helper classes such as the query embedding cache, plus a focused relevance regression suite built on a fixed seed dataset where a handful of important queries must keep returning sensible results over time. In larger systems, teams often complement that with offline evaluation sets, performance checks for hot queries, and security-style tests that prove filters like category, tenant, or visibility never leak data across boundaries. That broader approach reflects how search behaves in production: part API contract, part relevance system, and part access-control surface.
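A relevance regression suite of that kind can stay very small. The sketch below checks that, for a handful of important queries, every expected product remains in the top-k hits; the search function is a stand-in for a call to the real endpoint, and all names and expectations are illustrative, not from the tutorial's code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Sketch of a relevance regression check over a fixed seed dataset.
// The Function parameter stands in for the real search endpoint.
public class RelevanceRegressionSketch {

    // query -> products that must appear in the top k results
    static final Map<String, Set<String>> EXPECTATIONS = Map.of(
            "camping gear", Set.of("Sleeping Bag", "Trekking Poles"),
            "MX Brown", Set.of("Mechanical Keyboard"));

    // Returns a human-readable list of violations; empty means all queries pass.
    public static List<String> failures(Function<String, List<String>> search, int k) {
        List<String> failures = new ArrayList<>();
        EXPECTATIONS.forEach((query, expected) -> {
            List<String> hits = search.apply(query);
            Set<String> topK = new HashSet<>(hits.subList(0, Math.min(k, hits.size())));
            for (String product : expected) {
                if (!topK.contains(product)) {
                    failures.add(query + " -> missing " + product);
                }
            }
        });
        return failures;
    }

    public static void main(String[] args) {
        // Stubbed search standing in for the real HTTP endpoint.
        Function<String, List<String>> stub = q -> q.equals("MX Brown")
                ? List.of("Mechanical Keyboard", "Leather Oxford")
                : List.of("Sleeping Bag", "Trekking Poles", "Ultralight Backpack");
        System.out.println(failures(stub, 3));
        // prints [] when every query keeps its expected products in the top 3
    }
}
```

Wired to the real endpoint instead of the stub, a check like this turns "the ranking drifted after we swapped the embedding model" from a user complaint into a failing build.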
Conclusion
You end up with one service, two data stores in dev (PostgreSQL and Elasticsearch), local embeddings, and Hibernate Search handling both lexical rank and kNN in the index. The useful part is not the three URLs. It is knowing which mode loses on jargon, which on vocabulary mismatch, and where filters must stay exact instead of inside the fuzzy score.


