Discussion about this post

Neural Foundry

This is an exceptional end-to-end tutorial. The insight about Java ONNX Runtime not supporting INT8 quantized models is something that would have cost me hours to debug without this heads-up. What I find particularly valuable here is how you've structured this to keep Python strictly in the model export phase rather than anywhere near production. Most AI integration tutorials for Java developers quietly assume you'll just call Python services over HTTP, which defeats the purpose of using the JVM's operational strengths. The 20-50ms inference time you cite for CLIP embeddings is fascinating because it's actually competitive with the raw network round-trip latency to cloud APIs, before even accounting for their processing time.

Michael M

Hey Markus, I have been following you for some time now, and I very much appreciate the ideas you come up with, the level of detail in your explanations, and how you show the way the ideas are implemented. Keep up the good work. I wish you all the best with your publication.

Best, Michael
