Discussion about this post

Emil A Lefkof III

Markus,

I was using this project as a starter, but passing it a 24 MB file failed with a Gateway Timeout. I worked with the Docling team and figured it out; you may want to update your example based on this: https://github.com/quarkiverse/quarkus-docling/issues/103

It fixes the issues and allows you to upload larger documents!

Tsvetan Tsvetkov

Great post (as always)!

I see Docling as the main topic here - great to see that document handling is now also bearable in Java.

It would've been nice to explain your model choice and your choice of splitter (sentence), along with the segment size and overlap (200 tokens, with an overlap of 20). Did you get the best results with this configuration? These values can make a big difference in a RAG pipeline, so it would be nice to explain how you landed on them.

Anyways, like I already said, great stuff, keep it coming!
