3 Comments
Tsvetan Tsvetkov

Great post (as always)!

I see Docling as the main topic here. It's great to see that document handling is now bearable in Java, too.

It would have been nice to hear why you chose this model and the sentence splitter, along with the chunk size and overlap (200 tokens, with an overlap of 20). Did you get the best results with this configuration? These choices can make a big difference in a RAG pipeline, and it would be interesting to know how you landed on those values.

Anyway, as I said, great stuff. Keep it coming!

Markus Eisele

Thanks for the kind words. Yeah, these posts have a length limit, and it's easy to get lost in one direction or the other. I'm planning to explore some of these pieces in more depth. Hang in there.

In the meantime, maybe I can get you excited about our book ;-)

https://www.oreilly.com/library/view/applied-ai-for/9781098174491/

Unni Mana

This is the updated link for Docling Java:

https://docling-project.github.io/docling-java/0.4.5/

The existing link no longer works.