3 Comments
Neural Foundry

The dynamic proxy approach is particularly elegant here. Using InvocationHandler to intercept AI service calls without modifying the interface keeps everything clean and maintainable. One thing I'd be curious about is how the system handles concurrent requests with the same prompt. Do you see any benefit from a short-lived in-memory cache layer before hitting pgvector?
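The InvocationHandler-plus-cache idea can be sketched in plain Java. This is a minimal, hypothetical illustration (the `Assistant` interface and `cached(...)` helper are invented for the example, not from the article): a dynamic proxy memoizes results per prompt in a `ConcurrentHashMap`, so concurrent calls with the same prompt invoke the underlying service only once.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical AI service interface, standing in for a @RegisterAiService interface.
interface Assistant {
    String chat(String prompt);
}

public class CachingProxyDemo {

    // Wraps a delegate behind a dynamic proxy that memoizes answers per prompt.
    // computeIfAbsent on ConcurrentHashMap guarantees the delegate is called at
    // most once per key, even under concurrent access.
    static Assistant cached(Assistant delegate) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            if ("chat".equals(method.getName())) {
                return cache.computeIfAbsent((String) args[0], delegate::chat);
            }
            return method.invoke(delegate, args);
        };
        return (Assistant) Proxy.newProxyInstance(
                Assistant.class.getClassLoader(),
                new Class<?>[] { Assistant.class },
                handler);
    }

    public static void main(String[] args) {
        int[] delegateCalls = {0};
        Assistant slow = prompt -> {
            delegateCalls[0]++;           // counts real (non-cached) invocations
            return "answer:" + prompt;
        };
        Assistant fast = cached(slow);

        System.out.println(fast.chat("hello"));
        System.out.println(fast.chat("hello")); // served from the cache
        System.out.println("delegate calls: " + delegateCalls[0]);
    }
}
```

For a genuinely short-lived cache you would evict entries after a TTL (e.g. with Caffeine's `expireAfterWrite`) rather than keep an unbounded map as this sketch does.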

Markus Eisele

I mean, sure. Memory is just a lot faster.

With regards to concurrent requests, I haven't had a lot of time to load test it. Would indeed be interesting.

Rakesh

How do we externalize the system message, so that it's configurable without code changes or redeployment? I tried to intercept the requests to RegisterAiService similar to what's done here, but unfortunately LangChain4j does not allow modifying the chat message request. Is a system message provider a viable option? I'd appreciate it if you could share your insight.
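One hedged sketch of the system-message-provider route: LangChain4j's `AiServices` builder accepts a `systemMessageProvider`, which is a `Function` from the chat-memory id to the system message text, so the text itself can come from any external source. The runnable part below only demonstrates that functional shape with a system property as the stand-in config source (the property name `app.system-message` is invented for the example); the LangChain4j wiring is shown in a comment, under the assumption you build the service programmatically rather than via `@RegisterAiService`.

```java
import java.util.function.Function;

public class SystemMessageProviderDemo {
    public static void main(String[] args) {
        // Externalized source: a system property here; in a real Quarkus app this
        // could be a config file, a database row, or a MicroProfile Config entry.
        System.setProperty("app.system-message", "You are a concise assistant.");

        // Same functional shape as LangChain4j's systemMessageProvider:
        // chat-memory id in, system message text out.
        Function<Object, String> systemMessageProvider =
                memoryId -> System.getProperty("app.system-message",
                        "You are a helpful assistant.");

        System.out.println(systemMessageProvider.apply("default"));

        // With plain LangChain4j this would be wired roughly as:
        //   Assistant assistant = AiServices.builder(Assistant.class)
        //           .chatLanguageModel(model)
        //           .systemMessageProvider(systemMessageProvider)
        //           .build();
    }
}
```

Because the provider is re-evaluated per request, updating the external source changes the system message without touching the annotated interface.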
