Shape the future of IBM watsonx Orchestrate

Start by searching and reviewing ideas others have posted, and add a comment (private if needed), vote, or subscribe to updates on them if they matter to you.

If you can't find what you are looking for, create a new idea:

  1. Stick to one feature enhancement per idea.

  2. Add as much detail as possible, including use cases, examples, and screenshots (put anything confidential in the Hidden details field or a private comment).

  3. Explain the business impact and the timeline of the project being affected.

[For IBMers] Add the customer/project name, details, and timeline in the Hidden details field or a private comment (only visible to you and the IBM product team).

This all helps to scope and prioritize your idea among many other good ones. Thank you for your feedback!

Specific links you will want to bookmark for future use
Learn more about IBM watsonx Orchestrate - Use this site to find out additional information and details about the product.
Welcome to the IBM Ideas Portal (https://www.ibm.com/ideas) - Use this site to find out additional information and details about the IBM Ideas process and statuses.
IBM Unified Ideas Portal (https://ideas.ibm.com) - Use this site to view all of your ideas, create new ideas for any IBM product, or search for ideas across all of IBM.
ideasibm@us.ibm.com - Use this email to suggest enhancements to the Ideas process or request help from IBM for submitting your Ideas.

Status Not under consideration
Created by Guest
Created on Feb 26, 2024

Caching watsonx LLM answers for better performance in RAG use cases

Whenever customers use Assistant for conversational search or with RAG patterns, using watsonx.ai or WA's built-in LLM capabilities to answer questions, some questions are asked frequently and by many users.


For example, questions about leave entitlement in an HR bot, or about credit card terms, conditions, and features in a banking bot. The answers received from the LLM could be cached by the product so that a costly LLM call is not needed every time, improving performance and response time. All the typical cache-related configuration could be done by the user, with default values provided by the product.
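
To make the request concrete, here is a minimal sketch of the kind of answer cache being asked for: exact-match keying on a normalized question, with a TTL and a size cap. Everything in it is an illustrative assumption, not a watsonx Orchestrate API; call_llm stands in for the product's costly RAG/LLM round trip.

    import hashlib
    import time

    class AnswerCache:
        """Hypothetical LLM answer cache; names and defaults are
        illustrative, not a product API."""

        def __init__(self, ttl_seconds=3600, max_entries=1000):
            # Product-provided defaults, overridable by the user.
            self.ttl = ttl_seconds
            self.max_entries = max_entries
            self._store = {}  # key -> (answer, expiry timestamp)

        def _key(self, question):
            # Normalize so trivially different phrasings of the same FAQ collide.
            return hashlib.sha256(question.strip().lower().encode()).hexdigest()

        def get(self, question):
            entry = self._store.get(self._key(question))
            if entry and entry[1] > time.time():
                return entry[0]  # cache hit: no LLM call needed
            return None

        def put(self, question, answer):
            if len(self._store) >= self.max_entries:
                # Evict the entry closest to expiry (simple; not LRU).
                oldest = min(self._store, key=lambda k: self._store[k][1])
                del self._store[oldest]
            self._store[self._key(question)] = (answer, time.time() + self.ttl)

    def answer_question(question, cache, call_llm):
        cached = cache.get(question)
        if cached is not None:
            return cached
        answer = call_llm(question)  # the costly RAG + LLM round trip
        cache.put(question, answer)
        return answer

Exact-match keying only covers identical phrasings; the threshold-based matching discussed in the comments below would widen cache hits to paraphrases of the same question.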


We see this as a need in many of the use cases we are working on.

Idea priority Medium
  • Guest
    Jul 29, 2025

    This is a feature that is available in competing orchestration products. It would go a long way as a value add, saving tokens for the customer. I would model it after the NeuralSeek feature, where you can fine-tune the answer by editing the response, set thresholds for cached responses, and earmark the responses that were answered through RAG with a canned response.
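
    A rough sketch of the threshold mechanism described above: semantic lookup against cached question/answer pairs, with curated entries flagged so a reviewer-edited "canned response" can be served. The embed function and the 0.92 default threshold are assumptions for illustration, not NeuralSeek's or Orchestrate's actual API.

        import math

        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        class SemanticCache:
            """Illustrative threshold-based cache; `embed` is an assumed
            sentence-embedding function supplied by the caller."""

            def __init__(self, embed, threshold=0.92):
                self.embed = embed
                self.threshold = threshold  # user-tunable, as suggested above
                self.entries = []  # (vector, question, answer, curated)

            def add(self, question, answer, curated=False):
                # curated=True marks answers a reviewer has edited/approved,
                # enabling the canned-response behavior described above.
                self.entries.append((self.embed(question), question, answer, curated))

            def lookup(self, question):
                qv = self.embed(question)
                best_hit, best_score = None, 0.0
                for vec, _, answer, curated in self.entries:
                    score = cosine(qv, vec)
                    if score > best_score:
                        best_hit, best_score = (answer, curated), score
                # Only serve from cache when similarity clears the threshold.
                return best_hit if best_hit and best_score >= self.threshold else None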

  • Admin
    SABTAIN KHAN
    Jul 29, 2025

    Not currently being pursued - we'll re-evaluate later

  • Guest
    Apr 28, 2025

    Any updates on this idea? gretchen.tietge@ibm.com?

  • Guest
    Jan 15, 2025

    This would be a key feature supporting one of Arvind's IBM differentiators for 2025: "affordable LLM token usage". Repeatable questions coming through Assistant Builder can easily be cached in Orchestrate. This would not only improve performance but also lessen token usage, making applications that use LLMs with RAG affordable. Look at competing products that enable this with a couple of clicks in settings, like NeuralSeek. This would also be an area where you could edit the answer given by the LLM and provide oversight on cached RAG answers.
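
    As a back-of-the-envelope illustration of the token-savings argument, with every number below an assumed figure rather than a measurement:

        # Hypothetical monthly traffic for an HR/banking FAQ assistant.
        monthly_questions = 100_000
        cache_hit_rate = 0.40        # share of questions answered from cache
        tokens_per_rag_call = 3_000  # retrieved context + prompt + completion

        tokens_saved = monthly_questions * cache_hit_rate * tokens_per_rag_call
        print(f"Tokens saved per month: {tokens_saved:,.0f}")  # 120,000,000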