Shape the future of IBM watsonx Orchestrate

Start by searching and reviewing ideas others have posted, and add a comment (private if needed), vote, or subscribe to updates on them if they matter to you.

If you can't find what you are looking for, create a new idea:

  1. Stick to one feature enhancement per idea.

  2. Add as much detail as possible, including use cases, examples, and screenshots (put anything confidential in the Hidden details field or a private comment).

  3. Explain the business impact and the timeline of the project being affected.

[For IBMers] Add the customer/project name, details, and timeline in the Hidden details field or a private comment (only visible to you and the IBM product team).

This all helps to scope and prioritize your idea among many other good ones. Thank you for your feedback!

Specific links you will want to bookmark for future use
Learn more about IBM watsonx Orchestrate - Use this site to find out additional information and details about the product.
Welcome to the IBM Ideas Portal (https://www.ibm.com/ideas) - Use this site to find out additional information and details about the IBM Ideas process and statuses.
IBM Unified Ideas Portal (https://ideas.ibm.com) - Use this site to view all of your ideas, create new ideas for any IBM product, or search for ideas across all of IBM.
ideasibm@us.ibm.com - Use this email to suggest enhancements to the Ideas process or request help from IBM for submitting your Ideas.

Status Not under consideration
Created by Guest
Created on Feb 26, 2024

Caching watsonx LLM answers for better performance in RAG use cases

Whenever customers use Assistant for conversational search or with RAG patterns, using watsonx.ai or WA's built-in LLM capabilities to answer questions, some questions are asked frequently and by many users.


For example, questions about leave entitlement in an HR bot, or about credit card terms, conditions, and features in a banking bot. The answers received from the LLM could be cached by the product so that a costly LLM call is not needed every time, improving performance and response time. All the typical cache-related configuration could be done by the user, with default values provided by the product.
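
To make the request concrete, here is a minimal sketch of the kind of answer cache being asked for: exact-match keying on a normalized question, with a TTL and a size cap. Everything in it is an illustrative assumption, not a watsonx Orchestrate API; call_llm stands in for the product's costly RAG/LLM round trip.

    import hashlib
    import time

    class AnswerCache:
        """Hypothetical LLM answer cache; names and defaults are
        illustrative, not a product API."""

        def __init__(self, ttl_seconds=3600, max_entries=1000):
            # Product-provided defaults, overridable by the user.
            self.ttl = ttl_seconds
            self.max_entries = max_entries
            self._store = {}  # key -> (answer, expiry timestamp)

        def _key(self, question):
            # Normalize so trivially different phrasings of the same FAQ collide.
            return hashlib.sha256(question.strip().lower().encode()).hexdigest()

        def get(self, question):
            entry = self._store.get(self._key(question))
            if entry and entry[1] > time.time():
                return entry[0]  # cache hit: no LLM call needed
            return None

        def put(self, question, answer):
            if len(self._store) >= self.max_entries:
                # Evict the entry closest to expiry (simple; not LRU).
                oldest = min(self._store, key=lambda k: self._store[k][1])
                del self._store[oldest]
            self._store[self._key(question)] = (answer, time.time() + self.ttl)

    def answer_question(question, cache, call_llm):
        cached = cache.get(question)
        if cached is not None:
            return cached
        answer = call_llm(question)  # the costly RAG + LLM round trip
        cache.put(question, answer)
        return answer

Exact-match keying only covers identical phrasings; the threshold-based matching discussed in the comments below would widen cache hits to paraphrases of the same question.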


We see this as a need in many of the use cases we are working on.

Idea priority Medium
  • Guest
    Jul 29, 2025

    This is a feature that is available in competing orchestration products. It would go a long way as a value add, saving tokens for the customer. I would model it after the NeuralSeek feature, where you can fine-tune the answer by editing the response, set thresholds for cached responses, and earmark the responses that were answered through RAG with a canned response.
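
    A rough sketch of the threshold mechanism described above: semantic lookup against cached question/answer pairs, with curated entries flagged so a reviewer-edited "canned response" can be served. The embed function and the 0.92 default threshold are assumptions for illustration, not NeuralSeek's or Orchestrate's actual API.

        import math

        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        class SemanticCache:
            """Illustrative threshold-based cache; `embed` is an assumed
            sentence-embedding function supplied by the caller."""

            def __init__(self, embed, threshold=0.92):
                self.embed = embed
                self.threshold = threshold  # user-tunable, as suggested above
                self.entries = []  # (vector, question, answer, curated)

            def add(self, question, answer, curated=False):
                # curated=True marks answers a reviewer has edited/approved,
                # enabling the canned-response behavior described above.
                self.entries.append((self.embed(question), question, answer, curated))

            def lookup(self, question):
                qv = self.embed(question)
                best_hit, best_score = None, 0.0
                for vec, _, answer, curated in self.entries:
                    score = cosine(qv, vec)
                    if score > best_score:
                        best_hit, best_score = (answer, curated), score
                # Only serve from cache when similarity clears the threshold.
                return best_hit if best_hit and best_score >= self.threshold else None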

  • Admin
    SABTAIN KHAN
    Jul 29, 2025

    Not currently being pursued - we'll re-evaluate later

  • Guest
    Apr 28, 2025

    Any updates on this idea? gretchen.tietge@ibm.com?

  • Guest
    Jan 15, 2025

    This would be a key feature supporting one of Arvind's IBM differentiators for 2025: "affordable LLM token usage". Repeatable questions coming through Assistant Builder can easily be cached in Orchestrate. This would not only improve performance but also lessen token usage, making applications that use LLMs with RAG affordable. Look at competing products that enable this with a couple of clicks in settings, like NeuralSeek. This would also be an area where you could edit the answer given by the LLM and provide oversight on cached RAG answers.
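
    As a back-of-the-envelope illustration of the token-savings argument, with every number below an assumed figure rather than a measurement:

        # Hypothetical monthly traffic for an HR/banking FAQ assistant.
        monthly_questions = 100_000
        cache_hit_rate = 0.40        # share of questions answered from cache
        tokens_per_rag_call = 3_000  # retrieved context + prompt + completion

        tokens_saved = monthly_questions * cache_hit_rate * tokens_per_rag_call
        print(f"Tokens saved per month: {tokens_saved:,.0f}")  # 120,000,000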