Start by searching and reviewing ideas others have posted, and add a comment (private if needed), vote, or subscribe to updates on them if they matter to you.
If you can't find what you are looking for, create a new idea:
Stick to one feature enhancement per idea.
Add as much detail as possible, including use case, examples, and screenshots (put anything confidential in the Hidden details field or a private comment).
Explain the business impact and the timeline of the affected project.
[For IBMers] Add customer/project name, details & timeline in Hidden details field or a private comment (only visible to you and the IBM product team).
This all helps to scope and prioritize your idea among many other good ones. Thank you for your feedback!
Specific links you will want to bookmark for future use
Learn more about IBM watsonx Orchestrate - Use this site to find out additional information and details about the product.
Welcome to the IBM Ideas Portal (https://www.ibm.com/ideas) - Use this site to find out additional information and details about the IBM Ideas process and statuses.
IBM Unified Ideas Portal (https://ideas.ibm.com) - Use this site to view all of your ideas, create new ideas for any IBM product, or search for ideas across all of IBM.
ideasibm@us.ibm.com - Use this email to suggest enhancements to the Ideas process or request help from IBM for submitting your Ideas.
The idea is to integrate watsonx Orchestrate as the multi-agent orchestrator and use the IBM Cloud ML training service (e.g., Watson Machine Learning / watsonx.ai training) for scalable model training, while keeping an option to run everything locally for sensitive data—cloud training is opt-in and uses secure private connectivity.
Integration overview
We propose using watsonx Orchestrate to coordinate the multi-agent pipeline (Database Analysis, Data Analyst, Dataset Generator, AutoML Trainer, Analyst/Validator, Visualization). For scalable training, the AutoML Training Agent can offload training jobs to the IBM Cloud ML training service (Watson Machine Learning / watsonx.ai training) via secure APIs. For customers with strict privacy requirements, the pipeline can run entirely on-prem or in a private VPC—cloud training is optional and activated only by client consent.
Key benefits
Scalability & performance: leverage IBM Cloud GPUs and managed training clusters for heavy DL workloads.
Governance & traceability: use IBM Cloud registries and model management for versioning, lineage and audit trails.
Security & compliance: support private endpoints/VPC, encrypted transit and at-rest storage, and IAM roles for least-privilege access.
Flexibility & privacy: default local execution for sensitive data; seamless opt-in offload to IBM Cloud when desired.
Operational flow (summary)
Orchestrator (watsonx Orchestrate) triggers agents and manages the pipeline.
AutoML Training Agent packages dataset + training config and, if selected, calls the IBM Cloud training API to submit a job.
IBM Cloud runs the job (AutoML / custom training), stores artifacts (model, metrics, logs) in the cloud registry.
Artifacts and metrics are returned to the orchestrator for the Analyst/Validator Agent to evaluate, version, and decide on deployment or re-training.
For privacy-sensitive customers, the same steps run locally (H2O AutoML / local training infra) with identical artifact/versioning formats.
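The opt-in offload in the flow above can be sketched as a small dispatch layer. This is a hypothetical sketch: `submit_cloud_job` stands in for a call to the watsonx.ai / Watson Machine Learning training API (not its real client), and `train_locally` stands in for the local H2O path.

```python
def train_locally(dataset_path: str, config: dict) -> dict:
    """Placeholder for the on-prem H2O AutoML path (hypothetical)."""
    return {"backend": "local", "model_uri": f"file://models/{config['run_id']}"}

def submit_cloud_job(dataset_path: str, config: dict) -> dict:
    """Placeholder for a watsonx.ai training submission (hypothetical)."""
    return {"backend": "ibm_cloud", "model_uri": f"cos://bucket/{config['run_id']}"}

def run_training(dataset_path: str, config: dict, cloud_consent: bool) -> dict:
    # Cloud training is strictly opt-in: without explicit client consent,
    # the job never leaves the local infrastructure.
    if cloud_consent and config.get("cloud_enabled", False):
        return submit_cloud_job(dataset_path, config)
    return train_locally(dataset_path, config)
```

Both branches return artifacts in the same shape, matching the requirement that local and cloud runs use identical artifact/versioning formats.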
Security & connectivity (high level)
Use private endpoints or VPC peering to avoid public internet exposure.
Enforce TLS, encryption at rest (AES-256), and IAM roles for API access.
Docker sandboxing for any generated code; network egress is disabled by default unless explicitly permitted.
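One way to realize the sandboxing rule above is to construct the `docker run` invocation with egress disabled by default. A minimal sketch; the image name and mount paths are illustrative:

```python
def sandbox_command(image: str, script: str, workdir: str,
                    allow_network: bool = False) -> list:
    """Build a `docker run` argv that executes generated code in isolation."""
    cmd = ["docker", "run", "--rm",
           "--read-only",                 # immutable container filesystem
           "-v", f"{workdir}:/work",      # only this job's workdir is mounted
           "-w", "/work"]
    if not allow_network:
        cmd += ["--network", "none"]     # default: no network egress at all
    cmd += [image, "python", script]
    return cmd
```

The returned list can be passed straight to `subprocess.run`, keeping the "deny by default" posture in one place.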
The inspiration for this: The agent workflow project stemmed from two core ideas: the desire to democratize deep learning and machine learning for non-experts, and the need to overcome the immense barrier of data privacy that prevents many companies from adopting AI. While powerful tools like H2O AutoML exist, they often require significant technical expertise. We saw the recent advancements in AI agents, particularly frameworks like the watsonx Orchestrate ADK, as the perfect opportunity to bridge this gap. The central idea was: "What if anyone could turn their data into valuable predictions simply by stating their goal in plain language?"
The "eureka" moment came with the rise of powerful open-source models like gpt-oss-120b. We realized we could build a system where the advanced reasoning of a top-tier model runs entirely locally. Companies possess incredibly sensitive data they can't send to external cloud APIs due to security risks. This project was born from the vision of bringing the AutoML process to the data, not the other way around, creating a secure, self-contained, and intelligent system that makes advanced data science accessible to everyone.
The system versions each training run, saving the complete context of the process. This enriches future interactions: users can refer to past runs to refine their goals and provide more detailed instructions with less effort, creating a cycle of continuous improvement. Additionally, a SQL Agent (in beta) has been incorporated, allowing users to generate complex queries from natural-language instructions. The results of these queries can be used directly as a data source for the rest of the agents, further streamlining the analysis process.
The system was successfully tested with real sales data from two Uruguayan online stores, wiki.com.uy and decotech.uy. Several deep learning and machine learning models were trained for different purposes, and the results were astounding: users with deep knowledge of their business but little experience in ML were able to train powerful models using different filters and features. It was like having a machine learning and deep learning analyst at the disposal of people who understand how their business works but lack technical expertise in AI. The system delivers trained models, predictions, and visualizations through a simple web dashboard.
What it does
This project is a fully autonomous, privacy-focused machine learning and deep learning system powered by a team of seven specialized AI agents. A user simply uploads a dataset, defines their objective in natural language (e.g., "predict customer churn based on this data"), and the system handles the entire complex ML workflow from start to finish.
The core value proposition is 100% Data Privacy. Because the entire process—from data analysis and code generation to model training—is orchestrated by a locally-run gpt-oss model, sensitive data never leaves the user's local infrastructure. This unlocks the value of "trapped" data that was previously off-limits to cloud-based AI, enabling industries like finance, healthcare, and R&D to leverage their most valuable assets securely. The system outputs trained models, predictions, and visualizations through a simple web dashboard.
How we built it
We built the system using a modular, agent-based architecture:
Local AI Brain: The intelligence of each agent is powered by a gpt-oss model (e.g., gpt-oss-120b) running on a local inference server like Ollama or vLLM.
Core ML Engine: We chose H2O AutoML for its robustness and proven ability to automatically find high-performing models.
Agent Framework: We used the watsonx Orchestrate ADK framework to create a specialized team of AI agents. Instead of one monolithic model, we designed a "team of experts," where each of the seven agents has a unique, well-defined role (e.g., DataProcessorAgent, ModelBuilderAgent).
Secure Execution: Security was a top priority. All Python code generated by the agents is executed within an isolated Docker container (a sandbox), preventing access to the host system and ensuring dependencies are managed cleanly.
Web Interface and API: To make the system user-friendly, we built a web dashboard using FastAPI for the backend and simple HTML/JS for the frontend. This allows users to easily upload files, monitor training in real-time, and view results.
Orchestration: A central "Pipeline Orchestrator" manages the entire workflow, deciding which agent to invoke at each step and passing the necessary information between them, from initial data analysis to final visualization.
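The Pipeline Orchestrator described above can be sketched as a sequential dispatcher that passes one shared context between agents. The agent names mirror those mentioned in the text; the callables are stand-ins, not the real ADK agents:

```python
def run_pipeline(agents, context: dict) -> dict:
    """Invoke each (name, agent) pair in order, merging its output into the context."""
    for name, agent in agents:
        result = agent(context)            # each agent reads and extends the context
        context.update(result)
        context.setdefault("history", []).append(name)
    return context

# Illustrative two-stage pipeline with lambda stand-ins for real agents:
example = [
    ("DataProcessorAgent", lambda c: {"schema": ["date", "amount"]}),
    ("ModelBuilderAgent",  lambda c: {"model": "gbm"}),
]
```

Keeping all inter-agent communication inside one context dict is what lets the orchestrator decide, at each step, which agent to invoke next and with what information.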
Challenges we ran into
Agent Coordination: The biggest challenge was ensuring seamless communication between agents. Getting the DataProcessorAgent's analysis report to be perfectly understood by the ModelBuilderAgent to generate correct code required extensive trial and error and prompt refinement.
State Management: Docker containers are stateless. Managing the project's state (file paths, model artifacts, logs) across multiple, separate Docker executions for different pipeline stages was a significant architectural hurdle.
Handling AI Non-Determinism: LLMs can be unpredictable. An agent would sometimes generate flawed code or misinterpret a result. Building robust error-handling logic and retries, especially the feedback loop with the AnalystAgent, was crucial for making the pipeline reliable.
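The feedback loop with the AnalystAgent can be sketched as a bounded retry: generate, validate, and feed the validator's critique back into the next attempt. `generate` and `validate` here are stand-ins for the real agent calls:

```python
def self_correct(generate, validate, max_attempts: int = 3):
    """Run generate/validate cycles until the validator approves or attempts run out."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)      # the critique steers the next attempt
        ok, feedback = validate(candidate)  # returns (approved?, critique)
        if ok:
            return candidate, attempt
    raise RuntimeError(f"validation failed after {max_attempts} attempts: {feedback}")
```

Bounding the loop matters: without `max_attempts`, a model that keeps misinterpreting the critique could cycle indefinitely.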
Local Model Optimization: A key challenge was balancing the great potential of gpt-oss with the performance limitations of running it locally. We evaluated several models; some, like gemma3 27B, qwen3 32B, and deepseek-v2 16B, often fell short of gpt-oss-120B, requiring more agent calls for corrections, losing focus, or even getting stuck for hours. On the other hand, much larger models such as deepseek-v2 236B and qwen3 235B were simply too big, consuming excessive resources and taking longer to complete tasks in a local environment, which made finding the right model a critical challenge.
Dynamic Adjustment: It was challenging to make the agents smart enough to recognize when a trained model's metrics were poor. We had to build logic that allowed them to detect this situation and decide whether to improve, adjust, or change the data and parameters to achieve better performance.
Accomplishments that we're proud of
Real-World Validation and Empowerment: We successfully tested the system with data from the online stores wiki.com.uy and decotech.uy. We demonstrated that users with business knowledge but no ML experience could, on their own, train models and achieve astounding results. We managed to put a virtual "machine learning data analyst" at their disposal.
Achieving True Data Privacy: We successfully created a powerful AutoML system where sensitive data never leaves the user's machine, addressing a major blocker for AI adoption in regulated industries.
Harnessing gpt-oss for Superior Reasoning: We are incredibly proud that using gpt-oss as the brain reduced the agents’ code self-correction cycle by 50% compared to other AI models. It demonstrated significant improvements in task execution, tool use, and understanding the agent workflow.
Building a Self-Healing System: The implementation of the AnalystAgent acts as a quality control specialist. It reviews the code and results from other agents and can send tasks back for correction, creating a robust, self-healing workflow.
The Power of Specialization: We proved that dividing a complex problem like an AutoML pipeline into smaller tasks for specialized agents is far more effective than a monolithic approach. This strategy gave the gpt-oss model reasoning "superpowers" for specific tasks.
What we learned
gpt-oss Excels in Reasoning: This project was a deep dive into practical multi-agent systems. The most critical lesson was that gpt-oss excels at reasoning tasks. It significantly sped up development and improved reliability, completing data processing, input handling, and visualization tasks more quickly and with fewer errors.
Prompt Engineering is Everything: The quality of the system's output is directly tied to the quality of the prompts given to each agent. We spent considerable time refining prompts to ensure each agent understood its role, limitations, and the exact format of its expected output.
Self-Correction Loops are a Must: We learned the importance of having a "validator" agent. The AnalystAgent acts as a quality assurance layer, creating a robust system that can catch and fix its own mistakes.
Specialization Unlocks Potential: Breaking a complex problem down for specialized agents is a highly effective design pattern. It allows the core LLM to focus its reasoning power on well-defined tasks, leading to better and more reliable results.
What's next for Multi-Agent Auto Machine Learning - Deep Learning System
Expanded ML Capabilities: We plan to extend the system's capabilities beyond tabular data to include time-series forecasting, NLP tasks, and eventually computer vision.
Enhanced User Interaction: We want to build a more interactive UI where users can collaborate with the agents, tweak parameters during the process, and perform more in-depth model comparisons.
Custom Agent Workflows: Allow users to define or customize their own agent workflows, adding or removing steps to tailor the pipeline to specific, unique business problems.
Advanced Feature Engineering: Empower the agents with more sophisticated tools for automatic feature engineering and data enrichment to further improve model performance.
Summary of an execution where the objective was to predict 30-day sales
The goal was to predict the total sales amount for the next 30 days from a CSV file. The process was fully automated by a team of artificial intelligence agents.
Main Phases
Data Analysis (Start: 3:45 PM): The system analyzed the ventas.csv file and detected that the data had a particular format.
First Training Attempt (Failed)
Correction and Second Training (Successful) Based on the error analysis, the ModelBuilderAgent corrected the script to:
Result:
The second attempt was successful. A Gradient Boosting Machine (GBM) model was trained with acceptable performance (R² = 0.63, meaning it explains approximately 63% of sales variability).
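As a reminder of what that R² figure means, the coefficient of determination can be computed directly from actuals and predictions; a minimal stdlib version:

```python
def r_squared(actual, predicted) -> float:
    """R^2 = 1 - SS_res / SS_tot: the share of variance the model explains."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((y - mean) ** 2 for y in actual)          # total variance
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained
    return 1 - ss_res / ss_tot
```

An R² of 0.63 therefore says the GBM's predictions account for roughly 63% of the variability in sales, with the remaining 37% unexplained.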
Prediction Generation Using the trained model, the PredictionAgent generated a sales forecast for the next 30 days and saved it in predictions.csv.
Results Visualization The VisualizationAgent combined the historical data and the new predictions to create a chart.
Final Conclusion The pipeline successfully completed its objective. Despite an initial training failure, the system was able to diagnose the issue, correct it automatically, and complete all phases of the process.
Final Outcome
The entire process took approximately 30 minutes.