The 120B parameter model aims to improve compute efficiency and accuracy for complex multi-agent workloads such as software development and cybersecurity triage. Credit: Nvidia

Nvidia has introduced a new reasoning-focused AI model that combines multiple neural network architectures in a bid to improve how enterprise systems handle complex tasks and automation.

The company said its Nemotron 3 Super model combines Mamba sequence modeling, transformer attention, and Mixture-of-Experts (MoE) routing to support so-called “agentic” AI systems that can plan and execute multi-step workflows across enterprise applications.

In a statement, Nvidia said multi-agent systems can generate up to 15 times more tokens than standard chat interactions. This can lead to “context explosion,” which may cause agents to drift from the original goal and raise costs, as large reasoning models are used for each subtask.

“We are releasing Nemotron 3 Super to address these limitations,” Nvidia said. “The new Super model is a 120B total, 12B active-parameter model that delivers maximum compute efficiency and accuracy for complex multi-agent applications such as software development and cybersecurity triaging.”

Nvidia said the model is released with open weights, datasets, and training recipes, allowing developers to modify it and deploy it on their own infrastructure. The release reflects a broader shift in the AI industry as vendors move beyond chatbots toward models designed to power autonomous AI agents.

“Enhanced reasoning directly supports better task planning, error correction, and workflow decomposition, which collectively increase the reliability of AI agents for enterprise use,” said Jaishiv Prakash, director analyst at Gartner.
“However, the success of agentic systems will not just depend on model capability but on the overall system architecture, including orchestration, data integration, context management, and governance.”

Architecture for enterprise efficiency

Nemotron 3 Super reflects Nvidia’s push to improve performance for enterprise AI workloads that involve sustained reasoning and long-context processing. The model’s hybrid architecture, analysts say, could help organizations run complex agent workloads more efficiently on existing infrastructure.

“Nemotron 3 Super combines Mamba’s linear-time sequence processing with Transformer attention and MoE routing, delivering higher throughput, lower latency, and better memory efficiency than pure transformers for long-context and multi-step workloads,” said Charlie Dai, VP and principal analyst at Forrester. “For enterprises, this translates into lower TCO, better utilization of on-prem or sovereign GPU clusters, and faster agent execution.”

Tulika Sheel, senior vice president at Kadence International, said the model’s architecture is designed to activate only a subset of parameters for each task, which helps improve efficiency.

“This design significantly improves throughput and lowers compute costs while maintaining accuracy,” Sheel said. “For enterprises, that can translate into faster inference, better performance on long-context workloads, and more cost-efficient deployment of large models.”

Open models reshape strategy

Open reasoning models are emerging as an option for enterprises seeking greater control over how AI systems are built and deployed. Research by McKinsey & Company attributes this interest to strong performance, ease of use, and lower implementation and maintenance costs compared with proprietary alternatives.

“As a result, many organizations may adopt a hybrid strategy, combining open models for internal workloads with proprietary models for external or high-performance tasks,” Sheel said.
“Open reasoning models could push enterprises toward more customizable, self-hosted AI strategies rather than full reliance on proprietary platforms.”

Analysts also said that the ability to fine-tune and inspect models is becoming increasingly important as enterprises expand AI into regulated sectors such as finance, healthcare, and government.

“Open reasoning models give enterprises a credible alternative to proprietary foundation models by enabling fine-tuning, inspection, and on-prem deployment,” Dai said. “This supports customization for domain logic, regulatory compliance, and data residency, while reducing dependency on closed APIs and usage-based pricing.”
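The “120B total, 12B active-parameter” figure described above is a consequence of Mixture-of-Experts routing: a learned router sends each token to only a few expert sub-networks, so most weights sit idle on any given forward pass. The sketch below illustrates that idea with top-k gating; the sizes, router, and expert shapes are hypothetical toy values for illustration, not Nemotron’s actual architecture.

```python
# Minimal sketch of Mixture-of-Experts top-k routing (toy sizes, not Nemotron's
# real implementation), showing why only a fraction of parameters is active.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 10, 1  # hypothetical dimensions

# Each "expert" is a small feed-forward weight matrix; the router scores them.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route token vector x to its top-k experts and mix their outputs."""
    logits = x @ router                      # one routing score per expert
    top = np.argsort(logits)[-top_k:]        # indices of the selected experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts only
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return out, top

x = rng.standard_normal(d_model)
y, chosen = moe_forward(x)

# Only top_k of n_experts expert matrices were touched for this token.
active_fraction = top_k / n_experts
print(f"experts used: {chosen}, active fraction: {active_fraction:.0%}")
```

With one expert chosen out of ten, roughly 10% of the expert parameters do work per token, which loosely mirrors the 12B-active-of-120B-total ratio Nvidia cites (real MoE models also carry shared layers that are always active).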