The 120B parameter model aims to improve compute efficiency and accuracy for complex multi-agent workloads such as software development and cybersecurity triage. Credit: Nvidia

Nvidia has introduced a new reasoning-focused AI model that combines multiple neural network architectures in a bid to improve how enterprise systems handle complex tasks and automation.

The company said its Nemotron 3 Super model combines Mamba sequence modeling, transformer attention, and Mixture-of-Experts (MoE) routing to support so-called “agentic” AI systems that can plan and execute multi-step workflows across enterprise applications.

In a statement, Nvidia said multi-agent systems can generate up to 15 times more tokens than standard chat interactions. This can lead to “context explosion,” which may cause agents to drift from the original goal and raise costs, as large reasoning models are used for each subtask.

“We are releasing Nemotron 3 Super to address these limitations,” Nvidia said. “The new Super model is a 120B total, 12B active-parameter model that delivers maximum compute efficiency and accuracy for complex multi-agent applications such as software development and cybersecurity triaging.”

Nvidia said the model is released with open weights, datasets, and training recipes, allowing developers to modify it and deploy it on their own infrastructure. The release reflects a broader shift in the AI industry as vendors move beyond chatbots toward models designed to power autonomous AI agents.

“Enhanced reasoning directly supports better task planning, error correction, and workflow decomposition, which collectively increase the reliability of AI agents for enterprise use,” said Jaishiv Prakash, director analyst at Gartner.
“However, the success of agentic systems will not just depend on model capability but on the overall system architecture, including orchestration, data integration, context management, and governance.”

Architecture for enterprise efficiency

Nemotron 3 Super reflects Nvidia’s push to improve performance for enterprise AI workloads that involve sustained reasoning and long-context processing. The model’s hybrid architecture, analysts say, could help organizations run complex agent workloads more efficiently on existing infrastructure.

“Nemotron 3 Super combines Mamba’s linear-time sequence processing with Transformer attention and MoE routing, delivering higher throughput, lower latency, and better memory efficiency than pure transformers for long-context and multi-step workloads,” said Charlie Dai, VP and principal analyst at Forrester. “For enterprises, this translates into lower TCO, better utilization of on-prem or sovereign GPU clusters, and faster agent execution.”

Tulika Sheel, senior vice president at Kadence International, said the model’s architecture is designed to activate only a subset of parameters for each task, which helps improve efficiency.

“This design significantly improves throughput and lowers compute costs while maintaining accuracy,” Sheel said. “For enterprises, that can translate into faster inference, better performance on long-context workloads, and more cost-efficient deployment of large models.”

Open models reshape strategy

Open reasoning models are emerging as an option for enterprises seeking greater control over how AI systems are built and deployed. Research by McKinsey & Company attributes this interest to strong performance, ease of use, and lower implementation and maintenance costs compared with proprietary alternatives.

“As a result, many organizations may adopt a hybrid strategy, combining open models for internal workloads with proprietary models for external or high-performance tasks,” Sheel said.
“Open reasoning models could push enterprises toward more customizable, self-hosted AI strategies rather than full reliance on proprietary platforms.”

Analysts also said that the ability to fine-tune and inspect models is becoming increasingly important as enterprises expand AI into regulated sectors such as finance, healthcare, and government.

“Open reasoning models give enterprises a credible alternative to proprietary foundation models by enabling fine-tuning, inspection, and on-prem deployment,” Dai said. “This supports customization for domain logic, regulatory compliance, and data residency, while reducing dependency on closed APIs and usage-based pricing.”
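The “120B total, 12B active-parameter” figure described above is a consequence of Mixture-of-Experts routing: a learned router sends each token to only a few expert sub-networks, so most weights sit idle on any given forward pass. The sketch below illustrates that idea with top-k gating; the sizes, router, and expert shapes are hypothetical toy values for illustration, not Nemotron’s actual architecture.

```python
# Minimal sketch of Mixture-of-Experts top-k routing (toy sizes, not Nemotron's
# real implementation), showing why only a fraction of parameters is active.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 10, 1  # hypothetical dimensions

# Each "expert" is a small feed-forward weight matrix; the router scores them.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route token vector x to its top-k experts and mix their outputs."""
    logits = x @ router                      # one routing score per expert
    top = np.argsort(logits)[-top_k:]        # indices of the selected experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts only
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return out, top

x = rng.standard_normal(d_model)
y, chosen = moe_forward(x)

# Only top_k of n_experts expert matrices were touched for this token.
active_fraction = top_k / n_experts
print(f"experts used: {chosen}, active fraction: {active_fraction:.0%}")
```

With one expert chosen out of ten, roughly 10% of the expert parameters do work per token, which loosely mirrors the 12B-active-of-120B-total ratio Nvidia cites (real MoE models also carry shared layers that are always active).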