
Nvidia unveils Nemotron 3 Ultra: America’s smartest open-weights AI model, 30% cheaper to run

Uploaded: 2026-06-01T12:17:00+05:00
Description: Nemotron 3 Ultra, the new flagship AI model, features 500-550 billion parameters


By Aqsa Qaddus Tahir

Published June 01, 2026


Nvidia has unveiled a new flagship AI model at Computex 2026 in Taipei, Taiwan.

Named Nemotron 3 Ultra, the new flagship model has 500-550 billion parameters and is designed mainly for complex planning, reasoning and agentic workflows.

On efficiency, the model delivers 5x faster inference, which promises a significant reduction in cost per inference for enterprises. Nemotron 3 Ultra uses NVFP4 training techniques and a latent mixture-of-experts (MoE) design, which improves efficiency by activating only the relevant parts of the network for each input.
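The idea of activating only part of a network can be illustrated with a generic top-k mixture-of-experts router. This is a minimal sketch for illustration only: the article does not describe how Nemotron 3 Ultra's router or its latent MoE variant actually works, and the toy experts, scores and function names below are all hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    top = ranked[:k]
    # Weights are renormalised over the selected experts only.
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_scores, k=2):
    """Run only the k selected experts and mix their outputs."""
    selected = route_top_k(router_scores, k)
    output = sum(w * experts[i](x) for i, w in selected)
    return output, [i for i, _ in selected]


# Hypothetical toy setup: 8 "experts" that each scale the input differently.
NUM_EXPERTS = 8
experts = [lambda x, s=s: x * s for s in range(1, NUM_EXPERTS + 1)]

rng = random.Random(0)
router_scores = [rng.gauss(0, 1) for _ in range(NUM_EXPERTS)]

output, active = moe_forward(1.0, experts, router_scores, k=2)
print(f"active experts: {active} of {NUM_EXPERTS}")
```

In this sketch only two of the eight experts run for a given input, so the compute per input is a fraction of what the total parameter count would suggest. That is the general reason MoE models can be cheaper to serve than dense models of similar size.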

As a small but fast open model built for long-running agents, it tops US open-weights rankings, outperforming rivals such as Gemma 4 31B.

It also cuts costs by 30 percent for complex agentic tasks. That matters because tech companies are facing an unprecedented surge in AI-driven costs.

It serves as the new top-tier model, joining the mid-range Super and the lightweight Nano variants. The Super model was launched in March 2026 with 120 billion parameters.

Artificial Analysis partnered with Nvidia to assess the model's capabilities, including intelligence and speed. On intelligence, Nemotron 3 Ultra scores 48 on the Artificial Analysis Intelligence Index. This makes it America's smartest open-weights model to date, ahead of Gemma 4 31B (39), Nemotron 3 Super (36) and gpt-oss-120b (33). It still trails the Chinese open-weights frontier model Kimi K2.6, which scores 54.

It is also served at over 300 tokens per second. Peer models of similar size from DeepSeek and Moonshot are generally served at 50-100 tokens per second in the market.
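To put those serving speeds in perspective, the short calculation below estimates how long each would take to generate a fixed amount of output. It is a rough sketch: the 10,000-token output length is a hypothetical figure chosen for illustration, 300 tokens per second is treated as a floor since the article says "over 300", and real latency also depends on factors not covered here, such as prompt processing and queueing.

```python
def generation_seconds(num_tokens, tokens_per_second):
    """Time to emit num_tokens at a steady decoding speed."""
    return num_tokens / tokens_per_second


# Hypothetical output length for a long-running agent task.
OUTPUT_TOKENS = 10_000

# Speeds quoted in the article, in tokens per second.
speeds = {
    "Nemotron 3 Ultra (at least 300 tok/s)": 300,
    "Peer models, upper end (100 tok/s)": 100,
    "Peer models, lower end (50 tok/s)": 50,
}

for label, tps in speeds.items():
    seconds = generation_seconds(OUTPUT_TOKENS, tps)
    print(f"{label}: {seconds:.1f} s")
```

Under these assumptions the same output takes about 33 seconds at 300 tokens per second, against 100 to 200 seconds at the peer speeds, a gap of three to six times.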

The model will give Nvidia a significant edge in a competitive industry. For instance, Nvidia will deploy its tools for companies that are looking to strengthen their position through AI integration.

Moreover, its combination of strong performance, efficiency and significantly lower costs will attract major enterprises that are looking for cost-effective agentic AI deployments.

During the conference, Nvidia CEO Jensen Huang highlighted the model's openness for developers building everything from search tools to molecular simulations. The launch was paired with new agent deployment tools, with Nemotron 4 on the horizon.

Aqsa Qaddus Tahir is a reporter dedicated to science coverage, exploring breakthroughs, emerging research, and innovation. Her work centres on making scientific developments understandable and relevant, presenting well-researched stories that connect complex ideas with everyday life in a clear, engaging, and informative manner.
