AI agent methods at present juggle separate fashions for imaginative and prescient, speech and language — shedding time and context as they cross information from one mannequin to the opposite.
Unveiled at present, NVIDIA Nemotron 3 Nano Omni is an open multimodal mannequin that brings these capabilities collectively into one system, enabling brokers to ship quicker, smarter responses with superior reasoning throughout video, audio, picture and textual content. This best-in-class mannequin offers enterprises and builders a manufacturing path for extra environment friendly and correct multimodal AI brokers with full deployment flexibility and management.
Nemotron 3 Nano Omni units a brand new effectivity frontier for open multimodal fashions with main accuracy and low price, topping six leaderboards for complicated doc intelligence, and video and audio understanding.
At a Look
What it’s
An open, omni-modal reasoning mannequin — the highest-efficiency open multimodal mannequin of its sort with main accuracy
What it handles
Textual content, photographs, audio, video, paperwork, charts and graphical interfaces (enter); textual content (output)
Who it’s for
Enterprises and builders constructing quick and dependable, agentic methods that want a multimodal notion sub-agent
The way it works
Capabilities because the “eyes and ears” in a system of brokers, working alongside fashions like Nemotron 3 Tremendous and Extremely or different proprietary fashions
Why it issues
Main multimodal accuracy and 9x greater throughput than different open omni fashions with the identical interactivity, leading to decrease price and higher scalability with out sacrificing responsiveness.
Structure
30B-A3B hybrid MoE with Conv3D, EVS, 256K context
Availability
April twenty eighth, 2026 through Hugging Face, OpenRouter, construct.nvidia.com and 25+ companion platforms
AI and software program corporations already adopting Nemotron 3 Nano Omni embody Aible, Applied Scientific Intelligence (ASI), Eka Care, Foxconn, H Company, Palantir and Pyler, with Dell Applied sciences, Docusign, Infosys, K-Dense, Lila, Oracle and Zefr evaluating the mannequin.
“To construct helpful brokers, you may’t wait seconds for a mannequin to interpret a display screen,” mentioned Gautier Cloix, CEO of H Firm. “By constructing on Nemotron 3 Nano Omni, our brokers can quickly interpret full HD display screen recordings — one thing that wasn’t sensible earlier than. This isn’t only a velocity increase: It’s a elementary shift in how our brokers understand and work together with digital environments in actual time.”
Nemotron 3 Nano Omni Allows Quicker, Leaner Multimodal Brokers
Contemplate an AI agent for buyer assist processing a display screen recording whereas analyzing uploaded name audio and checking information logs — or an agent for finance tasked with parsing PDFs, spreadsheets, charts and voice notes. Right now, most agentic methods accomplish these duties with separate fashions for imaginative and prescient, speech and language.
This method will increase latency by way of repeated inference passes, fragments context throughout modalities, and provides price and inaccuracies over time.
By combining imaginative and prescient and audio encoders inside its 30B-A3B, hybrid mixture-of-experts structure, Nemotron 3 Nano Omni eliminates the necessity for separate notion fashions, driving inference effectivity at scale. It pairs this effectivity with sturdy multimodal notion accuracy, enabling AI systems to achieve 9x higher throughput than different open omni fashions with the identical interactivity. The result’s decrease prices and higher scalability with out sacrificing responsiveness or high quality.
In agentic methods, Nemotron 3 Nano Omni can work alongside proprietary cloud fashions or different NVIDIA Nemotron open fashions — equivalent to Nemotron 3 Tremendous for high-frequency execution or Nemotron 3 Extremely for complicated planning — in addition to proprietary fashions from different suppliers, to energy sub-agents for agentic workflows equivalent to laptop use, doc intelligence and audio-video reasoning.
- Pc use brokers — Nemotron 3 Nano Omni powers the notion loop for brokers navigating graphical consumer interfaces, reasoning over onscreen content material and understanding consumer interface state over time. H Firm’s newest computer usage agent, powered by Nemotron 3 Nano Omni, makes use of a local enter decision of 1920×1080 pixels to attain high-fidelity visible reasoning. In preliminary evaluations on the OSWorld benchmark, this integration confirmed a big leap in navigating complicated graphical interfaces and used Nemotron 3 Nano Omni’s potential to course of very high-resolution photographs.
- Doc intelligence — Interprets paperwork, charts, tables, screenshots and mixed-media inputs, enabling brokers to purpose throughout visible construction and textual content content material coherently. Important for enterprise evaluation and compliance workflows.
- Audio and video understanding — For customer support, analysis and monitoring workflows, Nemotron 3 Nano Omni maintains audio-video context, tying what was mentioned, proven and documented right into a single reasoning stream as an alternative of disconnected summaries.

Open and Customizable, Deployable Anyplace
Nemotron 3 Nano Omni is launched with open weights, datasets and coaching methods — giving organizations full transparency and management over how the mannequin is custom-made and deployed.
Builders can use instruments like NVIDIA NeMo for personalization, analysis and optimization for domain-specific use instances. As a result of the Nemotron household of fashions is open, organizations can deploy them in environments that meet regulatory, sovereignty or information localization necessities.
The Nemotron 3 household — together with Nano, Tremendous and Extremely fashions — has seen over 50 million downloads previously yr. Omni extends the household’s capabilities into multimodal and agentic domains.
The mannequin is obtainable on Hugging Face, OpenRouter and build.nvidia.com as an NVIDIA NIM microservice and thru a broad ecosystem of NVIDIA Cloud Partners, inference platforms and cloud service suppliers.
Its open, light-weight structure helps constant deployment from native methods like NVIDIA Jetson {hardware}, NVIDIA DGX Spark and DGX Station to information middle and cloud environments.
Go to the NVIDIA technical weblog for tutorials, cookbooks and deployment guides for Nemotron 3 Nano Omni use instances. Stay updated on agentic AI, NVIDIA Nemotron and extra by subscribing to NVIDIA news, joining the community and following NVIDIA AI on LinkedIn, Instagram, X and Facebook.








