Nemotron 3 Ultra – Model Docs — Person.run

Nemotron 3 Ultra

A 550B-parameter (55B active) open reasoning model from NVIDIA, built for long-running agent workflows. It uses a hybrid Mamba-Transformer mixture-of-experts (MoE) architecture and supports a 1M-token context window.

Model overview

Model ID: nvidia/nemotron-3-ultra-550b-a55b
Provider: nvidia
Type: language
Context window: 1,000,000 tokens
Max output tokens: 65,000

Tags: reasoning, tool-use, implicit-caching

Model pricing

Metric	Value
Input tokens (USD per 1M)	$0.60
Output tokens (USD per 1M)	$2.40
Image generation	n/a
Cached input read (USD per 1M)	$0.12
Cached input write (USD per 1M)	n/a
Pricing source	gateway
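
To make the per-million rates above concrete, here is a minimal sketch that estimates the cost of a single request. The function name `estimate_cost` and its parameters are hypothetical and not part of any gateway API. It also assumes that cached input tokens are billed at the cached-read rate instead of the regular input rate, which this page does not state explicitly.

```python
from decimal import Decimal

# Rates in USD per 1M tokens, copied from the pricing table above.
INPUT_PER_M = Decimal("0.60")
OUTPUT_PER_M = Decimal("2.40")
CACHED_READ_PER_M = Decimal("0.12")
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_input_tokens is a subset of input_tokens and is
    billed at the cached-read rate in place of the regular input rate.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")

    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * INPUT_PER_M
        + cached_input_tokens * CACHED_READ_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / MILLION


if __name__ == "__main__":
    # 200,000 input tokens (150,000 of them cached) and 4,000 output tokens.
    print(estimate_cost(200_000, 4_000, cached_input_tokens=150_000))
```

`Decimal` is used here so that small per-request amounts do not pick up binary floating-point rounding noise. Under the stated assumption, the example request costs $0.0576: $0.03 for uncached input, $0.018 for cached input, and $0.0096 for output.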
