Kling v3.0 Text-to-Video

Built on an All-in-One product framework, the Kling 3.0 model series supports multimodal input and output across text, images, audio, and video, bringing video understanding, generation, and editing together in a single AI workflow. The models integrate text-to-video, image-to-video, reference-to-video, and in-video editing into one native multimodal architecture. This lets them follow complex narrative logic, deliver precise shot control, and maintain strong prompt adherence.

Model overview

Model ID: klingai/kling-v3.0-t2v
Provider: klingai
Type: video
Context window: n/a
Max output tokens: n/a
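
The overview fields above can be held in a small record for programmatic use. The following is a minimal sketch, not an official client: the `ModelInfo` class and the `parse_limit` and `split_model_id` helpers are hypothetical names introduced here for illustration. It assumes the model ID follows a `provider/name` layout, which the listed ID and provider suggest, and it maps `n/a` limits to `None`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical record mirroring the fields listed in the model overview."""
    model_id: str
    provider: str
    type: str
    context_window: Optional[int]      # None when the page lists "n/a"
    max_output_tokens: Optional[int]   # None when the page lists "n/a"


def parse_limit(value: str) -> Optional[int]:
    """Return an integer token limit, or None when the value is 'n/a'."""
    cleaned = value.replace("tokens", "").strip().lower()
    if cleaned == "n/a":
        return None
    return int(cleaned.replace(",", ""))


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split an ID into (provider, name), assuming a 'provider/name' layout."""
    provider, _, name = model_id.partition("/")
    if not provider or not name:
        raise ValueError(f"expected 'provider/name', got {model_id!r}")
    return provider, name


# Values copied from the model overview on this page.
provider, _name = split_model_id("klingai/kling-v3.0-t2v")
KLING_T2V = ModelInfo(
    model_id="klingai/kling-v3.0-t2v",
    provider=provider,
    type="video",
    context_window=parse_limit("n/a"),
    max_output_tokens=parse_limit("n/a"),
)
```

Representing `n/a` as `None` keeps token limits out of any arithmetic for a video model, where the page does not list them.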

Model pricing

Metric	Value
`Input tokens (/1M)`	`n/a`
`Output tokens (/1M)`	`n/a`
`Image generation`	`n/a`
`Cached input read (/1M)`	`n/a`
`Cached input write (/1M)`	`n/a`
`Pricing source`	`estimate`

Related docs

Model catalog

Browse every generated model page.


Pricing matrix

Compare model pricing across the full catalog.

