Kling AI Launches Video 2.6 with Native Audio Generation, Redefining the AI Video Creation Workflow

Beijing, China, December 4, 2025 -- Kling AI, a world-leading AI-powered creative platform, today unveiled its latest Video 2.6 model, a major upgrade featuring native audio capabilities, solidifying its leadership in the AI video generation sector. The new model enables the end-to-end generation of video, dialogue, sound effects, and ambient sounds in a single step, offering content creators a truly immersive audio-visual experience.

Powered by strong semantic understanding, Video 2.6 features enhanced capabilities to interpret complex inputs, including textual descriptions, spoken language, and intricate storylines across various scenarios, ensuring the generated visuals and audio precisely match creators' intent and align with their needs.

Enhanced Audio Quality and Better Audio-Visual Synchronization

The Video 2.6 model supports the generation of human voices (speaking, singing, rapping) and a wide range of environmental/effect sounds (e.g., glass shattering, fire crackling, ocean waves) with cleaner quality and richer layers. Creators can specify emotions, tones, rhythm, and volume for dialogues, enabling natural-sounding speech, from a soft whisper to a dramatic scream.

The model also achieves deep alignment between visual motion and sound rhythms, ensuring that the pacing of speech, ambient sounds, and visual actions is tightly coordinated for a lifelike experience.

Built with native audio capability, the model also supports advanced creative scenarios such as multi-character dialogue, music performances, news broadcasting, and creative commercials. For more complex scenarios, the Video 2.6 model can seamlessly blend voiceovers, background sounds, and object effects. For example, it can generate a commercial that simultaneously combines the sound of rain splashing against a window, a French lady's voiceover, and her natural dialogue.

Streamlined Creative Workflow

The model now supports the generation of 5-second and 10-second videos with English and Chinese language output. Users can simply input text to generate a full video with integrated voiceovers, sound effects, and ambient sound, or transform static images into rich, dynamic videos complete with corresponding audio using the model's image-to-audio-visual feature.

The model has the potential to significantly reduce production costs and boost efficiency for a wide range of creators. For ecommerce merchants, the model can quickly produce a product showcase and explanatory video from an uploaded product image and key selling points, with natural dialogue and matching ambient sounds, ideal for digital storefronts and social media campaigns.

For advertisers, the Video 2.6 model enables the rapid creation of high-quality promotional clips in a single workflow, with integrated sound effects, narration, dialogues, and product demonstrations. For content creators and influencers, the model opens up new creative possibilities by enabling them to produce a diverse range of content, from interview segments and comedy sketches to music videos, maintaining a steady drumbeat of high-quality content to drive engagement.

For more on the Kling Video 2.6 Model, please go to: https://app.klingai.com/global/release-notes/c605hp1tzd?type=dialog

Contact Info:
Name: Jack Huang
Organization: KLING AI
Website: https://klingai.com/

Release ID: 89177843
