AI Avatars That Actually Talk: Lip-Synced Talking Heads with Seedance 2.0

June 2, 2026 · 7 min read · Prospolabs
An AI avatar talking-head video is a clip of a generated presenter who speaks your script to camera with matching lip movement and audio. With Seedance 2.0 on Prospolabs, the spoken voice, the synced mouth shapes and the video all come out of a single generation, because audio and lip sync are native to the model rather than a second tool you stitch on afterward.

One generation, fully synced speech and lip movement — the audio is baked in, not bolted on.

Why native audio changes the talking-head workflow

The usual avatar pipeline is two tools. You generate a silent clip with one model, then push it through a separate lip-sync service to graft a voice track onto the mouth — two renders, two bills, two places for the sync to drift. When viewers judge a talking head, they judge it frame by frame on whether the lips match the words, so any seam in that pipeline shows.

Seedance 2.0 collapses that into one pass. Audio is always on: you describe the presenter and what they say, and the model produces the voice, the timing and the mouth shapes together. There is no second render, no re-upload, no separate lip-sync credit to buy — the speech and the lip movement are generated as one coherent result.

Figure: Close-up of an AI presenter speaking to camera in an office setting. Native audio means the mouth shapes match the words, with no separate lip-sync pass required.

How to make a talking-head avatar on Prospolabs

You can build a presenter from a text prompt alone, or pin a specific face and voice using reference mode. Here is the full flow.

1. Top up and pick your model

Add credit to your Prospolabs balance — you can start from $5 — and choose your model. Use model id "seedance-2" for the highest-fidelity presenter, or "seedance-2-fast" for quick script takes you want to iterate on cheaply. The price is identical whether you generate from the UI or the API.

2. Write the presenter and the script

Describe the person, the setting and the line of dialogue in one prompt: who they are, how they are framed, the lighting, and exactly what they say. Because audio is native, the words you put in the prompt become the spoken track and the lip movement at the same time. Clips run 4 to 15 seconds at up to 1080p.
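As a rough illustration of this step, the sketch below folds the presenter, framing, lighting, tone and spoken line into one prompt string and checks the clip settings against the 4 to 15 second, up-to-1080p range quoted above. The function names and the prompt wording are hypothetical; Seedance 2.0 does not require this exact phrasing.

```python
MIN_SECONDS, MAX_SECONDS = 4, 15          # clip length range quoted in the article
RESOLUTIONS = ("480p", "720p", "1080p")   # resolutions the article prices


def build_presenter_prompt(person, framing, lighting, tone, script):
    """Fold the presenter description and the spoken line into one prompt string."""
    return (
        f"{person}, {framing}, {lighting}. "
        f'Speaking to camera in a {tone} tone, they say: "{script}"'
    )


def check_clip_settings(seconds, resolution):
    """Reject settings outside the 4 to 15 second, up-to-1080p range."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"clip length must be {MIN_SECONDS}-{MAX_SECONDS} s, got {seconds}"
        )
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    return seconds, resolution


# Illustrative values only.
prompt = build_presenter_prompt(
    person="A friendly product manager in her thirties",
    framing="head-and-shoulders shot",
    lighting="soft office daylight",
    tone="warm",
    script="Welcome aboard. Let me show you around.",
)
```

Keeping the script as a separate argument makes it easy to swap the line while the presenter description stays fixed.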

3. Pin a face and voice with reference mode (optional)

To keep one consistent spokesperson across many videos, use reference-to-video. Seedance 2.0 accepts up to 9 images, 3 videos and 3 audio clips, cited inline as @Image1, @Video1 and @Audio1. Cite a portrait at @Image1 to lock the face and a voice sample at @Audio1 to lock the delivery, and your presenter stays recognizable from clip to clip — ideal for a recurring brand host.
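A small sketch of how the inline citations might be assembled, assuming the tags are numbered from 1 in upload order. The helper names and the prompt wording are hypothetical; only the @Image1, @Video1 and @Audio1 tag format and the 9/3/3 limits come from the text above.

```python
# Per-generation reference limits quoted in the article.
LIMITS = {"Image": 9, "Video": 3, "Audio": 3}


def reference_tags(images=0, videos=0, audios=0):
    """Return the inline tags (@Image1, @Video1, ...) for the given counts."""
    counts = {"Image": images, "Video": videos, "Audio": audios}
    tags = []
    for kind, count in counts.items():
        if count > LIMITS[kind]:
            raise ValueError(f"at most {LIMITS[kind]} {kind.lower()} references allowed")
        tags.extend(f"@{kind}{i}" for i in range(1, count + 1))
    return tags


def pinned_presenter_prompt(script):
    """Hypothetical prompt wording that locks the face to @Image1 and the voice to @Audio1."""
    return (
        "The presenter from @Image1 speaks to camera "
        f'in the voice of @Audio1, saying: "{script}"'
    )


tags = reference_tags(images=1, audios=1)
prompt = pinned_presenter_prompt("Here is what shipped this week.")
```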

4. Generate, then retrieve the output

From the API you send a POST to /v1/generate; jobs run asynchronously, and the finished clip is returned at an output_url that stays valid for 7 days, so download it or pipe it into your editor before then. If a run fails, it is automatically refunded, so you are never charged for a generation that does not deliver.
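The request could be prepared along these lines using only the Python standard library. This is a sketch under stated assumptions: the host is a placeholder, and the JSON field names ("prompt", "duration", "resolution"), the bearer-token header and the shape of the job response are guesses for illustration. Only the POST /v1/generate path, the model ids and the output_url name come from the article, so check the model docs for the real schema.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # placeholder host, not the real endpoint


def build_generate_request(api_key, model, prompt, seconds, resolution):
    """Prepare (but do not send) a POST to /v1/generate."""
    payload = {
        "model": model,            # "seedance-2" or "seedance-2-fast"
        "prompt": prompt,
        "duration": seconds,       # assumed field name
        "resolution": resolution,  # assumed field name
    }
    return urllib.request.Request(
        url=f"{API_BASE}/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_output_url(job):
    """Return output_url from a parsed job response, or None if it is not there yet."""
    return job.get("output_url")


request = build_generate_request(
    api_key="YOUR_API_KEY",
    model="seedance-2",
    prompt='A presenter speaks to camera, saying: "Welcome aboard."',
    seconds=10,
    resolution="720p",
)
```

Sending the request with urllib.request.urlopen(request) and polling until the job finishes is left out, since the article does not describe the polling endpoint.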

Audio and lip sync are included in the per-second price. There is no separate charge for the voice track and no second lip-sync step to pay for — one generation, one bill.

What a talking-head clip actually costs

Prospolabs charges per second of finished video in USD, with audio included. Seedance 2.0 runs $0.09/sec at 480p (retail $0.15), $0.18/sec at 720p (retail $0.30) and $0.41/sec at 1080p (retail $0.683). Seedance 2.0 Fast runs $0.07/sec at 480p (retail $0.117) and $0.15/sec at 720p (retail $0.25).

In practice, a 10-second 720p presenter clip on Seedance 2.0 is $1.80, including the synced voice. The same length on Seedance 2.0 Fast is $1.50. A polished 8-second 1080p spokesperson take is $3.28. Because there is no per-seat subscription, ten script variations cost ten generations — nothing more.
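The arithmetic above can be reproduced with a small lookup table. The rates are the Prospolabs per-second prices quoted in this section; the function name is made up, and the pairing of the model ids with the two price tiers is inferred from step 1 rather than stated in this section.

```python
from decimal import Decimal

# Per-second USD prices quoted above, keyed by (model id, resolution).
PRICE_PER_SECOND = {
    ("seedance-2", "480p"): Decimal("0.09"),
    ("seedance-2", "720p"): Decimal("0.18"),
    ("seedance-2", "1080p"): Decimal("0.41"),
    ("seedance-2-fast", "480p"): Decimal("0.07"),
    ("seedance-2-fast", "720p"): Decimal("0.15"),
}


def clip_cost(model, resolution, seconds):
    """Cost in USD of one clip: per-second rate times clip length in whole seconds."""
    try:
        rate = PRICE_PER_SECOND[(model, resolution)]
    except KeyError:
        raise ValueError(f"no listed price for {model} at {resolution}") from None
    return rate * seconds


standard_720p = clip_cost("seedance-2", "720p", 10)       # 10 s at $0.18/sec
fast_720p = clip_cost("seedance-2-fast", "720p", 10)      # 10 s at $0.15/sec
polished_1080p = clip_cost("seedance-2", "1080p", 8)      # 8 s at $0.41/sec
ten_variations = 10 * standard_720p                       # ten separate generations
```

Decimal is used instead of float so the totals match the quoted dollar amounts exactly.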

Where talking-head avatars earn their keep

  • Explainers and onboarding: a presenter walks new users through a feature without a studio booking or a reshoot when the copy changes.
  • Ads and landing-page hooks: a spokesperson delivers the offer to camera, and you generate a dozen scripted variants to test which line lands.
  • Localization: re-run the same prompt with the script in another language to publish a presenter for each market from one workflow.

Figure: Two AI presenters shown side by side, both speaking. Spin up multiple on-brand presenters for different audiences or languages from the same workflow.
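The localization use case can be sketched as one presenter description reused across several scripts. The presenter text, the sample translations and the choice to name the language in the prompt are illustrative assumptions, not documented Seedance 2.0 requirements.

```python
# Hypothetical presenter description shared by every market.
PRESENTER = "A confident spokesperson, head-and-shoulders shot, bright studio lighting."

# Illustrative scripts, one per target language.
SCRIPTS = {
    "English": "Try it free for thirty days.",
    "Spanish": "Pruébalo gratis durante treinta días.",
    "German": "Teste es dreißig Tage lang kostenlos.",
}


def localized_prompts(presenter, scripts):
    """Return one prompt per language, keeping the presenter description fixed."""
    return {
        language: f'{presenter} Speaking {language}, they say: "{line}"'
        for language, line in scripts.items()
    }


prompts = localized_prompts(PRESENTER, SCRIPTS)
```

Each prompt is then submitted as its own generation, so three markets cost three clips.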

Tips for clean lip sync

  • Match script length to clip length — a 5-second clip holds roughly one short sentence, so trim copy that runs long rather than rushing the delivery.
  • Specify the framing. Broadcast head-and-shoulders framing gives the model a clear, stable mouth to animate and reads cleaner than a wide shot.
  • Draft on Seedance 2.0 Fast at 480p to nail the script and pacing, then render the keeper at 1080p on Seedance 2.0.
  • Pin the voice with @Audio1 in reference mode when you need the same presenter to sound identical across a series.
  • Describe the tone — calm, energetic, warm — so the generated delivery matches the message, not just the words.
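The first tip can be turned into a quick pre-flight check. The sketch below estimates how long a script takes to speak and rounds up to a whole-second clip length. The rate of 2.5 words per second is an assumed conversational pace, not a figure from Seedance 2.0, so adjust it to the delivery you are after.

```python
import math

WORDS_PER_SECOND = 2.5            # assumed speaking rate, not a Seedance figure
MIN_SECONDS, MAX_SECONDS = 4, 15  # clip length range quoted in the article


def suggested_clip_seconds(script, words_per_second=WORDS_PER_SECOND):
    """Estimate a clip length in whole seconds for the script, within the 4-15 s range."""
    words = len(script.split())
    seconds = max(MIN_SECONDS, math.ceil(words / words_per_second))
    if seconds > MAX_SECONDS:
        raise ValueError(
            f"{words} words needs about {seconds} s; trim the script or split it"
        )
    return seconds


short_line = suggested_clip_seconds("Welcome aboard. Let me show you around.")
longer_line = suggested_clip_seconds(
    "Our new dashboard shows every order, refund and payout in one place."
)
```

A script that raises ValueError is a signal to cut copy or split it across two clips rather than ask for a rushed delivery.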

Ready to make one? Open Seedance 2.0 with native audio, check the transparent per-second pricing, or read the model docs for the full reference-mode syntax.

Frequently asked questions

  • Do I need a separate lip-sync tool to make a talking-head avatar?
    No. Seedance 2.0 has native, always-on audio, so the spoken voice and the matching mouth movement are generated together in a single pass on Prospolabs. There is no second render or separate lip-sync service to add.

  • How much does a talking-head clip cost?
    You pay per second in USD with audio included. Seedance 2.0 is $0.18/sec at 720p, so a 10-second presenter clip is $1.80. Seedance 2.0 Fast is $0.15/sec at 720p. A 1080p clip on Seedance 2.0 is $0.41/sec.

  • Can I keep the same face and voice across multiple videos?
    Yes. Use reference-to-video mode and cite a portrait at @Image1 to lock the face and a voice sample at @Audio1 to lock the delivery. Seedance 2.0 accepts up to 9 images, 3 videos and 3 audio clips per generation.

  • How long can a clip be, and at what resolution?
    Seedance 2.0 generates clips from 4 to 15 seconds at up to 1080p, with audio and lip sync included in every generation.

  • Can the presenter speak other languages?
    Yes. Re-run the same prompt with the script written in another language. Because audio is native, the spoken track and lip movement regenerate together, letting you localize a presenter from one workflow on Prospolabs.

  • What happens if a generation fails?
    Failed runs are automatically refunded on Prospolabs, so you are never charged for a clip that does not deliver. Successful jobs return an output_url that stays valid for 7 days.

  • Does the API cost more than the UI?
    No. The per-second price is identical whether you generate from the Prospolabs UI or via the API (POST /v1/generate with model id "seedance-2" or "seedance-2-fast"). You can top up from $5.
