Dialogue tools
Generate single-voice text-to-speech audio with ElevenLabs v3 on Replicate. The generate_dialogue tool accepts one prompt and one voice per call — multi-voice scenes are produced by stitching multiple calls.
#Available voices
elevenlabs/v3 accepts these 26 voices (case-sensitive). Any value outside this list is rejected with 400 Unsupported voice and you are not charged.
#Female voices
| Voice ID | Description | Preview |
|---|---|---|
| Rachel | Calm, articulate American | |
| Aria | Expressive, raspy American | |
| Domi | Strong, confident young American | |
| Sarah | Soft, warm young American | |
| Jane | Mature, dignified Australian | |
| Juniper | Natural, articulate American | |
| Arabella | Mysterious British narrator | |
| Hope | Bright, upbeat American | |
| Blondie | Casual American conversationalist | |
| Priyanka | Sultry, soothing Indian | |
| Alexandra | Conversational young American | |
| Monika | Deep, natural Indian | |
#Male voices
| Voice ID | Description | Preview |
|---|---|---|
| Drew | Well-rounded American narrator | |
| Clyde | Gritty war-veteran character | |
| Paul | Authoritative ground reporter | |
| Dave | Conversational young British | |
| Roger | Classy American businessman | |
| Fin | Sailor character, Irish accent | |
| James | Calm Australian narrator | |
| Bradford | Theatrical, articulate British | |
| Reginald | Intense, dramatic British character | |
| Gaming | Animated, energetic gaming character | |
| Austin | Easygoing American country | |
| Kuon | Cheerful, steady character voice | |
| Mark | Casual, relaxed American | |
| Grimblewood | Gruff fantasy creature character | |
The full live catalog is also available through get_models; filter by model_type: "text-to-dialogue" to inspect pricing and capabilities programmatically.
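Because unsupported voices are rejected with a 400, it can be convenient to check the voice ID before making a call. The following is a minimal client-side sketch, not part of the API: the names `SUPPORTED_VOICES` and `check_voice` are hypothetical, and the IDs are copied from the tables above.

```python
# Voice IDs copied from the tables above. Matching is case-sensitive.
FEMALE_VOICES = {
    "Rachel", "Aria", "Domi", "Sarah", "Jane", "Juniper",
    "Arabella", "Hope", "Blondie", "Priyanka", "Alexandra", "Monika",
}
MALE_VOICES = {
    "Drew", "Clyde", "Paul", "Dave", "Roger", "Fin", "James",
    "Bradford", "Reginald", "Gaming", "Austin", "Kuon", "Mark",
    "Grimblewood",
}
SUPPORTED_VOICES = FEMALE_VOICES | MALE_VOICES


def check_voice(voice: str) -> str:
    """Return the voice unchanged if it is supported, else raise ValueError."""
    if voice not in SUPPORTED_VOICES:
        raise ValueError(f"Unsupported voice: {voice!r}")
    return voice
```

Note that `check_voice("rachel")` raises, since the lowercase form is not in the list.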
#generate_dialogue
Generate a single-voice TTS clip.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| text (or prompt) | string | Yes | 5–5000 characters after [bracket] audio tags are stripped. |
| voice | string | No | One of the 26 IDs above. Default: Rachel. |
| stability | number | No | 0..1, default 0.5. Higher = more stable, lower = more expressive. |
| similarity_boost | number | No | 0..1, default 0.75. |
| style | number | No | 0..1, default 0. Style exaggeration. |
| speed | number | No | 0.7..1.2, default 1.0. |
| previous_text / next_text | string | No | Optional surrounding context to keep prosody consistent across stitched chunks. |
| language_code | string | No | ISO code. Default en. One of 29 supported codes: ar, bg, cs, da, de, el, en, es, fi, fil, fr, hi, hr, id, it, ja, ko, ms, nl, pl, pt, ro, ru, sk, sv, ta, tr, uk, zh. Pass auto to default to en. |
Audio tags like [HEY], [laughs], [whispers] are stripped server-side before TTS; they are not spoken or interpreted by the model.
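To estimate whether a text still meets the 5 to 5000 character requirement once tags are removed, the stripping can be approximated locally. This is a rough sketch under stated assumptions: the exact server-side tag pattern and whitespace handling are not documented here, so the regular expression and the whitespace collapsing below are guesses, and `spoken_text` and `check_length` are hypothetical helper names.

```python
import re

# Assumption: an audio tag is any run of non-bracket characters inside [ ].
_TAG = re.compile(r"\[[^\[\]]*\]")


def spoken_text(text: str) -> str:
    """Remove [bracket] tags and collapse the whitespace they leave behind."""
    return " ".join(_TAG.sub(" ", text).split())


def check_length(text: str) -> int:
    """Return the stripped length, or raise ValueError if outside 5..5000."""
    n = len(spoken_text(text))
    if not 5 <= n <= 5000:
        raise ValueError(f"Text is {n} characters after stripping; need 5 to 5000")
    return n
```

For example, `"[HEY] Hi"` would fail this check because only two characters remain after the tag is removed.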
Example:
{
  "text": "Welcome back — ready to generate?",
  "voice": "Rachel",
  "stability": 0.5,
  "language_code": "en"
}
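The numeric ranges in the parameter table can be checked client-side before a call. The sketch below assembles a payload shaped like the example and rejects out-of-range values; `RANGES` and `build_payload` are hypothetical names for illustration, and the server remains the authority on what it accepts.

```python
from typing import Any

# Allowed numeric ranges from the parameter table: (minimum, maximum).
RANGES = {
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.7, 1.2),
}


def build_payload(text: str, voice: str = "Rachel", **options: Any) -> dict[str, Any]:
    """Build a generate_dialogue payload, validating documented numeric ranges."""
    payload: dict[str, Any] = {"text": text, "voice": voice}
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}, got {value}")
        payload[name] = value
    return payload
```

Options not listed in `RANGES`, such as `language_code` or `previous_text`, are passed through unchanged.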
Response: Returns a generation_id. Poll with get_generation_status.
#Multi-voice scenes
generate_dialogue is single-voice. For dialogue between two speakers, call the tool once per line (passing previous_text / next_text for prosody continuity), then concatenate the resulting audio files yourself.
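One way to prepare the per-line calls is to build each payload with its neighbours' text as context. The following is a minimal sketch: `scene_payloads` is a hypothetical helper, and it only constructs the payloads. Sending them and concatenating the audio is left to whatever client and audio tooling you use.

```python
def scene_payloads(lines: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Build one generate_dialogue payload per (voice, text) pair, in order."""
    payloads = []
    for i, (voice, text) in enumerate(lines):
        payload = {"text": text, "voice": voice}
        if i > 0:
            # Text of the line spoken just before this one.
            payload["previous_text"] = lines[i - 1][1]
        if i < len(lines) - 1:
            # Text of the line spoken just after this one.
            payload["next_text"] = lines[i + 1][1]
        payloads.append(payload)
    return payloads
```

A two-speaker exchange might be described as `scene_payloads([("Rachel", "Did you hear that?"), ("Drew", "Hear what, exactly?")])`, which yields two payloads, the first carrying `next_text` and the second carrying `previous_text`.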
#get_generation_status
Use the generation_id returned by generate_dialogue to check progress. When the status is completed, the audio file URL is in the outputs array (media_type: "audio").
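A polling loop for this might look like the sketch below. Several details are assumptions rather than documented behaviour: `get_status` stands in for however your client invokes get_generation_status, the response is assumed to be a dict with a `status` key, and each output is assumed to expose its address under a `url` key. Failure statuses are not described here, so the loop simply gives up after a timeout.

```python
import time
from typing import Any, Callable


def wait_for_audio(
    generation_id: str,
    get_status: Callable[[str], dict[str, Any]],  # hypothetical client call
    interval: float = 2.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the generation completes, then return the audio URL."""
    deadline = time.monotonic() + timeout
    while True:
        result = get_status(generation_id)
        if result.get("status") == "completed":
            for output in result.get("outputs", []):
                if output.get("media_type") == "audio":
                    return output["url"]  # assumed field name
            raise RuntimeError("Generation completed without an audio output")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Generation {generation_id} did not complete in time")
        sleep(interval)
```

The `sleep` parameter exists only so the loop can be exercised without real delays.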
#Credits and limits
- Cost: 26 credits per 1,000 characters. Fractional amounts are rounded: a decimal part of 0.3 or less rounds down, and anything above 0.3 rounds up.
- Minimum: 1 credit for any non-empty text.
- Minimum length: 5 characters after audio tags are stripped.
- Maximum length: 5,000 characters per request.
See Limitations for full details.
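The cost rules can be sketched as a small estimator. This assumes the rounding applies to the fractional part of the raw amount (26 × characters ÷ 1,000) and that the caller supplies the billable character count; whether that count is taken before or after tag stripping is not stated here. `estimate_credits` is a hypothetical helper, and integer arithmetic is used to avoid floating-point edge cases at the 0.3 boundary.

```python
def estimate_credits(char_count: int) -> int:
    """Estimate credits for a text of char_count billable characters."""
    if char_count <= 0:
        return 0  # empty text: the 1-credit minimum applies to non-empty text only
    # raw cost = 26 * char_count / 1000, split into whole part and thousandths
    whole, thousandths = divmod(26 * char_count, 1000)
    # a decimal part of 0.3 or less floors; anything above 0.3 ceils
    credits = whole if thousandths <= 300 else whole + 1
    return max(1, credits)
```

Under these assumptions, 50 characters cost 1 credit (raw 1.3 rounds down), 51 characters cost 2 credits (raw 1.326 rounds up), and a full 5,000-character request costs 130 credits.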
