Dialogue tools

    Generate single-voice text-to-speech audio with ElevenLabs v3 on Replicate. The generate_dialogue tool accepts one prompt and one voice per call; to produce a multi-voice scene, make one call per line and stitch the resulting clips together.

    #Available voices

    elevenlabs/v3 accepts these 26 voices (case-sensitive). Any value outside this list is rejected with 400 Unsupported voice and you are not charged.

    #Female voices

    | Voice ID | Description |
    |----------|-------------|
    | Rachel | Calm, articulate American |
    | Aria | Expressive, raspy American |
    | Domi | Strong, confident young American |
    | Sarah | Soft, warm young American |
    | Jane | Mature, dignified Australian |
    | Juniper | Natural, articulate American |
    | Arabella | Mysterious British narrator |
    | Hope | Bright, upbeat American |
    | Blondie | Casual American conversationalist |
    | Priyanka | Sultry, soothing Indian |
    | Alexandra | Conversational young American |
    | Monika | Deep, natural Indian |

    #Male voices

    | Voice ID | Description |
    |----------|-------------|
    | Drew | Well-rounded American narrator |
    | Clyde | Gritty war-veteran character |
    | Paul | Authoritative ground reporter |
    | Dave | Conversational young British |
    | Roger | Classy American businessman |
    | Fin | Sailor character, Irish accent |
    | James | Calm Australian narrator |
    | Bradford | Theatrical, articulate British |
    | Reginald | Intense, dramatic British character |
    | Gaming | Animated, energetic gaming character |
    | Austin | Easygoing American country |
    | Kuon | Cheerful, steady character voice |
    | Mark | Casual, relaxed American |
    | Grimblewood | Gruff fantasy creature character |

    The full live catalog is also available through get_models; filter by model_type: "text-to-dialogue" to inspect pricing and capabilities programmatically.
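
    Because voice IDs are case-sensitive, it can help to check them locally before spending a request. The following is a minimal sketch that hard-codes the 26 IDs listed above; the helper name check_voice and the "did you mean" hint are illustrative and not part of the API.

```python
# Voice IDs copied from the tables above. The list may drift from the live
# catalog, so treat it as a convenience check, not a source of truth.
FEMALE_VOICES = (
    "Rachel", "Aria", "Domi", "Sarah", "Jane", "Juniper", "Arabella",
    "Hope", "Blondie", "Priyanka", "Alexandra", "Monika",
)
MALE_VOICES = (
    "Drew", "Clyde", "Paul", "Dave", "Roger", "Fin", "James", "Bradford",
    "Reginald", "Gaming", "Austin", "Kuon", "Mark", "Grimblewood",
)
SUPPORTED_VOICES = frozenset(FEMALE_VOICES + MALE_VOICES)


def check_voice(voice: str = "Rachel") -> str:
    """Return the voice unchanged if it is supported, else raise ValueError.

    check_voice is a hypothetical local helper; the server performs its own
    validation and answers with 400 Unsupported voice.
    """
    if voice not in SUPPORTED_VOICES:
        # Suggest the correctly cased ID when only the casing is wrong.
        by_lower = {v.lower(): v for v in SUPPORTED_VOICES}
        hint = by_lower.get(voice.lower())
        suffix = f" (did you mean {hint!r}?)" if hint else ""
        raise ValueError(f"Unsupported voice {voice!r}{suffix}")
    return voice
```

    Calling check_voice("rachel") raises a ValueError that points at "Rachel", which catches the most common casing mistake before the request is sent.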


    #generate_dialogue

    Generate a single-voice TTS clip.

    Parameters:

    | Parameter | Type | Required | Description |
    |-----------|------|----------|-------------|
    | text (or prompt) | string | Yes | 5–5000 characters after [bracket] audio tags are stripped. |
    | voice | string | No | One of the 26 IDs above. Default: Rachel. |
    | stability | number | No | 0..1, default 0.5. Higher = more stable, lower = more expressive. |
    | similarity_boost | number | No | 0..1, default 0.75. |
    | style | number | No | 0..1, default 0. Style exaggeration. |
    | speed | number | No | 0.7..1.2, default 1.0. |
    | previous_text / next_text | string | No | Optional surrounding context to keep prosody consistent across stitched chunks. |
    | language_code | string | No | ISO code. Default en. One of 29 supported codes: ar, bg, cs, da, de, el, en, es, fi, fil, fr, hi, hr, id, it, ja, ko, ms, nl, pl, pt, ro, ru, sk, sv, ta, tr, uk, zh. Pass auto to default to en. |

    Audio tags such as [HEY], [laughs], and [whispers] are stripped server-side before TTS, so the model neither speaks nor interprets them.
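
    Since the length limit applies to the text after tags are removed, a local pre-check can approximate what the server will see. This is a sketch under assumptions: the exact server-side stripping rule is not documented here, so the regular expression (any [...] span without nested brackets) and the whitespace collapsing are guesses, and validate_request is a hypothetical helper name.

```python
import re

# Assumption: an audio tag is any [...] span with no nested brackets.
TAG_PATTERN = re.compile(r"\[[^\[\]]*\]")

# Numeric ranges taken from the parameter table above.
NUMERIC_RANGES = {
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.7, 1.2),
}


def strip_audio_tags(text: str) -> str:
    """Remove [bracket] tags and collapse leftover whitespace (assumed behavior)."""
    without_tags = TAG_PATTERN.sub("", text)
    return " ".join(without_tags.split())


def validate_request(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    errors = []
    spoken = strip_audio_tags(payload.get("text") or payload.get("prompt") or "")
    if not 5 <= len(spoken) <= 5000:
        errors.append(f"text must be 5-5000 characters after tags, got {len(spoken)}")
    for name, (low, high) in NUMERIC_RANGES.items():
        value = payload.get(name)
        if value is not None and not low <= value <= high:
            errors.append(f"{name} must be between {low} and {high}")
    return errors
```

    A text made up mostly of tags, such as "[HEY] Hi", fails this check because only two characters remain once the tag is removed.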

    Example:

    {
      "text": "Welcome back — ready to generate?",
      "voice": "Rachel",
      "stability": 0.5,
      "language_code": "en"
    }
    

    Response: Returns a generation_id. Poll with get_generation_status.

    #Multi-voice scenes

    generate_dialogue is single-voice. For dialogue between two speakers, call the tool once per line (passing previous_text / next_text for prosody continuity), then concatenate the resulting audio files yourself.
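
    The per-line approach can be sketched as a small helper that turns a script into one request payload per line. The function name build_scene_requests is hypothetical, and using the neighboring speaker's line as previous_text / next_text is an assumption about how the context is best supplied; only the parameter names come from the table above.

```python
def build_scene_requests(lines, language_code="en"):
    """Build one generate_dialogue payload per scripted line.

    lines is a list of (voice, text) tuples in speaking order. Each payload
    carries the neighboring lines as previous_text / next_text so prosody
    can stay consistent across the stitched clips.
    """
    requests = []
    for index, (voice, text) in enumerate(lines):
        payload = {"text": text, "voice": voice, "language_code": language_code}
        if index > 0:
            payload["previous_text"] = lines[index - 1][1]
        if index < len(lines) - 1:
            payload["next_text"] = lines[index + 1][1]
        requests.append(payload)
    return requests


scene = [
    ("Rachel", "Did you hear that?"),
    ("Drew", "Hear what?"),
    ("Rachel", "Never mind."),
]
scene_requests = build_scene_requests(scene)
```

    Each payload in scene_requests is then submitted as its own generate_dialogue call, and the resulting clips are concatenated in the same order.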

    #get_generation_status

    Use the generation_id returned by generate_dialogue to check progress. When the status is completed, the audio file URL is in the outputs array (media_type: "audio").
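
    A polling loop for this might look like the sketch below. It takes the status lookup as a callable so it does not assume any particular client library. Only the completed status, the outputs array, and media_type: "audio" come from the description above; the "failed" status value and the "url" field name are assumptions, as are the interval and timeout defaults.

```python
import time
from typing import Callable, Optional


def wait_for_audio(
    get_status: Callable[[str], dict],
    generation_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
) -> Optional[str]:
    """Poll until the generation completes and return the audio URL, if any.

    get_status is whatever function wraps get_generation_status in your
    client; it must return the status payload as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = get_status(generation_id)
        status = result.get("status")
        if status == "completed":
            for output in result.get("outputs", []):
                if output.get("media_type") == "audio":
                    return output.get("url")  # "url" is an assumed field name
            return None
        if status == "failed":  # assumed failure status, not confirmed here
            raise RuntimeError(f"Generation {generation_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Generation {generation_id} not done after {timeout_s}s")
        time.sleep(interval_s)
```

    Passing the lookup as a parameter also makes the loop easy to exercise with a stub before wiring it to a real client.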


    #Credits and limits

    • Cost: 26 credits per 1,000 characters, rounded to a whole number of credits (a fractional part of 0.3 or less rounds down; anything above 0.3 rounds up).
    • Minimum: 1 credit for any non-empty text.
    • Minimum length: 5 characters after audio tags are stripped.
    • Maximum length: 5,000 characters per request.

    See Limitations for full details.
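
    The pricing rule can be worked through with a short estimator. This is a sketch of one reading of the rule above, not the billing implementation: it assumes the cost is proportional to the character count, that the 0.3 threshold applies to the fractional part of the credit total, and that empty text costs nothing. Whether characters are counted before or after audio tags are stripped is not stated here, so the caller decides which count to pass. The function name estimate_credits is hypothetical.

```python
def estimate_credits(char_count: int) -> int:
    """Estimate credits for a request of char_count characters.

    Integer arithmetic avoids floating-point surprises right at the
    0.3 rounding threshold.
    """
    if char_count <= 0:
        return 0  # assumption: nothing is charged for empty text
    scaled = char_count * 26                 # credits multiplied by 1000
    whole, remainder = divmod(scaled, 1000)  # remainder is the fraction in thousandths
    credits = whole if remainder <= 300 else whole + 1
    return max(1, credits)                   # minimum of 1 credit for non-empty text
```

    For example, 50 characters come to 1.3 credits and are billed as 1, while 51 characters come to 1.326 credits and are billed as 2. A 5-character request works out to 0.13 credits and is raised to the 1-credit minimum.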