    Media tools

    Generate images and videos using 40+ AI models. Always call get_models first to see available models, costs, and whether a model needs an input image.

    #generate_media

    Starts an image or video generation.

    Parameters:

    | Parameter | Type | Required | Description |
    | --- | --- | --- | --- |
    | prompt | string | Yes | What to generate (e.g. “A red car on a mountain road”). |
    | model | string | Yes | Model ID (from get_models). Examples: nano-banana, sora-2, kling-2-6-image-to-video. |
    | generation_type | string | No | text-to-image, text-to-video, image-to-video, or image-to-image. Default: text-to-image. |
    | negative_prompt | string | No | What to avoid in the output. |
    | source_media_urls | string or array | No | Required for image-to-video and image-to-image. URL(s) to image(s), or for some models (e.g. Kling 2.6 Motion) image + video. See input limits below. Omit for text-to-image and text-to-video. |
    | aspect_ratio | string | No | e.g. 1:1, 16:9, 9:16, 4:5, 21:9. Default: 1:1. |
    | duration | string | No | Video length. Only certain video models use this. See below. |
    | quality | string | No | e.g. fast, standard, pro, ultra. Default: standard. |
    | sound | boolean | No | When true, request video with generated audio. Only certain video models use this. Default: false. See below. |
    | seed | number | No | Seed for reproducible results. |

    Example (text-to-image):

    {
      "prompt": "A futuristic city at sunset with flying cars",
      "model": "nano-banana",
      "generation_type": "text-to-image",
      "aspect_ratio": "16:9",
      "quality": "pro"
    }
    

    Example (image-to-video, one input image required):

    {
      "prompt": "Gentle motion and subtle movement",
      "model": "kling-2-6-image-to-video",
      "generation_type": "image-to-video",
      "source_media_urls": ["https://example.com/your-image.jpg"],
      "aspect_ratio": "16:9",
      "duration": "5s"
    }
    

    Response: Includes generation_id, status (e.g. pending), and often estimated_time_seconds and estimated_cost_credits. Poll with get_generation_status until status is completed or failed.
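
    The submit-then-poll flow described above can be sketched as follows. This is a minimal sketch, not a client implementation: `call_tool` is a hypothetical callable standing in for however your client invokes these tools, and the poll interval and timeout are assumed values that this page does not document.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

TERMINAL_STATUSES = {"completed", "failed"}


def generate_and_wait(
    call_tool: ToolCaller,
    request: Dict[str, Any],
    poll_interval_s: float = 5.0,  # assumed value, not documented
    timeout_s: float = 600.0,      # assumed value, not documented
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start a generation, then poll until it completes, fails, or times out."""
    started = call_tool("generate_media", request)
    generation_id = started["generation_id"]
    deadline = time.monotonic() + timeout_s
    while True:
        status = call_tool("get_generation_status", {"generation_id": generation_id})
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"generation {generation_id} still {status.get('status')!r}"
            )
        sleep(poll_interval_s)
```

    The `sleep` argument is injectable so the loop can be exercised without real delays. If the response includes estimated_time_seconds, it may be a better basis for the first wait than a fixed interval.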

    Models that support duration:

    | Model(s) | Supported values | Notes |
    | --- | --- | --- |
    | kling-2-6-text-to-video, kling-2-6-image-to-video | 5s, 10s | Optional with/without audio (model variant). |
    | wan-2-5 (text-to-video, image-to-video) | 5s, 10s | |
    | v1-pro-fast-i2v | 5s, 10s | |
    | seedance-1-5-pro | 4s, 8s, 12s | Supports both text-to-video (0–1 image optional) and image-to-video (2 images required). |
    | sora-2, sora-2-pro (text-to-video, image-to-video) | 10s, 15s | |
    | sora-2-pro-storyboard | 10s, 15s, 25s | Scene-based; duration from shots. |
    | grok-text-to-video-6s | Fixed 6s | Duration parameter ignored. |
    | kling-3-0-std, kling-3-0-pro | 3s–15s | Single-shot mode. Max 2500 chars; supports @element_name references. |
    | grok-image-to-video, kling-2-5-image-to-video-pro, veo3-1 | Not configurable | Duration not set via this parameter. |

    For image-only models, duration is ignored.
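
    A client can check duration before submitting. The sketch below copies the discrete values from the table above for a subset of models; the helper name `check_duration` is hypothetical, and the model IDs are written as they appear in the table, so confirm the exact IDs with get_models.

```python
from typing import Dict, Optional, Tuple

# Allowed values copied from the duration table; only a subset of models is shown.
SUPPORTED_DURATIONS: Dict[str, Tuple[str, ...]] = {
    "kling-2-6-text-to-video": ("5s", "10s"),
    "kling-2-6-image-to-video": ("5s", "10s"),
    "v1-pro-fast-i2v": ("5s", "10s"),
    "seedance-1-5-pro": ("4s", "8s", "12s"),
    "sora-2": ("10s", "15s"),
    "sora-2-pro": ("10s", "15s"),
    "sora-2-pro-storyboard": ("10s", "15s", "25s"),
}


def check_duration(model: str, duration: Optional[str]) -> Optional[str]:
    """Return the duration to send, or None when the parameter should be omitted."""
    allowed = SUPPORTED_DURATIONS.get(model)
    if duration is None or allowed is None:
        # Model not listed here: duration is fixed, ignored, or unknown to this sketch.
        return None
    if duration not in allowed:
        raise ValueError(f"{model} supports {', '.join(allowed)}; got {duration!r}")
    return duration
```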

    Models that support negative_prompt:

    | Model(s) | Notes |
    | --- | --- |
    | imagen-4, imagen-4-fast, imagen-4-ultra | Text-to-image. |
    | wan-2-5 (text-to-video, image-to-video) | |
    | kling-2-5-image-to-video-pro | |

    All other models ignore negative_prompt.

    Models that support quality (or equivalent):

    | Model(s) | How it works | Values |
    | --- | --- | --- |
    | sora-2-pro (text-to-video, image-to-video) | Mapped to size (standard vs HD). | standard, pro/high/hd (for HD). |
    | imagen-4 variants | Mapped to model_variant. | standard, fast, ultra (use quality: standard / fast / ultra). |
    | seedream-v4-5-edit (and seedream v4.5 text-to-image) | Uses quality directly. | e.g. basic and other API values. |
    | 5-lite-text-to-image, 5-lite-image-to-image | Uses quality directly. | e.g. basic, high. |
    | veo3-1 vs veo3-1-fast | Different model IDs, not a single quality param. | Use model veo3-1 (quality) or veo3-1-fast (speed). |
    | flux-2, nano-banana-pro, nano-banana-2 | Resolution (1K/2K/4K), not a generic “quality” string. | Use model variant or resolution param when available. |

    For other models, quality is ignored.

    Prompt character limits:

    Some models enforce a maximum prompt length. Exceeding it may return an error or truncation.

    | Model(s) | Max characters |
    | --- | --- |
    | wan-2-5 | 800 |
    | kling-2-6 (text-to-video, image-to-video) | 2,500 |
    | kling-3-0-std, kling-3-0-pro | 2,500 |
    | kling-2-5-image-to-video-pro | 2,500 |
    | seedream-v4, seedream-v4-edit | 2,500 |
    | seedream-v4-5, seedream-v4-5-edit | 3,000 |
    | 5-lite-text-to-image, 5-lite-image-to-image | 2,995 |
    | gpt-1.5-image-medium, gpt-1.5-image-high | 3,000 |
    | nano-banana, imagen-4, sora-2, flux-2, veo3-1, v1-pro-fast-i2v, grok (image/video) | 5,000 |
    | nano-banana-pro (all variants) | 20,000 |
    | nano-banana-2 (all variants) | 20,000 |

    Others may have no documented limit or use server defaults.
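
    Because exceeding a limit may produce either an error or silent truncation, it can help to check prompt length client-side. The sketch below uses a few limits from the table above; `prompt_overflow` is a hypothetical helper. Python's len() counts Unicode code points, and how the service counts characters is not documented here, so treat the result as approximate.

```python
from typing import Dict

# Limits copied from the prompt-limit table; only a subset of models is shown.
PROMPT_LIMITS: Dict[str, int] = {
    "wan-2-5": 800,
    "kling-3-0-std": 2500,
    "kling-3-0-pro": 2500,
    "kling-2-5-image-to-video-pro": 2500,
    "seedream-v4": 2500,
    "seedream-v4-5": 3000,
    "5-lite-text-to-image": 2995,
    "nano-banana": 5000,
    "nano-banana-pro": 20000,
}


def prompt_overflow(model: str, prompt: str) -> int:
    """Return how many characters the prompt exceeds the model's limit by (0 if it fits)."""
    limit = PROMPT_LIMITS.get(model)
    if limit is None:
        # No limit recorded in this sketch; the server default applies.
        return 0
    return max(0, len(prompt) - limit)
```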

    Input file (image and video) limits:

    For image-to-video and image-to-image, source_media_urls is a list of URLs. Most models accept images only (JPEG, PNG, WebP, typically 10 MB max per file). Some models also accept video inputs; when they do, format and size limits apply (e.g. MP4, max duration).

    | Model(s) | Input type | Limit | Notes |
    | --- | --- | --- | --- |
    | kling-2-6-motion-control-720p, kling-2-6-motion-control-1080p | Image + video | 1 image + 1 video | Motion Control: reference video drives motion. Video max 30 s; video file typically up to 100 MB (MP4/WebM). |
    | kling-3-0-motion-control-720p, kling-3-0-motion-control-1080p | Image + video | 1 image + 1 video | Kling 3.0 Motion Control: same as Kling 2.6. 24 credits/s (720p), 32 credits/s (1080p). Video max 30 s; video file typically up to 100 MB (MP4/WebM). |
    | kling-2-6-image-to-video, sora-2 (image-to-video), wan-2-5 (image-to-video), grok-image-to-video, v1-pro-fast-i2v | Images only | 1 image | Exactly one input image. |
    | kling-2-5-image-to-video-pro | Images only | 2 images | Start frame and end frame. |
    | seedance-1-5-pro | Images only | Mode-dependent | Text-to-video (generation_type: "text-to-video"): 0–1 images optional. Image-to-video (generation_type: "image-to-video"): exactly 2 images required (start + end frame). |
    | kling-3-0-std, kling-3-0-pro | Images only | 1–2 images | Start frame, or start + end frame. PNG/JPG/JPEG. Supports elements (see below). |
    | seedream-v4-edit | Images only | 10 | For editing. |
    | 5-lite-text-to-image, 5-lite-image-to-image | Images only | 10 | For editing (image-to-image). |
    | nano-banana, nano-banana-edit | Images only | 10 | |
    | nano-banana-pro (all variants) | Images only | 8 | |
    | nano-banana-2 (all variants) | Images only | 8 | |
    | flux-2-edit (image-to-image) | Images only | 8 | |
    | gpt-1.5-image (image-to-image) | Images only | 16 | |
    | veo3-1 (image-to-video / reference modes) | Images only | 1–3 | Depends on mode (1 text-to-video optional ref; 2 first+last frame; 3 reference). |
    | sora-2-pro-storyboard | Images only | 1 | Optional. |

    Use get_models to confirm input_media_types and capabilities for a given model. See Account tools for model list and pricing.
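
    The image-count limits above lend themselves to a lookup keyed by model and generation_type. The sketch below covers only a few rows whose counts are stated unambiguously; `check_image_count` and the rule table are hypothetical names, and models not listed are passed through unchecked.

```python
from typing import Dict, List, Tuple

# (min, max) image counts keyed by (model, generation_type); a subset of the table above.
IMAGE_COUNT_RULES: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("kling-2-6-image-to-video", "image-to-video"): (1, 1),
    ("grok-image-to-video", "image-to-video"): (1, 1),
    ("v1-pro-fast-i2v", "image-to-video"): (1, 1),
    ("kling-2-5-image-to-video-pro", "image-to-video"): (2, 2),
    ("seedance-1-5-pro", "text-to-video"): (0, 1),
    ("seedance-1-5-pro", "image-to-video"): (2, 2),
}


def check_image_count(model: str, generation_type: str, urls: List[str]) -> None:
    """Raise ValueError when the number of image URLs violates a known rule."""
    rule = IMAGE_COUNT_RULES.get((model, generation_type))
    if rule is None:
        return  # not covered by this sketch; confirm limits with get_models
    low, high = rule
    if not low <= len(urls) <= high:
        expected = str(low) if low == high else f"{low}-{high}"
        raise ValueError(
            f"{model} ({generation_type}) expects {expected} image URL(s); got {len(urls)}"
        )
```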

    Kling 3.0 – elements (optional):

    Elements let you reference images or videos in your prompt using @element_name. Pass kling_elements as an array of objects, each with name, description, and either element_input_urls (2–4 image URLs) or element_input_video_urls (1 video URL). Elements can only be used when the request includes at least one image: text-to-video with 1 image, or image-to-video with first and last frames. Each element requires a name (its title) and a description. Image elements: JPG/PNG, min 300×300 px, max 10 MB each. Video elements: MP4/MOV, max 50 MB.
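
    A small builder can enforce the element rules above before the request is sent. This is a sketch under assumptions: `make_kling_element` is a hypothetical helper, and passing the single video URL as a one-item list is inferred from the plural field name rather than stated on this page.

```python
from typing import Any, Dict, List, Optional


def make_kling_element(
    name: str,
    description: str,
    image_urls: Optional[List[str]] = None,
    video_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one kling_elements entry from either 2-4 image URLs or 1 video URL."""
    if not name or not description:
        raise ValueError("each element requires a name and a description")
    if (image_urls is None) == (video_url is None):
        raise ValueError("provide exactly one of image_urls or video_url")
    element: Dict[str, Any] = {"name": name, "description": description}
    if image_urls is not None:
        if not 2 <= len(image_urls) <= 4:
            raise ValueError("image elements take 2-4 image URLs")
        element["element_input_urls"] = list(image_urls)
    else:
        # Assumption: the single video URL is wrapped in a list.
        element["element_input_video_urls"] = [video_url]
    return element
```

    The prompt would then refer to the element by name, for example `@hero` for an element named "hero".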

    Seedance 1.5 Pro – two modes (check generation_type before using images):

    | Mode | generation_type | source_media_urls | Can use images? |
    | --- | --- | --- | --- |
    | Text-to-video | "text-to-video" | Empty or 1 URL | Optional: 0–1 images. Omit for text-only; include 1 URL to animate that image. |
    | Image-to-video | "image-to-video" | Exactly 2 URLs | Required: exactly 2 images (start frame + end frame). |

    Models that support audio (sound parameter):

    | Model(s) | Notes |
    | --- | --- |
    | kling-2-6-text-to-video, kling-2-6-image-to-video | Set sound: true to get video with generated audio. Different pricing for audio vs no-audio variants. |
    | kling-3-0-std, kling-3-0-pro | Set sound: true to enable generated audio. |
    | seedance-1-5-pro | Set sound: true to enable generated audio. Supports both text-to-video and image-to-video modes. |

    All other video models do not support the sound parameter. Image-only models ignore sound.
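
    One way to keep requests tidy is to set sound only for the models listed above and drop it otherwise. The sketch below is illustrative; `with_sound` is a hypothetical helper, and the model set should be refreshed from get_models rather than hard-coded in real use.

```python
from typing import Any, Dict

# Models listed above as supporting the sound parameter.
SOUND_MODELS = frozenset({
    "kling-2-6-text-to-video",
    "kling-2-6-image-to-video",
    "kling-3-0-std",
    "kling-3-0-pro",
    "seedance-1-5-pro",
})


def with_sound(request: Dict[str, Any], enabled: bool = True) -> Dict[str, Any]:
    """Return a copy of the request with sound set only when the model supports it."""
    result = dict(request)
    if enabled and result.get("model") in SOUND_MODELS:
        result["sound"] = True
    else:
        # Unsupported model or audio not wanted: leave the parameter out.
        result.pop("sound", None)
    return result
```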


    #get_generation_status

    Check the status of a media generation and get output URLs when done.

    Parameters:

    | Parameter | Type | Required | Description |
    | --- | --- | --- | --- |
    | generation_id | string | Yes | ID returned from generate_media. |

    Response: Includes status (pending, queued, processing, completed, failed), progress, and when completed an outputs array with url, thumbnail_url, optimized_url, media_type, dimensions, etc.
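
    Reading results out of a status response might look like the sketch below. It relies only on the status, outputs, and url fields named above; `output_urls` is a hypothetical helper, and the response is assumed to arrive as a plain dictionary.

```python
from typing import Any, Dict, List


def output_urls(status_response: Dict[str, Any]) -> List[str]:
    """Return output URLs for a completed generation, or an empty list if not done yet."""
    status = status_response.get("status")
    if status == "failed":
        raise RuntimeError("generation failed")
    if status != "completed":
        # pending, queued, or processing: nothing to return yet.
        return []
    return [o["url"] for o in status_response.get("outputs", []) if "url" in o]
```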


    #get_generation_estimate

    Get a parameter-specific estimated processing time for a given model and options (no job is started). For a per-model estimated duration in one call, use get_models; each model includes estimated_time_seconds. Use get_generation_estimate when you need an estimate that depends on prompt length, duration, or other parameters.

    Parameters:

    | Parameter | Type | Required | Description |
    | --- | --- | --- | --- |
    | model | string | Yes | Model ID. |
    | generation_type | string | No | Same as in generate_media. Default: text-to-image. |
    | prompt | string | No | Optional; can affect estimate. |
    | negative_prompt | string | No | Optional. |
    | parameters | object | No | Optional extra parameters. |

    Response: Estimated time (and optionally confidence/sample size) so you can set user expectations before calling generate_media.


    #Model rules

    • Text-to-image and text-to-video: Do not send source_media_urls (unless the model supports an optional reference image). Exception: seedance-1-5-pro in text-to-video mode accepts 0–1 optional images.
    • Image-to-video and image-to-image: Send image (and when supported, video) URL(s) in source_media_urls. Most models need images only; some (e.g. Kling 2.6 Motion Control) require 1 image + 1 video. seedance-1-5-pro in image-to-video mode requires exactly 2 images (start + end frame). Respect each model’s input limits above.
    • Audio: kling-2-6 (text-to-video, image-to-video), kling-3-0-std, kling-3-0-pro, and seedance-1-5-pro support sound: true; other models ignore sound.
    • Use get_models to see which models support which generation types, input_media_types (e.g. image, video), and required input counts.
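
    The source_media_urls rules above can be sketched as a request builder. This is an illustration, not the service's validation logic: `build_request` is a hypothetical helper, and the optional-reference set contains only the exception named in the rules, so extend it according to what get_models reports.

```python
from typing import Any, Dict, List, Optional

TEXT_TYPES = {"text-to-image", "text-to-video"}
# Exception named in the rules above; other models may also accept optional references.
OPTIONAL_REFERENCE_MODELS = {"seedance-1-5-pro"}


def build_request(
    prompt: str,
    model: str,
    generation_type: str = "text-to-image",
    source_media_urls: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble a generate_media payload, applying the source_media_urls rules."""
    urls = list(source_media_urls or [])
    request: Dict[str, Any] = {
        "prompt": prompt,
        "model": model,
        "generation_type": generation_type,
    }
    if generation_type in TEXT_TYPES:
        if urls and model not in OPTIONAL_REFERENCE_MODELS:
            raise ValueError(f"{model} ({generation_type}) should not send source_media_urls")
    elif not urls:
        raise ValueError(f"{generation_type} requires source_media_urls")
    if urls:
        request["source_media_urls"] = urls
    return request
```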

    See Limitations for rate limits and credits.