REST API — model requirements

    This reference lists default limits and rules the Kubeez REST API applies per internal model_id. It is generated from server/rest-api/tools/model_limits.py. After editing that file, regenerate this page by running:

    python server/rest-api/scripts/export_model_requirements_md.py
    

    #Live data from the API

    The canonical snapshot for enabled models is GET /v1/models. Each entry includes capabilities (merged from database metadata and the limits below), generation_types, requires_input_media, input_media_types, usage_notes, and pricing hints.

    If a model is enabled in the product but not listed in the table below, it still appears in GET /v1/models with database-only metadata. In that case, rely on the GET /v1/models response for its requirements.
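As a rough sketch of how a client might consume that snapshot, the helpers below index a parsed GET /v1/models response by model id and pull out the requirement fields named above. The base URL, the bearer-token header, the response envelope (a bare list, or an object with a `models` key), and the `model_id` key name are all assumptions made for illustration; inspect a real response before relying on any of them.

```python
import json
import urllib.request

# Hypothetical base URL; the real host is not stated on this page.
BASE_URL = "https://api.example.com"


def fetch_models(api_key: str) -> object:
    """GET /v1/models and return the parsed JSON body (performs network I/O)."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def index_models(payload: object) -> dict:
    """Map model id -> entry. Accepts a bare list or {"models": [...]} (assumed shapes)."""
    entries = payload.get("models", []) if isinstance(payload, dict) else payload
    return {entry["model_id"]: entry for entry in entries}  # "model_id" key is assumed


def requirements_for(models: dict, model_id: str) -> dict:
    """Collect the requirement fields this page says each entry includes."""
    entry = models[model_id]
    fields = (
        "generation_types",
        "requires_input_media",
        "input_media_types",
        "usage_notes",
    )
    # Missing fields come back as None rather than raising.
    return {name: entry.get(name) for name in fields}
```

A caller would typically run `index_models(fetch_models(key))` once and then look up each model before building a generation request.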

    #How fields map to POST /v1/generate/media

    | Concept | JSON field | Notes |
    | --- | --- | --- |
    | Prompt length | prompt | Stay within prompt max for the model. |
    | Mode | generation_type | Must be one of the model’s generation_types. |
    | Inputs | source_media_urls | Array of HTTPS URLs (e.g. from POST /v1/upload/media). Order matters when the docs say “image first, then video” (Motion Control). |
    | Video length | duration | Only for models with duration_options; value must be one of those strings. |
    | Negative prompt | negative_prompt | Only where negative_prompt is yes in the table. |
    | Audio track | sound | Only for video models where sound is yes. |
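The mapping above can be sketched as a small client-side check that runs before POST /v1/generate/media is called. This is only an illustration: the `LIMITS` dictionary hand-copies two rows from the per-model reference on this page, `build_generate_body` is a hypothetical helper, and the `model` key used to carry the model id in the request body is an assumption, since this page does not name that field.

```python
# Limits hand-copied from two rows of the per-model reference on this page.
LIMITS = {
    "kling-2-6-text-to-video": {
        "prompt_max": 2500,
        "generation_types": ["text-to-video"],
        "duration_options": ["5s", "10s"],
        "negative_prompt": False,
        "sound": True,
    },
    "imagen-4": {
        "prompt_max": 5000,
        "generation_types": ["text-to-image"],
        "duration_options": [],
        "negative_prompt": True,
        "sound": False,
    },
}


def build_generate_body(model_id, prompt, generation_type,
                        source_media_urls=None, duration=None,
                        negative_prompt=None, sound=None):
    """Validate the fields against LIMITS and return a JSON-ready dict."""
    limits = LIMITS[model_id]
    if len(prompt) > limits["prompt_max"]:
        raise ValueError("prompt exceeds prompt max for this model")
    if generation_type not in limits["generation_types"]:
        raise ValueError("generation_type not supported by this model")
    if duration is not None and duration not in limits["duration_options"]:
        raise ValueError("duration must be one of the model's duration_options")
    if negative_prompt is not None and not limits["negative_prompt"]:
        raise ValueError("negative_prompt not supported by this model")
    if sound is not None and not limits["sound"]:
        raise ValueError("sound not supported by this model")
    for url in source_media_urls or []:
        if not url.startswith("https://"):
            raise ValueError("source_media_urls must be HTTPS URLs")

    # "model" is an assumed key name for the model id.
    body = {"model": model_id, "prompt": prompt, "generation_type": generation_type}
    optional = {
        "source_media_urls": source_media_urls,
        "duration": duration,
        "negative_prompt": negative_prompt,
        "sound": sound,
    }
    # Only send optional fields the caller actually set.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

For example, `build_generate_body("kling-2-6-text-to-video", "a cat", "text-to-video", duration="5s", sound=True)` returns a body with `duration` and `sound` set, while passing `duration="7s"` raises `ValueError`. The server remains the authority; a check like this only catches obvious mistakes early.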

    #Upload and file size

    POST /v1/upload/media accepts up to 500 MB per file. Providers may still reject oversized or invalid inputs: image caps of roughly 10 MB are common, and Motion Control has its own video limits. For edge cases (e.g. Kling elements, Seedance modes), check the model's usage_notes and the Media tools docs.
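A pre-upload size check along these lines can catch oversized files before any bytes are sent. This is a minimal sketch: it assumes 1 MB means 1024 × 1024 bytes (the page does not say which definition the server uses), treats the "~10 MB" image cap as a soft warning rather than a hard rule, and uses hypothetical helper names.

```python
import os

# Assumes 1 MB = 1024 * 1024 bytes; the server may count differently.
MAX_UPLOAD_BYTES = 500 * 1024 * 1024
# Approximate provider-side image cap mentioned above; not a documented hard limit.
COMMON_IMAGE_CAP_BYTES = 10 * 1024 * 1024


def check_upload_size(size_bytes: int, is_image: bool = False) -> list:
    """Return a list of warnings; an empty list means no known size problem."""
    problems = []
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("exceeds the 500 MB upload limit")
    elif is_image and size_bytes > COMMON_IMAGE_CAP_BYTES:
        problems.append("may exceed a provider image cap (~10 MB)")
    return problems


def check_file(path: str, is_image: bool = False) -> list:
    """Convenience wrapper that reads the size of a file on disk."""
    return check_upload_size(os.path.getsize(path), is_image)
```

An empty result does not guarantee acceptance; providers can still reject a file for format or duration reasons.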

    #Per-model reference

    | model_id | prompt max | generation_types | requires_input | input_media_types | max images | max videos | duration_options | negative | sound | usage_notes |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | 5-lite-image-to-image | 2995 | image-to-image | yes | image | 10 | | | no | no | Seedream 5 Lite Edit (same edge function as 5-lite-text-to-image). Image-to-image only. Require source_media_urls (up to 10 image URLs). Set generation_type='image-to-image'. |
    | 5-lite-text-to-image | 2995 | text-to-image, image-to-image | no | | 10 | | | no | no | Seedream 5 Lite (served by seedream-v5-lite-edit edge function). Text-to-image: omit source_media_urls. Image-to-image: pass source_media_urls (up to 10). Prompt max 2995 chars. quality: basic or high. aspect_ratio e.g. 1:1, 9:16, 16:9. |
    | flux-2 | 5000 | text-to-image | no | | 0 | | | no | no | Text-to-image only (1K resolution). Do not pass source_media_urls. |
    | flux-2-1K | 5000 | text-to-image | no | | 0 | | | no | no | Flux 2 (1K resolution). Text-to-image only. Do not pass source_media_urls. |
    | flux-2-2K | 5000 | text-to-image | no | | 0 | | | no | no | Flux 2 (2K resolution). Text-to-image only. Do not pass source_media_urls. |
    | flux-2-edit-1K | 5000 | image-to-image | yes | image | 8 | | | no | no | Flux 2 Edit (1K resolution). Image-to-image only. Requires 1-8 image URLs. Set generation_type='image-to-image'. |
    | flux-2-edit-2K | 5000 | image-to-image | yes | image | 8 | | | no | no | Flux 2 Edit (2K resolution). Image-to-image only. Requires 1-8 image URLs. Set generation_type='image-to-image'. |
    | gpt-1.5-image-high | 3000 | text-to-image, image-to-image | no | | 16 | | | no | no | Same as gpt-1.5-image-medium. Prompt max 3000 chars. quality: high. |
    | gpt-1.5-image-medium | 3000 | text-to-image, image-to-image | no | | 16 | | | no | no | Text-to-image: omit source_media_urls. Image-to-image: pass source_media_urls (up to 16 images). Prompt max 3000 chars. quality: medium or high. |
    | grok-image-to-video | 5000 | image-to-video | yes | image | 1 | | | no | no | Image-to-video only. Require source_media_urls with exactly one image URL. Duration not configurable. |
    | grok-text-to-image | 5000 | text-to-image | no | | 0 | | | no | no | Text-to-image only. Omit source_media_urls. |
    | grok-text-to-video-6s | 5000 | text-to-video | no | | 0 | | | no | no | Text-to-video only. Fixed 6s duration; duration parameter ignored. Omit source_media_urls. |
    | imagen-4 | 5000 | text-to-image | no | | 0 | | | yes | no | Text-to-image only. Do not pass source_media_urls. negative_prompt supported. quality: standard, fast, ultra. |
    | imagen-4-fast | 5000 | text-to-image | no | | 0 | | | yes | no | Fast version of Imagen 4. Text-to-image only. Omit source_media_urls. negative_prompt supported. |
    | imagen-4-ultra | 5000 | text-to-image | no | | 0 | | | yes | no | Ultra version of Imagen 4. Text-to-image only. Omit source_media_urls. negative_prompt supported. |
    | kling-2-5-image-to-video-pro | 2500 | image-to-video | yes | image | 2 | | | yes | no | Image-to-video with start and end frame. Require source_media_urls with exactly 2 image URLs. negative_prompt supported. Prompt max 2500 chars. |
    | kling-2-6-image-to-video | 2500 | image-to-video | yes | image | 1 | | 5s, 10s | no | yes | Image-to-video only. Require source_media_urls with exactly one image URL. duration 5s or 10s. sound: true for audio. Prompt max 2500 chars. |
    | kling-2-6-motion-control-1080p | 2500 | image-to-video | yes | image, video | 1 | 1 | | no | no | Same as kling-2-6-motion-control-720p. Requires 1 image + 1 video URL in source_media_urls. |
    | kling-2-6-motion-control-720p | 2500 | image-to-video | yes | image, video | 1 | 1 | | no | no | Motion Control: requires exactly 1 image URL + 1 video URL in source_media_urls (image first, then video). Video max 30s, typically MP4/WebM up to 100MB. |
    | kling-2-6-text-to-video | 2500 | text-to-video | no | | 0 | | 5s, 10s | no | yes | Text-to-video only. Omit source_media_urls. duration 5s or 10s. Set sound: true for generated audio. Prompt max 2500 chars. |
    | kling-3-0-motion-control-1080p | 2500 | image-to-video | yes | image, video | 1 | 1 | | no | no | Same as kling-3-0-motion-control-720p. Requires 1 image + 1 video URL in source_media_urls. |
    | kling-3-0-motion-control-720p | 2500 | image-to-video | yes | image, video | 1 | 1 | | no | no | Same as Kling 2.6 Motion Control. Requires 1 image + 1 video URL in source_media_urls (image first, then video). Video max 30s. |
    | kling-3-0-pro | 2500 | text-to-video, image-to-video | no | image | 2 | | 3s, 5s, 8s, 10s, 15s | no | yes | Kling 3.0 pro. Same rules as kling-3-0-std; higher quality tier. Text-to-video: omit source_media_urls. Image-to-video: 1–2 image URLs. duration 3s–15s. sound: true for generated audio. @element_name references supported. |
    | kling-3-0-std | 2500 | text-to-video, image-to-video | no | image | 2 | | 3s, 5s, 8s, 10s, 15s | no | yes | Kling 3.0 standard. Text-to-video: omit source_media_urls. Image-to-video: pass 1–2 image URLs (start frame, or start + end frame). duration 3s–15s. sound: true for generated audio. Supports @element_name references in prompt (see Media tools docs). |
    | nano-banana | 5000 | text-to-image, image-to-image | no | | 10 | | | no | no | Text-to-image: omit source_media_urls. Image-to-image: pass source_media_urls (up to 10 image URLs). |
    | nano-banana-2 | 20000 | text-to-image, image-to-image | no | | 8 | | | no | no | Standard resolution (1K). Use aspect_ratio (1:1, 16:9, 9:16, 3:4, 4:3, 3:2, 2:3, 5:4, 4:5, 21:9). Text-to-image: omit source_media_urls. Image-to-image: up to 8 images. Google Search grounding is always enabled. |
    | nano-banana-2-2K | 20000 | text-to-image, image-to-image | no | | 8 | | | no | no | High resolution (2K). Use aspect_ratio (1:1, 16:9, 9:16, 3:4, 4:3, 3:2, 2:3, 5:4, 4:5, 21:9). Text-to-image: omit source_media_urls. Image-to-image: up to 8 images. Google Search grounding is always enabled. |
    | nano-banana-2-4K | 20000 | text-to-image, image-to-image | no | | 8 | | | no | no | Ultra-high resolution (4K). Use aspect_ratio (1:1, 16:9, 9:16, 3:4, 4:3, 3:2, 2:3, 5:4, 4:5, 21:9). Text-to-image: omit source_media_urls. Image-to-image: up to 8 images. Google Search grounding is always enabled. |
    | nano-banana-edit | 5000 | image-to-image | yes | image | 10 | | | no | no | Requires source_media_urls with image URL(s). Up to 10 images. Set generation_type='image-to-image'. |
    | nano-banana-pro | 20000 | text-to-image, image-to-image | no | | 8 | | | no | no | Standard resolution (1K). Use aspect_ratio (1:1, 16:9, 9:16, 3:4, 4:3, 3:2, 2:3, 5:4, 4:5, 21:9). Text-to-image: omit source_media_urls. Image-to-image: up to 8 images. |
    | nano-banana-pro-2K | 20000 | text-to-image, image-to-image | no | | 8 | | | no | no | High resolution (2K). Use aspect_ratio (1:1, 16:9, 9:16, 3:4, 4:3, 3:2, 2:3, 5:4, 4:5, 21:9). Text-to-image: omit source_media_urls. Image-to-image: up to 8 images. |
    | nano-banana-pro-4K | 20000 | text-to-image, image-to-image | no | | 8 | | | no | no | Ultra-high resolution (4K). Use aspect_ratio (1:1, 16:9, 9:16, 3:4, 4:3, 3:2, 2:3, 5:4, 4:5, 21:9). Text-to-image: omit source_media_urls. Image-to-image: up to 8 images. |
    | seedance-1-5-pro | 5000 | text-to-video, image-to-video | no | image | 2 | | 4s, 8s, 12s | no | yes | Seedance 1.5 Pro. Two modes: (1) text-to-video: 0–1 image optional; (2) image-to-video: exactly 2 images (start + end frame) required. duration 4s, 8s, or 12s. Resolutions: 480p, 720p, 1080p. sound: true for generated audio. fixed_lens (boolean): when true, keeps camera/lens fixed. Variants: e.g. seedance-1-5-pro-720p-8s, seedance-1-5-pro-1080p-12s-audio. |
    | seedream-v4-5 | 3000 | text-to-image, image-to-image | no | | 10 | | | no | no | Seedream 4.5 (served by seedream-v4-5-edit edge function). Text-to-image: omit source_media_urls. Image-to-image: pass source_media_urls (up to 10). Prompt max 3000 chars. quality: basic or high. aspect_ratio e.g. 1:1, 9:16, 16:9. |
    | seedream-v4-5-edit | 3000 | image-to-image | yes | image | 10 | | | no | no | Seedream 4.5 Edit (same edge function as seedream-v4-5). Image-to-image only. Require source_media_urls (up to 10 image URLs). Set generation_type='image-to-image'. |
    | seedream-v4-edit | 2500 | image-to-image | yes | image | 10 | | | no | no | Image-to-image only. Require source_media_urls (up to 10 image URLs). Set generation_type='image-to-image'. |
    | sora-2 | 5000 | text-to-video, image-to-video | no | | 1 | | 10s, 15s | no | no | Text-to-video: omit source_media_urls; use duration 10s or 15s. Image-to-video: pass exactly one image URL in source_media_urls. |
    | sora-2-pro | 5000 | text-to-video, image-to-video | no | | 1 | | 10s, 15s | no | no | Same as sora-2. quality: standard or pro/hd for HD. duration 10s or 15s. |
    | sora-2-pro-storyboard | 5000 | text-to-video, image-to-video | no | image | 1 | | 10s, 15s, 25s | no | no | Scene-based. duration 10s, 15s, or 25s. Optional single image in source_media_urls. |
    | v1-pro-fast-i2v | 5000 | image-to-video | yes | image | 1 | | 5s, 10s | no | no | Seedance 1.0 (model_id: v1-pro-fast-i2v). Image-to-video only. Require source_media_urls with exactly one image URL. duration 5s or 10s. |
    | veo3-1 | 5000 | text-to-video, image-to-video | no | image | 3 | | | no | no | Text-to-video: omit or 1 optional ref image. Image-to-video / reference modes: 1-3 images. Duration not set via parameter. |
    | veo3-1-fast | 5000 | text-to-video, image-to-video | no | image | 3 | | | no | no | Same as veo3-1; different model_id for speed. Duration not configurable. |
    | wan-2-5 | 800 | text-to-video, image-to-video | no | | 1 | | 5s, 10s | yes | no | Prompt max 800 chars. Text-to-video: omit source_media_urls. Image-to-video: exactly one image URL. duration 5s or 10s. negative_prompt supported. |
    | Z-image | 5000 | text-to-image | no | | 0 | | | no | no | Text-to-image only. Omit source_media_urls. |
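Several rows above constrain both the number and the order of entries in source_media_urls, most notably Motion Control (image first, then video). The sketch below shows one way a client might check that before sending a request. It guesses the media kind from the URL's file extension, which is purely an assumption for illustration (the API may not care about extensions at all), and `EXPECTED_INPUTS` hand-copies only three rows from the table.

```python
import os
from urllib.parse import urlparse

# Extension lists are illustrative assumptions, not documented API rules.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}
VIDEO_EXTS = {".mp4", ".webm"}  # MP4/WebM are the formats the Motion Control note mentions

# Expected media kinds, in order, hand-copied from three rows of the table.
EXPECTED_INPUTS = {
    "kling-2-6-motion-control-720p": ["image", "video"],  # image first, then video
    "kling-2-5-image-to-video-pro": ["image", "image"],   # start and end frame
    "grok-image-to-video": ["image"],                     # exactly one image
}


def media_kind(url: str) -> str:
    """Guess 'image', 'video', or 'unknown' from the URL path's extension."""
    extension = os.path.splitext(urlparse(url).path)[1].lower()
    if extension in IMAGE_EXTS:
        return "image"
    if extension in VIDEO_EXTS:
        return "video"
    return "unknown"


def inputs_match(model_id: str, source_media_urls: list) -> bool:
    """True when the URLs have the expected kinds in the expected order."""
    kinds = [media_kind(url) for url in source_media_urls]
    return kinds == EXPECTED_INPUTS[model_id]
```

With this helper, a Motion Control request whose URLs are passed video-first is flagged locally instead of failing at the provider.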

    See also API overview, REST API, Media tools, and Limitations.