REST API — model requirements
This reference lists default limits and rules the Kubeez REST API applies per internal model_id. It is generated from server/rest-api/tools/model_limits.py. After editing that file, regenerate this page by running:
python server/rest-api/scripts/export_model_requirements_md.py
## Live data from the API
The canonical snapshot for enabled models is GET /v1/models. Each entry includes capabilities (merged from database metadata and the limits below), generation_types, requires_input_media, input_media_types, usage_notes, and pricing hints.
If a model is enabled in the product but not listed in the table below, it still appears in GET /v1/models with database-only metadata. Use that response as the source of its requirements.
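
A minimal Python sketch of reading one model's requirements from the live endpoint follows. It makes several assumptions for illustration only: the base URL, the Bearer authorization header, and the response shape (a top-level "models" list whose entries carry a "model_id" key). Check an actual GET /v1/models response before relying on any of them.

```python
import json
import urllib.request
from typing import Optional

# Placeholder base URL (assumption); replace with the real API host.
BASE_URL = "https://api.example.com"


def fetch_models(api_key: str, base_url: str = BASE_URL) -> dict:
    """Call GET /v1/models and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def find_model(payload: dict, model_id: str) -> Optional[dict]:
    """Return the entry for model_id, or None if the model is not listed.

    Assumption: entries live in a top-level "models" list and each entry
    stores its identifier under "model_id".
    """
    for entry in payload.get("models", []):
        if entry.get("model_id") == model_id:
            return entry
    return None


# Example (needs a real key and network access):
# entry = find_model(fetch_models("YOUR_KEY"), "kling-3-0-std")
# print(entry["generation_types"], entry["requires_input_media"])
```
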
## How fields map to POST /v1/generate/media
| Concept | JSON field | Notes |
|---|---|---|
| Prompt length | prompt | Stay within prompt max for the model. |
| Mode | generation_type | Must be one of the model’s generation_types. |
| Inputs | source_media_urls | Array of HTTPS URLs (e.g. from POST /v1/upload/media). Order matters where a model specifies it; Motion Control expects the image first, then the video. |
| Video length | duration | Only for models with duration_options; value must be one of those strings. |
| Negative prompt | negative_prompt | Only where negative_prompt is yes in the table. |
| Audio track | sound | Only for video models where sound is yes. |
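
The mapping above can be sketched as a small Python helper that assembles a request body and leaves out any optional field that is not set. The top-level "model_id" key and the example URLs are assumptions, not confirmed parts of the API. The other keys are the JSON fields listed in the table.

```python
import json
from typing import Optional, Sequence


def build_generate_body(
    model_id: str,
    prompt: str,
    generation_type: str,
    source_media_urls: Optional[Sequence[str]] = None,
    duration: Optional[str] = None,
    negative_prompt: Optional[str] = None,
    sound: Optional[bool] = None,
) -> dict:
    """Assemble a JSON body for POST /v1/generate/media.

    Optional fields are omitted when unset, so text-only requests never
    send an empty source_media_urls array. The "model_id" key name is an
    assumption; confirm it against the API reference.
    """
    body = {"model_id": model_id, "prompt": prompt, "generation_type": generation_type}
    if source_media_urls:
        body["source_media_urls"] = list(source_media_urls)  # caller's order is kept
    if duration is not None:
        body["duration"] = duration  # must be one of the model's duration_options strings
    if negative_prompt is not None:
        body["negative_prompt"] = negative_prompt
    if sound is not None:
        body["sound"] = sound
    return body


# Motion Control: the image URL must come before the video URL.
motion_body = build_generate_body(
    "kling-2-6-motion-control-720p",
    "The character performs the dance from the reference clip",
    "image-to-video",
    source_media_urls=[
        "https://example.com/uploads/character.png",  # placeholder URL
        "https://example.com/uploads/dance.mp4",  # placeholder URL
    ],
)
print(json.dumps(motion_body, indent=2))
```
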
## Upload and file size
POST /v1/upload/media accepts up to 500 MB per file. Providers may still reject oversized or invalid inputs: many cap images at about 10 MB, and Motion Control limits the reference video to 30 s (typically MP4/WebM up to 100 MB). Check usage_notes and the Media tools docs for edge cases (e.g. Kling elements, Seedance modes).
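
A client-side pre-check can catch files that are certain to fail before they are uploaded. This Python sketch is an assumption-based helper, not part of the API. It reads 500 MB as 500 × 10^6 bytes, the stricter reading, because the unit is not specified. It reports images above roughly 10 MB only as a warning, because actual provider caps vary.

```python
import os
from typing import List

# Documented per-file cap for POST /v1/upload/media; 10**6-byte MB is an assumption.
MAX_UPLOAD_BYTES = 500 * 1000 * 1000

# Approximate provider image cap mentioned above; a warning threshold, not a hard rule.
TYPICAL_IMAGE_CAP_BYTES = 10 * 1000 * 1000


def check_upload(path: str, is_image: bool = False) -> List[str]:
    """Return problems found for a local file; an empty list means no known issue."""
    size = os.path.getsize(path)
    problems = []
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"{path}: {size} bytes exceeds the 500 MB upload limit")
    elif is_image and size > TYPICAL_IMAGE_CAP_BYTES:
        problems.append(f"{path}: {size} bytes may exceed a provider's ~10 MB image cap")
    return problems
```
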
## Per-model reference
| model_id | prompt max | generation_types | requires_input | input_media_types | max images | max videos | duration_options | negative | sound | usage_notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 5-lite-image-to-image | 2995 | image-to-image | yes | image | 10 | — | — | no | no | Seedream 5 Lite Edit (same edge function as 5-lite-text-to-image). Image-to-image only. Requires source_media_urls (up to 10 image URLs). Set generation_type='image-to-image'. |
| 5-lite-text-to-image | 2995 | text-to-image, image-to-image | no | — | 10 | — | — | no | no | Seedream 5 Lite (served by the seedream-v5-lite-edit edge function). Text-to-image: omit source_media_urls. Image-to-image: pass source_media_urls (up to 10). Prompt max 2995 chars. quality: basic or high. aspect_ratio e.g. 1:1, 9:16, 16:9. |
| flux-2 | 5000 | text-to-image | no | — | 0 | — | — | no | no | Text-to-image only (1K resolution). Do not pass source_media_urls. |
| flux-2-1K | 5000 | text-to-image | no | — | 0 | — | — | no | no | Flux 2 (1K resolution). Text-to-image only. Do not pass source_media_urls. |
| flux-2-2K | 5000 | text-to-image | no | — | 0 | — | — | no | no | Flux 2 (2K resolution). Text-to-image only. Do not pass source_media_urls. |
| flux-2-edit-1K | 5000 | image-to-image | yes | image | 8 | — | — | no | no | Flux 2 Edit (1K resolution). Image-to-image only. Requires 1-8 image URLs. Set generation_type='image-to-image'. |
| flux-2-edit-2K | 5000 | image-to-image | yes | image | 8 | — | — | no | no | Flux 2 Edit (2K resolution). Image-to-image only. Requires 1-8 image URLs. Set generation_type='image-to-image'. |
| gpt-1.5-image-high | 3000 | text-to-image, image-to-image | no | — | 16 | — | — | no | no | Same as gpt-1.5-image-medium. Prompt max 3000 chars. quality: high. |
| gpt-1.5-image-medium | 3000 | text-to-image, image-to-image | no | — | 16 | — | — | no | no | Text-to-image: omit source_media_urls. Image-to-image: pass source_media_urls (up to 16 images). Prompt max 3000 chars. quality: medium or high. |
| grok-image-to-video | 5000 | image-to-video | yes | image | 1 | — | — | no | no | Image-to-video only. Requires source_media_urls with exactly one image URL. Duration not configurable. |
| grok-text-to-image | 5000 | text-to-image | no | — | 0 | — | — | no | no | Text-to-image only. Omit source_media_urls. |
| grok-text-to-video-6s | 5000 | text-to-video | no | — | 0 | — | — | no | no | Text-to-video only. Fixed 6s duration; duration parameter ignored. Omit source_media_urls. |
| imagen-4 | 5000 | text-to-image | no | — | 0 | — | — | yes | no | Text-to-image only. Do not pass source_media_urls. negative_prompt supported. quality: standard, fast, ultra. |
| imagen-4-fast | 5000 | text-to-image | no | — | 0 | — | — | yes | no | Fast version of Imagen 4. Text-to-image only. Omit source_media_urls. negative_prompt supported. |
| imagen-4-ultra | 5000 | text-to-image | no | — | 0 | — | — | yes | no | Ultra version of Imagen 4. Text-to-image only. Omit source_media_urls. negative_prompt supported. |
| kling-2-5-image-to-video-pro | 2500 | image-to-video | yes | image | 2 | — | — | yes | no | Image-to-video with start and end frame. Requires source_media_urls with exactly 2 image URLs. negative_prompt supported. Prompt max 2500 chars. |
| kling-2-6-image-to-video | 2500 | image-to-video | yes | image | 1 | — | 5s, 10s | no | yes | Image-to-video only. Requires source_media_urls with exactly one image URL. duration 5s or 10s. Set sound: true for generated audio. Prompt max 2500 chars. |
| kling-2-6-motion-control-1080p | 2500 | image-to-video | yes | image, video | 1 | 1 | — | no | no | Same as kling-2-6-motion-control-720p. Requires 1 image + 1 video URL in source_media_urls. |
| kling-2-6-motion-control-720p | 2500 | image-to-video | yes | image, video | 1 | 1 | — | no | no | Motion Control: requires exactly 1 image URL + 1 video URL in source_media_urls (image first, then video). Video max 30s, typically MP4/WebM up to 100 MB. |
| kling-2-6-text-to-video | 2500 | text-to-video | no | — | 0 | — | 5s, 10s | no | yes | Text-to-video only. Omit source_media_urls. duration 5s or 10s. Set sound: true for generated audio. Prompt max 2500 chars. |
| kling-3-0-motion-control-1080p | 2500 | image-to-video | yes | image, video | 1 | 1 | — | no | no | Same as kling-3-0-motion-control-720p. Requires 1 image + 1 video URL in source_media_urls. |
| kling-3-0-motion-control-720p | 2500 | image-to-video | yes | image, video | 1 | 1 | — | no | no | Same as Kling 2.6 Motion Control. Requires 1 image + 1 video URL in source_media_urls (image first, then video). Video max 30s. |
| kling-3-0-pro | 2500 | text-to-video, image-to-video | no | image | 2 | — | 3s, 5s, 8s, 10s, 15s | no | yes | Kling 3.0 Pro. Same rules as kling-3-0-std, at a higher quality tier. Text-to-video: omit source_media_urls. Image-to-video: 1–2 image URLs. duration: one of 3s, 5s, 8s, 10s, 15s. Set sound: true for generated audio. @element_name references supported. |
| kling-3-0-std | 2500 | text-to-video, image-to-video | no | image | 2 | — | 3s, 5s, 8s, 10s, 15s | no | yes | Kling 3.0 standard. Text-to-video: omit source_media_urls. Image-to-video: pass 1–2 image URLs (start frame, or start + end frame). duration: one of 3s, 5s, 8s, 10s, 15s. Set sound: true for generated audio. Supports @element_name references in prompt (see Media tools docs). |
| nano-banana | 5000 | text-to-image, image-to-image | no | — | 10 | — | — | no | no | Text-to-image: omit source_media_urls. Image-to-image: pass source_media_urls (up to 10 image URLs). |
| nano-banana-2 | 20000 | text-to-image, image-to-image | no | — | 8 | — | — | no | no | Standard resolution (1K). Use aspect_ratio (1:1, 16:9, 9:16, 3:4, 4:3, 3:2, 2:3, 5:4, 4:5, 21:9). Text-to-image: omit source_media_urls. Image-to-image: up to 8 images. Google Search grounding is always enabled. |
| nano-banana-2-2K | 20000 | text-to-image, image-to-image | no | — | 8 | — | — | no | no | High resolution (2K). Use aspect_ratio (1:1, 16:9, 9:16, 3:4, 4:3, 3:2, 2:3, 5:4, 4:5, 21:9). Text-to-image: omit source_media_urls. Image-to-image: up to 8 images. Google Search grounding is always enabled. |
| nano-banana-2-4K | 20000 | text-to-image, image-to-image | no | — | 8 | — | — | no | no | Ultra-high resolution (4K). Use aspect_ratio (1:1, 16:9, 9:16, 3:4, 4:3, 3:2, 2:3, 5:4, 4:5, 21:9). Text-to-image: omit source_media_urls. Image-to-image: up to 8 images. Google Search grounding is always enabled. |
| nano-banana-edit | 5000 | image-to-image | yes | image | 10 | — | — | no | no | Requires source_media_urls with 1–10 image URLs. Set generation_type='image-to-image'. |
| nano-banana-pro | 20000 | text-to-image, image-to-image | no | — | 8 | — | — | no | no | Standard resolution (1K). Use aspect_ratio (1:1, 16:9, 9:16, 3:4, 4:3, 3:2, 2:3, 5:4, 4:5, 21:9). Text-to-image: omit source_media_urls. Image-to-image: up to 8 images. |
| nano-banana-pro-2K | 20000 | text-to-image, image-to-image | no | — | 8 | — | — | no | no | High resolution (2K). Use aspect_ratio (1:1, 16:9, 9:16, 3:4, 4:3, 3:2, 2:3, 5:4, 4:5, 21:9). Text-to-image: omit source_media_urls. Image-to-image: up to 8 images. |
| nano-banana-pro-4K | 20000 | text-to-image, image-to-image | no | — | 8 | — | — | no | no | Ultra-high resolution (4K). Use aspect_ratio (1:1, 16:9, 9:16, 3:4, 4:3, 3:2, 2:3, 5:4, 4:5, 21:9). Text-to-image: omit source_media_urls. Image-to-image: up to 8 images. |
| seedance-1-5-pro | 5000 | text-to-video, image-to-video | no | image | 2 | — | 4s, 8s, 12s | no | yes | Seedance 1.5 Pro. Two modes: (1) text-to-video, with an optional single image; (2) image-to-video, which requires exactly 2 images (start + end frame). duration 4s, 8s, or 12s. Resolutions: 480p, 720p, 1080p. Set sound: true for generated audio. fixed_lens (boolean): when true, keeps the camera/lens fixed. Variants include seedance-1-5-pro-720p-8s and seedance-1-5-pro-1080p-12s-audio. |
| seedream-v4-5 | 3000 | text-to-image, image-to-image | no | — | 10 | — | — | no | no | Seedream 4.5 (served by the seedream-v4-5-edit edge function). Text-to-image: omit source_media_urls. Image-to-image: pass source_media_urls (up to 10). Prompt max 3000 chars. quality: basic or high. aspect_ratio e.g. 1:1, 9:16, 16:9. |
| seedream-v4-5-edit | 3000 | image-to-image | yes | image | 10 | — | — | no | no | Seedream 4.5 Edit (same edge function as seedream-v4-5). Image-to-image only. Requires source_media_urls (up to 10 image URLs). Set generation_type='image-to-image'. |
| seedream-v4-edit | 2500 | image-to-image | yes | image | 10 | — | — | no | no | Image-to-image only. Requires source_media_urls (up to 10 image URLs). Set generation_type='image-to-image'. |
| sora-2 | 5000 | text-to-video, image-to-video | no | — | 1 | — | 10s, 15s | no | no | Text-to-video: omit source_media_urls; use duration 10s or 15s. Image-to-video: pass exactly one image URL in source_media_urls. |
| sora-2-pro | 5000 | text-to-video, image-to-video | no | — | 1 | — | 10s, 15s | no | no | Same as sora-2. quality: standard or pro/hd for HD. duration 10s or 15s. |
| sora-2-pro-storyboard | 5000 | text-to-video, image-to-video | no | image | 1 | — | 10s, 15s, 25s | no | no | Scene-based. duration 10s, 15s, or 25s. Optional single image in source_media_urls. |
| v1-pro-fast-i2v | 5000 | image-to-video | yes | image | 1 | — | 5s, 10s | no | no | Seedance 1.0 (model_id: v1-pro-fast-i2v). Image-to-video only. Requires source_media_urls with exactly one image URL. duration 5s or 10s. |
| veo3-1 | 5000 | text-to-video, image-to-video | no | image | 3 | — | — | no | no | Text-to-video: omit source_media_urls or pass one optional reference image. Image-to-video / reference modes: 1–3 images. Duration is not configurable via a parameter. |
| veo3-1-fast | 5000 | text-to-video, image-to-video | no | image | 3 | — | — | no | no | Faster variant of veo3-1 with the same input rules. Duration not configurable. |
| wan-2-5 | 800 | text-to-video, image-to-video | no | — | 1 | — | 5s, 10s | yes | no | Prompt max 800 chars. Text-to-video: omit source_media_urls. Image-to-video: exactly one image URL. duration 5s or 10s. negative_prompt supported. |
| Z-image | 5000 | text-to-image | no | — | 0 | — | — | no | no | Text-to-image only. Omit source_media_urls. |
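
To catch mistakes before calling POST /v1/generate/media, you can mirror a few table rows locally. The Python sketch below copies three rows and checks prompt length, generation_type, source URL count, duration, negative_prompt, and sound. ModelLimits, LIMITS, and validate are hypothetical names, not part of the Kubeez API. The sketch does not enforce mode-specific rules from usage_notes (for example, exactly one image for image-to-video on wan-2-5), and GET /v1/models remains the source of truth.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ModelLimits:
    prompt_max: int
    generation_types: Tuple[str, ...]
    requires_input: bool
    max_images: int
    max_videos: int = 0
    duration_options: Tuple[str, ...] = ()
    negative_prompt: bool = False
    sound: bool = False


# Rows copied from the table above; this hard-coded copy can go stale.
LIMITS = {
    "wan-2-5": ModelLimits(800, ("text-to-video", "image-to-video"), False, 1,
                           duration_options=("5s", "10s"), negative_prompt=True),
    "kling-3-0-std": ModelLimits(2500, ("text-to-video", "image-to-video"), False, 2,
                                 duration_options=("3s", "5s", "8s", "10s", "15s"), sound=True),
    "kling-2-6-motion-control-720p": ModelLimits(2500, ("image-to-video",), True, 1, max_videos=1),
}


def validate(model_id: str, body: dict) -> list:
    """Return human-readable problems with body; an empty list means none were found."""
    limits = LIMITS.get(model_id)
    if limits is None:
        return [f"{model_id}: not in local table; check GET /v1/models"]
    problems = []
    prompt = body.get("prompt", "")
    if len(prompt) > limits.prompt_max:
        problems.append(f"prompt is {len(prompt)} chars; max is {limits.prompt_max}")
    if body.get("generation_type") not in limits.generation_types:
        problems.append(f"generation_type must be one of {limits.generation_types}")
    urls = body.get("source_media_urls") or []
    if limits.requires_input and not urls:
        problems.append("source_media_urls is required")
    if len(urls) > limits.max_images + limits.max_videos:
        problems.append(f"too many source_media_urls ({len(urls)})")
    duration = body.get("duration")
    if duration is not None and duration not in limits.duration_options:
        allowed = limits.duration_options or "none (not configurable)"
        problems.append(f"duration must be one of {allowed}")
    if "negative_prompt" in body and not limits.negative_prompt:
        problems.append("negative_prompt is not supported")
    if "sound" in body and not limits.sound:
        problems.append("sound is not supported")
    return problems


print(validate("wan-2-5", {"prompt": "x" * 900, "generation_type": "text-to-video", "duration": "8s"}))
```

Building LIMITS from a GET /v1/models response at startup, instead of hard-coding it, would keep the check in step with model_limits.py.
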
See also API overview, REST API, Media tools, and Limitations.
