
Canonical media flow

Five steps from a user file to a finished generation

01
get_upload_url(model_id)
Returns a temporary browser link. Send it to the user verbatim.
02
User uploads in browser
Kubeez probes duration and dimensions client-side and stores them in media_upload_metadata.
03
get_upload_session(token)
Returns public URLs + per-file duration_seconds + a billing_readiness flag.
04
estimate_generation_cost(...)
Exact price preview, no credits deducted. Quote it to the user.
05
generate_*(...)
Only after the user confirms — generation_id returned for polling.

Flat-rate models (Nano Banana, Flux, Imagen, Z-Image, Seedream, Logo Maker, ad-copy, music, speech) skip step 4 and quote cost_per_generation directly.

How it works

The MCP server runs every generation as an async job: you start it, then poll until it's done. Your client handles polling automatically — this page is for anyone building one or debugging the flow.

#The canonical media flow

For any tool that takes a file from the user (image edit, motion control, lip-sync, captions, separation, ad-copy):

1. get_upload_url(model_id=X): returns a temporary browser link. Send it to the user verbatim.
2. The user uploads in their browser. Kubeez probes duration and dimensions client-side and writes them to media_upload_metadata.
3. get_upload_session(token): returns the public URLs plus per-file duration_seconds and a billing_readiness flag (exact or pessimistic_fallback).
4. estimate_generation_cost(model, reference_*_seconds=…): exact price preview, no credits deducted. Quote it to the user.
5. generate_*(...): only after the user confirms. Returns a generation_id for polling.

Flat-rate models (Nano Banana, Flux, Imagen, Z-Image, Seedream, Logo Maker, ad-copy, music, speech) skip step 4 — quote cost_per_generation from get_models directly.
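If you are building your own client, the five steps above can be sketched roughly as follows. This is a minimal illustration, not the official client: call_tool is a hypothetical transport function that invokes an MCP tool by name and returns its result as a dict, and the response field names upload_url, token, and files are assumptions. The tool names, duration_seconds, billing_readiness, and generation_id come from this page. Summing durations across files is also an assumption; check what your model expects.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical transport: call_tool(name, **arguments) returns the tool result as a dict.
ToolCaller = Callable[..., Dict[str, Any]]


def run_media_flow(
    call_tool: ToolCaller,
    model_id: str,
    duration_param: str,
    send_link_and_wait: Callable[[str], None],
    confirm_price: Callable[[Dict[str, Any], Optional[str]], bool],
    generate_tool: str,
    build_generate_args: Callable[[List[Dict[str, Any]]], Dict[str, Any]],
) -> Optional[str]:
    """Run steps 1-5; return the generation_id, or None if the user declines."""
    # Step 1: fetch the temporary link and hand it over unchanged.
    upload = call_tool("get_upload_url", model_id=model_id)

    # Step 2: the upload happens in the user's browser; this callback is
    # assumed to block until the user says the upload is finished.
    send_link_and_wait(upload["upload_url"])  # "upload_url" is an assumed field name

    # Step 3: read back public URLs, durations, and the billing_readiness flag.
    session = call_tool("get_upload_session", token=upload["token"])  # "token" is assumed
    files = session["files"]  # "files" is an assumed field name
    total_seconds = sum(f["duration_seconds"] for f in files)

    # Step 4: read-only price preview; no credits are deducted.
    estimate = call_tool(
        "estimate_generation_cost", model=model_id, **{duration_param: total_seconds}
    )
    if not confirm_price(estimate, session.get("billing_readiness")):
        return None

    # Step 5: generate only after explicit confirmation.
    result = call_tool(generate_tool, **build_generate_args(files))
    return result["generation_id"]
```

Here duration_param is whichever reference_*_seconds argument the chosen model expects, and build_generate_args turns the uploaded file entries into the arguments for the specific generate_* tool. Passing billing_readiness to the confirmation callback lets the UI tell the user when the quote is a pessimistic fallback and not an exact price.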

#Per-tool flows

#Images & video

Optional: get_models → pick a model_id and read its capabilities.
Optional: get_balance to check credits.
generate_media(prompt, model, …) → returns generation_id.
Poll get_generation_status(id) every 5s after a brief initial wait (~10s for image, ~30s for short video).
When status is completed, read outputs[] — each item has url, thumbnail_url, optional optimized_url.
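For anyone writing the polling loop by hand, the steps above might look like the sketch below. It assumes a hypothetical call_tool(name, **arguments) transport that returns the tool result as a dict, and it treats a status value of "failed" as terminal, which is an assumption; only "completed" is documented on this page. The timeout value is likewise illustrative.

```python
import time
from typing import Any, Callable, Dict, List


def wait_for_outputs(
    call_tool: Callable[..., Dict[str, Any]],
    generation_id: str,
    first_wait: float = 10.0,   # ~10s for image, ~30s for short video
    interval: float = 5.0,
    timeout: float = 600.0,     # illustrative upper bound, not a documented limit
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, Any]]:
    """Poll get_generation_status until the job completes, then return outputs[]."""
    sleep(first_wait)
    waited = first_wait
    while True:
        status = call_tool("get_generation_status", id=generation_id)
        if status["status"] == "completed":
            return status["outputs"]
        if status["status"] == "failed":  # assumed terminal state
            raise RuntimeError(f"generation {generation_id} failed")
        if waited >= timeout:
            raise TimeoutError(f"generation {generation_id} still running after {waited}s")
        sleep(interval)
        waited += interval


def best_url(output: Dict[str, Any]) -> str:
    """Prefer optimized_url when present, since it is optional; otherwise use url."""
    return output.get("optimized_url") or output["url"]
```

The sleep parameter is injected so the loop can be exercised in tests without real delays.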

#Music

generate_music(prompt, …) → generation_id.
Poll get_music_status(id) after ~30s, then every 5s.
When done, read songs[] — each has audio_url, stream_url, cover_image_url.

#Captions

generate_captions(media_url, …) is synchronous: the call blocks for 1–5 minutes and returns word-level captions directly in the response.

#Audio separation

generate_separation(media_url) → separation_id.
Poll get_separation_status(id) after ~20s, then every 5s.
When done, read vocals_url + instrumental_url.

#Ad creatives

create_ad_copy(reference_ad_url, …, variant_count) → generation_ids: [...] (one per variant).
Poll each id with get_generation_status until completed.
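Because create_ad_copy returns several ids at once, a hand-written client needs to track which variants are still pending. One possible sketch follows; call_tool is a hypothetical transport that invokes a tool by name and returns a dict, and the max_rounds cap is an illustrative safety limit, not a documented value.

```python
import time
from typing import Any, Callable, Dict, Iterable


def wait_for_all(
    call_tool: Callable[..., Dict[str, Any]],
    generation_ids: Iterable[str],
    interval: float = 5.0,
    max_rounds: int = 120,  # illustrative cap so the loop cannot run forever
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Dict[str, Any]]:
    """Poll every variant once per round until all report completed."""
    pending = list(generation_ids)
    results: Dict[str, Dict[str, Any]] = {}
    for _ in range(max_rounds):
        still_pending = []
        for gid in pending:
            status = call_tool("get_generation_status", id=gid)
            if status["status"] == "completed":
                results[gid] = status
            else:
                still_pending.append(gid)
        pending = still_pending
        if not pending:
            return results
        sleep(interval)
    raise TimeoutError(f"variants still pending: {pending}")
```

Polling all pending ids in one round and then sleeping once keeps the request rate at one call per variant per interval, and finished variants drop out of the loop as soon as they complete.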

#Polling cadence

Every get_models row carries an estimated_time_seconds based on real historical completion times. Use it as the first-poll delay, then poll every 5s. Don't poll every second — you'll burn rate-limit budget for the same result.

Family	Typical first-poll wait
Image	~10s (z-image often <3s)
Short video (5–10s)	~30s
Long video (15s+)	~60s
Music	~30s
Speech / TTS	~5s
Audio separation	~20s
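As a rough illustration of this cadence, the helper below picks the first-poll delay from a get_models row and falls back to the table above when estimated_time_seconds is missing. The family keys ("image", "short_video", and so on) are labels invented for this sketch, and the shape of the row as a plain dict is an assumption.

```python
from typing import Any, Dict, List

# Fallback waits in seconds, copied from the table above.
# The dictionary keys are illustrative labels, not API values.
FALLBACK_FIRST_WAIT = {
    "image": 10,
    "short_video": 30,
    "long_video": 60,
    "music": 30,
    "speech": 5,
    "separation": 20,
}


def first_poll_delay(model_row: Dict[str, Any], family: str) -> float:
    """Use estimated_time_seconds when present, else the family default."""
    estimate = model_row.get("estimated_time_seconds")
    if isinstance(estimate, (int, float)) and estimate > 0:
        return float(estimate)
    return float(FALLBACK_FIRST_WAIT[family])


def poll_schedule(first_delay: float, interval: float = 5.0, polls: int = 4) -> List[float]:
    """Seconds after job start at which each of the first `polls` polls fires."""
    return [first_delay + i * interval for i in range(polls)]
```

For example, a row with estimated_time_seconds of 3 yields polls at 3, 8, 13, and 18 seconds, while a music model with no estimate starts at 30 seconds.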

#Best practices

Quote cost before generating for any per-duration model — estimate_generation_cost is read-only.
Always start with get_models — never hardcode model_ids; capabilities drive the request shape.
Respect input requirements — get_models tells you requires_input_media, max_input_images, etc.
Handle 429 (rate limit) — back off and retry. Don't loop tightly on errors.
Surface clear errors to the user on insufficient credits or invalid inputs — don't auto-retry without their action.
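The last two points can be combined in a small retry wrapper: back off on rate limits, and let every other error reach the user untouched. The sketch below uses exponential backoff with jitter, which is a common choice and not something this page prescribes. RateLimited is a hypothetical exception that your transport layer would raise on a 429, and the delay values are illustrative.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical: raised by your transport layer when the server answers 429."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry fn on RateLimited with exponential backoff; re-raise anything else."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the rate limit to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so parallel clients do not sync up.
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("max_attempts must be at least 1")
```

Only RateLimited is caught. Errors such as insufficient credits or invalid inputs propagate immediately, so the client can show them to the user and wait for their action before trying again.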

See Limitations for rate limits and credit refund behavior.