Quick connect

    Add Kubeez to Cursor in one click

    OAuth opens in your browser — no JSON to edit, no token to paste.

    MCP overview

    The Kubeez MCP server lets AI assistants like Claude, Cursor, and ChatGPT use Kubeez tools natively — no SDK install, no API integration code. One connection, every creative tool.

    # What you can do

    • Images & video — 40+ models for text-to-image, image-to-image, text-to-video, image-to-video, motion control, and lip-sync.
    • Music — full songs with vocals or instrumental, plus optional custom lyrics, style, and BPM.
    • Dialogue — multi-speaker text-to-speech with 80+ voices.
    • Captions — word-level timestamped transcripts from any video.
    • Audio separation — split a track into vocals + instrumental stems.
    • Ad creatives — multiple ad variants from a reference ad and your product.

    # Connect

    Use this URL when adding the server in your MCP client:

    https://mcp.kubeez.com/mcp
    

    Cursor users: hit the Add to Cursor button at the top of this page — it uses Cursor's official install flow.
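For clients configured through a JSON file (Cursor's `~/.cursor/mcp.json` is one example), the entry is typically just the remote server URL. Key names vary by client, so treat this as a sketch and check your client's docs:

```json
{
  "mcpServers": {
    "kubeez": {
      "url": "https://mcp.kubeez.com/mcp"
    }
  }
}
```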

    # Sign in

    Method                | When to use
    --------------------- | -----------
    OAuth (recommended)   | Cursor, Claude Desktop, and most clients open a browser; you sign in to Kubeez and the client stores the token. Nothing to paste.
    Personal access token | Clients that only accept an API key (e.g. ChatGPT custom GPTs). Generate one in Settings → MCP and paste it as the Bearer / API key. Same permissions as OAuth; revoke any time.

    If your client supports custom headers, use Authorization: Bearer <token> with a personal access token.
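If you are wiring up a client yourself, the request shape looks roughly like this. `TOKEN` is a placeholder personal access token, and the header/JSON-RPC shapes follow the MCP streamable-HTTP transport in general; confirm details against your client library:

```python
import json

MCP_URL = "https://mcp.kubeez.com/mcp"
TOKEN = "pat_xxx"  # hypothetical personal access token from Settings → MCP

# Headers for a raw HTTP call to an MCP endpoint with Bearer auth.
headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}

# tools/list is the standard MCP request for discovering available tools.
payload = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
body = json.dumps(payload)
```

POST `body` with `headers` to the server URL; most MCP client libraries do all of this for you.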

    # How it works

    1. Add the server URL in your MCP client.
    2. Sign in to Kubeez (OAuth) and approve the capabilities the client requests.
    3. The client gets a token and can call tools — generations use your Kubeez credits.

    All generation tools are async: start a job, poll until it finishes. The client handles polling automatically; the How it works page explains the flow if you're building one yourself.
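If you are building the polling loop yourself, the pattern is a longer first wait, then a steady interval. A minimal sketch, where `get_status` stands in for the real status tool (e.g. `get_separation_status`) and the 20s/5s defaults follow the cadence described in this doc:

```python
import time

def poll_until_done(get_status, job_id, first_wait=20, interval=5,
                    timeout=600, sleep=time.sleep):
    """Wait `first_wait` seconds before the first check, then poll
    every `interval` seconds until the job finishes or `timeout` passes."""
    waited = 0
    wait = first_wait
    while waited < timeout:
        sleep(wait)
        waited += wait
        status = get_status(job_id)
        if status["state"] in ("completed", "failed"):
            return status
        wait = interval
    raise TimeoutError(f"job {job_id} did not finish in {timeout}s")
```

The `state` field name is an assumption; read the actual status shape from the tool's response.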

    # Requirements

    • A Kubeez account with credits.
    • Any MCP-compatible client (Claude Desktop, Cursor, Claude.ai, ChatGPT custom GPTs, your own integration).

    For raw HTTP / API key access (servers, scripts, mobile backends), use the REST API instead.


    Next: Quick start — connect and run your first generation in 5 minutes.

    Real tool sequences

    How AI assistants chain these tools

    Music separation — split a song into vocals + instrumental

    Audio separation with exact, duration-based billing.

    1. get_upload_url(model_id="mvsep-40")

       Returns a temp browser link. Hand it to the user verbatim.

    2. get_upload_session(token)

       Returns media_urls + total_audio_seconds + billing_readiness:"exact".

    3. estimate_generation_cost(model="mvsep-40", reference_audio_seconds=…)

       Quote the price. No credits deducted.

    4. generate_separation(media_url=…) → get_separation_status

       First poll ~20s, then every 5s. Returns vocals + instrumental URLs.
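The four steps above chain naturally in code. A sketch, where `call_tool` stands in for however your MCP client invokes tools and `confirm` is your "quote the price, wait for go-ahead" step; field names (`media_urls`, `total_audio_seconds`, `billing_readiness`) follow this doc:

```python
def separate_track(call_tool, confirm):
    # 1. Get a browser upload link; hand up["url"] to the user verbatim.
    up = call_tool("get_upload_url", model_id="mvsep-40")

    # 2. Read the uploaded file's exact duration from the session.
    session = call_tool("get_upload_session", token=up["token"])
    assert session["billing_readiness"] == "exact"

    # 3. Quote the price before any credits are deducted.
    cost = call_tool("estimate_generation_cost", model="mvsep-40",
                     reference_audio_seconds=session["total_audio_seconds"])
    if not confirm(cost):
        return None

    # 4. Start the job; poll get_separation_status (~20s, then 5s).
    return call_tool("generate_separation",
                     media_url=session["media_urls"][0])
```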

    Motion control — animate a character from a video

    Two uploads, ordered, billed by reference video duration.

    1. get_upload_url(model_id="kling-3-0-motion-control-720p")

       Link enforces 1 image + 1 video.

    2. get_upload_session(token)

       Ordered media_urls + total_video_seconds.

    3. estimate_generation_cost(model="…motion-control-720p", reference_video_seconds=…)

       Confirm before charging.

    4. generate_media(source_media_urls=[image_url, video_url])

       Order matters: image first, video second.
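Because the ordering requirement is easy to get wrong, it can help to guard it before calling `generate_media`. A sketch; the extension-based type check is a simplification for illustration, not something the API requires:

```python
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".webp")
VIDEO_EXTS = (".mp4", ".mov", ".webm")

def ordered_sources(media_urls):
    """Return [image_url, video_url] in the order generate_media expects,
    or raise if the upload doesn't contain exactly 1 image + 1 video."""
    images = [u for u in media_urls if u.lower().endswith(IMAGE_EXTS)]
    videos = [u for u in media_urls if u.lower().endswith(VIDEO_EXTS)]
    if len(images) != 1 or len(videos) != 1:
        raise ValueError("motion control needs exactly 1 image + 1 video")
    return [images[0], videos[0]]  # image first, video second
```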

    AutoCaptions — transcribe a video into word-level captions

    Synchronous — captions return inside the response.

    1. get_upload_url(model_id="auto-caption")

       Same pattern: hand the link to the user.

    2. get_upload_session(token) → total_video_seconds

       Read duration straight from the session.

    3. estimate_generation_cost(model="auto-caption", reference_video_seconds=…)

       Quote per-second cost up front.

    4. generate_captions(media_url=…)

       Sync call — returns word-level JSON in the response (no polling).
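A common follow-up is grouping word-level captions into display lines. The exact response shape isn't specified here, so this sketch assumes a list of `{"word", "start", "end"}` dicts with times in seconds:

```python
def words_to_lines(words, max_gap=0.6, max_words=7):
    """Group timestamped words into caption lines, breaking on long
    pauses (> max_gap seconds) or when a line reaches max_words."""
    lines, current = [], []
    for w in words:
        if current and (w["start"] - current[-1]["end"] > max_gap
                        or len(current) >= max_words):
            lines.append(current)
            current = []
        current.append(w)
    if current:
        lines.append(current)
    return [{"text": " ".join(w["word"] for w in ln),
             "start": ln[0]["start"], "end": ln[-1]["end"]}
            for ln in lines]
```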

    Tips

    Get it right the first time

    Always preview cost before generating

    Per-duration models (Music Separation, AutoCaptions, motion control, P-Video, Seedance 2 with reference video) bill by reference seconds. Call estimate_generation_cost first, quote the number, wait for go-ahead.

    Bias low on polling cadence

    Wait estimated_time_seconds (from get_models) before the first poll, then poll every 5s. Polling every second wastes rate-limit budget for the same finish time.
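The waste is easy to quantify. A small sketch counting status calls for a job that finishes at a given time, under different cadences:

```python
def polls_until(finish, first, interval):
    """Count status calls for a job finishing at `finish` seconds,
    with the first poll at `first` and one every `interval` after."""
    t, n = first, 1
    while t < finish:
        t += interval
        n += 1
    return n

# For a job finishing at 47s: a 20s-then-5s cadence makes 7 calls,
# while polling every second makes 47 — same finish time either way.
```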

    Browser uploads → exact billing

    Files uploaded via get_upload_url have their duration probed client-side, giving billing_readiness:"exact". External URLs fall back to a pessimistic 15s default.

    Scopes are tool-level

    get_models needs no scope. generate_* needs generate:media / generate:music / generate:speech / generate:ads. read:balance and read:generations cover queries.
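The tool-level rule can be expressed as a lookup table. The scope names come from this doc; the tool names for balance and generation queries are illustrative guesses, so check get_models output for the real list:

```python
# None means the tool needs no scope at all.
TOOL_SCOPES = {
    "get_models": None,
    "generate_media": "generate:media",
    "generate_music": "generate:music",
    "generate_speech": "generate:speech",
    "generate_ads": "generate:ads",
    "get_balance": "read:balance",        # hypothetical tool name
    "list_generations": "read:generations",  # hypothetical tool name
}

def allowed(tool, granted_scopes):
    """True if the granted scopes cover the given tool."""
    needed = TOOL_SCOPES.get(tool)
    return needed is None or needed in granted_scopes
```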