Quick connect

    Add Kubeez to Cursor in one click

    OAuth opens in your browser — no JSON to edit, no token to paste.

    MCP overview

    The Kubeez MCP server lets AI assistants like Claude, Cursor, and ChatGPT use Kubeez tools natively — no SDK install, no API integration code. One connection, every creative tool.

    # What you can do

    • Images & video — 40+ models for text-to-image, image-to-image, text-to-video, image-to-video, motion control, and lip-sync.
    • Music — full songs with vocals or instrumental, plus optional custom lyrics, style, and BPM.
    • Dialogue — multi-speaker text-to-speech with 80+ voices.
    • Captions — word-level timestamped transcripts from any video.
    • Audio separation — split a track into vocals + instrumental stems.
    • Ad creatives — multiple ad variants from a reference ad and your product.

    # Connect

    Use this URL when adding the server in your MCP client:

    https://mcp.kubeez.com/mcp
    

    Cursor users: hit the Add to Cursor button at the top of this page — it uses Cursor's official install flow.
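For clients configured through a JSON file (Cursor's `~/.cursor/mcp.json` is one example), the entry is typically just the remote server URL. Key names vary by client, so treat this as a sketch and check your client's docs:

```json
{
  "mcpServers": {
    "kubeez": {
      "url": "https://mcp.kubeez.com/mcp"
    }
  }
}
```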

    # Sign in

    Method                | When to use
    --------------------- | -----------
    OAuth (recommended)   | Cursor, Claude Desktop, and most clients open a browser; you sign in to Kubeez and the client stores the token. Nothing to paste.
    Personal access token | Clients that only accept an API key (e.g. ChatGPT custom GPTs). Generate one in Settings → MCP and paste it as the Bearer / API key. Same permissions as OAuth; revoke any time.

    If your client supports custom headers, use Authorization: Bearer <token> with a personal access token.
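If you are wiring up a client yourself, the request shape looks roughly like this. `TOKEN` is a placeholder personal access token, and the header/JSON-RPC shapes follow the MCP streamable-HTTP transport in general; confirm details against your client library:

```python
import json

MCP_URL = "https://mcp.kubeez.com/mcp"
TOKEN = "pat_xxx"  # hypothetical personal access token from Settings → MCP

# Headers for a raw HTTP call to an MCP endpoint with Bearer auth.
headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}

# tools/list is the standard MCP request for discovering available tools.
payload = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
body = json.dumps(payload)
```

POST `body` with `headers` to the server URL; most MCP client libraries do all of this for you.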

    # How it works

    1. Add the server URL in your MCP client.
    2. Sign in to Kubeez (OAuth) and approve the capabilities the client requests.
    3. The client gets a token and can call tools — generations use your Kubeez credits.

    All generation tools are async: start a job, poll until it finishes. The client handles polling automatically; the How it works page explains the flow if you're building one yourself.
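If you are building the polling loop yourself, the pattern is a longer first wait, then a steady interval. A minimal sketch, where `get_status` stands in for the real status tool (e.g. `get_separation_status`) and the 20s/5s defaults follow the cadence described in this doc:

```python
import time

def poll_until_done(get_status, job_id, first_wait=20, interval=5,
                    timeout=600, sleep=time.sleep):
    """Wait `first_wait` seconds before the first check, then poll
    every `interval` seconds until the job finishes or `timeout` passes."""
    waited = 0
    wait = first_wait
    while waited < timeout:
        sleep(wait)
        waited += wait
        status = get_status(job_id)
        if status["state"] in ("completed", "failed"):
            return status
        wait = interval
    raise TimeoutError(f"job {job_id} did not finish in {timeout}s")
```

The `state` field name is an assumption; read the actual status shape from the tool's response.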

    # Requirements

    • A Kubeez account with credits.
    • Any MCP-compatible client (Claude Desktop, Cursor, Claude.ai, ChatGPT custom GPTs, your own integration).

    For raw HTTP / API key access (servers, scripts, mobile backends), use the REST API instead.


    Next: Quick start — connect and run your first generation in 5 minutes.

    Real tool sequences

    How AI assistants chain these tools

    Music separation — split a song into vocals + instrumental

    Audio separation with exact, duration-based billing.

    1. get_upload_url(model_id="mvsep-40")

       Returns a temp browser link. Hand it to the user verbatim.

    2. get_upload_session(token)

       Returns media_urls + total_audio_seconds + billing_readiness:"exact".

    3. estimate_generation_cost(model="mvsep-40", reference_audio_seconds=…)

       Quote the price. No credits deducted.

    4. generate_separation(media_url=…) → get_separation_status

       First poll ~20s, then every 5s. Returns vocals + instrumental URLs.
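The four steps above chain naturally in code. A sketch, where `call_tool` stands in for however your MCP client invokes tools and `confirm` is your "quote the price, wait for go-ahead" step; field names (`media_urls`, `total_audio_seconds`, `billing_readiness`) follow this doc:

```python
def separate_track(call_tool, confirm):
    # 1. Get a browser upload link; hand up["url"] to the user verbatim.
    up = call_tool("get_upload_url", model_id="mvsep-40")

    # 2. Read the uploaded file's exact duration from the session.
    session = call_tool("get_upload_session", token=up["token"])
    assert session["billing_readiness"] == "exact"

    # 3. Quote the price before any credits are deducted.
    cost = call_tool("estimate_generation_cost", model="mvsep-40",
                     reference_audio_seconds=session["total_audio_seconds"])
    if not confirm(cost):
        return None

    # 4. Start the job; poll get_separation_status (~20s, then 5s).
    return call_tool("generate_separation",
                     media_url=session["media_urls"][0])
```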

    Motion control — animate a character from a video

    Two uploads, ordered, billed by reference video duration.

    1. get_upload_url(model_id="kling-3-0-motion-control-720p")

       Link enforces 1 image + 1 video.

    2. get_upload_session(token)

       Ordered media_urls + total_video_seconds.

    3. estimate_generation_cost(model="…motion-control-720p", reference_video_seconds=…)

       Confirm before charging.

    4. generate_media(source_media_urls=[image_url, video_url])

       Order matters: image first, video second.
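Because the ordering requirement is easy to get wrong, it can help to guard it before calling `generate_media`. A sketch; the extension-based type check is a simplification for illustration, not something the API requires:

```python
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".webp")
VIDEO_EXTS = (".mp4", ".mov", ".webm")

def ordered_sources(media_urls):
    """Return [image_url, video_url] in the order generate_media expects,
    or raise if the upload doesn't contain exactly 1 image + 1 video."""
    images = [u for u in media_urls if u.lower().endswith(IMAGE_EXTS)]
    videos = [u for u in media_urls if u.lower().endswith(VIDEO_EXTS)]
    if len(images) != 1 or len(videos) != 1:
        raise ValueError("motion control needs exactly 1 image + 1 video")
    return [images[0], videos[0]]  # image first, video second
```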

    AutoCaptions — transcribe a video into word-level captions

    Synchronous — captions return inside the response.

    1. get_upload_url(model_id="auto-caption")

       Same pattern: hand the link to the user.

    2. get_upload_session(token) → total_video_seconds

       Read duration straight from the session.

    3. estimate_generation_cost(model="auto-caption", reference_video_seconds=…)

       Quote per-second cost up front.

    4. generate_captions(media_url=…)

       Sync call — returns word-level JSON in the response (no polling).
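A common follow-up is grouping word-level captions into display lines. The exact response shape isn't specified here, so this sketch assumes a list of `{"word", "start", "end"}` dicts with times in seconds:

```python
def words_to_lines(words, max_gap=0.6, max_words=7):
    """Group timestamped words into caption lines, breaking on long
    pauses (> max_gap seconds) or when a line reaches max_words."""
    lines, current = [], []
    for w in words:
        if current and (w["start"] - current[-1]["end"] > max_gap
                        or len(current) >= max_words):
            lines.append(current)
            current = []
        current.append(w)
    if current:
        lines.append(current)
    return [{"text": " ".join(w["word"] for w in ln),
             "start": ln[0]["start"], "end": ln[-1]["end"]}
            for ln in lines]
```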

    Tips

    Get it right the first time

    Always preview cost before generating

    Per-duration models (Music Separation, AutoCaptions, motion control, P-Video, Seedance 2 with reference video) bill by reference seconds. Call estimate_generation_cost first, quote the number, wait for go-ahead.

    Bias low on polling cadence

    Wait estimated_time_seconds (from get_models) before the first poll, then poll every 5s. Polling every second wastes rate-limit budget for the same finish time.
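The waste is easy to quantify. A small sketch counting status calls for a job that finishes at a given time, under different cadences:

```python
def polls_until(finish, first, interval):
    """Count status calls for a job finishing at `finish` seconds,
    with the first poll at `first` and one every `interval` after."""
    t, n = first, 1
    while t < finish:
        t += interval
        n += 1
    return n

# For a job finishing at 47s: a 20s-then-5s cadence makes 7 calls,
# while polling every second makes 47 — same finish time either way.
```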

    Browser uploads → exact billing

    Files uploaded via get_upload_url have their duration probed client-side, giving billing_readiness:"exact". External URLs fall back to a pessimistic 15s default.

    Scopes are tool-level

    get_models needs no scope. generate_* needs generate:media / generate:music / generate:speech / generate:ads. read:balance and read:generations cover queries.
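The tool-level rule can be expressed as a lookup table. The scope names come from this doc; the tool names for balance and generation queries are illustrative guesses, so check get_models output for the real list:

```python
# None means the tool needs no scope at all.
TOOL_SCOPES = {
    "get_models": None,
    "generate_media": "generate:media",
    "generate_music": "generate:music",
    "generate_speech": "generate:speech",
    "generate_ads": "generate:ads",
    "get_balance": "read:balance",        # hypothetical tool name
    "list_generations": "read:generations",  # hypothetical tool name
}

def allowed(tool, granted_scopes):
    """True if the granted scopes cover the given tool."""
    needed = TOOL_SCOPES.get(tool)
    return needed is None or needed in granted_scopes
```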