Generation tools & models

This page is the catalog of generation tools — the values you pass as the tool_name argument to zencreator_create_task — and the models each generation tool offers.

Keep two layers distinct (see Concepts):

  • MCP tools are the zencreator_* tools your AI client sees. zencreator_create_task is one of them.
  • Generation tools are the tool_name values you pass into zencreator_create_task — e.g. by_prompt, image_editor, videogen, faceswap, upscaler, lipsync. Each generation tool exposes its own set of models (the model field).

Plainly: zencreator_create_task is an MCP tool; by_prompt is a generation tool you pass to it; SDXL_NSFW is a model you pass to by_prompt.

How to use this catalog

  1. Pick a generation tool below for the task (text→image, edit, video, faceswap, …).
  2. Pick a model from that tool's list. Models marked (trusted) require a trusted account for NSFW use.
  3. Call zencreator_get_tool_schema for the exact input/output JSON Schema, a prompt-writing guide, and model-selection guidance before submitting.
  4. For per-model prompt conventions (Seedream prose vs. Qwen layout-first vs. Wan structured blocks, text-in-image rules, NSFW phrasing), call zencreator_get_model_prompt_guide.

NSFW and trusted gating

ZenCreator supports uncensored / NSFW generation. Two account gates apply:

  • nsfw_allowed — adult content enabled on the account (toggle in ZenCreator account settings). If false, enable adult content and retry — do not silently fall back to SFW.
  • is_trusted — whether the account has Trusted Status. It unlocks ZenCreator's extended capabilities (uncensored NSFW generation, 18+ templates and LoRAs, Face Swap tools, more flexible generation) and gates the (trusted) models below plus the trusted-only tools male_undresser, flux_klein_lora, and text_to_video. Trusted status is granted automatically after the account's first successful payment (buying any credit pack in Billing) and is permanent; until then, submitting a trusted-only task fails and wastes credits.

Check both proactively with zencreator_get_me before an NSFW workflow. See Workflows for the full NSFW preflight.
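The two gates above can be checked in code before any credits are spent. This is a minimal sketch, assuming the result of zencreator_get_me is available as a dict with nsfw_allowed and is_trusted at the top level (the response shape is an assumption, and nsfw_preflight is a hypothetical helper, not part of the MCP server):

```python
# Trusted-only generation tools named on this page.
TRUSTED_ONLY_TOOLS = {"male_undresser", "flux_klein_lora", "text_to_video"}


def nsfw_preflight(me: dict, tool_name: str, model_is_trusted: bool, nsfw: bool) -> list[str]:
    """Return blocking problems; an empty list means the task may be submitted.

    `me` is assumed to be the zencreator_get_me result as a plain dict.
    `model_is_trusted` is True for models marked (trusted) in this catalog.
    """
    problems = []
    if nsfw and not me.get("nsfw_allowed", False):
        # Do not silently fall back to SFW: report the gate instead.
        problems.append("enable adult content in account settings, then retry")
    needs_trust = tool_name in TRUSTED_ONLY_TOOLS or (nsfw and model_is_trusted)
    if needs_trust and not me.get("is_trusted", False):
        problems.append("Trusted Status required: granted after the first credit purchase")
    return problems


me = {"nsfw_allowed": True, "is_trusted": False}  # hypothetical account state
print(nsfw_preflight(me, "text_to_video", model_is_trusted=False, nsfw=False))
```

Returning a list of problems rather than a boolean lets the orchestrator tell the user exactly which gate to clear.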

The optional zencreator_craft_prompt sidecar (present on deployments that configure it) can author NSFW / model-specific prompts when the orchestrator itself declines to — it spends no ZenCreator credits.

Choosing & comparing

Credit cost varies by generation tool, model, resolution, and duration — and it is backend-driven, so it drifts. Never hardcode credit numbers.

  • zencreator_compare_prices — shop across all models of a generation tool, cheapest-first (sweeps resolutions when relevant). Use this whenever the user wants the cheapest option.
  • zencreator_estimate_price — get the exact credit cost of one candidate input. Call this and state the cost to the user before submitting.

Image generation tools

by_prompt — Text-to-image

Generate an image from a text prompt, with no input image. The entry point for creation: use it when the user has no source image and describes the picture in words — building a character or content from scratch, concepts, backgrounds, NSFW from a description, quick drafts and final hero shots. Supports fast/quality modes, batches, aspect ratios and body-shape LoRAs.

Three content groups, pick by what you need:

  • Top cloud models, censored (GENERAL_NSFW, NANO_BANANA) — highest quality and realism, but block explicit content.
  • Uncensored but not built for porn (WAN_2_7_IMAGE, QWEN_IMAGE, SEEDREAM_5) — high quality and won't block, but won't create explicit content from scratch; they accurately transform NSFW references you provide.
  • Local, built for explicit NSFW (SDXL_NSFW, FLUX_KLEIN_NSFW) — slightly lower quality and more artifacts, but real explicit capability.

Models:

  • GENERAL_NSFW (default, trusted) — General-purpose workhorse with a good quality/speed/price balance and strong facial likeness; the NSFW version is uncensored. Does not produce explicit anatomy from text alone (it covers it up). Older model — occasional hand/limb artifacts.
  • GENERAL_SFW — The same pipeline, SFW only.
  • SDXL_NSFW (trusted) — Best choice for explicit NSFW anatomy from text alone (it knows anatomy from training). Local model: slightly lower quality, more artifacts. Text-only — does not accept reference images. Renders a fixed ≈2:3 portrait (~1248×1824) and ignores ratio/width/height; pick FLUX_KLEIN_NSFW or GENERAL_NSFW when a specific aspect ratio matters.
  • WAN — Legacy WAN image model; prefer WAN_2_7_IMAGE.
  • WAN_2_7_IMAGE / WAN_2_7_IMAGE_PRO — Modern model with higher quality and detail (Pro = top consistency). Renders bodies, scenes and composition more aesthetically with fewer hand/limb artifacts. Weaker at in-image text. Uncensored, but transforms your NSFW references rather than inventing explicit content.
  • QWEN_IMAGE / QWEN_IMAGE_PRO — Aesthetic results with good facial likeness and few artifacts; great for stylized / illustrative / anime subjects. Pro adds realism. Uncensored; transforms NSFW references.
  • SEEDREAM_5 — Newer generation: better prompt understanding, stronger stylization, better likeness, fewer artifacts. Uncensored; transforms NSFW references.
  • NANO_BANANA — Among the best for realism, and the only model that reliably renders legible in-image text (posters, signage, captions); strong real-world knowledge. Heavily censored — won't produce even mildly suggestive content. Weaker facial likeness.
  • FLUX_KLEIN_NSFW (trusted) — The most advanced local NSFW model: produces explicit content and also works with references — bring a character's face and create an action. Slightly lower quality, occasional artifacts.
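The model notes above reduce to a few decision rules. The sketch below is one possible reading of them, not official selection logic: pick_by_prompt_model and its parameters are hypothetical, and it covers only the clear-cut cases (legible text, explicit content, the documented default).

```python
def pick_by_prompt_model(
    explicit: bool,
    has_face_reference: bool = False,
    needs_legible_text: bool = False,
    needs_specific_ratio: bool = False,
) -> str:
    """Suggest a by_prompt model from the catalog's stated strengths."""
    if needs_legible_text:
        if explicit:
            # NANO_BANANA is heavily censored, so the two needs conflict.
            raise ValueError("legible in-image text and explicit content cannot be combined")
        return "NANO_BANANA"
    if explicit:
        # SDXL_NSFW is text-only and renders a fixed portrait ratio.
        if has_face_reference or needs_specific_ratio:
            return "FLUX_KLEIN_NSFW"
        return "SDXL_NSFW"
    return "GENERAL_NSFW"  # the documented default for by_prompt


print(pick_by_prompt_model(explicit=True, needs_specific_ratio=True))
```

For everything between these cases (stylized, anime, top consistency), zencreator_get_tool_schema returns model-selection guidance that should take precedence over a hardcoded rule like this one.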

by_prompt generates exactly one image per input. For N variants, pass N input objects in the inputs array of one task (do not raise batch_size).
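A minimal sketch of the "N input objects" rule, assuming the model sits inside each input object and that the text field is named prompt (both are assumptions to confirm with zencreator_get_tool_schema; build_variants_task is a hypothetical helper):

```python
from typing import Any


def build_variants_task(model: str, prompt: str, n: int) -> dict[str, Any]:
    """Arguments for one zencreator_create_task call that yields n images."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # One input object per desired image; batch_size is deliberately not set.
    inputs = [{"model": model, "prompt": prompt} for _ in range(n)]
    return {"tool_name": "by_prompt", "inputs": inputs}


task = build_variants_task("SEEDREAM_5", "a lighthouse at dusk", 3)
print(len(task["inputs"]))  # 3
```

Each variant is its own dict, so individual inputs can be adjusted (for example, a different prompt per variant) before the task is submitted.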

image_editor — The main, most flexible image tool

Edit and composite existing images by prompt. Bring references, edit and combine them; bring your character and dress them from a reference photo. It offers both SFW and NSFW models, and LoRA presets that extend NSFW capability. Use it to keep a product or object exactly (fabric, pattern, shape) while changing the scene. This is the most capable image tool — and the default for all reference-based generation (use it instead of by_ref).

Models:

  • GENERAL_NSFW (default, trusted) — Universal default, uncensored, good facial likeness; general NSFW edits such as outfit or pose changes. Older model — occasional limb artifacts.
  • NANO_BANANA — High realism and the most precise prompt-driven edits; required for any edit involving in-image text. Heavily censored (no NSFW); weaker likeness.
  • QWEN_IMAGE / QWEN_IMAGE_PRO — Aesthetic, good likeness, few artifacts; Pro adds realism. Uncensored; transforms your NSFW references rather than creating explicit content from scratch.
  • SEEDREAM_5 — Newer than the default: better prompt understanding, stronger stylization, better likeness, fewer artifacts. Uncensored; transforms NSFW references.
  • WAN_2_7_IMAGE / WAN_2_7_IMAGE_PRO — Aesthetic bodies and composition, few artifacts, precise editing; Pro = higher quality. Uncensored; transforms NSFW references. Weaker at in-image text.
  • FLUX_KLEIN_NSFW (trusted) — Local flagship for explicit NSFW with references: it knows anatomy and accepts a face reference. Slightly lower quality, occasional artifacts.
  • FLUX_KLEIN_LORA (trusted) — LoRA presets that extend NSFW capability (including undress presets) and style templates; pass a lora_id.

SDXL_NSFW is intentionally not offered here — it is text-only and cannot accept references. For explicit anatomy on a reference, use FLUX_KLEIN_NSFW.

by_ref — Generate a similar image from a reference

Bring a reference photo and get a similar one. With GENERAL you can bring a character's face plus a reference photo and get a similar shot featuring your character. With SDXL you bring only a photo and get a similar one.

Legacy / explicit-request-only. For most reference-based work image_editor is more flexible (identity carryover, native aspect-ratio control, multiple references) and is the recommended choice — use by_ref only when explicitly asked for it by name.

Models:

  • SDXL (default) — Local, NSFW-capable. Input is a photo only. Fixed ≈4:5 portrait output (~1392×1752); by_ref has no ratio/width/height inputs.
  • GENERAL — Higher quality and realism; can carry a character's face into a reference-like shot. Fixed ≈ square output.

facegen — Create a face from scratch

Generate a brand-new face from structured attributes: gender, age, origin/ethnicity, body type, eye/hair/beard color, hairstyle, beard and makeup. No reference image; returns several variants per request. Strength: full parametric control over appearance. There is no free-form prompt — required fields are gender, age, origin; the rest are optional appearance fields. Use it to mint a new persona reference (for likeness of a specific person, use faceswap or by_ref). Niche — used occasionally.

photoshoot — Photoshoot from face + body references

Bring a photo of the face and a photo of the body (without a face) of your character; the tool runs them through prepared prompt presets and returns a batch of images. Presets are grouped by type, so you can produce a set in a given style or action. The prompt describes the scene (wardrobe, location, pose, lighting, mood) — not the subject, which the references encode. Strength: reproducible, identity-preserving results with no manual prompting — and, unlike by_ref, it honors a hard aspect ratio (pass ratio together with matching width/height).

Bring an image and get the same subject from different camera angles (up to 10). Use it for social-media carousels and a "3D" / product overview of an object or character. There is no prompt — angle variation is automatic; the main dial is the number of images. Niche — used occasionally.

collaber — Two characters in one frame

Bring two characters and an optional background/location photo; the tool combines them into a single scene (1–4 images). The prompt describes their interaction and the joint scene, not their individual identities (those come from the two references). Strength: keeps both characters' likeness — a convenient preset for collabs and duets.

faceswap — Swap a face on a photo

Bring the photo where the face should be replaced plus a face photo, and the tool swaps the character. Image only — there is no video face swap.

Models:

  • SDXL (default) — Lowest likeness of the set; fast/cheap baseline.
  • GENERAL — Better likeness, but not always stable.
  • GENERAL_ADVANCED — Improved general swap with the strongest identity preservation.
  • FULL_HEAD — Replaces the entire head, not just the face — use when the target's hairstyle or head shape differs strongly from the source.

undress / male_undresser — Remove clothing

male_undresser is a 🔒 trusted-only tool — available only to trusted accounts (granted automatically after your first credit purchase). undress needs only nsfw_allowed.

Fully removes clothing from a character. Two variants: undress (default, tuned for female subjects) and male_undresser (Flux Klein edit-LoRA tuned for male anatomy). Both are fully automatic — single input image, no prompt, no parameters. These are convenience presets built on Flux Klein LoRA — the same result is available directly through image_editor with the Flux Klein LoRA presets, including presets that handle paired photos.

flux_klein_lora — Flux Klein with LoRA templates

🔒 Trusted account required — available only to trusted accounts (granted automatically after your first credit purchase).

Generate or edit images with Flux Klein driven by a LoRA style/undress template. Inputs: image_assets (1–3 reference asset_ids), lora_id (required — the LoRA template id; browse via the templates catalog), an optional short prompt (the LoRA owns the style, so keep tweaks to scene/pose), and an optional ratio (1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9). Base price 3 credits.
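The input constraints above can be validated client-side before a task is priced or submitted. A minimal sketch: image_assets, lora_id and ratio are the field names given on this page, while the prompt field name, the example ids and the helper itself are assumptions.

```python
from typing import Optional

ALLOWED_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "2:3", "3:2", "21:9"}


def build_flux_klein_lora_input(
    image_assets: list[str],
    lora_id: str,
    prompt: Optional[str] = None,
    ratio: Optional[str] = None,
) -> dict:
    """Build one flux_klein_lora input object, enforcing the documented limits."""
    if not 1 <= len(image_assets) <= 3:
        raise ValueError("image_assets must hold 1 to 3 asset ids")
    if not lora_id:
        raise ValueError("lora_id is required")
    if ratio is not None and ratio not in ALLOWED_RATIOS:
        raise ValueError(f"unsupported ratio: {ratio}")
    payload = {"image_assets": list(image_assets), "lora_id": lora_id}
    if prompt:
        payload["prompt"] = prompt  # assumed field name; keep it to scene/pose tweaks
    if ratio:
        payload["ratio"] = ratio
    return payload


# "asset_123" and "lora_abc" are placeholder ids, not real catalog entries.
print(build_flux_klein_lora_input(["asset_123"], "lora_abc", ratio="9:16"))
```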

Standalone equivalent of image_editor with model=FLUX_KLEIN_LORA; prefer image_editor unless you specifically want the dedicated entry point. The undress / male_undresser presets are convenience wrappers over the same Flux Klein LoRA pipeline.

upscaler — Increase image resolution

Brings an image up to the resolution you choose and restores detail. Best for low-resolution sources that need to be cleaned up and improved; gains are limited on an already-sharp 4K image. There is no prompt — the only meaningful choice is the version.

Versions:

  • basic / basic_safe_face — Baseline upscale; *_safe_face preserves the face.
  • natural_clarity — Cheap and natural-looking, any size.
  • premium_realism / premium_safe_face — Photorealistic detail; *_safe_face preserves the face.
  • ultra_clarity — Maximum detail.

Use a *_safe_face version whenever the image contains a face you need kept faithful; the other versions can subtly alter facial features while sharpening.
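The safe-face rule can be applied mechanically. In this sketch the version names are the ones listed above; the helper pick_upscaler_version is hypothetical, and raising an error for versions without a *_safe_face counterpart is a design choice of the sketch, not documented behavior.

```python
SAFE_FACE_VARIANT = {
    "basic": "basic_safe_face",
    "premium_realism": "premium_safe_face",
}
ALL_VERSIONS = {
    "basic", "basic_safe_face", "natural_clarity",
    "premium_realism", "premium_safe_face", "ultra_clarity",
}


def pick_upscaler_version(preferred: str, keep_face_faithful: bool) -> str:
    """Swap in the *_safe_face variant when a face must stay faithful."""
    if preferred not in ALL_VERSIONS:
        raise ValueError(f"unknown upscaler version: {preferred}")
    if not keep_face_faithful or preferred.endswith("_safe_face"):
        return preferred
    if preferred in SAFE_FACE_VARIANT:
        return SAFE_FACE_VARIANT[preferred]
    # natural_clarity and ultra_clarity have no safe-face counterpart listed.
    raise ValueError(f"{preferred} has no *_safe_face variant; choose basic or premium_realism")


print(pick_upscaler_version("premium_realism", keep_face_faithful=True))  # premium_safe_face
```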

upscaler_faceswap — Face swap + upscale (legacy)

Swaps the face from face_asset onto the person in ref_asset, then upscales the result. Inputs: ref_asset + face_asset (both required) and an optional upscaler_version (basic / premium_realism; default basic). Output: one image. Base price 2 credits.

Legacy combo — prefer image_editor: this tool's combined face-swap-then-upscale pipeline gives insufficient face similarity. For better likeness, use faceswap (or image_editor with a face reference), then chain upscaler for a real resolution bump.


Video generation tools

videogen — Generate video

Animate a photo into video (image→video). Cost depends on the model, duration (5–15 s) and resolution (480p–1080p). Supports prompt enhancement, fixed camera, LoRAs and optional audio. videogen is the image-to-video tool — a starting frame (ref_asset) is required on every call; for pure text-to-video (no starting image) use text_to_video.

Quick guide: general content (best price/quality) → Seedance; NSFW / explicit content → [email protected] (trusted) or [email protected] (needs only nsfw_allowed); tasteful content with complex actions → Kling (censored).

Each model declares its own duration and resolution capability — filter against the user's request before suggesting one, and price every candidate with zencreator_compare_prices (which can sweep resolutions cheapest-first).

Wan — best prompt understanding and first-frame animation:

  • [email protected] — Latest line; top prompt understanding and first-frame animation. Continuous 2–15 s, 720p/1080p. (Does not accept a last_frame keyframe — for that use [email protected], seedance_pro, or seedance_v1_5_pro; to go beyond 15 s, chain clips via ref_asset.)
  • [email protected] (trusted) — Wan 2.7 for NSFW; the best choice when you have a first frame (or a frame with an action) to animate. Uncensored. Trusted-only — requires is_trusted.
  • [email protected] / [email protected] — Cheaper and older (flash is even cheaper and faster). Duration 5, 10, or 15 s; 720p/1080p.
  • [email protected] — Sharper motion than 2.2; duration 5 or 10 s; 480p/720p/1080p.
  • [email protected] — Frame-based duration; very flexible; uncensored NSFW base. This is the backend's fallback default when no model is passed (a factual fallback, not a recommendation — prefer Seedance for unspecified general content).
  • [email protected] — Presets with action-trained LoRAs that turn a first frame into a complex action. Includes "Blink" LoRAs: bring any photo of your character and the frame morphs into the desired NSFW action. The easiest option for beginners — no prompt needed, just a photo similar to the example's first frame.

Kling — censored, but animates a first frame well; newer versions cost more and understand prompts and complex actions better. Duration 5 or 10 s, 1080p.

Seedance — uncensored; best price/quality balance for content:

  • seedance_pro_fast — Faster and cheaper, less "smart". Any integer duration 2–12 s.
  • seedance_pro — Pricier and smarter. Any integer duration 2–12 s. Supports a start+end keyframe (last_frame).
  • seedance_v1_5_pro — Best quality and result; joint audio+video, micro-expressions, first+last frame. Continuous 4–12 s.
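Filtering models against the requested duration, as advised earlier in this section, can be sketched for the Seedance family. The duration ranges are the ones listed above and are treated as whole seconds for simplicity; marking seedance_pro_fast as lacking last_frame support is an assumption (the page simply does not list it), and the table should be refreshed from zencreator_get_tool_schema rather than trusted as-is.

```python
# Capabilities transcribed from the Seedance list; last_frame for
# seedance_pro_fast is an assumption based on its absence from the page.
SEEDANCE = {
    "seedance_pro_fast": {"min_s": 2, "max_s": 12, "last_frame": False},
    "seedance_pro": {"min_s": 2, "max_s": 12, "last_frame": True},
    "seedance_v1_5_pro": {"min_s": 4, "max_s": 12, "last_frame": True},
}


def seedance_candidates(duration_s: int, need_last_frame: bool = False) -> list[str]:
    """Return Seedance models that can satisfy the requested clip."""
    return [
        name
        for name, cap in SEEDANCE.items()
        if cap["min_s"] <= duration_s <= cap["max_s"]
        and (cap["last_frame"] or not need_last_frame)
    ]


print(seedance_candidates(3))                        # ['seedance_pro_fast', 'seedance_pro']
print(seedance_candidates(3, need_last_frame=True))  # ['seedance_pro']
```

The surviving candidates are what you would then pass to zencreator_compare_prices.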

Grok:

  • [email protected] — Censored; animates a first frame, top image-to-video, always emits native audio. Continuous 1–10 s.

Native audio: models that support it accept generate_audio: true; [email protected] always emits audio. Check each model's capabilities via zencreator_get_tool_schema.

text_to_video — Text-to-video

🔒 Trusted account required — available only to trusted accounts (granted automatically after your first credit purchase), regardless of whether the request is SFW or NSFW.

Generates a first frame (on Flux Klein, Wan 2.2 or Wan 2.7) and then animates it. Use this when there is no starting image; if you already have a frame, use videogen.

Durations are model-specific — an out-of-set duration bills a 1-credit no-op, so match them exactly:

  • [email protected] (default) — Top quality. Durations 5 / 10 / 15 s (not 8). 720p or 1080p (price ≈ 2.6 credits/s at 720p, ≈ 3.4 credits/s at 1080p — so 5 s = 13/17, 10 s = 26/34, 15 s = 39/51).
  • [email protected] — Budget. Durations 5 / 8 s (not 10/15). Resolution is ignored — flat 10 credits (5 s) / 13 credits (8 s).
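Because an out-of-set duration is billed as a no-op, it is worth rejecting it before submission. A minimal sketch of that guard: the keys "top_quality" and "budget" are placeholder labels for the two models above, not real model identifiers, and check_duration is a hypothetical helper.

```python
# Allowed durations per model, keyed by placeholder labels
# (substitute the real model identifiers from zencreator_get_tool_schema).
ALLOWED_DURATIONS = {
    "top_quality": (5, 10, 15),  # the default model
    "budget": (5, 8),            # the budget model
}


def check_duration(tier: str, duration_s: int) -> int:
    """Return duration_s if the model accepts it, else raise before any credits are spent."""
    allowed = ALLOWED_DURATIONS[tier]
    if duration_s not in allowed:
        raise ValueError(f"{tier} accepts only {allowed} seconds, got {duration_s}")
    return duration_s


print(check_duration("budget", 8))  # 8
```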

video2video — Replace a character in a video

Transfer motion / replace a character in a video using a reference video or an Instagram/TikTok URL (passed directly — no upload needed); the original soundtrack can be kept. SFW and NSFW variants. resolution (480p / 720p / 1080p) is required.

Modes:

  • kling_2_6_sfw — Handles character replacement best (censored); billed per second.
  • replace_sfw / replace_nsfw — Same character-replacement logic; replace_nsfw is uncensored.
  • animate_sfw / animate_nsfw — Motion transfer / animation; animate_nsfw is uncensored.
  • dreamactor_m2 (trusted) — Same character-replacement logic, uncensored. Trusted-only — requires is_trusted.

The uncensored modes give up some quality and prompt understanding, so you may need to change the input (source video or character) and retry to get a good result. Modes marked (trusted) are available only to trusted accounts.

lipsync — Talking head

Bring an audio file and a first frame, and get a video in which the character speaks your audio. Up to 35 seconds; JPG/PNG under 5 MB. Use it to voice a character or avatar. Niche — used occasionally.

Models:

  • GENERAL_NSFW (default, trusted) — Specialized lipsync pipeline. Note: this GENERAL_NSFW refers to a different underlying model than GENERAL_NSFW under by_prompt / image_editor; the shared name is a backend artifact. lipsync takes audio + a first-frame portrait image (no video source) and has no text prompt.

video_upscaler — Upscale video

Bring a medium-quality video and get a sharper, higher-resolution result. Use it as a final polish or to restore low-quality footage. There is no prompt. Niche — used occasionally.

video_merger — Concatenate clips

Stitch 2–5 video clips into a single video, with a transition between each. No prompt, no model selection — it is the final assembly step after generating individual clips with videogen / text_to_video. Inputs: clips (2–5 items, each an uploaded video asset_id plus its source_duration_sec, with optional trim_start_sec / trim_end_sec to cut the clip), transition (cut / dissolve / fade / slide; default cut), keep_audio (default true), fps (24 or 30; default 30), and width / height (default 1280×720). Base price 1 credit.
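The inputs above map naturally onto a small builder that applies the documented defaults and limits. The field names are the ones listed in this section; the helper build_merger_input and the example asset ids are hypothetical, and the exact nesting should be confirmed with zencreator_get_tool_schema.

```python
TRANSITIONS = {"cut", "dissolve", "fade", "slide"}


def build_merger_input(
    clips: list[dict],
    transition: str = "cut",
    keep_audio: bool = True,
    fps: int = 30,
    width: int = 1280,
    height: int = 720,
) -> dict:
    """Build a video_merger input object with the documented defaults."""
    if not 2 <= len(clips) <= 5:
        raise ValueError("video_merger takes 2 to 5 clips")
    if transition not in TRANSITIONS:
        raise ValueError(f"unsupported transition: {transition}")
    if fps not in (24, 30):
        raise ValueError("fps must be 24 or 30")
    for clip in clips:
        # trim_start_sec / trim_end_sec are optional and passed through as given.
        if "asset_id" not in clip or "source_duration_sec" not in clip:
            raise ValueError("each clip needs asset_id and source_duration_sec")
    return {
        "clips": list(clips),
        "transition": transition,
        "keep_audio": keep_audio,
        "fps": fps,
        "width": width,
        "height": height,
    }


clips = [
    {"asset_id": "clip_a", "source_duration_sec": 5},                     # placeholder ids
    {"asset_id": "clip_b", "source_duration_sec": 10, "trim_end_sec": 8},
]
print(build_merger_input(clips, transition="dissolve")["fps"])  # 30
```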


See also

  • Generation MCP tools: zencreator_create_task, zencreator_run_and_wait, zencreator_get_tool_schema, zencreator_estimate_price, zencreator_compare_prices, and the rest of the task lifecycle.
  • Concepts — tasks, calls, assets, the tool-vs-model split.
  • Workflows — end-to-end recipes (image this turn, video, NSFW preflight).