Generation tools & models#
This page is the catalog of generation tools — the values you pass as the
tool_name argument to zencreator_create_task — and the
models each generation tool offers.
Keep two layers distinct (see Concepts):
- MCP tools are the zencreator_* tools your AI client sees. zencreator_create_task is one of them.
- Generation tools are the tool_name values you pass into zencreator_create_task, e.g. by_prompt, image_editor, videogen, faceswap, upscaler, lipsync. Each generation tool exposes its own set of models (the model field).
Plainly: zencreator_create_task is an MCP tool; by_prompt is a generation tool you
pass to it; SDXL_NSFW is a model you pass to by_prompt.
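The three layers can be sketched as one argument payload. This is a minimal illustration, not the confirmed schema: tool_name, the model field, and the inputs array are named on this page, while the "prompt" key and the placement of model inside each input object are assumptions. Call zencreator_get_tool_schema for the real shape.

```python
from typing import Any


def build_create_task_args(tool_name: str, model: str, prompt: str) -> dict[str, Any]:
    """Assemble arguments for the zencreator_create_task MCP tool.

    Assumption: "model" and "prompt" live inside each input object.
    """
    return {
        # Generation tool, e.g. "by_prompt"
        "tool_name": tool_name,
        # One input object; the model belongs to the generation tool
        "inputs": [{"model": model, "prompt": prompt}],
    }


args = build_create_task_args("by_prompt", "SDXL_NSFW", "portrait, soft window light")
```

Here zencreator_create_task is the MCP tool that receives args, by_prompt is the generation tool, and SDXL_NSFW is the model.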
How to use this catalog#
- Pick a generation tool below for the task (text→image, edit, video, faceswap, …).
- Pick a model from that tool's list. Models marked (trusted) require a trusted account for NSFW use.
- Call zencreator_get_tool_schema for the exact input/output JSON Schema, a prompt-writing guide, and model-selection guidance before submitting.
- For per-model prompt conventions (Seedream prose vs. Qwen layout-first vs. Wan structured blocks, text-in-image rules, NSFW phrasing), call zencreator_get_model_prompt_guide.
NSFW and trusted gating#
ZenCreator supports uncensored / NSFW generation. Two account gates apply:
- nsfw_allowed: adult content enabled on the account (toggle in ZenCreator account settings). If false, enable adult content and retry; do not silently fall back to SFW.
- is_trusted: whether the account has Trusted Status. It unlocks ZenCreator's extended capabilities (uncensored NSFW generation, 18+ templates and LoRAs, Face Swap tools, more flexible generation) and gates the (trusted) models below plus the trusted-only tools male_undresser, flux_klein_lora, and text_to_video. Trusted status is granted automatically after the account's first successful payment (buying any credit pack in Billing) and is permanent; until then, submitting a trusted-only task fails and wastes credits.
Check both proactively with zencreator_get_me before an NSFW
workflow. See Workflows for the full NSFW preflight.
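The two gates can be checked in a small preflight helper. This is a sketch under assumptions: it takes the zencreator_get_me result as a flat dict exposing nsfw_allowed and is_trusted, which may not match the real response shape, and the model_is_trusted flag is a hypothetical parameter the caller sets for models marked (trusted).

```python
TRUSTED_ONLY_TOOLS = {"male_undresser", "flux_klein_lora", "text_to_video"}


def preflight(me: dict, tool_name: str, nsfw: bool, model_is_trusted: bool = False) -> list[str]:
    """Return a list of blocking problems; an empty list means OK to submit."""
    problems = []
    if nsfw and not me.get("nsfw_allowed", False):
        problems.append("Enable adult content in account settings, then retry.")
    # Trusted-only tools are gated regardless of content;
    # (trusted) models are gated for NSFW use.
    needs_trust = tool_name in TRUSTED_ONLY_TOOLS or (nsfw and model_is_trusted)
    if needs_trust and not me.get("is_trusted", False):
        problems.append("Trusted Status is required (granted after the first successful payment).")
    return problems
```

Running the check before zencreator_create_task avoids submitting a trusted-only task that would fail.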
The optional zencreator_craft_prompt sidecar (present on deployments that configure it) can author NSFW / model-specific prompts when the orchestrator itself declines to; it spends no ZenCreator credits.
Choosing & comparing#
Credit cost varies by generation tool, model, resolution, and duration — and it is backend-driven, so it drifts. Never hardcode credit numbers.
- zencreator_compare_prices: shop across all models of a generation tool, cheapest-first (sweeps resolutions when relevant). Use this whenever the user wants the cheapest option.
- zencreator_estimate_price: get the exact credit cost of one candidate input. Call this and state the cost to the user before submitting.
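A minimal sketch of acting on a price comparison, assuming the result can be reduced to a list of quotes with "model" and "credits" keys. That shape is hypothetical, and the credit values below are placeholders, not real prices.

```python
def cheapest(quotes: list[dict]) -> dict:
    """Pick the lowest-cost quote from a comparison result."""
    if not quotes:
        raise ValueError("no quotes to compare")
    return min(quotes, key=lambda q: q["credits"])


def cost_notice(quote: dict) -> str:
    """Sentence to show the user before submitting the task."""
    return f"{quote['model']} will cost {quote['credits']} credits."


# Placeholder numbers for illustration only; always fetch live prices.
sample = [
    {"model": "MODEL_A", "credits": 4},
    {"model": "MODEL_B", "credits": 2},
]
best = cheapest(sample)
```

Because prices are backend-driven, the quotes should come from a fresh call each time rather than from cached values.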
Image generation tools#
by_prompt — Text-to-image#
Generate an image from a text prompt, with no input image. The entry point for creation: use it when the user has no source image and describes the picture in words — building a character or content from scratch, concepts, backgrounds, NSFW from a description, quick drafts and final hero shots. Supports fast/quality modes, batches, aspect ratios and body-shape LoRAs.
Three content groups, pick by what you need:
- Top cloud models, censored (GENERAL_NSFW, NANO_BANANA): highest quality and realism, but block explicit content.
- Uncensored but not built for porn (WAN_2_7_IMAGE, QWEN_IMAGE, SEEDREAM_5): high quality and won't block, but won't create explicit content from scratch; they accurately transform NSFW references you provide.
- Local, built for explicit NSFW (SDXL_NSFW, FLUX_KLEIN_NSFW): slightly lower quality and more artifacts, but real explicit capability.
Models:
- GENERAL_NSFW (default, trusted): General-purpose workhorse with a good quality/speed/price balance and strong facial likeness; the NSFW version is uncensored. Does not produce explicit anatomy from text alone (it covers it up). Older model, with occasional hand/limb artifacts.
- GENERAL_SFW: The same pipeline, SFW only.
- SDXL_NSFW (trusted): Best choice for explicit NSFW anatomy from text alone (it knows anatomy from training). Local model: slightly lower quality, more artifacts. Text-only; does not accept reference images. Renders a fixed ≈2:3 portrait (~1248×1824) and ignores ratio / width / height; pick FLUX_KLEIN_NSFW or GENERAL_NSFW when a specific aspect ratio matters.
- WAN: Legacy WAN image model; prefer WAN_2_7_IMAGE.
- WAN_2_7_IMAGE / WAN_2_7_IMAGE_PRO: Modern model with higher quality and detail (Pro = top consistency). Renders bodies, scenes and composition more aesthetically with fewer hand/limb artifacts. Weaker at in-image text. Uncensored, but transforms your NSFW references rather than inventing explicit content.
- QWEN_IMAGE / QWEN_IMAGE_PRO: Aesthetic results with good facial likeness and few artifacts; great for stylized / illustrative / anime subjects. Pro adds realism. Uncensored; transforms NSFW references.
- SEEDREAM_5: Newer generation: better prompt understanding, stronger stylization, better likeness, fewer artifacts. Uncensored; transforms NSFW references.
- NANO_BANANA: Among the best for realism, and the only model that reliably renders legible in-image text (posters, signage, captions); strong real-world knowledge. Heavily censored: won't produce even mildly suggestive content. Weaker facial likeness.
- FLUX_KLEIN_NSFW (trusted): The most advanced local NSFW model: produces explicit content and also works with references (bring a character's face and create an action). Slightly lower quality, occasional artifacts.

by_prompt generates exactly one image per input. For N variants, pass N input objects in the inputs array of one task (do not raise batch_size).
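The one-image-per-input rule can be sketched as follows. The inputs array is named on this page; the "model" and "prompt" keys inside each input object are assumptions for illustration.

```python
def variant_inputs(prompt: str, model: str, n: int) -> list[dict]:
    """Build N input objects for one by_prompt task (one image each)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # One input object per desired variant; batch_size is left untouched.
    return [{"model": model, "prompt": prompt} for _ in range(n)]


task_args = {
    "tool_name": "by_prompt",
    "inputs": variant_inputs("city street at dusk", "SEEDREAM_5", 3),
}
```

All three variants travel in a single task, so only one zencreator_create_task call is needed.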
image_editor — The main, most flexible image tool#
Edit and composite existing images by prompt. Bring references, edit and combine them;
bring your character and dress them from a reference photo. It offers both SFW and NSFW
models, and LoRA presets that extend NSFW capability. Use it to keep a product or object
exactly (fabric, pattern, shape) while changing the scene. This is the most capable
image tool — and the default for all reference-based generation (use it instead of
by_ref).
Models:
- GENERAL_NSFW (default, trusted): Universal default, uncensored, good facial likeness; general NSFW edits such as outfit or pose changes. Older model, with occasional limb artifacts.
- NANO_BANANA: High realism and the most precise prompt-driven edits; required for any edit involving in-image text. Heavily censored (no NSFW); weaker likeness.
- QWEN_IMAGE / QWEN_IMAGE_PRO: Aesthetic, good likeness, few artifacts; Pro adds realism. Uncensored; transforms your NSFW references rather than creating explicit content from scratch.
- SEEDREAM_5: Newer than the default: better prompt understanding, stronger stylization, better likeness, fewer artifacts. Uncensored; transforms NSFW references.
- WAN_2_7_IMAGE / WAN_2_7_IMAGE_PRO: Aesthetic bodies and composition, few artifacts, precise editing; Pro = higher quality. Uncensored; transforms NSFW references. Weaker at in-image text.
- FLUX_KLEIN_NSFW (trusted): Local flagship for explicit NSFW with references: it knows anatomy and accepts a face reference. Slightly lower quality, occasional artifacts.
- FLUX_KLEIN_LORA (trusted): LoRA presets that extend NSFW capability (including undress presets) and style templates; pass a lora_id.

SDXL_NSFW is intentionally not offered here: it is text-only and cannot accept references. For explicit anatomy on a reference, use FLUX_KLEIN_NSFW.
by_ref — Generate a similar image from a reference#
Bring a reference photo and get a similar one. With GENERAL you can bring a character's
face plus a reference photo and get a similar shot featuring your character. With
SDXL you bring only a photo and get a similar one.
Legacy / explicit-request-only. For most reference-based work image_editor is more flexible (identity carryover, native aspect-ratio control, multiple references) and is the recommended choice; use by_ref only when explicitly asked for it by name.
Models:
- SDXL (default): Local, NSFW-capable. Input is a photo only. Fixed ≈4:5 portrait output (~1392×1752); by_ref has no ratio / width / height inputs.
- GENERAL: Higher quality and realism; can carry a character's face into a reference-like shot. Fixed ≈ square output.
facegen — Create a face from scratch#
Generate a brand-new face from structured attributes: gender, age, origin/ethnicity, body
type, eye/hair/beard color, hairstyle, beard and makeup. No reference image; returns
several variants per request. Strength: full parametric control over appearance. There is
no free-form prompt — required fields are gender, age, origin; the rest are optional
appearance fields. Use it to mint a new persona reference (for likeness of a specific
person, use faceswap or by_ref). Niche — used occasionally.
photoshoot — Photoshoot from face + body references#
Bring a photo of the face and a photo of the body (without a face) of your character; the
tool runs them through prepared prompt presets and returns a batch of images. Presets are
grouped by type, so you can produce a set in a given style or action. The prompt
describes the scene (wardrobe, location, pose, lighting, mood) — not the subject, which
the references encode. Strength: reproducible, identity-preserving results with no manual
prompting — and, unlike by_ref, it honors a hard aspect ratio (pass ratio together with
matching width/height).
carousel — Multiple camera angles of one subject#
Bring an image and get the same subject from different camera angles (up to 10). Use it for social-media carousels and a "3D" / product overview of an object or character. There is no prompt — angle variation is automatic; the main dial is the number of images. Niche — used occasionally.
collaber — Two characters in one frame#
Bring two characters and an optional background/location photo; the tool combines them into
a single scene (1–4 images). The prompt describes their interaction and the joint
scene, not their individual identities (those come from the two references). Strength: keeps
both characters' likeness — a convenient preset for collabs and duets.
faceswap — Swap a face on a photo#
Bring the photo where the face should be replaced plus a face photo, and the tool swaps the character. Image only — there is no video face swap.
Models:
- SDXL (default): Lowest likeness of the set; fast/cheap baseline.
- GENERAL: Better likeness, but not always stable.
- GENERAL_ADVANCED: Improved general swap with the strongest identity preservation.
- FULL_HEAD: Replaces the entire head, not just the face; use when the target's hairstyle or head shape differs strongly from the source.
undress / male_undresser — Remove clothing#
male_undresser is a 🔒 trusted-only tool, available only to trusted accounts (granted automatically after your first credit purchase). undress needs only nsfw_allowed.
Fully removes clothing from a character. Two variants: undress (default, tuned for female
subjects) and male_undresser (Flux Klein edit-LoRA tuned for male anatomy). Both are fully
automatic — single input image, no prompt, no parameters. These are convenience presets
built on Flux Klein LoRA — the same result is available directly through image_editor
with the Flux Klein LoRA presets, including presets that handle paired photos.
flux_klein_lora — Flux Klein with LoRA templates#
🔒 Trusted account required — available only to trusted accounts (granted automatically after your first credit purchase).
Generate or edit images with Flux Klein driven by a LoRA style/undress template. Inputs:
image_assets (1–3 reference asset_ids), lora_id (required — the LoRA template id;
browse via the templates catalog), an optional short prompt (the LoRA owns the style, so
keep tweaks to scene/pose), and an optional ratio (1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2,
21:9). Base price 3 credits.
Standalone equivalent of image_editor with model=FLUX_KLEIN_LORA; prefer image_editor unless you specifically want the dedicated entry point. The undress / male_undresser presets are convenience wrappers over the same Flux Klein LoRA pipeline.
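The input constraints for flux_klein_lora listed above can be checked client-side before submitting. This is a sketch that assumes the four fields sit flat in one input object; the asset and LoRA ids in the usage line are made-up placeholders.

```python
from typing import Optional

ALLOWED_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "2:3", "3:2", "21:9"}


def build_flux_klein_lora_input(
    image_assets: list[str],
    lora_id: str,
    prompt: Optional[str] = None,
    ratio: Optional[str] = None,
) -> dict:
    """Validate and assemble one flux_klein_lora input object."""
    if not 1 <= len(image_assets) <= 3:
        raise ValueError("image_assets must hold 1 to 3 asset ids")
    if not lora_id:
        raise ValueError("lora_id is required")
    if ratio is not None and ratio not in ALLOWED_RATIOS:
        raise ValueError(f"unsupported ratio: {ratio}")
    payload = {"image_assets": list(image_assets), "lora_id": lora_id}
    # Optional fields are included only when provided.
    if prompt:
        payload["prompt"] = prompt
    if ratio is not None:
        payload["ratio"] = ratio
    return payload


example = build_flux_klein_lora_input(["asset-1"], "lora-placeholder", ratio="2:3")
```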
upscaler — Increase image resolution#
Brings an image up to the resolution you choose and restores detail. Best for
low-resolution sources that need to be cleaned up and improved; gains are limited on an
already-sharp 4K image. There is no prompt — the only meaningful choice is the version.
Versions:
- basic / basic_safe_face: Baseline upscale; *_safe_face preserves the face.
- natural_clarity: Cheap and natural-looking, any size.
- premium_realism / premium_safe_face: Photorealistic detail; *_safe_face preserves the face.
- ultra_clarity: Maximum detail.

Use a *_safe_face version whenever the image contains a face you need kept faithful; the other versions can subtly alter facial features while sharpening.
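The safe-face rule above can be expressed as a small selector. The version names come from the list above; raising an error for versions without a *_safe_face counterpart is a design choice made for this sketch, not documented behavior.

```python
ALL_VERSIONS = {
    "basic", "basic_safe_face", "natural_clarity",
    "premium_realism", "premium_safe_face", "ultra_clarity",
}
SAFE_FACE = {"basic": "basic_safe_face", "premium_realism": "premium_safe_face"}


def pick_upscaler_version(preferred: str, has_face: bool) -> str:
    """Swap in the *_safe_face variant when a face must be preserved."""
    if preferred not in ALL_VERSIONS:
        raise ValueError(f"unknown upscaler version: {preferred}")
    if not has_face or preferred.endswith("_safe_face"):
        return preferred
    if preferred in SAFE_FACE:
        return SAFE_FACE[preferred]
    # natural_clarity and ultra_clarity have no safe-face counterpart listed.
    raise ValueError(f"{preferred} has no *_safe_face variant; choose basic or premium_realism")
```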
upscaler_faceswap — Face swap + upscale (legacy)#
Swaps the face from face_asset onto the person in ref_asset, then upscales the result.
Inputs: ref_asset + face_asset (both required) and an optional upscaler_version
(basic / premium_realism; default basic). Output: one image. Base price 2 credits.
Legacy combo: prefer image_editor. The shared face-swap-then-upscale pipeline gives insufficient face similarity. For better likeness, use faceswap (or image_editor with a face reference) and chain upscaler for a real resolution bump.
Video generation tools#
videogen — Generate video#
Animate a photo into video (image→video). Cost depends on the model, duration (5–15 s) and
resolution (480p–1080p). Supports prompt enhancement, fixed camera, LoRAs and optional
audio. videogen is the image-to-video tool — a starting frame (ref_asset) is
required on every call; for pure text-to-video (no starting image) use text_to_video.
Quick guide: general content (best price/quality) → Seedance; NSFW / explicit
content → [email protected] (trusted) or [email protected] (needs only nsfw_allowed);
tasteful content with complex actions → Kling (censored).
Each model declares its own duration and resolution capability — filter against the user's
request before suggesting one, and price every candidate with zencreator_compare_prices
(which can sweep resolutions cheapest-first).
Wan — best prompt understanding and first-frame animation:
- [email protected]: Latest line; top prompt understanding and first-frame animation. Continuous 2–15 s, 720p/1080p. (Does not accept a last_frame keyframe; for that use [email protected], seedance_pro, or seedance_v1_5_pro. To go beyond 15 s, chain clips via ref_asset.)
- [email protected] (trusted): Wan 2.7 for NSFW; the best choice when you have a first frame (or a frame with an action) to animate. Uncensored. Trusted-only; requires is_trusted.
- [email protected] / [email protected]: Cheaper and older (flash is even cheaper and faster). Duration 5, 10, or 15 s; 720p/1080p.
- [email protected]: Sharper motion than 2.2; duration 5 or 10 s; 480p/720p/1080p.
- [email protected]: Frame-based duration; very flexible; uncensored NSFW base. This is the backend's fallback default when no model is passed (a factual fallback, not a recommendation; prefer Seedance for unspecified general content).
- [email protected]: Presets with action-trained LoRAs that turn a first frame into a complex action. Includes "Blink" LoRAs: bring any photo of your character and the frame morphs into the desired NSFW action. The easiest option for beginners: no prompt needed, just a photo similar to the example's first frame.
Kling — censored, but animates a first frame well; newer versions cost more and understand prompts and complex actions better. Duration 5 or 10 s, 1080p:
- [email protected]: Latest Kling, top motion and physics, plus native audio (dialogue, effects).
- [email protected]: High quality, cheaper, consistent at volume.
- [email protected]: Stable motion. Supports a start+end keyframe (last_frame).
- [email protected]: Legacy, lowest cost.
Seedance — uncensored; best price/quality balance for content:
- seedance_pro_fast: Faster and cheaper, less "smart". Any integer duration 2–12 s.
- seedance_pro: Pricier and smarter. Any integer duration 2–12 s. Supports a start+end keyframe (last_frame).
- seedance_v1_5_pro: Best quality and result; joint audio+video, micro-expressions, first+last frame. Continuous 4–12 s.
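Filtering models against the requested duration and keyframe needs can be sketched for the Seedance family as below. The capability table is transcribed from the list above, with two assumptions: seedance_pro_fast is treated as not supporting last_frame because it is not listed, and "continuous" is read as allowing non-integer durations. Confirm both via zencreator_get_tool_schema.

```python
SEEDANCE = {
    "seedance_pro_fast": {"min_s": 2, "max_s": 12, "integer_only": True, "last_frame": False},
    "seedance_pro": {"min_s": 2, "max_s": 12, "integer_only": True, "last_frame": True},
    "seedance_v1_5_pro": {"min_s": 4, "max_s": 12, "integer_only": False, "last_frame": True},
}


def seedance_candidates(duration_s: float, need_last_frame: bool = False) -> list[str]:
    """Return Seedance models whose declared capability covers the request."""
    matches = []
    for name, caps in SEEDANCE.items():
        if not caps["min_s"] <= duration_s <= caps["max_s"]:
            continue
        if caps["integer_only"] and not float(duration_s).is_integer():
            continue
        if need_last_frame and not caps["last_frame"]:
            continue
        matches.append(name)
    return matches
```

The surviving candidates can then be priced with zencreator_compare_prices before one is suggested.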
Grok:
- [email protected]: Censored; animates a first frame, top image-to-video, always emits native audio. Continuous 1–10 s.
Native audio: models that support it accept generate_audio: true; [email protected] always emits audio. Check each model's capabilities via zencreator_get_tool_schema.
text_to_video — Text-to-video#
🔒 Trusted account required — available only to trusted accounts (granted automatically after your first credit purchase), regardless of whether the request is SFW or NSFW.
Generates a first frame (on Flux Klein, Wan 2.2 or Wan 2.7) and then animates it. Use this
when there is no starting image; if you already have a frame, use videogen.
Durations are model-specific — an out-of-set duration bills a 1-credit no-op, so match them exactly:
- [email protected] (default): Top quality. Durations 5 / 10 / 15 s (not 8). 720p or 1080p (price ≈ 2.6 credits/s at 720p, ≈ 3.4 credits/s at 1080p, so 5 s = 13/17, 10 s = 26/34, 15 s = 39/51).
- [email protected]: Budget. Durations 5 / 8 s (not 10/15). Resolution is ignored; flat 10 credits (5 s) / 13 credits (8 s).
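Because an out-of-set duration is billed as a no-op, a client-side check before submitting is cheap insurance. In this sketch the keys default_model and budget_model are hypothetical placeholders standing in for the two model identifiers above; only the duration sets are taken from this page.

```python
# Placeholder keys, not real model identifiers.
ALLOWED_DURATIONS = {
    "default_model": (5, 10, 15),
    "budget_model": (5, 8),
}


def check_duration(model_key: str, duration_s: int) -> int:
    """Return duration_s if the model accepts it, else raise with a hint."""
    allowed = ALLOWED_DURATIONS[model_key]
    if duration_s not in allowed:
        closest = min(allowed, key=lambda d: abs(d - duration_s))
        raise ValueError(
            f"{duration_s} s is not accepted by {model_key}; allowed: {allowed}, closest: {closest}"
        )
    return duration_s
```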
video2video — Replace a character in a video#
Transfer motion / replace a character in a video using a reference video or an
Instagram/TikTok URL (passed directly — no upload needed); the original soundtrack can be
kept. SFW and NSFW variants. resolution (480p / 720p / 1080p) is required.
Modes:
- kling_2_6_sfw: Handles character replacement best (censored); billed per second.
- replace_sfw / replace_nsfw: Same character-replacement logic; replace_nsfw is uncensored.
- animate_sfw / animate_nsfw: Motion transfer / animation; animate_nsfw is uncensored.
- dreamactor_m2 (trusted): Same character-replacement logic, uncensored. Trusted-only; requires is_trusted.
The uncensored modes trade some quality and prompt understanding — you may need to change the input (source video or character) to get a good result on the first try. Modes marked (trusted) are available only to trusted accounts.
lipsync — Talking head#
Bring an audio file and a first frame, and get a video in which the character speaks your audio. Up to 35 seconds; JPG/PNG under 5 MB. Use it to voice a character or avatar. Niche — used occasionally.
Models:
- GENERAL_NSFW (default, trusted): Specialized lipsync pipeline. Note: this GENERAL_NSFW refers to a different underlying model than GENERAL_NSFW under by_prompt / image_editor; the shared name is a backend artifact. lipsync takes audio + a first-frame portrait image (no video source) and has no text prompt.
video_upscaler — Upscale video#
Bring a medium-quality video and get a sharper, higher-resolution result. Use it as a final polish or to restore low-quality footage. There is no prompt. Niche — used occasionally.
video_merger — Concatenate clips#
Stitch 2–5 video clips into a single video, with a transition between each. No prompt, no
model selection — it is the final assembly step after generating individual clips with
videogen / text_to_video. Inputs: clips (2–5 items, each an uploaded video asset_id
plus its source_duration_sec, with optional trim_start_sec / trim_end_sec to cut the
clip), transition (cut / dissolve / fade / slide; default cut), keep_audio
(default true), fps (24 or 30; default 30), and width / height (default 1280×720).
Base price 1 credit.
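The video_merger inputs described above can be assembled and validated as in this sketch. The field names and defaults are taken from this section; the flat payload layout and the reading of trim_start_sec / trim_end_sec as offsets from the start of the clip are assumptions, and the asset ids in the usage line are placeholders.

```python
TRANSITIONS = {"cut", "dissolve", "fade", "slide"}


def build_merge_input(
    clips: list[dict],
    transition: str = "cut",
    keep_audio: bool = True,
    fps: int = 30,
    width: int = 1280,
    height: int = 720,
) -> dict:
    """Validate and assemble one video_merger input object."""
    if not 2 <= len(clips) <= 5:
        raise ValueError("video_merger takes 2 to 5 clips")
    if transition not in TRANSITIONS:
        raise ValueError(f"unknown transition: {transition}")
    if fps not in (24, 30):
        raise ValueError("fps must be 24 or 30")
    for clip in clips:
        duration = clip["source_duration_sec"]
        start = clip.get("trim_start_sec", 0)
        end = clip.get("trim_end_sec", duration)
        # Assumed semantics: trims are offsets within the source clip.
        if not 0 <= start < end <= duration:
            raise ValueError(f"invalid trim range for {clip['asset_id']}")
    return {
        "clips": clips,
        "transition": transition,
        "keep_audio": keep_audio,
        "fps": fps,
        "width": width,
        "height": height,
    }


merged = build_merge_input([
    {"asset_id": "clip-a", "source_duration_sec": 5},
    {"asset_id": "clip-b", "source_duration_sec": 10, "trim_end_sec": 8},
])
```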
See also#
- Generation MCP tools: zencreator_create_task, zencreator_run_and_wait, zencreator_get_tool_schema, zencreator_estimate_price, zencreator_compare_prices, and the rest of the task lifecycle.
- Concepts: tasks, calls, assets, the tool-vs-model split.
- Workflows: end-to-end recipes (image this turn, video, NSFW preflight).