ComfyUI Workflow for AI Drama Production: Complete Guide

AI drama production is exploding on TikTok, YouTube Shorts, and Instagram Reels. But without a solid ComfyUI workflow, you're wasting hours on inconsistent characters, janky backgrounds, and outputs that don't match your script. This guide walks through a complete pipeline for producing 9:16 vertical content at scale, from model selection to headless batch generation.

Why ComfyUI for AI Drama Production?

ComfyUI's node-based architecture gives you granular control over every stage of the pipeline—something Midjourney and DALL·E simply can't match. You need precise character consistency across scenes, controlled lighting, and reproducible workflows. ComfyUI delivers all three.

Key Advantages Over Other Tools

Full pipeline control: Chain image generation, upscaling, and post-processing in one graph
Model flexibility: Swap between SDXL, SD3.5, and Flux without leaving the UI
Reproducibility: Save workflows as JSON files—load, run, done
Cost efficiency: Local inference on a single RTX 3060 12GB produces 4-6 images per minute

Base Model Selection: Start with SDXL

SDXL 1.0 remains the sweet spot for AI drama production. It handles complex prompts with multiple characters and scene elements better than SD1.5, and it runs on consumer GPUs without the memory overhead of Flux. For a production pipeline, you'll want at minimum:

Checkpoints: SDXL 1.0 base + a finetuned realistic model (Juggernaut XL or RealVisXL)
VAE: sdxl_vae.safetensors (default), or sdxl-vae-fp16-fix if you run the VAE in half precision and get black or NaN outputs
LoRAs: Character-specific LoRAs (300-500 images, 10-16 epochs)
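For scripting, here is a minimal sketch of those loaders in ComfyUI's API (JSON) format, built as a Python dict. CheckpointLoaderSimple and LoraLoader are core ComfyUI nodes; the node IDs, file names, and the 0.8 LoRA strength are placeholders, not values from this guide.

```python
import json

# Hypothetical file names; replace with files from your models/ folders.
CHECKPOINT = "juggernautXL.safetensors"
CHARACTER_LORA = "character_mei.safetensors"


def build_loader_nodes(lora_strength: float = 0.8) -> dict:
    """API-format nodes: checkpoint loader feeding a character LoRA."""
    return {
        "1": {
            "class_type": "CheckpointLoaderSimple",
            "inputs": {"ckpt_name": CHECKPOINT},
        },
        "2": {
            "class_type": "LoraLoader",
            "inputs": {
                # Links are [node_id, output_index]; the checkpoint loader
                # outputs MODEL at index 0 and CLIP at index 1.
                "model": ["1", 0],
                "clip": ["1", 1],
                "lora_name": CHARACTER_LORA,
                "strength_model": lora_strength,
                "strength_clip": lora_strength,
            },
        },
    }


print(json.dumps(build_loader_nodes(), indent=2))
```

Downstream nodes should take MODEL and CLIP from the LoRA node ("2"), not from the checkpoint loader, or the character LoRA is silently bypassed.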

The Core Pipeline: 6 Essential Nodes

1. Text Encoding with Dual CLIP

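SDXL uses two CLIP text encoders: a large one (OpenCLIP ViT-bigG) and a standard one (CLIP ViT-L). Route both your positive and negative prompts through both. ComfyUI's CLIP Text Encode node does this automatically when an SDXL checkpoint is loaded; the CLIPTextEncodeSDXL node lets you send different text to each encoder (text_g and text_l) if you need finer control. The larger combined text encoder is a big part of why SDXL follows prompts better than SD1.5.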

Positive prompt: Detailed scene description with character references
Negative prompt: "deformed, bad anatomy, disfigured, extra limbs, blurry, low quality, watermark, text"
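As a sketch, the two prompts become two CLIPTextEncode nodes in API format that share one CLIP input. The node IDs "6" and "7" and the upstream link are hypothetical; with an SDXL checkpoint loaded, each node encodes its text with both encoders.

```python
# The negative prompt from the list above, shared by every scene.
NEGATIVE = ("deformed, bad anatomy, disfigured, extra limbs, blurry, "
            "low quality, watermark, text")


def build_prompt_nodes(positive: str, clip_source: list) -> dict:
    """API-format positive/negative CLIPTextEncode nodes on one CLIP input.

    clip_source is a [node_id, output_index] link, e.g. ["2", 1] for the
    CLIP output of a LoraLoader whose (hypothetical) node ID is "2".
    """
    return {
        "6": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": clip_source}},
        "7": {"class_type": "CLIPTextEncode",
              "inputs": {"text": NEGATIVE, "clip": clip_source}},
    }


nodes = build_prompt_nodes("medium shot of a businessman in a navy suit", ["2", 1])
```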

2. KSampler with DPM++ 2M Karras

For consistent outputs across a scene sequence, lock your sampler settings:

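Sampler: dpmpp_2m (DPM++ 2M)
Scheduler: karras (ComfyUI sets the sampler and the noise schedule separately; together they give DPM++ 2M Karras)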
Steps: 25-30 (above 30 gives diminishing returns on SDXL)
CFG scale: 7.0 (the sweet spot for creative freedom vs prompt adherence)
Denoise: 1.0 for base generation, 0.5-0.7 for img2img refinement
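ComfyUI splits "DPM++ 2M Karras" into two KSampler inputs, sampler_name dpmpp_2m and scheduler karras. Below is a minimal sketch of a KSampler node in API format with these settings locked; the upstream node IDs are hypothetical and the denoise range check is an added guard, not a ComfyUI requirement.

```python
def build_ksampler(seed: int, denoise: float = 1.0, steps: int = 28,
                   cfg: float = 7.0) -> dict:
    """API-format KSampler with the locked scene-sequence settings."""
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    return {
        "class_type": "KSampler",
        "inputs": {
            "seed": seed,
            "steps": steps,            # 25-30 recommended
            "cfg": cfg,
            "sampler_name": "dpmpp_2m",
            "scheduler": "karras",
            "denoise": denoise,        # 1.0 base, 0.5-0.7 img2img refinement
            # Hypothetical upstream links: [node_id, output_index]
            "model": ["2", 0],
            "positive": ["6", 0],
            "negative": ["7", 0],
            "latent_image": ["5", 0],
        },
    }


base_pass = build_ksampler(seed=123456789)
refine_pass = build_ksampler(seed=123456789, denoise=0.6)
```

Reusing one seed for both passes keeps the base generation and the img2img refinement reproducible across a scene sequence.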

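3. Two-Stage Upscaling: Latent Pass + 4x-UltraSharp

SDXL works best near one megapixel, and a square 1024×1024 frame must be cropped heavily to reach 9:16 vertical format. Generate at an SDXL-native portrait size such as 768×1344 instead, then build a two-stage upscaler:

First pass: Latent upscale by 1.5x using nearest-exact interpolation (768×1344 becomes 1152×2016), then run a second KSampler at 0.5-0.7 denoise so the enlarged latent is re-detailed instead of left blocky
Second pass: The ESRGAN-based 4x-UltraSharp model for pixel-level detail recovery
Output resolution: Resize the 4x result (4608×8064) down to cover 1080×1920, then center-crop the few extra pixels for TikTok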
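Frame sizes in an upscale chain are easy to get wrong, so here is a small sketch that tracks them stage by stage. It assumes an SDXL portrait base of 768×1344, a 1.5x latent upscale, a 4x model, and a final resize that covers 1080×1920 followed by a center crop; adjust the arguments to match your own chain.

```python
def round_to_multiple(value: float, multiple: int = 8) -> int:
    """Keep pixel sizes divisible by 8 (SDXL latents are 1/8 scale)."""
    return int(round(value / multiple)) * multiple


def upscale_chain(width, height, latent_scale=1.5, model_scale=4,
                  target=(1080, 1920)):
    """Return (stage, (w, h)) pairs for latent upscale -> 4x model -> fit."""
    stages = [("base", (width, height))]
    w = round_to_multiple(width * latent_scale)
    h = round_to_multiple(height * latent_scale)
    stages.append(("latent upscale", (w, h)))
    w, h = w * model_scale, h * model_scale
    stages.append(("4x model", (w, h)))
    # Scale so the frame covers the target, then center-crop the overflow.
    factor = max(target[0] / w, target[1] / h)
    stages.append(("cover resize", (round(w * factor), round(h * factor))))
    stages.append(("center crop", target))
    return stages


for stage, (w, h) in upscale_chain(768, 1344):
    print(f"{stage:>14}: {w}x{h}")
```

Running upscale_chain(1024, 1024) shows the cost of square frames: the cover resize yields 1920×1920, so the center crop discards 840 of 1920 columns.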

4. ControlNet for Scene Consistency

When you need specific poses or compositions across multiple frames, ControlNet is non-negotiable. The most useful models for drama production:

OpenPose: Lock character poses from a reference frame
Canny: Preserve composition structure when changing scene elements
Depth: Maintain consistent scene layout across camera angle changes
Control weight: 0.5-0.7 for creative control, 0.8-1.0 for strict reproduction
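In ComfyUI this weight is the strength input of the Apply ControlNet (Advanced) node, class ControlNetApplyAdvanced. Here is a sketch that picks a starting strength from the two ranges above; the mid-range default and all node links are assumptions.

```python
# Strength ranges from the list above, as (low, high).
STRENGTH_RANGES = {"creative": (0.5, 0.7), "strict": (0.8, 1.0)}


def build_controlnet_apply(mode: str, image: list, control_net: list,
                           positive: list, negative: list) -> dict:
    """API-format ControlNetApplyAdvanced node, starting mid-range for the mode.

    Each link is [node_id, output_index]; all IDs used below are hypothetical.
    """
    low, high = STRENGTH_RANGES[mode]  # KeyError for an unknown mode
    return {
        "class_type": "ControlNetApplyAdvanced",
        "inputs": {
            "positive": positive,
            "negative": negative,
            "control_net": control_net,  # from a ControlNetLoader node
            "image": image,              # preprocessed OpenPose/Canny/Depth map
            "strength": round((low + high) / 2, 2),
            "start_percent": 0.0,
            "end_percent": 1.0,
        },
    }


pose_lock = build_controlnet_apply("strict", ["11", 0], ["10", 0],
                                   ["6", 0], ["7", 0])
```

The node outputs conditioning (index 0 positive, index 1 negative), which replaces the raw prompt links on the KSampler.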

5. IPAdapter for Character Identity

The single biggest pain point in AI drama is character consistency, and IPAdapter goes a long way toward solving it. Use the Plus Face model (ip-adapter-plus-face_sdxl_vit-h for SDXL) with a reference image of your character:

Input: One consistent reference portrait per character
Weight: 0.6-0.8 (too high limits scene flexibility)
Start at: 0.0 (applied from step 1)
End at: 0.6-0.8 (let the scene prompt take over in later steps)
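Here is a sketch of these settings as an IPAdapterAdvanced node from the ComfyUI_IPAdapter_plus pack, in API format. The input names follow recent releases of that pack but may differ in yours, so check them against an exported workflow. The node links and the weight_type, combine_embeds, and embeds_scaling choices are assumptions; the range checks simply enforce the values recommended above.

```python
def build_ipadapter(weight: float = 0.7, end_at: float = 0.7) -> dict:
    """API-format IPAdapterAdvanced node using the ranges recommended above."""
    if not 0.6 <= weight <= 0.8:
        raise ValueError("weight outside the recommended 0.6-0.8 range")
    if not 0.6 <= end_at <= 0.8:
        raise ValueError("end_at outside the recommended 0.6-0.8 range")
    return {
        "class_type": "IPAdapterAdvanced",
        "inputs": {
            "model": ["2", 0],       # hypothetical: LoRA-patched MODEL
            "ipadapter": ["20", 1],  # hypothetical: IPADAPTER from a unified loader
            "image": ["21", 0],      # hypothetical: LoadImage of the reference portrait
            "weight": weight,
            "weight_type": "linear",
            "combine_embeds": "concat",
            "start_at": 0.0,         # applied from the first step
            "end_at": end_at,        # scene prompt takes over after this point
            "embeds_scaling": "V only",
        },
    }


face_lock = build_ipadapter(weight=0.75, end_at=0.7)
```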

6. FaceDetailer Node

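Faces degrade during upscaling. The FaceDetailer node (ComfyUI Impact Pack) detects faces, crops each one, and re-renders it with its own sampling pass. Add one after each upscale stage where faces lose detail:

Detection model: A YOLOv8 face detector such as face_yolov8m.pt, or the lighter face_yolov8n.pt when speed matters more than accuracy
Restoration model: Optionally follow with GFPGAN v1.4 or CodeFormer in a separate face-restore node (GFPGAN tends to preserve more identity)
CodeFormer fidelity: 0.5-0.7 (higher values stay closer to the input face, so match it to your character reference)
Box expansion: About 32px via bbox_dilation, plus a few pixels of feather, to prevent hard mask edges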
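Geometrically, box expansion and feathering amount to the sketch below: grow the detector's box by a fixed margin, clamp it to the frame, and ramp mask opacity up from the edge. This illustrates the idea only. It is not the Impact Pack's implementation, and the face box is a made-up detector output.

```python
def expand_box(box, pad, width, height):
    """Grow an (x1, y1, x2, y2) box by pad pixels per side, clamped to the image."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - pad), max(0, y1 - pad),
            min(width, x2 + pad), min(height, y2 + pad))


def feather_alpha(point, box, feather):
    """Mask opacity at point: ramps 0 -> 1 over feather px inside the box edge."""
    x, y = point
    x1, y1, x2, y2 = box
    if not (x1 <= x <= x2 and y1 <= y <= y2):
        return 0.0
    if feather <= 0:
        return 1.0  # no feathering: hard edge
    edge_distance = min(x - x1, x2 - x, y - y1, y2 - y)
    return min(1.0, edge_distance / feather)


face = (100, 100, 200, 220)                  # hypothetical detector output
region = expand_box(face, 32, 1080, 1920)    # -> (68, 68, 232, 252)
```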

Character Consistency Workflow (The 3-Step Method)

Step 1: Character Sheet Generation

Before producing any scene, generate a character sheet at 1024×1536:

Three poses: front, three-quarter, side profile
Three expressions: neutral, happy, intense/serious
Two outfits: primary and variant
Seed locking: Generate with 6-8 different seeds, pick the best character interpretation, then lock that seed for every subsequent generation
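Once the seed is chosen, the sheet is every pose, expression, and outfit combination rendered with that seed. Here is a sketch that enumerates the 18 jobs; the prompt wording, outfit labels, and seed value are placeholders.

```python
from itertools import product

POSES = ["front view", "three-quarter view", "side profile"]
EXPRESSIONS = ["neutral expression", "happy expression", "intense serious expression"]
OUTFITS = ["primary outfit", "variant outfit"]  # hypothetical labels


def sheet_prompts(character: str, seed: int):
    """One (prompt, seed) job per pose/expression/outfit combination."""
    return [(f"{character}, {pose}, {expr}, {outfit}", seed)
            for pose, expr, outfit in product(POSES, EXPRESSIONS, OUTFITS)]


jobs = sheet_prompts("35-year-old Chinese businessman", seed=421337)
print(len(jobs))  # 3 x 3 x 2 = 18
```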

Step 2: IPAdapter Embedding Extraction

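From your chosen character sheet image, encode the IPAdapter embeds once and cache them:

Crop tightly around the face first: the ViT-H CLIP Vision encoder behind the Plus Face model works at 224×224, so a loose crop wastes resolution on background
Save the embeds for reuse across all scenes (ComfyUI_IPAdapter_plus includes IPAdapterSaveEmbeds and IPAdapterLoadEmbeds nodes for this)
This skips re-encoding on every run and guarantees every scene uses the exact same character reference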
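If you drive the pipeline from a script, a content-addressed cache is one way to make sure each reference is encoded exactly once. This is a generic sketch using only the standard library: encode_fn is a hypothetical stand-in for the real encoder, and JSON storage is used only for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_embedding(image_path: Path, cache_dir: Path, encode_fn):
    """Return the embedding for image_path, encoding it at most once.

    encode_fn is a hypothetical stand-in for your encoder: it takes the raw
    image bytes and returns a JSON-serialisable list of floats. The cache
    key is the SHA-256 of the image bytes, so editing the reference image
    automatically produces a new cache entry.
    """
    data = image_path.read_bytes()
    cache_file = cache_dir / f"{hashlib.sha256(data).hexdigest()}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    embedding = encode_fn(data)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(embedding))
    return embedding


# Demo with a fake encoder that records how often it runs.
calls = []


def fake_encode(data: bytes):
    calls.append(1)
    return [len(data) / 1000.0, 0.5]


with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp) / "mei_reference.png"
    ref.write_bytes(b"not really a png")
    first = cached_embedding(ref, Path(tmp) / "cache", fake_encode)
    second = cached_embedding(ref, Path(tmp) / "cache", fake_encode)
```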

Step 3: Per-Scene Prompt Engineering

Each scene uses a template prompt structure:

[Character description] — pulled from your character sheet metadata
[Action/Pose] — specific to this scene
[Expression] — matching the dialogue emotion
[Setting] — scene background and props
[Lighting] — consistent studio lighting across all scenes
[Camera] — medium shot, close-up, etc.
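The template can be filled mechanically so that character and lighting text never drift between scenes. This is a minimal sketch with hypothetical field values; joining fields in template order, rather than leading with the camera as the example below does, is a style choice.

```python
FIELDS = ("character", "action", "expression", "setting", "lighting", "camera")

# Shared across scenes (hypothetical values): identity and lighting never change.
CHARACTER = "35-year-old Chinese businessman in a tailored navy suit"
LIGHTING = "professional studio lighting, soft shadows"


def build_scene_prompt(scene: dict) -> str:
    """Join the template fields in a fixed order; fail loudly on gaps."""
    missing = [f for f in FIELDS if not scene.get(f)]
    if missing:
        raise ValueError("scene is missing: " + ", ".join(missing))
    return ", ".join(scene[f].strip() for f in FIELDS)


prompt = build_scene_prompt({
    "character": CHARACTER,
    "action": "standing at the window",
    "expression": "serious expression",
    "setting": "modern high-rise office overlooking the Shanghai skyline",
    "lighting": LIGHTING,
    "camera": "close-up shot, shallow depth of field",
})
```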

Example prompt:
"close-up shot of a 35-year-old Chinese businessman in a tailored navy suit, standing in a modern high-rise office overlooking Shanghai skyline at golden hour, professional studio lighting, soft shadows, shallow depth of field, photorealistic, 8k"

Performance Optimization for Local GPUs

Running on consumer hardware? These settings squeeze maximum throughput:

Batch size: 2-4 images per generation (RTX 3060 can handle batch 2 at 1024×1024)
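VAE tiling: Use the VAE Decode (Tiled) node for resolutions above 1024×1024
Model offloading: Launch ComfyUI with --lowvram when a model doesn't fit in VRAM; parts of it are then kept in system RAM
Smart memory: On by default in ComfyUI (only the --disable-smart-memory flag turns it off), so leave it enabled to keep models loaded between runs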
XFormers: Optional; recent PyTorch builds already use memory-efficient scaled dot-product attention by default, so xformers mainly helps on older setups
Expected throughput: 8-10 images per minute at 768×768, 4-6 at 1024×1024
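To check these figures on your own GPU, convert the it/s rate ComfyUI prints during sampling into images per minute. This is a simple arithmetic sketch; overhead_sec is an assumed lump sum for VAE decode, upscaling, and saving.

```python
def images_per_minute(its_per_sec: float, steps: int, batch: int = 1,
                      overhead_sec: float = 0.0) -> float:
    """Images per minute from the sampler speed shown in ComfyUI's console.

    One iteration processes the whole batch, so a batch takes steps / it/s
    seconds plus any fixed overhead (VAE decode, saving, upscaling).
    """
    seconds_per_batch = steps / its_per_sec + overhead_sec
    return 60.0 * batch / seconds_per_batch


def required_its_per_sec(target_per_minute: float, steps: int) -> float:
    """Sampler speed needed to hit a target rate at batch 1, ignoring overhead."""
    return steps * target_per_minute / 60.0


print(images_per_minute(2.0, 30))                    # 4.0
print(images_per_minute(2.0, 30, overhead_sec=5.0))  # 3.0
```

At 25-30 steps, the 4-6 images per minute quoted above corresponds to roughly 1.7-3.0 it/s of pure sampling at batch 1, before overhead.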

Production-Ready Folder Structure

Organization is the difference between a demo and a production pipeline. Use this structure:

/workflows/ — One JSON per scene (character generation, scene generation, upscale)
/characters/[name]/ — Character sheets, IPAdapter embeddings, LoRA files
/scenes/[episode]/[scene-number]/ — Raw outputs organized by episode and sequence
/upscaled/ — Final 1080×1920 rendered frames ready for video assembly
/references/ — Mood boards, lighting references, style frames
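Here is a sketch that creates this layout with pathlib so every project starts identically. The character names, episode labels, and zero-padded scene numbers are hypothetical conventions.

```python
import tempfile
from pathlib import Path


def create_project(root: Path, characters, episodes, scenes_per_episode: int):
    """Create the folder layout above; safe to re-run (existing dirs are kept)."""
    dirs = [root / "workflows", root / "upscaled", root / "references"]
    dirs += [root / "characters" / name for name in characters]
    # Zero-padded scene numbers keep file browsers sorting 001, 002, ... 010.
    dirs += [root / "scenes" / ep / f"{n:03d}"
             for ep in episodes
             for n in range(1, scenes_per_episode + 1)]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    return dirs


with tempfile.TemporaryDirectory() as tmp:
    created = create_project(Path(tmp), ["mei", "jun"], ["ep01", "ep02"], 3)
    all_exist = all(d.is_dir() for d in created)
    relative = [d.relative_to(tmp).parts for d in created]
```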

Common Pitfalls and Fixes

Inconsistent character faces: Your IPAdapter weight is probably too low. Bump it to 0.7 and use one evenly lit reference image for every scene
Background bleeding into characters: Increase CFG to 7.5-8.0 or add a Depth ControlNet at strength 0.4
Bloated workflow graphs: Group related nodes (ControlNet, FaceDetailer, Upscale) into collapsed node groups. Your entire pipeline shouldn't exceed 40-50 visible nodes
Out-of-memory errors: Lower batch size to 1, launch ComfyUI with --lowvram, and decode with the VAE Decode (Tiled) node

Final Tips for Production Speed

Pre-generate all backgrounds separately as empty, character-free plates (img2img from blank scenes), then composite characters in post
Export workflows in ComfyUI's API format and queue them through its HTTP API for batch generation on a headless server
Automate your LoRA training with Kohya SS; train once, reuse across every episode
Aim for 15-20 seconds of screen time per 2-hour generation session, about 90-120 frames at 1080×1920
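Here is a minimal sketch of headless batch queuing with the standard library. It assumes a workflow exported in API format, a ComfyUI server at its default local address (port 8188), and that node "6" holds the positive prompt; that ID is hypothetical, so check it in your own export. The seed stays locked across scenes, matching the seed-locking step earlier.

```python
import copy
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188/prompt"  # default local ComfyUI address


def for_scene(workflow: dict, prompt_node_id: str, prompt_text: str,
              seed: int) -> dict:
    """Copy an API-format workflow, set the scene prompt, and lock the seed."""
    wf = copy.deepcopy(workflow)
    wf[prompt_node_id]["inputs"]["text"] = prompt_text
    for node in wf.values():
        if node.get("class_type") == "KSampler":
            node["inputs"]["seed"] = seed
    return wf


def queue(workflow: dict, url: str = COMFY_URL) -> dict:
    """POST one workflow to ComfyUI's /prompt endpoint and return its JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Usage against a running server (file name and node "6" are hypothetical):
# with open("workflows/scene_generation_api.json") as f:
#     base = json.load(f)
# for text in scene_prompts:
#     print(queue(for_scene(base, "6", text, seed=421337)))
```

Deep-copying per scene keeps the loaded template untouched, so one exported file can drive an entire episode.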

Building a reliable ComfyUI pipeline for AI drama production takes upfront investment, but once the workflow is locked, you're producing consistent, studio-quality frames at 4-6 images per minute on consumer hardware. That's fast enough for real production.

Ready to create your AI drama series? Contact AI Drama Studio for a free consultation.