Multi-Character Dialog

Render a scene with multiple speaking creatures from a single text-to-dialogue call to ElevenLabs (eleven_v3). The server slices the joint audio per creature using forced alignment, assembles a 17-channel WAV with each creature on its audio_channel lane, and builds a multi-track Animation with neutral idle poses during silent turns. This is an async job: the endpoint returns 202 with a job_id and reports progress over the WebSocket.
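The lane assembly described above can be pictured with a minimal sketch. It is not the server's implementation: the segment tuple shape (start sample, end sample, channel), the 0-based channel index, and the function name assemble_lanes are all assumptions made for illustration. It copies each creature's slice of the mono audio into its own lane of an interleaved 17-channel buffer and leaves every other lane silent.

```python
from array import array

NUM_CHANNELS = 17  # channel count taken from the description above


def assemble_lanes(mono, segments, num_channels=NUM_CHANNELS):
    """Spread mono 16-bit samples across per-creature lanes.

    mono: sequence of int16 samples (the joint dialog audio).
    segments: iterable of (start_sample, end_sample, channel) tuples.
        This shape is hypothetical; channel is assumed to be 0-based.
    Returns an interleaved array of len(mono) * num_channels samples.
    """
    # Start from silence on every lane.
    out = array("h", [0]) * (len(mono) * num_channels)
    for start, end, channel in segments:
        if not 0 <= channel < num_channels:
            raise ValueError(f"channel {channel} out of range")
        # Clamp the segment to the audio that actually exists.
        for i in range(max(start, 0), min(end, len(mono))):
            out[i * num_channels + channel] = mono[i]
    return out
```

The interleaved buffer could then be written with the standard wave module (setnchannels(17), setsampwidth(2)) if a file is wanted for inspection.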

POST /api/v1/animation/dialog — Submit a dialog scene for rendering. Provide either inline turns or a saved script_id, but not both. persistence is "adhoc" (TTL-cleaned, like ad-hoc speech) or "permanent" (kept in the sound library indefinitely). Set autoplay to true to interrupt the active scene and play the result as soon as rendering completes.

// Inline turns
{
  "turns": [
    { "creature_id": "uuid", "text": "Hello there!" },
    { "creature_id": "uuid", "text": "[whispering] You're late." }
  ],
  "persistence": "adhoc",
  "autoplay": false
}

// Or render a saved script
{
  "script_id": "uuid",
  "persistence": "permanent",
  "autoplay": true
}
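
A client may want to enforce the "turns or script_id, not both" rule before sending anything. The following is a minimal sketch of such a guard; the helper name build_dialog_request and its defaults are assumptions, and only the field names and the two persistence values come from the examples above.

```python
from typing import Any, Optional

VALID_PERSISTENCE = {"adhoc", "permanent"}


def build_dialog_request(
    turns: Optional[list] = None,
    script_id: Optional[str] = None,
    persistence: str = "adhoc",
    autoplay: bool = False,
) -> dict:
    """Build a JSON-serializable body for POST /api/v1/animation/dialog."""
    # Exactly one of turns / script_id must be supplied.
    if (turns is None) == (script_id is None):
        raise ValueError("provide either turns or script_id, not both")
    if persistence not in VALID_PERSISTENCE:
        raise ValueError("persistence must be 'adhoc' or 'permanent'")
    body: dict[str, Any] = {"persistence": persistence, "autoplay": autoplay}
    if turns is not None:
        body["turns"] = turns
    else:
        body["script_id"] = script_id
    return body
```

The returned dict can be passed to json.dumps and posted with any HTTP client. A successful submission is expected to come back as 202 with a job_id, as described above.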

Dialog Scripts

Saved, editable dialog scenes. A script’s turns are snapshotted into each rendered Animation’s metadata (copy-on-write), so old renders stay readable even after the script is edited or deleted.

GET /api/v1/animation/dialog/script — List all saved scripts, newest first by updated_at.

GET /api/v1/animation/dialog/script/{scriptId} — Fetch one by its UUID.

POST /api/v1/animation/dialog/script — Create a new script. The server assigns the UUID and timestamps. The parser is lenient: unknown fields are silently ignored, so a DTO round-tripped from the client is not rejected.

PUT /api/v1/animation/dialog/script/{scriptId} — Replace an existing script. created_at is preserved; updated_at bumps to now. 404 if no script exists at that id.
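
The replace semantics can be illustrated with an in-memory sketch. This is not the server's code: the dict-backed store, the "id" field name, the ISO-8601 timestamp format, and the ScriptNotFound exception (standing in for the 404) are assumptions. Only the created_at and updated_at behavior comes from the description above.

```python
from datetime import datetime, timezone
from typing import Optional


class ScriptNotFound(Exception):
    """Stands in for the 404 returned when no script exists at the id."""


def replace_script(store: dict, script_id: str, new_script: dict,
                   now: Optional[datetime] = None) -> dict:
    """Replace a stored script, keeping created_at and bumping updated_at."""
    if script_id not in store:
        raise ScriptNotFound(script_id)
    now = now or datetime.now(timezone.utc)
    existing = store[script_id]
    replaced = {
        **new_script,
        "id": script_id,                       # assumed field name
        "created_at": existing["created_at"],  # preserved from the old copy
        "updated_at": now.isoformat(),         # bumped to "now"
    }
    store[script_id] = replaced
    return replaced
```

Any created_at or updated_at sent by the client is overwritten in this sketch, which matches the idea that the server owns both timestamps.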

POST /api/v1/animation/dialog/script/validate — Shape-only validation without saving. Returns {valid, error_messages, missing_creature_ids} — never throws, so client forms can render inline errors without exception handling.
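
A form might flatten the validation result into a list of inline messages. The sketch below assumes that error_messages is a list of strings and missing_creature_ids is a list of id strings. The response's exact types are not stated here, and the helper name and message wording are hypothetical.

```python
def summarize_validation(result: dict) -> list:
    """Turn a validate response into messages a form can show inline."""
    if result.get("valid"):
        return []
    # Copy so the caller's response dict is not mutated.
    messages = list(result.get("error_messages") or [])
    for creature_id in result.get("missing_creature_ids") or []:
        messages.append(f"Unknown creature: {creature_id}")
    return messages
```

An empty list means the script passed shape validation and can be saved or rendered.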

DELETE /api/v1/animation/dialog/script/{scriptId} — Delete a script. Animations rendered from it stay playable — they carry the CoW snapshot of turns in their metadata.

Dialog Preview

Inspect what a render will sound and look like without committing to a job. Generations are cached on disk, so repeating the same turns is cheap.

POST /api/v1/animation/dialog/preview/meta — Generate (or load from cache) a preview. Returns cache_key, generation_id, cached flag, audio_url to fetch the mono WAV, voice_segments, and forced-alignment word/char timings.

GET /api/v1/animation/dialog/preview/audio/{cache_key}/{filename} — Stream the cached mono WAV for an <audio> element. URL comes from the audio_url on a /meta response.

POST /api/v1/animation/dialog/preview/multichannel — Return the assembled 17-channel WAV, suitable for opening in Audacity to inspect each creature’s lane. Same cache semantics as /meta.

POST /api/v1/animation/dialog/preview/lookup — Cheap cache-only lookup. Returns the list of cached generations (newest first) for a set of turns, or 404 if nothing is cached. The UI uses this to badge a “Render” button as fast (cached) vs slow (will hit ElevenLabs).
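
The badge decision can be sketched as a small pure function over the lookup result. The function name, the "fast"/"slow" return strings, and the treatment of an empty list as uncached are assumptions. The 404 meaning and newest-first ordering come from the description above.

```python
from typing import Optional


def render_badge(status_code: int, generations: Optional[list] = None):
    """Map a lookup response to (badge, newest_generation_or_None)."""
    if status_code == 404:
        # Nothing cached: rendering will have to call ElevenLabs.
        return ("slow", None)
    if status_code == 200:
        if generations:
            # The list is newest first, so the head is the latest generation.
            return ("fast", generations[0])
        return ("slow", None)
    raise RuntimeError(f"unexpected lookup status {status_code}")
```

Any other status is raised instead of being badged, so a server error is not mistaken for a cache miss.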