This is the REST API and WebSocket interface for the Creature Server. All endpoints are prefixed with the server’s base URL. The server communicates over HTTP/1.1 and WebSocket on a LAN.
Most endpoints accept and return application/json. A few accept raw binary audio or raw JSON strings. Async operations return a job ID immediately and report progress over the WebSocket.
Creatures
Manage creature configurations. A creature’s JSON definition file is the source of truth — the database is a cache. Universe assignment is runtime-only state.
GET /api/v1/creature — List all creatures known to the server.
GET /api/v1/creature/{creatureId} — Get a single creature by its UUID.
POST /api/v1/creature — Upsert a creature’s configuration. Accepts a raw creature JSON config string. Returns the creature DTO, or 400 if the config is invalid.
POST /api/v1/creature/validate — Validate a creature configuration without saving it. Useful for checking a config before deploying it to a controller.
POST /api/v1/creature/register — Register a creature controller with a specific universe. This is how a controller announces itself to the server on startup.
{
"creature_config": "<raw creature JSON string>",
"universe": 1
}
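Because creature_config carries the creature definition as a string rather than a nested object, the definition ends up serialized twice: once on its own, then again as part of the request body. A minimal Python sketch of that encoding follows. The config fields shown (id, name) are placeholders for illustration, not the real creature schema.

```python
import json


def build_register_body(creature_config: dict, universe: int) -> str:
    """Serialize a registration request body.

    The creature config is embedded as a JSON *string*, so it is encoded
    once on its own and then again as part of the outer object.
    """
    return json.dumps({
        "creature_config": json.dumps(creature_config),
        "universe": universe,
    })


# Hypothetical config; real creature definitions carry many more fields.
body = build_register_body({"id": "example-uuid", "name": "Beaky"}, universe=1)
decoded = json.loads(body)
inner = json.loads(decoded["creature_config"])
```

Decoding the body once yields a string in creature_config; decoding that string yields the creature definition itself.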
PATCH /api/v1/creature/{creatureId}/idle — Enable or disable the idle animation loop for a creature. Returns 409 if the creature isn’t registered to a universe.
{
"enabled": true
}
Animations
Animations are multi-track recordings of servo positions over time. They can be recorded with the joystick and stored permanently, or generated on the fly as ad-hoc animations that expire after 24 hours.
GET /api/v1/animation — List all stored animations.
GET /api/v1/animation/{animationId} — Get a single animation by its MongoDB ObjectId.
POST /api/v1/animation — Create or update a stored animation. Accepts a raw animation JSON string.
DELETE /api/v1/animation/{animationId} — Delete an animation and all of its tracks.
POST /api/v1/animation/play — Play a stored animation on a creature’s universe. If resumePlaylist is true, the active playlist resumes after this animation finishes.
{
"animation_id": "ObjectId string",
"universe": 1,
"resumePlaylist": false
}
POST /api/v1/animation/interrupt — Interrupt the currently playing animation with a new one. Uses the cooperative scheduler — the running animation yields gracefully. Same request format as /animation/play.
Ad-Hoc Animations
Ad-hoc animations are generated on the fly from text. The server calls ElevenLabs for TTS, generates lip sync data, blends it with a body animation, and plays the result. These are stored in a TTL collection and expire after 24 hours.
GET /api/v1/animation/ad-hoc — List all ad-hoc animations currently in the TTL collection.
GET /api/v1/animation/ad-hoc/{animationId} — Get a specific ad-hoc animation by its ID.
POST /api/v1/animation/ad-hoc — Create and immediately play an ad-hoc speech animation. This is an async job — returns 202 with a job ID and reports progress over the WebSocket.
{
"creature_id": "uuid",
"text": "Hello, I'm Beaky!",
"resume_playlist": false
}
POST /api/v1/animation/ad-hoc/prepare — Create an ad-hoc animation but don’t play it yet. Use this when timing matters — prepare ahead of time, then trigger playback manually. Same request format as above. Returns 202 with a job ID; the animation ID is delivered via the WebSocket when the job completes.
POST /api/v1/animation/ad-hoc/play — Play a previously prepared ad-hoc animation.
{
"animation_id": "ObjectId string",
"resume_playlist": false
}
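The prepare-then-play flow can be sketched as a small function with the transport injected, so it runs without a server. Several names here are assumptions: the 202 response is assumed to carry the job under job_id, the finished job's result is assumed to expose animation_id, and post and wait_for_job are hypothetical helpers standing in for an HTTP client and a WebSocket listener.

```python
from typing import Callable

Post = Callable[[str, dict], dict]   # (path, body) -> parsed JSON response
WaitForJob = Callable[[str], dict]   # job_id -> result of the finished job


def prepare_then_play(post: Post, wait_for_job: WaitForJob,
                      creature_id: str, text: str) -> str:
    """Prepare an ad-hoc animation, wait for the job, then trigger playback."""
    accepted = post("/api/v1/animation/ad-hoc/prepare", {
        "creature_id": creature_id,
        "text": text,
        "resume_playlist": False,
    })
    # Assumption: the 202 response names the job under "job_id".
    result = wait_for_job(accepted["job_id"])
    # Assumption: the completed job's result exposes "animation_id".
    animation_id = result["animation_id"]
    post("/api/v1/animation/ad-hoc/play", {
        "animation_id": animation_id,
        "resume_playlist": False,
    })
    return animation_id


# Stand-in transports so the flow can be exercised offline.
calls: list[tuple[str, dict]] = []


def fake_post(path: str, body: dict) -> dict:
    calls.append((path, body))
    return {"job_id": "job-1"} if path.endswith("/prepare") else {"status": "ok"}


def fake_wait(job_id: str) -> dict:
    return {"animation_id": "anim-1"}


played = prepare_then_play(fake_post, fake_wait, "creature-uuid", "Hello, I'm Beaky!")
```

Keeping the wait step separate from the play step is what lets a caller prepare early and trigger playback at the exact moment it is needed.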
Lip Sync Generation
POST /api/v1/animation/generate-lipsync — Regenerate lip sync data for an existing animation. Runs as an async job, returns 202 with a job ID.
{
"animation_id": "ObjectId string"
}
Streaming Sessions
Streaming sessions enable real-time conversation. Text arrives sentence by sentence (typically from an LLM), and the server pipelines TTS, lip sync generation, and playback so the creature starts talking within a couple of seconds — even while the LLM is still generating the rest of the response.
POST /api/v1/animation/ad-hoc-stream/start — Open a new streaming session. The server loads the creature’s config and prepares for incoming text. Returns a session_id.
{
"creature_id": "uuid",
"resume_playlist": false
}
POST /api/v1/animation/ad-hoc-stream/text — Send a sentence to the session. Each chunk kicks off an async pipeline: TTS, lip sync, blend, queue for playback. Returns chunks_received count.
{
"session_id": "uuid",
"text": "This is one sentence."
}
POST /api/v1/animation/ad-hoc-stream/finish — Signal that no more text is coming. The playback thread drains the remaining queue. Returns the final animation_id.
{
"session_id": "uuid"
}
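The start, text, finish sequence above can be sketched as follows. This is an illustration under stated assumptions rather than a reference client: post is a hypothetical transport callable, and split_sentences is a naive splitter added only so the example has sentence-sized input.

```python
import re
from typing import Callable, Iterable

Post = Callable[[str, dict], dict]  # (path, body) -> parsed JSON response


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def stream_speech(post: Post, creature_id: str, sentences: Iterable[str]) -> str:
    """Run one streaming session and return the final animation_id."""
    base = "/api/v1/animation/ad-hoc-stream"
    session_id = post(f"{base}/start", {
        "creature_id": creature_id,
        "resume_playlist": False,
    })["session_id"]
    for sentence in sentences:
        # Each sentence is sent as soon as it is available.
        post(f"{base}/text", {"session_id": session_id, "text": sentence})
    return post(f"{base}/finish", {"session_id": session_id})["animation_id"]


# Stand-in transport that records calls instead of talking to a server.
sent: list[tuple[str, dict]] = []


def fake_post(path: str, body: dict) -> dict:
    sent.append((path, body))
    if path.endswith("/start"):
        return {"session_id": "session-1"}
    if path.endswith("/text"):
        return {"chunks_received": sum(1 for p, _ in sent if p.endswith("/text"))}
    return {"animation_id": "anim-1"}


sentences = split_sentences("Hello there. How are you? Fine!")
final_id = stream_speech(fake_post, "creature-uuid", sentences)
```

Since sentences is any iterable, a generator fed by an LLM token stream could be passed in directly, which matches the pipelining described above.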
Multi-Character Dialog
Render a scene with multiple speaking creatures from a single text-to-dialogue call to ElevenLabs (eleven_v3). The server slices the joint audio per creature using forced alignment, assembles a 17-channel WAV with each creature on its audio_channel lane, and builds a multi-track Animation with neutral idle poses during silent turns. Async job — returns 202 with a job_id and reports progress over the WebSocket.
POST /api/v1/animation/dialog — Submit a dialog scene for rendering. Provide either inline turns or a saved script_id (not both). persistence is "adhoc" (TTL-cleaned, like ad-hoc speech) or "permanent" (lives under the sound library forever). Set autoplay true to interrupt the active scene and play immediately on completion.
// Inline turns
{
"turns": [
{ "creature_id": "uuid", "text": "Hello there!" },
{ "creature_id": "uuid", "text": "[whispering] You're late." }
],
"persistence": "adhoc",
"autoplay": false
}
// Or render a saved script
{
"script_id": "uuid",
"persistence": "permanent",
"autoplay": true
}
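The "either inline turns or a saved script_id (not both)" rule is easy to get wrong on the client side. A minimal sketch of a body builder that enforces it before the request is sent follows; build_dialog_request is a hypothetical helper name, and the server remains the authority on what it accepts.

```python
from typing import Optional

VALID_PERSISTENCE = {"adhoc", "permanent"}


def build_dialog_request(turns: Optional[list[dict]] = None,
                         script_id: Optional[str] = None,
                         persistence: str = "adhoc",
                         autoplay: bool = False) -> dict:
    """Build a dialog render body, enforcing turns XOR script_id."""
    if (turns is None) == (script_id is None):
        raise ValueError("provide exactly one of turns or script_id")
    if persistence not in VALID_PERSISTENCE:
        raise ValueError(f"persistence must be one of {sorted(VALID_PERSISTENCE)}")
    body: dict = {"persistence": persistence, "autoplay": autoplay}
    if turns is not None:
        body["turns"] = turns
    else:
        body["script_id"] = script_id
    return body


inline = build_dialog_request(turns=[{"creature_id": "uuid", "text": "Hello there!"}])
saved = build_dialog_request(script_id="uuid", persistence="permanent", autoplay=True)
```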
Dialog Scripts
Saved, editable dialog scenes. A script’s turns are snapshotted onto each rendered Animation’s metadata (copy-on-write), so old renders stay readable even after the script is edited or deleted.
GET /api/v1/animation/dialog/script — List all saved scripts, newest first by updated_at.
GET /api/v1/animation/dialog/script/{scriptId} — Fetch one by its UUID.
POST /api/v1/animation/dialog/script — Create a new script. The server stamps the UUID and timestamps. The parser is lenient: extra fields are silently ignored, so a round-tripped DTO from the client doesn’t get rejected.
PUT /api/v1/animation/dialog/script/{scriptId} — Replace an existing script. created_at is preserved; updated_at bumps to now. 404 if no script exists at that id.
POST /api/v1/animation/dialog/script/validate — Shape-only validation without saving. Returns {valid, error_messages, missing_creature_ids} — never throws, so client forms can render inline errors without exception handling.
DELETE /api/v1/animation/dialog/script/{scriptId} — Delete a script. Animations rendered from it stay playable — they carry the CoW snapshot of turns in their metadata.
Dialog Preview
Inspect what a render will sound and look like without committing to a job. Generations are cached on disk so repeating the same turns is cheap.
POST /api/v1/animation/dialog/preview/meta — Generate (or load from cache) a preview. Returns cache_key, generation_id, cached flag, audio_url to fetch the mono WAV, voice_segments, and forced-alignment word/char timings.
GET /api/v1/animation/dialog/preview/audio/{cache_key}/{filename} — Stream the cached mono WAV for an <audio> element. URL comes from the audio_url on a /meta response.
POST /api/v1/animation/dialog/preview/multichannel — Return the assembled 17-channel WAV — for downloading into Audacity to inspect each creature’s lane. Same cache semantics as /meta.
POST /api/v1/animation/dialog/preview/lookup — Cheap cache-only lookup. Returns the list of cached generations (newest first) for a set of turns, or 404 if nothing is cached. UI uses this to badge a “Render” button as fast (cached) vs slow (will hit ElevenLabs).
Sounds
Manage sound files stored on the server. These are audio files (MP3, WAV, OGG) that can be played through the creature’s speaker independently of animations.
GET /api/v1/sound — List all stored sound files.
GET /api/v1/sound/{filename} — Download a sound file. Returns binary audio with the appropriate content type (audio/mpeg, audio/wav, or audio/ogg).
GET /api/v1/sound/ad-hoc — List ad-hoc generated sounds (TTS output stored in the TTL collection).
GET /api/v1/sound/ad-hoc/{filename} — Download an ad-hoc sound file. Returns audio/wav.
POST /api/v1/sound/play — Queue a sound file for playback on the creature’s speaker.
{
"file_name": "squawk.wav"
}
POST /api/v1/sound/generate-lipsync — Generate lip sync data from a stored sound file using whisper.cpp. Runs as an async job, returns 202 with a job ID.
{
"sound_file": "hello.wav",
"allow_overwrite": false
}
POST /api/v1/sound/generate-lipsync/upload — Upload a WAV file and generate lip sync data synchronously. Send raw WAV binary as the request body with a filename query parameter. Returns lip sync mouth cue data.
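The upload endpoint differs from most others in that the body is raw WAV bytes and the file name travels in the query string. A sketch of building such a request with Python's standard library follows. The host name is a placeholder, and the Content-Type header is an assumption, since the expected value is not stated here.

```python
import urllib.parse
import urllib.request


def build_lipsync_upload(base_url: str, filename: str,
                         wav_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) the upload request for a raw WAV body."""
    query = urllib.parse.urlencode({"filename": filename})
    url = f"{base_url}/api/v1/sound/generate-lipsync/upload?{query}"
    return urllib.request.Request(
        url,
        data=wav_bytes,
        method="POST",
        # Assumption: the expected Content-Type is not documented.
        headers={"Content-Type": "audio/wav"},
    )


# Hypothetical host and placeholder bytes; a real call would read a WAV file
# and pass the request to urllib.request.urlopen().
request = build_lipsync_upload("http://creature-server.local", "hello world.wav", b"RIFF")
```

Using urlencode keeps file names with spaces or other reserved characters safe in the query string.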
Playlists
Playlists are ordered sequences of animations that play one after another on a universe. They’re useful for setting up a creature to perform a scripted show.
GET /api/v1/playlist — List all playlists.
GET /api/v1/playlist/id/{playlistId} — Get a playlist by its UUID.
POST /api/v1/playlist — Create or update a playlist. Accepts a raw playlist JSON string.
POST /api/v1/playlist/start — Start playing a playlist on a universe.
{
"universe": 1,
"playlist_id": "uuid"
}
POST /api/v1/playlist/stop — Stop the currently running playlist on a universe.
{
"universe": 1
}
GET /api/v1/playlist/status — Get the status of all playlists across all universes.
GET /api/v1/playlist/status/{universe} — Get the playlist status for a specific universe.
Voice
Voice generation via ElevenLabs. Each creature has its own voice settings in its definition file.
GET /api/v1/voice/list-available — List all voices available from ElevenLabs.
GET /api/v1/voice/subscription — Get the current ElevenLabs API subscription status (remaining characters, tier, etc.).
POST /api/v1/voice — Generate a sound file from text using a specific voice.
{
"text": "Polly wants a cracker!",
"voice_name": "Beaky",
"creature_id": "uuid (optional)"
}
Speech-to-Text
Transcription powered by whisper.cpp. The Creature Listener uses this to offload transcription from the Pi 5 to the server.
POST /api/v1/stt/transcribe — Transcribe raw audio to text. Accepts 16 kHz mono float32 PCM audio as a raw binary request body.
// Response
{
"transcript": "Hey Beaky, what's for dinner?",
"audio_duration_sec": 2.5,
"transcription_time_ms": 340
}
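Many capture paths produce signed 16-bit samples, so a client usually has to convert before calling the transcription endpoint. The sketch below shows one way to produce a float32 body and to estimate its duration for comparison with audio_duration_sec. Little-endian byte order is an assumption; the required endianness is not specified here.

```python
import struct
from typing import Sequence

SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 4  # one float32 per mono sample


def int16_to_float32_pcm(samples: Sequence[int]) -> bytes:
    """Scale signed 16-bit samples to [-1.0, 1.0) and pack them as float32.

    Assumption: little-endian byte order.
    """
    scaled = [sample / 32768.0 for sample in samples]
    return struct.pack(f"<{len(scaled)}f", *scaled)


def duration_seconds(body: bytes) -> float:
    """Length in seconds of a mono float32 buffer sampled at 16 kHz."""
    return len(body) / BYTES_PER_SAMPLE / SAMPLE_RATE_HZ


# One second of audio: three marker samples followed by silence.
body = int16_to_float32_pcm([0, 16384, -32768] + [0] * 15_997)
first_three = struct.unpack("<3f", body[:12])
```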
Fixtures
DMX lighting fixtures — moving heads, color washes, and other E1.31-addressable devices on the animatronic network. Each fixture has a set of named channels, an optional persisted universe assignment, and a library of stored patterns that can be triggered ad-hoc or wired to bindings.
GET /api/v1/fixture — List all DMX fixtures known to the server.
GET /api/v1/fixture/{fixtureId} — Get a single fixture by its UUID.
POST /api/v1/fixture — Upsert a fixture’s configuration. Accepts a raw fixture JSON config string. Required fields: id, name, type, channel_offset, channels.
POST /api/v1/fixture/validate — Validate a fixture config payload without saving it.
DELETE /api/v1/fixture/{fixtureId} — Delete a fixture and any state attached to it.
Universe Assignment
Unlike creatures (whose universe is runtime-only), a fixture’s universe is persisted on the document so the server can stream DMX to it across restarts. The assignment is mirrored into a runtime lookup map for fast frame dispatch.
PUT /api/v1/fixture/{fixtureId}/universe — Assign the fixture to an E1.31 universe. Valid values are in [1, 63999].
{
"universe": 1
}
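A client can check the documented range before sending the assignment. A minimal sketch, with hypothetical helper names:

```python
MIN_UNIVERSE = 1
MAX_UNIVERSE = 63_999


def is_valid_universe(universe: object) -> bool:
    """True if the value is an integer in the inclusive range [1, 63999]."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(universe, bool) or not isinstance(universe, int):
        return False
    return MIN_UNIVERSE <= universe <= MAX_UNIVERSE


def universe_body(universe: int) -> dict:
    """Build the assignment body, rejecting out-of-range values early."""
    if not is_valid_universe(universe):
        raise ValueError(f"universe must be an integer in [{MIN_UNIVERSE}, {MAX_UNIVERSE}]")
    return {"universe": universe}


body = universe_body(1)
```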
DELETE /api/v1/fixture/{fixtureId}/universe — Clear a fixture’s universe assignment.
Pattern Playback
Patterns are stored channel-value snapshots with optional fade-in / hold / fade-out timing. They can be wired to bindings, fired manually for testing, or previewed live from the Creature Console pattern editor without saving. Both endpoints require the fixture to have an assigned universe.
POST /api/v1/fixture/{fixtureId}/pattern/{patternId}/trigger — Manually fire a stored pattern, bypassing the binding match. Body is optional; if present, stop_after_ms must be in (0, 600000] (10 minutes max).
{
"stop_after_ms": 2000
}
POST /api/v1/fixture/{fixtureId}/pattern/preview — Fire a one-shot, not-persisted pattern built from the request body. Used by the Creature Console pattern editor’s Fire button to preview unsaved edits without an upsert round-trip. Refused with 400 if a live session is active on the fixture.
{
"values": [
{ "channel": "red", "value": 255 },
{ "channel": "green", "value": 128 },
{ "channel": "blue", "value": 0 }
],
"fade_in_ms": 250,
"hold_ms": 1000,
"fade_out_ms": 500,
"stop_after_ms": 2000
}
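A preview body like the one above can be assembled from a plain channel-to-value mapping. The sketch below makes two assumptions: that stop_after_ms is optional for preview and bounded the same way as for the trigger endpoint, and that channel values are limited to the 8-bit DMX range of 0 to 255. build_preview_body is a hypothetical helper name.

```python
from typing import Optional

MAX_STOP_AFTER_MS = 600_000  # 10 minutes


def build_preview_body(values: dict[str, int], fade_in_ms: int, hold_ms: int,
                       fade_out_ms: int,
                       stop_after_ms: Optional[int] = None) -> dict:
    """Build a pattern preview body from a channel-name -> value mapping."""
    for channel, value in values.items():
        # DMX channel values are 8-bit.
        if not 0 <= value <= 255:
            raise ValueError(f"{channel}: value {value} is outside 0..255")
    body: dict = {
        "values": [{"channel": c, "value": v} for c, v in values.items()],
        "fade_in_ms": fade_in_ms,
        "hold_ms": hold_ms,
        "fade_out_ms": fade_out_ms,
    }
    if stop_after_ms is not None:
        # Assumption: preview uses the same (0, 600000] bound as trigger.
        if not 0 < stop_after_ms <= MAX_STOP_AFTER_MS:
            raise ValueError("stop_after_ms must be in (0, 600000]")
        body["stop_after_ms"] = stop_after_ms
    return body


body = build_preview_body({"red": 255, "green": 128, "blue": 0},
                          fade_in_ms=250, hold_ms=1000, fade_out_ms=500,
                          stop_after_ms=2000)
```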
Live Control
Live control bypasses patterns and bindings to drive a fixture’s channels directly — intended for slider-driven tuning in the Creature Console. The active pattern (if any) is cancelled immediately on the first live call, and new patterns are refused on this fixture until the live session expires. The server enforces an auto-blackout deadline so a disconnected client can’t leave lights stuck on.
POST /api/v1/fixture/{fixtureId}/live — Write per-channel DMX values directly. Channels not named in values retain their previous live value (or default to 0 on the first call). timeout_ms is required and must be in (0, 600000].
{
"values": [
{ "channel": "pan", "value": 127 },
{ "channel": "tilt", "value": 64 }
],
"timeout_ms": 5000
}
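Since channels not named in values keep their previous live value, a slider UI only needs to send what changed. The sketch below builds such a delta body; live_update is a hypothetical helper. Whether the server accepts an empty values list as a pure keep-alive is not stated here, so callers may prefer to skip the request when nothing changed.

```python
MAX_TIMEOUT_MS = 600_000


def live_update(previous: dict[str, int], current: dict[str, int],
                timeout_ms: int) -> dict:
    """Build a live-control body containing only channels whose value changed."""
    if not 0 < timeout_ms <= MAX_TIMEOUT_MS:
        raise ValueError("timeout_ms must be in (0, 600000]")
    changed = [
        {"channel": name, "value": value}
        for name, value in current.items()
        if previous.get(name) != value
    ]
    return {"values": changed, "timeout_ms": timeout_ms}


# First call sends every channel; the second sends only the moved slider.
first = live_update({}, {"pan": 127, "tilt": 64}, timeout_ms=5000)
second = live_update({"pan": 127, "tilt": 64}, {"pan": 127, "tilt": 70}, timeout_ms=5000)
```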
Storyboards
A storyboard is a card of visual tiles that a Creature Console user can tap to trigger actions. The server is a dumb persistence layer: it stamps id and timestamps, stores the document, broadcasts a cache invalidation, and does not interpret tiles[].action. The client owns the action type vocabulary, and the server preserves unknown shapes verbatim so old and new clients can interoperate as the vocabulary grows.
GET /api/v1/storyboard — List all storyboards (newest first by updated_at). Returns {count, items: [...]}.
GET /api/v1/storyboard/{id} — Fetch one storyboard by its UUID.
POST /api/v1/storyboard — Create a new storyboard. Server generates the UUID and stamps created_at / updated_at. Any id / created_at / updated_at the client sends in the body is silently ignored.
{
"title": "Halloween Front Porch",
"notes": "Beaky greets; Mango heckles.",
"tiles": [
{
"id": "uuid",
"x": 0.06, "y": 0.08, "width": 0.26, "height": 0.20,
"label": "Greet",
"sf_symbol": "hand.wave.fill",
"tint_color_hex": "#34C759",
"action": { "type": "ad_hoc_speech", "creature_id": "uuid", "resume_playlist": true }
}
]
}
PUT /api/v1/storyboard/{id} — Replace an existing storyboard. created_at is preserved; updated_at bumps to now. 404 if no storyboard exists at that id.
DELETE /api/v1/storyboard/{id} — Delete a storyboard.
Caps: title ≤ 256 chars, notes ≤ 16384 chars, ≤ 200 tiles, tile label ≤ 256 chars, tile id must be UUID-shaped. Tile action (when present) must be a JSON object — the server checks that and nothing else inside, on purpose.
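The caps above can be pre-checked on the client before a save. The following sketch mirrors them under two assumptions: that "chars" means Unicode code points (what Python's len counts), and that "UUID-shaped" means anything uuid.UUID accepts. The server's own checks may be stricter or looser.

```python
import uuid


def storyboard_errors(doc: dict) -> list[str]:
    """Return cap violations; an empty list means none were found."""
    errors: list[str] = []
    if len(doc.get("title", "")) > 256:
        errors.append("title exceeds 256 characters")
    if len(doc.get("notes", "")) > 16384:
        errors.append("notes exceed 16384 characters")
    tiles = doc.get("tiles", [])
    if len(tiles) > 200:
        errors.append("more than 200 tiles")
    for index, tile in enumerate(tiles):
        if len(tile.get("label", "")) > 256:
            errors.append(f"tile {index}: label exceeds 256 characters")
        try:
            uuid.UUID(str(tile.get("id", "")))
        except ValueError:
            errors.append(f"tile {index}: id is not UUID-shaped")
        # Only the outer type of action is checked, matching the server.
        if "action" in tile and not isinstance(tile["action"], dict):
            errors.append(f"tile {index}: action must be a JSON object")
    return errors


good = {"title": "Halloween Front Porch", "tiles": [{
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "label": "Greet",
    "action": {"type": "ad_hoc_speech"},
}]}
bad = {"title": "x" * 257, "tiles": [{"id": "uuid", "action": "speak"}]}
good_errors = storyboard_errors(good)
bad_errors = storyboard_errors(bad)
```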
Metrics
GET /api/v1/metric/counters — Get system performance counters — frames processed, events dispatched, WebSocket messages sent, etc.
Debug
Utility endpoints for development and debugging. These trigger cache invalidation messages on connected clients.
GET /api/v1/debug/cache-invalidate/creature — Broadcast a creature cache invalidation to all connected clients.
GET /api/v1/debug/cache-invalidate/animation — Broadcast an animation cache invalidation to all connected clients.
GET /api/v1/debug/cache-invalidate/playlist — Broadcast a playlist cache invalidation to all connected clients.
GET /api/v1/debug/playlist/update — Test playlist update broadcast to connected clients.
System
GET /api/v1/health — Health check endpoint. Returns the canonical envelope: {"status": "ok", "code": 200, "message": "Server is operational", "session_id": null}. See Error Envelope for the shape used by every non-entity JSON response.
WebSocket
The WebSocket provides real-time bidirectional communication between the server and its clients (the Creature Console, Creature Controller, etc.).
GET /api/v1/websocket — Upgrade to a WebSocket connection.
Client → Server Messages
- Notice — General notice messages from clients
- StreamFrame — DMX frame data for streaming playback
- BoardSensorReport — Board-level sensor data from a Raspberry Pi (temperature, voltage, etc.)
- MotorSensorReport — Motor sensor data from a Raspberry Pi (current draw, position feedback, etc.)
Server → Client Messages
- Database — Database change notifications
- LogMessage — Server log messages
- ServerCounters — Periodic system metrics
- VirtualStatusLights — Status light state updates (the virtual version of the physical LEDs from the Pi hat era)
- UpsertCreature — Creature configuration change notifications
- CacheInvalidation — Cache invalidation signals. The cache_type field is one of creature, animation, playlist, sound-list, ad-hoc-animation-list, ad-hoc-sound-list, fixture, dialog-script-list, or storyboard-list
- PlaylistStatus — Playlist state changes
- JobProgress — Progress updates for async jobs (lip sync generation, ad-hoc animations, dialog renders)
- JobComplete — Job completion notifications with results
- IdleStateChanged — Idle loop enable/disable notifications
- CreatureActivity — Creature activity reports (what each creature is currently doing)
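A client typically routes the server messages listed above to per-type handlers. The sketch below shows one way to do that. The envelope layout is an assumption: the key names command and payload are hypothetical defaults and are configurable for that reason, and the payload fields in the example are invented for illustration.

```python
import json
from typing import Callable

Handler = Callable[[dict], None]


class MessageRouter:
    """Route decoded server messages to handlers by message type."""

    def __init__(self, type_key: str = "command", payload_key: str = "payload") -> None:
        # Assumption: both key names are hypothetical, not a documented envelope.
        self.type_key = type_key
        self.payload_key = payload_key
        self._handlers: dict[str, Handler] = {}

    def on(self, message_type: str, handler: Handler) -> None:
        """Register the handler for one message type."""
        self._handlers[message_type] = handler

    def dispatch(self, raw: str) -> bool:
        """Decode one text frame; return False if no handler is registered."""
        message = json.loads(raw)
        handler = self._handlers.get(message.get(self.type_key))
        if handler is None:
            return False
        handler(message.get(self.payload_key, {}))
        return True


invalidated: list[str] = []
router = MessageRouter()
router.on("CacheInvalidation", lambda payload: invalidated.append(payload["cache_type"]))

handled = router.dispatch('{"command": "CacheInvalidation", "payload": {"cache_type": "playlist"}}')
ignored = router.dispatch('{"command": "LogMessage", "payload": {}}')
```

Returning False for unknown types, rather than raising, lets an older client keep running when the server adds new message types.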
Status Codes
The server uses these HTTP status codes consistently:
- 200 — Success
- 202 — Accepted (async job started, check WebSocket for progress)
- 400 — Bad Request (invalid input — client’s fault)
- 403 — Forbidden (path traversal attempt on file endpoints)
- 404 — Not Found
- 409 — Conflict (e.g., creature not registered to a universe)
- 422 — Unprocessable Entity (missing required fields for processing)
- 500 — Internal Server Error
Error Envelope
Every JSON response that isn’t a typed entity — every 4xx, every 5xx, and a few 2xx (DELETE confirmations, health) — uses the canonical StatusDto shape:
{
"status": "ok", // "ok" for 2xx, "not_found" for 404, "error" otherwise
"code": 200, // matches the HTTP status code
"message": "Storyboard deleted",
"session_id": null // only set for playback endpoints that returned a session
}
The status field is one of "ok", "error", or "not_found" (all lowercase). Clients can use it as a cheap discriminator without parsing the numeric code.
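The mapping from HTTP code to status value described in the envelope comments can be checked on the client. The sketch below parses an envelope and verifies that status and code agree; StatusDto here is a local dataclass mirroring the four documented fields, not the server's own type.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StatusDto:
    status: str
    code: int
    message: str
    session_id: Optional[str] = None


def expected_status(code: int) -> str:
    """'ok' for 2xx, 'not_found' for 404, 'error' otherwise."""
    if 200 <= code < 300:
        return "ok"
    if code == 404:
        return "not_found"
    return "error"


def parse_status(raw: str) -> StatusDto:
    """Decode an envelope and check that status matches the numeric code."""
    data = json.loads(raw)
    dto = StatusDto(
        status=data["status"],
        code=data["code"],
        message=data["message"],
        session_id=data.get("session_id"),
    )
    if dto.status != expected_status(dto.code):
        raise ValueError(f"status {dto.status!r} does not match code {dto.code}")
    return dto


health = parse_status(
    '{"status": "ok", "code": 200, "message": "Server is operational", "session_id": null}'
)
```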