Streaming Sessions

Streaming sessions enable real-time conversation. Text arrives one sentence at a time (typically from an LLM), and the server pipelines TTS, lip sync generation, and playback for each sentence. As a result, the creature starts talking within a couple of seconds, even while the LLM is still generating the rest of the response.
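The pipelining idea can be illustrated with a small asyncio sketch. This is not the server's implementation. `synthesize` and `lip_sync` are hypothetical stand-ins that sleep for a random time to mimic uneven stage latency. Each sentence starts processing as soon as it arrives, and a single player awaits the tasks in arrival order. Later sentences can therefore finish early without ever playing out of order.

```python
import asyncio
import random

# Hypothetical stand-ins for the real TTS and lip-sync stages; each just sleeps
# for a random interval to mimic uneven processing latency.
async def synthesize(text: str) -> str:
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return f"audio({text})"

async def lip_sync(audio: str) -> str:
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return f"anim({audio})"

async def process(text: str) -> str:
    # One chunk's pipeline: TTS first, then lip sync on the resulting audio.
    return await lip_sync(await synthesize(text))

async def run_session(sentences):
    queue: asyncio.Queue = asyncio.Queue()
    played = []

    async def player():
        # Await tasks in the order they were queued, so sentence N never plays
        # before sentence N-1, even if N finished processing first.
        while True:
            task = await queue.get()
            if task is None:  # sentinel: no more text is coming
                return
            played.append(await task)

    player_task = asyncio.create_task(player())
    for text in sentences:
        # Start processing immediately instead of waiting for earlier chunks.
        await queue.put(asyncio.create_task(process(text)))
        await asyncio.sleep(0.01)  # simulate the LLM producing the next sentence
    await queue.put(None)  # analogous to "finish": drain the queue, then stop
    await player_task
    return played

if __name__ == "__main__":
    print(asyncio.run(run_session(["Hello there.", "How are you?", "Nice day."])))
```

Because each task is created the moment its sentence arrives, audio for later sentences is prepared while earlier ones are still playing. That overlap is what lets speech begin before the full response exists.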

POST /api/v1/animation/ad-hoc-stream/start — Open a new streaming session. The server loads the creature’s config and prepares for incoming text. Returns a session_id.

{
  "creature_id": "uuid",
  "resume_playlist": false
}
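A minimal client-side sketch of the start call, using only the Python standard library. `BASE_URL` is a hypothetical placeholder. Reading `session_id` from a top-level JSON field is also an assumption, because the exact response shape is not spelled out above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical; point this at your server

def build_request(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for one of the ad-hoc-stream endpoints."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def start_session(creature_id: str, resume_playlist: bool = False) -> str:
    req = build_request(
        "/api/v1/animation/ad-hoc-stream/start",
        {"creature_id": creature_id, "resume_playlist": resume_playlist},
    )
    with urllib.request.urlopen(req) as resp:
        # Assumption: the response body is JSON with a top-level "session_id".
        return json.loads(resp.read())["session_id"]
```

Keeping request construction separate from sending makes it easy to inspect or test the payload without a running server.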

POST /api/v1/animation/ad-hoc-stream/text — Send a sentence to the session. Each chunk kicks off an async pipeline: TTS, lip sync, blend, queue for playback. Returns chunks_received count.

{
  "session_id": "uuid",
  "text": "This is one sentence."
}
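LLMs usually stream text in small fragments rather than whole sentences, so a client typically buffers fragments until a sentence is complete before calling the text endpoint. The sketch below shows one way to do that. Its boundary rule is deliberately naive and is an assumption, not part of this API. It treats `.`, `!`, or `?` followed by whitespace as the end of a sentence, so abbreviations such as "Dr." will split early.

```python
import re

# Naive sentence boundary: ., ! or ? followed by whitespace (illustrative only).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

def sentences_from_stream(deltas):
    """Yield complete sentences as soon as they appear in a stream of text fragments."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is a finished sentence; the last may still grow.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush the remainder when the stream ends
```

Each yielded sentence would become one text request. Sending the first sentence as soon as it closes is what lets playback start early.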

POST /api/v1/animation/ad-hoc-stream/finish — Signal that no more text is coming. The playback thread drains the remaining queue. Returns the final animation_id.

{
  "session_id": "uuid"
}
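Putting the three endpoints together, a whole session is one start call, one text call per sentence, and one finish call. In this sketch the HTTP layer is abstracted as a hypothetical `post(path, body)` callable that returns the decoded JSON response. It assumes the responses carry the `session_id` and `animation_id` field names described above.

```python
from typing import Callable, Iterable

# Hypothetical transport: POSTs a JSON dict to a path and returns the decoded
# JSON response (for example, a thin wrapper around an HTTP client).
PostFn = Callable[[str, dict], dict]

PREFIX = "/api/v1/animation/ad-hoc-stream"

def speak(post: PostFn, creature_id: str, sentences: Iterable[str]) -> str:
    """Run one streaming session end to end and return the final animation_id."""
    start = post(f"{PREFIX}/start", {"creature_id": creature_id, "resume_playlist": False})
    session_id = start["session_id"]  # assumption: field name as documented
    try:
        for sentence in sentences:
            post(f"{PREFIX}/text", {"session_id": session_id, "text": sentence})
    finally:
        # Always signal finish so the server drains what was already queued,
        # even if the sentence source fails partway through.
        result = post(f"{PREFIX}/finish", {"session_id": session_id})
    return result["animation_id"]
```

Calling finish in a `finally` block is one reasonable choice rather than a requirement of the API. It keeps a failed LLM stream from leaving a session open, and the original exception still propagates after the finish call.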