Streaming sessions enable real-time conversation. The client sends text one sentence at a time (typically as an LLM produces it), and the server pipelines TTS, lip sync generation, and playback. The creature starts talking within a couple of seconds of the first sentence arriving, even while the LLM is still generating the rest of the response.
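LLMs usually stream small text deltas rather than whole sentences, so the client has to regroup them before sending anything to the session. The sketch below shows one way to do that in Python. The `sentences` generator and its punctuation-based boundary rule are illustrative assumptions, not part of this API.

```python
import re
from typing import Iterable, Iterator

# End of sentence: terminal punctuation, optional closing quotes/brackets, then whitespace.
_BOUNDARY = re.compile(r"""[.!?]+["')\]]*\s+""")


def sentences(deltas: Iterable[str]) -> Iterator[str]:
    """Buffer streamed LLM text deltas and yield each sentence as soon as it is complete."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        # A single delta may complete zero, one, or several sentences.
        while (match := _BOUNDARY.search(buffer)) is not None:
            sentence = buffer[: match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # The LLM stream has ended: flush any trailing text that lacked a boundary.
    tail = buffer.strip()
    if tail:
        yield tail
```

A boundary only counts once whitespace follows it, so `3.14` is not split. Abbreviations such as `Dr. Smith` are split, however. Because of the whitespace rule, the final sentence is emitted only when the delta stream ends.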
POST /api/v1/animation/ad-hoc-stream/start — Open a new streaming session. The server loads the creature’s config and prepares for incoming text. Returns a session_id.
{
  "creature_id": "uuid",
  "resume_playlist": false
}
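Here is a minimal client sketch for opening a session, using only Python's standard library. The helper names `post_json` and `start_session` are hypothetical, the 10-second timeout is an arbitrary default, and reading the ID from a top-level `session_id` field of the JSON reply is an assumption about the response shape.

```python
import json
import urllib.request

START_PATH = "/api/v1/animation/ad-hoc-stream/start"


def post_json(base_url: str, path: str, payload: dict, timeout: float = 10.0) -> dict:
    """POST `payload` as JSON to base_url + path and return the decoded JSON reply."""
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def start_session(base_url: str, creature_id: str, resume_playlist: bool = False,
                  post=post_json) -> str:
    """Open a streaming session for one creature and return its session_id."""
    reply = post(base_url, START_PATH,
                 {"creature_id": creature_id, "resume_playlist": resume_playlist})
    # Assumption: the ID is returned in a top-level "session_id" field.
    return reply["session_id"]
```

The `post` parameter exists so the transport can be swapped out, for example with a fake in tests.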
POST /api/v1/animation/ad-hoc-stream/text — Send one sentence to the session. Each sentence (chunk) kicks off an asynchronous pipeline: TTS, lip sync, blending, and queuing for playback. Returns the chunks_received count.
{
  "session_id": "uuid",
  "text": "This is one sentence."
}
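Sending text is a loop over sentences. This sketch posts them one at a time, in order. Each request only kicks off the server-side pipeline, so waiting for its reply before sending the next sentence should cost little while keeping chunks in speaking order. `send_sentences` is a hypothetical helper. `post` is any callable that takes `(base_url, path, payload)` and returns the decoded JSON reply, such as a thin wrapper around `urllib.request`. The sketch also assumes that `chunks_received` is a running total for the session.

```python
from typing import Callable, Iterable

# Any callable that POSTs a JSON payload to base_url + path and returns the decoded JSON reply.
PostFn = Callable[[str, str, dict], dict]
TEXT_PATH = "/api/v1/animation/ad-hoc-stream/text"


def send_sentences(base_url: str, session_id: str, sentences: Iterable[str], post: PostFn) -> int:
    """Send sentences one at a time, in order, and return how many were accepted."""
    expected = 0
    for sentence in sentences:
        text = sentence.strip()
        if not text:
            continue  # blank strings carry nothing to speak, so skip them
        reply = post(base_url, TEXT_PATH, {"session_id": session_id, "text": text})
        expected += 1
        # Assumption: chunks_received is the running total for this session.
        if reply.get("chunks_received") != expected:
            raise RuntimeError(
                f"expected {expected} chunks, server reports {reply.get('chunks_received')}"
            )
    return expected
```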
POST /api/v1/animation/ad-hoc-stream/finish — Signal that no more text is coming. The playback thread drains the remaining queue. Returns the final animation_id.
{
  "session_id": "uuid"
}
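The three endpoints combine into one complete session, sketched below. The sketch assumes a `post(base_url, path, payload) -> dict` callable and top-level `session_id` and `animation_id` fields in the replies. `speak_streamed` is a hypothetical name. Because `sentences` is consumed lazily, each sentence is posted as soon as the LLM produces it. This is what lets playback start before generation finishes.

```python
from typing import Callable, Iterable

# Any callable that POSTs a JSON payload to base_url + path and returns the decoded JSON reply.
PostFn = Callable[[str, str, dict], dict]
STREAM = "/api/v1/animation/ad-hoc-stream"


def speak_streamed(base_url: str, creature_id: str, sentences: Iterable[str],
                   post: PostFn, resume_playlist: bool = False) -> str:
    """Run one streaming session end to end and return the final animation_id."""
    reply = post(base_url, f"{STREAM}/start",
                 {"creature_id": creature_id, "resume_playlist": resume_playlist})
    session_id = reply["session_id"]
    try:
        # `sentences` may be a lazy generator fed by the LLM: each sentence is
        # posted as soon as it is yielded, not after the whole reply exists.
        for sentence in sentences:
            post(base_url, f"{STREAM}/text", {"session_id": session_id, "text": sentence})
    finally:
        # Close the session even if the LLM stream fails midway; the server then
        # drains whatever was already queued for playback.
        done = post(base_url, f"{STREAM}/finish", {"session_id": session_id})
    return done["animation_id"]
```

Calling finish from a `finally` block is a design choice. If the LLM stream fails partway, the session is still closed, the original exception still propagates, and whatever was already queued gets played. This section does not say whether the server expires abandoned sessions on its own.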