Overview
The streaming endpoint delivers persona responses as Server-Sent Events (SSE), giving your users a real-time typing experience. Each token arrives as it's generated, so the response appears progressively rather than after a full round-trip.
Streaming is ideal for chat interfaces, live demos, and any context where perceived latency matters. The first token typically arrives within 200–400ms.
Endpoint
```shell
curl -N -X POST https://api.person.run/personas/prompt/stream \
  -H "x-api-key: $PERSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "'$PERSON_TENANT_ID'",
    "personaId": "'$PERSONA_ID'",
    "userPrompt": "Tell me about your favorite project."
  }'
```

The request body is the same as the synchronous prompt endpoint, minus the `mode` field. The response uses `Content-Type: text/event-stream`.
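For typed clients, the request fields shown in the curl example can be wrapped in a small helper. This is an illustrative sketch, not an official SDK; the `buildStreamRequest` name and its parameters are our own:

```typescript
// Illustrative helper (not part of an official SDK): builds the fetch
// options for the stream endpoint from the fields documented above.
function buildStreamRequest(
  apiKey: string,
  tenantId: string,
  personaId: string,
  userPrompt: string,
) {
  return {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ tenantId, personaId, userPrompt }),
  };
}
```

Pass the result as the second argument to `fetch` when calling the stream endpoint.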
Event types
The stream emits four event types in order. Your client should handle each one.
| Event | Data | Description |
|---|---|---|
| `ready` | `{ "startedAt": "ISO datetime" }` | Stream is open and generation is starting. |
| `token` | `{ "delta": "string" }` | A chunk of the response text. Concatenate deltas to build the full response. |
| `done` | `{ "sessionId", "response", "modelName", "tokensIn", "tokensOut", "latencyMs" }` | Generation complete. Contains the full response and usage metadata. |
| `error` | `{ "error": "string" }` | An error occurred. The stream closes after this event. |
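In a typed client, the four payloads can be modeled as a discriminated union. This is a sketch inferred from the table above; the `type` tag and the `StreamEvent`/`collectResponse` names are illustrative, not part of the API:

```typescript
// Sketch of the documented event payloads as a tagged union.
// Field types are inferred from the examples on this page.
type StreamEvent =
  | { type: "ready"; startedAt: string }   // ISO datetime
  | { type: "token"; delta: string }       // chunk of response text
  | {
      type: "done";
      sessionId: string;
      response: string;
      modelName: string;
      tokensIn: number;
      tokensOut: number;
      latencyMs: number;
    }
  | { type: "error"; error: string };

// Fold a sequence of events into the final response text, per the
// table's instruction to concatenate deltas.
function collectResponse(events: StreamEvent[]): string {
  let text = "";
  for (const ev of events) {
    if (ev.type === "token") text += ev.delta;
    if (ev.type === "error") throw new Error(ev.error);
  }
  return text;
}
```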
Raw SSE format
Each event follows the standard SSE wire format: an `event:` line, a `data:` line, and a blank line terminating the event. Here's what the raw stream looks like:
```
event: ready
data: {"startedAt":"2026-02-20T12:00:00.000Z"}

event: token
data: {"delta":"I "}

event: token
data: {"delta":"remember "}

event: token
data: {"delta":"working on "}

event: token
data: {"delta":"a healthcare dashboard..."}

event: done
data: {"sessionId":"sess-uuid","prompt":"Tell me about your favorite project.","response":"I remember working on a healthcare dashboard...","modelName":"gpt-4.1-mini","tokensIn":842,"tokensOut":156,"latencyMs":2340}
```

JavaScript client
The browser's native `EventSource` API only supports GET requests, so for this POST endpoint use a `fetch`-based reader instead. Here's an example that works in both browsers and Node.js 18+:
```typescript
async function streamPersonaResponse(
  personaId: string,
  prompt: string,
  onToken: (delta: string) => void
) {
  const res = await fetch("https://api.person.run/personas/prompt/stream", {
    method: "POST",
    headers: {
      "x-api-key": process.env.PERSON_API_KEY!,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      tenantId: process.env.PERSON_TENANT_ID,
      personaId,
      userPrompt: prompt,
    }),
  });
  if (!res.ok) throw new Error(`Stream failed: ${res.status}`);

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE frames may be split across chunks; keep the trailing
    // partial line in the buffer until the rest arrives.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      if (line.startsWith("data: ")) {
        const payload = JSON.parse(line.slice(6));
        if (payload.delta) onToken(payload.delta); // token event
        if (payload.sessionId) return payload;     // done event
      }
    }
  }
}
```

Pass each `delta` into a state setter to get a live typing effect.

Python client
The same approach works in Python using the httpx library with streaming:
```python
import httpx
import json
import os

def stream_persona_response(persona_id: str, prompt: str):
    with httpx.stream(
        "POST",
        "https://api.person.run/personas/prompt/stream",
        headers={
            "x-api-key": os.environ["PERSON_API_KEY"],
            "Content-Type": "application/json",
        },
        json={
            "tenantId": os.environ["PERSON_TENANT_ID"],
            "personaId": persona_id,
            "userPrompt": prompt,
        },
    ) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if line.startswith("data: "):
                payload = json.loads(line[6:])
                if "delta" in payload:  # token event
                    print(payload["delta"], end="", flush=True)
                if "sessionId" in payload:  # done event
                    print()  # newline after the full response
                    return payload
```

Async mode (alternative)
If you don't need real-time streaming, use the standard prompt endpoint with `mode: "async"` instead. This queues the prompt as a background job and returns a `jobId` you can poll; alternatively, provide a `responseUrl` to receive a webhook callback when the result is ready.
```shell
curl -X POST https://api.person.run/personas/prompt \
  -H "x-api-key: $PERSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "'$PERSON_TENANT_ID'",
    "personaId": "'$PERSONA_ID'",
    "userPrompt": "Describe your morning routine.",
    "mode": "async",
    "responseUrl": "https://your-app.com/webhooks/persona"
  }'
```

The endpoint responds immediately with the queued job:

```json
{
  "jobId": "job-uuid-...",
  "status": "queued",
  "mode": "async",
  "pollUrl": "https://api.person.run/personas/prompt/jobs/job-uuid-...",
  "responseUrl": "https://your-app.com/webhooks/persona",
  "attemptCount": 0
}
```

Poll the `pollUrl` to check job status. The job transitions from `queued` → `running` → `succeeded`. Failed jobs retry up to 5 times automatically.
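A polling loop can be sketched as follows. The `queued`/`running`/`succeeded` status values come from this page; the `getJob` callback, the `failed` terminal status, and the 1-second default interval are assumptions for illustration:

```typescript
// Illustrative polling helper. `getJob` fetches the current job JSON,
// e.g. () => fetch(pollUrl, { headers: { "x-api-key": apiKey } })
//              .then((r) => r.json())
type Job = { status: string; [key: string]: unknown };

async function pollJob(
  getJob: () => Promise<Job>,
  intervalMs = 1000,
): Promise<Job> {
  while (true) {
    const job = await getJob();
    if (job.status === "succeeded") return job; // terminal: done
    if (job.status === "failed") {
      // assumed terminal status after the automatic retries are exhausted
      throw new Error("job failed after automatic retries");
    }
    // still "queued" or "running": wait, then poll again
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Injecting `getJob` keeps the loop independent of the HTTP client, which also makes it easy to unit-test.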