Overview
The streaming endpoint delivers persona responses as Server-Sent Events (SSE), giving your users a real-time typing experience. Each token arrives as it's generated, so the response appears progressively rather than after a full round-trip.
Streaming is ideal for chat interfaces, live demos, and any context where perceived latency matters. The first token typically arrives within 200–400ms.
Endpoint
```shell
curl -N -X POST https://api.person.run/personas/prompt/stream \
  -H "x-api-key: $PERSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "'$PERSON_TENANT_ID'",
    "personaId": "'$PERSONA_ID'",
    "userPrompt": "Tell me about your favorite project."
  }'
```

The request body is the same as the synchronous prompt endpoint, minus the `mode` field. The response uses `Content-Type: text/event-stream`.
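For typed clients, the request fields shown in the curl example can be wrapped in a small helper. This is an illustrative sketch, not an official SDK; the `buildStreamRequest` name and its parameters are our own:

```typescript
// Illustrative helper (not part of an official SDK): builds the fetch
// options for the stream endpoint from the fields documented above.
function buildStreamRequest(
  apiKey: string,
  tenantId: string,
  personaId: string,
  userPrompt: string,
) {
  return {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ tenantId, personaId, userPrompt }),
  };
}
```

Pass the result as the second argument to `fetch` when calling the stream endpoint.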
Event types
The stream emits four event types in order. Your client should handle each one.
| Event | Data | Description |
|---|---|---|
| `ready` | `{ "startedAt": "ISO datetime" }` | Stream is open and generation is starting. |
| `token` | `{ "delta": "string" }` | A chunk of the response text. Concatenate deltas to build the full response. |
| `done` | `{ "sessionId", "response", "modelName", "tokensIn", "tokensOut", "latencyMs" }` | Generation complete. Contains the full response and usage metadata. |
| `error` | `{ "error": "string" }` | An error occurred. The stream closes after this event. |
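In a typed client, the four payloads can be modeled as a discriminated union. This is a sketch inferred from the table above; the `type` tag and the `StreamEvent`/`collectResponse` names are illustrative, not part of the API:

```typescript
// Sketch of the documented event payloads as a tagged union.
// Field types are inferred from the examples on this page.
type StreamEvent =
  | { type: "ready"; startedAt: string }   // ISO datetime
  | { type: "token"; delta: string }       // chunk of response text
  | {
      type: "done";
      sessionId: string;
      response: string;
      modelName: string;
      tokensIn: number;
      tokensOut: number;
      latencyMs: number;
    }
  | { type: "error"; error: string };

// Fold a sequence of events into the final response text, per the
// table's instruction to concatenate deltas.
function collectResponse(events: StreamEvent[]): string {
  let text = "";
  for (const ev of events) {
    if (ev.type === "token") text += ev.delta;
    if (ev.type === "error") throw new Error(ev.error);
  }
  return text;
}
```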
Raw SSE format
Each event follows the standard SSE wire format: an `event:` line, a `data:` line, and a blank line terminating the event. Here's what the raw stream looks like:
```
event: ready
data: {"startedAt":"2026-02-20T12:00:00.000Z"}

event: token
data: {"delta":"I "}

event: token
data: {"delta":"remember "}

event: token
data: {"delta":"working on "}

event: token
data: {"delta":"a healthcare dashboard..."}

event: done
data: {"sessionId":"sess-uuid","prompt":"Tell me about your favorite project.","response":"I remember working on a healthcare dashboard...","modelName":"gpt-4.1-mini","tokensIn":842,"tokensOut":156,"latencyMs":2340}
```

JavaScript client
The browser's native `EventSource` API only supports GET requests, so for this POST endpoint use a `fetch`-based reader instead. Here's an example that works in both browsers and Node.js 18+:
```typescript
async function streamPersonaResponse(
  personaId: string,
  prompt: string,
  onToken: (delta: string) => void
) {
  const res = await fetch("https://api.person.run/personas/prompt/stream", {
    method: "POST",
    headers: {
      "x-api-key": process.env.PERSON_API_KEY!,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      tenantId: process.env.PERSON_TENANT_ID,
      personaId,
      userPrompt: prompt,
    }),
  });
  if (!res.ok) throw new Error(`Stream failed: ${res.status}`);

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE frames may be split across chunks; keep the trailing
    // partial line in the buffer until the rest arrives.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      if (line.startsWith("data: ")) {
        const payload = JSON.parse(line.slice(6));
        if (payload.delta) onToken(payload.delta); // token event
        if (payload.sessionId) return payload;     // done event
      }
    }
  }
}
```

Pass each `delta` into a state setter to get a live typing effect.

Python client
The same approach works in Python using the httpx library with streaming:
```python
import httpx
import json
import os

def stream_persona_response(persona_id: str, prompt: str):
    with httpx.stream(
        "POST",
        "https://api.person.run/personas/prompt/stream",
        headers={
            "x-api-key": os.environ["PERSON_API_KEY"],
            "Content-Type": "application/json",
        },
        json={
            "tenantId": os.environ["PERSON_TENANT_ID"],
            "personaId": persona_id,
            "userPrompt": prompt,
        },
    ) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if line.startswith("data: "):
                payload = json.loads(line[6:])
                if "delta" in payload:  # token event
                    print(payload["delta"], end="", flush=True)
                if "sessionId" in payload:  # done event
                    print()  # newline after the full response
                    return payload
```

Async mode (alternative)
If you don't need real-time streaming, use the standard prompt endpoint with `mode: "async"` instead. This queues the prompt as a background job and returns a `jobId` you can poll; alternatively, provide a `responseUrl` to receive a webhook callback when the result is ready.
```shell
curl -X POST https://api.person.run/personas/prompt \
  -H "x-api-key: $PERSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "'$PERSON_TENANT_ID'",
    "personaId": "'$PERSONA_ID'",
    "userPrompt": "Describe your morning routine.",
    "mode": "async",
    "responseUrl": "https://your-app.com/webhooks/persona"
  }'
```

The endpoint responds immediately with the queued job:

```json
{
  "jobId": "job-uuid-...",
  "status": "queued",
  "mode": "async",
  "pollUrl": "https://api.person.run/personas/prompt/jobs/job-uuid-...",
  "responseUrl": "https://your-app.com/webhooks/persona",
  "attemptCount": 0
}
```

Poll the `pollUrl` to check job status. The job transitions from `queued` → `running` → `succeeded`. Failed jobs retry up to 5 times automatically.
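A polling loop can be sketched as follows. The `queued`/`running`/`succeeded` status values come from this page; the `getJob` callback, the `failed` terminal status, and the 1-second default interval are assumptions for illustration:

```typescript
// Illustrative polling helper. `getJob` fetches the current job JSON,
// e.g. () => fetch(pollUrl, { headers: { "x-api-key": apiKey } })
//              .then((r) => r.json())
type Job = { status: string; [key: string]: unknown };

async function pollJob(
  getJob: () => Promise<Job>,
  intervalMs = 1000,
): Promise<Job> {
  while (true) {
    const job = await getJob();
    if (job.status === "succeeded") return job; // terminal: done
    if (job.status === "failed") {
      // assumed terminal status after the automatic retries are exhausted
      throw new Error("job failed after automatic retries");
    }
    // still "queued" or "running": wait, then poll again
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Injecting `getJob` keeps the loop independent of the HTTP client, which also makes it easy to unit-test.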