## Overview
person.run tracks detailed metrics for every persona interaction. You can monitor conversation volume, response quality, token usage, and latency in real time from the dashboard — or export raw events via the API for custom analysis pipelines.
## What's tracked
Every API operation emits a usage event. These events power the dashboard charts and are available for export.
| Event | Tracked data |
|---|---|
| `persona_create` | Tenant, persona seed, timestamp |
| `persona_prompt` | Persona ID, prompt text, response, session ID, tokens in/out, model, latency |
| `persona_list` | Tenant, query parameters, result count |
| `persona_update` | Persona ID, fields changed |
| `persona_delete` | Persona ID, tenant |
| `persona_timeline_append` | Persona ID, memory type, strength, event label |
| `persona_timeline_supersede` | Timeline entry ID, reason |
| `persona_consistency_check` | Persona ID, issue count, issue types |
| `file_upload` | Persona ID, file type, file size |
| `timeline_reconcile_job` | Persona ID, job status |
## Dashboard metrics
The dashboard provides an overview of your most important metrics:
- Prompt events — total persona prompt events over the current period.
- Persona count — number of personas created for the tenant.
- Timeline entries — total timeline memories across all personas.
- Token usage — tokens consumed (input and output) across all personas.
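These same totals can be recomputed from exported raw events in a custom pipeline. The sketch below assumes each exported event carries a `type` field matching the event names above, plus `tokensIn`/`tokensOut` on prompt events — the exact export schema is an assumption, not documented here:

```python
# Sketch of a custom analysis pass over exported usage events.
# Assumption: each event has a "type" field named after the events
# above, and persona_prompt events carry tokensIn/tokensOut counts.
from collections import Counter

def summarize(events):
    """Compute dashboard-style totals from raw usage events."""
    totals = Counter()
    for event in events:
        if event["type"] == "persona_prompt":
            totals["prompt_events"] += 1
            totals["tokens"] += event.get("tokensIn", 0) + event.get("tokensOut", 0)
        elif event["type"] == "persona_create":
            totals["personas"] += 1
        elif event["type"] == "persona_timeline_append":
            totals["timeline_entries"] += 1
    return dict(totals)

events = [
    {"type": "persona_create"},
    {"type": "persona_prompt", "tokensIn": 820, "tokensOut": 310},
    {"type": "persona_timeline_append"},
]
print(summarize(events))
# {'personas': 1, 'prompt_events': 1, 'tokens': 1130, 'timeline_entries': 1}
```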
## Usage tracking and limits
Usage is tracked per tenant and enforced against your plan's limits. When a non-credit usage limit is exceeded, the API returns 403:
```json
{
  "error": "Usage limit exceeded"
}
```

Prompt requests are credit-primary. If a prompt would exceed your credit balance, the API returns 402 with top-up or upgrade guidance:

```json
{
  "error": "Insufficient credits",
  "requiredCredits": 1200,
  "availableCredits": 300,
  "action": "upgrade_or_topup",
  "billingPath": "/dashboard/billing"
}
```

## Response metadata
Every prompt response (sync, async, and streaming) includes metadata you can use for your own analytics:
| Field | Description |
|---|---|
| `sessionId` | Unique identifier for this prompt/response pair. |
| `modelName` | The AI model used for generation (e.g., `gpt-4.1-mini`). |
| `tokensIn` | Number of input tokens (prompt + context + memories). |
| `tokensOut` | Number of output tokens (generated response). |
| `latencyMs` | Total generation time in milliseconds. |
For streaming responses, this metadata is included in the `done` event.
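If you feed this metadata into your own analytics, a small extraction helper keeps the fields you care about. The sketch below assumes the `done` event's data is a JSON object carrying the fields listed above — the exact streaming envelope is an assumption:

```python
import json

# Fields documented in the response-metadata table above.
METADATA_FIELDS = ("sessionId", "modelName", "tokensIn", "tokensOut", "latencyMs")

def metadata_from_done_event(event_data: str) -> dict:
    """Extract analytics fields from a streaming done-event payload.

    Assumption: the done event's data is a JSON object that includes
    the metadata fields; any other keys are ignored.
    """
    payload = json.loads(event_data)
    return {k: payload[k] for k in METADATA_FIELDS if k in payload}

done_data = ('{"sessionId": "session-uuid", "modelName": "gpt-4.1-mini", '
             '"tokensIn": 742, "tokensOut": 188, "latencyMs": 1840}')
meta = metadata_from_done_event(done_data)
print(meta["tokensIn"] + meta["tokensOut"])  # 930
```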
## Rate limiting
Rate limits are enforced at two levels to protect the platform and ensure fair usage:
### IP-level rate limiting
Mutation endpoints (POST, PUT, PATCH, DELETE) are rate-limited to 300 requests per minute per IP address. This prevents abuse from any single source.
### Per-route rate limiting
Each endpoint has its own rate limit scoped to your API key. When exceeded, the response includes a `Retry-After` header indicating when you can retry.
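A client wrapper can honor `Retry-After` and surface the limit errors described above. This is a sketch, not an official client; it assumes rate-limited responses use HTTP 429 (the status code is not stated in these docs), and `send` stands in for whatever HTTP call your client makes:

```python
import time

def send_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call ``send`` (returns (status, headers, body)), retrying on rate limits.

    Assumptions: rate-limited responses use HTTP 429 with a Retry-After
    header; 402/403 follow the usage-limit semantics documented above.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status == 429 and attempt < max_attempts - 1:
            # Per-route limit hit: wait as instructed, then retry.
            sleep(float(headers.get("Retry-After", 1)))
            continue
        if status == 402:
            raise RuntimeError(f"Insufficient credits: {body}")
        if status == 403:
            raise RuntimeError(f"Usage limit exceeded: {body}")
        return status, body

# Simulated responses: one rate-limited reply, then success.
responses = iter([(429, {"Retry-After": "2"}, ""), (200, {}, "ok")])
status, body = send_with_retry(lambda: next(responses), sleep=lambda s: None)
print(status, body)  # 200 ok
```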
## Async job callbacks
When using async mode for prompts or document ingestion, you can provide a `responseUrl` to receive a callback when the job completes. Callbacks are delivered via QStash with at-least-once semantics.
```json
{
  "kind": "persona.prompt.result",
  "jobId": "job-uuid",
  "status": "succeeded",
  "tenantId": "your-tenant-id",
  "personaId": "persona-uuid",
  "attemptCount": 1,
  "result": {
    "sessionId": "session-uuid",
    "prompt": "How do you approach design challenges?",
    "response": "As a product designer, I..."
  },
  "error": null,
  "completedAt": "2026-02-20T12:00:02.340Z"
},
```
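Because delivery is at-least-once, your callback endpoint must tolerate duplicates. Deduplicating on `jobId` is one simple strategy; the handler below is a sketch (the `store_result`/`log_failure` sinks are hypothetical placeholders, not part of the API):

```python
processed = set()  # in production, use durable storage keyed by jobId

def store_result(persona_id, result):
    """Hypothetical sink for successful results."""
    pass

def log_failure(job_id, error):
    """Hypothetical sink for failed jobs."""
    pass

def handle_callback(payload: dict) -> bool:
    """Process a job callback at most once, deduplicating on jobId.

    Returns True if the payload was processed, False if it was a
    duplicate delivery that had already been handled.
    """
    job_id = payload["jobId"]
    if job_id in processed:
        return False  # duplicate delivery; already handled
    processed.add(job_id)
    if payload["status"] == "succeeded":
        store_result(payload["personaId"], payload["result"])
    else:
        log_failure(job_id, payload.get("error"))
    return True

callback = {"kind": "persona.prompt.result", "jobId": "job-uuid",
            "status": "succeeded", "personaId": "persona-uuid",
            "result": {"sessionId": "session-uuid"}}
print(handle_callback(callback), handle_callback(callback))  # True False
```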