Chat

Generate a single in-character reply for a persona and a recent message buffer. Phantom retrieves relevant memories from your vector store, injects them into the prompt, runs inbound and outbound moderation, and (on a cadence) extracts new memories in the background.

POST /api/v1/companion/chat
  • Auth: Bearer key with the chat:companion scope.
  • Requires: a configured vector store (otherwise 400).
Build a request in the Playground

Construct a chat request interactively in the Playground and watch the response come back. Edit the persona and per-turn options, send a turn, then copy the exact request and response JSON into your own code.

Request body

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| chat_id | string | Yes | Opaque per-user conversation id. Pattern ^[A-Za-z0-9][A-Za-z0-9_.:@-]{0,127}$, max 128 chars. Reuse it across turns to accumulate memory. |
| persona | object | Yes | The character. See below. |
| messages | array | Yes | 1–100 turns. The last message must be role: "user". |
| model | string | No | Override the generation model (1–200 chars). |
| temperature | number | No | 0–2. Default 0.8. |
| max_tokens | integer | No | 1–32000. Defaults to the model's own limit. |
| extract_memory | boolean | No | Default true. Set false to skip background memory write-back (retrieval still runs). |
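
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not Phantom's own validation: `validate_chat_request` is a hypothetical helper, and the numeric ranges for temperature and max_tokens are read here as 0 to 2 and 1 to 32000. The server remains the source of truth.

```python
import re

# Pattern copied from the chat_id row of the request-body table.
CHAT_ID_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.:@-]{0,127}$")


def validate_chat_request(body: dict) -> list:
    """Return a list of problems found in a request body (empty if none).

    Hypothetical client-side helper; limits mirror the documented table.
    """
    problems = []

    chat_id = body.get("chat_id")
    if not isinstance(chat_id, str) or CHAT_ID_RE.fullmatch(chat_id) is None:
        problems.append("chat_id: must match the documented pattern (max 128 chars)")

    if not isinstance(body.get("persona"), dict):
        problems.append("persona: required object")

    messages = body.get("messages")
    if not isinstance(messages, list) or not 1 <= len(messages) <= 100:
        problems.append("messages: must contain 1 to 100 turns")
    elif not isinstance(messages[-1], dict) or messages[-1].get("role") != "user":
        problems.append("messages: last message must have role 'user'")

    model = body.get("model")
    if model is not None and not (isinstance(model, str) and 1 <= len(model) <= 200):
        problems.append("model: must be 1 to 200 chars")

    temperature = body.get("temperature")
    if temperature is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(temperature, (int, float)) and not isinstance(temperature, bool)
        if not is_number or not 0 <= temperature <= 2:
            problems.append("temperature: must be a number from 0 to 2 (assumed range)")

    max_tokens = body.get("max_tokens")
    if max_tokens is not None:
        is_int = isinstance(max_tokens, int) and not isinstance(max_tokens, bool)
        if not is_int or not 1 <= max_tokens <= 32000:
            problems.append("max_tokens: must be an integer from 1 to 32000 (assumed range)")

    extract_memory = body.get("extract_memory")
    if extract_memory is not None and not isinstance(extract_memory, bool):
        problems.append("extract_memory: must be a boolean")

    return problems
```

An empty list means the body passed these local checks. The API may still return 400 for reasons this sketch does not cover, such as a missing vector store.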

persona

Every field describes the companion character (e.g. Luna), not the user. All fields except name are optional free text.

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| name | string | Yes | 1–120 chars. The character's display name. |
| nickname | string | No | Up to 120 chars. What the user calls the character. |
| gender | string | No | Up to 200 chars. |
| date_of_birth | string | No | Up to 200 chars. Free text: a date, an age, or a range. |
| language | string | No | Up to 200 chars, e.g. "English, Portuguese". |
| lives_in_country | string | No | Up to 200 chars. |
| lives_in_city | string | No | Up to 200 chars. |
| comes_from_country | string | No | Up to 200 chars. |
| comes_from_city | string | No | Up to 200 chars. |
| living_situation | string | No | Up to 2000 chars. |
| daily_routine | string | No | Up to 2000 chars. |
| goals | string | No | Up to 2000 chars. |
| bio | string | No | Up to 4000 chars. Background, personality, backstory. |
| boundaries | string | No | Up to 2000 chars. Lines the character will not cross. |
| chat_style | string | No | Up to 80 chars, e.g. "playful, teasing". Slots into the generated preamble. |
| custom_instructions | string | No | Up to 8000 chars. Appended as a strong directive; does not replace the preamble. |

The persona is stateless: Phantom doesn't store it, so send it on every turn.
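
Because the persona travels with every request, it can be worth checking its field lengths once when the character is defined. A minimal sketch, assuming the limits listed above: `check_persona` is a hypothetical helper, Python's `len` counts code points (which may differ from how the server counts "chars"), and flagging unknown fields is this sketch's choice, since the source does not say how the server treats them.

```python
# Maximum lengths copied from the persona field list above.
PERSONA_MAX_CHARS = {
    "name": 120,
    "nickname": 120,
    "gender": 200,
    "date_of_birth": 200,
    "language": 200,
    "lives_in_country": 200,
    "lives_in_city": 200,
    "comes_from_country": 200,
    "comes_from_city": 200,
    "living_situation": 2000,
    "daily_routine": 2000,
    "goals": 2000,
    "bio": 4000,
    "boundaries": 2000,
    "chat_style": 80,
    "custom_instructions": 8000,
}


def check_persona(persona: dict) -> list:
    """Return a list of problems with a persona object (empty if none)."""
    problems = []

    # name is the only required field and must be at least 1 char.
    name = persona.get("name")
    if not isinstance(name, str) or len(name) < 1:
        problems.append("name: required, 1 to 120 chars")

    for field, value in persona.items():
        limit = PERSONA_MAX_CHARS.get(field)
        if limit is None:
            problems.append(f"{field}: not a documented persona field")
        elif not isinstance(value, str):
            problems.append(f"{field}: must be a string")
        elif len(value) > limit:
            problems.append(f"{field}: {len(value)} chars exceeds the limit of {limit}")

    return problems
```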

messages

Each item is { "role": "user" | "assistant", "content": "..." }, with content 1–16000 chars. Send a rolling window of recent turns; the last one must be from the user.
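
One way to produce such a rolling window from a longer local history is sketched below. `rolling_window` and its default window of 20 turns are assumptions for illustration; the only limits taken from the text are the 1 to 100 turn range, the 1 to 16000 char content length, and the rule that the last turn is from the user.

```python
MAX_TURNS = 100            # documented upper bound on messages
MAX_CONTENT_CHARS = 16000  # documented upper bound on content length


def rolling_window(history: list, window: int = 20) -> list:
    """Return the most recent turns of history, ending on a user message.

    window=20 is an arbitrary default chosen for this sketch.
    """
    if not 1 <= window <= MAX_TURNS:
        raise ValueError("window must be between 1 and 100")

    # Ignore trailing assistant turns so the last message is from the user.
    end = len(history)
    while end > 0 and history[end - 1]["role"] != "user":
        end -= 1
    if end == 0:
        raise ValueError("history contains no user message")

    recent = history[max(0, end - window):end]
    for turn in recent:
        if turn["role"] not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {turn['role']!r}")
        if not 1 <= len(turn["content"]) <= MAX_CONTENT_CHARS:
            raise ValueError("content must be 1 to 16000 chars")
    return recent
```

The window may begin with an assistant turn; the text above only constrains the last message.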

Example request

curl -X POST https://api.phantomrouter.ai/api/v1/companion/chat \
  -H "Authorization: Bearer $PHANTOM_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "chat_id": "user-42",
    "persona": {
      "name": "Luna",
      "bio": "A warm, witty companion who loves astronomy.",
      "chat_style": "playful, curious"
    },
    "messages": [
      { "role": "user", "content": "Morning! Remember I had that big presentation?" }
    ],
    "temperature": 0.8
  }'
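
The same request can be issued from Python with only the standard library. This is a sketch rather than an official client: `build_chat_request` and `send_chat` are hypothetical names, and the 30-second timeout is an assumption.

```python
import json
import urllib.request

API_URL = "https://api.phantomrouter.ai/api/v1/companion/chat"


def build_chat_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send_chat(api_key: str, body: dict, timeout: float = 30.0) -> dict:
    """Send one turn and return the decoded JSON response.

    urlopen raises urllib.error.HTTPError for 4xx and 5xx statuses.
    """
    request = build_chat_request(api_key, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send_chat(os.environ["PHANTOM_KEY"], body)` (after `import os`) would mirror the curl command, with `body` holding the same JSON object.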

Response

{
  "reply": "Morning, you! Of course - the big one you were dreading. How'd it land?",
  "chat_id": "user-42",
  "moderation": { "inbound_prohibited": false, "outbound_filtered": false },
  "memory": { "retrieved_count": 3, "profile_used": true },
  "extraction_enqueued": true
}

| Field | Type | Notes |
| --- | --- | --- |
| reply | string | The in-character reply. |
| chat_id | string | Echoes the request. |
| moderation.inbound_prohibited | boolean | true if the user's message was refused. |
| moderation.outbound_filtered | boolean | true if the generated reply was replaced by a safe fallback. |
| moderation.reason | string (optional) | Present when moderation acted. |
| memory.retrieved_count | integer | How many memories were injected this turn. |
| memory.profile_used | boolean | Whether a structured profile was injected. |
| extraction_enqueued | boolean | Whether a background memory-extraction job was queued this turn. |
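
A typed view of the response can make these fields easier to work with. The sketch below flattens the nested objects into one dataclass; `ChatResult` and `parse_chat_response` are hypothetical names, and the only field treated as optional is moderation.reason, as documented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChatResult:
    reply: str
    chat_id: str
    inbound_prohibited: bool
    outbound_filtered: bool
    moderation_reason: Optional[str]  # present only when moderation acted
    retrieved_count: int
    profile_used: bool
    extraction_enqueued: bool


def parse_chat_response(payload: dict) -> ChatResult:
    """Convert the decoded JSON response into a ChatResult.

    Raises KeyError if a documented non-optional field is missing.
    """
    moderation = payload["moderation"]
    memory = payload["memory"]
    return ChatResult(
        reply=payload["reply"],
        chat_id=payload["chat_id"],
        inbound_prohibited=moderation["inbound_prohibited"],
        outbound_filtered=moderation["outbound_filtered"],
        moderation_reason=moderation.get("reason"),
        retrieved_count=memory["retrieved_count"],
        profile_used=memory["profile_used"],
        extraction_enqueued=payload["extraction_enqueued"],
    )
```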

Moderation behavior

Moderation runs inline and returns 200, not an error:

  • Inbound refusal - if the user's message is prohibited, reply is a canned refusal, moderation.inbound_prohibited is true, and nothing is generated or extracted.
  • Outbound filter - if the generated reply is unsafe, it's replaced with a safe fallback and moderation.outbound_filtered is true.
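
Since both cases arrive as a normal 200, a client has to inspect the flags rather than catch an error. A minimal sketch of that branching follows; `moderation_outcome` and its three labels are hypothetical names chosen for this example.

```python
from typing import Optional, Tuple


def moderation_outcome(response: dict) -> Tuple[str, Optional[str]]:
    """Classify a 200 response by its moderation flags.

    Returns (label, reason), where label is one of 'inbound_refused',
    'outbound_filtered', or 'ok'. The labels are this sketch's own.
    """
    moderation = response.get("moderation") or {}
    reason = moderation.get("reason")

    # Inbound is checked first: on a refusal, nothing was generated at all.
    if moderation.get("inbound_prohibited"):
        return "inbound_refused", reason
    if moderation.get("outbound_filtered"):
        return "outbound_filtered", reason
    return "ok", None
```

An application might, for example, show the reply in every case but log the reason whenever the label is not 'ok'.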

Memory & cadence

Retrieval runs on every turn. Extraction is cadence-gated: by default Phantom extracts new memories every few user turns rather than on every message. When a job is queued, it runs in the background and the response reports extraction_enqueued: true. Set extract_memory: false to opt out of write-back for a turn while still retrieving.
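
For a turn that should not be written back to memory, the opt-out can be applied to a copy of the request body so that later turns keep the default. `without_memory_writeback` is a hypothetical helper for illustration.

```python
import copy


def without_memory_writeback(body: dict) -> dict:
    """Return a copy of a request body with extract_memory set to false.

    The original body is left untouched, so the next turn can reuse it
    with the default behavior.
    """
    updated = copy.deepcopy(body)
    updated["extract_memory"] = False
    return updated
```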

Errors

| Status | Code | When |
| --- | --- | --- |
| 400 | INVALID_REQUEST | Body failed validation, or no vector store is configured. |
| 401 | UNAUTHORIZED | Missing or invalid key. |
| 402 | PAYMENT_REQUIRED | Insufficient credit balance. |
| 403 | FORBIDDEN | Key lacks chat:companion. |
| 503 | SERVICE_UNAVAILABLE | Transient model-provider or vector-store failure. Retry with backoff. |
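
Of these statuses, only 503 is described as transient, so a retry loop would normally leave the others alone. The sketch below shows exponential backoff around any callable that raises `urllib.error.HTTPError`; `call_with_backoff`, the four-attempt cap, and the 0.5-second base delay are assumptions, not values from the API.

```python
import time
import urllib.error

RETRYABLE_STATUS = {503}  # the only status documented as transient


def call_with_backoff(send, max_attempts: int = 4, base_delay: float = 0.5, sleep=time.sleep):
    """Call send() and retry on 503, doubling the delay after each failure.

    Any other HTTP error, or a 503 on the final attempt, is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as error:
            is_last_attempt = attempt == max_attempts - 1
            if error.code not in RETRYABLE_STATUS or is_last_attempt:
                raise
            # Delays of 0.5s, 1s, 2s, ... with the default base_delay.
            sleep(base_delay * (2 ** attempt))
```

Adding random jitter to each delay is a common refinement when many clients may retry at once.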