POST /messages
Messages
Creates a model response using the Anthropic Messages request and response format. Requests are served through the attested gateway and return an x-receipt-id header for verification.

Endpoint

POST https://inference.phala.com/v1/messages

Request Body

model (string, required): Model id returned by List Models.
max_tokens (integer, required): Maximum number of tokens to generate.
messages (array, required): Conversation turns. Each item has a role (user or assistant) and content.
system (string): System prompt, sent as a top-level field.
temperature (number): Sampling temperature.
stream (boolean): Stream the response as server-sent events.
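
The field list above can be turned into a small request-body builder. This is a minimal Python sketch, not part of the API: build_messages_body is a hypothetical helper, and the checks that max_tokens is positive and that messages is non-empty are assumptions rather than documented rules.

```python
from typing import Any, Dict, List, Optional

ALLOWED_ROLES = {"user", "assistant"}


def build_messages_body(
    model: str,
    max_tokens: int,
    messages: List[Dict[str, Any]],
    system: Optional[str] = None,
    temperature: Optional[float] = None,
    stream: Optional[bool] = None,
) -> Dict[str, Any]:
    """Assemble a request body from the documented fields (hypothetical helper)."""
    if not model:
        raise ValueError("model is required")
    # Assumption: a usable token limit is a positive integer.
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    # Assumption: at least one conversation turn is needed.
    if not messages:
        raise ValueError("messages must contain at least one turn")
    for turn in messages:
        if turn.get("role") not in ALLOWED_ROLES:
            raise ValueError("unsupported role: %r" % (turn.get("role"),))
        if "content" not in turn:
            raise ValueError("each message needs content")

    body: Dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": messages,
    }
    # Optional fields are left out entirely when they are not set.
    if system is not None:
        body["system"] = system  # top-level field, not a message turn
    if temperature is not None:
        body["temperature"] = temperature
    if stream is not None:
        body["stream"] = stream
    return body
```

Keeping system as a keyword argument mirrors the request format, where the system prompt is a top-level field rather than an entry in messages.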

Example

curl https://inference.phala.com/v1/messages \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "max_tokens": 100,
    "messages": [
      {"role": "user", "content": "Say hello in three words."}
    ]
  }'
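
For callers who are not using curl, the same request can be sketched in Python with only the standard library. The endpoint, headers, and x-receipt-id header come from this page; make_request and send are hypothetical helper names, and error handling is left out.

```python
import json
import urllib.request

ENDPOINT = "https://inference.phala.com/v1/messages"


def make_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request):
    """Send the request; return the parsed JSON body and the receipt id."""
    with urllib.request.urlopen(request) as resp:
        # Header lookup on the response is case-insensitive.
        receipt_id = resp.headers.get("x-receipt-id")
        payload = json.loads(resp.read().decode("utf-8"))
    return payload, receipt_id
```

Calling send(make_request(api_key, body)) with the JSON body from the curl example performs the call. The returned receipt id is the value used in the Verification step.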

Response

{
  "id": "msg_01ABC...",
  "type": "message",
  "role": "assistant",
  "model": "anthropic/claude-sonnet-4.5",
  "content": [
    { "type": "text", "text": "Hello, howdy, hi!" }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": { "input_tokens": 14, "output_tokens": 8 }
}
Read the assistant's text from content[].text on blocks whose type is "text". stop_reason is usually one of end_turn, max_tokens, stop_sequence, or tool_use.
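
Reading the text blocks can be sketched in Python as follows. The field names come from the response above; extract_text and was_truncated are hypothetical helpers, and the input is assumed to be the response JSON already parsed into a dict.

```python
def extract_text(response: dict) -> str:
    """Join the text of every content block whose type is "text"."""
    parts = []
    for block in response.get("content", []):
        # Blocks of other types (for example tool use) carry no assistant text.
        if block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "".join(parts)


def was_truncated(response: dict) -> bool:
    """True when generation stopped because max_tokens was reached."""
    return response.get("stop_reason") == "max_tokens"
```

Checking stop_reason before using the text helps catch replies that were cut off by a max_tokens value that was too small.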

Verification

Pass the value of the response's x-receipt-id header to Get Receipt. A confidential response has upstream.verified.result = verified and required = true.
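
The receipt check can be sketched in Python as below. This assumes the receipt has already been fetched through Get Receipt and parsed into a dict, and that required sits next to result under upstream.verified; this page does not spell out that nesting, so treat it as an assumption. is_confidential is a hypothetical helper name.

```python
def is_confidential(receipt: dict) -> bool:
    """Check the two receipt fields that mark a confidential response."""
    upstream = receipt.get("upstream") or {}
    # Assumption: "required" is a sibling of "result" under upstream.verified.
    verified = upstream.get("verified") or {}
    return verified.get("result") == "verified" and verified.get("required") is True
```

A missing upstream or verified object is treated as not confidential rather than raising an error.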
