Endpoint
https://inference.phala.com/v1.
Request Body
Model ID to use for completion.Examples:
phala/qwen3.5-27b, phala/gemma-3-27b-it, z-ai/glm-5, openai/gpt-oss-120b.Conversation messages. Each message includes
role and content.Sampling temperature. Typical range is
0 to 2.Maximum number of output tokens for most open models and GPU TEE models.
Maximum output tokens for newer OpenAI reasoning models that do not accept
max_tokens.Set to
true to receive server-sent event chunks.Function/tool definitions that supported models can call.
Controls whether the model may call tools. Common values are
auto, none, or a specific tool selection object.Requests structured output from supported models, including JSON schema mode.
Examples
Response
x-receipt-id with Get Receipt when you need cryptographic proof for this specific response. The response id can also be used as a receipt lookup id.
| Header | Meaning |
|---|---|
x-receipt-id | Signed receipt id for this response. |
x-aci-identity | Attested gateway workload identity. |
x-aci-keyset-digest | Digest of the attested gateway keyset. |
Feature Notes
- Streaming uses the same
stream: trueoption as the OpenAI API. - Vision models accept multimodal
contentarrays withimage_urlentries. - Tool calling uses OpenAI-compatible
tools,tool_choice, assistanttool_calls, and tool response messages. - Structured output uses
response_formaton supported models.
Next Steps
List Models
Discover available Confidential AI models and capabilities
Verify Responses
Fetch the receipt for a chat completion response

