Phala Cloud Documentation: Confidential AI on TEE

Chat Completions

Endpoint

POST https://api.redpill.ai/v1/chat/completions

Creates a model response for a chat conversation. The endpoint is OpenAI-compatible: send the same request body you already use with the OpenAI SDK and set the client's base URL to https://api.redpill.ai/v1.

Request Body

model (string, required)

Model ID to use for the completion. Examples: phala/qwen3.5-27b, phala/gemma-3-27b-it, z-ai/glm-5, openai/gpt-oss-120b.

messages (array, required)

Conversation messages, in order. Each message is an object with a role (for example system, user, assistant, or tool) and its content.

[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Explain GPU TEE in one paragraph."}
]
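The Python sketch below builds the example list above and also shows a multimodal user message of the kind vision models accept (see Feature Notes). The helper names are hypothetical, and the content-part shape (a {"type": "text"} part plus a {"type": "image_url"} part carrying a url) is assumed from the OpenAI API. Whether a given model accepts base64 data URLs, public image URLs, or both is not stated on this page.

```python
import base64


def text_message(role: str, content: str) -> dict:
    """Build a plain-text chat message (role: system, user, or assistant)."""
    return {"role": role, "content": content}


def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a multimodal user message: one text part plus one image_url part.

    The image is inlined as a base64 data URL. This is an assumption; a public
    https URL would go in the same "url" field if the model accepts it.
    """
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


# The same conversation as the JSON example above.
messages = [
    text_message("system", "You are a helpful assistant."),
    text_message("user", "Explain GPU TEE in one paragraph."),
]
```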

temperature (number)

Sampling temperature. The typical range is 0 to 2; lower values give more deterministic output, higher values more varied output.

max_tokens (integer)

Maximum number of output tokens. Use this field for most open models and GPU TEE models.

max_completion_tokens (integer)

Maximum number of output tokens for newer OpenAI reasoning models that do not accept max_tokens. Use it instead of max_tokens for those models.
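Because some reasoning models reject max_tokens, a client that targets several models may want to choose the field programmatically. A minimal sketch follows; with_output_limit is a hypothetical helper, and the caller has to say whether the target is a reasoning model, since this page does not list which model IDs fall into that group.

```python
def with_output_limit(payload: dict, limit: int, reasoning_model: bool) -> dict:
    """Return a copy of payload with the appropriate output-token field set.

    reasoning_model=True  -> max_completion_tokens (newer OpenAI reasoning models)
    reasoning_model=False -> max_tokens (most open models and GPU TEE models)
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    key = "max_completion_tokens" if reasoning_model else "max_tokens"
    out = dict(payload)  # shallow copy so the caller's dict is not mutated
    # Remove both fields first so the request never carries the two at once.
    out.pop("max_tokens", None)
    out.pop("max_completion_tokens", None)
    out[key] = limit
    return out
```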

stream (boolean)

Set to true to receive the response incrementally as server-sent event (SSE) chunks.
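With stream set to true, the body arrives as server-sent events. Assuming the OpenAI chunk format (each event is a data: line holding a JSON chunk whose choices[].delta carries new text, and the stream ends with data: [DONE]), the text can be reassembled as below. iter_stream_text is a hypothetical helper that takes already-decoded lines, so it can be exercised without a network connection.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style SSE lines.

    Expects lines like 'data: {...json chunk...}' and stops at 'data: [DONE]'.
    Blank lines, SSE comments (':'), and other SSE fields are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(":"):
            continue
        if not line.startswith("data:"):
            continue  # ignore other SSE fields such as event: or id:
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            piece = delta.get("content")
            if piece:  # role-only or empty deltas carry no text
                yield piece
```

When reading from a urllib response, iterating over it yields raw byte lines, so a caller would pass (line.decode("utf-8") for line in resp).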

tools (array)

Function (tool) definitions that supported models can call.

tool_choice (string | object)

Controls whether and how the model calls tools. Common values are auto, none, or an object that selects a specific tool.
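A minimal sketch of a request body that offers one tool, using the OpenAI function-tool shape that the Feature Notes refer to. The get_weather tool, its parameter schema, and the tool_request helper are illustrative assumptions; which models support tool calling is not listed on this page.

```python
# Hypothetical tool definition for illustration (OpenAI function-tool shape).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {  # JSON Schema describing the tool's arguments
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def tool_request(model: str, messages: list, tools: list, force: str = None) -> dict:
    """Build a request body that offers tools to the model.

    force=None lets the model decide (tool_choice="auto"); passing a tool
    name selects that tool through an object-valued tool_choice.
    """
    body = {"model": model, "messages": messages, "tools": tools}
    if force is None:
        body["tool_choice"] = "auto"
    else:
        body["tool_choice"] = {"type": "function", "function": {"name": force}}
    return body
```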

response_format (object)

Requests structured output from supported models, including JSON schema mode.
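In the OpenAI format, JSON schema mode wraps the schema in a response_format object of type json_schema, and the model's answer arrives as a JSON string in message.content. The sketch below assumes that shape; the schema and helper names are hypothetical, and support for strict mode may vary by model.

```python
import json


def json_schema_format(name: str, schema: dict, strict: bool = True) -> dict:
    """Build an OpenAI-style response_format object for JSON schema mode."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": strict},
    }


# Hypothetical schema; strict mode expects every property to be required
# and additionalProperties to be false.
summary_schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["summary", "confidence"],
    "additionalProperties": False,
}


def parse_structured(response: dict) -> dict:
    """Decode the JSON object the model returned in message.content."""
    content = response["choices"][0]["message"]["content"]
    return json.loads(content)
```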

Examples

curl https://api.redpill.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "model": "phala/qwen3.5-27b",
    "messages": [
      {"role": "user", "content": "What privacy guarantees does GPU TEE provide?"}
    ]
  }'
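The same call without the OpenAI SDK, using only the Python standard library. This is a sketch: build_request and create_chat_completion are hypothetical names, and error handling (urllib raises HTTPError on non-2xx responses) is left to the caller.

```python
import json
import urllib.request

API_URL = "https://api.redpill.ai/v1/chat/completions"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the same POST that the curl example sends."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def create_chat_completion(payload: dict, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_request(payload, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, create_chat_completion({"model": "phala/qwen3.5-27b", "messages": [...]}, api_key) returns the decoded response object described in the Response section.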

Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "phala/qwen3.5-27b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "GPU TEE protects inference by..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 16,
    "completion_tokens": 48,
    "total_tokens": 64
  }
}

The id field is the request ID. Use it with Request Signature when you need cryptographic proof for this specific response.
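Since the id is the handle for later verification, it is worth storing alongside the answer. A sketch that collects the fields shown above into one record; ChatResult and parse_chat_completion are hypothetical names. content is typed as optional because, in the OpenAI format, an assistant message that only carries tool_calls may have null content, and a finish_reason of "length" indicates the output hit the token limit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatResult:
    request_id: str               # the "id" field, kept for Request Signature
    content: Optional[str]        # may be None when the model only calls tools
    finish_reason: Optional[str]  # "stop", "length", ...
    total_tokens: Optional[int]


def parse_chat_completion(response: dict) -> ChatResult:
    """Extract the request ID, first choice, and token usage from a response."""
    choice = response["choices"][0]
    usage = response.get("usage") or {}
    return ChatResult(
        request_id=response["id"],
        content=choice["message"].get("content"),
        finish_reason=choice.get("finish_reason"),
        total_tokens=usage.get("total_tokens"),
    )


def is_truncated(result: ChatResult) -> bool:
    """True when generation stopped because the output-token limit was hit."""
    return result.finish_reason == "length"
```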

Feature Notes

Streaming uses the same stream: true option as the OpenAI API.
Vision models accept multimodal content arrays with image_url entries.
Tool calling uses OpenAI-compatible tools, tool_choice, assistant tool_calls, and tool response messages.
Structured output uses response_format on supported models.
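The tool-calling round trip in these notes can be sketched as follows, assuming OpenAI semantics: the assistant message carries tool_calls whose function.arguments is a JSON string, and each result goes back as a role "tool" message that echoes the call's id in tool_call_id. The handlers mapping and the run_tool_calls helper are hypothetical.

```python
import json
from typing import Callable, Dict, List


def run_tool_calls(assistant_message: dict,
                   handlers: Dict[str, Callable[..., object]]) -> List[dict]:
    """Execute each tool call in an assistant message and build tool replies.

    Returns the messages to append to the conversation: the assistant message
    itself (with its tool_calls) followed by one role="tool" message per call.
    """
    follow_up = [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"] or "{}")  # arguments arrive as a JSON string
        result = handlers[fn["name"]](**args)
        follow_up.append({
            "role": "tool",
            "tool_call_id": call["id"],       # links the reply to its call
            "content": json.dumps(result),    # tool output sent back as text
        })
    return follow_up
```

Appending the returned messages to the conversation and sending the request again lets the model write its final answer from the tool results.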

Next Steps

List Models

Verify Responses
