
Documentation Index

Fetch the complete documentation index at: https://docs.phala.com/llms.txt

Use this file to discover all available pages before exploring further.

Endpoint

POST https://api.redpill.ai/v1/chat/completions
Creates a response for a chat conversation. Use the same OpenAI-compatible request shape you already use with the OpenAI SDK, then set the base URL to https://api.redpill.ai/v1.
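Because the request shape is OpenAI-compatible, you can also build it with nothing but the Python standard library. A minimal sketch (the helper name is illustrative, and the actual network call is left commented out):

```python
import json
import urllib.request

def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for api.redpill.ai."""
    return urllib.request.Request(
        "https://api.redpill.ai/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request("<API_KEY>", {
    "model": "phala/qwen3.5-27b",
    "messages": [
        {"role": "user", "content": "Explain GPU TEE in one paragraph."}
    ],
})

# with urllib.request.urlopen(req) as resp:  # network call, not executed here
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same payload works unchanged with the official OpenAI SDK once its base URL points at https://api.redpill.ai/v1.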

Request Body

model (string, required)
Model ID to use for completion. Examples: phala/qwen3.5-27b, phala/gemma-3-27b-it, z-ai/glm-5, openai/gpt-oss-120b.

messages (array, required)
Conversation messages. Each message includes role and content.

[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Explain GPU TEE in one paragraph."}
]

temperature (number)
Sampling temperature. Typical range is 0 to 2.

max_tokens (integer)
Maximum number of output tokens for most open models and GPU TEE models.

max_completion_tokens (integer)
Maximum output tokens for newer OpenAI reasoning models that do not accept max_tokens.

stream (boolean)
Set to true to receive server-sent event chunks.

tools (array)
Function/tool definitions that supported models can call.

tool_choice (string | object)
Controls whether the model may call tools. Common values are auto, none, or a specific tool selection object.

response_format (object)
Requests structured output from supported models, including JSON schema mode.
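Putting the parameters above together, a request body that defines a callable tool might look like the following sketch. The get_weather function and its schema are hypothetical, invented here to illustrate the tools and tool_choice fields:

```python
import json

# Illustrative request body exercising tools and tool_choice.
# The get_weather function and its JSON schema are hypothetical.
payload = {
    "model": "phala/qwen3.5-27b",
    "messages": [
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256,
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
body = json.dumps(payload)  # serialized JSON to send as the POST body
```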

Examples

curl https://api.redpill.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "model": "phala/qwen3.5-27b",
    "messages": [
      {"role": "user", "content": "What privacy guarantees does GPU TEE provide?"}
    ]
  }'

Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "phala/qwen3.5-27b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "GPU TEE protects inference by..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 16,
    "completion_tokens": 48,
    "total_tokens": 64
  }
}
The id field is the request ID. Use it with the Request Signature endpoint when you need cryptographic proof for this specific response.
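In code, the fields you typically read from this response are the request ID, the assistant message, and the usage totals. A minimal parsing sketch using the sample response above:

```python
import json

# The sample response shown above, parsed with the standard library;
# field paths follow the OpenAI chat completion shape.
sample = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "phala/qwen3.5-27b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "GPU TEE protects inference by..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 16, "completion_tokens": 48, "total_tokens": 64}
}
"""
resp = json.loads(sample)
request_id = resp["id"]  # keep this if you plan to fetch a Request Signature later
answer = resp["choices"][0]["message"]["content"]
total_tokens = resp["usage"]["total_tokens"]
```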

Feature Notes

  • Streaming uses the same stream: true option as the OpenAI API.
  • Vision models accept multimodal content arrays with image_url entries.
  • Tool calling uses OpenAI-compatible tools, tool_choice, assistant tool_calls, and tool response messages.
  • Structured output uses response_format on supported models.
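With stream: true the API returns server-sent events, each a "data: {...}" line carrying a delta, terminated by "data: [DONE]". A sketch of consuming those lines (the chunk payloads below are illustrative, not captured from the API):

```python
import json

def iter_deltas(sse_lines):
    """Yield content deltas from 'data:' SSE lines, stopping at [DONE]."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip comments, blank keep-alive lines, etc.
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Illustrative chunk lines in the OpenAI-compatible streaming format.
lines = [
    'data: {"choices":[{"delta":{"content":"GPU "}}]}',
    'data: {"choices":[{"delta":{"content":"TEE"}}]}',
    "data: [DONE]",
]
print("".join(iter_deltas(lines)))  # -> GPU TEE
```

In a real client, the lines would come from iterating over the decoded HTTP response body rather than a list.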

Next Steps

List Models

Discover available Confidential AI models and capabilities

Verify Responses

Fetch the signature for a chat completion response