# Chat Completions

> Create OpenAI-compatible chat completion responses with Confidential AI models.

## Endpoint

```bash theme={"system"}
POST https://inference.phala.com/v1/chat/completions
```

Creates a response for a chat conversation. Keep the OpenAI-compatible request shape you already use with the OpenAI SDK and set the base URL to `https://inference.phala.com/v1`.

## Request Body

* `model`: Model ID to use for completion. Examples: `phala/qwen3.5-27b`, `phala/gemma-3-27b-it`, `z-ai/glm-5`, `openai/gpt-oss-120b`.
* `messages`: Conversation messages. Each message includes `role` and `content`.

  ```json theme={"system"}
  [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain GPU TEE in one paragraph."}
  ]
  ```

* `temperature`: Sampling temperature. Typical range is `0` to `2`.
* `max_tokens`: Maximum number of output tokens for most open models and GPU TEE models.
* `max_completion_tokens`: Maximum number of output tokens for newer OpenAI reasoning models that do not accept `max_tokens`.
* `stream`: Set to `true` to receive server-sent event chunks.
* `tools`: Function/tool definitions that supported models can call.
* `tool_choice`: Controls whether the model may call tools. Common values are `auto`, `none`, or a specific tool selection object.
* `response_format`: Requests structured output from supported models, including JSON schema mode.
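To make the request-body fields concrete, here is a minimal sketch that assembles a body with only the standard library. The helper `build_chat_request`, the `get_weather` tool, and the `<reasoning-model-id>` placeholder are hypothetical names used for illustration; the tool definition follows the OpenAI Chat Completions shape, which this endpoint describes itself as compatible with.

```python
import json


def build_chat_request(model, messages, *, max_output_tokens=None,
                       uses_max_completion_tokens=False, **options):
    """Assemble a chat completion request body (hypothetical helper)."""
    body = {"model": model, "messages": messages}
    if max_output_tokens is not None:
        # Pick the token-limit field the target model accepts.
        key = "max_completion_tokens" if uses_max_completion_tokens else "max_tokens"
        body[key] = max_output_tokens
    # Pass optional fields (temperature, stream, tools, ...) through, skipping unset ones.
    body.update({k: v for k, v in options.items() if v is not None})
    return body


# Illustrative tool definition; `get_weather` is a made-up function name.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request_body = build_chat_request(
    "phala/qwen3.5-27b",
    [{"role": "user", "content": "What is the weather in Paris?"}],
    max_output_tokens=256,
    temperature=0.2,
    tools=[weather_tool],
    tool_choice="auto",
)

# Placeholder model ID standing in for a model that rejects `max_tokens`.
reasoning_body = build_chat_request(
    "<reasoning-model-id>",
    [{"role": "user", "content": "Summarize GPU TEE in one sentence."}],
    max_output_tokens=512,
    uses_max_completion_tokens=True,
)

print(json.dumps(request_body, indent=2))
```

Either dictionary can be sent as the JSON body of the POST request, or unpacked into `client.chat.completions.create(**request_body)` when using the OpenAI SDK.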
## Examples

```bash cURL theme={"system"}
curl https://inference.phala.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "model": "phala/qwen3.5-27b",
    "messages": [
      {"role": "user", "content": "What privacy guarantees does GPU TEE provide?"}
    ]
  }'
```

```python Python theme={"system"}
from openai import OpenAI

client = OpenAI(
    api_key="<API_KEY>",  # replace with your API key
    base_url="https://inference.phala.com/v1",
)

response = client.chat.completions.create(
    model="phala/qwen3.5-27b",
    messages=[
        {"role": "user", "content": "What privacy guarantees does GPU TEE provide?"}
    ],
)

print(response.choices[0].message.content)
```

```typescript TypeScript theme={"system"}
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<API_KEY>", // replace with your API key
  baseURL: "https://inference.phala.com/v1",
});

const response = await client.chat.completions.create({
  model: "phala/qwen3.5-27b",
  messages: [
    { role: "user", content: "What privacy guarantees does GPU TEE provide?" },
  ],
});

console.log(response.choices[0].message.content);
```

## Response

```json theme={"system"}
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "phala/qwen3.5-27b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "GPU TEE protects inference by..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 16,
    "completion_tokens": 48,
    "total_tokens": 64
  }
}
```

Raw HTTP responses include verification headers. Use `x-receipt-id` with [Get Receipt](/phala-cloud/confidential-ai/confidential-model/api-reference/receipts) when you need cryptographic proof for this specific response. The response `id` can also be used as a receipt lookup id.

| Header                | Meaning                                |
| --------------------- | -------------------------------------- |
| `x-receipt-id`        | Signed receipt id for this response.   |
| `x-aci-identity`      | Attested gateway workload identity.    |
| `x-aci-keyset-digest` | Digest of the attested gateway keyset. |

## Feature Notes

* Streaming uses the same `stream: true` option as the OpenAI API.
* Vision models accept multimodal `content` arrays with `image_url` entries.
* Tool calling uses OpenAI-compatible `tools`, `tool_choice`, assistant `tool_calls`, and tool response messages.
* Structured output uses `response_format` on supported models.

## Next Steps

* Discover available Confidential AI models and capabilities.
* [Get Receipt](/phala-cloud/confidential-ai/confidential-model/api-reference/receipts): fetch the receipt for a chat completion response.
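The vision and tool-calling notes under Feature Notes can be sketched as plain message dictionaries. This is an illustration under assumptions rather than a definitive implementation: the shapes follow OpenAI's Chat Completions conventions, and the image URL, the `call_1` id, the `get_weather` tool, and the `tool_result_messages` helper are all hypothetical.

```python
import json

# Multimodal user message: `content` is an array of typed parts.
vision_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}

# Assistant turn requesting a tool call, shaped like a returned `message`.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",  # made-up id for illustration
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": json.dumps({"city": "Paris"}),
            },
        }
    ],
}


def tool_result_messages(assistant_msg, handlers):
    """Run each requested tool and build the matching `tool` messages."""
    results = []
    for call in assistant_msg.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        output = handlers[fn["name"]](**args)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to the request
            "content": json.dumps(output),
        })
    return results


# Stub implementation standing in for a real weather lookup.
handlers = {"get_weather": lambda city: {"city": city, "forecast": "sunny"}}

# Messages to append to the conversation before the next request.
follow_up = [assistant_message, *tool_result_messages(assistant_message, handlers)]
print(json.dumps(follow_up, indent=2))
```

Appending `follow_up` to the original `messages` list and sending the request again lets the model produce its final answer from the tool output.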