> ## Documentation Index > Fetch the complete documentation index at: https://docs.phala.com/llms.txt > Use this file to discover all available pages before exploring further. > Use Confidential AI through an OpenAI-compatible API, dedicated model deployments, or custom GPU TEE infrastructure, with attestation reports and signed receipts you can verify. # Overview Confidential AI overview - run AI models with hardware-level privacy in GPU TEEs

Confidential AI overview - run AI models with hardware-level privacy in GPU TEEs

## Why Confidential AI? Traditional cloud AI deployments ask you to trust the platform operator and the upstream model provider. Phala Confidential AI runs inference through an attested gateway and verified TEE providers, then gives you cryptographic evidence for each response. The on-demand API is OpenAI-compatible and served through an Attested Confidential Inference (ACI) gateway. Every response includes a signed receipt, and the gateway publishes an attestation report that proves which TEE workload served the request. Confidential model responses additionally show, in the receipt, that the upstream provider was verified and channel-bound before your prompt was forwarded. Phala Cloud offers pre-deployed **Models** (API or Dedicated) for quick integration, and **GPU TEE** for custom infrastructure. See [available models](https://phala.com/confidential-ai-models) for supported models, use cases, and deployment options. **Pre-deployed models, pay per request** Best for quick integration. 5 minute setup with OpenAI-compatible API and no infrastructure management. **Same models, dedicated performance** Best for high-volume workloads. Same API as API Access, but with hourly billing and dedicated GPU resources. **Custom infrastructure, full control** Best for custom models. Rent dedicated GPU TEE servers for training, fine-tuning, or any custom workload. ## Quick Tour of Confidential AI ### Models: API and Dedicated **API Access** provides pre-deployed models through an OpenAI-compatible API at `https://inference.phala.com/v1`. Pay per request with no infrastructure to manage. Start with [API Access](/phala-cloud/confidential-ai/confidential-model/confidential-ai-api). For advanced API features, explore [Tool Calling](/phala-cloud/confidential-ai/confidential-model/tool-calling) to enable LLMs to interact with external tools and APIs securely within TEE. **Dedicated Models** give you the same pre-deployed models but with dedicated GPU resources and hourly pricing. Choose this for predictable performance or high-volume workloads. See [Dedicated Models](/phala-cloud/confidential-ai/confidential-gpu/model-template). ### GPU TEE: Custom Infrastructure For complete infrastructure control beyond pre-deployed models, use [GPU TEE](/phala-cloud/confidential-ai/confidential-gpu/deploy-and-verify) to rent dedicated GPU servers. Run any workload including custom models for inference, training, or fine-tuning. Configure GPU, CPU, RAM, and storage to match your exact needs. ### Verify Attestation and Receipts To verify the API path, fetch a fresh [Attestation Report](/phala-cloud/confidential-ai/confidential-model/api-reference/attestation) from `GET /v1/aci/attestation`. It proves the gateway workload identity, TEE quote, source provenance, and public keyset used to sign receipts. Then use the response `x-receipt-id` header to fetch the [Receipt](/phala-cloud/confidential-ai/confidential-model/api-reference/receipts). The receipt binds request and response hashes to the attested workload and records whether the upstream provider was verified. See [Verify a Response](/phala-cloud/confidential-ai/verify/verify-signature) for the full flow. ### Benchmark Our [benchmark](/phala-cloud/confidential-ai/benchmark) shows GPU TEE mode achieves 99% of native performance on H100/H200 GPUs. ### FAQs Check [FAQs](/phala-cloud/confidential-ai/faqs) for frequently asked questions about Confidential AI. ## What makes Phala Cloud Confidential AI Different? * **Drop-in compatibility**: OpenAI-compatible API with popular models (DeepSeek, Llama, GPT-OSS, Qwen) ready for immediate use * **Verifiable security**: Gateway attestation, signed per-response receipts, and upstream verification events you can check yourself * **Flexible deployment**: Choose from API access (pay per request), dedicated models (hourly dedicated GPU), or GPU TEE (full infrastructure control) ## Open Source Foundation Our underlying technology is open source. Check out the [dstack](https://github.com/Dstack-TEE/dstack) repository to see how LLMs run securely in GPU TEEs.