Run LLM inference securely in GPU TEEs for confidential AI.
Run AI models with enterprise-grade security without sacrificing performance. Phala Cloud Confidential AI protects your models and data using GPU TEEs: hardware-isolated environments that keep your AI workloads private and verifiable.
Traditional cloud AI deployments expose your models and data to the cloud provider. Confidential AI solves this by running everything inside a hardware-protected TEE. Your models stay private, your data stays secure, and you get cryptographic proof that execution happened in a trusted environment.
There are two products: Confidential AI API and Confidential AI Models. Confidential AI API provides a pre-deployed LLM inference service with an OpenAI-compatible interface, making it easy to integrate into your applications. Confidential AI Models enables you to deploy and manage your own AI models in GPU TEEs.
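Because the API is OpenAI-compatible, calling it looks like any OpenAI chat-completions request. The sketch below only builds such a request; the base URL, model name, and API key are placeholders, not the service's actual values.

```python
# Sketch of an OpenAI-compatible chat-completions request.
# NOTE: the URL, model name, and API key below are illustrative placeholders,
# not real Confidential AI API values.
import json

def build_chat_request(api_key: str, model: str, prompt: str) -> dict:
    """Assemble the pieces of an OpenAI-compatible /v1/chat/completions call."""
    return {
        "url": "https://example-confidential-ai.invalid/v1/chat/completions",  # placeholder
        "headers": {
            "Authorization": f"Bearer {api_key}",      # standard OpenAI-style auth header
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,                            # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = build_chat_request("sk-placeholder", "placeholder-model", "Hello from a TEE!")
```

An existing OpenAI client library can typically be pointed at such an endpoint simply by overriding its base URL and API key, so no application code changes are needed beyond configuration.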
If you want to deploy a custom model with complete control over your infrastructure, check Confidential GPU, which lets you deploy with various GPU configurations and tune CPU, RAM, and storage to match your workload requirements.
Our performance benchmark shows that TEE mode on H100/H200 GPUs reaches up to 99% of native performance. This means you get confidential computing with minimal performance penalty.
Our underlying technology is open source. Check out the private-ml-sdk repository to see how LLMs run securely in GPU TEEs. This project was built by Phala Network with support from NEARAI.