Confidential AI overview - run AI models with hardware-level privacy in GPU TEEs

Why Confidential AI?

Traditional cloud AI deployments expose your models and data to the cloud provider. Confidential AI runs inference, training, and fine-tuning inside GPU TEEs (Trusted Execution Environments), so the cloud provider cannot access your models or data, and you get cryptographic proof that execution happened in a trusted environment. Phala Cloud offers two ways to run confidential AI workloads: pre-deployed Models (API or Dedicated) for quick integration, or GPU TEE for custom infrastructure. See available models for supported models, use cases, and deployment options.

API Access

Pre-deployed models, pay per request. Best for quick integration: 5-minute setup with an OpenAI-compatible API and no infrastructure management.

Dedicated Models

Same models, dedicated performance. Best for high-volume workloads: the same API as API Access, but with hourly billing and dedicated GPU resources.

GPU TEE

Custom infrastructure, full control. Best for custom models: rent dedicated GPU TEE servers for training, fine-tuning, or any custom workload.

Quick Tour of Confidential AI

Models: API and Dedicated

API Access provides pre-deployed LLMs with OpenAI-compatible APIs for quick integration. Pay per request with no infrastructure to manage. Start with API Access. For advanced API features, explore Tool Calling to enable LLMs to interact with external tools and APIs securely within the TEE.

Dedicated Models give you the same pre-deployed models but with dedicated GPU resources and hourly pricing. Choose this for predictable performance or high-volume workloads. See Dedicated Models.
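Because the endpoint is OpenAI-compatible, calling it looks like any standard chat-completion request. Below is a minimal sketch using only the Python standard library; the base URL, API key, and model name are placeholders you would replace with the values from your own Phala Cloud account.

```python
import json
import urllib.request

# Placeholder endpoint and credentials -- substitute the values from your
# Phala Cloud dashboard. Only the OpenAI-compatible request shape is standard.
BASE_URL = "https://example-confidential-ai.example.com/v1"
API_KEY = "your-api-key"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a standard OpenAI-style chat-completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("deepseek-chat", "Hello from a TEE!")
# Sending the request is an ordinary HTTPS call:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client library (the official `openai` SDK included) can be pointed at the same base URL instead of hand-building requests.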

GPU TEE: Custom Infrastructure

For complete infrastructure control beyond pre-deployed models, use GPU TEE to rent dedicated GPU servers. Run any workload including custom models for inference, training, or fine-tuning. Configure GPU, CPU, RAM, and storage to match your exact needs.

Verify Attestation and Signature

To ensure your workloads run securely in TEE, you can Verify Attestation to check the TEE hardware, operating system, source code, and distributed root-of-trust attestations. Then you can Verify Signature to confirm the integrity of your Confidential AI API requests and responses.
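To illustrate the shape of the signature check, here is a minimal sketch of the digest-comparison step, assuming the service signs a SHA-256 hash of the raw response body. The field names and flow here are hypothetical; the Verify Signature guide describes the actual verification procedure.

```python
import hashlib

def response_digest(body: bytes) -> str:
    """SHA-256 hex digest of a raw response body.

    In a hypothetical verification flow, the signature returned with the
    response would be checked over this digest against the signing key
    proven by the TEE attestation.
    """
    return hashlib.sha256(body).hexdigest()

# Known SHA-256 test vector: the digest of b"abc".
digest = response_digest(b"abc")
```

The key point is the chain of trust: the signing key is bound to the attested TEE, so a valid signature over the response digest proves the response was produced inside that environment.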

Benchmark

Our benchmark shows GPU TEE mode achieves 99% of native performance on H100/H200 GPUs.

FAQs

Check FAQs for frequently asked questions about Confidential AI.

What Makes Phala Cloud Confidential AI Different?

  • Drop-in compatibility: OpenAI-compatible API with popular models (DeepSeek, Llama, GPT-OSS, Qwen) ready for immediate use
  • Verifiable security: Hardware-enforced privacy with cryptographic attestation proving execution in genuine TEE environments
  • Flexible deployment: Choose from API access (pay per request), dedicated models (hourly dedicated GPU), or GPU TEE (full infrastructure control)

Open Source Foundation

Our underlying technology is open source. Check out the dstack repository to see how LLMs run securely in GPU TEEs.