> Deploy dedicated GPU resources for AI models with hourly pricing. Same models as the API but with dedicated performance.

# Dedicated Models

Dedicated Models let you host large language models on dedicated GPU hardware inside trusted execution environments (TEEs). You get the same pre-deployed models as the [Confidential AI API](/phala-cloud/confidential-ai/confidential-model/confidential-ai-api), but with dedicated resources and hourly pricing instead of per-request billing.

Choose this option when you need predictable performance or cost efficiency for high-volume inference workloads. For custom workloads beyond pre-deployed models, see [GPU TEE](/phala-cloud/confidential-ai/confidential-gpu/deploy-and-verify).

Dedicated Models expose the same OpenAI-compatible API as the Confidential AI API, including advanced features such as tool calling and vision. The only difference is the infrastructure: dedicated GPU resources billed hourly rather than shared capacity billed per request.

## Prerequisites

* Phala Cloud account with sufficient credits
* Basic familiarity with LLM deployment

## Step 1: Deploy your model

### Access the deployment page

Sign in to your Phala Cloud account and select **Confidential AI Models** from the top navigation bar.

<Frame>
  <img src="https://mintcdn.com/phalanetwork-1606097b/416gZMDMREnPDd33/images/confidential-ai/confidential-model/confidential-ai-models-01.png?fit=max&auto=format&n=416gZMDMREnPDd33&q=85&s=bf67d04582e6fe3e1f186c063550b2a3" alt="Confidential AI Models" width="2552" height="1618" data-path="images/confidential-ai/confidential-model/confidential-ai-models-01.png" />
</Frame>

### Choose a model

Locate the model you want to deploy (e.g., `qwen-2.5-7b-instruct`) and click **Deploy** to open the deployment wizard. Review the model description to understand its capabilities. For example, Qwen 2.5 models are instruction-tuned language models optimized for chat and task completion.

<Note>
  Don't see your preferred model? We can add custom models—reach out through [Support](/phala-cloud/support).
</Note>

### Configure deployment

Fill out the deployment form with these details:

**Service Name**: Give your deployment a memorable name (e.g., `qwen-2-5-7b-instruct`). Use the auto-generated name or create your own.

**Node & Image**: Select your preferred deployment location and image version. Available options depend on current capacity (e.g., `gpu1 us-west-1` with `dstack-nvidia-0.5.0`).

**Resource Plan**: Review the hardware allocation for your chosen model. For example, `qwen-2.5-7b-instruct` typically includes:

* H200 GPU with 141 GB VRAM
* 24 vCPUs
* 256 GB RAM
* 40 GB disk
* \$3.20 per hour

**Scheduled Destroy**: Set an auto-destruction date to avoid ongoing charges, or leave empty for manual control.
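To see what a scheduled destroy date means for your bill, you can estimate the charge before deploying. The sketch below is illustrative only: it assumes the example rate of \$3.20 per hour quoted above and simple proration (rate times elapsed hours). The function name `estimate_cost` is hypothetical, and actual billing granularity and rounding may differ.

```python
from datetime import datetime, timezone
from decimal import Decimal, ROUND_HALF_UP

# Example rate taken from the resource plan above; check the deployment
# wizard for the rate that applies to your model and node.
EXAMPLE_HOURLY_RATE = Decimal("3.20")


def estimate_cost(start: datetime, destroy_at: datetime,
                  hourly_rate: Decimal = EXAMPLE_HOURLY_RATE) -> Decimal:
    """Estimate the charge for running from `start` until `destroy_at`.

    Assumes simple proration (rate x elapsed hours), rounded to cents.
    """
    if destroy_at < start:
        raise ValueError("destroy_at must not be earlier than start")
    hours = Decimal((destroy_at - start).total_seconds()) / Decimal(3600)
    return (hours * hourly_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


start = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
print(estimate_cost(start, datetime(2025, 1, 2, 9, 0, tzinfo=timezone.utc)))   # 24 hours -> 76.80
print(estimate_cost(start, datetime(2025, 1, 31, 9, 0, tzinfo=timezone.utc)))  # 30 days -> 2304.00
```

Leaving an instance running for a month at this example rate costs roughly 30 times a single day, so a scheduled destroy date is a cheap safeguard for short experiments.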

### Launch the deployment

Review your configuration and click **Deploy** to launch. Provisioning takes a few minutes.

<Frame>
  <img src="https://mintcdn.com/phalanetwork-1606097b/2QC7UYJ801piCkJW/images/confidential-ai/confidential-model/confidential-ai-models-02.png?fit=max&auto=format&n=2QC7UYJ801piCkJW&q=85&s=5a32e53bb7c4e7f59bed347f2c7f9fe2" alt="Deploy LLM Model" width="1674" height="1700" data-path="images/confidential-ai/confidential-model/confidential-ai-models-02.png" />
</Frame>

Monitor the status in the Active Workloads section on the Confidential AI Models page. Your model will progress through these states:

* **Preparing**: Allocating resources
* **Starting**: Launching the model
* **Running**: Ready to use

<Warning>
  Billing starts when the model enters Running status. Remember to stop or schedule destruction when you no longer need the instance.
</Warning>

## Step 2: Access Open WebUI

Once your model reaches Running status, you can access it through Open WebUI, a browser-based chat interface.

### Open the interface

When the workload shows **Running** status, a **Connect UI** button appears under the model card. The button initially reads "Initializing..." while the interface loads. Once it becomes active, click it to open Open WebUI in a new tab.

<Frame>
  <img src="https://mintcdn.com/phalanetwork-1606097b/2QC7UYJ801piCkJW/images/confidential-ai/confidential-model/confidential-ai-models-03.png?fit=max&auto=format&n=2QC7UYJ801piCkJW&q=85&s=130fdaf1dcfa1ebbcbfe237f406ec05a" alt="Model Details Page" width="2742" height="1566" data-path="images/confidential-ai/confidential-model/confidential-ai-models-03.png" />
</Frame>

### Create admin account

On your first visit, create an admin account with your name, email, and password. Click **Create Admin Account** to continue.

<Warning>
  **Critical**: Choose a strong password and store it securely. Because your model runs in a confidential environment, even we (the cloud provider) cannot access or reset your password. If you forget it, you'll lose access to all chat history permanently.
</Warning>

After account creation, click **Okay, Let's Go!** to access the chat interface.

## Step 3: Interact with your model

Open WebUI provides a standard chat interface: type your questions in the prompt box, manage conversation history in the left sidebar, and use the tools menu (+) for advanced features such as file uploads.

<Frame>
  <img src="https://mintcdn.com/phalanetwork-1606097b/2QC7UYJ801piCkJW/images/confidential-ai/confidential-model/confidential-ai-models-04.png?fit=max&auto=format&n=2QC7UYJ801piCkJW&q=85&s=bf209819956e7abdb2301f3464c8350a" alt="Open WebUI Chat Interface" width="2088" height="1426" data-path="images/confidential-ai/confidential-model/confidential-ai-models-04.png" />
</Frame>

## Step 4: Access via API

Beyond the WebUI, you can access your dedicated model programmatically through a REST API.

Navigate to the **Overview** page for your running instance. In the **Inference API** section, click **Quickstart** to see example commands and integration code.

To make API calls, you'll need an API key from Open WebUI. Open the WebUI, navigate to **Settings** → **Account** → **API keys**, and create a new key. Use this key as a Bearer token in your requests.

<Warning>
  Protect your API key like a password. Anyone with the key can access your model.
</Warning>
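One common way to keep the key out of source code and shell history is to read it from an environment variable. The following is a minimal sketch of that approach, not a requirement of the platform. The variable name `PHALA_MODEL_API_KEY` and both helper functions are hypothetical.

```python
import os
from typing import Mapping

# Hypothetical variable name; use whatever fits your own tooling.
API_KEY_ENV_VAR = "PHALA_MODEL_API_KEY"


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the environment, failing loudly if it is unset."""
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {API_KEY_ENV_VAR} before calling the API")
    return key


def auth_header(api_key: str) -> dict[str, str]:
    """Build the Bearer-token Authorization header for a request."""
    return {"Authorization": f"Bearer {api_key}"}
```

Failing early when the variable is missing avoids sending unauthenticated requests that would only be rejected by the server.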

Here's a quick example for chat completions. Replace `<base_url>` and `sk-your-api-key` with the values for your instance:

```bash theme={"system"}
curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{
    "model": "qwen/qwen-2.5-7b-instruct",
    "messages": [
      {"role": "user", "content": "Hello! Can you tell me a fun fact about space?"}
    ]
  }' \
  https://<base_url>/v1/chat/completions
```
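The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the helper names `build_chat_request` and `send_chat` are hypothetical, and the response parsing assumes the usual OpenAI-style chat completion shape (`choices[0].message.content`), which follows from the API being OpenAI-compatible but is not spelled out on this page.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str,
                       prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /v1/chat/completions."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-style response body.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (placeholders: substitute your instance's base URL and API key):
# req = build_chat_request("https://<base_url>", "sk-your-api-key",
#                          "qwen/qwen-2.5-7b-instruct",
#                          "Hello! Can you tell me a fun fact about space?")
# print(send_chat(req))
```

Separating request construction from sending makes it easy to inspect the URL, headers, and body before any network call is made.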

For TEE verification, your instance exposes attestation endpoints:

```bash theme={"system"}
# Get attestation report
curl -H "Authorization: Bearer sk-your-api-key" \
  "https://<base_url>/v1/attestation/report?model=MODEL_NAME"

# Get signature for a chat completion
curl -H "Authorization: Bearer sk-your-api-key" \
  "https://<base_url>/v1/signature/{chat_completion_id}?model=MODEL_NAME&signing_algo=ecdsa"
```
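When you call these endpoints from code, it helps to build the URLs with proper escaping, since model names can contain characters such as `/`. The sketch below uses Python's standard library. The helper names are hypothetical, and it assumes the server decodes percent-encoded query values in the usual way.

```python
from urllib.parse import quote, urlencode


def attestation_report_url(base_url: str, model: str) -> str:
    """Return the attestation report URL for the given model."""
    query = urlencode({"model": model})
    return f"{base_url.rstrip('/')}/v1/attestation/report?{query}"


def signature_url(base_url: str, chat_completion_id: str, model: str,
                  signing_algo: str = "ecdsa") -> str:
    """Return the signature URL for a specific chat completion."""
    query = urlencode({"model": model, "signing_algo": signing_algo})
    path_id = quote(chat_completion_id, safe="")  # escape the ID for use in the path
    return f"{base_url.rstrip('/')}/v1/signature/{path_id}?{query}"


# "https://host" and "chatcmpl-123" are illustrative placeholders.
print(attestation_report_url("https://host", "qwen/qwen-2.5-7b-instruct"))
print(signature_url("https://host", "chatcmpl-123", "MODEL_NAME"))
```

The chat completion ID comes from the `id` field of a prior chat completion response in OpenAI-style APIs; send both URLs with the same Bearer token as shown in the curl commands.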

See [Verify Attestation](/phala-cloud/confidential-ai/verify/verify-attestation) and [Verify Signature](/phala-cloud/confidential-ai/verify/verify-signature) for how to use these endpoints to confirm your requests were processed inside a TEE.

## Managing your instance

Return to the Confidential AI Models page anytime to monitor your model:

**Overview button**: Opens a dashboard with resource metrics, logs, and attestation data.

**Connect UI button**: Opens Open WebUI for your model in a new tab.

<Note>
  Billing continues while the instance is running. Stop or schedule destruction when you no longer need the model to avoid ongoing charges.
</Note>

## Troubleshooting

### Connect UI button stuck on "Initializing..."

Wait 2-3 minutes after the model reaches Running status. The interface needs time to fully start. Refresh the Confidential AI Models page if the button doesn't activate.

### Cannot access Open WebUI

Verify that your model status is **Running** in the Active Workloads section. Check that you're using the correct URL (it should end in `dstack-prod5.phala.network`).

### Forgot admin password

We can't reset your admin password because the model runs in a confidential environment that even the cloud provider cannot access. If you forget your password, you permanently lose access to all chat history.

## Next steps

You've deployed a dedicated model and accessed it through Open WebUI! Now you can:

<CardGroup cols={2}>
  <Card icon="shield-check" title="Verify attestation" href="/phala-cloud/confidential-ai/verify/verify-attestation">
    Confirm your model runs in genuine TEE hardware
  </Card>

  <Card icon="microchip" title="GPU TEE" href="/phala-cloud/confidential-ai/confidential-gpu/deploy-and-verify">
    Deploy custom models with full infrastructure control
  </Card>
</CardGroup>
