
Error Response Format

Errors return a JSON body with an error object and an HTTP status code:
```json
{
  "error": {
    "message": "Invalid API key provided",
    "type": "authentication_error",
    "code": null,
    "param": null
  }
}
```
The type field is the machine-readable discriminator.
| type | Meaning |
| --- | --- |
| authentication_error | Missing or invalid API key. |
| invalid_request_error | Malformed request body or unsupported parameter. |
| model_not_found | The requested model id is unavailable. |
| upstream_error | The upstream provider failed or timed out. |
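Here is a minimal sketch of reading this envelope in Python with only the standard library. `GatewayError` and `parse_error` are hypothetical names. The `"unknown"` fallback for bodies that are not JSON (for example, an HTML page returned by a proxy) is an assumption and is not a documented type.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class GatewayError:
    status: int
    type: str
    message: str
    code: Optional[str] = None
    param: Optional[str] = None


def parse_error(status: int, body: str) -> GatewayError:
    """Turn a non-2xx response into a GatewayError, tolerating non-JSON bodies."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    # The documented shape is {"error": {...}}; anything else yields an empty dict.
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        err = {}
    return GatewayError(
        status=status,
        # "unknown" is a hypothetical fallback, not a documented error.type
        type=err.get("type") or "unknown",
        message=err.get("message") or body,
        code=err.get("code"),
        param=err.get("param"),
    )
```

Callers can then branch on `err.type` and `err.status` and use `err.message` only for display.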

Status Codes

| Status | Meaning |
| --- | --- |
| 400 | Bad request, unknown model, or invalid parameter. |
| 401 | Missing or invalid API key. |
| 403 | Forbidden, for example insufficient credits. |
| 429 | Rate limited. |
| 500 | Gateway server error. |
| 502 | Upstream provider unavailable. |
| 503 | Service temporarily unavailable. |
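You can combine the status codes and `error.type` values into a single routing decision. The sketch below is illustrative only. The `Action` enum and the `action_for` function are hypothetical. Treating 429, 500, 502, and 503 as retryable follows the retry policy on this page, not any SDK behavior.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    FIX_REQUEST = "fix the request body, parameters, or model id"
    FIX_KEY = "fix the API key"
    CHECK_ACCOUNT = "check account access or credits"
    RETRY = "retry with backoff"
    INVESTIGATE = "unexpected status; inspect before retrying"


def action_for(status: int, error_type: Optional[str] = None) -> Action:
    """Map the documented status codes (and error.type where useful) to a next step."""
    if status == 401 or error_type == "authentication_error":
        return Action.FIX_KEY
    if status == 403:
        return Action.CHECK_ACCOUNT
    if status == 400:
        # Covers invalid_request_error and model_not_found
        return Action.FIX_REQUEST
    if status in (429, 500, 502, 503):
        return Action.RETRY
    return Action.INVESTIGATE
```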

SDK Handling

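```python
import os

from openai import (
    OpenAI,
    APIConnectionError,
    APIStatusError,
    AuthenticationError,
    RateLimitError,
)

client = OpenAI(
    base_url="https://inference.phala.com/v1",
    api_key=os.environ["API_KEY"],
)

try:
    response = client.chat.completions.create(
        model="phala/qwen3.5-27b",
        messages=[{"role": "user", "content": "Hello"}],
    )
except AuthenticationError:
    # 401: missing or invalid API key
    print("Invalid API key")
except RateLimitError:
    # 429: back off before retrying
    print("Rate limited; retry with backoff")
except APIStatusError as e:
    # Any other non-2xx response; e.type carries error.type from the JSON body
    print(f"API error {e.status_code} ({e.type}): {e.message}")
except APIConnectionError as e:
    # Network failure or timeout: no HTTP status was received
    print(f"Connection error: {e.message}")
```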

Retry Policy

Retry only transient errors:
  • Retry 429, 500, 502, and 503 with exponential backoff.
  • Do not retry 400 or 401 until you fix the request or key.
  • If verification of a confidential upstream fails, treat it as a failed security condition. Do not put it in a normal retry loop unless the API response documents the failure as transient.
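```python
import time

from openai import APIConnectionError, APIStatusError

RETRYABLE_STATUS = {429, 500, 502, 503}

# Note: the OpenAI client also retries some failures on its own (max_retries=2 by
# default). Pass max_retries=0 to OpenAI(...) if this wrapper should be the only retry layer.
def with_retry(call, max_retries=3):
    """Run call(), retrying transient errors with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_retries):
        try:
            return call()
        except (APIStatusError, APIConnectionError) as e:
            # Connection errors and timeouts have no status code; treat them like a 500.
            status = getattr(e, "status_code", 500)
            if status in RETRYABLE_STATUS and attempt < max_retries - 1:
                time.sleep(2 ** attempt)
                continue
            raise
```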
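`with_retry` sleeps exactly `2 ** attempt` seconds, so clients that fail together also retry together. A common refinement is "full jitter": wait a random time between zero and the exponential ceiling, capped at a maximum. This page does not say whether the gateway sends a `Retry-After` header. The sketch honors a numeric one if present and otherwise falls back to jittered backoff. `backoff_delay`, `base`, and `cap` are hypothetical names and defaults.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Honor a numeric Retry-After value if one was sent (assumed, not documented here).
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # HTTP-date form or unparseable: fall back to computed backoff
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly in [0, ceiling] so clients do not retry in lockstep.
    return (rng or random).uniform(0.0, ceiling)
```

To use it, replace `time.sleep(2 ** attempt)` with `time.sleep(backoff_delay(attempt, retry_after))`. With the OpenAI SDK, an `APIStatusError` exposes the raw `httpx` response, so the header, if sent, would be available as `e.response.headers.get("retry-after")`.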

Common Cases

Invalid API key (401, authentication_error): Confirm that the Authorization: Bearer <API_KEY> header is present, check the key for stray whitespace, and create a fresh key from the Phala dashboard if needed.
Model not found (400, model_not_found): List valid model ids with GET /v1/models. Do not assume a model id exists until it appears in the catalog.
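Here is a sketch of checking the catalog before you send a request. It assumes GET /v1/models returns the OpenAI-style shape `{"object": "list", "data": [{"id": ...}, ...]}`, which this page does not spell out. `catalog_ids` and `require_model` are hypothetical helpers that operate on the raw JSON so they can run offline. With the OpenAI SDK, `client.models.list()` fetches the same endpoint.

```python
import json
from typing import Set


def catalog_ids(models_body: str) -> Set[str]:
    """Collect model ids from a GET /v1/models response body (assumed OpenAI-style list)."""
    payload = json.loads(models_body)
    return {m["id"] for m in payload.get("data", []) if isinstance(m, dict) and "id" in m}


def require_model(models_body: str, model_id: str) -> None:
    """Raise before sending a request if model_id is not in the catalog."""
    ids = catalog_ids(models_body)
    if model_id not in ids:
        raise ValueError(f"unknown model id {model_id!r}; {len(ids)} models available")
```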
Unsupported parameter (400, invalid_request_error): Some models accept max_tokens; others require max_completion_tokens. Check the model's supported_parameters in /v1/models.
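You can automate the choice of token-limit parameter from the catalog entry. This sketch assumes each /v1/models entry carries a `supported_parameters` array of parameter-name strings. `token_limit_kwargs` is a hypothetical helper. When a model advertises both parameters, the sketch prefers `max_completion_tokens`; that preference is an arbitrary choice.

```python
from typing import Any, Dict


def token_limit_kwargs(model: Dict[str, Any], limit: int) -> Dict[str, int]:
    """Return {"max_completion_tokens": n} or {"max_tokens": n} for one catalog entry."""
    supported = model.get("supported_parameters") or []
    if "max_completion_tokens" in supported:
        return {"max_completion_tokens": limit}
    if "max_tokens" in supported:
        return {"max_tokens": limit}
    # Neither is advertised: send no limit rather than an unsupported parameter.
    return {}
```

You can unpack the result into the SDK call, for example `client.chat.completions.create(model=..., messages=..., **token_limit_kwargs(entry, 256))`.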
Rate limited (429): Back off and retry. For sustained high volume, use dedicated models or dedicated GPU TEE capacity.
Upstream failure (502, upstream_error): Retry after backoff or choose another model. For sensitive prompts, confirm that the replacement model returns upstream.verified.result = verified.
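Here is a fail-closed check for the verification result. This page gives only the dotted path `upstream.verified.result`. The sketch assumes that path maps to nested JSON objects in the response payload. It treats any missing level, or any value other than `"verified"`, as unverified. `upstream_verified` is a hypothetical helper.

```python
from collections.abc import Mapping
from typing import Any


def upstream_verified(payload: Mapping) -> bool:
    """True only if payload["upstream"]["verified"]["result"] == "verified" (assumed layout)."""
    node: Any = payload
    for key in ("upstream", "verified", "result"):
        # Any missing or non-object level counts as unverified.
        if not isinstance(node, Mapping) or key not in node:
            return False
        node = node[key]
    return node == "verified"
```

If this returns False, do not send the sensitive prompt to that model. That matches the retry policy's rule of treating a failed verification as a security condition rather than a retryable error.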

Best Practices

  • Branch on HTTP status and error.type, not message text.
  • Keep API keys in environment variables or a secret manager.
  • Never log request bodies for sensitive prompts.
  • Verify x-receipt-id when the response is security-sensitive.
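Here is a sketch of an error-logging helper that follows these practices. It records the status, `error.type`, and receipt id, and never the request body. It assumes `x-receipt-id` arrives as a response header. `log_api_error` and the logger name are hypothetical. Omitting the error `message` as well is a conservative choice, since that field is free text.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("inference-client")  # hypothetical logger name


def log_api_error(status: int, error_type: Optional[str], headers: Mapping[str, str]) -> dict:
    """Log an API failure using only non-sensitive fields, and return those fields."""
    fields = {
        "status": status,
        "error_type": error_type or "unknown",
        # Keep the receipt id so a security-sensitive response can be verified later.
        # Plain dicts are case-sensitive; httpx.Headers lookups are not.
        "receipt_id": headers.get("x-receipt-id"),
    }
    logger.warning("inference request failed: %s", fields)
    return fields
```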