XML Gateway API Documentation

This documentation describes how to use the XML Gateway API to send prompts to OpenAI's language models and receive responses in XML format.

Authentication

The API supports two authentication methods:

1. API Key Authentication

Include your API key in the X-API-Key header:

Example Request Header
X-API-Key: your-api-key-here

2. JWT Token Authentication

First, obtain a JWT token by sending your API key to the /auth/token endpoint:

Request to /auth/token (POST)
{
  "api_key": "your-api-key-here"
}

Response from /auth/token
{
  "access_token": "eyJhbGciOiJIUzI1...",
  "refresh_token": "eyJhbGciOiJIUzI1...",
  "token_type": "Bearer",
  "expires_in": 1800,
  "refresh_expires_in": 86400,
  "user": {
    "user_id": "user_1",
    "name": "User 1",
    "tier": "starter"
  }
}

Then, include the token in the Authorization header:

Example Authorization Header
Authorization: Bearer eyJhbGciOiJIUzI1...

Access tokens expire after 30 minutes. Use the refresh token to obtain a new access token:

Request to /auth/refresh (POST)
{
  "refresh_token": "eyJhbGciOiJIUzI1..."
}
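The token lifecycle above (30-minute access tokens, refresh before expiry) can be sketched as a small client-side helper. This is a minimal sketch using only the field names from the sample payloads; the refresh margin of 60 seconds is an assumption, not part of the API contract.

```python
import time

# Refresh this many seconds before the access token actually expires.
# (Assumed safety margin -- not mandated by the API.)
REFRESH_MARGIN = 60

def store_token(response_json, now=None):
    """Record the /auth/token response and when a refresh is due."""
    now = time.time() if now is None else now
    return {
        "access_token": response_json["access_token"],
        "refresh_token": response_json["refresh_token"],
        "refresh_at": now + response_json["expires_in"] - REFRESH_MARGIN,
    }

def needs_refresh(token_state, now=None):
    """True once the access token is within REFRESH_MARGIN of expiry."""
    now = time.time() if now is None else now
    return now >= token_state["refresh_at"]

def auth_header(token_state):
    """Build the Authorization header shown above."""
    return {"Authorization": f"Bearer {token_state['access_token']}"}
```

A client would call `needs_refresh` before each request and POST the stored `refresh_token` to /auth/refresh when it returns True.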

/ask Endpoint (POST)

The main endpoint for sending prompts to OpenAI and receiving responses in XML format.

Sample Request

XML Request Body
<Request>
  <prompt>Who was James Baldwin?</prompt>
  <model>gpt-4o</model>
</Request>

Sample Response (200 OK)

XML Response Body
<Response>
  <answer>James Baldwin was an American writer and activist who explored racial, sexual, and class distinctions in Western society...</answer>
</Response>
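The request and response bodies above can be built and parsed with Python's standard library. This is a minimal sketch: the element names match the samples, but the helper names (`build_ask_request`, `parse_ask_response`) are illustrative, not part of any official client.

```python
import xml.etree.ElementTree as ET

def build_ask_request(prompt, model="gpt-4o"):
    """Serialize the <Request> body expected by /ask."""
    root = ET.Element("Request")
    ET.SubElement(root, "prompt").text = prompt
    ET.SubElement(root, "model").text = model
    return ET.tostring(root, encoding="unicode")

def parse_ask_response(xml_text):
    """Extract the <answer> text from a 200 OK <Response> body."""
    root = ET.fromstring(xml_text)
    answer = root.find("answer")
    if answer is None:
        raise ValueError("missing <answer> element")
    return answer.text
```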

Error Response (400 Bad Request)

XML Response Body
<Error>
  <message>Invalid XML structure. Missing 'prompt' element.</message>
</Error>

Rate Limit Response (429 Too Many Requests)

XML Response Body
<Error>
  <message>Rate limit exceeded. Too many requests in a short period. Please wait 30 seconds before trying again.</message>
</Error>

Quota Exceeded Response (403 Forbidden)

XML Response Body
<Error>
  <message>Token limit exceeded. Monthly usage: 105000 tokens. Limit: 100000 tokens. Please upgrade your plan or wait until next month.</message>
</Error>
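All three error shapes above share the same `<Error><message>` structure, so a client can handle them uniformly. The sketch below maps each documented status code to a suggested client action; the action labels are illustrative assumptions, not values returned by the API.

```python
import xml.etree.ElementTree as ET

def classify_error(status_code, xml_body):
    """Map the documented error responses to a suggested client action."""
    message = ET.fromstring(xml_body).findtext("message", default="")
    if status_code == 400:
        action = "fix_request"     # invalid XML structure; do not retry as-is
    elif status_code == 403:
        action = "quota_exceeded"  # monthly token limit hit; upgrade or wait
    elif status_code == 429:
        action = "retry_later"     # rate limited; honor Retry-After
    else:
        action = "unknown"
    return action, message
```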

Rate Limit Headers

All responses include rate limit information in the following headers:

  • X-RateLimit-Limit: Maximum requests per minute
  • X-RateLimit-Remaining: Requests remaining in the current window
  • X-RateLimit-Reset: Seconds until the rate limit resets

Rate-limited responses (429) also include a Retry-After header indicating when to retry.
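A client can combine these headers into a single back-off decision. This is a minimal sketch assuming the header semantics described above (Retry-After takes precedence on a 429; otherwise wait for the reset window only when no requests remain):

```python
def retry_delay(status_code, headers):
    """Seconds to wait before the next request, from rate-limit headers.

    Returns 0 while requests remain in the current window; otherwise the
    seconds until the window resets. Retry-After wins on a 429 response.
    """
    if status_code == 429 and "Retry-After" in headers:
        return int(headers["Retry-After"])
    if int(headers.get("X-RateLimit-Remaining", 1)) > 0:
        return 0
    return int(headers.get("X-RateLimit-Reset", 0))
```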

/test Endpoint (POST)

A test endpoint that accepts the same XML as /ask but returns a static response. Use this for testing without consuming OpenAI tokens.

Sample Response (200 OK)

XML Response Body
<Response>
  <answer>This is a test response. In a production environment, this would be generated by OpenAI's API.</answer>
</Response>

Token Billing & Usage

Token Calculation

The API tracks token usage for both input (prompt) and output (completion) tokens. OpenAI uses tokens to measure usage, where a token is approximately 4 characters or 0.75 words.

Token Estimation

For planning purposes, you can estimate token usage as follows:

  • ~0.25 tokens per character (roughly 1 token per 4 characters)
  • ~1.3 tokens per word

The API automatically tracks actual token usage from the OpenAI API response.
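The rules of thumb above (a token is roughly 4 characters or 0.75 words) can be turned into a quick planning estimate. This sketch takes the larger of the two estimates so plans err on the safe side; that choice is an assumption, and actual billing always uses the counts reported by the OpenAI API.

```python
def estimate_tokens(text):
    """Rough token estimate from the documented rules of thumb:
    ~4 characters per token and ~0.75 words per token."""
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    # Take the larger estimate, rounded to the nearest whole token.
    return int(max(by_chars, by_words) + 0.5)
```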

Usage Statistics

You can check your current usage with the /api/usage/summary endpoint:

Request to /api/usage/summary
GET /api/usage/summary?days=7
X-API-Key: your-api-key-here

Response from /api/usage/summary
{
  "usage": {
    "total_tokens": 25000,
    "prompt_tokens": 10000,
    "completion_tokens": 15000,
    "models": {
      "gpt-4o": 20000,
      "gpt-3.5-turbo": 5000
    },
    "period_days": 7
  },
  "limits": {
    "token_limit": 100000,
    "used_tokens": 25000,
    "remaining_tokens": 75000,
    "percentage_used": 25
  }
}
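The `limits` object above gives everything a client needs to budget remaining usage. A minimal sketch, assuming the field names from the sample payload (the 90% warning threshold is an illustrative choice, not part of the API):

```python
def remaining_budget(summary):
    """Interpret a /api/usage/summary payload: tokens left this month
    and whether usage is approaching the monthly limit."""
    limits = summary["limits"]
    remaining = limits["token_limit"] - limits["used_tokens"]
    near_limit = limits["percentage_used"] >= 90  # assumed warning threshold
    return remaining, near_limit
```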

API Tiers & Rate Limits

Starter

$0.002

per 1K tokens
  • 100,000 tokens per month
  • 60 requests per minute
  • Access to all models

Pro

$0.0015

per 1K tokens
  • 1,000,000 tokens per month
  • 300 requests per minute
  • Priority support

Enterprise

Custom

contact for pricing
  • Custom token limits
  • 600 requests per minute
  • SLA & dedicated support