Rate Limits

Understand API rate limiting and best practices for managing your request volume on AletheionGuard.

Overview

AletheionGuard implements rate limiting to ensure fair usage and maintain service quality for all users. Rate limits are applied per IP address and vary by endpoint.

Key Concept: Rate limits are enforced using the SlowAPI library with a sliding window algorithm that tracks your requests over time.

Rate Limits by Endpoint

Each API endpoint has its own rate limit applied per IP address:

Endpoint              Rate Limit    Description
POST /v1/audit        100/minute    Single response auditing
POST /v1/batch        20/minute     Batch auditing (up to 100 items)
POST /v1/compare      50/minute     Model comparison
POST /v1/calibrate    100/minute    Calibration feedback
GET /health           No limit      Health check endpoint

Note: Rate limits are currently applied per IP address globally. Future versions may introduce API key-based rate limiting with customizable tiers.
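To see an endpoint's limit in action, you can send requests in a tight loop until the API answers with 429. A minimal sketch, assuming a hypothetical API_URL and whatever auth headers your deployment requires:

import requests

API_URL = "https://api.example.com"  # Hypothetical base URL; substitute your deployment's
headers = {"Authorization": "Bearer YOUR_API_KEY"}  # Assumed auth scheme

# Send requests until the 100/minute window for /v1/audit fills up.
for i in range(105):
    response = requests.post(
        f"{API_URL}/v1/audit",
        headers=headers,
        json={"text": "The sky is green."},
    )
    if response.status_code == 429:
        print(f"Hit the limit after {i} requests: {response.json()['error']}")
        break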

Rate Limit Exceeded (429)

When you exceed the rate limit, the API returns a 429 status code:

HTTP/1.1 429 Too Many Requests
{
"error": "Rate limit exceeded: 100 per 1 minute"
}

Important: When you receive a 429 error, wait at least 60 seconds before making another request. Implement exponential backoff to avoid being temporarily blocked.

Best Practices

1. Implement Exponential Backoff

When you receive a 429 response, wait before retrying. Use exponential backoff to progressively increase wait times.

import time

import requests

def request_with_backoff(url, headers, data, max_retries=5):
    """POST with exponential backoff on 429 responses."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)
        if response.status_code == 429:
            wait_time = 60 * (2 ** attempt)  # Exponential backoff: 60s, 120s, 240s, ...
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
            continue
        return response
    raise RuntimeError(f"Still rate limited after {max_retries} retries")

2. Use Batch Endpoints

Process multiple items in a single request using the batch endpoint to reduce total API calls.

# Instead of 100 individual requests (100 req/min limit):
for text in texts:
    response = requests.post("/v1/audit", json={"text": text})

# Use the batch endpoint (20 req/min limit, but each request processes up to 100 items):
response = requests.post(
    "/v1/batch",
    json={"items": [
        {"text": text, "context": ctx}
        for text, ctx in zip(texts, contexts)
    ][:100]},
)

3. Cache Results

Cache audit results for identical inputs to reduce API calls.

import hashlib

import requests

def cache_key(text, context=None):
    """Stable key for a (text, context) pair."""
    content = f"{text}:{context or ''}"
    return hashlib.sha256(content.encode()).hexdigest()

cache = {}

def audit_with_cache(text, context=None):
    key = cache_key(text, context)
    if key in cache:
        return cache[key]  # Cached result, no API call
    payload = {"text": text}
    if context is not None:
        payload["context"] = context  # Send the context the key was built from
    result = requests.post(url, json=payload).json()
    cache[key] = result
    return result

4. Distribute Requests Over Time

Spread requests evenly instead of bursting them all at once.

import time

import requests

def rate_limited_requests(items, requests_per_minute=90):
    """Stay below the 100/min limit with a safety buffer."""
    delay = 60.0 / requests_per_minute  # Seconds between requests
    for item in items:
        response = requests.post(url, json=item)
        yield response
        time.sleep(delay)  # Throttle before the next request

Technical Details

Rate Limiting Implementation

  • Library: SlowAPI (a port of Flask-Limiter for Starlette/FastAPI)
  • Algorithm: Sliding window with in-memory storage (Redis support available)
  • Key Function: Rate limits applied per IP address (get_remote_address)
  • Window Duration: 1 minute (60 seconds)
  • Response: HTTP 429 with error message when limit exceeded

Redis Backend: For production deployments with multiple workers, configure Redis by setting the REDIS_URL environment variable. This enables distributed rate limiting across all instances.
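For reference, here is a minimal sketch of how such limits are typically wired up with SlowAPI; this is illustrative, not AletheionGuard's actual source:

import os

from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

# Key requests by client IP; REDIS_URL switches storage from in-memory to Redis.
limiter = Limiter(
    key_func=get_remote_address,
    storage_uri=os.environ.get("REDIS_URL", "memory://"),
)

app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/v1/audit")
@limiter.limit("100/minute")  # Matches the table above
async def audit(request: Request):
    ...  # Endpoint body omitted

SlowAPI's default handler produces the HTTP 429 response with the "Rate limit exceeded: ..." message shown earlier.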

Frequently Asked Questions

Do batch requests count as one request or multiple?

Batch requests count as a single request against your rate limit, regardless of how many items are in the batch (up to 100 items max). At 20 batch requests per minute with up to 100 items each, you can process up to 2,000 items per minute, versus 100 per minute with single audits, which makes batching very efficient for bulk operations.

Are rate limits applied per API key or per IP address?

Rate limits are currently applied per IP address. All requests from the same IP address share the same rate limit pool, regardless of API key. Future versions may introduce API key-based rate limiting.

What happens if I use a proxy or load balancer?

If you're behind a proxy or load balancer, make sure it forwards the original client IP address via the X-Forwarded-For or X-Real-IP headers. Otherwise, all requests will appear to come from the proxy's IP and share the same rate limit.
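For example, an nginx reverse proxy commonly forwards the client IP like this (a sketch; the upstream name is hypothetical and your proxy config will differ):

location / {
    proxy_pass http://aletheionguard_upstream;  # Hypothetical upstream
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}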

Can I request higher rate limits?

Currently, rate limits are fixed per endpoint. For custom rate limits or dedicated infrastructure, contact us about enterprise deployment options. We can configure custom rate limits for self-hosted or dedicated instances.

How long does the rate limit window last?

Rate limits use a 60-second sliding window. This means that if you send 100 requests at timestamp 0, you must wait until timestamp 60 before the first request expires from the window. Unlike a fixed window, a sliding window cannot be gamed by bursting at a window boundary, so request pacing stays smooth.
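A client-side approximation of that bookkeeping can help you pace requests without waiting for 429s. A sketch, not the server's implementation:

import time
from collections import deque

class SlidingWindow:
    """Track request timestamps over a rolling 60-second window."""

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self.timestamps = deque()

    def allow(self):
        now = time.monotonic()
        # Evict requests that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False

Calling allow() before each request keeps a client under the limit, as long as its local view of the window roughly matches the server's.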

Next Steps