Rate Limits
Understand API rate limiting and best practices for managing your request volume on AletheionGuard.
Overview
AletheionGuard implements rate limiting to ensure fair usage and maintain service quality for all users. Rate limits are applied per IP address and vary by endpoint.
Key Concept: Rate limits are enforced using the SlowAPI library with a sliding window algorithm that tracks your requests over time.
Rate Limits by Endpoint
Each API endpoint has its own rate limit applied per IP address:
| Endpoint | Rate Limit | Description |
|---|---|---|
| POST /v1/audit | 100/minute | Single response auditing |
| POST /v1/batch | 20/minute | Batch auditing (up to 100 items) |
| POST /v1/compare | 50/minute | Model comparison |
| POST /v1/calibrate | 100/minute | Calibration feedback |
| GET /health | No limit | Health check endpoint |
Note: Rate limits are currently applied per IP address globally. Future versions may introduce API key-based rate limiting with customizable tiers.
Rate Limit Exceeded (429)
When you exceed the rate limit, the API returns an HTTP 429 (Too Many Requests) status code with an error message in the response body.
Important: When you receive a 429 error, wait at least 60 seconds before making another request. Implement exponential backoff to avoid being temporarily blocked.
Best Practices
1. Implement Exponential Backoff
When you receive a 429 response, wait before retrying. Use exponential backoff to progressively increase wait times.
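A minimal retry wrapper might look like the following sketch. The names `request_with_backoff` and `send_request` are illustrative, not part of the AletheionGuard API: `send_request` stands in for whatever HTTP call you make (e.g. a POST to /v1/audit) and is assumed to return a `(status_code, body)` tuple.

```python
import random
import time

def request_with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry `send_request` with exponential backoff on 429 responses.

    `send_request` is a placeholder for your real HTTP call; it must
    return a (status_code, body) tuple.
    """
    for attempt in range(max_retries):
        status, body = send_request()
        if status != 429:
            return status, body
        # Wait 1s, 2s, 4s, ... with random jitter so that many clients
        # retrying at once do not resynchronize into the same burst.
        delay = base_delay * (2 ** attempt) * (1 + random.random())
        time.sleep(delay)
    raise RuntimeError("Rate limit still exceeded after retries")
```

The jitter factor matters in practice: without it, every client that was throttled at the same moment retries at the same moment, and the burst repeats.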
2. Use Batch Endpoints
Process multiple items in a single request using the batch endpoint to reduce total API calls.
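For example, splitting a large workload into chunks that respect the documented 100-item cap keeps you inside the much cheaper 20/minute batch budget. The `chunk` helper below is illustrative, not part of any client library:

```python
def chunk(items, size=100):
    """Split `items` into lists of at most `size` elements, matching the
    /v1/batch endpoint's documented 100-item maximum per request."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# 250 inputs become 3 batch requests (100 + 100 + 50) instead of
# 250 single /v1/audit calls.
```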
3. Cache Results
Cache audit results for identical inputs to reduce API calls.
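A simple in-process cache keyed on a stable hash of the request payload is often enough. This sketch assumes audit results are deterministic for identical inputs; `cached_audit` and `call_api` are illustrative names, with `call_api` standing in for your real HTTP call:

```python
import hashlib
import json

_audit_cache = {}

def cached_audit(payload, call_api):
    """Return a cached result for payloads seen before; otherwise call the
    API once and remember the answer.

    `call_api` is a placeholder for the real request to /v1/audit.
    """
    # Serialize with sorted keys so logically identical payloads
    # produce the same cache key.
    key = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
    if key not in _audit_cache:
        _audit_cache[key] = call_api(payload)
    return _audit_cache[key]
```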
4. Distribute Requests Over Time
Spread requests evenly instead of bursting them all at once.
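One way to pace a client is to enforce a minimum interval between calls, e.g. 60 s / 100 = 0.6 s for /v1/audit. The `Pacer` class below is an illustrative sketch (the injectable `clock` and `sleep` parameters just make it testable):

```python
import time

class Pacer:
    """Space calls at least `window / limit` seconds apart instead of
    bursting the whole budget at once."""

    def __init__(self, limit=100, window=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.interval = window / limit  # e.g. 0.6 s for 100/minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self):
        """Block until the next request is permitted."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
        self.next_allowed = max(now, self.next_allowed) + self.interval
```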
Technical Details
Rate Limiting Implementation
- Library: SlowAPI (a FastAPI/Starlette port of Flask-Limiter)
- Algorithm: Sliding window with in-memory storage (Redis support available)
- Key Function: Rate limits applied per IP address (get_remote_address)
- Window Duration: 1 minute (60 seconds)
- Response: HTTP 429 with an error message when the limit is exceeded
Redis Backend: For production deployments with multiple workers, configure Redis by setting the REDIS_URL environment variable. This enables distributed rate limiting across all instances.
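For self-hosted deployments, the wiring is roughly the following configuration sketch. It assumes the SlowAPI setup described above; `key_func` and `storage_uri` are SlowAPI's documented parameters, and `REDIS_URL` is the environment variable named in this document.

```python
# Configuration sketch for a self-hosted deployment (not the service's
# verbatim source code).
import os

from slowapi import Limiter
from slowapi.util import get_remote_address

# When REDIS_URL is set, counters live in Redis and are shared by all
# workers; otherwise each process keeps its own in-memory counters.
limiter = Limiter(
    key_func=get_remote_address,                      # key requests by client IP
    storage_uri=os.getenv("REDIS_URL", "memory://"),  # e.g. redis://localhost:6379
)
```

Without a shared backend, N workers each enforce the limit independently, so a client can effectively get N times the documented limit.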
Frequently Asked Questions
Do batch requests count as one request or multiple?
Batch requests count as a single request against your rate limit, regardless of how many items are in the batch (up to 100 items max). This makes batching very efficient for bulk operations.
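Under the limits in the table above, the throughput difference works out as:

```python
# Effective items per minute under the documented limits.
single_per_min = 100        # POST /v1/audit: 100 requests/minute, 1 item each
batch_per_min = 20 * 100    # POST /v1/batch: 20 requests/minute, up to 100 items
speedup = batch_per_min // single_per_min  # batching allows up to 20x the items
```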
Are rate limits applied per API key or per IP address?
Rate limits are currently applied per IP address. All requests from the same IP address share the same rate limit pool, regardless of API key. Future versions may introduce API key-based rate limiting.
What happens if I use a proxy or load balancer?
If you're behind a proxy or load balancer, make sure it forwards the original client IP address via the X-Forwarded-For or X-Real-IP headers. Otherwise, all requests will appear to come from the proxy's IP and share the same rate limit.
Can I request higher rate limits?
Currently, rate limits are fixed per endpoint. For custom rate limits or dedicated infrastructure, contact us about enterprise deployment options. We can configure custom rate limits for self-hosted or dedicated instances.
How long does the rate limit window last?
Rate limits use a 60-second sliding window. If you send 100 requests at timestamp 0, you must wait until timestamp 60, when the first request expires from the window, before another is accepted. Unlike a fixed window, a sliding window cannot be gamed by bursting just before and just after a window boundary: the limit holds over any 60-second span.
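The behavior of such a counter can be sketched in a few lines. This is an illustration of the sliding-window idea only, not AletheionGuard's actual implementation:

```python
from collections import deque

class SlidingWindow:
    """Illustrative 60-second sliding-window rate counter."""

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = deque()  # timestamps of accepted requests

    def allow(self, now):
        """Return True if a request at time `now` is within the limit."""
        # Evict timestamps that have aged out of the window.
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True
        return False
```

With this model, 100 requests at timestamp 0 exhaust the budget; a request at timestamp 30 is rejected, and the next acceptance comes at timestamp 60 when the earliest request leaves the window.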