Rate Limiting
Control API usage with flexible quotas. Hector supports multi-window rate limiting with token and request count tracking.
Overview
Rate limiting protects:
- Your budget - Cap LLM token usage per user/session
- Your infrastructure - Prevent request floods
- Fair usage - Ensure equitable access in multi-tenant scenarios
Configuration
Rate limiting is configured via environment variables (not YAML):
export HECTOR_RATE_LIMIT_ENABLED=true
export HECTOR_RATE_LIMIT_SCOPE=user
export HECTOR_RATE_LIMIT_BACKEND=sql
export HECTOR_RATE_LIMIT_LIMITS='[
{"type": "token", "window": "day", "limit": 100000},
{"type": "count", "window": "minute", "limit": 60}
]'
Limit Types
| Type | Tracks | Use Case |
|---|---|---|
| `token` | LLM tokens consumed | Cost control, billing |
| `count` | Request count | DDoS protection, throttling |
Time Windows
| Window | Duration | Use Case |
|---|---|---|
| `minute` | 60 seconds | Burst protection |
| `hour` | 60 minutes | Short-term limits |
| `day` | 24 hours | Daily quotas |
| `week` | 7 days | Weekly budgets |
| `month` | 30 days | Billing cycles |
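The durations in the table can be written down as fixed lengths, which makes it clear that `month` means 30 days, not a calendar month. The following is a minimal sketch; the names `WINDOW_DURATIONS` and `window_seconds` are hypothetical and not part of Hector, and whether Hector aligns windows to calendar boundaries is not stated here.

```python
from datetime import timedelta

# Fixed window lengths, taken directly from the table above.
WINDOW_DURATIONS = {
    "minute": timedelta(seconds=60),
    "hour": timedelta(minutes=60),
    "day": timedelta(hours=24),
    "week": timedelta(days=7),
    "month": timedelta(days=30),
}


def window_seconds(window: str) -> int:
    """Return the length of a named window in seconds (hypothetical helper)."""
    return int(WINDOW_DURATIONS[window].total_seconds())
```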
Scopes
| Scope | Behavior |
|---|---|
| `session` | Each session has independent quotas |
| `user` | All sessions for a user share quotas |
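One way to picture the difference between the two scopes is as the key under which usage is counted. The sketch below is illustrative only: `quota_key` and the `user:` / `session:` key formats are assumptions, not Hector's internal representation.

```python
def quota_key(scope: str, user_id: str, session_id: str) -> str:
    """Return the bucket that usage is charged to (hypothetical key format)."""
    if scope == "user":
        # Every session of the same user maps to one shared bucket.
        return f"user:{user_id}"
    if scope == "session":
        # Each session gets its own bucket, even for the same user.
        return f"session:{session_id}"
    raise ValueError(f"unknown scope: {scope!r}")
```

With `user` scope, two sessions belonging to the same user resolve to the same key and so draw from the same quota. With `session` scope they resolve to different keys.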
Storage Backends
| Backend | Config | Use Case |
|---|---|---|
| `memory` | Default | Single instance, dev/test |
| `sql` | `HECTOR_RATE_LIMIT_BACKEND=sql` | Production, multi-instance |
Note: Memory backend resets on restart. Use SQL for production.
Configuration Reference
| Variable | Default | Description |
|---|---|---|
| `HECTOR_RATE_LIMIT_ENABLED` | `false` | Enable rate limiting |
| `HECTOR_RATE_LIMIT_SCOPE` | `session` | `session` or `user` |
| `HECTOR_RATE_LIMIT_BACKEND` | `memory` | `memory` or `sql` |
| `HECTOR_RATE_LIMIT_LIMITS` | `[]` | JSON array of limit configs |
| `HECTOR_RATE_LIMIT_IP_HEADERS` | - | Headers for client IP (e.g., `CF-Connecting-IP`) |
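As an illustration of how a comma-separated header list such as `HECTOR_RATE_LIMIT_IP_HEADERS` might be applied, the sketch below tries each header in order and falls back to the socket address. This is an assumption about the general technique, not a description of Hector's actual lookup; `client_ip` and its parameters are hypothetical.

```python
def client_ip(headers: dict, ip_headers: str, remote_addr: str) -> str:
    """Pick the client IP from the first configured header that is present."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in ip_headers.split(","):
        value = lowered.get(name.strip().lower(), "").strip()
        if value:
            # X-Forwarded-For may hold "client, proxy1, proxy2";
            # this sketch takes the leftmost entry.
            return value.split(",")[0].strip()
    # No configured header was set: fall back to the connection's address.
    return remote_addr
```

Only trust these headers when the service sits behind a proxy that sets them, since clients can otherwise send arbitrary values.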
Limit Object Schema
{
"type": "token", // or "count"
"window": "day", // minute, hour, day, week, month
"limit": 100000 // maximum value
}
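The `//` comments in the schema above are annotations only; strict JSON does not allow comments, so the value placed in `HECTOR_RATE_LIMIT_LIMITS` must omit them. A small pre-flight check like the following can catch malformed values before deployment. It is a sketch: `parse_limits` is a hypothetical helper, and the requirement that `limit` be a positive integer is an assumption, since Hector's own validation rules are not documented here.

```python
import json

VALID_TYPES = {"token", "count"}
VALID_WINDOWS = {"minute", "hour", "day", "week", "month"}


def parse_limits(raw: str) -> list:
    """Parse a HECTOR_RATE_LIMIT_LIMITS value and check each entry."""
    limits = json.loads(raw)  # raises ValueError on invalid JSON (e.g. comments)
    if not isinstance(limits, list):
        raise ValueError("expected a JSON array of limit objects")
    for i, entry in enumerate(limits):
        if not isinstance(entry, dict):
            raise ValueError(f"limit {i}: expected an object")
        if entry.get("type") not in VALID_TYPES:
            raise ValueError(f"limit {i}: type must be one of {sorted(VALID_TYPES)}")
        if entry.get("window") not in VALID_WINDOWS:
            raise ValueError(f"limit {i}: window must be one of {sorted(VALID_WINDOWS)}")
        limit = entry.get("limit")
        # Assumption: limits are positive integers (bool is excluded explicitly
        # because it is a subclass of int in Python).
        if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
            raise ValueError(f"limit {i}: limit must be a positive integer")
    return limits
```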
Example: Production Setup
# Enable rate limiting with SQL backend
export HECTOR_RATE_LIMIT_ENABLED=true
export HECTOR_RATE_LIMIT_BACKEND=sql
export HECTOR_RATE_LIMIT_SCOPE=user
# Multi-layer limits
export HECTOR_RATE_LIMIT_LIMITS='[
{"type": "count", "window": "minute", "limit": 60},
{"type": "count", "window": "hour", "limit": 500},
{"type": "token", "window": "day", "limit": 500000},
{"type": "token", "window": "month", "limit": 5000000}
]'
# Cloudflare IP detection
export HECTOR_RATE_LIMIT_IP_HEADERS="CF-Connecting-IP,X-Forwarded-For"
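With several limits configured, a request has to fit within every one of them. The sketch below shows that idea with a simple in-memory fixed-window counter. It is not Hector's implementation: the `LayeredLimiter` class, the fixed-window bucketing, and passing the token cost up front are all assumptions made to keep the example short.

```python
import time

# Window lengths in seconds (month = 30 days, per the Time Windows table).
WINDOW_SECONDS = {"minute": 60, "hour": 3600, "day": 86400,
                  "week": 604800, "month": 2592000}


class LayeredLimiter:
    """Hypothetical fixed-window limiter that enforces all limits together."""

    def __init__(self, limits, clock=time.time):
        self.limits = limits
        self.clock = clock
        self.usage = {}  # (key, type, window, bucket index) -> amount used

    def check(self, key, tokens=0):
        """Return the first limit the request would exceed, else record it and return None."""
        now = self.clock()
        charges = []
        for lim in self.limits:
            # "count" limits charge 1 per request; "token" limits charge the token cost.
            amount = 1 if lim["type"] == "count" else tokens
            bucket = int(now // WINDOW_SECONDS[lim["window"]])
            slot = (key, lim["type"], lim["window"], bucket)
            if self.usage.get(slot, 0) + amount > lim["limit"]:
                return lim  # rejected: nothing is recorded
            charges.append((slot, amount))
        # Every limit passed, so charge all of them.
        for slot, amount in charges:
            self.usage[slot] = self.usage.get(slot, 0) + amount
        return None
```

Old buckets are never pruned in this sketch, so it is suitable only for illustrating the check, not for long-running use.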
API Response
When a limit is exceeded, clients receive:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 45

{
"error": "rate limit exceeded",
"limit_type": "token",
"window": "day",
"limit": 100000,
"used": 100000,
"reset_at": "2026-01-21T00:00:00Z"
}
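A client can use the `Retry-After` header, or the `reset_at` field as a fallback, to decide how long to wait before retrying. The following is a minimal sketch of that logic, independent of any particular HTTP library; `seconds_until_retry` and its `fallback` parameter are hypothetical names, and it assumes `Retry-After` carries a number of seconds as in the example response.

```python
import json
from datetime import datetime, timezone


def seconds_until_retry(status, headers, body, now=None, fallback=1.0):
    """Return how long to wait before retrying, or 0.0 if not rate limited."""
    if status != 429:
        return 0.0
    # Prefer the Retry-After header when it holds a plain number of seconds.
    # Note: this sketch expects the exact header name "Retry-After" as the key.
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    # Otherwise fall back to the reset_at timestamp in the JSON body.
    reset_at = json.loads(body).get("reset_at")
    if reset_at:
        reset = datetime.fromisoformat(reset_at.replace("Z", "+00:00"))
        current = now or datetime.now(timezone.utc)
        return max(0.0, (reset - current).total_seconds())
    return fallback
```

Callers would typically sleep for the returned duration and then retry, ideally with a cap on the number of attempts.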