Rate Limiting¶
Control API usage and costs with flexible, multi-layer rate limiting.
Overview¶
Rate limiting protects your system from:
- Cost overruns - Limit token usage for LLM APIs
- Abuse - Prevent spam and excessive requests
- Resource exhaustion - Control load on your system
Hector's rate limiting supports:
- ✅ Multi-layer time windows (minute → month)
- ✅ Dual tracking (tokens + request counts)
- ✅ Per-session or per-user scoping
- ✅ SQL or memory storage
Quick Start¶
Add rate limiting to any session store:
session_stores:
  default:
    backend: sql
    sql:
      driver: sqlite
      database: ./sessions.db
    rate_limit:
      enabled: true
      scope: session
      backend: memory
      limits:
        - type: count
          window: minute
          limit: 60
        - type: token
          window: day
          limit: 100000
Configuration¶
Basic Structure¶
rate_limit:
  enabled: true          # Enable rate limiting
  scope: session         # "session" or "user"
  backend: memory        # "memory" or "sql"
  limits:                # List of limits to enforce
    - type: count
      window: minute
      limit: 60
Limit Types¶
count - Request/message count
- type: count
  window: minute
  limit: 60           # Max 60 requests per minute
token - LLM token usage
- type: token
  window: day
  limit: 100000      # Max 100k tokens per day
Time Windows¶
| Window | Duration | Use Case |
|---|---|---|
| minute | 60 sec | Burst protection |
| hour | 60 min | Short-term throttling |
| day | 24 hours | Daily quotas |
| week | 7 days | Weekly budgets |
| month | 30 days | Monthly billing |
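To make the durations concrete, here is a minimal Python sketch that maps each window name to its length. The names `WINDOW_DURATIONS` and `window_seconds` are hypothetical and only illustrate the table; they are not part of Hector.

```python
from datetime import timedelta

# Durations copied from the table above. Per that table, "month" is a
# flat 30 days rather than a calendar month.
WINDOW_DURATIONS = {
    "minute": timedelta(seconds=60),
    "hour": timedelta(minutes=60),
    "day": timedelta(hours=24),
    "week": timedelta(days=7),
    "month": timedelta(days=30),
}


def window_seconds(window: str) -> int:
    """Return the length of a named window in seconds (hypothetical helper)."""
    try:
        return int(WINDOW_DURATIONS[window].total_seconds())
    except KeyError:
        raise ValueError(f"unknown window: {window!r}") from None
```

Because a month is treated as 30 days here, a monthly limit will not line up exactly with calendar billing periods.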
Scopes¶
Session Scope - Each session independent
scope: session
User Scope - All sessions share quota
scope: user
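The difference between the two scopes comes down to which identifier the usage is counted against. The sketch below is an illustration only: `quota_key` and its parameters are hypothetical names, and the pairing of scope with identifier simply mirrors the `scope` and `identifier` columns of the SQL schema.

```python
from typing import Tuple


def quota_key(scope: str, session_id: str, user_id: str) -> Tuple[str, str]:
    """Pick the identifier that usage is counted against (hypothetical helper)."""
    if scope == "session":
        # Every session gets its own counters.
        return ("session", session_id)
    if scope == "user":
        # All sessions of one user draw from the same counters.
        return ("user", user_id)
    raise ValueError(f"unknown scope: {scope!r}")
```

With `scope: user`, two sessions opened by the same user resolve to the same key, so opening a new session does not reset the quota.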
Storage Backends¶
Memory Backend - Fast, volatile
backend: memory
SQL Backend - Persistent
backend: sql
Common Patterns¶
Pattern 1: Spam Prevention¶
rate_limit:
  enabled: true
  scope: session
  limits:
    - type: count
      window: minute
      limit: 10
Pattern 2: Cost Control¶
rate_limit:
  enabled: true
  scope: user
  limits:
    - type: token
      window: day
      limit: 50000
    - type: token
      window: month
      limit: 1000000
Pattern 3: Multi-Layer Protection¶
rate_limit:
  enabled: true
  scope: user
  limits:
    - type: count
      window: minute
      limit: 60
    - type: count
      window: hour
      limit: 1000
    - type: token
      window: day
      limit: 100000
Pattern 4: Tiered Limits¶
session_stores:
  free-tier:
    rate_limit:
      enabled: true
      scope: user
      limits:
        - {type: count, window: minute, limit: 10}
        - {type: token, window: day, limit: 10000}
  pro-tier:
    rate_limit:
      enabled: true
      scope: user
      limits:
        - {type: count, window: minute, limit: 100}
        - {type: token, window: day, limit: 500000}
Error Handling¶
When a rate limit is exceeded, the API returns a 429 error with the following details:
{
  "error": "rate limit exceeded",
  "details": {
    "limit_type": "count",
    "window": "minute",
    "current": 60,
    "limit": 60,
    "retry_after": "45s"
  }
}
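A client can use `retry_after` to decide how long to wait before retrying. The Python sketch below assumes the value is a Go-style duration string such as `45s` or `1m30s`; the source only shows `45s`, so the other units are an assumption. `parse_retry_after` and `retry_delay` are hypothetical helper names.

```python
import json
import re
from typing import Optional

# Assumed units for a Go-style duration string; only "s" appears in the docs.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_WHOLE = re.compile(r"(?:\d+(?:\.\d+)?(?:ms|h|m|s))+")


def parse_retry_after(value: str) -> float:
    """Convert a duration such as '45s' or '1m30s' to seconds."""
    if not _WHOLE.fullmatch(value):
        raise ValueError(f"unrecognized duration: {value!r}")
    return sum(float(number) * _UNIT_SECONDS[unit]
               for number, unit in _PART.findall(value))


def retry_delay(body: str) -> Optional[float]:
    """Return the seconds to wait, or None if the body is not a rate limit error."""
    payload = json.loads(body)
    if payload.get("error") != "rate limit exceeded":
        return None
    return parse_retry_after(payload["details"]["retry_after"])
```

A caller would typically sleep for the returned delay before resending the message, and show that delay to the end user.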
How It Works¶
Request Flow¶
1. User sends message
   ↓
2. Check rate limits
   ↓
3a. Within limits → Process message
3b. Exceeded → Return 429 error
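The flow above can be sketched in Python as a check that runs over every configured limit and rejects the request if any one of them is exhausted. This is an illustration, not Hector's implementation: `Limit`, `first_exceeded`, and `handle` are hypothetical names, and the rule "reject once current usage has reached the limit" is inferred from the error example, where `current` equals `limit`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Limit:
    type: str    # "count" or "token"
    window: str  # "minute", "hour", "day", "week", or "month"
    limit: int


# Current usage per (type, window) pair, e.g. {("count", "minute"): 12}.
Usage = Dict[Tuple[str, str], int]


def first_exceeded(limits: List[Limit], usage: Usage) -> Optional[Limit]:
    """Return the first limit whose usage has reached its cap, if any."""
    for lim in limits:
        if usage.get((lim.type, lim.window), 0) >= lim.limit:
            return lim
    return None


def handle(limits: List[Limit], usage: Usage) -> int:
    """Return the status code for a request: 429 if any limit is exhausted."""
    return 429 if first_exceeded(limits, usage) is not None else 200
```

With several limits configured, a request has to pass all of them, so the tightest exhausted layer decides the outcome.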
Token Tracking¶
Tokens are counted after the LLM responds:
1. Message processed
   ↓
2. LLM returns response (1,234 tokens)
   ↓
3. Record token usage
   ↓
4. Check against limits
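One consequence of recording tokens after the response is that the request which crosses a token limit may still complete, with later requests being the ones rejected. The sketch below illustrates that reading of the steps; it is an assumption about behavior, and `TokenBudget` is a hypothetical class, not part of Hector.

```python
class TokenBudget:
    """Tracks token usage against a single limit (illustrative only)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def allow(self) -> bool:
        # Checked before a message is processed: only past usage is known.
        return self.used < self.limit

    def record(self, tokens: int) -> None:
        # Called once the LLM response arrives and its token count is known.
        self.used += tokens
```

For example, with a limit of 2,000 tokens and two responses of 1,234 tokens each, the second request is still allowed because only 1,234 tokens are on record when it starts; the third is rejected.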
Window Management¶
Windows are sliding, not fixed:
Limit: 60 requests/minute
12:00:00 → 12:00:59 : First window
12:00:30 → 12:01:29 : Sliding window
12:01:00 → 12:01:59 : New window
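One common way to get sliding behavior is to keep a log of request timestamps and count only those that fall inside the trailing window. The Python sketch below shows that technique under stated assumptions: the source does not say which algorithm Hector uses, and `SlidingWindowCounter` is a hypothetical class.

```python
from collections import deque


class SlidingWindowCounter:
    """Sliding-window log: counts events in the trailing `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of accepted requests, oldest first

    def _evict(self, now: float) -> None:
        # Drop timestamps that have slid out of the trailing window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()

    def allow(self, now: float) -> bool:
        """Accept and record the request if the window has room."""
        self._evict(now)
        if len(self._events) >= self.limit:
            return False
        self._events.append(now)
        return True

    def retry_after(self, now: float) -> float:
        """Seconds until the oldest recorded request leaves the window."""
        self._evict(now)
        if len(self._events) < self.limit:
            return 0.0
        return self._events[0] + self.window_seconds - now
```

With this approach a request at 12:00:30 is judged against everything accepted since 11:59:30, so a burst straddling a minute boundary cannot double the effective limit.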
Performance¶
Memory Backend¶
- Lookup: O(1)
- Storage: In-memory map
- Capacity: ~10k req/sec

SQL Backend¶
- Lookup: O(1) with indexes
- Storage: Dedicated rate_limits table
- Capacity: ~1k req/sec (depends on database)

Best Practices¶
- Start conservative - Begin with lower limits
- Monitor usage - Track actual patterns
- Use multiple windows - Combine minute + day limits
- Choose the right scope:
  - session for spam prevention
  - user for cost control
- Use SQL in production - For persistence
- Provide clear feedback - Show retry-after time

Architecture¶
SessionService
    ↓ wraps
RateLimitedSessionService
    ↓ checks
RateLimiter
    ↓ queries
Store (Memory or SQL)
    ↓ persists
rate_limits table
SQL Schema¶
CREATE TABLE rate_limits (
    scope VARCHAR(50) NOT NULL,
    identifier VARCHAR(255) NOT NULL,
    limit_type VARCHAR(50) NOT NULL,
    window VARCHAR(50) NOT NULL,
    amount BIGINT NOT NULL,
    window_end TIMESTAMP NOT NULL,
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL,
    PRIMARY KEY (scope, identifier, limit_type, window)
);
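To show how a row in this table might be used, the Python sketch below creates the table in an in-memory SQLite database and accumulates usage for one key. It rests on assumptions the source does not confirm: that `amount` is the usage accumulated in the window closing at `window_end`, and that the counter restarts once that time has passed. `add_usage` is a hypothetical helper, and `window` is quoted because WINDOW is a keyword in several SQL dialects.

```python
import sqlite3
from datetime import datetime, timedelta

# Same columns as the schema above, with "window" quoted for portability.
SCHEMA = """
CREATE TABLE rate_limits (
    scope VARCHAR(50) NOT NULL,
    identifier VARCHAR(255) NOT NULL,
    limit_type VARCHAR(50) NOT NULL,
    "window" VARCHAR(50) NOT NULL,
    amount BIGINT NOT NULL,
    window_end TIMESTAMP NOT NULL,
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL,
    PRIMARY KEY (scope, identifier, limit_type, "window")
)
"""

KEY_FILTER = 'scope = ? AND identifier = ? AND limit_type = ? AND "window" = ?'


def add_usage(conn, key, amount, now, duration):
    """Add `amount` to the counter for `key` and return the new total.

    `key` is (scope, identifier, limit_type, window). Assumption: the counter
    restarts when `window_end` has passed; the real reset rule may differ.
    """
    row = conn.execute(
        f"SELECT amount, window_end FROM rate_limits WHERE {KEY_FILTER}", key
    ).fetchone()
    stamp = now.isoformat()
    if row is None:
        # First usage for this key: open a window ending `duration` from now.
        conn.execute(
            "INSERT INTO rate_limits VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (*key, amount, (now + duration).isoformat(), stamp, stamp),
        )
        return amount
    current, window_end = row
    if datetime.fromisoformat(window_end) <= now:
        # The stored window has expired: start counting again from zero.
        current, window_end = 0, (now + duration).isoformat()
    total = current + amount
    conn.execute(
        "UPDATE rate_limits SET amount = ?, window_end = ?, updated_at = ? "
        f"WHERE {KEY_FILTER}",
        (total, window_end, stamp, *key),
    )
    return total
```

The composite primary key means each combination of scope, identifier, limit type, and window has exactly one row, so a lookup touches a single indexed row.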