Operations¶
This guide covers operating Hector in production: database, task queue, checkpointing, and multi-tenancy.
Database Configuration¶
Hector stores sessions, tasks, checkpoints, and app configurations.
SQLite (Default)¶
Best for single-instance deployments:
hector serve --database "sqlite://.hector/hector.db"
PostgreSQL (Production)¶
Required for multi-instance or high-availability (HA) deployments:
hector serve --database "postgres://user:pass@localhost:5432/hector?sslmode=require"
Schema migrations run automatically on startup.
Task Queue¶
Durable background execution with retry and recovery.
Queue Flow¶
Request → Enqueue → [Worker Pool] → Execute → Complete
                          ↓
                   Retry on failure
                          ↓
                   Dead Letter Queue
Configuration¶
| Flag | Default | Description |
|---|---|---|
| --queue-workers | 4 | Concurrent workers |
| --queue-max-retries | 3 | Retry attempts |
| --queue-initial-delay | 1s | First retry delay |
| --queue-max-delay | 5m | Max backoff cap |
| --queue-stale-threshold | 5m | Zombie task recovery |
hector serve \
  --queue-workers 8 \
  --queue-max-retries 5 \
  --queue-stale-threshold 3m
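The initial delay and the backoff cap in the table above suggest an exponential retry schedule. A minimal sketch of how such a schedule could be computed follows. The doubling factor, the absence of jitter, and the function name `retry_delay` are assumptions for illustration; the source does not state Hector's exact formula.

```python
def retry_delay(attempt: int, initial: float = 1.0,
                max_delay: float = 300.0, factor: float = 2.0) -> float:
    """Return the delay in seconds before retry number `attempt` (1-based).

    Defaults mirror --queue-initial-delay (1s) and --queue-max-delay (5m).
    The growth factor of 2 is an assumption, not a documented Hector value.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Grow the delay geometrically, then clamp it to the configured cap.
    return min(initial * factor ** (attempt - 1), max_delay)


# Delays for the default three retry attempts.
schedule = [retry_delay(n) for n in range(1, 4)]
```

Under these assumptions the cap only matters for larger retry counts: with a 1s initial delay and doubling, the 5m ceiling is first reached on the tenth retry.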
Task States¶
| State | Description |
|---|---|
| pending | Waiting in queue |
| running | Being processed |
| completed | Successfully finished |
| failed | Exhausted retries |
| input_required | Awaiting human approval |
Stale Task Recovery¶
If a worker crashes, its tasks are left in the running state. Once a task has been stale for longer than --queue-stale-threshold (default 5m), another worker picks it up.
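Stale task recovery can be pictured as a periodic sweep over running tasks. The sketch below is illustrative only: the `Task` class, the `last_heartbeat` field, and the choice to return a reclaimed task to pending are hypothetical, since the source does not describe how Hector detects or requeues stale tasks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Task:
    id: str
    state: str                # one of the task states listed above
    last_heartbeat: datetime  # hypothetical: last time a worker touched the task


def reclaim_stale(tasks, now, threshold=timedelta(minutes=5)):
    """Return running tasks whose heartbeat is older than `threshold` to pending.

    Returns the ids of the reclaimed tasks. The 5-minute default mirrors
    --queue-stale-threshold.
    """
    reclaimed = []
    for task in tasks:
        if task.state == "running" and now - task.last_heartbeat > threshold:
            task.state = "pending"  # assumption: requeue rather than fail
            reclaimed.append(task.id)
    return reclaimed


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
demo_tasks = [
    Task("a", "running", now - timedelta(minutes=6)),   # stale
    Task("b", "running", now - timedelta(minutes=1)),   # still healthy
    Task("c", "completed", now - timedelta(hours=1)),   # finished, ignored
]
demo_reclaimed = reclaim_stale(demo_tasks, now)
```

A lower threshold recovers crashed work sooner but risks reclaiming tasks that are merely slow, which is presumably why the flag is configurable.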
Checkpoint System¶
Incremental progress snapshots for long-running operations.
Use Cases¶
- RAG Indexing - Resume after restart via file checksums
- Workflow Stages - Track progress through pipelines
- HITL Tasks - Preserve state awaiting approval
Recovery Behavior¶
On restart, Hector automatically recovers:
- Pending tasks from queue
- In-progress RAG indexing
- HITL-waiting tasks
No manual intervention required.
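To make the idea of resuming RAG indexing via file checksums concrete, here is a rough sketch. It assumes the checkpoint is a mapping from file path to a SHA-256 digest; the hash algorithm, the checkpoint shape, and the function names are illustrative assumptions rather than Hector's actual storage format.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_index(paths, checkpoint: dict) -> list:
    """Return the files that are new or changed since the checkpoint.

    `checkpoint` maps str(path) -> checksum recorded when the file was
    last indexed (hypothetical format).
    """
    return [p for p in paths if checkpoint.get(str(p)) != file_checksum(p)]


# Demo: one file was indexed before the restart, one was not.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.txt"
    b = Path(tmp) / "b.txt"
    a.write_text("already indexed")
    b.write_text("not yet indexed")
    checkpoint = {str(a): file_checksum(a)}
    demo_names = [p.name for p in files_to_index([a, b], checkpoint)]
    a.write_text("edited after indexing")  # a changed file is picked up again
    demo_names_after_edit = [p.name for p in files_to_index([a, b], checkpoint)]
```

Comparing content hashes rather than modification times means an unchanged file is skipped even if it was touched or copied during the restart.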
Human-in-the-Loop (HITL)¶
When a tool requires approval, the task checkpoints and waits:
tools:
  deploy_prod:
    type: command
    command: "./deploy.sh"
    require_approval: true
Flow¶
- Agent calls approval-required tool
- Task enters input_required state
- Checkpoint saved to database
- API returns approval request
- Human approves/rejects via API
- Task resumes or terminates
API¶
# List pending approvals
curl "http://localhost:8080/tasks?status=input_required"
# Approve
curl -X POST http://localhost:8080/tasks/{id}/approve \
-H "Authorization: Bearer ${TOKEN}"
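The same two calls can be scripted. The sketch below only builds the requests with the Python standard library, using the endpoints shown in the curl examples above; the helper names are hypothetical, and the response format is not shown in the source, so no parsing is attempted.

```python
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8080"


def list_pending_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET request that lists tasks awaiting approval."""
    query = urlencode({"status": "input_required"})
    return urllib.request.Request(f"{base_url}/tasks?{query}", method="GET")


def approve_request(task_id: str, token: str,
                    base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request that approves one task."""
    # Percent-encode the id so unusual characters cannot alter the path.
    path = f"/tasks/{quote(task_id, safe='')}/approve"
    return urllib.request.Request(
        f"{base_url}{path}",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# To send a request against a running server:
#   with urllib.request.urlopen(approve_request("task-123", token)) as resp:
#       print(resp.status)
```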
Multi-Tenancy¶
Deploy multiple isolated apps on a single Hector instance.
Architecture¶
┌─────────────────────────────────────────────┐
│ Hector Server │
│ ┌───────────┐ ┌───────────┐ ┌─────────┐ │
│ │ App: A │ │ App: B │ │ App: C │ │
│ └───────────┘ └───────────┘ └─────────┘ │
└─────────────────────────────────────────────┘
Each app has isolated agents, tools, sessions, and configuration.
Admin API¶
# Create app
curl -X POST http://localhost:8080/admin/apps \
  -H "Authorization: Bearer ${AUTH_SECRET}" \
  -H "Content-Type: application/json" \
  -d '{"name": "customer-a", "config": {...}}'
# List apps
curl http://localhost:8080/admin/apps
Tenant Routing via JWT¶
The tenant_id claim in the request's JWT is mapped to the app name automatically, so each request is routed to the app whose name matches that claim.
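As an illustration of this mapping, the sketch below reads the tenant_id claim out of a JWT payload. It deliberately skips signature verification, which a real server must perform before trusting any claim; the function name and the demo token are hypothetical and say nothing about how Hector validates tokens.

```python
import base64
import json


def tenant_from_jwt(token: str) -> str:
    """Return the tenant_id claim from a JWT payload.

    WARNING: illustration only. The signature is NOT verified here.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload_b64 = parts[1]
    # JWTs use unpadded base64url, so restore the padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims["tenant_id"]


def _b64url(obj) -> str:
    """Encode a JSON object as an unpadded base64url segment (demo helper)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Unsigned demo token whose tenant_id matches the app created above.
demo_token = ".".join([_b64url({"alg": "none"}),
                       _b64url({"tenant_id": "customer-a"}),
                       ""])
demo_app = tenant_from_jwt(demo_token)
```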
Health Checks¶
curl http://localhost:8080/health
{
"status": "ok",
"version": "v1.20.0",
"database": "connected"
}
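A monitoring script might interpret this response as follows. This is a small sketch that treats the instance as healthy only when both fields match the sample above; the function name is hypothetical, and any other field values Hector may return are not documented here.

```python
import json


def is_healthy(body: str) -> bool:
    """Return True when the /health body reports ok status and a connected database."""
    try:
        report = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(report, dict):
        return False
    return report.get("status") == "ok" and report.get("database") == "connected"


sample = '{"status": "ok", "version": "v1.20.0", "database": "connected"}'
demo_ok = is_healthy(sample)
```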
Kubernetes Probes¶
livenessProbe:
  httpGet:
    path: /health
    port: 8080
Graceful Shutdown¶
On SIGTERM/SIGINT:
- Stop accepting new requests
- Wait for in-flight requests (30s timeout)
- Checkpoint active tasks
- Close database connections
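The shutdown sequence above can be sketched as a signal handler that runs the four steps in order. Everything here is illustrative: the class, its hook methods, and the `idle` event are hypothetical stand-ins, and only the step order and the 30s drain timeout come from the list above.

```python
import signal
import threading


class GracefulShutdown:
    def __init__(self, drain_timeout: float = 30.0):
        self.drain_timeout = drain_timeout
        self.stopping = threading.Event()  # set once new requests are refused
        self.idle = threading.Event()      # set while no requests are in flight
        self.idle.set()
        self.log = []                      # records the steps, for demonstration

    def install(self):
        """Register the handler for SIGTERM and SIGINT (main thread only)."""
        signal.signal(signal.SIGTERM, self.handle)
        signal.signal(signal.SIGINT, self.handle)

    def handle(self, signum=None, frame=None):
        # 1. Stop accepting new requests.
        self.stopping.set()
        self.log.append("stop_accepting")
        # 2. Wait for in-flight requests, giving up after the timeout.
        drained = self.idle.wait(self.drain_timeout)
        self.log.append("drained" if drained else "drain_timeout")
        # 3. Checkpoint active tasks, then 4. close database connections.
        self.checkpoint_tasks()
        self.close_database()

    def checkpoint_tasks(self):
        self.log.append("checkpoint")      # placeholder for real persistence

    def close_database(self):
        self.log.append("close_db")        # placeholder for real cleanup


demo = GracefulShutdown(drain_timeout=0.01)
demo.handle()  # invoked directly here; install() would wire it to signals
```

Checkpointing after the drain step means tasks that finish within the timeout never need a checkpoint, while those still running are saved before the database connection closes.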