Performance

Why Hector is Different

Hector is built for efficiency. While many AI agent frameworks require heavy Python runtimes, multiple processes, and gigabytes of memory, Hector runs as a single native binary with minimal resource requirements.

Looking for System Architecture?

This section covers performance & efficiency (resource usage, scaling, cost optimization).

For system architecture (components, data flow, design principles), see the Architecture section of the documentation.

Why This Matters

🚀 For Resource-Constrained Environments

  • Edge Devices: Run AI agents on Raspberry Pi, IoT devices
  • Development Machines: No Python environment, no virtual envs
  • CI/CD Pipelines: Fast startup, low memory footprint
  • Cost Optimization: Fewer cloud resources = lower bills

💡 For Production Deployments

  • Horizontal Scaling: Stateless design scales out linearly; add replicas with no coordination or sticky sessions
  • High Concurrency: Handle thousands of sessions per instance
  • Fast Response Times: Sub-millisecond overhead (vs. Python's ~50-100ms)
  • Operational Simplicity: Single binary, no dependency management

Architecture Deep Dives

Explore how Hector achieves its efficiency advantages:

Session & Memory Architecture

Learn how Hector's three-layer memory system (SQL + Working Memory + Vector DB) provides persistent, scalable session management.

Key Topics:

  • Strategy-managed persistence with checkpoint detection
  • Multi-agent session isolation
  • Database schema design
  • Thread safety and concurrency
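
The lookup order is easy to picture in code. Below is a minimal Go sketch of a layered read path that checks the fastest tier first; the `Layer` interface and type names are illustrative, not Hector's actual API:

```go
package memory

import "context"

// Message is a single turn in a session's history.
type Message struct {
	Role, Content string
}

// Layer is one tier of a hypothetical three-layer memory system:
// in-process working memory, a SQL store, and a vector DB.
type Layer interface {
	Get(ctx context.Context, sessionID string) ([]Message, bool)
}

// LayeredStore consults the fastest layer first and falls back
// to slower, more durable layers on a miss.
type LayeredStore struct {
	layers []Layer // e.g. [workingMemory, sqlStore, vectorDB]
}

func (s *LayeredStore) History(ctx context.Context, sessionID string) []Message {
	for _, l := range s.layers {
		if msgs, ok := l.Get(ctx, sessionID); ok {
			return msgs
		}
	}
	return nil // no layer has seen this session yet
}
```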

Agent Lifecycle & Scalability

Understand why Hector's stateless agent design enables superior scalability and resource efficiency.

Key Topics:

  • Shared instance architecture (100,000x more memory efficient)
  • Thread safety analysis
  • Horizontal scaling patterns
  • Performance benchmarks


Key Performance Advantages

1. Memory Footprint

| Framework | Agents | Sessions | Memory Usage |
|---|---|---|---|
| Hector | 10 | 100,000 | ~50 MB |
| Python (LangChain) | 10 | 100,000 | ~5-10 GB |
| Python (CrewAI) | 10 | 100,000 | ~8-15 GB |

Why: Native Go binary + shared instances + efficient memory management
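
The "shared instances" point is the big one: agents hold only immutable configuration, so one instance can serve every session while per-session state lives in the store. A hedged Go sketch of the idea (the types and names here are hypothetical, not Hector's internals):

```go
package agent

import "sync"

// Agent holds only immutable configuration (prompt, model, tools),
// so a single instance is safe to share across all sessions.
type Agent struct {
	SystemPrompt string
	Model        string
}

// SessionState is the small per-session payload; in a stateless
// design it lives in the shared SQL store, not inside the agent.
type SessionState struct {
	History []string
}

// Registry hands out the same shared *Agent for every session,
// instead of constructing one agent object per session.
type Registry struct {
	mu     sync.RWMutex
	agents map[string]*Agent
}

func (r *Registry) Get(name string) *Agent {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.agents[name] // one instance, N sessions
}
```

With 10 agents and 100,000 sessions, this layout stores 10 agent objects plus 100,000 small state records, rather than 100,000 full agent instances.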


2. Startup Time

| Framework | Cold Start | First Request |
|---|---|---|
| Hector | 50-100ms | ~0ms overhead |
| Python (LangChain) | 2-5 seconds | ~50-100ms overhead |
| Python (CrewAI) | 3-7 seconds | ~100-200ms overhead |

Why: Pre-compiled binary vs. Python runtime initialization


3. Concurrent Sessions

| Framework | Per Instance | Limiting Factor |
|---|---|---|
| Hector | 10,000+ | Network I/O |
| Python (LangChain) | ~100-500 | GIL + Memory |
| Python (CrewAI) | ~50-200 | Memory |

Why: Go's goroutines + stateless design + efficient concurrency
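
For intuition, goroutines make the 10,000-session figure unremarkable in Go. The toy program below spawns 10,000 concurrent "sessions"; the closure is a stand-in for a real agent turn, but the concurrency pattern is the standard one:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	results := make(chan string, 10000)

	// Each session turn runs in its own goroutine. Goroutines start
	// with a few KB of stack, so 10,000 of them cost tens of MB,
	// not gigabytes.
	for i := 0; i < 10000; i++ {
		wg.Add(1)
		go func(session int) {
			defer wg.Done()
			// Placeholder for a real turn: call the LLM, update state.
			results <- fmt.Sprintf("session-%d done", session)
		}(i)
	}

	wg.Wait()
	close(results)
	fmt.Println(len(results), "sessions handled concurrently")
}
```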


4. CPU Efficiency

| Framework | CPU per Request | Overhead |
|---|---|---|
| Hector | ~1-5% CPU | Minimal |
| Python | ~10-20% CPU | Interpreter + GC |

Why: Native compilation + no GIL + efficient runtime


Resource Requirements

Minimum Configuration

Run Hector on extremely limited hardware:

CPU:    0.5 cores (500m in Kubernetes)
Memory: 128 MB
Disk:   10 MB (binary only)

Use Case: Single-agent development, edge devices


Recommended Production

Comfortable production deployment:

CPU:    2 cores
Memory: 512 MB
Disk:   50 MB

Handles: ~1000 concurrent sessions, multiple agents


High-Scale Production

For high-traffic applications:

CPU:    4 cores
Memory: 1 GB
Disk:   100 MB

Handles: 10,000+ concurrent sessions, dozens of agents


Scaling Strategies

Horizontal Scaling

Hector's stateless agent design enables trivial horizontal scaling:

           Load Balancer
                │
    ┌───────────┼───────────┐
    ▼           ▼           ▼
Server 1    Server 2    Server 3
(128MB)     (128MB)     (128MB)
    │           │           │
    └───────────┴───────────┘
                │
          Shared SQL DB
      (Session Persistence)

Key Benefits:

  • ✅ No sticky sessions required
  • ✅ Any request → any server
  • ✅ Auto-scaling in Kubernetes/Cloud
  • ✅ Fault tolerant (server failure = no data loss)
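
In code terms, statelessness means a request handler needs nothing local beyond a database handle. A sketch of the pattern (the schema, query, and driver choice are illustrative, not Hector's actual handler):

```go
package main

import (
	"database/sql"
	"net/http"

	_ "github.com/lib/pq" // driver choice is illustrative
)

// Any replica can serve any request: all it needs is the session ID,
// which keys into the shared SQL store. Nothing survives in local
// memory between requests, so a server failure loses no data.
func handler(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		sessionID := r.URL.Query().Get("session_id")

		var history string
		err := db.QueryRow(
			"SELECT history FROM sessions WHERE id = $1", sessionID,
		).Scan(&history)
		if err != nil && err != sql.ErrNoRows {
			http.Error(w, "store unavailable", http.StatusServiceUnavailable)
			return
		}
		// ... run the agent turn against history, persist the result,
		// and stream the response. Any replica could have done this.
		w.Write([]byte("ok\n"))
	}
}

func main() {
	db, err := sql.Open("postgres", "postgres://db.example.com/hector?sslmode=require")
	if err != nil {
		panic(err)
	}
	http.ListenAndServe(":8080", handler(db))
}
```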


Vertical Scaling

Also works, but less efficient:

# Single large instance
CPU:    16 cores
Memory: 4 GB

# vs. 8 small instances
8 × (2 cores, 512 MB) = Same capacity, better fault tolerance

Recommendation: Use horizontal scaling for better resilience and cost.


Comparison with Python Frameworks

Why Python is Resource-Heavy

  1. Interpreter Overhead: Python runtime adds ~50-100ms per request
  2. Global Interpreter Lock (GIL): Limits true concurrency
  3. Memory Inefficiency: Python objects are 3-10x larger than native
  4. Dependency Bloat: Typical LangChain install is 500MB+ with dependencies
  5. Per-Process Architecture: Many frameworks spawn multiple processes

Why Hector is Efficient

  1. Native Compilation: Go compiles to native machine code
  2. True Concurrency: Goroutines enable 10,000+ concurrent sessions
  3. Memory Efficiency: Native structs are compact
  4. Single Binary: No dependencies, 10MB executable
  5. Shared Architecture: One agent serves all sessions

Real-World Scenarios

Scenario 1: Edge AI (Raspberry Pi)

Challenge: Run AI agent on Raspberry Pi 4 (4GB RAM)

Hector:

  • ✅ Runs comfortably with 128MB
  • ✅ Leaves 3.8GB for other applications
  • ✅ Fast startup (<100ms)

Python (LangChain):

  • ⚠️ Requires 1-2GB minimum
  • ⚠️ Slow startup (3-5 seconds)
  • ⚠️ Competes for resources


Scenario 2: Cost-Optimized Cloud

Challenge: Minimize cloud costs for 1000 users

Hector:

1 instance: 0.5 vCPU, 512MB RAM
Cost: ~$5-10/month (AWS Fargate/GCP Cloud Run)

Python:

4 instances: 2 vCPU, 2GB RAM each
Cost: ~$80-120/month

Savings: ~90% reduction in infrastructure costs


Scenario 3: High-Concurrency API

Challenge: Handle 10,000 concurrent sessions

Hector:

2 instances: 4 vCPU, 1GB RAM each
Total: 8 vCPU, 2GB RAM

Python:

20 instances: 4 vCPU, 4GB RAM each
Total: 80 vCPU, 80GB RAM

Resource Reduction: 90% fewer CPUs, 97.5% less memory


Deployment Patterns

Docker

FROM scratch
# The static Hector binary is the only payload. If the agent calls
# HTTPS LLM APIs, also copy CA certificates into the empty image:
COPY ca-certificates.crt /etc/ssl/certs/
COPY hector /hector
ENTRYPOINT ["/hector"]

Image Size: 10-15 MB (vs. Python's 500MB-1GB)


Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hector
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hector
  template:
    metadata:
      labels:
        app: hector
    spec:
      containers:
      - name: hector
        image: hector:latest
        resources:
          requests:
            memory: "128Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"
            cpu: "2000m"

Cost: ~$15-30/month for 3 replicas


Serverless (AWS Lambda, Cloud Run)

# Cloud Run
CPU: 1 vCPU
Memory: 256 MB
Max Instances: 100
Min Instances: 0 (scales to zero!)

# Cold start: ~50-100ms (vs. Python's 2-5 seconds)

Perfect for: Bursty workloads, cost optimization


Monitoring & Observability

Key Metrics

| Metric | Typical Value | Alert Threshold |
|---|---|---|
| Memory Usage | 50-200 MB | > 500 MB |
| CPU Usage | 5-20% | > 80% |
| Response Time | 50-500ms | > 2 seconds |
| Goroutines | 10-1000 | > 10,000 |
| DB Connections | 5-20 | > 50 |
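
Two of these metrics, goroutine count and heap usage, come straight from the Go runtime. A minimal sketch using the standard expvar package (not Hector's built-in telemetry, which may differ):

```go
package main

import (
	"expvar"
	"net/http"
	"runtime"
)

func main() {
	// Importing expvar registers a JSON endpoint at /debug/vars
	// on the default mux; any scraper can poll it.
	expvar.Publish("goroutines", expvar.Func(func() any {
		return runtime.NumGoroutine()
	}))
	expvar.Publish("heap_bytes", expvar.Func(func() any {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		return m.HeapAlloc
	}))
	http.ListenAndServe(":8080", nil)
}
```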

Best Practices

1. Right-Size Your Deployment

Start small and scale horizontally:

Stage 1 (Development):
  1 instance, 0.5 CPU, 128 MB

Stage 2 (Production Pilot):
  2 instances, 1 CPU, 256 MB

Stage 3 (Production Scale):
  3-10 instances, 2 CPU, 512 MB

2. Use Shared Database for Sessions

All Hector instances should share one SQL database:

session_stores:
  shared-prod:
    backend: sql
    sql:
      driver: postgres
      host: db.example.com
      max_conns: 200  # Shared pool
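
On the Go side, max_conns maps onto the standard database/sql pool knobs. A sketch of equivalent settings, assuming the 200-connection budget applies per process (the exact semantics of Hector's max_conns are an assumption here):

```go
package main

import (
	"database/sql"
	"time"

	_ "github.com/lib/pq" // driver choice is illustrative
)

func openSharedStore() (*sql.DB, error) {
	db, err := sql.Open("postgres",
		"host=db.example.com user=hector dbname=sessions sslmode=require")
	if err != nil {
		return nil, err
	}
	db.SetMaxOpenConns(200)                 // mirrors max_conns: 200
	db.SetMaxIdleConns(20)                  // keep some warm connections
	db.SetConnMaxLifetime(30 * time.Minute) // recycle long-lived conns
	return db, nil
}

func main() {
	if _, err := openSharedStore(); err != nil {
		panic(err)
	}
}
```

If several replicas share one database, divide the total connection budget across them so the database's own limit is never exceeded.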

3. Monitor Memory, Not Just CPU

Memory stays nearly flat in Hector as sessions grow (unlike Python):

# Hector baseline: ~50 MB
# Python baseline: ~500 MB

# Hector per 1000 sessions: +5 MB
# Python per 1000 sessions: +500 MB

Frequently Asked Questions

Q: Can Hector handle Python-level concurrency?

A: Yes, and better. Hector uses goroutines (10,000+ concurrent) vs. Python's GIL-limited threads (~10-50 effective).

Q: What about LLM API latency?

A: LLM latency (50-500ms) dominates total response time. Hector's sub-millisecond overhead is negligible on top of that, while Python's 50-100ms of framework overhead is noticeable.

Q: Can I run Hector on a $5/month server?

A: Absolutely. Hector runs comfortably on 128MB RAM, making it perfect for budget hosting.

Q: How does Hector handle 1M requests/day?

A: With 3 instances (512MB each), Hector easily handles 1M requests/day (~12 req/sec average, 100+ req/sec peak).


Further Reading

Technical Deep Dives

  • Session & Memory Architecture
  • Agent Lifecycle & Scalability

System Design

Deployment


Summary

Hector's efficiency advantages:

  • 100,000x more memory efficient than per-session instances
  • 10x faster startup than Python frameworks
  • 10x higher concurrency per instance
  • 90% lower infrastructure costs in cloud
  • Runs on edge devices (Raspberry Pi, IoT)
  • Single binary deployment (no dependencies)

Perfect for:

  • Resource-constrained environments
  • Cost-optimized cloud deployments
  • High-concurrency applications
  • Edge AI and IoT
  • Rapid prototyping and development

Learn more: Explore the architecture deep dives to understand how Hector achieves these results.