Why Hector?

Built for Production, Not Prototyping

Hector is designed for teams who need to run AI agents in production with the same operational rigor as their other services.

The Production Gap

Most AI agent frameworks optimize for developer experience and rapid prototyping. They're excellent for experimentation but lack production essentials:

  • ❌ No built-in observability
  • ❌ Configuration requires code changes and redeployment
  • ❌ No authentication/authorization primitives
  • ❌ Large resource footprint (Python runtimes)
  • ❌ Custom APIs instead of standards

Hector fills this gap.

Core Philosophy

1. Operations-First

Hot Configuration Reload

# Update agent configuration in Consul
consul kv put hector/prod @new-config.json

# Agents reload automatically—no restart, no downtime

Built-in Observability

  • Prometheus metrics: latency, token usage, costs, errors (scrape sketch below)
  • OpenTelemetry traces: distributed tracing across agent calls
  • Structured logging: context propagation and correlation
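
For example, those metrics can be collected with a standard Prometheus scrape job. The port and path below are assumptions for this sketch, not values documented on this page:

# Illustrative Prometheus scrape job for Hector pods.
# The :8080 port and /metrics path are assumptions.
scrape_configs:
  - job_name: hector
    metrics_path: /metrics
    static_configs:
      - targets: ['hector-1:8080', 'hector-2:8080', 'hector-3:8080']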

Production Deployment Patterns

  • Kubernetes-ready with health checks (Deployment sketch below)
  • Distributed config from Consul/Etcd/ZooKeeper
  • Zero-downtime rolling updates
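
A minimal Kubernetes Deployment sketch tying these patterns together. The image name, port, and /health probe path are illustrative assumptions, not documented values:

# Rolling update with health-checked pods (illustrative values throughout).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hector
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0          # keep capacity during rollouts
  selector:
    matchLabels:
      app: hector
  template:
    metadata:
      labels:
        app: hector
    spec:
      containers:
        - name: hector
          image: hector:latest   # assumed image name
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /health      # assumed health endpoint
              port: 8080
          resources:
            requests:
              memory: 64Mi       # small per-pod footprint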

2. Security-First

Not security as an afterthought—security as a design principle:

  • Authentication: JWT with JWKS, API keys
  • Authorization: Authentication-based visibility controls (public/internal/private agents)
  • Command Sandboxing: Allowlist-based tool execution
  • Secret Management: Environment variable interpolation

global:
  auth:
    enabled: true
    jwks_url: https://auth.company.com/.well-known/jwks.json

agents:
  internal-analyst:
    visibility: private  # Not exposed externally
    llm: gpt-4o
    tools:
      - name: command
        config:
          sandboxing: true
          allowed_commands: [ls, cat, grep]

3. Standards-Based

A2A Protocol Native

Hector implements the Agent-to-Agent (A2A) protocol as a first-class citizen:

  • Standardized discovery, messaging, and streaming
  • Federation: agents call agents across boundaries
  • Interoperability: integrate external A2A services

agents:
  coordinator:
    llm: gpt-4o
    tools: [agent_call]  # Call any A2A agent

  external-specialist:
    type: a2a
    url: https://external-service/v1
    description: "Specialized processing service"

4. Zero-Code Configuration

Define sophisticated agents entirely in YAML:

agents:
  analyst:
    llm: gpt-4o
    tools: [search, write_file, agent_call]
    reasoning:
      engine: chain-of-thought
      max_iterations: 100
    memory:
      working:
        strategy: summary_buffer
        max_tokens: 4000
      long_term:
        type: vector
        database: qdrant-cluster

No Python. No JavaScript. No code.
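
Run it with the same single binary used throughout this page:

hector serve --config agents.yaml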

When to Choose Hector

✅ Choose Hector When:

  • Running agents in production with SLAs and uptime requirements
  • Need observability (Prometheus, OpenTelemetry, distributed tracing)
  • Require security controls (auth, authorization, sandboxing)
  • Want operational flexibility (hot reload, distributed config)
  • Building multi-agent systems with A2A federation
  • Deploying to resource-constrained environments (edge, Lambda)
  • Managing AI infrastructure for multiple teams as a platform engineer

⚠️ Consider Alternatives When:

  • Rapid prototyping with frequent code changes
  • Heavy Python ecosystem integration required
  • Custom agent behaviors needing programmatic control
  • Research projects without production requirements

Comparison: Hector vs. Traditional Frameworks

Aspect           Hector                                 LangChain/LlamaIndex
Configuration    Pure YAML                              Python/JS code
Hot Reload       Built-in, zero-downtime                Requires redeployment
Observability    Prometheus + OTEL native               Manual instrumentation
Security         JWT, visibility controls, sandboxing   DIY
Binary Size      30MB (stripped)                        200-500MB+ runtime
Startup Time     <100ms                                 2-10 seconds
Deployment       Single 30MB binary                     Runtime + dependencies
A2A Protocol     Native                                 Not supported
Best For         Production operations                  Development/prototyping

Architecture for Production

Hector's design makes production deployment natural:

┌─────────────────────────────────────────┐
│         Load Balancer                    │
└─────────────┬───────────────────────────┘
              │
        ┌─────┴─────┬─────────┐
        ▼           ▼         ▼
   ┏━━━━━━━┓   ┏━━━━━━━┓   ┏━━━━━━━┓
   ┃Hector ┃   ┃Hector ┃   ┃Hector ┃
   ┃ Pod 1 ┃   ┃ Pod 2 ┃   ┃ Pod 3 ┃
   ┗━━━┬━━━┛   ┗━━━┬━━━┛   ┗━━━┬━━━┛
        │           │           │
        └───────────┴───────────┘
                    │
        ┌───────────┼──────────┬──────────┐
        ▼           ▼          ▼          ▼
   ┌────────┐  ┌────────┐ ┌────────┐ ┌────────┐
   │ Consul │  │Postgres│ │ Qdrant │ │ OTEL   │
   │ Config │  │Sessions│ │  RAG   │ │Collector│
   └────────┘  └────────┘ └────────┘ └────────┘

Key Characteristics:

  • Stateless servers: Scale horizontally
  • Distributed config: Centralized, hot-reloadable
  • Session persistence: SQL-backed continuity
  • Observability: Metrics and traces to collectors
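
The stack in the diagram can be stood up locally with a compose file. Every image tag and port below is an illustrative assumption, not a documented requirement:

# Illustrative docker-compose sketch of the architecture above.
# The hector image name and all ports are assumptions.
services:
  hector:
    image: hector:latest
    ports:
      - "8080:8080"
    depends_on: [consul, postgres, qdrant, otel-collector]
  consul:
    image: hashicorp/consul:1.18
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
  qdrant:
    image: qdrant/qdrant:latest
  otel-collector:
    image: otel/opentelemetry-collector:latest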

Real-World Use Cases

1. Enterprise Agent Infrastructure

Platform teams deploying multi-agent systems across organizations with A2A v0.3.0 compliance, security, and observability requirements.

Example: A central platform team provides agent infrastructure via YAML configs to 50+ product teams. JWT auth enforces access control, Prometheus metrics track usage per team, and hot reload enables rapid iteration without downtime.

2. High-Throughput Production Services

Customer-facing AI services needing low latency, high availability, and efficient resource usage.

Example: An e-commerce company runs 100+ concurrent agents handling customer inquiries. A 128MB footprint per pod enables dense packing on Kubernetes, sub-100ms startup enables fast autoscaling, and OpenTelemetry traces follow requests across agent orchestration.

3. Edge/IoT Deployments

Running agents on resource-constrained devices.

Example: A manufacturing company deploys agents on edge devices (~30MB binary, 128MB RAM) for real-time quality analysis. The single binary simplifies deployment, and Go's efficiency enables running on ARM processors.

4. Regulated Industries

Financial services, healthcare, etc. requiring audit trails, RBAC, and security controls.

Example: A financial institution uses Hector for internal analyst agents. JWT integrates with the corporate IdP, command sandboxing restricts file system access, and OpenTelemetry provides audit trails for compliance.

5. Multi-Team Agent Platforms

Central platform teams providing agent infrastructure to product teams via declarative config.

Example: A SaaS company's platform team maintains Hector infrastructure. Product teams submit YAML configs via a GitOps pipeline, the platform team reviews and deploys them via hot reload, and Prometheus provides a centralized monitoring dashboard.

Target Audience

Primary: Platform Engineers & SREs

You manage infrastructure and need to deploy AI agents with the same operational standards as other production services:

  • Observable (metrics, traces, logs)
  • Secure (auth, authorization, sandboxing)
  • Reliable (health checks, graceful shutdown, zero-downtime updates)
  • Efficient (low resource usage, fast startup)

Secondary: AI Product Teams

You build AI-powered products and need production infrastructure without managing complexity:

  • No code required (pure YAML)
  • Fast iteration (hot reload)
  • Multi-agent orchestration (A2A native)
  • Flexible deployment modes

Performance Benefits

Resource Efficiency

Metric              Hector            Python Frameworks      Difference
Binary Size         30MB (stripped)   200-500MB+ runtime     7-15x smaller
Startup Time        <100ms            2-10s                  20-100x faster
Runtime Footprint   Minimal (Go)      200-500MB+ (Python)    10-20x less
Container Image     50MB (Alpine)     500MB-2GB+             10-40x smaller

Cost Implications

Example: 100 agent deployment

  • Python Framework: 100 pods × 500MB = 50GB RAM baseline
  • Hector: 100 pods × 50MB = 5GB RAM baseline
  • Savings: ~90% reduction in baseline resource usage

Edge Deployment

Hector's efficiency enables edge/IoT deployments that are impractical with heavyweight Python frameworks:

  • Raspberry Pi 4 (4GB RAM): 10-20 Hector agents vs. 2-5 Python agents
  • AWS Lambda: Fast cold starts (<100ms) vs. multi-second Python cold starts
  • Edge devices: Single binary, no runtime dependencies (see the sketch below)
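
Deployment to an edge device can be as simple as copying the binary and its config. The hostname and paths in this sketch are placeholders:

# Copy the single binary plus YAML config to an ARM edge device and run it.
# "edge-device" and /opt/hector are placeholder names for this sketch.
scp hector agents.yaml edge-device:/opt/hector/
ssh edge-device '/opt/hector/hector serve --config /opt/hector/agents.yaml'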

Developer Experience

Configuration-as-Code

# This is the entire configuration—no Python/JS required
agents:
  analyst:
    llm: gpt-4o
    tools: [search, write_file, agent_call]
    reasoning:
      engine: chain-of-thought
      max_iterations: 100
    memory:
      working:
        strategy: summary_buffer
        max_tokens: 4000
      long_term:
        type: vector
        database: qdrant-cluster

llms:
  gpt-4o:
    type: openai
    model: gpt-4o
    api_key: ${OPENAI_API_KEY}

databases:
  qdrant-cluster:
    type: qdrant
    host: qdrant.internal
    port: 6334

GitOps-Friendly

Store configurations in Git, deploy via CI/CD:

# Development
git checkout -b add-new-agent
vim agents.yaml  # Add agent config
git commit -m "Add customer support agent"
git push

# Production (via hot reload)
consul kv put hector/prod @agents.yaml
# Agents update automatically, no restart needed

Testing & Validation

# Validate configuration
hector validate --config agents.yaml

# Test locally
hector call "Test message" --agent analyst --config agents.yaml

# Integration test
hector serve --config agents.yaml &
curl -X POST http://localhost:8080/v1/agents/analyst/message:send \
  -H "Content-Type: application/json" \
  -d '{"message": {"parts": [{"text": "Test"}], "role": "user"}}'
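
These commands slot into the GitOps flow above. A sketch of a CI job, assuming a runner with the hector binary and Consul access available:

# CI sketch wiring validation and hot-reload deploy together.
# Runner setup, secrets, and Consul connectivity are assumed.
name: deploy-agents
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - run: hector validate --config agents.yaml
      - run: consul kv put hector/prod @agents.yaml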

Migration Path

From LangChain/LlamaIndex

  1. Extract LLM configuration → Hector llms: section
  2. Extract agent prompts → Hector prompt: section
  3. Map tools → Hector built-in tools or MCP servers
  4. Configure memory → Hector memory strategies
  5. Deploy → Single binary, no refactoring

Example Translation:

# LangChain agent definition (Python)
from langchain.agents import initialize_agent
from langchain.llms import OpenAI
from langchain.tools import Tool

llm = OpenAI(temperature=0.7, model="gpt-4")
tools = [search_tool, write_file_tool]

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent="zero-shot-react-description",
    verbose=True
)

# Equivalent Hector configuration (YAML)
agents:
  assistant:
    llm: gpt-4o
    tools: [search, write_file]
    reasoning:
      engine: chain-of-thought

llms:
  gpt-4o:
    type: openai
    model: gpt-4o
    temperature: 0.7
    api_key: ${OPENAI_API_KEY}

No Python code required. Clearer, more maintainable, production-ready.

Common Questions

"Can I still use custom code?"

Yes, via the plugin system:

// Custom Go plugin
package main

import "strings"

// MyCustomTool shows the shape of a custom tool: it takes the tool input
// and returns a result string. Wiring it up via the Hector plugin system
// (github.com/kadirpekel/hector/pkg/tool) is omitted in this snippet.
func MyCustomTool(input string) (string, error) {
	result := strings.TrimSpace(input) // your logic here
	return result, nil
}

But most use cases are covered by built-in tools + MCP integration.

"How does it compare to Semantic Kernel?"

Aspect          Hector            Semantic Kernel
Language        Go (production)   C#/.NET
Configuration   Pure YAML         Code + config
A2A Protocol    Native            Not supported
Hot Reload      Yes               No
Observability   Built-in          Manual
Deployment      Single binary     .NET runtime required

"Is Python better for AI/ML?"

For research and prototyping, yes. For production deployment, Go offers:

  • 10-20x less resource usage
  • 20-100x faster startup
  • Single binary deployment (30MB vs 200-500MB+ runtime)
  • Better concurrency (goroutines vs. GIL)
  • Easier operations (no dependency hell)

Hector doesn't replace Python notebooks or research code—it provides production infrastructure for deploying agents.

"What about performance of LLM calls?"

LLM API latency dominates performance, not the framework. Hector's efficiency matters for:

  • Infrastructure costs (10-20x less memory)
  • Startup time (autoscaling, edge deployment)
  • Concurrent agents (goroutines vs. threads)

Getting Started

Ready to deploy production-ready agents?

  • Quick Start Guide →
  • Production Deployment →
