Architecture Reference¶
Hector's architecture is designed for production deployments with observability, security, and A2A-native federation.
High-Level Architecture¶
┌─────────────────────────────────────────────────────────────┐
│                      HTTP/gRPC Server                       │
│  ┌────────────┬────────────┬────────────┬──────────────┐    │
│  │ Discovery  │ A2A API    │ Metrics    │ Health       │    │
│  └────────────┴────────────┴────────────┴──────────────┘    │
└─────────────────────────┬───────────────────────────────────┘
                          │
┌─────────────────────────▼───────────────────────────────────┐
│                           Runtime                           │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Agent Registry  │ LLM Providers  │ Tool Registry     │   │
│  └──────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Session Service │ Memory Index   │ Checkpoint Mgr    │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────┬───────────────────────────────────┘
                          │
        ┌─────────────────┼─────────────────┐
        │                 │                 │
┌───────▼────────┐ ┌──────▼──────┐  ┌───────▼────────┐
│ LLM Providers  │ │ Vector DBs  │  │ Databases      │
│ - OpenAI       │ │ - Chromem   │  │ - SQLite       │
│ - Anthropic    │ │ - Qdrant    │  │ - Postgres     │
│ - Gemini       │ │ - Pinecone  │  │ - MySQL        │
│ - Ollama       │ └─────────────┘  └────────────────┘
└────────────────┘
Core Components¶
Runtime¶
The runtime is the central component that manages agent lifecycle:
Responsibilities:
- Build LLM providers from configuration
- Create embedders for semantic search
- Initialize tools and toolsets
- Construct agents with dependencies
- Manage RAG document stores
- Set up session and memory services
- Coordinate hot reload
Lifecycle:
1. Load configuration
2. Create LLM providers
3. Initialize embedders
4. Build tools and toolsets
5. Create agents
6. Set up persistence (sessions, tasks)
7. Start observability
Dependency Graph:
Agents
├─> LLMs
├─> Tools
│ └─> Toolsets
│ └─> MCP Servers (optional)
├─> Document Stores
│ ├─> Vector Providers
│ ├─> Embedders
│ └─> LLMs (for query processing)
└─> Sub-agents (recursive)
Session Service ← Sessions, Checkpoints
Index Service ← Embedders
Data Flow:
Configuration → Runtime → Components → Agents
Persistence Model:
Session Service (SOURCE OF TRUTH)
│
├─ Messages (conversation history)
├─ State (key-value store)
└─ Artifacts (files)
Index Service (SEARCH INDEX)
│
└─ Built from session events
(can be rebuilt at any time)
Server¶
HTTP/gRPC server exposing A2A protocol endpoints:
Endpoints:
| Endpoint | Description |
|---|---|
| /.well-known/agent-card.json | Default agent card |
| /agents | Multi-agent discovery (Hector extension) |
| /agents/{name} | Agent card (GET) / JSON-RPC (POST) |
| /health | Health check and auth discovery |
| /metrics | Prometheus metrics (when enabled) |
| /api/schema | JSON Schema for configuration |
| /api/config | Config read/write (studio mode) |
| /api/tasks/.../cancel | Cancel tool execution |
Transports:
- JSON-RPC over HTTP (default)
- gRPC (optional)
Agents¶
Three agent types:
LLM Agent (pkg/agent/llmagent):
- LLM-powered reasoning
- Tool execution
- Multi-turn conversations
- Memory and context management
Remote Agent (pkg/agent/remoteagent):
- Proxy to external A2A services
- Fetches agent card
- Forwards requests
- Federation support
Workflow Agent (pkg/agent/workflowagent):
- Sequential execution
- Parallel execution
- Loop/iteration
- Orchestrates sub-agents
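The three orchestration modes can be illustrated in plain Go. This is only a sketch: `Step`, `Sequential`, `Parallel`, and `Loop` are hypothetical names chosen for illustration and are not taken from `pkg/agent/workflowagent`.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Step is a hypothetical stand-in for a sub-agent: it maps an input to an output.
type Step func(input string) string

// Sequential feeds each step's output into the next step.
func Sequential(input string, steps ...Step) string {
	for _, s := range steps {
		input = s(input)
	}
	return input
}

// Parallel runs every step on the same input concurrently and returns
// the outputs in the order the steps were given.
func Parallel(input string, steps ...Step) []string {
	out := make([]string, len(steps))
	var wg sync.WaitGroup
	for i, s := range steps {
		wg.Add(1)
		go func(i int, s Step) {
			defer wg.Done()
			out[i] = s(input) // each goroutine writes only its own slot
		}(i, s)
	}
	wg.Wait()
	return out
}

// Loop re-applies a step until done reports true or maxIter is reached.
func Loop(input string, s Step, done func(string) bool, maxIter int) string {
	for i := 0; i < maxIter && !done(input); i++ {
		input = s(input)
	}
	return input
}

func main() {
	upper := Step(strings.ToUpper)
	exclaim := Step(func(s string) string { return s + "!" })

	fmt.Println(Sequential("hi", upper, exclaim))
	fmt.Println(Parallel("hi", upper, exclaim))
	fmt.Println(Loop("hi", exclaim, func(s string) bool { return len(s) >= 5 }, 10))
}
```

The iteration cap in `Loop` is a design choice of the sketch: it keeps a loop whose exit condition never fires from running forever.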
Session Service¶
Manages conversation history and state:
Storage Backends:
| Backend | Persistence | Use Case |
|---|---|---|
| In-memory | Ephemeral | Development |
| SQL | Persistent | Production |
Responsibilities:
- Store messages
- Manage session state
- Track artifacts
- Persist across restarts
Architecture:
Session Service (SOURCE OF TRUTH)
│
├─ Messages (conversation history)
├─ State (key-value pairs)
└─ Artifacts (files, images)
Memory Index¶
Searchable index over conversation history:
Index Types:
| Type | Description | Requirements |
|---|---|---|
| Keyword | Simple word matching | None |
| Vector | Semantic similarity | Embedder required |
Use Cases:
- Search past conversations
- Find relevant context
- Knowledge retrieval
Architecture:
Session Service → Index Service → Vector Provider
(data) (search) (storage)
Index can be rebuilt from session data.
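Because the session data is the source of truth, a keyword index can be treated as disposable. The following is a minimal sketch of that idea, assuming a simplified `Message` record and an in-memory inverted index; the types and function names are hypothetical and do not reflect Hector's index service.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Message is a hypothetical, simplified session record.
type Message struct {
	SessionID string
	Text      string
}

// KeywordIndex maps a lowercase word to the set of session IDs containing it.
type KeywordIndex map[string]map[string]bool

// Rebuild constructs a fresh index from the stored messages. Nothing in the
// index is authoritative, so it can be dropped and rebuilt at any time.
func Rebuild(messages []Message) KeywordIndex {
	idx := KeywordIndex{}
	for _, m := range messages {
		for _, w := range strings.Fields(strings.ToLower(m.Text)) {
			if idx[w] == nil {
				idx[w] = map[string]bool{}
			}
			idx[w][m.SessionID] = true
		}
	}
	return idx
}

// Search returns the sorted IDs of sessions that contain the word.
func (idx KeywordIndex) Search(word string) []string {
	var ids []string
	for id := range idx[strings.ToLower(word)] {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	idx := Rebuild([]Message{
		{SessionID: "s1", Text: "Deploy the staging cluster"},
		{SessionID: "s2", Text: "Staging database migration"},
	})
	fmt.Println(idx.Search("staging"))
}
```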
Checkpoint Manager¶
Execution state checkpointing for recovery:
Strategies:
| Strategy | When | Description |
|---|---|---|
| Event | Tool execution, LLM calls | Checkpoint at specific events |
| Interval | Every N iterations | Checkpoint at regular intervals |
| Hybrid | Both | Events and intervals combined |
Configuration:
storage:
  checkpoint:
    enabled: true
    strategy: hybrid
    after_tools: true
    before_llm: true
    interval: 5 # every 5 iterations
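One way to read the three strategies is as a single decision function evaluated at each point in the agent loop. The sketch below assumes that reading; the `Config` struct mirrors the YAML keys above, but the Go types, the `Event` constants, and `ShouldCheckpoint` are hypothetical.

```go
package main

import "fmt"

// Config mirrors the YAML keys above; the Go type itself is hypothetical.
type Config struct {
	Enabled    bool
	Strategy   string // "event", "interval", or "hybrid"
	AfterTools bool
	BeforeLLM  bool
	Interval   int // iterations between interval checkpoints
}

// Event identifies the point in the agent loop being evaluated.
type Event int

const (
	EventNone Event = iota // no notable event at this point
	EventAfterTool
	EventBeforeLLM
)

// ShouldCheckpoint reports whether a checkpoint is due at this point.
func ShouldCheckpoint(c Config, ev Event, iteration int) bool {
	if !c.Enabled {
		return false
	}
	onEvent := (ev == EventAfterTool && c.AfterTools) || (ev == EventBeforeLLM && c.BeforeLLM)
	onInterval := c.Interval > 0 && iteration > 0 && iteration%c.Interval == 0
	switch c.Strategy {
	case "event":
		return onEvent
	case "interval":
		return onInterval
	case "hybrid":
		return onEvent || onInterval
	}
	return false // unknown strategy: never checkpoint
}

func main() {
	cfg := Config{Enabled: true, Strategy: "hybrid", AfterTools: true, BeforeLLM: true, Interval: 5}
	for i := 1; i <= 5; i++ {
		fmt.Println("iteration", i, "checkpoint:", ShouldCheckpoint(cfg, EventNone, i))
	}
	fmt.Println("after tool:", ShouldCheckpoint(cfg, EventAfterTool, 1))
}
```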
Storage:
- Checkpoints stored in session service
- Auto-cleanup of expired checkpoints
Recovery:
- Auto-resume on startup
- Manual recovery via API
- HITL approval for sensitive tasks
Recovery Configuration:
storage:
  checkpoint:
    recovery:
      auto_resume: true
      auto_resume_hitl: false
      timeout: 3600 # 1h (in seconds)
Data Flow¶
Message Flow¶
1. Client Request
│
├─> HTTP Server
│ │
│ ├─> Authentication (if enabled)
│ └─> Rate Limiting (if enabled)
│
2. Runtime
│
├─> Agent Resolution
│ │
│ └─> Visibility Check
│
3. Agent Execution
│
├─> Session Load (from session service)
├─> Context Preparation
├─> LLM Call
│ │
│ ├─> Tool Execution (if tool calls)
│ │ │
│ │ ├─> Approval Check (HITL)
│ │ └─> Tool Result
│ │
│ └─> Response Generation
│
4. Persistence
│
├─> Session Update
├─> Memory Index Update
└─> Checkpoint Save (if enabled)
│
5. Response
│
└─> Client (streaming or complete)
RAG Flow¶
1. Document Ingestion
│
├─> Document Source (directory, SQL, API)
├─> MCP Parser (optional, for PDF/DOCX)
├─> Chunking Strategy
├─> Embedding Generation
└─> Vector Storage
2. Query Processing
│
├─> User Message
├─> Embedding Generation
├─> Vector Search
├─> Reranking (optional)
└─> Top K Results
3. Context Injection
│
├─> Retrieved Documents
├─> Format as Context
└─> Inject into System Prompt
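The query and injection steps can be sketched with a brute-force cosine-similarity search. This is an illustration only: a real vector store would use its own index, and `Chunk`, `TopK`, and `FormatContext` (including the "Relevant context:" wording) are hypothetical names, not Hector's API.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// Chunk is a hypothetical stored document chunk with its embedding.
type Chunk struct {
	Text      string
	Embedding []float64
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// TopK returns the k chunks most similar to the query embedding.
func TopK(query []float64, chunks []Chunk, k int) []Chunk {
	type scored struct {
		chunk Chunk
		score float64
	}
	ranked := make([]scored, len(chunks))
	for i, c := range chunks {
		ranked[i] = scored{c, cosine(query, c.Embedding)}
	}
	sort.SliceStable(ranked, func(i, j int) bool { return ranked[i].score > ranked[j].score })
	if k < 0 {
		k = 0
	}
	if k > len(ranked) {
		k = len(ranked)
	}
	top := make([]Chunk, k)
	for i := range top {
		top[i] = ranked[i].chunk
	}
	return top
}

// FormatContext renders retrieved chunks as text to place in the system prompt.
func FormatContext(chunks []Chunk) string {
	var b strings.Builder
	b.WriteString("Relevant context:\n")
	for i, c := range chunks {
		fmt.Fprintf(&b, "[%d] %s\n", i+1, c.Text)
	}
	return b.String()
}

func main() {
	chunks := []Chunk{
		{Text: "Deploy with the CLI", Embedding: []float64{1, 0}},
		{Text: "Billing FAQ", Embedding: []float64{0, 1}},
		{Text: "Deploy with Docker", Embedding: []float64{0.9, 0.1}},
	}
	fmt.Print(FormatContext(TopK([]float64{1, 0}, chunks, 2)))
}
```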
Component Interactions¶
Agent + Tools¶
Agent
│
├─ Calls Tool
│ │
│ ├─> Approval Check (if required)
│ ├─> Tool Execution
│ │ │
│ │ ├─ Built-in Function
│ │ ├─ MCP Server Call
│ │ └─ Command Execution
│ │
│ └─> Result
│
└─ Processes Result
Multi-Agent Patterns¶
Pattern 1: Transfer (Sub-Agents)
Coordinator Agent
│
├─ Calls transfer_to_specialist
│
└─> Control Transferred
│
Specialist Agent
│
└─ Continues Conversation
Pattern 2: Delegation (Agent Tools)
Parent Agent
│
├─ Calls agent_tool
│
└─> Tool Execution
│
Agent Tool
│
├─ Executes Task
└─ Returns Result
│
Parent Agent
│
└─ Processes Result
Observability Integration¶
Every Request
│
├─> Start Trace Span
│ │
│ ├─ Agent Span
│ │ │
│ │ ├─ LLM Span
│ │ │ └─ Record: tokens, latency
│ │ │
│ │ ├─ Tool Span
│ │ │ └─ Record: duration, result
│ │ │
│ │ └─ Database Span
│ │ └─ Record: query, duration
│ │
│ └─> End Trace Span
│
└─> Update Metrics
│
├─ Counters (requests, tokens, errors)
├─ Histograms (latency)
└─ Gauges (active sessions)
Metrics (exposed at /metrics):
| Metric | Type | Description |
|---|---|---|
| hector_llm_requests_total | Counter | Total LLM requests |
| hector_llm_tokens_total | Counter | Total tokens used |
| hector_tool_calls_total | Counter | Total tool calls |
| hector_agent_requests_total | Counter | Total agent requests |
Traces (sent to OTLP endpoint):
Invocation Span
├─ Agent Span
│ ├─ LLM Span
│ ├─ Tool Span
│ └─ Database Span
└─ ...
Configuration System¶
Configuration Loading¶
1. Load Phase
│
├─> File Provider (YAML)
│ └─ Load from disk
│
├─> Environment Variables
│ └─ Interpolate ${VAR}
│
├─> Validation
│ └─ Schema check
│
└─> SKILL.md Detection
    └─ Auto-configure instruction & tools
2. Runtime Phase
│
├─> Create Components
│ │
│ ├─ LLM Providers
│ ├─ Embedders
│ ├─ Tools
│ └─ Agents
│
└─> Watch Mode (optional)
│
└─> Hot Reload on Change
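The `${VAR}` interpolation step can be approximated with the Go standard library's `os.Expand`. This is a sketch, not Hector's loader: `Interpolate` and the `api_key` key are hypothetical, and `os.Expand` also expands the brace-less `$VAR` form, which a real loader may treat differently.

```go
package main

import (
	"fmt"
	"os"
)

// Interpolate replaces ${VAR} references in raw using lookup and returns
// the expanded text plus the names of any variables that were not set.
func Interpolate(raw string, lookup func(string) (string, bool)) (string, []string) {
	var missing []string
	out := os.Expand(raw, func(name string) string {
		v, ok := lookup(name)
		if !ok {
			missing = append(missing, name)
		}
		return v
	})
	return out, missing
}

func main() {
	os.Setenv("OPENAI_API_KEY", "sk-example")

	// "api_key" is a made-up config key used only for this example.
	raw := "api_key: ${OPENAI_API_KEY}\nbase_url: ${LLM_BASE_URL}"
	out, missing := Interpolate(raw, os.LookupEnv)
	fmt.Println(out)
	fmt.Println("unset variables:", missing)
}
```

Reporting unset variables separately lets the validation step that follows reject the config instead of silently accepting empty values.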
Hot Reload¶
Config File Change
│
├─> Detect Change (file watcher)
│
├─> Load New Config
│ │
│ └─> Validation
│
├─> Reload Runtime
│ │
│ ├─ Rebuild LLMs
│ ├─ Rebuild Tools
│ └─ Rebuild Agents
│
└─> Swap Components
│
└─> Active sessions preserved
What Reloads:
- LLM configurations
- Agent definitions
- Tool configurations
- RAG document stores
- Embedder settings
What Doesn't Reload:
- Active sessions (preserved)
- Session service (retained)
- Index service (retained)
- Server port/TLS (requires restart)
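The swap step can be pictured as replacing one pointer once the new components have been built. The sketch below assumes that model and uses `sync/atomic` (Go 1.19 or later); `Components` and `Runtime` are hypothetical types, and keeping the old components when validation fails is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Components is a hypothetical bundle of everything rebuilt on reload.
type Components struct {
	Version int
	Agents  []string
}

// Runtime holds the active components behind an atomic pointer so that
// in-flight requests keep the snapshot they started with.
type Runtime struct {
	active atomic.Pointer[Components]
}

// Current returns the snapshot that new requests should use.
func (r *Runtime) Current() *Components { return r.active.Load() }

// Reload validates the new component set first and swaps only on success,
// so a bad config never replaces a working one.
func (r *Runtime) Reload(next *Components) error {
	if next == nil || len(next.Agents) == 0 {
		return errors.New("invalid config: no agents defined")
	}
	r.active.Store(next)
	return nil
}

func main() {
	rt := &Runtime{}
	_ = rt.Reload(&Components{Version: 1, Agents: []string{"assistant"}})

	inFlight := rt.Current() // a request captures the current snapshot

	err := rt.Reload(&Components{Version: 2}) // rejected: no agents
	fmt.Println("reload error:", err)

	_ = rt.Reload(&Components{Version: 3, Agents: []string{"assistant", "researcher"}})
	fmt.Println("in-flight:", inFlight.Version, "current:", rt.Current().Version)
}
```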
Persistence Architecture¶
Three-Layer Storage¶
┌──────────────────────────────────────────────┐
│ Application Layer (Runtime)                  │
└────────────┬─────────────────────────────────┘
             │
┌────────────▼─────────────────────────────────┐
│ Session Service (SOURCE OF TRUTH)            │
│ - Messages                                   │
│ - State                                      │
│ - Artifacts                                  │
│ Backend: InMemory or SQL                     │
└────────────┬─────────────────────────────────┘
             │
┌────────────▼─────────────────────────────────┐
│ Storage Layer                                │
│ - SQLite (embedded)                          │
│ - PostgreSQL (production)                    │
│ - MySQL (alternative)                        │
└──────────────────────────────────────────────┘
Checkpoint/Recovery¶
Execution State
│
├─> Event Trigger (tool execution, LLM call)
│ │
│ └─> Create Checkpoint
│ │
│ ├─ Serialize State
│ └─ Save to Session
│
├─> Interval Trigger (every N iterations)
│ │
│ └─> Create Checkpoint
│
└─> Recovery (on restart)
│
├─> Load Checkpoints
├─> Filter Expired
├─> Auto-Resume (non-HITL)
└─> Await Approval (HITL)
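The recovery branch of the diagram can be sketched as a pure function that sorts checkpoints into three buckets. The field names mirror the recovery keys shown in the Checkpoint Manager section, but the Go types are hypothetical, and treating `timeout` as a maximum checkpoint age is an assumption of this sketch.

```go
package main

import "fmt"

// Checkpoint is a hypothetical, simplified checkpoint record.
type Checkpoint struct {
	TaskID     string
	AgeSeconds int  // time since the checkpoint was saved
	NeedsHITL  bool // the task was paused awaiting human approval
}

// RecoveryConfig mirrors the recovery settings; the type is hypothetical.
type RecoveryConfig struct {
	AutoResume     bool
	AutoResumeHITL bool
	TimeoutSeconds int // assumed to be the maximum age of a usable checkpoint
}

// Plan lists what to do with each task found on startup.
type Plan struct {
	Resume  []string // resumed automatically
	Pending []string // left for approval or manual recovery
	Expired []string // older than the timeout, filtered out
}

// PlanRecovery classifies checkpoints in the order shown in the diagram:
// filter expired, hold HITL tasks, then auto-resume the rest.
func PlanRecovery(cfg RecoveryConfig, cps []Checkpoint) Plan {
	var p Plan
	for _, cp := range cps {
		switch {
		case cp.AgeSeconds > cfg.TimeoutSeconds:
			p.Expired = append(p.Expired, cp.TaskID)
		case cp.NeedsHITL && !cfg.AutoResumeHITL:
			p.Pending = append(p.Pending, cp.TaskID)
		case cfg.AutoResume:
			p.Resume = append(p.Resume, cp.TaskID)
		default:
			p.Pending = append(p.Pending, cp.TaskID)
		}
	}
	return p
}

func main() {
	cfg := RecoveryConfig{AutoResume: true, AutoResumeHITL: false, TimeoutSeconds: 3600}
	plan := PlanRecovery(cfg, []Checkpoint{
		{TaskID: "summarize", AgeSeconds: 120},
		{TaskID: "stale-job", AgeSeconds: 7200},
		{TaskID: "delete-files", AgeSeconds: 60, NeedsHITL: true},
	})
	fmt.Printf("%+v\n", plan)
}
```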
Security Architecture¶
Authentication Flow¶
Request
│
├─> Extract Token (Authorization header)
│
├─> Validate Token
│ │
│ ├─ JWT: Verify signature with JWKS
│ └─ API Key: Compare with configured keys
│
├─> Extract Claims
│ │
│ └─ User ID, Email, Roles
│
└─> Attach to Context
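The token extraction and API-key branches can be sketched with the standard library alone. JWT verification against a JWKS needs a dedicated library and is left out. `ExtractBearer` and `ValidAPIKey` are hypothetical helpers, and the `Bearer` scheme is an assumption, since the flow above only names the Authorization header.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
	"strings"
)

// ExtractBearer returns the token from an Authorization header value of the
// form "Bearer <token>". The scheme name is matched case-insensitively.
func ExtractBearer(headerValue string) (string, error) {
	scheme, token, ok := strings.Cut(headerValue, " ")
	token = strings.TrimSpace(token)
	if !ok || !strings.EqualFold(scheme, "Bearer") || token == "" {
		return "", errors.New("missing or malformed Authorization header")
	}
	return token, nil
}

// ValidAPIKey compares the token against every configured key using a
// constant-time comparison, so timing does not reveal how many leading
// characters matched.
func ValidAPIKey(token string, keys []string) bool {
	valid := false
	for _, k := range keys {
		if subtle.ConstantTimeCompare([]byte(token), []byte(k)) == 1 {
			valid = true
		}
	}
	return valid
}

func main() {
	keys := []string{"key-for-ci", "key-for-ops"} // example values only

	token, err := ExtractBearer("Bearer key-for-ops")
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("authenticated:", ValidAPIKey(token, keys))
}
```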
Authorization¶
Agent Request
│
├─> Check Agent Visibility
│ │
│ ├─ Public: Allow (with auth if enabled)
│ ├─ Internal: Require auth
│ └─ Private: Deny HTTP access
│
└─> Check User Permissions (future)
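The visibility rules reduce to a small decision function. The sketch below encodes the three branches as stated above; the `Visibility` type, its string values, and `Allow` are hypothetical, and unknown visibility values are denied by assumption.

```go
package main

import "fmt"

// Visibility is a hypothetical type for an agent's visibility setting.
type Visibility string

const (
	Public   Visibility = "public"
	Internal Visibility = "internal"
	Private  Visibility = "private"
)

// Allow reports whether an HTTP request may reach an agent.
//   - Public: allowed, but the caller must be authenticated when auth is enabled
//   - Internal: the caller must be authenticated
//   - Private (and anything unrecognized): never reachable over HTTP
func Allow(v Visibility, authEnabled, authenticated bool) bool {
	switch v {
	case Public:
		return !authEnabled || authenticated
	case Internal:
		return authenticated
	default:
		return false
	}
}

func main() {
	for _, v := range []Visibility{Public, Internal, Private} {
		fmt.Printf("%-8s anonymous=%v authenticated=%v\n",
			v, Allow(v, true, false), Allow(v, true, true))
	}
}
```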
Tool Security¶
Tool Call
│
├─> Check Approval Requirement
│ │
│ ├─ Required: Pause & Request Approval
│ └─ Not Required: Continue
│
├─> Check Sandboxing (commands)
│ │
│ ├─ Whitelist Check
│ ├─ Blacklist Check
│ └─ Working Directory Check
│
└─> Execute Tool
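The three sandbox checks for command tools can be sketched as follows. `Sandbox` and its fields are hypothetical (`Allowed` and `Denied` stand in for the whitelist and blacklist), and giving the deny list precedence is an assumption. String checks like these do not amount to real isolation on their own.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Sandbox is a hypothetical policy for command tools.
type Sandbox struct {
	Allowed []string // whitelist: if non-empty, only these commands may run
	Denied  []string // blacklist: always rejected, even if also allowed
	WorkDir string   // commands must run inside this directory
}

func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// Check returns nil when the command may run in dir, or an error naming
// the first rule it violates.
func (s Sandbox) Check(command, dir string) error {
	fields := strings.Fields(command)
	if len(fields) == 0 {
		return fmt.Errorf("empty command")
	}
	name := filepath.Base(fields[0]) // "/usr/bin/git" and "git" are treated alike
	if contains(s.Denied, name) {
		return fmt.Errorf("command %q is denied", name)
	}
	if len(s.Allowed) > 0 && !contains(s.Allowed, name) {
		return fmt.Errorf("command %q is not in the allow list", name)
	}
	// A relative path that starts with ".." escapes the working directory.
	rel, err := filepath.Rel(s.WorkDir, dir)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return fmt.Errorf("directory %q is outside %q", dir, s.WorkDir)
	}
	return nil
}

func main() {
	sb := Sandbox{Allowed: []string{"ls", "git"}, Denied: []string{"rm"}, WorkDir: "/srv/work"}
	fmt.Println(sb.Check("git status", "/srv/work/repo"))
	fmt.Println(sb.Check("rm -rf /", "/srv/work"))
	fmt.Println(sb.Check("ls", "/srv/work/../etc"))
}
```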
Scalability¶
Horizontal Scaling¶
Load Balancer
│
├─> Hector Instance 1
├─> Hector Instance 2
└─> Hector Instance 3
│
└─> Shared Database (PostgreSQL)
│
├─ Sessions (shared state)
└─ Tasks (distributed)
Stateless Design:
- No in-process state
- All state in database
- Instances interchangeable
Resource Efficiency¶
| Metric | Value |
|---|---|
| Binary Size | 30MB (stripped) |
| Memory Footprint | ~50-100MB baseline |
| Startup Time | <100ms |
| Goroutines | Efficient concurrency |
Resource Profile:
- Low memory footprint (~50-100MB)
- Fast startup (<100ms)
- Single binary deployment
Performance Characteristics¶
Request Latency:
| Component | Latency |
|---|---|
| Hector Overhead | <10ms (routing, parsing) |
| LLM Call | 500ms - 10s (dominates) |
| Tool Execution | 10ms - 1s (varies) |
| Database Query | 1-10ms (local) |
Throughput:
| Scenario | Requests/sec |
|---|---|
| Non-LLM bottleneck | 100-1000 req/s |
| LLM-limited | ~10-50 req/s (per provider) |
| Horizontal scaling | Linear with instances |
Resource Usage:
| Resource | Notes |
|---|---|
| CPU | Low baseline, spikes during LLM calls |
| Memory | ~50MB + sessions + vector data |
| Network | LLM API dominant |
| Disk | SQLite or logs only |
Design Principles¶
- Config-First: Agents defined declaratively, not in code
- A2A-Native: Full A2A v1.0 (DRAFT) protocol compliance
- Batteries Included: Observability, security, persistence built-in
- Resource Efficient: Go implementation, minimal footprint
- Extensible: Programmatic API, custom tools, custom LLMs
- Stateless: Horizontal scaling via shared database
- Standards-Based: OpenTelemetry, Prometheus, JWKS
Build Phases¶
Runtime builds components in dependency order:
| Phase | Components |
|---|---|
| 1 | Observability (tracing & metrics) |
| 2 | Session Service (data persistence) |
| 3 | LLM Providers (language models) |
| 4 | Embedders (semantic embeddings) |
| 5 | Vector Stores (vector databases) |
| 6 | Toolsets (tools for agents) |
| 7 | Document Stores (RAG sources) |
| 8 | Index Service (search capability) |
| 9 | Agents (configured agents) |
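Building in dependency order is a topological sort. The sketch below shows one way to derive such an order with a depth-first search; the component names and edges are illustrative, loosely based on the dependency graph in the Runtime section, and `BuildOrder` is a hypothetical function rather than Hector's builder.

```go
package main

import (
	"fmt"
	"sort"
)

// BuildOrder returns components ordered so that every component appears
// after all of its dependencies. It reports an error on a dependency cycle.
func BuildOrder(deps map[string][]string) ([]string, error) {
	const (
		unvisited = iota
		visiting
		done
	)
	state := map[string]int{}
	var order []string

	var visit func(name string) error
	visit = func(name string) error {
		switch state[name] {
		case done:
			return nil
		case visiting:
			return fmt.Errorf("dependency cycle at %q", name)
		}
		state[name] = visiting
		for _, d := range deps[name] {
			if err := visit(d); err != nil {
				return err
			}
		}
		state[name] = done
		order = append(order, name) // appended only after its dependencies
		return nil
	}

	// Sort the keys so the result is deterministic despite map iteration order.
	names := make([]string, 0, len(deps))
	for n := range deps {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		if err := visit(n); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	order, err := BuildOrder(map[string][]string{
		"agents":          {"llms", "toolsets", "document_stores"},
		"document_stores": {"vector_stores", "embedders", "llms"},
		"index_service":   {"embedders"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order)
}
```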
Supported Providers¶
LLM Providers¶
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4-turbo, gpt-4o-mini, etc. |
| Anthropic | claude-sonnet-4, claude-opus-4, etc. |
| Google Gemini | gemini-2.5-pro, etc. |
| Ollama | Local models |
Embedders¶
| Provider | Models |
|---|---|
| OpenAI | text-embedding-3-small, text-embedding-3-large |
| Ollama | Local embedding models |
| Cohere | embed-english-v3.0, etc. |
Vector Stores¶
| Provider | Type |
|---|---|
| Chromem | Embedded, file-based |
| Qdrant | Production vector database |
| Chroma | Open-source embedding database |
| Pinecone | Managed service |
| Weaviate | Open-source vector database |
| Milvus | Distributed vector database |
Tool Types¶
| Type | Description |
|---|---|
| Function | Built-in Go functions (text_editor, bash, etc.) |
| MCP | Model Context Protocol servers |
| Command | Shell command execution |
| Search | Web search tools |
Document Sources¶
| Type | Description |
|---|---|
| Directory | Local files |
| SQL | Database query results |
| URLs | Web pages |
| S3 | Cloud storage |