Architecture Reference¶

Hector's architecture is designed for production deployments with observability, security, and A2A-native federation.

High-Level Architecture¶

┌─────────────────────────────────────────────────────────────┐
│                      HTTP/gRPC Server                        │
│  ┌────────────┬────────────┬────────────┬──────────────┐   │
│  │ Discovery  │  A2A API   │  Metrics   │  Health      │   │
│  └────────────┴────────────┴────────────┴──────────────┘   │
└─────────────────────────┬───────────────────────────────────┘
                          │
┌─────────────────────────▼───────────────────────────────────┐
│                        Runtime                               │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Agent Registry  │  LLM Providers  │  Tool Registry  │  │
│  └──────────────────────────────────────────────────────┘  │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Session Service  │  Memory Index  │  Checkpoint Mgr │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────┬───────────────────────────────────┘
                          │
        ┌─────────────────┼─────────────────┐
        │                 │                 │
┌───────▼────────┐ ┌──────▼──────┐ ┌───────▼────────┐
│  LLM Providers │ │ Vector DBs  │ │  Databases     │
│  - OpenAI      │ │ - Chromem   │ │  - SQLite      │
│  - Anthropic   │ │ - Qdrant    │ │  - Postgres    │
│  - Gemini      │ │ - Pinecone  │ │  - MySQL       │
│  - Ollama      │ └─────────────┘ └────────────────┘
└────────────────┘

Core Components¶

Runtime¶

The runtime is the central component that manages agent lifecycle:

Responsibilities:

Build LLM providers from configuration
Create embedders for semantic search
Initialize tools and toolsets
Construct agents with dependencies
Manage RAG document stores
Setup session and memory services
Coordinate hot reload

Lifecycle: 1. Load configuration 2. Create LLM providers 3. Initialize embedders 4. Build tools and toolsets 5. Create agents 6. Setup persistence (sessions, tasks) 7. Start observability

Dependency Graph:

Agents
  ├─> LLMs
  ├─> Tools
  │    └─> Toolsets
  │         └─> MCP Servers (optional)
  ├─> Document Stores
  │    ├─> Vector Providers
  │    ├─> Embedders
  │    └─> LLMs (for query processing)
  └─> Sub-agents (recursive)

Session Service ← Sessions, Checkpoints
Index Service   ← Embedders

Data Flow:

Configuration → Runtime → Components → Agents

Persistence Model:

Session Service (SOURCE OF TRUTH)
    │
    ├─ Messages (conversation history)
    ├─ State (key-value store)
    └─ Artifacts (files)

Index Service (SEARCH INDEX)
    │
    └─ Built from session events
       (can be rebuilt at any time)

Server¶

HTTP/gRPC server exposing A2A protocol endpoints:

Endpoints:

Endpoint	Description
`/.well-known/agent-card.json`	Default agent card
`/agents`	Multi-agent discovery (Hector extension)
`/agents/{name}`	Agent card (GET) / JSON-RPC (POST)
`/health`	Health check and auth discovery
`/metrics`	Prometheus metrics (when enabled)
`/api/schema`	JSON Schema for configuration
`/api/config`	Config read/write (studio mode)
`/api/tasks/.../cancel`	Cancel tool execution

Transports:

JSON-RPC over HTTP (default)
gRPC (optional)

Agents¶

Three agent types:

LLM Agent (pkg/agent/llmagent):

LLM-powered reasoning
Tool execution
Multi-turn conversations
Memory and context management

Remote Agent (pkg/agent/remoteagent):

Proxy to external A2A services
Fetches agent card
Forwards requests
Federation support

Workflow Agent (pkg/agent/workflowagent):

Sequential execution
Parallel execution
Loop/iteration
Orchestrates sub-agents

Session Service¶

Manages conversation history and state:

Storage Backends:

Backend	Persistence	Use Case
In-memory	Ephemeral	Development
SQL	Persistent	Production

Responsibilities:

Store messages
Manage session state
Track artifacts
Persist across restarts

Architecture:

Session Service (SOURCE OF TRUTH)
    │
    ├─ Messages (conversation history)
    ├─ State (key-value pairs)
    └─ Artifacts (files, images)

Memory Index¶

Searchable index over conversation history:

Index Types:

Type	Description	Requirements
Keyword	Simple word matching	None
Vector	Semantic similarity	Embedder required

Use Cases:

Search past conversations
Find relevant context
Knowledge retrieval

Architecture:

Session Service → Index Service → Vector Provider
     (data)          (search)        (storage)

Index can be rebuilt from session data.

Checkpoint Manager¶

Execution state checkpointing for recovery:

Strategies:

Strategy	When	Description
Event	Tool execution, LLM calls	Checkpoint at specific events
Interval	Every N iterations	Checkpoint at regular intervals
Hybrid	Both	Events and intervals combined

Configuration:

storage:
  checkpoint:
    enabled: true
    strategy: hybrid
    after_tools: true
    before_llm: true
    interval: 5  # every 5 iterations

Storage:

Checkpoints stored in session service
Auto-cleanup of expired checkpoints

Recovery:

Auto-resume on startup
Manual recovery via API
HITL approval for sensitive tasks

Recovery Configuration:

storage:
  checkpoint:
    recovery:
      auto_resume: true
      auto_resume_hitl: false
      timeout: 3600  # 1h (in seconds)

Data Flow¶

Message Flow¶

1. Client Request
   │
   ├─> HTTP Server
   │      │
   │      ├─> Authentication (if enabled)
   │      └─> Rate Limiting (if enabled)
   │
2. Runtime
   │
   ├─> Agent Resolution
   │      │
   │      └─> Visibility Check
   │
3. Agent Execution
   │
   ├─> Session Load (from session service)
   ├─> Context Preparation
   ├─> LLM Call
   │      │
   │      ├─> Tool Execution (if tool calls)
   │      │      │
   │      │      ├─> Approval Check (HITL)
   │      │      └─> Tool Result
   │      │
   │      └─> Response Generation
   │
4. Persistence
   │
   ├─> Session Update
   ├─> Memory Index Update
   └─> Checkpoint Save (if enabled)
   │
5. Response
   │
   └─> Client (streaming or complete)

RAG Flow¶

1. Document Ingestion
   │
   ├─> Document Source (directory, SQL, API)
   ├─> MCP Parser (optional, for PDF/DOCX)
   ├─> Chunking Strategy
   ├─> Embedding Generation
   └─> Vector Storage

2. Query Processing
   │
   ├─> User Message
   ├─> Embedding Generation
   ├─> Vector Search
   ├─> Reranking (optional)
   └─> Top K Results

3. Context Injection
   │
   ├─> Retrieved Documents
   ├─> Format as Context
   └─> Inject into System Prompt

Component Interactions¶

Agent + Tools¶

Agent
  │
  ├─ Calls Tool
  │     │
  │     ├─> Approval Check (if required)
  │     ├─> Tool Execution
  │     │      │
  │     │      ├─ Built-in Function
  │     │      ├─ MCP Server Call
  │     │      └─ Command Execution
  │     │
  │     └─> Result
  │
  └─ Processes Result

Multi-Agent Patterns¶

Pattern 1: Transfer (Sub-Agents)

Coordinator Agent
  │
  ├─ Calls transfer_to_specialist
  │
  └─> Control Transferred
        │
        Specialist Agent
          │
          └─ Continues Conversation

Pattern 2: Delegation (Agent Tools)

Parent Agent
  │
  ├─ Calls agent_tool
  │
  └─> Tool Execution
        │
        Agent Tool
          │
          ├─ Executes Task
          └─ Returns Result
        │
  Parent Agent
    │
    └─ Processes Result

Observability Integration¶

Every Request
  │
  ├─> Start Trace Span
  │      │
  │      ├─ Agent Span
  │      │     │
  │      │     ├─ LLM Span
  │      │     │    └─ Record: tokens, latency
  │      │     │
  │      │     ├─ Tool Span
  │      │     │    └─ Record: duration, result
  │      │     │
  │      │     └─ Database Span
  │      │          └─ Record: query, duration
  │      │
  │      └─> End Trace Span
  │
  └─> Update Metrics
         │
         ├─ Counters (requests, tokens, errors)
         ├─ Histograms (latency)
         └─ Gauges (active sessions)

Metrics (exposed at /metrics):

Metric	Type	Description
`hector_llm_requests_total`	Counter	Total LLM requests
`hector_llm_tokens_total`	Counter	Total tokens used
`hector_tool_calls_total`	Counter	Total tool calls
`hector_agent_requests_total`	Counter	Total agent requests

Traces (sent to OTLP endpoint):

Invocation Span
  ├─ Agent Span
  │   ├─ LLM Span
  │   ├─ Tool Span
  │   └─ Database Span
  └─ ...

Configuration System¶

Configuration Loading¶

1. Load Phase
   │
   ├─> File Provider (YAML)
   │      └─ Load from disk
   │
   ├─> Environment Variables
   │      └─ Interpolate ${VAR}
   │
   ├─> Validation
          └─ Schema check

   ├─> SKILL.md Detection
          └─ Auto-configure instruction & tools

2. Runtime Phase
   │
   ├─> Create Components
   │      │
   │      ├─ LLM Providers
   │      ├─ Embedders
   │      ├─ Tools
   │      └─ Agents
   │
   └─> Watch Mode (optional)
          │
          └─> Hot Reload on Change

Hot Reload¶

Config File Change
  │
  ├─> Detect Change (file watcher)
  │
  ├─> Load New Config
  │      │
  │      └─> Validation
  │
  ├─> Reload Runtime
  │      │
  │      ├─ Rebuild LLMs
  │      ├─ Rebuild Tools
  │      └─ Rebuild Agents
  │
  └─> Swap Components
         │
         └─> Active sessions preserved

What Reloads:

LLM configurations
Agent definitions
Tool configurations
RAG document stores
Embedder settings

What Doesn't Reload:

Active sessions (preserved)
Session service (retained)
Index service (retained)
Server port/TLS (requires restart)

Persistence Architecture¶

Three-Layer Storage¶

┌──────────────────────────────────────────────┐
│  Application Layer (Runtime)                  │
└────────────┬─────────────────────────────────┘
             │
┌────────────▼─────────────────────────────────┐
│  Session Service (SOURCE OF TRUTH)            │
│  - Messages                                   │
│  - State                                      │
│  - Artifacts                                  │
│  Backend: InMemory or SQL                     │
└────────────┬─────────────────────────────────┘
             │
┌────────────▼─────────────────────────────────┐
│  Storage Layer                                │
│  - SQLite (embedded)                          │
│  - PostgreSQL (production)                    │
│  - MySQL (alternative)                        │
└───────────────────────────────────────────────┘

Checkpoint/Recovery¶

Execution State
  │
  ├─> Event Trigger (tool execution, LLM call)
  │      │
  │      └─> Create Checkpoint
  │            │
  │            ├─ Serialize State
  │            └─ Save to Session
  │
  ├─> Interval Trigger (30s)
  │      │
  │      └─> Create Checkpoint
  │
  └─> Recovery (on restart)
         │
         ├─> Load Checkpoints
         ├─> Filter Expired
         ├─> Auto-Resume (non-HITL)
         └─> Await Approval (HITL)

Security Architecture¶

Authentication Flow¶

Request
  │
  ├─> Extract Token (Authorization header)
  │
  ├─> Validate Token
  │      │
  │      ├─ JWT: Verify signature with JWKS
  │      └─ API Key: Compare with configured keys
  │
  ├─> Extract Claims
  │      │
  │      └─ User ID, Email, Roles
  │
  └─> Attach to Context

Authorization¶

Agent Request
  │
  ├─> Check Agent Visibility
  │      │
  │      ├─ Public: Allow (with auth if enabled)
  │      ├─ Internal: Require auth
  │      └─ Private: Deny HTTP access
  │
  └─> Check User Permissions (future)

Tool Security¶

Tool Call
  │
  ├─> Check Approval Requirement
  │      │
  │      ├─ Required: Pause & Request Approval
  │      └─ Not Required: Continue
  │
  ├─> Check Sandboxing (commands)
  │      │
  │      ├─ Whitelist Check
  │      ├─ Blacklist Check
  │      └─ Working Directory Check
  │
  └─> Execute Tool

Scalability¶

Horizontal Scaling¶

Load Balancer
  │
  ├─> Hector Instance 1
  ├─> Hector Instance 2
  └─> Hector Instance 3
       │
       └─> Shared Database (PostgreSQL)
              │
              ├─ Sessions (shared state)
              └─ Tasks (distributed)

Stateless Design:

No in-process state
All state in database
Instances interchangeable

Resource Efficiency¶

Metric	Value
Binary Size	30MB (stripped)
Memory Footprint	~50-100MB baseline
Startup Time	<100ms
Goroutines	Efficient concurrency

Resource Profile:

Low memory footprint (~50-100MB)
Fast startup (~100ms)
Single binary deployment

Performance Characteristics¶

Request Latency:

Component	Latency
Hector Overhead	<10ms (routing, parsing)
LLM Call	500ms - 10s (dominates)
Tool Execution	10ms - 1s (varies)
Database Query	1-10ms (local)

Throughput:

Scenario	Requests/sec
Non-LLM bottleneck	100-1000 req/s
LLM-limited	~10-50 req/s (per provider)
Horizontal scaling	Linear with instances

Resource Usage:

Resource	Notes
CPU	Low baseline, spikes during LLM
Memory	~50MB + sessions + vector data
Network	LLM API dominant
Disk	SQLite or logs only

Design Principles¶

Config-First: Agents defined declaratively, not in code
A2A-Native: Full A2A v1.0 (DRAFT) protocol compliance
Batteries Included: Observability, security, persistence built-in
Resource Efficient: Go implementation, minimal footprint
Extensible: Programmatic API, custom tools, custom LLMs
Stateless: Horizontal scaling via shared database
Standards-Based: OpenTelemetry, Prometheus, JWKS

Build Phases¶

Runtime builds components in dependency order:

Phase	Components
1	Observability (tracing & metrics)
2	Session Service (data persistence)
3	LLM Providers (language models)
4	Embedders (semantic embeddings)
5	Vector Stores (vector databases)
6	Toolsets (tools for agents)
7	Document Stores (RAG sources)
8	Index Service (search capability)
9	Agents (configured agents)

Supported Providers¶

LLM Providers¶

Provider	Models
OpenAI	gpt-4o, gpt-4-turbo, gpt-4o-mini, etc.
Anthropic	claude-sonnet-4, claude-opus-4, etc.
Google Gemini	gemini-2.5-pro, etc.
Ollama	Local models

Embedders¶

Provider	Models
OpenAI	text-embedding-3-small, text-embedding-3-large
Ollama	Local embedding models
Cohere	embed-english-v3.0, etc.

Vector Stores¶

Provider	Type
Chromem	Embedded, file-based
Qdrant	Production vector database
Chroma	Open-source embedding database
Pinecone	Managed service
Weaviate	Open-source vector database
Milvus	Distributed vector database

Tool Types¶

Type	Description
Function	Built-in Go functions (text_editor, bash, etc.)
MCP	Model Context Protocol servers
Command	Shell command execution
Search	Web search tools

Document Sources¶

Type	Description
Directory	Local files
SQL	Database query results
URLs	Web pages
S3	Cloud storage