Agents¶
Agents are the core building blocks of Hector. An Agent is an autonomous entity that combines an LLM, Tools, and Instructions to solve tasks.
Agent Types¶
| Type | Description |
|---|---|
| llm | LLM-backed intelligent agent (default) |
| sequential | Runs sub-agents in order |
| parallel | Runs sub-agents concurrently |
| loop | Iterates until a condition is met |
| conditional | Routes based on an evaluation |
| runner | Deterministic tool pipeline (no LLM) |
| remote | Proxies to an external A2A agent |
LLM Agents¶
The default agent type uses an LLM for reasoning.
```yaml
agents:
  assistant:
    name: "Assistant"
    description: "A helpful AI assistant"
    llm: claude
    instruction: "You are a helpful assistant."
    tools: [search, calculator]
```
Key Components¶
| Component | Description |
|---|---|
| llm | Model powering the agent (e.g., claude, gpt4) |
| instruction | System prompt defining behavior |
| tools | Capabilities the agent can call |
Reasoning Loop¶
When an agent receives a task:
- Observe: Read conversation history and input
- Think: LLM generates decision
- Act: Emit tool call if needed
- Result: Execute tool, return result
- Repeat: Until final answer
Instructions¶
Instructions can be provided inline or loaded from a file:
```yaml
# Inline
instruction: "You are a concise chatbot."

# From file
instruction_file: "./prompts/researcher.md"
```
Template variables for dynamic context:
- {user:name} - User-scoped
- {app:config} - App-scoped
- {artifact.data} - File content
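For instance, an instruction can interpolate these variables directly; a minimal sketch (the agent name and surrounding fields are illustrative, only the template variables come from the list above):

```yaml
agents:
  support_bot:
    llm: claude
    # {user:name} and {app:config} are resolved at runtime from
    # user- and app-scoped context; "support_bot" is a made-up name.
    instruction: |
      You are a support assistant configured by {app:config}.
      Address the user as {user:name} and keep answers concise.
```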
Reasoning Configuration¶
```yaml
agents:
  coder:
    reasoning:
      max_iterations: 50
      enable_exit_tool: true
      enable_escalate_tool: true
```
Context & Memory¶
Control conversation history to fit LLM context limits. See the Context & Memory Strategies section below for full details.
```yaml
agents:
  chatbot:
    context:
      strategy: token_window
      budget: 8000
```
Multi-Agent Orchestration¶
Compose agents into complex systems using workflow types.
Sequential¶
Run sub-agents in strict order:
```yaml
agents:
  blog_pipeline:
    type: sequential
    sub_agents: [researcher, writer, editor]

  researcher:
    llm: claude
    instruction: "Find facts about the topic."

  writer:
    llm: claude
    instruction: "Write a draft."

  editor:
    llm: gpt4
    instruction: "Fix grammar and tone."
```
Parallel¶
Run sub-agents concurrently:
```yaml
agents:
  consensus:
    type: parallel
    sub_agents: [analyst_a, analyst_b, analyst_c]
```
Loop¶
Iterate until a condition is met or max_iterations is reached:
```yaml
agents:
  refinement:
    type: loop
    sub_agents: [coder, reviewer]
    max_iterations: 3
```
Conditional¶
Route based on evaluation:
```yaml
agents:
  safe_assistant:
    type: conditional
    condition_agent: moderator
    condition_field: "safe"
    on_true_agent: helper
    on_false_response: "I cannot help with that."
```
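The condition_agent referenced above must produce the field named by condition_field. A minimal sketch of such a moderator — its instruction and output format are assumptions for illustration, not a fixed Hector contract:

```yaml
agents:
  moderator:
    llm: claude
    # Hypothetical: emits JSON with a boolean "safe" field,
    # which safe_assistant reads via condition_field.
    instruction: |
      Classify the user's request. Respond with JSON only:
      {"safe": true} if the request is harmless, {"safe": false} otherwise.
```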
Runner Agents¶
Deterministic tool pipelines with no LLM involvement.
```yaml
agents:
  etl_job:
    type: runner
    tools: [fetch_api, transform, save_data]
```
How It Works¶
- Input is parsed as JSON
- The first tool receives the input and returns its output
- The output of tool N becomes the input of tool N+1
- The final tool's output is the agent's response
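As a concrete trace of the steps above for the etl_job pipeline — the payloads are hypothetical, only the tool order comes from the config:

```yaml
# Hypothetical run of the etl_job runner (payloads are illustrative):
#
#   agent input:       {"endpoint": "/v1/orders"}   -> fetch_api
#   fetch_api output:  {"rows": [...]}              -> transform
#   transform output:  {"rows_clean": [...]}        -> save_data
#   save_data output:  {"status": "ok"}             -> agent response
```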
Use Cases¶
- Data fetching pipelines
- ETL workflows
- CI/CD automation
- Format conversion
Combining with LLM Agents¶
Use runners as sub-agents for reliable data fetching:
```yaml
agents:
  analyst:
    llm: claude
    sub_agents: [data_fetcher]

  data_fetcher:
    type: runner
    tools: [stock_api, news_api]
```
Agent Composition¶
Sub-Agents (Transfer)¶
Control transfers to the sub-agent: the parent hands off the conversation, and the sub-agent takes over, including the user interaction.
```yaml
agents:
  manager:
    sub_agents: [researcher, writer]
```
The runtime provides transfer_to_researcher and transfer_to_writer tools.
Agent Tools (Delegation)¶
Parent stays in control. The child agent executes in an isolated session and returns a result. The parent decides what to do with it.
```yaml
agents:
  assistant:
    agent_tools: [fact_checker]
```
When to Use Which¶
| | Sub-Agents (Transfer) | Agent Tools (Delegation) |
|---|---|---|
| Control | Child takes over conversation | Parent stays in control |
| Session | Shared session with parent | Isolated session (no state bleed) |
| Best for | Routing/triage, specialized handlers | Helper tasks, data enrichment |
| Example | "Transfer to billing department" | "Ask the fact-checker about this claim" |
Rule of thumb: Use sub_agents when the child should talk directly to the user. Use agent_tools when the parent needs to process the child's output.
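The two mechanisms can coexist on one agent; a hedged sketch (the agent names are illustrative):

```yaml
agents:
  support:
    llm: claude
    sub_agents: [billing]        # transfer: billing talks to the user directly
    agent_tools: [fact_checker]  # delegation: support reads the checker's result
```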
Triggers¶
Run agents automatically.
Scheduled¶
```yaml
trigger:
  type: schedule
  cron: "0 9 * * *"
  timezone: America/New_York
```
Webhook¶
```yaml
trigger:
  type: webhook
  path: /webhooks/github
  secret: ${GITHUB_SECRET}
```
See Triggers Guide for details.
Notifications¶
Outbound webhooks on agent events:
```yaml
notifications:
  - id: slack
    events: [task_completed, task_failed]
    url: https://hooks.slack.com/...
```
Visibility¶
Control which agents are discoverable and accessible via the A2A protocol:
| Visibility | Discoverable | HTTP Accessible | Use Case |
|---|---|---|---|
| public | Yes | Yes (auth if enabled) | Customer-facing agents |
| internal | Only when authenticated | Yes | Admin/internal tools |
| private | No | No (internal calls only) | Sub-agents, helper agents |
```yaml
agents:
  customer_agent:
    visibility: public    # Listed in agent card, anyone can call
    # ...
  admin_tools:
    visibility: internal  # Only authenticated users can discover & call
    # ...
  classifier:
    visibility: private   # Only callable by other agents, never via HTTP
    # ...
```
The /.well-known/agent-card.json and /agents endpoints only list agents matching the caller's access level. Private agents are invisible to all external consumers.
Context & Memory Strategies¶
Control how conversation history is managed to stay within LLM context limits.
Strategy Overview¶
| Strategy | Description | Best For |
|---|---|---|
| none | Keep all messages (default) | Short conversations |
| buffer_window | Keep the last N messages | Simple chat UIs |
| token_window | Keep messages within a token budget | Cost control |
| summary_buffer | Summarize older messages with the LLM | Long conversations |
Buffer Window¶
Keep a fixed number of recent messages:
```yaml
agents:
  chatbot:
    context:
      strategy: buffer_window
      window_size: 20  # Keep last 20 messages
```
Token Window¶
Keep recent messages within a token budget:
```yaml
agents:
  chatbot:
    context:
      strategy: token_window
      budget: 8000  # Max 8000 tokens of history
```
Summary Buffer (Autonomous Summarization)¶
When conversation history exceeds the token threshold, Hector automatically summarizes older messages using the agent's LLM. The summary preserves key facts, names, decisions, and context while reducing token count.
```yaml
agents:
  chatbot:
    context:
      strategy: summary_buffer
      budget: 8000  # Summarize when exceeding this
```
How it works:
- History grows normally until the token budget is reached
- Older messages are passed to the LLM with a summarization prompt
- The summary replaces those messages, preserving facts and context
- New messages continue to accumulate until the next summarization
This happens transparently. Users don't see the summarization, only its effects (the agent maintains long-term context without running out of tokens).