Observability¶
Monitor Hector with Prometheus metrics and OpenTelemetry tracing.
Quick Start¶
Enable observability with zero-config:
hector serve --model gpt-5 --observe
This enables:
- Prometheus metrics at /metrics
- OTLP tracing to localhost:4317
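To confirm the metrics endpoint is live, you can query it directly. A quick sanity check, assuming Hector's default port of 8080 used throughout this guide:

# Fetch the first few exposed metrics
curl -s http://localhost:8080/metrics | head -n 20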
Metrics¶
Enable Metrics¶
Zero-config:
hector serve --model gpt-5 --observe
Configuration file:
server:
  observability:
    metrics:
      enabled: true
Access Metrics¶
Metrics endpoint:
http://localhost:8080/metrics
Example output:
# HELP hector_llm_requests_total Total LLM requests
# TYPE hector_llm_requests_total counter
hector_llm_requests_total{agent="assistant",provider="openai",model="gpt-4o"} 42
# HELP hector_llm_request_duration_seconds LLM request duration
# TYPE hector_llm_request_duration_seconds histogram
hector_llm_request_duration_seconds_bucket{agent="assistant",le="0.5"} 10
hector_llm_request_duration_seconds_bucket{agent="assistant",le="1"} 25
hector_llm_request_duration_seconds_bucket{agent="assistant",le="5"} 40
hector_llm_request_duration_seconds_sum{agent="assistant"} 89.4
hector_llm_request_duration_seconds_count{agent="assistant"} 42
# HELP hector_llm_tokens_total Total tokens used
# TYPE hector_llm_tokens_total counter
hector_llm_tokens_total{agent="assistant",type="prompt"} 12500
hector_llm_tokens_total{agent="assistant",type="completion"} 8200
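The counter and histogram series above combine into the usual rate and latency queries. A couple of illustrative PromQL expressions (the queries themselves are examples, not shipped dashboards):

# Requests per second, per agent, over the last 5 minutes
sum by (agent) (rate(hector_llm_requests_total[5m]))

# Average LLM latency over the last 5 minutes
rate(hector_llm_request_duration_seconds_sum[5m])
  / rate(hector_llm_request_duration_seconds_count[5m])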
Available Metrics¶
Request Metrics
- hector_llm_requests_total - Total LLM requests (counter)
- hector_llm_request_duration_seconds - LLM latency (histogram)
- hector_agent_requests_total - Total agent requests (counter)
- hector_agent_request_duration_seconds - Agent latency (histogram)
Token Metrics
- hector_llm_tokens_total - Token usage (counter)
  - Labels: agent, type (prompt/completion)
Tool Metrics
- hector_tool_calls_total - Tool invocations (counter)
  - Labels: agent, tool, status (success/error)
- hector_tool_call_duration_seconds - Tool execution time (histogram)
Error Metrics
- hector_errors_total - Total errors (counter)
  - Labels: component, error_type
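These series compose naturally in PromQL. For example, a rough tokens-per-request figure and a per-tool failure ratio (illustrative queries):

# Approximate tokens consumed per LLM request, per agent
sum by (agent) (rate(hector_llm_tokens_total[5m]))
  / sum by (agent) (rate(hector_llm_requests_total[5m]))

# Fraction of tool calls ending in error, per tool
sum by (tool) (rate(hector_tool_calls_total{status="error"}[5m]))
  / sum by (tool) (rate(hector_tool_calls_total[5m]))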
Custom Labels¶
Add constant labels to all metrics:
server:
  observability:
    metrics:
      enabled: true
      const_labels:
        environment: production
        region: us-east-1
        team: ai-platform
Metrics include these labels:
hector_llm_requests_total{environment="production",region="us-east-1",...} 42
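Constant labels are useful for scoping queries when several deployments report to the same Prometheus, for example:

# Request rate for the production deployment only
sum(rate(hector_llm_requests_total{environment="production"}[5m]))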
Metric Namespace¶
Customize metric prefix:
server:
  observability:
    metrics:
      enabled: true
      namespace: mycompany   # Prefix: mycompany_
      subsystem: agents      # Becomes: mycompany_agents_
Metrics:
- mycompany_agents_llm_requests_total
- mycompany_agents_llm_tokens_total
Custom Endpoint¶
Change metrics path:
server:
  observability:
    metrics:
      enabled: true
      endpoint: /custom-metrics
Access at: http://localhost:8080/custom-metrics
Tracing¶
Enable Tracing¶
Zero-config (OTLP to localhost):
hector serve --model gpt-5 --observe
Configuration file:
server:
  observability:
    tracing:
      enabled: true
      exporter: otlp
      endpoint: localhost:4317
Exporters¶
OTLP (OpenTelemetry Protocol)¶
The default exporter; works with Jaeger, Tempo, and any other OTLP-compatible backend:
server:
  observability:
    tracing:
      enabled: true
      exporter: otlp
      endpoint: localhost:4317   # gRPC
      insecure: true
OTLP over HTTP:
server:
  observability:
    tracing:
      enabled: true
      exporter: otlp
      endpoint: localhost:4318   # HTTP
      insecure: true
Jaeger¶
Direct Jaeger exporter:
server:
  observability:
    tracing:
      enabled: true
      exporter: jaeger
      endpoint: http://localhost:14268/api/traces
Zipkin¶
server:
  observability:
    tracing:
      enabled: true
      exporter: zipkin
      endpoint: http://localhost:9411/api/v2/spans
Stdout (Debug)¶
Print traces to console:
server:
  observability:
    tracing:
      enabled: true
      exporter: stdout
Sampling¶
Control what fraction of traces are sampled:
server:
  observability:
    tracing:
      enabled: true
      sampling_rate: 1.0   # Sample all traces
Sampling rates:
- 1.0 - All traces (100%)
- 0.1 - 10% of traces
- 0.01 - 1% of traces
Use lower rates in high-traffic production:
# Development: sample all
sampling_rate: 1.0
# Production: sample 10%
sampling_rate: 0.1
Service Identification¶
Identify your service in traces:
server:
  observability:
    tracing:
      enabled: true
      service_name: hector-production
      service_version: 2.0.0
Trace Context¶
Traces include:
Agent Spans
- Agent name
- Request/response
- Duration
LLM Spans
- Provider (openai, anthropic)
- Model (gpt-4o, claude-sonnet-4)
- Token counts
- Latency
Tool Spans
- Tool name
- Parameters
- Result
- Duration
Database Spans
- Operation (query, insert, update)
- Table
- Duration
Payload Capture¶
Capture full LLM requests/responses (debug only):
server:
  observability:
    tracing:
      enabled: true
      capture_payloads: true
Warning: Produces large spans. Use only for debugging.
Authentication Headers¶
Send headers with trace exports:
server:
  observability:
    tracing:
      enabled: true
      endpoint: traces.example.com:4317
      headers:
        Authorization: Bearer ${TRACE_API_KEY}
        X-Custom-Header: value
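The ${TRACE_API_KEY} placeholder is expected to resolve from the environment; provide it however you normally manage secrets, for example:

# Export the credential before starting Hector so ${TRACE_API_KEY} resolves
export TRACE_API_KEY="<your-collector-token>"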
Prometheus Setup¶
Scrape Configuration¶
Add Hector to Prometheus:
# prometheus.yml
scrape_configs:
  - job_name: hector
    static_configs:
      - targets:
          - localhost:8080
    metrics_path: /metrics
    scrape_interval: 15s
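If promtool is available (it ships with Prometheus), you can sanity-check the scrape configuration before reloading:

# Validate the configuration file before (re)starting Prometheus
promtool check config prometheus.yml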
Kubernetes Service Monitor¶
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hector
  namespace: hector
spec:
  selector:
    matchLabels:
      app: hector
  endpoints:
    - port: http
      path: /metrics
      interval: 15s
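The ServiceMonitor above selects a Service labeled app: hector with a port named http. A minimal Service along those lines might look like this (names and pod labels are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: hector
  namespace: hector
  labels:
    app: hector          # matched by the ServiceMonitor selector
spec:
  selector:
    app: hector          # assumes Hector pods carry this label
  ports:
    - name: http         # must match the port name in the ServiceMonitor
      port: 8080
      targetPort: 8080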
Jaeger Setup¶
Local Jaeger¶
Run Jaeger locally:
docker run -d \
  --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 16686:16686 \
  jaegertracing/all-in-one:latest
Configure Hector:
server:
  observability:
    tracing:
      enabled: true
      exporter: otlp
      endpoint: localhost:4317
View traces: http://localhost:16686
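As a quick check that spans are arriving, you can ask the Jaeger query API which services it has seen (the same internal API the UI uses); Hector's service name should appear in the response:

# List services known to Jaeger
curl -s http://localhost:16686/api/services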
Kubernetes Jaeger¶
Deploy Jaeger Operator:
kubectl create namespace observability
kubectl apply -f https://github.com/jaegertracing/jaeger-operator/releases/latest/download/jaeger-operator.yaml -n observability
Create Jaeger instance:
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200
Configure Hector:
server:
  observability:
    tracing:
      enabled: true
      exporter: otlp
      endpoint: jaeger-collector.observability.svc.cluster.local:4317
Grafana Dashboards¶
Metrics Dashboard¶
Create dashboard for Hector metrics:
{
  "dashboard": {
    "title": "Hector Metrics",
    "panels": [
      {
        "title": "Request Rate",
        "targets": [
          {
            "expr": "rate(hector_llm_requests_total[5m])"
          }
        ]
      },
      {
        "title": "Latency (p95)",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(hector_llm_request_duration_seconds_bucket[5m]))"
          }
        ]
      },
      {
        "title": "Token Usage",
        "targets": [
          {
            "expr": "rate(hector_llm_tokens_total[1h])"
          }
        ]
      }
    ]
  }
}
Trace Visualization¶
Connect Grafana to Jaeger:
- Add the Jaeger data source
- Set URL: http://jaeger-query:16686
- Explore traces in Grafana
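If you provision Grafana from files rather than the UI, the equivalent data sources can be declared roughly like this (the file path and service URLs are examples):

# /etc/grafana/provisioning/datasources/hector.yaml (example path)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
  - name: Jaeger
    type: jaeger
    access: proxy
    url: http://jaeger-query:16686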
Alerting¶
Prometheus Alerting Rules¶
groups:
  - name: hector_alerts
    rules:
      - alert: HighErrorRate
        expr: rate(hector_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Hector error rate above 0.1 errors/sec"
      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(hector_llm_request_duration_seconds_bucket[5m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "95th percentile latency above 10s"
      - alert: HighTokenUsage
        expr: increase(hector_llm_tokens_total[1h]) > 1000000
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Token usage exceeds 1M/hour"
      - alert: ToolFailureRate
        expr: rate(hector_tool_calls_total{status="error"}[5m]) / rate(hector_tool_calls_total[5m]) > 0.2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Tool failure rate above 20%"
Production Configuration¶
Development¶
Sample all traces, enable all metrics:
server:
  observability:
    tracing:
      enabled: true
      exporter: otlp
      endpoint: localhost:4317
      sampling_rate: 1.0
      capture_payloads: true
    metrics:
      enabled: true
Production¶
Lower sampling, essential metrics:
server:
  observability:
    tracing:
      enabled: true
      exporter: otlp
      endpoint: otlp-collector.observability.svc.cluster.local:4317
      sampling_rate: 0.1   # Sample 10%
      service_name: hector-production
      service_version: 2.0.0
      headers:
        Authorization: Bearer ${OTLP_API_KEY}
    metrics:
      enabled: true
      const_labels:
        environment: production
        cluster: us-east-1
Examples¶
Complete Observability Stack¶
# Hector config
server:
  port: 8080
  observability:
    tracing:
      enabled: true
      exporter: otlp
      endpoint: jaeger-collector:4317
      sampling_rate: 1.0
    metrics:
      enabled: true
      const_labels:
        environment: staging
# docker-compose.yml
version: '3.8'
services:
  hector:
    image: ghcr.io/kadirpekel/hector:latest
    ports:
      - "8080:8080"
    volumes:
      - ./config.yaml:/app/config.yaml
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"   # UI
      - "4317:4317"     # OTLP gRPC
    environment:
      - COLLECTOR_OTLP_ENABLED=true
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - --config.file=/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
Access:
- Hector: http://localhost:8080
- Metrics: http://localhost:8080/metrics
- Jaeger: http://localhost:16686
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000
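Inside the Compose network, Prometheus reaches Hector by service name rather than localhost, so the mounted prometheus.yml would look something like this (assuming the service names above):

# prometheus.yml (for the Compose stack above)
scrape_configs:
  - job_name: hector
    static_configs:
      - targets:
          - hector:8080   # Compose service name, not localhost
    metrics_path: /metrics
    scrape_interval: 15s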
Best Practices¶
Sampling Strategy¶
# ✅ Good - Environment-specific sampling
production:
  sampling_rate: 0.1   # 10% for normal traffic
development:
  sampling_rate: 1.0   # 100% for debugging
Label Cardinality¶
Avoid high-cardinality labels:
# ❌ Bad - User ID in labels (unbounded)
const_labels:
  user_id: ${USER_ID}

# ✅ Good - Environment in labels (bounded)
const_labels:
  environment: production
  region: us-east-1
Payload Capture¶
Use only for debugging:
# ✅ Good - Disabled in production
production:
  capture_payloads: false

# ✅ Good - Enabled for debugging
development:
  capture_payloads: true
Next Steps¶
- Deployment Guide - Deploy with monitoring
- Security Guide - Secure observability endpoints
- Persistence Guide - Monitor database performance