Observability

Monitor your Hector deployment with structured logs, Prometheus metrics, and OpenTelemetry tracing.

Logging

Hector writes structured logs, as plain text by default or as JSON for production log pipelines.

Configuration

hector serve \
  --log-level info \
  --log-format json \
  --log-file /var/log/hector.log

| Flag | Env Variable | Default | Options |
|------|--------------|---------|---------|
| `--log-level` | `HECTOR_LOG_LEVEL` | `info` | `debug`, `info`, `warn`, `error` |
| `--log-format` | `HECTOR_LOG_FORMAT` | `text` | `text`, `json` |
| `--log-file` | `HECTOR_LOG_FILE` | stdout | File path |

JSON Log Format

With `--log-format json`, each log entry is a JSON object that log aggregators (Elasticsearch, Loki, Splunk) can ingest:

{
  "time": "2026-01-20T18:00:00Z",
  "level": "INFO",
  "msg": "Agent invocation completed",
  "agent": "assistant",
  "session_id": "sess_123",
  "duration_ms": 1250,
  "tokens_used": 450
}
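As a rough illustration of consuming these logs, the sketch below totals `tokens_used` and `duration_ms` per agent. It assumes each entry is written as one JSON object per line (the example above is pretty-printed for readability) and that every entry carrying an `agent` field should be counted; the `summarize_logs` helper and the second agent name are hypothetical, not part of Hector.

```python
import json
from collections import defaultdict


def summarize_logs(lines):
    """Aggregate per-agent totals from JSON log lines (assumed one object per line)."""
    totals = defaultdict(lambda: {"entries": 0, "tokens_used": 0, "duration_ms": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON (e.g. text-format lines)
        if not isinstance(entry, dict) or "agent" not in entry:
            continue  # only entries that name an agent are counted
        stats = totals[entry["agent"]]
        stats["entries"] += 1
        stats["tokens_used"] += entry.get("tokens_used", 0)
        stats["duration_ms"] += entry.get("duration_ms", 0)
    return dict(totals)


# The first line mirrors the documented example; the rest are made-up samples.
sample_lines = [
    '{"time": "2026-01-20T18:00:00Z", "level": "INFO", "msg": "Agent invocation completed", '
    '"agent": "assistant", "session_id": "sess_123", "duration_ms": 1250, "tokens_used": 450}',
    '{"time": "2026-01-20T18:00:05Z", "level": "INFO", "msg": "Agent invocation completed", '
    '"agent": "assistant", "session_id": "sess_124", "duration_ms": 800, "tokens_used": 300}',
    '{"time": "2026-01-20T18:00:09Z", "level": "INFO", "msg": "Agent invocation completed", '
    '"agent": "researcher", "session_id": "sess_125", "duration_ms": 2000, "tokens_used": 900}',
    "this line is not JSON and is ignored",
]

summary = summarize_logs(sample_lines)
for agent, stats in sorted(summary.items()):
    print(agent, stats)
```

In practice the same function could be fed the file configured with `--log-file`, for example `summarize_logs(open("/var/log/hector.log"))`.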

Metrics (Prometheus)

Pass `--metrics` to expose Prometheus metrics at `/metrics`:

hector serve --metrics

Available Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `hector_requests_total` | Counter | Total HTTP requests |
| `hector_request_duration_seconds` | Histogram | Request latency |
| `hector_agent_invocations_total` | Counter | Agent executions |
| `hector_llm_tokens_total` | Counter | Token usage (prompt/completion) |
| `hector_tool_calls_total` | Counter | Tool invocations |
| `hector_scheduler_triggers_total` | Counter | Scheduled trigger firings |
| `hector_notifications_total` | Counter | Outbound webhook notifications |
| `hector_guardrail_violations_total` | Counter | Blocked inputs/outputs |

Labels

Common labels across metrics:

| Label | Description |
|-------|-------------|
| `app` | App/tenant name |
| `agent` | Agent name |
| `status` | Outcome (`success` or `error`) |
| `tool` | Tool name (for tool metrics) |
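To make the label model concrete, here is a minimal sketch that parses Prometheus text exposition output and sums a counter by one label. The sample payload is invented for illustration: which labels each metric actually carries, and the values `demo`, `researcher`, and `search`, are assumptions. The parser is deliberately simplified and is not a full implementation of the exposition format.

```python
import re
from collections import defaultdict

# metric_name{label="value",...} value   (labels are optional)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or # HELP / # TYPE metadata
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def sum_by(text, metric, label):
    """Sum all series of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical /metrics output; real label sets may differ.
SAMPLE = """\
# HELP hector_requests_total Total HTTP requests
# TYPE hector_requests_total counter
hector_requests_total{app="demo",agent="assistant",status="success"} 120
hector_requests_total{app="demo",agent="assistant",status="error"} 3
hector_requests_total{app="demo",agent="researcher",status="success"} 40
hector_tool_calls_total{app="demo",agent="assistant",tool="search",status="success"} 17
"""

by_status = sum_by(SAMPLE, "hector_requests_total", "status")
by_agent = sum_by(SAMPLE, "hector_requests_total", "agent")
print(by_status)
print(by_agent)
```

Grouping by a label in this way is what a PromQL `sum by (status) (...)` aggregation does on the server side.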

Prometheus Scrape Config

scrape_configs:
  - job_name: 'hector'
    static_configs:
      - targets: ['hector:8080']
    metrics_path: /metrics

Tracing (OpenTelemetry)

Enable distributed tracing by pointing Hector at an OpenTelemetry-compatible collector endpoint:

hector serve --tracing-endpoint "jaeger:4317"

Trace Spans

| Span | Description |
|------|-------------|
| `http.request` | Full request lifecycle |
| `agent.run` | Agent execution |
| `llm.generate` | LLM API call |
| `tool.call` | Tool invocation |
| `vectorstore.search` | RAG retrieval |
| `guardrail.check` | Input/output validation |

Configuration

| Flag | Env Variable | Default |
|------|--------------|---------|
| `--tracing-endpoint` | `HECTOR_TRACING_ENDPOINT` | `localhost:4317` |

Jaeger Setup

docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  jaegertracing/all-in-one:latest

hector serve --tracing-endpoint "localhost:4317"

Open the Jaeger UI at http://localhost:16686.

Grafana Dashboard

Add Prometheus as a Grafana data source and build panels from the `hector_*` metrics.

Key Panels

| Panel | Query |
|-------|-------|
| Request Rate | `rate(hector_requests_total[5m])` |
| Request Latency P95 | `histogram_quantile(0.95, sum by (le) (rate(hector_request_duration_seconds_bucket[5m])))` |
| Token Usage | `sum(rate(hector_llm_tokens_total[1h])) by (agent)` |
| Error Rate | `rate(hector_requests_total{status="error"}[5m])` |
| Tool Usage | `topk(10, sum by (tool) (hector_tool_calls_total))` |
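The P95 panel relies on `histogram_quantile`, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The sketch below reproduces that idea in plain Python to show why the result depends on bucket boundaries. It is a simplified approximation of the PromQL function, and the bucket bounds and counts are made up rather than Hector's actual configuration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    Simplified: assumes positive bucket bounds (latencies) and requires
    a final +Inf bucket plus at least one finite bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket and a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for index, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The rank falls beyond the last finite bound; report that bound.
        return buckets[-2][0]
    lower, previous = (0.0, 0.0) if index == 0 else buckets[index - 1]
    if count == previous:
        return lower  # empty bucket (only reachable when q is 0)
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - previous) / (count - previous)


# Hypothetical cumulative counts for buckets le=0.1, 0.5, 1.0, +Inf.
latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]

p50 = histogram_quantile(0.50, latency_buckets)
p95 = histogram_quantile(0.95, latency_buckets)
p99 = histogram_quantile(0.99, latency_buckets)
print(p50, p95, p99)
```

Because the estimate is interpolated, any quantile that lands in the `+Inf` bucket is reported as the highest finite bound, so bucket boundaries should cover the latencies you care about.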

Alerting Examples

# High error rate
- alert: HectorHighErrorRate
  expr: rate(hector_requests_total{status="error"}[5m]) > 0.1
  for: 5m
  labels:
    severity: warning

# Token budget exhaustion
- alert: HectorHighTokenUsage
  expr: sum(increase(hector_llm_tokens_total[1h])) > 100000
  for: 1h
  labels:
    severity: warning
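The `for` field means the expression must stay true across consecutive evaluations for that long before the alert fires; until then it is only pending. A small sketch of that state machine, assuming one evaluation per minute and invented error-rate values (the `alert_states` helper is hypothetical and only mimics the behaviour):

```python
def alert_states(samples, threshold, for_seconds):
    """Return the alert state at each evaluation.

    samples: list of (timestamp_seconds, value) pairs in time order.
    """
    states = []
    active_since = None  # timestamp when the condition first became true
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp
            held_for = timestamp - active_since
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            active_since = None  # condition cleared, so the timer resets
            states.append("inactive")
    return states


# Hypothetical error rates (errors/second), evaluated every 60 seconds.
values = [0.05, 0.2, 0.3, 0.2, 0.2, 0.2, 0.2, 0.05]
samples = [(i * 60, v) for i, v in enumerate(values)]

# Mirrors HectorHighErrorRate: threshold 0.1, for: 5m.
states = alert_states(samples, threshold=0.1, for_seconds=300)
print(states)
```

A single evaluation below the threshold resets the timer, which is why a long `for` combined with a long range window can delay an alert considerably.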

Health Endpoint

curl http://localhost:8080/health

Response:

{
  "status": "ok",
  "version": "v1.20.0",
  "database": "connected"
}
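A probe script can turn this response into a pass/fail result. The sketch below assumes that `"status": "ok"` together with `"database": "connected"` means healthy and treats every other value as unhealthy; the other values these fields may take are not documented here, so that rule is an assumption, and `is_healthy` is a hypothetical helper.

```python
import json


def is_healthy(body):
    """Return True only for a well-formed response reporting ok + connected."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # not JSON at all
    if not isinstance(payload, dict):
        return False
    # Assumption: any other status or database value counts as unhealthy.
    return payload.get("status") == "ok" and payload.get("database") == "connected"


# Body copied from the documented example response.
example_body = '{"status": "ok", "version": "v1.20.0", "database": "connected"}'
print(is_healthy(example_body))
```

The response body could be fetched with `urllib.request.urlopen("http://localhost:8080/health").read().decode()` and passed to `is_healthy`.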
