# Knowledge Management
Hector provides a powerful, built-in system for Knowledge Management. You can connect your agents to external data sources (your codebase, docs, or database) without writing any glue code.
## Key Concepts
- Document Stores: Sources of data (files, APIs, SQL).
- Vector Stores: Where embeddings are saved (Chroma, Pinecone, etc.).
- Embedders: Models that convert text to vectors (OpenAI, Cohere).
- Context Strategy: How retrieved documents are injected into the agent's prompt.
## Minimal Configuration

To add RAG to an agent, you first define a Document Store and then attach it to the agent.

```yaml
# 1. Define where data comes from
document_stores:
  my_files:
    source:
      type: directory
      include: ["./docs/**/*.md", "./src/**/*.go"]

# 2. Attach it to an agent
agents:
  researcher:
    llm: claude
    document_stores: [my_files]
    include_context: true  # Auto-inject relevant chunks
```
By default, Hector uses a local embedded vector store (`chromem`) and a default embedder, so the above is all you need for a local setup.
## Advanced Configuration
For production, you'll want to configure specific providers.
### 1. Configure Embedder

```yaml
embedders:
  openai_v3:
    provider: openai
    model: text-embedding-3-small
    api_key: ${OPENAI_API_KEY}
```
### 2. Configure Vector Store

```yaml
vector_stores:
  prod_db:
    type: pinecone
    api_key: ${PINECONE_API_KEY}
    environment: us-east-1
    index_name: hector-index
```
### 3. Configure Document Store

Link the store to your specific embedder and vector DB.

```yaml
document_stores:
  company_wiki:
    source:
      type: directory
      include: ["./wiki/**/*"]
    embedder: openai_v3
    vector_store: prod_db
    chunking:
      strategy: recursive
      size: 512
      overlap: 50
    watch: true  # Auto-reindex on file changes
```
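The `chunking` settings split each document into overlapping pieces before embedding. A minimal character-based sketch of the size/overlap mechanics (an illustration only; Hector's actual `recursive` strategy also splits on separators such as paragraphs and sentences):

```python
def chunk_text(text: str, size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `size` characters,
    where consecutive chunks share `overlap` characters."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += size - overlap  # step forward, keeping the overlap
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of some duplicated storage.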
## Context Strategies
How does the agent use this knowledge?
### Automatic Injection (`include_context: true`)

Hector listens to the user's message, automatically queries the document store for relevant chunks, and injects them into the system prompt before the agent thinks.

```yaml
agents:
  helper:
    include_context: true
    include_context_limit: 5  # Max 5 chunks
```
### Tool-Based Search

If you set `document_stores: [...]` but `include_context: false`, the agent instead receives a search tool and can decide for itself when to look up information.

```yaml
agents:
  detective:
    include_context: false
    document_stores: [evidence_db]
    # Agent will see a 'search_evidence_db' tool
```
## MCP Document Parsing

Hector can use MCP servers to parse complex file types (PDFs, PPTs) before indexing.

```yaml
tools:
  docling:
    type: mcp
    command: npx
    args: ["-y", "@verikod/docling-mcp"]

document_stores:
  library:
    source:
      type: directory
      include: ["./books/**/*.pdf"]
    mcp_parsers:
      tool_names: [docling]
      extensions: [pdf]
```
## Incremental Indexing

Hector tracks indexed files via checksums to avoid re-indexing unchanged content.

```yaml
document_stores:
  code:
    source:
      type: directory
      include: ["./src/**/*.go"]
    incremental_indexing: true  # Skip unchanged files
    watch: true                 # Auto-reindex on changes
```
### How It Works
- On startup, load existing file checksums from checkpoints
- Compare current files against stored checksums
- Index only new or modified files
- Update checkpoints after successful indexing
This significantly reduces startup time for large document sets.
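The steps above can be sketched as follows (a simplified in-memory model, assuming SHA-256 checksums over file contents; the function name and data shapes are illustrative, not Hector's internals):

```python
import hashlib

def plan_incremental_index(files: dict[str, bytes],
                           checkpoints: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Return the files that need (re)indexing plus the updated checkpoints.
    `files` maps path -> content; `checkpoints` maps path -> previous sha256."""
    to_index, new_checkpoints = [], {}
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        new_checkpoints[path] = digest
        if checkpoints.get(path) != digest:  # new or modified file
            to_index.append(path)
    return to_index, new_checkpoints
```

On a clean start every file is indexed; on subsequent runs only paths whose checksum changed come back in the work list.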
## Advanced Retrieval
### HyDE (Hypothetical Document Embeddings)

Generate a hypothetical answer to improve query relevance:

```yaml
document_stores:
  knowledge:
    search:
      enable_hyde: true
      hyde_llm: claude  # LLM to generate hypothetical doc
```
HyDE works by:

1. Generating a hypothetical answer to the query
2. Embedding the hypothetical answer
3. Searching for similar real documents
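Those steps can be sketched end to end with toy stand-ins (`generate` plays the role of `hyde_llm`; `embed` and the cosine ranking stand in for the embedder and vector store):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hyde_search(query, generate, embed, docs, top_k=3):
    """Rank documents against an embedding of a hypothetical answer."""
    hypothetical = generate(query)  # 1. hypothetical answer (an LLM call in practice)
    qvec = embed(hypothetical)      # 2. embed the answer, not the raw query
    ranked = sorted(docs, key=lambda d: cosine(qvec, d["vector"]), reverse=True)
    return ranked[:top_k]           # 3. closest real documents
```

The intuition: a hypothetical answer usually looks more like the stored documents than a short question does, so its embedding lands closer to the right chunks.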
### Multi-Query Retrieval

Generate multiple query variations for broader coverage:

```yaml
document_stores:
  research:
    search:
      enable_multi_query: true
      multi_query_llm: claude
      multi_query_count: 3  # Generate 3 query variations
```
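The retrieval side of multi-query can be sketched as follows (assumed behavior: results from each variation are merged and deduplicated by document id; `rewrite` stands in for `multi_query_llm`, `search` for the vector store):

```python
def multi_query_search(query, rewrite, search, count=3):
    """Search the original query plus LLM-generated variations, merging results."""
    variants = [query] + rewrite(query, count)
    seen, merged = set(), []
    for q in variants:
        for doc in search(q):
            if doc["id"] not in seen:  # deduplicate across variants
                seen.add(doc["id"])
                merged.append(doc)
    return merged
```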
### Reranking

Use an LLM to reorder results by relevance:

```yaml
document_stores:
  docs:
    search:
      enable_rerank: true
      rerank_llm: claude
      rerank_max_results: 5  # Return top 5 after reranking
```
### Complete Search Configuration

```yaml
document_stores:
  production:
    search:
      top_k: 20       # Initial retrieval count
      threshold: 0.7  # Minimum similarity score
      enable_hyde: true
      hyde_llm: claude
      enable_rerank: true
      rerank_llm: gpt4
      rerank_max_results: 5
```
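Put together, these settings describe a retrieve → filter → rerank funnel. A minimal sketch of that flow (parameter names mirror the config, but `retrieve` and `score` are stand-ins for the vector store and `rerank_llm`, not Hector's internals):

```python
def search_pipeline(query, retrieve, score,
                    top_k=20, threshold=0.7, rerank_max_results=5):
    """Retrieve top_k candidates, drop low-similarity hits, rerank, truncate."""
    candidates = retrieve(query, top_k)  # initial vector retrieval
    kept = [d for d in candidates if d["similarity"] >= threshold]
    reranked = sorted(kept, key=lambda d: score(query, d), reverse=True)
    return reranked[:rerank_max_results]
```

Retrieving a generous `top_k` and then thresholding and reranking down to a few results trades a cheap wide net for an expensive precise final pass.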
## Choosing a Vector Store
| Provider | Type | Best For | Persistence |
|---|---|---|---|
| chromem | Embedded | Development, single-instance, small datasets (<100k docs) | File-based |
| Qdrant | External | Production, large datasets, filtering | Server |
| Pinecone | Cloud | Managed infrastructure, global scale | Cloud |
| Weaviate | External | Hybrid search (vector + keyword) | Server |
| Milvus | External | Kubernetes-native, high throughput | Server |
| Chroma | External | Python ecosystem integration | Server/file |
**Recommendations:**

- Starting out? Use `chromem` (default). Zero setup, works immediately.
- Production single-instance? `chromem` with file persistence is sufficient for most apps.
- Production multi-instance? Use `Qdrant` or `Pinecone` for shared vector storage across replicas.
- Need keyword + vector search? Use `Weaviate` for built-in hybrid search.
## Choosing an Embedder
| Provider | Models | Notes |
|---|---|---|
| OpenAI | `text-embedding-3-small`, `text-embedding-3-large` | Best quality/cost ratio. Recommended default. |
| Ollama | `nomic-embed-text`, `mxbai-embed-large`, others | Fully local, no API costs. Slower. |
| Cohere | `embed-english-v3.0`, `embed-multilingual-v3.0` | Strong multilingual support. |
**Mix Providers:** You can use different providers for LLMs and embeddings. A common pattern: Anthropic Claude for the agent, OpenAI for embeddings.
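A hypothetical configuration along those lines (the `llms:` block shape is assumed by analogy with the `embedders:` block shown earlier; the names and model ids are placeholders, not required values):

```yaml
llms:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5   # placeholder model id
    api_key: ${ANTHROPIC_API_KEY}

embedders:
  openai_v3:
    provider: openai
    model: text-embedding-3-small
    api_key: ${OPENAI_API_KEY}

agents:
  researcher:
    llm: claude                  # Anthropic handles reasoning
    document_stores: [my_files]  # store configured with embedder: openai_v3
```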