# Knowledge Management
Hector provides a powerful, built-in system for Knowledge Management. You can connect your agents to external data sources (your codebase, docs, or database) without writing any glue code.
## Key Concepts
- Document Stores: Sources of data (files, APIs, SQL).
- Vector Stores: Where embeddings are saved (Chroma, Pinecone, etc.).
- Embedders: Models that convert text to vectors (OpenAI, Cohere).
- Context Strategy: How retrieved documents are injected into the agent's prompt.
## Minimal Configuration

To add RAG to an agent, you first define a Document Store and then attach it to the agent.

```yaml
# 1. Define where data comes from
document_stores:
  my_files:
    source:
      type: directory
      include: ["./docs/**/*.md", "./src/**/*.go"]

# 2. Attach it to an agent
agents:
  researcher:
    llm: claude
    document_stores: [my_files]
    include_context: true  # Auto-inject relevant chunks
```
By default, Hector uses a local embedded vector store (`chromem`) and a default embedder, so the above is all you need for a local setup.
## Advanced Configuration
For production, you'll want to configure specific providers.
### 1. Configure Embedder

```yaml
embedders:
  openai_v3:
    provider: openai
    model: text-embedding-3-small
    api_key: ${OPENAI_API_KEY}
```
### 2. Configure Vector Store

```yaml
vector_stores:
  prod_db:
    type: pinecone
    api_key: ${PINECONE_API_KEY}
    environment: us-east-1
    index_name: hector-index
```
### 3. Configure Document Store

Link the store to your specific embedder and vector DB.

```yaml
document_stores:
  company_wiki:
    source:
      type: directory
      include: ["./wiki/**/*"]
    embedder: openai_v3
    vector_store: prod_db
    chunking:
      strategy: recursive
      size: 512
      overlap: 50
    watch: true  # Auto-reindex on file changes
```
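The `chunking` settings split each document into overlapping pieces before embedding. A minimal character-based sketch of the size/overlap mechanics (an illustration only; Hector's actual `recursive` strategy also splits on separators such as paragraphs and sentences):

```python
def chunk_text(text: str, size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `size` characters,
    where consecutive chunks share `overlap` characters."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += size - overlap  # step forward, keeping the overlap
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of some duplicated storage.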
## Context Strategies
How does the agent use this knowledge?
### Automatic Injection (`include_context: true`)

Hector listens to the user's message, automatically queries the document store for relevant chunks, and injects them into the system prompt before the agent thinks.

```yaml
agents:
  helper:
    include_context: true
    include_context_limit: 5  # Max 5 chunks
```
### Tool-Based Search

If you set `document_stores: [...]` but `include_context: false`, the agent instead receives a search tool and can decide for itself when to look up information.

```yaml
agents:
  detective:
    include_context: false
    document_stores: [evidence_db]
    # Agent will see a 'search_evidence_db' tool
```
## MCP Document Parsing

Hector can use MCP servers to parse complex file types (PDFs, PPTs) before indexing.

```yaml
tools:
  docling:
    type: mcp
    command: npx
    args: ["-y", "@verikod/docling-mcp"]

document_stores:
  library:
    source:
      type: directory
      include: ["./books/**/*.pdf"]
    mcp_parsers:
      tool_names: [docling]
      extensions: [pdf]
```
## Incremental Indexing

Hector tracks indexed files via checksums to avoid re-indexing unchanged content.

```yaml
document_stores:
  code:
    source:
      type: directory
      include: ["./src/**/*.go"]
    incremental_indexing: true  # Skip unchanged files
    watch: true                 # Auto-reindex on changes
```
### How It Works
- On startup, load existing file checksums from checkpoints
- Compare current files against stored checksums
- Index only new or modified files
- Update checkpoints after successful indexing
This significantly reduces startup time for large document sets.
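The steps above can be sketched as follows (a simplified in-memory model, assuming SHA-256 checksums over file contents; the function name and data shapes are illustrative, not Hector's internals):

```python
import hashlib

def plan_incremental_index(files: dict[str, bytes],
                           checkpoints: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Return the files that need (re)indexing plus the updated checkpoints.
    `files` maps path -> content; `checkpoints` maps path -> previous sha256."""
    to_index, new_checkpoints = [], {}
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        new_checkpoints[path] = digest
        if checkpoints.get(path) != digest:  # new or modified file
            to_index.append(path)
    return to_index, new_checkpoints
```

On a clean start every file is indexed; on subsequent runs only paths whose checksum changed come back in the work list.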
## Advanced Retrieval
### HyDE (Hypothetical Document Embeddings)

Generate a hypothetical answer to improve query relevance:

```yaml
document_stores:
  knowledge:
    search:
      enable_hyde: true
      hyde_llm: claude  # LLM to generate hypothetical doc
```
HyDE works by:

1. Generating a hypothetical answer to the query
2. Embedding the hypothetical answer
3. Searching for similar real documents
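Those steps can be sketched end to end with toy stand-ins (`generate` plays the role of `hyde_llm`; `embed` and the cosine ranking stand in for the embedder and vector store):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hyde_search(query, generate, embed, docs, top_k=3):
    """Rank documents against an embedding of a hypothetical answer."""
    hypothetical = generate(query)  # 1. hypothetical answer (an LLM call in practice)
    qvec = embed(hypothetical)      # 2. embed the answer, not the raw query
    ranked = sorted(docs, key=lambda d: cosine(qvec, d["vector"]), reverse=True)
    return ranked[:top_k]           # 3. closest real documents
```

The intuition: a hypothetical answer usually looks more like the stored documents than a short question does, so its embedding lands closer to the right chunks.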
### Multi-Query Retrieval

Generate multiple query variations for broader coverage:

```yaml
document_stores:
  research:
    search:
      enable_multi_query: true
      multi_query_llm: claude
      multi_query_count: 3  # Generate 3 query variations
```
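The retrieval side of multi-query can be sketched as follows (assumed behavior: results from each variation are merged and deduplicated by document id; `rewrite` stands in for `multi_query_llm`, `search` for the vector store):

```python
def multi_query_search(query, rewrite, search, count=3):
    """Search the original query plus LLM-generated variations, merging results."""
    variants = [query] + rewrite(query, count)
    seen, merged = set(), []
    for q in variants:
        for doc in search(q):
            if doc["id"] not in seen:  # deduplicate across variants
                seen.add(doc["id"])
                merged.append(doc)
    return merged
```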
### Reranking

Use an LLM to reorder results by relevance:

```yaml
document_stores:
  docs:
    search:
      enable_rerank: true
      rerank_llm: claude
      rerank_max_results: 5  # Return top 5 after reranking
```
### Complete Search Configuration

```yaml
document_stores:
  production:
    search:
      top_k: 20       # Initial retrieval count
      threshold: 0.7  # Minimum similarity score
      enable_hyde: true
      hyde_llm: claude
      enable_rerank: true
      rerank_llm: gpt4
      rerank_max_results: 5
```
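Put together, these settings describe a retrieve → filter → rerank funnel. A minimal sketch of that flow (parameter names mirror the config, but `retrieve` and `score` are stand-ins for the vector store and `rerank_llm`, not Hector's internals):

```python
def search_pipeline(query, retrieve, score,
                    top_k=20, threshold=0.7, rerank_max_results=5):
    """Retrieve top_k candidates, drop low-similarity hits, rerank, truncate."""
    candidates = retrieve(query, top_k)  # initial vector retrieval
    kept = [d for d in candidates if d["similarity"] >= threshold]
    reranked = sorted(kept, key=lambda d: score(query, d), reverse=True)
    return reranked[:rerank_max_results]
```

Retrieving a generous `top_k` and then thresholding and reranking down to a few results trades a cheap wide net for an expensive precise final pass.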
## Choosing a Vector Store
| Provider | Type | Best For | Persistence |
|---|---|---|---|
| chromem | Embedded | Development, single-instance, small datasets (<100k docs) | File-based |
| Qdrant | External | Production, large datasets, filtering | Server |
| Pinecone | Cloud | Managed infrastructure, global scale | Cloud |
| Weaviate | External | Hybrid search (vector + keyword) | Server |
| Milvus | External | Kubernetes-native, high throughput | Server |
| Chroma | External | Python ecosystem integration | Server/file |
**Recommendations:**

- Starting out? Use `chromem` (default). Zero setup, works immediately.
- Production single-instance? `chromem` with file persistence is sufficient for most apps.
- Production multi-instance? Use `Qdrant` or `Pinecone` for shared vector storage across replicas.
- Need keyword + vector search? Use `Weaviate` for built-in hybrid search.
## Choosing an Embedder
| Provider | Models | Notes |
|---|---|---|
| OpenAI | `text-embedding-3-small`, `text-embedding-3-large` | Best quality/cost ratio. Recommended default. |
| Ollama | `nomic-embed-text`, `mxbai-embed-large`, others | Fully local, no API costs. Slower. |
| Cohere | `embed-english-v3.0`, `embed-multilingual-v3.0` | Strong multilingual support. |
**Mix Providers:** You can use different providers for LLMs and embeddings. A common pattern: Anthropic Claude for the agent, OpenAI for embeddings.
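A hypothetical configuration along those lines (the `llms:` block shape is assumed by analogy with the `embedders:` block shown earlier; the names and model ids are placeholders, not required values):

```yaml
llms:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5   # placeholder model id
    api_key: ${ANTHROPIC_API_KEY}

embedders:
  openai_v3:
    provider: openai
    model: text-embedding-3-small
    api_key: ${OPENAI_API_KEY}

agents:
  researcher:
    llm: claude                  # Anthropic handles reasoning
    document_stores: [my_files]  # store configured with embedder: openai_v3
```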