RAG & Semantic Search¶
RAG (Retrieval-Augmented Generation) gives agents the ability to search through documents semantically—finding information by meaning, not just keywords.
What is RAG?¶
Traditional search: "Find files containing 'authentication'"
Semantic search: "Find code related to user login"
RAG allows agents to:
- Search codebases by meaning
- Find relevant documentation
- Discover similar patterns
- Answer questions from knowledge bases
Prerequisites¶
RAG requires two components:
- Vector Database - Stores document embeddings (Qdrant)
- Embedder - Converts text to vectors (Ollama)
Quick Setup¶
1. Start Qdrant¶
docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant
Verify: http://localhost:6333/dashboard
2. Start Ollama¶
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull embedding model
ollama pull nomic-embed-text
3. Configure Hector¶
# Vector store
vector_stores:
  qdrant:
    type: "qdrant"
    host: "localhost"
    port: 6334

# Embedder
embedders:
  embedder:
    type: "ollama"
    host: "http://localhost:11434"
    model: "nomic-embed-text"

# Agent with semantic search
agents:
  coder:
    vector_store: "qdrant"
    embedder: "embedder"
    tools: ["search"]
    document_stores:
      - name: "codebase"
        source: "directory"
        path: "./src/"
4. Test It¶
hector call coder "How does authentication work in this codebase?"
The agent will semantically search your code and answer!
Document Stores¶
Document stores define what gets indexed for search. Hector supports three source types:
- Directory - Index files from local filesystem
- SQL - Index data from SQL databases (PostgreSQL, MySQL, SQLite)
- API - Index data from REST API endpoints
Document Parsing¶
Hector includes native parsers for common formats:

- ✅ PDF - Basic text extraction
- ✅ DOCX - Word document parsing
- ✅ XLSX - Excel spreadsheet parsing

For advanced parsing needs, you can use MCP (Model Context Protocol) tools (e.g., Docling) for:

- Better PDF parsing (layout detection, table extraction, OCR)
- Additional formats (PPTX, HTML, audio, images)
- Enhanced metadata extraction
Basic Configuration (Directory Source)¶
vector_stores:
  qdrant:
    type: "qdrant"
    host: "localhost"
    port: 6334

agents:
  assistant:
    vector_store: "qdrant"  # Reference to vector_stores section
    embedder: "embedder"
    document_stores:
      - name: "docs"
        source: "directory"  # Optional: defaults to "directory"
        path: "./documentation/"
        # Note: Defaults to parseable file types (text + .pdf/.docx/.xlsx)
        # To restrict further: include_patterns: ["*.md", "*.txt"]
SQL Database Source¶
Index content from SQL databases:
document_stores:
  database_content:
    name: "database_content"
    source: "sql"
    sql:
      driver: "sqlite3"  # "postgres", "mysql", or "sqlite3"
      database: "./data/content.db"
    sql_tables:
      - table: "articles"
        columns: ["title", "content"]
        id_column: "id"
        updated_column: "updated_at"
        metadata_columns: ["author", "category"]
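Conceptually, each row of a configured table becomes one document: the listed columns are joined into the text that gets chunked and embedded, while id_column and metadata_columns ride along as the document's identity and metadata. A self-contained Python sketch of that mapping (an illustration of the idea, not Hector's implementation), using an in-memory SQLite table as a stand-in for `./data/content.db`:

```python
import sqlite3

# In-memory stand-in for ./data/content.db
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY,
        title TEXT, content TEXT,
        author TEXT, category TEXT,
        updated_at TEXT
    )
""")
conn.execute(
    "INSERT INTO articles VALUES (1, 'Login flow', 'Sessions use JWT.', 'ana', 'auth', '2024-01-01')"
)

def rows_to_documents(conn, table, columns, id_column, metadata_columns):
    """Turn each table row into a document dict ready for chunking + embedding."""
    cols = [id_column] + columns + metadata_columns
    cursor = conn.execute(f"SELECT {', '.join(cols)} FROM {table}")
    docs = []
    for row in cursor:
        record = dict(zip(cols, row))
        docs.append({
            "id": f"{table}:{record[id_column]}",            # stable document ID
            "text": "\n".join(str(record[c]) for c in columns),  # text to embed
            "metadata": {c: record[c] for c in metadata_columns},
        })
    return docs

docs = rows_to_documents(conn, "articles", ["title", "content"], "id", ["author", "category"])
print(docs[0]["id"])  # articles:1
```

The updated_column from the configuration would be used for incremental re-indexing (only rows changed since the last run), which this sketch omits.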
REST API Source¶
Index content from REST API endpoints:
document_stores:
  api_content:
    name: "api_content"
    source: "api"
    api:
      base_url: "https://api.example.com"
      auth:
        type: "bearer"
        token: "${API_TOKEN}"
      endpoints:
        - path: "/articles"
          id_field: "id"
          content_field: "title,content"
          metadata_fields: ["author", "published_at"]
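The API source applies the same mapping to JSON records: id_field identifies each record, the comma-separated content_field lists the fields joined into the embedded text, and metadata_fields are stored alongside. A small Python sketch of that transformation on a sample record (illustrative only; the record shape is an assumption, not a real API response):

```python
def record_to_document(record: dict, id_field: str, content_field: str,
                       metadata_fields: list[str]) -> dict:
    """Map one JSON record from an endpoint into an indexable document.

    content_field is a comma-separated field list, matching the
    "title,content" syntax used in the configuration above.
    """
    content_parts = [str(record[f]) for f in content_field.split(",")]
    return {
        "id": str(record[id_field]),
        "text": "\n".join(content_parts),
        "metadata": {f: record[f] for f in metadata_fields},
    }

# Hypothetical record, as GET /articles might return it
record = {
    "id": 42,
    "title": "Rate limiting",
    "content": "Requests are throttled per API key.",
    "author": "li",
    "published_at": "2024-03-05",
}
doc = record_to_document(record, "id", "title,content", ["author", "published_at"])
print(doc["id"], doc["metadata"]["author"])  # 42 li
```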
Multiple Document Stores¶
agents:
  researcher:
    vector_store: "qdrant"
    embedder: "embedder"
    document_stores:
      - name: "codebase"
        source: "directory"
        path: "./src/"
        chunk_size: 512
      - name: "documentation"
        source: "directory"
        path: "./docs/"
        include_patterns: ["*.md"]
        chunk_size: 1024
      - name: "database_content"
        source: "sql"
        sql:
          driver: "sqlite3"
          database: "./data/content.db"
        sql_tables:
          - table: "articles"
            columns: ["title", "content"]
            id_column: "id"
        chunk_size: 800
      - name: "api_content"
        source: "api"
        api:
          base_url: "https://api.example.com"
          endpoints:
            - path: "/articles"
              id_field: "id"
              content_field: "title,content"
        chunk_size: 800
Configuration Options¶
document_stores:
  - name: "my_store"
    source: "directory"          # "directory", "sql", or "api"
    path: "./path1/"             # For directory source
    include_patterns: ["*.ext"]  # Optional: defaults to common text files + .pdf/.docx/.xlsx

    # Chunking
    chunk_size: 512              # Characters per chunk
    chunk_overlap: 50            # Overlap between chunks

    # Parsing
    parser: "native"             # native|custom|plugin

    # Indexing
    collection: "my_collection"  # Qdrant collection name
    batch_size: 100              # Documents per batch

    # Filtering (directory source only)
    exclude_patterns: ["*_test.go", "*.min.js"]
How RAG Works¶
Indexing Phase¶
1. Hector reads documents from paths
├─ ./src/auth.go
├─ ./src/user.go
└─ ./src/db.go
2. Documents split into chunks
├─ Chunk 1: "package auth..."
├─ Chunk 2: "func Login..."
└─ Chunk 3: "func Validate..."
3. Each chunk converted to embedding
├─ [0.23, -0.45, 0.67, ...] (768 dimensions)
├─ [0.12, -0.34, 0.56, ...]
└─ [-0.45, 0.23, 0.78, ...]
4. Embeddings stored in Qdrant
├─ Collection: "codebase"
└─ Indexed for fast similarity search
Search Phase¶
1. User asks: "How does authentication work?"
2. Query embedded: [0.25, -0.43, 0.69, ...]
3. Vector database finds similar chunks (cosine similarity)
├─ auth.go chunk (similarity: 0.92)
├─ user.go chunk (similarity: 0.85)
└─ db.go chunk (similarity: 0.78)
4. Top chunks injected into agent context
5. Agent answers using retrieved context
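Under the hood, the search phase is mostly vector math: embed the query, score every stored chunk by cosine similarity, filter by a threshold, and keep the top k. A self-contained Python sketch of that ranking step (illustrative only; in Hector the vector database performs this search at scale):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=3, threshold=0.5):
    """Rank stored chunks against the query embedding.

    `index` is a list of (chunk_text, embedding) pairs. Results
    below `threshold` are dropped, mirroring the threshold option
    in Hector's search configuration.
    """
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index]
    scored = [(t, s) for t, s in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional "embeddings" (real models produce 768+ dimensions)
index = [
    ("auth.go chunk", [0.9, 0.1, 0.0]),
    ("user.go chunk", [0.7, 0.3, 0.1]),
    ("db.go chunk",   [0.1, 0.9, 0.2]),
]
results = search([1.0, 0.2, 0.0], index, top_k=2)
print([text for text, score in results])  # ['auth.go chunk', 'user.go chunk']
```

The db.go chunk scores below the 0.5 threshold here and is filtered out before the top-k cut, which is exactly how a too-high threshold can make "Search returns no results" happen.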
Vector Databases¶
Hector supports multiple vector databases. Choose based on your needs:
| Database | Hybrid Search | Best For |
|---|---|---|
| Qdrant | ✅ (RRF) | Self-hosted, production-ready |
| Pinecone | ✅ (RRF) | Managed cloud service |
| Weaviate | ✅ (Native) | Native hybrid search, GraphQL API |
| Milvus | ✅ (RRF) | Large-scale, high-performance |
| Chroma | ✅ (RRF) | Simple, lightweight, embedded |
Qdrant (Recommended for Self-Hosted)¶
vector_stores:
  qdrant:
    type: "qdrant"
    host: "localhost"
    port: 6334         # gRPC port (default: 6334)
    api_key: ""        # Optional for Qdrant Cloud
    enable_tls: false  # Enable for cloud
Docker:
docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -v qdrant_data:/qdrant/storage \
  qdrant/qdrant
Qdrant Cloud:
vector_stores:
  qdrant_cloud:
    type: "qdrant"
    host: "your-cluster.qdrant.io"
    port: 6334
    api_key: "${QDRANT_API_KEY}"
    enable_tls: true
Pinecone (Managed Cloud)¶
vector_stores:
  pinecone:
    type: "pinecone"
    api_key: "${PINECONE_API_KEY}"
    environment: "us-east-1"  # Your Pinecone environment
Setup:

1. Create an account at pinecone.io
2. Create an index
3. Get your API key and environment
4. Configure in Hector
Weaviate (Native Hybrid Search)¶
vector_stores:
  weaviate:
    type: "weaviate"
    host: "localhost"
    port: 8080         # Default Weaviate port
    api_key: ""        # Optional API key
    enable_tls: false
Docker:
docker run -d \
--name weaviate \
-p 8080:8080 \
-e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
-v weaviate_data:/var/lib/weaviate \
semitechnologies/weaviate:latest
Features:

- Native hybrid search support (no fallback needed)
- GraphQL API
- Built-in vectorization options
Milvus (High-Performance)¶
vector_stores:
  milvus:
    type: "milvus"
    host: "localhost"
    port: 19530        # Default Milvus port
    api_key: ""        # Optional
    enable_tls: false
Docker (Standalone):
docker run -d \
--name milvus-standalone \
-p 19530:19530 \
-p 9091:9091 \
milvusdb/milvus:latest
Features:

- Optimized for large-scale deployments
- High-performance vector search
- Supports distributed deployments
Chroma (Lightweight)¶
vector_stores:
  chroma:
    type: "chroma"
    host: "localhost"
    port: 8000         # Default Chroma port
    api_key: ""        # Optional
    enable_tls: false
Docker:
docker run -d \
--name chroma \
-p 8000:8000 \
chromadb/chroma:latest
Features:

- Simple and lightweight
- Good for development and small deployments
- Easy to set up
Custom Vector Databases (Plugins)¶
plugins:
  databases:
    - name: "my-vector-db"
      protocol: "grpc"
      path: "/path/to/plugin"

vector_stores:
  custom:
    type: "plugin:my-vector-db"
    # Custom configuration
Embedders¶
Ollama (Recommended)¶
embedders:
  embedder:
    type: "ollama"
    host: "http://localhost:11434"
    model: "nomic-embed-text"  # Best for code
    timeout: 30
Available Models:
- nomic-embed-text - General purpose, 768 dimensions (recommended)
- all-minilm - Lightweight, 384 dimensions
- mxbai-embed-large - Large, 1024 dimensions
Setup:
ollama pull nomic-embed-text
Custom Embedders (Plugins)¶
plugins:
  embedders:
    - name: "my-embedder"
      protocol: "grpc"
      path: "/path/to/plugin"

embedders:
  custom:
    type: "plugin:my-embedder"
    # Custom configuration
Advanced Configuration¶
Chunking Strategy¶
Balance between context and precision:
# Small chunks (precise, less context)
document_stores:
  - name: "precise"
    chunk_size: 256
    chunk_overlap: 25
    # Good for: Code snippets, specific facts

# Medium chunks (balanced)
document_stores:
  - name: "balanced"
    chunk_size: 512
    chunk_overlap: 50
    # Good for: General purpose

# Large chunks (more context, less precise)
document_stores:
  - name: "contextual"
    chunk_size: 2048
    chunk_overlap: 200
    # Good for: Documentation, narratives
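The trade-off above comes down to how a sliding window walks the text. Here is a minimal character-based chunker in Python; it illustrates what chunk_size and chunk_overlap control, though Hector's actual splitter may respect token or sentence boundaries:

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size chunks whose tails overlap.

    Each new chunk starts chunk_size - chunk_overlap characters
    after the previous one, so neighbouring chunks share context
    across the boundary.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# 1000 characters of repeating alphabet as a stand-in document
doc = "".join(chr(65 + i % 26) for i in range(1000))
chunks = chunk_text(doc, chunk_size=512, chunk_overlap=50)
print(len(chunks), [len(c) for c in chunks])  # 3 [512, 512, 76]
```

The last 50 characters of each chunk reappear at the start of the next one, which is what keeps a sentence split across a boundary retrievable from either side.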
Search Configuration¶
Configure how documents are searched and ranked:
agents:
  searcher:
    vector_store: "qdrant"
    embedder: "embedder"
    document_stores:
      - name: "docs"
        paths: ["./"]
    search:
      top_k: 10              # Default number of results (default: 10)
      threshold: 0.5         # Minimum similarity score 0.0-1.0 (default: 0.5)
      preserve_case: true    # Don't lowercase queries (default: true)
      search_mode: "hybrid"  # "vector", "hybrid", "keyword", "multi_query", or "hyde"
      hybrid_alpha: 0.5      # Blending factor for hybrid search (0.0-1.0)

      # Optional: LLM-based re-ranking
      rerank:
        enabled: true
        llm: "gpt-4o-mini"
        max_results: 20

      # Optional: Multi-query expansion
      multi_query:
        enabled: false
        llm: "gpt-4o-mini"
        num_variations: 3
Basic Parameters:
- top_k: Maximum number of results to return (default: 10)
- threshold: Minimum cosine similarity score (0.0-1.0). Results below this are filtered out.
- preserve_case: Whether to preserve query case (useful for code search)
Advanced Search Modes:
- search_mode: "vector" - Pure semantic search (default, fastest)
- search_mode: "hybrid" - Combines keyword and vector search (better recall)
- search_mode: "keyword" - Keyword-focused search
- search_mode: "multi_query" - Expands query into multiple variations (improves recall)
- search_mode: "hyde" - Uses hypothetical document embeddings
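One way to think about hybrid_alpha is as a linear blend of the two relevance signals. The sketch below assumes simple weighted-sum blending; the comparison table earlier notes that several stores use RRF instead, so treat this as a mental model for the parameter rather than Hector's exact formula:

```python
def hybrid_score(vector_score: float, keyword_score: float, alpha: float) -> float:
    """Blend semantic and keyword relevance.

    alpha = 1.0 -> pure vector (semantic) search
    alpha = 0.0 -> pure keyword search
    """
    return alpha * vector_score + (1 - alpha) * keyword_score

# A chunk that matches the query terms exactly but is semantically
# mediocre, vs. one that is semantically close but shares no keywords:
exact_match = hybrid_score(vector_score=0.55, keyword_score=0.95, alpha=0.5)
semantic_match = hybrid_score(vector_score=0.90, keyword_score=0.10, alpha=0.5)
print(exact_match, semantic_match)  # 0.75 0.5
```

At alpha 0.5 the keyword-heavy chunk wins; raising alpha toward 1.0 flips the ranking toward the semantic match, which is why the code-search example later uses 0.6 to "favor vector but include keywords".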
Re-ranking:
- rerank.enabled: true - Enable LLM-based re-ranking for better result quality
- Requires an LLM provider configured
- Only reranks top N results (configurable via max_results)
See Search Architecture for complete documentation on all search features.
MCP Document Parsing¶
Use MCP (Model Context Protocol) tools for advanced document parsing. This allows you to use services like Docling for better parsing quality or additional formats.
Prerequisites:
1. Configure MCP tools in your tools section
2. Start your MCP server (e.g., Docling MCP server)
Quick Start (Shortcut):
Use the mcp_parser_tool shortcut with docs_folder to auto-configure MCP parsers:
agents:
  assistant:
    docs_folder: "./documents"
    mcp_parser_tool: "convert_document_into_docling_document"  # Docling tool name
Or via CLI:
hector serve --docs-folder ./documents --mcp-parser-tool "convert_document_into_docling_document"
Note: Tool names vary by MCP server. For Docling, use convert_document_into_docling_document. Check available tools if you see a warning message.
Basic Configuration (Explicit):
tools:
  mcp_tools:
    - server:
        url: "http://localhost:3000"
        protocol: "mcp"
      tools:
        - "convert_document_into_docling_document"

document_stores:
  knowledge_base:
    path: "./documents"
    mcp_parsers:
      tool_names: ["convert_document_into_docling_document"]
Advanced Configuration:
document_stores:
  research_papers:
    path: "./papers"
    mcp_parsers:
      tool_names: ["convert_document_into_docling_document"]  # Docling tool name
      extensions: [".pdf", ".pptx", ".html"]  # Only these formats
      priority: 10          # Override native parsers
      prefer_native: false  # Use MCP first
Configuration Options:

| Option | Type | Default | Description |
|---|---|---|---|
| `tool_names` | `[]string` | Required | MCP tool names to try (in order) |
| `extensions` | `[]string` | `[]` (all) | File extensions to handle (empty = all binary files) |
| `priority` | `int` | `8` | Extractor priority (higher = preferred) |
| `prefer_native` | `bool` | `false` | Use native parsers first, MCP as fallback |
Use Cases:

1. Override Native Parsers (better quality):

   mcp_parsers:
     tool_names: ["parse_document"]
     priority: 10  # Higher than native (5)

2. Use as Fallback (when native fails):

   mcp_parsers:
     tool_names: ["parse_document"]
     prefer_native: true
     priority: 4  # Lower than native (5)

3. Format-Specific (unsupported formats):

   mcp_parsers:
     tool_names: ["parse_document"]
     extensions: [".pptx", ".html"]  # Formats not supported natively
Benefits:

- ✅ Better PDF parsing (layout, tables, OCR)
- ✅ Additional formats (PPTX, HTML, audio, images)
- ✅ Enhanced metadata extraction
- ✅ Works with any MCP service (not just Docling)
Custom Parsers¶
Parse non-standard formats:
plugins:
  parsers:
    - name: "pdf-parser"
      protocol: "grpc"
      path: "/path/to/parser"

document_stores:
  - name: "pdfs"
    paths: ["./documents/"]
    include_patterns: ["*.pdf"]
    parser: "plugin:pdf-parser"
Performance Optimization¶
Indexing Performance¶
document_stores:
  - name: "large_codebase"
    paths: ["./"]
    batch_size: 100         # Index 100 docs at a time
    parallel: true          # Parallel processing
    cache_embeddings: true  # Cache for faster re-indexing
Search Performance¶
agents:
  fast_search:
    document_stores:
      - name: "optimized"
        search_config:
          top_k: 3         # Fewer results = faster
          threshold: 0.85  # Higher threshold = fewer candidates
Resource Management¶
# Ollama configuration
embedders:
  embedder:
    type: "ollama"
    host: "http://localhost:11434"
    timeout: 30
    batch_size: 32  # Embed 32 chunks at once

# Vector store configuration
vector_stores:
  qdrant:
    type: "qdrant"
    host: "localhost"
    port: 6334
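batch_size simply bounds how many chunks go to the embedder per request, trading request overhead against memory. The batching loop amounts to slicing the chunk list into fixed-size groups, sketched here in Python (a generic illustration, not Hector's internals):

```python
from typing import Iterator

def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# 100 chunks embedded 32 at a time -> 4 requests instead of 100
chunks = [f"chunk-{i}" for i in range(100)]
batches = list(batched(chunks, batch_size=32))
print(len(batches), [len(b) for b in batches])  # 4 [32, 32, 32, 4]
```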
Use Cases¶
Code Search¶
agents:
  code_assistant:
    vector_store: "qdrant"
    embedder: "embedder"
    tools: ["search", "write_file"]
    search:
      search_mode: "hybrid"  # Better for code with specific terms
      hybrid_alpha: 0.6      # Favor vector but include keywords
      preserve_case: true    # Important for code identifiers
    document_stores:
      - name: "codebase"
        paths: ["./src/", "./lib/"]
        include_patterns: ["*.go", "*.py", "*.js", "*.ts"]
        chunk_size: 512
    prompt:
      system_role: |
        You are a code assistant. Use semantic search to find
        relevant code before answering questions or making changes.
Documentation Assistant¶
agents:
  docs_bot:
    vector_store: "qdrant"
    embedder: "embedder"
    document_stores:
      - name: "documentation"
        paths: ["./docs/"]
        include_patterns: ["*.md", "*.rst"]
        chunk_size: 1024
    prompt:
      system_role: |
        Answer questions based on the documentation.
        Always cite your sources.
Research Assistant¶
agents:
  researcher:
    vector_store: "qdrant"
    embedder: "embedder"
    document_stores:
      - name: "papers"
        paths: ["./research/"]
        include_patterns: ["*.pdf", "*.md"]
        chunk_size: 2048
      - name: "notes"
        paths: ["./notes/"]
        include_patterns: ["*.md"]
        chunk_size: 512
Monitoring & Debugging¶
Check Indexing Status¶
# View Qdrant dashboard (HTTP port 6333)
open http://localhost:6333/dashboard

# Check collection info
curl http://localhost:6333/collections/codebase
Debug Search Results¶
agents:
  debug:
    reasoning:
    document_stores:
      - name: "test"
        paths: ["./"]
        debug: true  # Log search results
Re-index Documents¶
# Delete collection and re-index
curl -X DELETE http://localhost:6333/collections/codebase
# Restart Hector to trigger re-indexing
hector serve --config config.yaml
Troubleshooting¶
"Qdrant connection failed"¶
# Check if Qdrant is running
docker ps | grep qdrant
# Check logs
docker logs qdrant
# Verify REST port
curl http://localhost:6333/
"Ollama not responding"¶
# Check if Ollama is running
ollama list
# Pull model if missing
ollama pull nomic-embed-text
# Check service
curl http://localhost:11434/api/tags
"Search returns no results"¶
- Check documents are indexed: View Qdrant dashboard
- Verify file patterns match your files
- Lower `threshold` in search config
- Check chunk sizes aren't too large
Best Practices¶
1. Choose the Right Chunk Size¶
# Code: Small chunks for precision
chunk_size: 512
# Docs: Medium chunks for balance
chunk_size: 1024
# Narratives: Large chunks for context
chunk_size: 2048
2. Use Appropriate Overlap¶
# Small chunks: 10-20% overlap
chunk_size: 256
chunk_overlap: 25
# Large chunks: 5-10% overlap
chunk_size: 2048
chunk_overlap: 200
3. Filter Irrelevant Files¶
Hector automatically excludes common files that shouldn't be indexed. You can also add custom exclusions.
Default Exclusions¶
Hector automatically skips:
Version Control:
**/.git/**, **/.svn/**, **/.hg/**, **/.bzr/**
Dependencies:
**/node_modules/**, **/vendor/**, **/venv/**, **/__pycache__/**
**/.npm/**, **/.yarn/**, **/gems/**, **/.bundle/**
Build Artifacts:
**/dist/**, **/build/**, **/target/**, **/.next/**, **/bin/**
**/.cache/**, **/.parcel-cache/**
IDE & Editor:
**/.vscode/**, **/.idea/**, **/.DS_Store
**/*.swp, **/*~
Binary Files:
*.exe, *.dll, *.so, *.pyc, *.o
*.png, *.jpg, *.mp4, *.mp3
*.zip, *.tar, *.gz
Logs & Temp:
*.log, *.tmp, *.cache
**/logs/**, **/tmp/**
Lock Files:
**/package-lock.json, **/yarn.lock
**/Gemfile.lock, **/Cargo.lock
Empty Files:
All files with 0 bytes are automatically skipped
See the full list: 112 default exclusions
Custom Exclusions¶
Add your own patterns:
document_stores:
  - name: "filtered_codebase"
    paths: ["./"]
    include_patterns: ["*.go", "*.py"]
    exclude_patterns: [
      # Custom exclusions (in addition to defaults)
      "*_test.go",         # Test files
      "**/*_mock.go",      # Mock files
      "**/testdata/**",    # Test data directories
      "*.generated.go",    # Generated code
      "**/experiments/**"  # Experimental code
    ]
Pattern Syntax¶
Hector supports flexible glob patterns:
# Directory patterns
"**/node_modules/**" # Any node_modules directory
"**/.git/**" # Any .git directory
# File extension patterns
"*.log" # All .log files
"*.pyc" # All .pyc files
# Specific files
"**/.DS_Store" # .DS_Store anywhere
"**/package-lock.json" # Lock files
# Combined patterns
"**/*.min.js" # Minified JS anywhere
"**/dist/*.map" # Source maps in dist
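You can sanity-check exclusion patterns locally before indexing. Python's fnmatch is only an approximation of a real glob matcher (its `*` also crosses `/` separators, so `**` behaves like `*`), but it is close enough for a quick experiment:

```python
from fnmatch import fnmatch

# Note: fnmatch's "*" crosses "/" separators, unlike strict glob
# matchers, so this is an approximation of Hector's pattern engine.
patterns = ["**/node_modules/**", "*.min.js", "**/*.min.js", "**/.DS_Store"]

def excluded(path: str) -> bool:
    """True if any exclusion pattern matches the given relative path."""
    return any(fnmatch(path, p) for p in patterns)

print(excluded("src/node_modules/lib/index.js"))  # True
print(excluded("dist/app.min.js"))                # True
print(excluded("src/main.go"))                    # False
```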
Override Defaults¶
To use ONLY your patterns (no defaults):
document_stores:
  - name: "minimal"
    paths: ["./"]
    # Explicitly set empty to disable defaults
    exclude_patterns: []
    # Now only your include_patterns apply
    include_patterns: ["*.md"]
⚠️ Warning: Disabling default exclusions may index binary files, node_modules, etc.
Performance Tips¶
DO:
- ✅ Exclude large directories (node_modules, vendor)
- ✅ Exclude binary files (images, videos, archives)
- ✅ Exclude build artifacts (dist, build)
- ✅ Use specific patterns (`*.test.js` rather than a broad `*test*`)
DON'T:
- ❌ Index empty files (auto-skipped)
- ❌ Index minified files (*.min.js)
- ❌ Index compiled files (*.pyc, *.o)
- ❌ Use overly broad patterns (**/*)
Example: Production Setup¶
document_stores:
  - name: "production_codebase"
    paths: ["./src/", "./lib/"]

    # Explicit inclusions
    include_patterns: [
      "*.go", "*.py", "*.js", "*.ts",
      "*.md", "*.yaml", "*.json"
    ]

    # Additional exclusions (on top of defaults)
    exclude_patterns: [
      # Project-specific
      "*_test.go",
      "**/*_mock.py",
      "**/fixtures/**",

      # Generated code
      "*.pb.go",
      "*.generated.*",

      # Docs we don't want
      "**/node_modules/**/@types/**",
      "**/vendor/github.com/**/testdata/**"
    ]

    # Skip large files
    max_file_size: 10485760  # 10MB
4. Organize by Type¶
document_stores:
  - name: "source_code"
    paths: ["./src/"]
    chunk_size: 512

  - name: "documentation"
    paths: ["./docs/"]
    chunk_size: 1024

  - name: "configs"
    paths: ["./config/"]
    chunk_size: 256
Next Steps¶
- Building Enterprise RAG Systems - Complete step-by-step guide
- Tools - Using the search tool
- Memory - Long-term memory with vectors
- Building a Coding Assistant - Full tutorial
Related Topics¶
- Agent Overview - Understanding agents
- Configuration Reference - All RAG options
- Architecture - How RAG works internally