
RAG (Retrieval-Augmented Generation)

RAG enhances agents with document search capabilities, enabling knowledge retrieval from your data sources.

Quick Start

Zero-Config RAG

hector serve \
  --model gpt-4o \
  --docs-folder ./documents \
  --tools

This automatically:

  • Creates an embedded vector database (chromem)
  • Configures an embedder (auto-detected from the LLM provider, with OpenAI or Ollama as fallback)
  • Indexes documents (including PDF, DOCX, and XLSX via native parsers)
  • Adds a search tool
  • Watches for file changes

Zero-Config with Auto-Context

For automatic context injection (no need for agents to call the search tool):

hector serve \
  --model gpt-4o \
  --docs-folder ./documents \
  --include-context

When --include-context is enabled:

  • Relevant documents are automatically retrieved based on user queries
  • Context is injected into the system prompt before LLM calls
  • The agent doesn't need to explicitly call the search tool
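The config-file equivalent sets include_context on the agent (see Auto-Injected Context below for the related limit options):

agents:
  assistant:
    llm: default
    document_stores: [docs]
    include_context: true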

Advanced Zero-Config Options

With external vector database:

hector serve \
  --model gpt-4o \
  --docs-folder ./documents \
  --vector-type qdrant \
  --vector-host localhost:6333 \
  --tools

With custom embedder:

hector serve \
  --model gpt-4o \
  --docs-folder ./documents \
  --embedder-provider ollama \
  --embedder-url http://localhost:11434 \
  --embedder-model nomic-embed-text \
  --tools
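Assuming a standard Ollama install serving at http://localhost:11434, pull the embedding model first if it isn't already available:

ollama pull nomic-embed-text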

With Docling for advanced document parsing:

hector serve \
  --model gpt-4o \
  --docs-folder ./documents \
  --mcp-url http://localhost:8000/mcp \
  --mcp-parser-tool convert_document_into_docling_document \
  --tools

With Docker path mapping (for containerized MCP services):

# Syntax: --docs-folder local_path:remote_path
hector serve \
  --model gpt-4o \
  --docs-folder ./documents:/docs \
  --mcp-url http://localhost:8000/mcp \
  --mcp-parser-tool convert_document_into_docling_document \
  --tools

The local:remote syntax maps local paths to container paths when using Docker-based MCP parsers like Docling.
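For example, with --docs-folder ./documents:/docs and a container started with -v $(pwd)/documents:/docs:ro, paths are translated like this (report.pdf is a hypothetical file name for illustration):

# Local file on disk:            ./documents/report.pdf
# Path sent to the MCP parser:   /docs/report.pdf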

Config File RAG

version: "2"

vector_stores:
  local:
    type: chromem
    persist_path: .hector/vectors
    compress: true

embedders:
  default:
    provider: openai
    model: text-embedding-3-small
    api_key: ${OPENAI_API_KEY}

document_stores:
  docs:
    source:
      type: directory
      path: ./documents
    vector_store: local
    embedder: default
    watch: true

agents:
  assistant:
    llm: default
    document_stores: [docs]
    tools: [search]

Vector Stores

Chromem (Embedded)

No external dependencies, persists to disk:

vector_stores:
  local:
    type: chromem
    persist_path: .hector/vectors
    compress: true  # Gzip compression

Qdrant

External vector database:

vector_stores:
  qdrant:
    type: qdrant
    host: localhost
    port: 6333
    api_key: ${QDRANT_API_KEY}
    enable_tls: true
    collection: hector_docs
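For local testing, Qdrant can be started from its official Docker image (a bare local instance runs without TLS or an API key, so adjust the config above accordingly):

docker run -p 6333:6333 qdrant/qdrant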

Pinecone

Cloud vector database:

vector_stores:
  pinecone:
    type: pinecone
    api_key: ${PINECONE_API_KEY}
    environment: us-east-1-aws
    index_name: hector-docs

Weaviate

vector_stores:
  weaviate:
    type: weaviate
    host: localhost
    port: 8080
    api_key: ${WEAVIATE_API_KEY}

Milvus

vector_stores:
  milvus:
    type: milvus
    host: localhost
    port: 19530

Embedders

OpenAI

embedders:
  openai:
    provider: openai
    model: text-embedding-3-small  # or text-embedding-3-large
    api_key: ${OPENAI_API_KEY}

Ollama (Local)

embedders:
  ollama:
    provider: ollama
    model: nomic-embed-text  # or mxbai-embed-large
    base_url: http://localhost:11434

Cohere

embedders:
  cohere:
    provider: cohere
    model: embed-english-v3.0
    api_key: ${COHERE_API_KEY}

Document Sources

Directory Source

Index files from a folder:

document_stores:
  docs:
    source:
      type: directory
      path: ./documents
      include:
        - "*.md"
        - "*.txt"
        - "*.pdf"
      exclude:
        - .git
        - node_modules
        - "*.tmp"
      max_file_size: 10485760  # 10MB

Default excludes: .*, node_modules, __pycache__, vendor, .git

SQL Source

Index data from a database:

document_stores:
  knowledge_base:
    source:
      type: sql
      sql:
        database: main  # References databases config
        query: SELECT id, title, content FROM articles
        id_column: id
        content_columns: [title, content]
        metadata_columns: [category, author]

API Source

Fetch documents from an API:

document_stores:
  external_docs:
    source:
      type: api
      api:
        url: https://api.example.com/documents
        method: GET
        headers:
          Authorization: Bearer ${API_TOKEN}
        response_path: documents
        id_field: id
        content_fields: [title, body]
        metadata_fields: [category, tags]
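Given this mapping (response_path: documents, with the listed id, content, and metadata fields), the endpoint is expected to return a payload shaped roughly like this, with illustrative values:

{
  "documents": [
    {
      "id": "doc-1",
      "title": "Getting started",
      "body": "Document text...",
      "category": "guides",
      "tags": ["intro"]
    }
  ]
}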

Collection Source

Use existing vector collection:

document_stores:
  existing:
    source:
      type: collection
      collection: pre_populated_collection
    vector_store: qdrant
    embedder: default

Chunking Strategies

Simple Chunking

Fixed-size chunks with overlap:

document_stores:
  docs:
    chunking:
      strategy: simple
      size: 1000      # Characters per chunk
      overlap: 200    # Overlap between chunks

Best for: General text, documentation

Semantic Chunking

Chunk by semantic boundaries:

document_stores:
  docs:
    chunking:
      strategy: semantic
      size: 1000
      overlap: 100

Best for: Natural language content

Sentence Chunking

Chunk by sentences:

document_stores:
  docs:
    chunking:
      strategy: sentence
      sentences_per_chunk: 5

Best for: Precise retrieval, Q&A

Search Configuration

document_stores:
  docs:
    search:
      top_k: 10           # Return top 10 results
      threshold: 0.5      # Minimum similarity score

With reranking:

document_stores:
  docs:
    search:
      top_k: 10
      threshold: 0.5
      rerank: true        # Enable reranking
      rerank_top_k: 3     # Return top 3 after reranking

Reranking improves relevance by rescoring initial results: vector search first retrieves the top_k (10) candidates above the threshold, the reranker rescores them against the query, and only the best rerank_top_k (3) are returned to the agent.

Watch Mode

Auto-reindex on file changes:

document_stores:
  docs:
    watch: true                  # Enable file watching
    incremental_indexing: true   # Only reindex changed files

When files change:

  • Added files: indexed immediately
  • Modified files: re-indexed
  • Deleted files: removed from index

Indexing Configuration

Control indexing behavior:

document_stores:
  docs:
    indexing:
      max_concurrent: 8        # Parallel workers
      retry:
        max_retries: 3         # Retry failed documents
        base_delay: 1s         # Delay between retries
        max_delay: 30s

Document Parsing

Hector supports multiple document parsers with automatic fallback:

Parser                        Priority   Formats                       When Used
MCP Parser (e.g., Docling)    8          PDF, DOCX, PPTX, XLSX, HTML   When --mcp-parser-tool is configured
Native Parsers                5          PDF, DOCX, XLSX               Built-in, always available
Text Extractor                1          Plain text, code files        Fallback for text-based files

Higher-priority parsers are tried first: a PDF goes to the MCP parser when one is configured, and falls back to the native parser otherwise.

Native Document Parsers

Hector includes built-in parsers for common document formats:

Supported formats:

  • PDF - Text extraction with page markers
  • DOCX - Word document content extraction
  • XLSX - Excel spreadsheet with cell references (max 1000 cells/sheet)

Native parsers work automatically for ~70% of documents. For complex layouts, tables, or scanned documents, use MCP parsers like Docling.

Default include patterns:

  • Text/code: *.md, *.txt, *.rst, *.go, *.py, *.js, *.ts, *.json, *.yaml, etc.
  • Binary documents: *.pdf, *.docx, *.xlsx

MCP Document Parsing

For advanced parsing (OCR, complex layouts, tables), use MCP tools like Docling:

tools:
  docling:
    type: mcp
    url: http://localhost:8000/mcp
    transport: streamable-http

document_stores:
  docs:
    source:
      type: directory
      path: ./documents
    mcp_parsers:
      tool_names:
        - convert_document_into_docling_document
      extensions:
        - .pdf
        - .docx
        - .pptx
        - .xlsx
      priority: 8  # Higher than native (5)
      path_prefix: "/docs"  # For Docker path mapping

Zero-config with Docling:

hector serve \
  --model gpt-4o \
  --docs-folder ./documents \
  --mcp-url http://localhost:8000/mcp \
  --mcp-parser-tool convert_document_into_docling_document

With Docker path mapping:

When Docling runs in Docker, use path mapping to remap local paths:

# Docker mount: -v $(pwd)/documents:/docs:ro
hector serve \
  --model gpt-4o \
  --docs-folder ./documents:/docs \
  --mcp-url http://localhost:8000/mcp \
  --mcp-parser-tool convert_document_into_docling_document

The local:remote syntax ensures paths are correctly translated for the container.

See Using Docling with Hector for a complete tutorial.

Agent Integration

The agent calls the search tool explicitly:

agents:
  assistant:
    llm: default
    document_stores: [docs]  # Access to docs store
    tools: [search]          # Search tool
    instruction: |
      Use the search tool to find relevant information.
      Always cite sources in your responses.

Auto-Injected Context

Automatically inject relevant context:

agents:
  assistant:
    llm: default
    document_stores: [docs]
    include_context: true            # Auto-inject
    include_context_limit: 5         # Max 5 documents
    include_context_max_length: 500  # Max 500 chars per doc

When enabled:

  • User message triggers search
  • Top K documents retrieved
  • Injected into system prompt
  • Agent receives context automatically

Scoped Access

Limit document store access per agent:

document_stores:
  internal_docs:
    source:
      type: directory
      path: ./internal

  public_docs:
    source:
      type: directory
      path: ./public

agents:
  # Public agent: public docs only
  public_assistant:
    document_stores: [public_docs]

  # Internal agent: all docs
  internal_assistant:
    document_stores: [internal_docs, public_docs]

  # No RAG access
  restricted:
    document_stores: []

Multi-Store Configuration

Configure multiple document stores:

vector_stores:
  chromem:
    type: chromem
    persist_path: .hector/vectors

embedders:
  default:
    provider: openai
    model: text-embedding-3-small
    api_key: ${OPENAI_API_KEY}

document_stores:
  codebase:
    source:
      type: directory
      path: ./src
      include: ["*.go", "*.ts", "*.py"]
    chunking:
      strategy: simple
      size: 1500
    vector_store: chromem
    embedder: default

  documentation:
    source:
      type: directory
      path: ./docs
      include: ["*.md"]
    chunking:
      strategy: simple
      size: 1000
    vector_store: chromem
    embedder: default

  knowledge_base:
    source:
      type: sql
      sql:
        database: main
        query: SELECT * FROM articles
        id_column: id
        content_columns: [title, content]
    vector_store: chromem
    embedder: default

agents:
  assistant:
    document_stores: [codebase, documentation, knowledge_base]
    tools: [search]

Examples

Documentation Assistant

vector_stores:
  local:
    type: chromem
    persist_path: .hector/vectors

embedders:
  default:
    provider: openai
    model: text-embedding-3-small
    api_key: ${OPENAI_API_KEY}

document_stores:
  docs:
    source:
      type: directory
      path: ./documentation
      include: ["*.md", "*.txt"]
    chunking:
      strategy: simple
      size: 1000
      overlap: 200
    vector_store: local
    embedder: default
    watch: true
    search:
      top_k: 5
      threshold: 0.6

agents:
  docs_assistant:
    llm: default
    document_stores: [docs]
    tools: [search]
    instruction: |
      You help users find information in documentation.
      Always search before answering questions.
      Cite document sources in responses.

Code Search Agent

document_stores:
  codebase:
    source:
      type: directory
      path: ./src
      include:
        - "*.go"
        - "*.ts"
        - "*.py"
        - "*.java"
    chunking:
      strategy: simple
      size: 1500  # Larger chunks for code
      overlap: 300
    vector_store: local
    embedder: default
    watch: true

agents:
  code_assistant:
    llm: default
    document_stores: [codebase]
    tools: [search, read_file]
    instruction: |
      You help developers understand the codebase.
      Use search to find relevant code.
      Use read_file to view complete files.

Multi-Source RAG

document_stores:
  docs:
    source:
      type: directory
      path: ./docs

  database:
    source:
      type: sql
      sql:
        database: main
        query: SELECT id, title, content FROM kb_articles
        id_column: id
        content_columns: [title, content]

  api:
    source:
      type: api
      api:
        url: https://api.example.com/docs
        response_path: data
        id_field: id
        content_fields: [content]

agents:
  research_assistant:
    document_stores: [docs, database, api]
    tools: [search]
    instruction: |
      Search across all available knowledge sources.
      Synthesize information from multiple sources.

Performance Optimization

Embedding Cache

Reuse embeddings for unchanged documents:

document_stores:
  docs:
    incremental_indexing: true  # Only reindex changed files

Parallel Indexing

Index multiple files concurrently:

document_stores:
  docs:
    indexing:
      max_concurrent: 16  # Increase for faster indexing; very high values may hit embedding-provider rate limits

Collection Persistence

Use persistent vector stores:

vector_stores:
  qdrant:
    type: qdrant
    host: localhost
    port: 6333
    collection: my_docs  # Persistent collection

Best Practices

Chunk Size

Choose appropriate chunk sizes:

# Small chunks (500-800): Precise retrieval, Q&A
chunking:
  size: 500

# Medium chunks (1000-1500): General purpose
chunking:
  size: 1000

# Large chunks (2000-3000): Context-rich, code
chunking:
  size: 2000

Overlap

Use overlap to preserve context across chunks:

chunking:
  size: 1000
  overlap: 200  # 20% overlap
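With size: 1000 and overlap: 200, each chunk starts size − overlap = 800 characters after the previous one, so consecutive chunks share 200 characters. A sketch of the resulting boundaries, assuming simple character-based chunking:

# chunk 1: characters 0–999
# chunk 2: characters 800–1799
# chunk 3: characters 1600–2599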

File Filters

Exclude non-content files:

source:
  exclude:
    - .git
    - node_modules
    - __pycache__
    - "*.log"
    - "*.tmp"

Search Threshold

Balance precision vs recall:

# High precision, may miss some results
search:
  threshold: 0.7

# Balanced
search:
  threshold: 0.5

# High recall, may include irrelevant results
search:
  threshold: 0.3

Next Steps