RAG & Semantic Search

RAG (Retrieval-Augmented Generation) lets agents search documents semantically, finding information by meaning rather than by keywords alone.

What is RAG?

Traditional search: "Find files containing 'authentication'"
Semantic search: "Find code related to user login"

RAG allows agents to:

  • Search codebases by meaning
  • Find relevant documentation
  • Discover similar patterns
  • Answer questions from knowledge bases


Prerequisites

RAG requires two components:

  1. Vector Database - Stores document embeddings (Qdrant)
  2. Embedder - Converts text to vectors (Ollama)

Quick Setup

1. Start Qdrant

docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -v "$(pwd)/qdrant_storage:/qdrant/storage" \
  qdrant/qdrant

Port 6333 serves the REST API and dashboard; port 6334 serves gRPC.

Verify: http://localhost:6333/dashboard

2. Start Ollama

# Install Ollama (Linux; on macOS, download the app from the Ollama website)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull embedding model
ollama pull nomic-embed-text

3. Configure Hector

# Vector database
databases:
  qdrant:
    type: "qdrant"
    host: "localhost"
    port: 6334

# Embedder
embedders:
  embedder:
    type: "ollama"
    host: "http://localhost:11434"
    model: "nomic-embed-text"

# Agent with semantic search
agents:
  coder:
    database: "qdrant"
    embedder: "embedder"
    tools: ["search"]
    document_stores:
      - name: "codebase"
        paths: ["./src/"]
        include_patterns: ["*.go", "*.py", "*.js"]

4. Test It

hector call coder "How does authentication work in this codebase?"

The agent searches your code semantically and answers using the chunks it retrieves.


Document Stores

Document stores define what gets indexed for search.

Basic Configuration

agents:
  assistant:
    database: "qdrant"
    embedder: "embedder"
    document_stores:
      - name: "docs"
        paths: ["./documentation/"]
        # Note: Defaults to parseable file types (text + .pdf/.docx/.xlsx)
        # To restrict further: include_patterns: ["*.md", "*.txt"]

Multiple Document Stores

agents:
  researcher:
    database: "qdrant"
    embedder: "embedder"
    document_stores:
      - name: "codebase"
        paths: ["./src/", "./lib/"]
        # Note: Defaults to common text files + .pdf/.docx/.xlsx
        # include_patterns: ["*.go", "*.py"]  # Optional: restrict to specific types
        chunk_size: 512

      - name: "documentation"
        paths: ["./docs/"]
        include_patterns: ["*.md"]
        chunk_size: 1024

      - name: "configs"
        paths: ["./configs/"]
        include_patterns: ["*.yaml", "*.json"]
        chunk_size: 256

Configuration Options

document_stores:
  - name: "my_store"
    paths: ["./path1/", "./path2/"]
    include_patterns: ["*.ext"]  # Optional: defaults to common text files + .pdf/.docx/.xlsx

    # Chunking
    chunk_size: 512           # Characters per chunk
    chunk_overlap: 50         # Overlap between chunks

    # Parsing
    parser: "native"          # native|custom|plugin

    # Indexing
    collection: "my_collection"  # Qdrant collection name
    batch_size: 100           # Documents per batch

    # Filtering
    exclude_patterns: ["*_test.go", "*.min.js"]

How RAG Works

Indexing Phase

1. Hector reads documents from paths
   ├─ ./src/auth.go
   ├─ ./src/user.go
   └─ ./src/db.go

2. Documents split into chunks
   ├─ Chunk 1: "package auth..."
   ├─ Chunk 2: "func Login..."
   └─ Chunk 3: "func Validate..."

3. Each chunk converted to embedding
   ├─ [0.23, -0.45, 0.67, ...] (768 dimensions)
   ├─ [0.12, -0.34, 0.56, ...]
   └─ [-0.45, 0.23, 0.78, ...]

4. Embeddings stored in Qdrant
   ├─ Collection: "codebase"
   └─ Indexed for fast similarity search
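
The chunking in step 2 can be pictured as a sliding window over the text. The sketch below is a minimal illustration, assuming character-based chunks and the chunk_size and chunk_overlap meanings from the configuration options. chunk_text is a hypothetical helper, not Hector's actual splitter, which may break text on different boundaries.

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each window starts this many characters after the previous one
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Tiny example: windows of 4 characters that share 1 character
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

Each chunk repeats the tail of the previous one, so a sentence or function signature that straddles a boundary still appears whole in at least one chunk.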

Search Phase

1. User asks: "How does authentication work?"

2. Query embedded: [0.25, -0.43, 0.69, ...]

3. Qdrant finds similar chunks (cosine similarity)
   ├─ auth.go chunk (similarity: 0.92)
   ├─ user.go chunk (similarity: 0.85)
   └─ db.go chunk (similarity: 0.78)

4. Top chunks injected into agent context

5. Agent answers using retrieved context
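
Step 3 ranks chunks by cosine similarity between the query vector and each stored vector. The sketch below shows that computation in plain Python on toy 3-dimensional vectors. The vectors and the rank_chunks helper are made up for illustration; Qdrant performs this search internally with an index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunk_vecs):
    """Return (name, score) pairs sorted by descending similarity."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors; real embeddings have hundreds of dimensions
query = [1.0, 0.0, 0.0]
chunks = {
    "auth.go": [0.9, 0.1, 0.0],
    "user.go": [0.5, 0.5, 0.0],
    "db.go": [0.0, 1.0, 0.0],
}
ranking = rank_chunks(query, chunks)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Cosine similarity depends only on direction, not vector length, so a short chunk and a long chunk about the same topic can score equally well.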

Vector Databases

databases:
  qdrant:
    type: "qdrant"
    host: "localhost"
    port: 6334           # gRPC port (default: 6334)
    grpc_port: 6334      # Qdrant's defaults: gRPC on 6334; the REST API and dashboard are on 6333
    api_key: ""          # Optional for Qdrant Cloud
    use_https: false     # Enable for cloud

Docker:

docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -v qdrant_data:/qdrant/storage \
  qdrant/qdrant

Qdrant Cloud:

databases:
  qdrant_cloud:
    type: "qdrant"
    host: "your-cluster.qdrant.io"
    port: 6334
    api_key: "${QDRANT_API_KEY}"
    use_https: true

Custom Vector Databases (Plugins)

plugins:
  databases:
    - name: "my-vector-db"
      protocol: "grpc"
      path: "/path/to/plugin"

databases:
  custom:
    type: "plugin:my-vector-db"
    # Custom configuration

Embedders

embedders:
  embedder:
    type: "ollama"
    host: "http://localhost:11434"
    model: "nomic-embed-text"  # Best for code
    timeout: 30

Available Models:

  • nomic-embed-text - General purpose, 768 dimensions (recommended)
  • all-minilm - Lightweight, 384 dimensions
  • mxbai-embed-large - Large, 1024 dimensions

Setup:

ollama pull nomic-embed-text
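
To confirm the embedder works independently of Hector, you can call Ollama directly. The sketch below assumes Ollama's /api/embeddings endpoint, which takes a model and a prompt and returns an embedding array; newer Ollama releases also offer /api/embed. The helper names are hypothetical and this is not how Hector itself calls Ollama.

```python
import json
import urllib.request


def build_embedding_request(text, host="http://localhost:11434",
                            model="nomic-embed-text"):
    """Build the POST request for Ollama's /api/embeddings endpoint."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, timeout=30, **kwargs):
    """Send the request and return the embedding as a list of floats."""
    request = build_embedding_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["embedding"]
```

With Ollama running, len(embed("func Login")) should match the dimension listed for the model you pulled.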

Custom Embedders (Plugins)

plugins:
  embedders:
    - name: "my-embedder"
      protocol: "grpc"
      path: "/path/to/plugin"

embedders:
  custom:
    type: "plugin:my-embedder"
    # Custom configuration

Advanced Configuration

Chunking Strategy

Balance between context and precision:

# Small chunks (precise, less context)
document_stores:
  - name: "precise"
    chunk_size: 256
    chunk_overlap: 25
    # Good for: Code snippets, specific facts

# Medium chunks (balanced)
document_stores:
  - name: "balanced"
    chunk_size: 512
    chunk_overlap: 50
    # Good for: General purpose

# Large chunks (more context, less precise)
document_stores:
  - name: "contextual"
    chunk_size: 2048
    chunk_overlap: 200
    # Good for: Documentation, narratives

Search Configuration

agents:
  searcher:
    database: "qdrant"
    embedder: "embedder"
    document_stores:
      - name: "docs"
        paths: ["./"]
        search_config:
          limit: 5              # Top 5 results
          score_threshold: 0.7  # Minimum similarity score
          filter: {}            # Optional metadata filters
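
The effect of limit and score_threshold can be sketched on a list of scored hits. This is an illustration only: select_results is a hypothetical helper, the scores are invented, and treating the threshold as inclusive is an assumption about behavior the options above do not spell out.

```python
def select_results(hits, limit=5, score_threshold=0.7):
    """Keep hits scoring at least score_threshold, best first, capped at limit."""
    kept = [hit for hit in hits if hit[1] >= score_threshold]
    kept.sort(key=lambda hit: hit[1], reverse=True)
    return kept[:limit]


hits = [
    ("auth.go", 0.92),
    ("user.go", 0.85),
    ("db.go", 0.78),
    ("README.md", 0.41),
]

print(select_results(hits))
# prints [('auth.go', 0.92), ('user.go', 0.85), ('db.go', 0.78)]
print(select_results(hits, limit=3, score_threshold=0.8))
# prints [('auth.go', 0.92), ('user.go', 0.85)]
```

A high threshold can return fewer results than limit, or none at all, which is worth remembering when a search unexpectedly comes back empty.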

Custom Parsers

Parse non-standard formats:

plugins:
  parsers:
    - name: "pdf-parser"
      protocol: "grpc"
      path: "/path/to/parser"

document_stores:
  - name: "pdfs"
    paths: ["./documents/"]
    include_patterns: ["*.pdf"]
    parser: "plugin:pdf-parser"

Performance Optimization

Indexing Performance

document_stores:
  - name: "large_codebase"
    paths: ["./"]
    batch_size: 100        # Index 100 docs at a time
    parallel: true         # Parallel processing
    cache_embeddings: true # Cache for faster re-indexing
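
Two of these options map onto familiar techniques: processing items in fixed-size batches, and caching embeddings so unchanged chunks are not embedded again. The sketch below illustrates both under stated assumptions. batched and EmbeddingCache are hypothetical names, the cache keys on a SHA-256 hash of the chunk text, and Hector's own implementation may differ.

```python
import hashlib


def batched(items, batch_size=100):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


class EmbeddingCache:
    """Memoize embeddings by content hash so identical chunks are embedded once."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store = {}
        self.misses = 0  # Number of times embed_fn was actually called

    def get(self, chunk):
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(chunk)
        return self._store[key]


# Stand-in embedder for the demo; a real one would call the embedding model
cache = EmbeddingCache(lambda chunk: [float(len(chunk))])
for chunk in ["func Login", "func Login", "func Validate"]:
    cache.get(chunk)
print(cache.misses)  # prints 2
```

Keying on content rather than file path means a renamed or moved file reuses its cached vectors, while any edit to a chunk produces a new key and a fresh embedding.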

Search Performance

agents:
  fast_search:
    document_stores:
      - name: "optimized"
        search_config:
          limit: 3           # Fewer results = faster
          score_threshold: 0.8  # Higher threshold = fewer candidates
          use_cache: true    # Cache frequent queries

Resource Management

# Ollama configuration
embedders:
  embedder:
    type: "ollama"
    host: "http://localhost:11434"
    timeout: 30
    batch_size: 32  # Embed 32 chunks at once

# Qdrant configuration
databases:
  qdrant:
    type: "qdrant"
    host: "localhost"
    port: 6334
    connection_pool_size: 10  # Connection pooling

Use Cases

Code Assistant

agents:
  code_assistant:
    database: "qdrant"
    embedder: "embedder"
    tools: ["search", "write_file"]
    document_stores:
      - name: "codebase"
        paths: ["./src/", "./lib/"]
        include_patterns: ["*.go", "*.py", "*.js", "*.ts"]
        chunk_size: 512

    prompt:
      system_role: |
        You are a code assistant. Use semantic search to find
        relevant code before answering questions or making changes.

Documentation Assistant

agents:
  docs_bot:
    database: "qdrant"
    embedder: "embedder"
    document_stores:
      - name: "documentation"
        paths: ["./docs/"]
        include_patterns: ["*.md", "*.rst"]
        chunk_size: 1024

    prompt:
      system_role: |
        Answer questions based on the documentation.
        Always cite your sources.

Research Assistant

agents:
  researcher:
    database: "qdrant"
    embedder: "embedder"
    document_stores:
      - name: "papers"
        paths: ["./research/"]
        include_patterns: ["*.pdf", "*.md"]
        chunk_size: 2048

      - name: "notes"
        paths: ["./notes/"]
        include_patterns: ["*.md"]
        chunk_size: 512

Monitoring & Debugging

Check Indexing Status

# View Qdrant dashboard (REST port 6333)
open http://localhost:6333/dashboard

# Check collection info
curl http://localhost:6333/collections/codebase

Debug Search Results

agents:
  debug:
    reasoning:
      show_debug_info: true
    document_stores:
      - name: "test"
        paths: ["./"]
        debug: true  # Log search results

Re-index Documents

# Delete collection and re-index (REST port 6333)
curl -X DELETE http://localhost:6333/collections/codebase

# Restart Hector to trigger re-indexing
hector serve --config config.yaml

Troubleshooting

"Qdrant connection failed"

# Check if Qdrant is running
docker ps | grep qdrant

# Check logs
docker logs qdrant

# Verify the REST port responds
curl http://localhost:6333/

"Ollama not responding"

# Check if Ollama is running
ollama list

# Pull model if missing
ollama pull nomic-embed-text

# Check service
curl http://localhost:11434/api/tags

"Search returns no results"

  • Check documents are indexed: View Qdrant dashboard
  • Verify file patterns match your files
  • Lower score_threshold in search config
  • Check chunk sizes aren't too large
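
The connectivity checks in this section can be combined into one small script. The sketch below assumes Qdrant's REST API on its default port 6333 and Ollama on 11434; probe and check_services are hypothetical helpers, and the probe function is injectable so the logic can be exercised without either service running.

```python
import urllib.error
import urllib.request

# Default local endpoints; adjust if your services run elsewhere
SERVICES = {
    "qdrant": "http://localhost:6333/",
    "ollama": "http://localhost:11434/api/tags",
}


def probe(url, timeout=3):
    """Return True if the URL answers with a 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


def check_services(services=SERVICES, probe_fn=probe):
    """Map each service name to whether its endpoint responded."""
    return {name: probe_fn(url) for name, url in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'ok' if ok else 'not reachable'}")
```

If both services report ok but searches still return nothing, the cause is more likely indexing or thresholds than connectivity.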

Best Practices

1. Choose the Right Chunk Size

# Code: Small chunks for precision
chunk_size: 512

# Docs: Medium chunks for balance
chunk_size: 1024

# Narratives: Large chunks for context
chunk_size: 2048

2. Use Appropriate Overlap

# Small chunks: 10-20% overlap
chunk_size: 256
chunk_overlap: 25

# Large chunks: 5-10% overlap
chunk_size: 2048
chunk_overlap: 200
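
To see what an overlap setting costs, it helps to work out the overlap ratio and the number of chunks a document produces. The sketch below assumes a simple sliding window over characters; overlap_ratio and chunk_count are hypothetical helpers, and the 10,000-character document is an arbitrary example.

```python
import math


def overlap_ratio(chunk_size, chunk_overlap):
    """Fraction of each chunk that is shared with its neighbor."""
    return chunk_overlap / chunk_size


def chunk_count(doc_length, chunk_size, chunk_overlap):
    """Number of sliding-window chunks needed to cover doc_length characters."""
    if doc_length <= 0:
        return 0
    if doc_length <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap
    return 1 + math.ceil((doc_length - chunk_size) / step)


for size, overlap in [(256, 25), (2048, 200)]:
    ratio = overlap_ratio(size, overlap)
    count = chunk_count(10_000, size, overlap)
    print(f"chunk_size={size}: {ratio:.1%} overlap, {count} chunks")
# prints:
# chunk_size=256: 9.8% overlap, 44 chunks
# chunk_size=2048: 9.8% overlap, 6 chunks
```

Both example settings work out to roughly 10% overlap. Raising the overlap shrinks the step between windows, so the chunk count, and with it embedding time and storage, grows accordingly.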

3. Filter Irrelevant Files

document_stores:
  - name: "clean_codebase"
    paths: ["./"]
    include_patterns: ["*.go"]
    exclude_patterns: [
      "*_test.go",     # Test files
      "*.min.js",      # Minified files
      "vendor/*",      # Dependencies
      "node_modules/*"
    ]
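
Before indexing a large tree, it can be useful to preview which files a set of patterns would select. The sketch below uses Python's fnmatch as a stand-in. Hector's actual pattern semantics are not specified here, so is_indexed is a hypothetical helper and the choice to test each pattern against both the relative path and the bare file name is an assumption.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_indexed(path, include_patterns, exclude_patterns=()):
    """Return True if path matches an include pattern and no exclude pattern."""
    name = PurePosixPath(path).name

    def matches(pattern):
        # Try the pattern against the full relative path and the bare file name
        return fnmatchcase(path, pattern) or fnmatchcase(name, pattern)

    # Excludes take priority over includes
    if any(matches(pattern) for pattern in exclude_patterns):
        return False
    return any(matches(pattern) for pattern in include_patterns)


include = ["*.go"]
exclude = ["*_test.go", "*.min.js", "vendor/*", "node_modules/*"]

for path in ["src/auth.go", "src/auth_test.go", "vendor/lib/x.go", "README.md"]:
    print(path, is_indexed(path, include, exclude))
# prints:
# src/auth.go True
# src/auth_test.go False
# vendor/lib/x.go False
# README.md False
```

Running a check like this over your file list before indexing helps catch a pattern that silently matches nothing, one of the common causes of empty search results.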

4. Organize by Type

document_stores:
  - name: "source_code"
    paths: ["./src/"]
    chunk_size: 512

  - name: "documentation"
    paths: ["./docs/"]
    chunk_size: 1024

  - name: "configs"
    paths: ["./config/"]
    chunk_size: 256

Next Steps