How to Set Up RAG & Semantic Search

Give your agents the ability to search through code and documents semantically—finding information by meaning, not just keywords.

Time: 20 minutes
Difficulty: Intermediate


What You'll Achieve

Agents that can:

- Search code semantically - "Find authentication logic" instead of grep for "auth"
- Discover patterns - find similar code across the codebase
- Understand context - retrieve relevant documentation automatically
- Answer questions - query knowledge bases intelligently


Prerequisites

✅ Hector installed (Installation Guide)
✅ Docker (for Qdrant)
✅ Basic understanding of RAG concepts


Step 1: Start Qdrant (Vector Database)

Qdrant stores vector embeddings of your documents.

docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -v qdrant_data:/qdrant/storage \
  qdrant/qdrant

Ports:

- 6334 - gRPC API (used by Hector)
- 6333 - REST API + Dashboard

Verify Installation

# Check if running
docker ps | grep qdrant

# Access dashboard
open http://localhost:6333/dashboard

You should see the Qdrant web interface.


Step 2: Start Ollama (Embeddings)

Ollama generates vector embeddings from text.

Install Ollama

curl https://ollama.ai/install.sh | sh

Or download the installer from https://ollama.ai

Pull Embedding Model

ollama pull nomic-embed-text

This downloads the embedding model (~274MB).

Verify Installation

# List models
ollama list

# Should show:
# nomic-embed-text:latest

# Test embeddings (embedding models don't work with `ollama run`;
# use the embeddings API instead)
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "test"}'

Step 3: Configure Hector

Create config-with-rag.yaml:

# Vector Database
databases:
  qdrant:
    type: "qdrant"
    host: "localhost"
    port: 6334

# Embedder
embedders:
  embedder:
    type: "ollama"
    host: "http://localhost:11434"
    model: "nomic-embed-text"

# LLM
llms:
  gpt-4o:
    type: "openai"
    model: "gpt-4o-mini"
    api_key: "${OPENAI_API_KEY}"

# Document Stores (what to index)
document_stores:
  codebase:
    name: "codebase"
    paths: ["./src/", "./lib/"]

# Agent with Semantic Search
agents:
  coder:
    llm: "gpt-4o"
    database: "qdrant"
    embedder: "embedder"
    document_stores: ["codebase"]

    tools:
      - "search"  # Enable semantic search tool

Key components:

  1. database: "qdrant" - Connect to vector database
  2. embedder: "embedder" - Use Ollama for embeddings
  3. tools: ["search"] - Enable search tool
  4. document_stores - Define what to index

Step 4: Start Hector and Index

export OPENAI_API_KEY="sk-..."
hector serve --config config-with-rag.yaml

On first run, Hector automatically indexes your codebase:

Hector server listening on :8080
Indexing document store: codebase
  Reading files from ./src/
  Found 156 files
  Creating 1,234 chunks
  Generating embeddings... 
  Storing in Qdrant...
Indexing complete: 1,234 chunks indexed
Agent registered: coder

This may take a few minutes for large codebases.
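To watch progress while indexing runs, you can poll the Qdrant REST API directly. A small sketch (assumes the codebase collection name from the config above, plus watch and jq installed):

# Poll Qdrant every 2s for the number of stored vectors
watch -n 2 \
  "curl -s http://localhost:6333/collections/codebase | jq '.result.points_count'"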


Step 5: Query Your Agent

Interactive Chat

hector chat --config config-with-rag.yaml coder

Try these queries:

> How does authentication work in this codebase?
[Agent uses semantic search to find auth-related code]

> Where is the database connection configured?
[Agent finds db config files semantically]

> Show me examples of error handling
[Agent finds error handling patterns across codebase]

Single Query

hector call --config config-with-rag.yaml coder "Explain how the API routes are structured"

Agent will:

1. Use semantic search to find routing code
2. Analyze the patterns
3. Provide an explanation with examples


Step 6: Verify It Works

Check Qdrant Dashboard

Visit http://localhost:6333/dashboard

You should see:

- Collection: codebase (or your document store name)
- Vectors: number of chunks indexed
- Dimensions: 768 (for nomic-embed-text)

Test Search Directly

# Search via Qdrant REST API
# (the vector must be a real 768-dimension embedding, elided here)
curl -X POST http://localhost:6333/collections/codebase/points/search \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.1, 0.2, ...],
    "limit": 5
  }'
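For a real end-to-end test, generate the query embedding with Ollama first and feed it to Qdrant. A sketch (the query text is illustrative; requires jq):

# 1. Embed the query text (a 768-dim vector for nomic-embed-text)
EMBEDDING=$(curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "authentication logic"}' \
  | jq -c '.embedding')

# 2. Search Qdrant with that vector
curl -s -X POST http://localhost:6333/collections/codebase/points/search \
  -H "Content-Type: application/json" \
  -d "{\"vector\": $EMBEDDING, \"limit\": 5}" | jq '.result'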

Customizing Your Setup

Multiple Document Stores

Index different types of content with different settings:

document_stores:
  # Source code - small chunks for precision
  source_code:
    name: "source_code"
    paths: ["./src/"]
    chunk_size: 512

  # Documentation - large chunks for context
  documentation:
    name: "documentation"
    paths: ["./docs/"]
    chunk_size: 2048

  # Configuration files - small chunks
  configs:
    name: "configs"
    paths: ["./config/"]
    chunk_size: 256

Exclude Files

document_stores:
  clean_code:
    name: "clean_code"
    paths: ["./"]
    # Key name assumed - check Hector's config reference for the exact field
    exclude: ["node_modules/", ".git/", "dist/", "*.log"]

Adjust Chunk Sizes

Balance between precision and context:
# Precise but less context
chunk_size: 256
chunk_overlap: 25

# Balanced (recommended)
chunk_size: 512
chunk_overlap: 50

# More context but less precise
chunk_size: 2048
chunk_overlap: 200

Performance Tuning

document_stores:
  optimized:
    name: "optimized"
    paths: ["./src/"]

    # Indexing performance
    batch_size: 100        # Process 100 docs at a time
    parallel: true         # Parallel processing
    cache_embeddings: true # Cache for re-indexing

    # Search performance
    search_config:
      limit: 5             # Return top 5 results
      score_threshold: 0.7 # Minimum similarity score

Re-Indexing

Manual Re-Index

# Delete collection
curl -X DELETE http://localhost:6333/collections/codebase

# Restart Hector to trigger re-indexing
hector serve --config config-with-rag.yaml

Auto Re-Index on Changes

Coming soon: File watcher for automatic re-indexing.

Workaround: Restart Hector after code changes:

# In development: re-launch Hector whenever the process exits
while true; do
  hector serve --config config-with-rag.yaml
  sleep 5
done
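If you have a file watcher like entr installed (a third-party tool, not part of Hector), you can restart on actual file changes instead:

# Restart Hector whenever a source file changes (requires entr)
find ./src -type f | entr -r hector serve --config config-with-rag.yaml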

Advanced Configurations

Qdrant Cloud

Use hosted Qdrant instead of local:

databases:
  qdrant_cloud:
    type: "qdrant"
    host: "your-cluster.qdrant.io"
    port: 6334
    api_key: "${QDRANT_API_KEY}"
    use_https: true
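To sanity-check the credentials before starting Hector, you can hit the cluster's REST API directly (the cluster URL is a placeholder; Qdrant Cloud expects the key in the api-key header):

curl -s -H "api-key: $QDRANT_API_KEY" \
  https://your-cluster.qdrant.io:6333/collections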

Different Embedding Models

embedders:
  # Fast, smaller embeddings (384 dimensions)
  fast:
    type: "ollama"
    model: "all-minilm"

  # Better quality, larger embeddings (1024 dimensions)
  quality:
    type: "ollama"
    model: "mxbai-embed-large"

  # Best for code (768 dimensions, recommended)
  code:
    type: "ollama"
    model: "nomic-embed-text"

agents:
  coder:
    embedder: "code"  # Use code-optimized embeddings

Multiple Collections

agents:
  fullstack_dev:
    database: "qdrant"
    embedder: "embedder"
    document_stores: ["frontend", "backend", "docs"]

document_stores:
  frontend:
    name: "frontend"
    paths: ["./frontend/"]
    collection: "frontend_code"

  backend:
    name: "backend"
    paths: ["./backend/"]
    collection: "backend_code"

  docs:
    name: "docs"
    paths: ["./docs/"]
    collection: "documentation"

Each gets its own Qdrant collection.
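You can confirm the collections were created via the Qdrant REST API:

curl -s http://localhost:6333/collections | jq '.result.collections[].name'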


Troubleshooting

"Qdrant connection failed"

Check if running:

docker ps | grep qdrant

Check logs:

docker logs qdrant

Test connectivity:

curl http://localhost:6333/
# Should return Qdrant info

Fix:

# Restart Qdrant
docker restart qdrant

# Or start if not running
docker start qdrant

"Ollama not responding"

Check if running:

ollama list

Test service:

curl http://localhost:11434/api/tags

Fix:

# Restart the Ollama service
# Linux (systemd):
sudo systemctl restart ollama

# macOS: quit and relaunch the Ollama app, or run the server directly:
ollama serve

# Or reinstall
curl https://ollama.ai/install.sh | sh

"Search returns no results"

Verify indexing:

- Check the Qdrant dashboard: http://localhost:6333/dashboard
- Look for your collection
- Check the vector count

Lower threshold:

document_stores:
  codebase:
    name: "codebase"
    paths: ["./src/"]
    search_config:
      score_threshold: 0.5  # Lower from 0.7

Check file paths:

Make sure paths points at directories that actually contain files to index:

document_stores:
  codebase:
    name: "codebase"
    paths: ["./src/"]  # Verify this directory exists and is non-empty

"Indexing is slow"

Optimize batch size:

document_stores:
  codebase:
    name: "codebase"
    paths: ["./src/"]
    batch_size: 200  # Larger batches reduce per-request overhead
    parallel: true

Use smaller chunks:

chunk_size: 256  # Faster than 512 or 1024


Production Considerations

Persistent Storage

Mount Qdrant data directory:

docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -v /path/to/qdrant_data:/qdrant/storage \
  qdrant/qdrant

Resource Allocation

docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  --memory="2g" \
  --cpus="2" \
  -v qdrant_data:/qdrant/storage \
  qdrant/qdrant

Backup Strategy

# Backup Qdrant data
docker exec qdrant tar czf /tmp/qdrant-backup.tar.gz /qdrant/storage
docker cp qdrant:/tmp/qdrant-backup.tar.gz ./backups/

# Restore
docker cp ./backups/qdrant-backup.tar.gz qdrant:/tmp/
docker exec qdrant tar xzf /tmp/qdrant-backup.tar.gz -C /
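Qdrant also has a built-in snapshot API, which avoids tarring a live storage directory:

# Create a snapshot of the collection (stored inside the container)
curl -X POST http://localhost:6333/collections/codebase/snapshots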

Monitoring

# Enable debug logging
logging:
  level: "info"
  format: "json"

agents:
  coder:
    reasoning:
      show_debug_info: true  # See search performance

Verification Checklist

✅ Qdrant running and accessible
✅ Ollama installed with nomic-embed-text
✅ Hector configured with database and embedder
✅ Document stores defined
✅ Indexing completed successfully
✅ Search tool enabled in agent
✅ Agent can find relevant code semantically


Next Steps