How to Set Up RAG & Semantic Search

Give your agents the ability to search through code and documents semantically—finding information by meaning, not just keywords.

Time: 20 minutes
Difficulty: Intermediate


What You'll Achieve

Agents that can:

- Search code semantically - "Find authentication logic" instead of grep for "auth"
- Discover patterns - find similar code across the codebase
- Understand context - retrieve relevant documentation automatically
- Answer questions - query knowledge bases intelligently


Prerequisites

✅ Hector installed (Installation Guide)
✅ Docker (for Qdrant)
✅ Basic understanding of RAG concepts


Step 1: Start Qdrant (Vector Database)

Qdrant stores vector embeddings of your documents.

docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -v qdrant_data:/qdrant/storage \
  qdrant/qdrant

Ports:

- 6334 - gRPC API (used by Hector)
- 6333 - REST API + Dashboard

Verify Installation

# Check if running
docker ps | grep qdrant

# Access dashboard
open http://localhost:6333/dashboard

You should see the Qdrant web interface.


Step 2: Start Ollama (Embeddings)

Ollama generates vector embeddings from text.

Install Ollama

curl https://ollama.ai/install.sh | sh

Or download the installer from https://ollama.ai

Pull Embedding Model

ollama pull nomic-embed-text

This downloads the embedding model (~274MB).

Verify Installation

# List models
ollama list

# Should show:
# nomic-embed-text:latest

# Test embeddings (embedding models don't work with `ollama run`;
# use the embeddings API instead)
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "test"}'

Step 3: Configure Hector

Create config-with-rag.yaml:

# Vector Database
databases:
  qdrant:
    type: "qdrant"
    host: "localhost"
    port: 6334

# Embedder
embedders:
  embedder:
    type: "ollama"
    host: "http://localhost:11434"
    model: "nomic-embed-text"

# LLM
llms:
  gpt-4o:
    type: "openai"
    model: "gpt-4o-mini"
    api_key: "${OPENAI_API_KEY}"

# Document Stores (what to index)
document_stores:
  codebase:
    name: "codebase"
    paths: ["./src/", "./lib/"]

# Agent with Semantic Search
agents:
  coder:
    llm: "gpt-4o"
    database: "qdrant"
    embedder: "embedder"
    document_stores: ["codebase"]

    tools:
      - "search"  # Enable semantic search tool

Key components:

  1. database: "qdrant" - Connect to vector database
  2. embedder: "embedder" - Use Ollama for embeddings
  3. tools: ["search"] - Enable search tool
  4. document_stores - Define what to index

Step 4: Start Hector and Index

export OPENAI_API_KEY="sk-..."
hector serve --config config-with-rag.yaml

On first run, Hector automatically indexes your codebase:

Hector server listening on :8080
Indexing document store: codebase
  Reading files from ./src/
  Found 156 files
  Creating 1,234 chunks
  Generating embeddings... 
  Storing in Qdrant...
Indexing complete: 1,234 chunks indexed
Agent registered: coder

This may take a few minutes for large codebases.
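To watch progress while indexing runs, you can poll the Qdrant REST API directly. A small sketch (assumes the codebase collection name from the config above, plus watch and jq installed):

# Poll Qdrant every 2s for the number of stored vectors
watch -n 2 \
  "curl -s http://localhost:6333/collections/codebase | jq '.result.points_count'"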


Step 5: Query Your Agent

Interactive Chat

hector chat --config config-with-rag.yaml coder

Try these queries:

> How does authentication work in this codebase?
[Agent uses semantic search to find auth-related code]

> Where is the database connection configured?
[Agent finds db config files semantically]

> Show me examples of error handling
[Agent finds error handling patterns across codebase]

Single Query

hector call --config config-with-rag.yaml coder "Explain how the API routes are structured"

Agent will:

1. Use semantic search to find routing code
2. Analyze the patterns
3. Provide an explanation with examples


Step 6: Verify It Works

Check Qdrant Dashboard

Visit http://localhost:6333/dashboard

You should see:

- Collection: codebase (or your document store name)
- Vectors: number of chunks indexed
- Dimensions: 768 (for nomic-embed-text)

Test Search Directly

# Search via Qdrant REST API
# (the vector must be a real 768-dimension embedding, elided here)
curl -X POST http://localhost:6333/collections/codebase/points/search \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.1, 0.2, ...],
    "limit": 5
  }'
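For a real end-to-end test, generate the query embedding with Ollama first and feed it to Qdrant. A sketch (the query text is illustrative; requires jq):

# 1. Embed the query text (a 768-dim vector for nomic-embed-text)
EMBEDDING=$(curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "authentication logic"}' \
  | jq -c '.embedding')

# 2. Search Qdrant with that vector
curl -s -X POST http://localhost:6333/collections/codebase/points/search \
  -H "Content-Type: application/json" \
  -d "{\"vector\": $EMBEDDING, \"limit\": 5}" | jq '.result'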

Customizing Your Setup

Multiple Document Stores

Index different types of content with different settings:

document_stores:
  # Source code - small chunks for precision
  source_code:
    name: "source_code"
    paths: ["./src/"]
    chunk_size: 512

  # Documentation - large chunks for context
  documentation:
    name: "documentation"
    paths: ["./docs/"]
    chunk_size: 2048

  # Configuration files - small chunks
  configs:
    name: "configs"
    paths: ["./config/"]
    chunk_size: 256

Exclude Files

document_stores:
  clean_code:
    name: "clean_code"
    paths: ["./"]
    # Key name assumed - check Hector's config reference for the exact field
    exclude: ["node_modules/", ".git/", "dist/", "*.log"]

Adjust Chunk Sizes

Balance between precision and context:
# Precise but less context
chunk_size: 256
chunk_overlap: 25

# Balanced (recommended)
chunk_size: 512
chunk_overlap: 50

# More context but less precise
chunk_size: 2048
chunk_overlap: 200

Performance Tuning

document_stores:
  optimized:
    name: "optimized"
    paths: ["./src/"]

    # Indexing performance
    batch_size: 100        # Process 100 docs at a time
    parallel: true         # Parallel processing
    cache_embeddings: true # Cache for re-indexing

    # Search performance
    search_config:
      limit: 5             # Return top 5 results
      score_threshold: 0.7 # Minimum similarity score

Re-Indexing

Manual Re-Index

# Delete collection
curl -X DELETE http://localhost:6333/collections/codebase

# Restart Hector to trigger re-indexing
hector serve --config config-with-rag.yaml

Auto Re-Index on Changes

Coming soon: File watcher for automatic re-indexing.

Workaround: Restart Hector after code changes:

# In development: re-launch Hector whenever the process exits
while true; do
  hector serve --config config-with-rag.yaml
  sleep 5
done
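If you have a file watcher like entr installed (a third-party tool, not part of Hector), you can restart on actual file changes instead:

# Restart Hector whenever a source file changes (requires entr)
find ./src -type f | entr -r hector serve --config config-with-rag.yaml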

Advanced Configurations

Qdrant Cloud

Use hosted Qdrant instead of local:

databases:
  qdrant_cloud:
    type: "qdrant"
    host: "your-cluster.qdrant.io"
    port: 6334
    api_key: "${QDRANT_API_KEY}"
    use_https: true
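To sanity-check the credentials before starting Hector, you can hit the cluster's REST API directly (the cluster URL is a placeholder; Qdrant Cloud expects the key in the api-key header):

curl -s -H "api-key: $QDRANT_API_KEY" \
  https://your-cluster.qdrant.io:6333/collections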

Different Embedding Models

embedders:
  # Fast, smaller embeddings (384 dimensions)
  fast:
    type: "ollama"
    model: "all-minilm"

  # Better quality, larger embeddings (1024 dimensions)
  quality:
    type: "ollama"
    model: "mxbai-embed-large"

  # Best for code (768 dimensions, recommended)
  code:
    type: "ollama"
    model: "nomic-embed-text"

agents:
  coder:
    embedder: "code"  # Use code-optimized embeddings

Multiple Collections

agents:
  fullstack_dev:
    database: "qdrant"
    embedder: "embedder"
    document_stores: ["frontend", "backend", "docs"]

document_stores:
  frontend:
    name: "frontend"
    paths: ["./frontend/"]
    collection: "frontend_code"

  backend:
    name: "backend"
    paths: ["./backend/"]
    collection: "backend_code"

  docs:
    name: "docs"
    paths: ["./docs/"]
    collection: "documentation"

Each gets its own Qdrant collection.
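You can confirm the collections were created via the Qdrant REST API:

curl -s http://localhost:6333/collections | jq '.result.collections[].name'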


Troubleshooting

"Qdrant connection failed"

Check if running:

docker ps | grep qdrant

Check logs:

docker logs qdrant

Test connectivity:

curl http://localhost:6333/
# Should return Qdrant info

Fix:

# Restart Qdrant
docker restart qdrant

# Or start if not running
docker start qdrant

"Ollama not responding"

Check if running:

ollama list

Test service:

curl http://localhost:11434/api/tags

Fix:

# Restart the Ollama service
# Linux (systemd):
sudo systemctl restart ollama

# macOS: quit and relaunch the Ollama app, or run the server directly:
ollama serve

# Or reinstall
curl https://ollama.ai/install.sh | sh

"Search returns no results"

Verify indexing:

- Check the Qdrant dashboard: http://localhost:6333/dashboard
- Look for your collection
- Check the vector count

Lower threshold:

document_stores:
  codebase:
    name: "codebase"
    paths: ["./src/"]
    search_config:
      score_threshold: 0.5  # Lower from 0.7

Check file paths:

Make sure paths points at directories that actually contain files to index:

document_stores:
  codebase:
    name: "codebase"
    paths: ["./src/"]  # Verify this directory exists and is non-empty

"Indexing is slow"

Optimize batch size:

document_stores:
  codebase:
    name: "codebase"
    paths: ["./src/"]
    batch_size: 200  # Larger batches reduce per-request overhead
    parallel: true

Use smaller chunks:

chunk_size: 256  # Faster than 512 or 1024


Production Considerations

Persistent Storage

Mount Qdrant data directory:

docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -v /path/to/qdrant_data:/qdrant/storage \
  qdrant/qdrant

Resource Allocation

docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  --memory="2g" \
  --cpus="2" \
  -v qdrant_data:/qdrant/storage \
  qdrant/qdrant

Backup Strategy

# Backup Qdrant data
docker exec qdrant tar czf /tmp/qdrant-backup.tar.gz /qdrant/storage
docker cp qdrant:/tmp/qdrant-backup.tar.gz ./backups/

# Restore
docker cp ./backups/qdrant-backup.tar.gz qdrant:/tmp/
docker exec qdrant tar xzf /tmp/qdrant-backup.tar.gz -C /
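Qdrant also has a built-in snapshot API, which avoids tarring a live storage directory:

# Create a snapshot of the collection (stored inside the container)
curl -X POST http://localhost:6333/collections/codebase/snapshots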

Monitoring

# Enable debug logging
logging:
  level: "info"
  format: "json"

agents:
  coder:
    reasoning:
      show_debug_info: true  # See search performance

Verification Checklist

✅ Qdrant running and accessible
✅ Ollama installed with nomic-embed-text
✅ Hector configured with database and embedder
✅ Document stores defined
✅ Indexing completed successfully
✅ Search tool enabled in agent
✅ Agent can find relevant code semantically


Next Steps