
Quick Start: Vector Database Embeddings#

This guide helps you get started with vector database embeddings in terraform-ingest in under 5 minutes.

1. Install Dependencies#

# Install ChromaDB for vector database
pip install chromadb

# Optional: Install sentence-transformers for better local embeddings
pip install sentence-transformers
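
If you want to confirm the packages are importable before continuing, a quick check is enough (the sentence_transformers import only matters if you plan to use that strategy):

# Optional sanity check: both imports should succeed after installation
import chromadb
import sentence_transformers

print(chromadb.__version__)
print(sentence_transformers.__version__)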

2. Enable Embeddings in Config#

Edit your config.yaml and add:

repositories:
  - url: https://github.com/terraform-aws-modules/terraform-aws-vpc
    name: terraform-aws-vpc
    branches:
      - main
    include_tags: true
    max_tags: 5

output_dir: ./output
clone_dir: ./repos

# Add this section
embedding:
  enabled: true
  strategy: chromadb-default  # Easiest to start with
  chromadb_path: ./chromadb
  collection_name: terraform_modules
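
Before running ingestion, it can help to confirm the file still parses and the embedding section is picked up. A minimal check, assuming PyYAML is available in your environment:

import yaml

# Load the config and confirm the new embedding section is present
with open("config.yaml") as f:
    config = yaml.safe_load(f)

print(config["embedding"])  # expect enabled: True plus the strategy and paths above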

3. Ingest Modules#

terraform-ingest ingest config.yaml

You should see output like:

Processing repository: https://github.com/terraform-aws-modules/terraform-aws-vpc
Saved summary to ./output/terraform-aws-vpc_main.json
Upserted to vector database with ID: abc123...
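
To see exactly what was extracted and embedded, you can open one of the generated summaries. The file name below comes from the example output above; the JSON layout may differ between versions, so this is only a rough sketch:

import json

# Path taken from the example output above; adjust to your own run
with open("./output/terraform-aws-vpc_main.json") as f:
    summary = json.load(f)

# Print the beginning of the summary to inspect its fields
print(json.dumps(summary, indent=2)[:500])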

4. Search for Modules#

# Basic search
terraform-ingest search "vpc module for aws"

# Filter by provider
terraform-ingest search "kubernetes cluster" --provider aws

# Limit results
terraform-ingest search "security group" --limit 3
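
These flags also map onto the Python API shown later in this guide; a sketch of the filtered search above, assuming --provider corresponds to a metadata filter and --limit to n_results:

from terraform_ingest import TerraformIngest

ingester = TerraformIngest.from_yaml("config.yaml")

# Roughly: terraform-ingest search "kubernetes cluster" --provider aws --limit 3
results = ingester.search_vector_db(
    "kubernetes cluster",
    filters={"provider": "aws"},
    n_results=3,
)
for result in results:
    print(result["metadata"]["repository"])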

Example Output#

Searching for: vpc module for aws

Found 1 result(s):

1. https://github.com/terraform-aws-modules/terraform-aws-vpc
   Ref: main
   Path: .
   Provider: aws
   Relevance: 0.850

Advanced: Use Different Embedding Strategies#

OpenAI (Best Quality)#

embedding:
  enabled: true
  strategy: openai
  openai_api_key: sk-...  # Or set OPENAI_API_KEY env var
  openai_model: text-embedding-3-small
  chromadb_path: ./chromadb
  collection_name: terraform_modules

Install the OpenAI client library:

pip install openai
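
If ingestion fails with this strategy, it can help to confirm the key and model work outside terraform-ingest first. A minimal sketch using the official openai client, which reads OPENAI_API_KEY from the environment:

from openai import OpenAI

# Picks up OPENAI_API_KEY from the environment by default
client = OpenAI()

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="vpc module for aws",
)
print(len(resp.data[0].embedding))  # embedding dimension (1536 for this model)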

Local Sentence Transformers (Free, No API)#

embedding:
  enabled: true
  strategy: sentence-transformers
  sentence_transformers_model: all-MiniLM-L6-v2
  chromadb_path: ./chromadb
  collection_name: terraform_modules

Install the library:

pip install sentence-transformers
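
To verify the model loads locally (and to trigger the one-time download before your first ingest), a quick sketch:

from sentence_transformers import SentenceTransformer

# Downloads the model on first use, then loads it from the local cache
model = SentenceTransformer("all-MiniLM-L6-v2")

embedding = model.encode("vpc module for aws")
print(len(embedding))  # embedding dimension for this model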

Using from Python#

from terraform_ingest import TerraformIngest

# Load and ingest
ingester = TerraformIngest.from_yaml('config.yaml')
summaries = ingester.ingest()

# Search
results = ingester.search_vector_db(
    "vpc module with private subnets",
    filters={"provider": "aws"},
    n_results=5
)

for result in results:
    print(f"Found: {result['metadata']['repository']}")
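
The exact shape of each result dict may vary by version, so it can be useful to dump one result and see which metadata fields (ref, path, provider, relevance, and so on) your installation returns:

import json

# Inspect the first result to see the fields available in your version
if results:
    print(json.dumps(results[0], indent=2, default=str))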

Using the API#

Start the server:

terraform-ingest serve

Search via HTTP:

curl -X POST http://localhost:8000/search/vector \
  -H "Content-Type: application/json" \
  -d '{
    "query": "vpc module with public and private subnets",
    "provider": "aws",
    "limit": 5,
    "config_file": "config.yaml"
  }'
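
The same endpoint can be called from Python with the requests library (installed separately); the request body below mirrors the curl example above:

import requests

# The server must be running: terraform-ingest serve
response = requests.post(
    "http://localhost:8000/search/vector",
    json={
        "query": "vpc module with public and private subnets",
        "provider": "aws",
        "limit": 5,
        "config_file": "config.yaml",
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())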

Troubleshooting#

"chromadb not found"#

pip install chromadb

"Vector database is not enabled"#

Make sure embedding.enabled: true is in your config.yaml

Model download takes a while#

The first run downloads the embedding model (~100MB). This is normal; the model is cached and reused on later runs.

Search returns no results#

  1. Make sure you've run ingestion first
  2. Check that modules were upserted: look for "Upserted to vector database" messages in the ingest output (or inspect the collection directly, as in the sketch after this list)
  3. Try a broader query
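
A quick way to confirm that documents actually landed in the vector database is to open the ChromaDB store directly; this assumes the chromadb_path and collection_name from the example config above:

import chromadb

# Paths and names taken from the example config; adjust to yours
client = chromadb.PersistentClient(path="./chromadb")
collection = client.get_collection("terraform_modules")

# 0 means nothing was upserted; re-check your embedding config and re-run ingestion
print(collection.count())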

Common Use Cases#

Find modules for a specific cloud provider#

terraform-ingest search "storage" --provider gcp

Find modules in a specific repository#

terraform-ingest search "networking" --repository https://github.com/terraform-aws-modules/terraform-aws-vpc

Natural language queries#

terraform-ingest search "module for creating kubernetes clusters with autoscaling"
terraform-ingest search "database with automated backups and replication"
terraform-ingest search "vpc with vpn and direct connect support"

Performance Tips#

  1. Start with ChromaDB default - easiest to set up
  2. Use filters - narrow results with --provider or --repository
  3. Adjust result limit - use --limit 3 for faster results
  4. Use local models - sentence-transformers avoids API costs
  5. Enable only needed content - set include_readme: false if not needed