March 27, 2026 · 11 min read

Vector Databases Explained: Pinecone, Chroma, and Weaviate for AI Apps

Learn vector databases from scratch -- embeddings, similarity search, Chroma, Pinecone, Weaviate, and how to build semantic search and RAG applications.

vector-database ai embeddings python tutorial

Traditional databases are built for exact matches. You query WHERE name = 'Alice' and get rows that match exactly. But what if you want to find things that are similar? Documents that mean roughly the same thing as a query? Images that look like a reference image? Products that match a vague description?

That's what vector databases do. They store data as high-dimensional vectors (embeddings) and find the nearest neighbors. This is the foundation of semantic search, recommendation systems, and the Retrieval-Augmented Generation (RAG) pattern that makes AI applications actually useful.

What Are Vector Embeddings?

An embedding is a numerical representation of data -- a list of floating-point numbers that captures the meaning of text, images, or other content.

# The sentence "How do I reset my password?" might become:
[0.023, -0.456, 0.789, 0.012, ..., -0.234]  # 1536 dimensions for OpenAI's model

The magic: similar content produces similar vectors. "How do I reset my password?" and "I forgot my login credentials" will be close together in vector space, even though they share almost no words. "What's the weather today?" will be far away.
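"Close together" is usually measured with cosine similarity. Here's a minimal pure-Python sketch using toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and the numbers below are made up for illustration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models produce far more dimensions)
password_reset = [0.9, 0.1, 0.0]
forgot_login = [0.8, 0.2, 0.1]
weather = [0.0, 0.1, 0.9]

print(cosine_similarity(password_reset, forgot_login))  # high: ~0.98
print(cosine_similarity(password_reset, weather))       # low:  ~0.01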
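```

Vector databases apply exactly this comparison, just over millions of vectors with index structures that avoid computing it pairwise.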

You generate embeddings using embedding models:

from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
model="text-embedding-3-small",
input="How do I reset my password?",
)

embedding = response.data[0].embedding
print(len(embedding)) # 1536 dimensions
print(embedding[:5]) # [0.023, -0.456, 0.789, 0.012, -0.345]

Other embedding models: Cohere's embed-v3, Google's text-embedding-004, or open-source options like sentence-transformers that run locally.

Why Traditional Databases Fail Here

PostgreSQL can store arrays, and you could technically compute cosine similarity in a query (the <=> below is pgvector's cosine-distance operator). But:

-- This works but is catastrophically slow at scale
SELECT id, content,
  1 - (embedding <=> query_embedding) AS similarity
FROM documents
ORDER BY similarity DESC
LIMIT 10;

With 1 million documents at 1536 dimensions, this query compares the query vector against every single row. That's billions of floating-point operations per query. Linear scan doesn't scale.

Vector databases use approximate nearest neighbor (ANN) algorithms -- data structures like HNSW (Hierarchical Navigable Small World) graphs or IVF (Inverted File) indexes that find similar vectors without comparing against every single one. They trade a tiny bit of accuracy (99%+ recall) for massive speed improvements (milliseconds instead of minutes).
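To see what the linear scan costs, here's a brute-force exact search in NumPy, the baseline that ANN indexes are designed to beat (corpus size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

# 10K unit-normalized document vectors at 1536 dimensions (~61 MB of float32)
docs = rng.standard_normal((10_000, 1536)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

query = rng.standard_normal(1536).astype(np.float32)
query /= np.linalg.norm(query)

# Exact search: one dot product per document -- cost grows linearly with corpus size
scores = docs @ query
top10 = np.argsort(-scores)[:10]
print(top10.tolist(), float(scores[top10[0]]))
```

At 10K vectors this is instant; at hundreds of millions, the same all-rows scan is why dedicated indexes exist.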

There's also pgvector, a PostgreSQL extension that adds vector operations and ANN indexes. It's a good middle ground if you don't want a separate database, but dedicated vector databases offer better performance and features at scale.

Chroma: Local and Easy

Chroma is the simplest vector database to start with. It runs in-process (no server needed) and handles embedding generation for you.

pip install chromadb

import chromadb

# Create a client (in-memory by default)
client = chromadb.Client()

# Or persist to disk
client = chromadb.PersistentClient(path="./chroma_data")

# Create a collection
collection = client.create_collection(
    name="support_docs",
    metadata={"hnsw:space": "cosine"},  # similarity metric
)

# Add documents -- Chroma generates embeddings automatically
collection.add(
    documents=[
        "To reset your password, go to Settings > Security > Change Password",
        "You can update your email address in your profile settings",
        "Two-factor authentication can be enabled in Settings > Security",
        "To delete your account, contact support at help@example.com",
        "Billing information is managed in Settings > Billing",
    ],
    ids=["doc1", "doc2", "doc3", "doc4", "doc5"],
    metadatas=[
        {"category": "account"},
        {"category": "account"},
        {"category": "security"},
        {"category": "account"},
        {"category": "billing"},
    ],
)

# Query -- find documents similar to a question
results = collection.query(
    query_texts=["I forgot my login credentials"],
    n_results=3,
)

print(results["documents"])
# [['To reset your password, go to Settings > Security > Change Password',
# 'Two-factor authentication can be enabled in Settings > Security',
# 'You can update your email address in your profile settings']]

print(results["distances"])
# [[0.234, 0.567, 0.789]] # Lower = more similar with cosine

Notice that "I forgot my login credentials" matched the password reset document even though they share no keywords. That's semantic search.

Filtering with metadata:

results = collection.query(
    query_texts=["how do I change settings"],
    n_results=3,
    where={"category": "security"},  # Only search security docs
)

When to use Chroma:
  • Prototyping and development
  • Small to medium datasets (up to a few million vectors)
  • Applications where the database runs alongside your app
  • When you don't want to manage infrastructure

Pinecone: Managed and Scalable

Pinecone is a fully managed vector database. You don't run any infrastructure -- you get an API endpoint and it handles indexing, scaling, and replication.

pip install pinecone-client openai

from pinecone import Pinecone, ServerlessSpec
from openai import OpenAI

# Initialize Pinecone
pc = Pinecone(api_key="your-pinecone-api-key")

# Create an index
pc.create_index(
    name="support-docs",
    dimension=1536,  # Must match your embedding model's output
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("support-docs")

# Generate embeddings with OpenAI
openai_client = OpenAI()

documents = [
    "To reset your password, go to Settings > Security > Change Password",
    "You can update your email address in your profile settings",
    "Two-factor authentication can be enabled in Settings > Security",
    "To delete your account, contact support at help@example.com",
    "Billing information is managed in Settings > Billing",
]

# Generate embeddings for all documents
response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=documents,
)

# Upsert vectors with metadata
vectors = [
    {
        "id": f"doc{i+1}",
        "values": response.data[i].embedding,
        "metadata": {
            "text": doc,
            "category": category,
        },
    }
    for i, (doc, category) in enumerate(
        zip(documents, ["account", "account", "security", "account", "billing"])
    )
]

index.upsert(vectors=vectors)

# Query
query_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input="I forgot my login credentials",
).data[0].embedding

results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True,
)

for match in results.matches:
    print(f"Score: {match.score:.3f} - {match.metadata['text']}")

Pinecone-specific features:

# Namespace isolation (like tables within an index)
index.upsert(vectors=vectors, namespace="english")
index.upsert(vectors=vectors_fr, namespace="french")

# Query specific namespace
results = index.query(
    vector=query_embedding,
    top_k=5,
    namespace="english",
)

# Metadata filtering
results = index.query(
    vector=query_embedding,
    top_k=5,
    filter={"category": {"$eq": "security"}},
)

# Delete vectors
index.delete(ids=["doc1", "doc2"])

When to use Pinecone:
  • Production applications with high query volume
  • When you don't want to manage infrastructure
  • Large-scale datasets (billions of vectors)
  • When you need guaranteed uptime and performance SLAs

Weaviate: Self-Hosted with Superpowers

Weaviate is an open-source vector database that you can self-host or use as a managed service. It has built-in vectorization (it can call embedding models for you) and supports hybrid search (combining vector and keyword search).

pip install weaviate-client

Run Weaviate locally with Docker:

docker run -d \
  -p 8080:8080 \
  -p 50051:50051 \
  --name weaviate \
  -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
  -e DEFAULT_VECTORIZER_MODULE=text2vec-openai \
  -e OPENAI_APIKEY=your-openai-key \
  cr.weaviate.io/semitechnologies/weaviate:latest

import weaviate
import weaviate.classes as wvc

# Connect to local Weaviate
client = weaviate.connect_to_local()

# Create a collection with auto-vectorization
collection = client.collections.create(
    name="SupportDoc",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small",
    ),
    properties=[
        wvc.config.Property(
            name="content",
            data_type=wvc.config.DataType.TEXT,
        ),
        wvc.config.Property(
            name="category",
            data_type=wvc.config.DataType.TEXT,
        ),
    ],
)

# Add documents -- Weaviate generates embeddings automatically
support_docs = client.collections.get("SupportDoc")

with support_docs.batch.dynamic() as batch:
    for doc, category in [
        ("To reset your password, go to Settings > Security > Change Password", "account"),
        ("You can update your email address in your profile settings", "account"),
        ("Two-factor authentication can be enabled in Settings > Security", "security"),
    ]:
        batch.add_object(
            properties={"content": doc, "category": category},
        )

# Semantic search
results = support_docs.query.near_text(
    query="I forgot my login credentials",
    limit=3,
)

for obj in results.objects:
    print(f"{obj.properties['content']}")

# Hybrid search (combines vector + keyword search)
results = support_docs.query.hybrid(
    query="password reset settings",
    limit=3,
    alpha=0.5,  # 0 = pure keyword, 1 = pure vector
)

# Filtered search
results = support_docs.query.near_text(
    query="security settings",
    limit=3,
    filters=wvc.query.Filter.by_property("category").equal("security"),
)

client.close()

When to use Weaviate:
  • You want to self-host (data sovereignty, compliance)
  • You need hybrid search (vector + keyword)
  • You want built-in vectorization without managing embedding calls
  • GraphQL API is a bonus for your stack
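To build intuition for what hybrid search's alpha parameter does, here is a simplified score-fusion sketch. The `hybrid_fuse` helper is hypothetical, not Weaviate's exact algorithm; the idea is just normalizing two ranked result sets and blending them:

```python
def hybrid_fuse(vector_hits: dict, keyword_hits: dict, alpha: float = 0.5) -> list:
    """Blend two result sets: alpha=1 is pure vector, alpha=0 is pure keyword."""
    def normalize(hits):
        # Min-max normalize raw scores into [0, 1] so the two sets are comparable
        lo, hi = min(hits.values()), max(hits.values())
        return {k: (v - lo) / (hi - lo) if hi > lo else 1.0 for k, v in hits.items()}

    v, kw = normalize(vector_hits), normalize(keyword_hits)
    doc_ids = set(v) | set(kw)
    return sorted(
        doc_ids,
        key=lambda d: alpha * v.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0),
        reverse=True,
    )

# "doc1" ranks highest on vector similarity, "doc3" on keyword overlap
vector_hits = {"doc1": 0.92, "doc2": 0.85, "doc3": 0.40}
keyword_hits = {"doc3": 11.2, "doc2": 7.5}  # e.g. BM25 scores

print(hybrid_fuse(vector_hits, keyword_hits, alpha=1.0)[0])  # doc1 (pure vector)
print(hybrid_fuse(vector_hits, keyword_hits, alpha=0.0)[0])  # doc3 (pure keyword)
```

Tuning alpha lets you rescue exact-keyword queries (error codes, product names) that pure vector search can miss.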

Building a Semantic Search App

Let's build something real: a documentation search engine using Chroma.

# search_app.py
import chromadb
from openai import OpenAI

# Initialize
chroma = chromadb.PersistentClient(path="./search_db")
openai_client = OpenAI()

collection = chroma.get_or_create_collection(
    name="documentation",
    metadata={"hnsw:space": "cosine"},
)

def index_documents(documents: list[dict]):
    """Index a batch of documents.

    Each document should have: id, title, content, url
    """
    # Generate embeddings for the content
    texts = [f"{doc['title']}\n{doc['content']}" for doc in documents]

    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )

    collection.add(
        ids=[doc["id"] for doc in documents],
        embeddings=[item.embedding for item in response.data],
        documents=texts,
        metadatas=[
            {"title": doc["title"], "url": doc["url"]}
            for doc in documents
        ],
    )
    print(f"Indexed {len(documents)} documents")

def search(query: str, n_results: int = 5) -> list[dict]:
    """Search documents by semantic similarity."""
    # Generate embedding for the query
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
    )
    query_embedding = response.data[0].embedding

    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results,
    )

    matches = []
    for i in range(len(results["ids"][0])):
        matches.append({
            "id": results["ids"][0][i],
            "title": results["metadatas"][0][i]["title"],
            "url": results["metadatas"][0][i]["url"],
            "distance": results["distances"][0][i],
            "snippet": results["documents"][0][i][:200],
        })

    return matches

# Index some documentation
index_documents([
    {
        "id": "auth-1",
        "title": "Setting Up Authentication",
        "content": "This guide covers JWT-based authentication. First, install jsonwebtoken...",
        "url": "/docs/auth/setup",
    },
    {
        "id": "auth-2",
        "title": "OAuth2 Integration",
        "content": "To add Google or GitHub login, configure OAuth2 providers...",
        "url": "/docs/auth/oauth",
    },
    {
        "id": "deploy-1",
        "title": "Deploying to Production",
        "content": "Deploy your application using Docker. Create a Dockerfile...",
        "url": "/docs/deploy/production",
    },
])

# Search
results = search("how do I add social login to my app")
for r in results:
    print(f"[{r['distance']:.3f}] {r['title']} - {r['url']}")
    print(f"  {r['snippet']}...")
    print()

RAG: Retrieval-Augmented Generation

The killer use case for vector databases right now is RAG. Instead of hoping a language model knows the answer, you retrieve relevant documents and include them in the prompt.

def ask(question: str) -> str:
    """Answer a question using RAG."""
    # Step 1: Retrieve relevant documents
    results = search(question, n_results=3)

    # Step 2: Build context from retrieved documents
    context = "\n\n".join([
        f"## {r['title']}\n{r['snippet']}"
        for r in results
    ])

    # Step 3: Ask the LLM with context
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a helpful documentation assistant. "
                    "Answer the user's question based on the provided context. "
                    "If the context doesn't contain the answer, say so."
                ),
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}",
            },
        ],
    )

    return response.choices[0].message.content

answer = ask("How do I add Google login?")
print(answer)

RAG is why ChatGPT can answer questions about your specific documentation, your company's knowledge base, or your codebase. The vector database is the retrieval layer that makes it work.

Choosing the Right One

Feature                  Chroma                   Pinecone             Weaviate
Setup complexity         Minimal                  None (managed)       Moderate
Self-hosted              Yes                      No                   Yes
Managed option           Chroma Cloud             Yes (primary)        Weaviate Cloud
Max scale                Millions                 Billions             Billions
Built-in vectorization   Yes (basic)              No                   Yes (extensive)
Hybrid search            No                       No                   Yes
Cost                     Free (self-host)         Pay per usage        Free (self-host)
Best for                 Prototyping, small apps  Production at scale  Self-hosted production
Start with Chroma if you're prototyping or building something small. The in-process model means zero infrastructure to manage.

Use Pinecone if you need production reliability without managing servers. The managed service handles scaling, backups, and uptime.

Choose Weaviate if you need self-hosting (data privacy requirements), hybrid search, or built-in vectorization across multiple modalities.

Indexing Strategies

The index type affects query speed and accuracy:

  • HNSW (Hierarchical Navigable Small World): best all-around. Fast queries, high recall, higher memory usage. Default in most vector databases.
  • IVF (Inverted File Index): lower memory usage, slightly lower recall. Good for very large datasets where memory is a concern.
  • Flat: exact nearest neighbor search. Perfect accuracy but slow at scale. Use for datasets under 100K vectors.
Most of the time, HNSW with default parameters is the right choice. Tune only when you have specific performance requirements.
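To make the accuracy/speed trade concrete, here's a toy IVF-style index in NumPy. This is a teaching sketch, not production code: real IVF implementations train their centroids with k-means, while this one just samples them:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.standard_normal((5000, 64)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

# "Train" the index: pick centroids and assign each vector to its nearest one
n_lists = 50
centroids = docs[rng.choice(len(docs), n_lists, replace=False)]
assignments = np.argmax(docs @ centroids.T, axis=1)

def ivf_search(query, n_probe=5, k=10):
    """Scan only the n_probe clusters nearest the query, not the whole corpus."""
    probe = np.argsort(-(centroids @ query))[:n_probe]
    candidates = np.flatnonzero(np.isin(assignments, probe))
    scores = docs[candidates] @ query
    return candidates[np.argsort(-scores)[:k]]

# Recall: how many of the true top-10 did the approximate search find?
query = docs[123]
approx = set(ivf_search(query))
exact = set(np.argsort(-(docs @ query))[:10])
recall = len(approx & exact) / 10
print(f"recall@10 = {recall:.2f}")  # raise n_probe to trade speed for recall
```

The `n_probe` knob here plays the same role as HNSW's `ef` search parameter: more candidates scanned, higher recall, slower queries.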

Common Mistakes

Using the wrong embedding model for your data. A model trained on English text won't produce good embeddings for code or non-English languages. Choose a model that matches your content type.

Not chunking documents properly. Embedding an entire 10-page document as one vector loses detail. Split documents into meaningful chunks (paragraphs, sections) of 200-500 tokens each. Overlap chunks by 50-100 tokens to preserve context at boundaries.

Ignoring metadata filtering. Don't rely solely on vector similarity. Use metadata filters to narrow the search space (by category, date, user, etc.) before the vector search.

Not evaluating retrieval quality. Build a test set of queries and expected results. Measure recall (did the relevant documents appear?) and precision (how much noise was there?). Without evaluation, you're guessing.
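The chunking advice can be sketched as a small helper. This hypothetical version counts words for simplicity; real pipelines usually count tokens with the embedding model's tokenizer:

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks, overlapping to preserve boundary context."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # each chunk starts this many words after the last
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk reached the end; skip tiny trailing leftovers
    return chunks

doc = " ".join(f"word{i}" for i in range(1000))
chunks = chunk_text(doc, chunk_size=300, overlap=50)
print(len(chunks))               # 4 chunks
print(chunks[1].split()[0])      # word250: chunk 2 repeats chunk 1's last 50 words
```

Each chunk is then embedded and stored as its own vector, with metadata pointing back to the source document.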

What's Next

Vector databases are evolving fast. Keep an eye on:

  • Multimodal embeddings that represent text, images, and audio in the same vector space
  • Sparse-dense hybrid approaches that combine traditional search with vector search
  • Quantization techniques that reduce memory usage by 4-8x with minimal accuracy loss
  • Serverless vector databases that scale to zero when unused
The field is moving toward making vector search a standard feature of every database, not a separate product. PostgreSQL with pgvector, MongoDB Atlas Vector Search, and Elasticsearch's vector capabilities are all heading in this direction.
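To see where the memory savings in quantization come from, here's a minimal scalar-quantization sketch: each float32 becomes one uint8 bucket. Real databases use more elaborate schemes (product quantization, binary quantization), but the trade is the same:

```python
import numpy as np

rng = np.random.default_rng(1)
vec = rng.standard_normal(1536).astype(np.float32)  # one float32 embedding

# Scalar quantization: map each float32 to one of 256 uint8 buckets
lo, hi = float(vec.min()), float(vec.max())
scale = (hi - lo) / 255
quantized = np.round((vec - lo) / scale).astype(np.uint8)

# Dequantize to approximate the original vector
restored = quantized.astype(np.float32) * scale + lo

print(vec.nbytes, "->", quantized.nbytes)   # 6144 -> 1536 bytes: 4x smaller
print(float(np.abs(vec - restored).max()))  # worst-case error is about scale / 2
```

Distances computed on the dequantized vectors are slightly off, which is where the "minimal accuracy loss" caveat comes from.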

For more AI and backend development tutorials, check out CodeUp.
