Vector Databases Are Not All the Same Problem
The vector database market has consolidated considerably since the RAG explosion of 2023. Pinecone, Weaviate, Qdrant, and pgvector are now the four most commonly deployed solutions, but they represent genuinely different architectural tradeoffs — not just different pricing. Choosing the wrong one for your workload costs more than money: it costs the months you spend after launch discovering that your performance assumptions were wrong. This guide compares these four options with concrete benchmarks, configuration examples, and a decision framework based on the workload characteristics that actually differentiate them.
What Vector Databases Actually Do
A vector database stores high-dimensional numerical vectors (embeddings) and answers approximate nearest neighbor (ANN) queries: given a query vector, find the K most similar vectors in the database. The “approximate” part is important — exact nearest neighbor search across millions of vectors is computationally prohibitive, so all production vector databases use ANN algorithms that trade a small amount of recall for orders-of-magnitude speedup.
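To make the tradeoff concrete, here is what exact (non-approximate) nearest-neighbor search looks like in NumPy: a full scan that scores every stored vector against the query. This is the computation every ANN index approximates, and the reason exact search becomes prohibitive at millions of high-dimensional vectors.

```python
import numpy as np

def exact_knn(query: np.ndarray, vectors: np.ndarray, k: int) -> np.ndarray:
    """Exact K-nearest-neighbor search by cosine similarity.

    O(n * d) per query: this full scan is what ANN indexes approximate.
    """
    # Normalize so that a dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    similarities = v @ q                   # one score per stored vector
    return np.argsort(-similarities)[:k]   # indices of the top-k matches

rng = np.random.default_rng(42)
db = rng.normal(size=(10_000, 128))        # 10k vectors, 128 dimensions
top = exact_knn(db[0], db, k=5)
assert top[0] == 0                         # a vector's nearest neighbor is itself
```

At 10k vectors this runs in milliseconds; the cost grows linearly with collection size, which is what makes the ANN recall-for-speed trade worthwhile.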
The core ANN algorithms used by these databases:
- HNSW (Hierarchical Navigable Small World): A graph-based index that navigates a multi-layer proximity graph to find approximate neighbors efficiently. Excellent query speed, high memory usage, fast insert/update. Used by Qdrant, Weaviate, and pgvector.
- IVF (Inverted File Index): Divides the vector space into clusters (Voronoi cells) and searches only the nearest clusters during a query. Lower memory than HNSW, slightly slower queries, better for datasets that change infrequently. Used by pgvector (as IVFFlat); Pinecone does not publicly document its index internals.
- DiskANN: A disk-based graph index, originally from Microsoft Research, for datasets too large to fit in memory. It significantly reduces memory requirements at a moderate query-speed cost. (Qdrant addresses the same problem with on-disk vector storage plus quantization rather than DiskANN itself.)
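A toy sketch of the IVF idea in NumPy (illustrative only; this is not any of these databases' actual implementation, and real systems train centroids with k-means): vectors are bucketed by their nearest centroid at index time, and a query probes only the `nprobe` closest cells instead of scanning everything.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(5_000, 64))

# "Train" the index: pick centroids (real systems run k-means) and
# assign every vector to its nearest cell.
n_cells, nprobe = 32, 4
centroids = data[rng.choice(len(data), n_cells, replace=False)]
assignments = np.argmin(
    np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2), axis=1
)

def ivf_search(query: np.ndarray, k: int = 10) -> np.ndarray:
    # Probe only the nprobe nearest cells, not the whole dataset
    cell_order = np.argsort(np.linalg.norm(centroids - query, axis=1))
    candidates = np.flatnonzero(np.isin(assignments, cell_order[:nprobe]))
    dists = np.linalg.norm(data[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]

query = data[123]
approx = ivf_search(query)
assert 123 in approx   # the query's own cell is always among those probed
```

The recall/speed knob is `nprobe`: more probed cells means better recall and more work, which is the same shape of tradeoff `ef_search`-style parameters control in HNSW.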
pgvector: When You Already Have PostgreSQL
pgvector is a PostgreSQL extension that adds vector storage and similarity search to your existing database. It is not a specialized vector database — it is vector search integrated into a relational database. This distinction matters enormously for deciding when to use it.
-- Install and enable pgvector
CREATE EXTENSION IF NOT EXISTS vector;
-- Create a table with a vector column
-- 1536 dimensions = OpenAI text-embedding-3-small output
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
metadata JSONB NOT NULL DEFAULT '{}',
embedding vector(1536),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Create an HNSW index for approximate nearest neighbor search
-- m: number of connections per node (higher = better recall, more memory)
-- ef_construction: size of candidate list during construction (higher = better index quality)
CREATE INDEX documents_embedding_hnsw_idx ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Query: find the 10 most similar documents to a query embedding
-- Replace '[0.1, 0.2, ...]' with an actual embedding vector from your embedding model
-- Optionally raise the search-time candidate list for better recall
-- (pgvector's default is 40): SET hnsw.ef_search = 100;
SELECT
id,
content,
metadata,
1 - (embedding <=> '[0.1, 0.2, 0.3]'::vector) AS cosine_similarity
FROM documents
WHERE metadata->>'category' = 'technical' -- Combine with standard SQL filters
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector
LIMIT 10;
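For intuition, the `<=>` operator returns cosine distance, which is why the query subtracts it from 1 to report similarity. The relationship in NumPy terms:

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """What pgvector's <=> operator computes: 1 - cosine similarity."""
    cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos_sim

a = np.array([1.0, 0.0])
b = np.array([2.0, 0.0])   # same direction, different magnitude
c = np.array([0.0, 1.0])   # orthogonal

assert np.isclose(cosine_distance(a, b), 0.0)  # identical direction: distance 0
assert np.isclose(cosine_distance(a, c), 1.0)  # orthogonal: distance 1
```

Distance ranges from 0 (same direction) to 2 (opposite direction), so `ORDER BY embedding <=> query` ascending returns the most similar rows first.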
The critical advantage of pgvector is that your vector data lives in the same database as your relational data. You can filter, join, and aggregate in a single query without cross-system coordination. The critical limitation is that pgvector does not scale to Pinecone-level query volumes or dataset sizes on commodity hardware — HNSW indexes for millions of vectors require significant memory, and PostgreSQL’s query planner does not always choose the vector index efficiently when filters are selective.
When to use pgvector: Your dataset is under 5M vectors, you already run PostgreSQL, you need to join vector search results with relational data frequently, and you do not have high-volume concurrent similarity queries.
Qdrant: Purpose-Built for Production Performance
Qdrant is an open-source vector database written in Rust, designed for high-performance similarity search in production environments. It builds on HNSW, adds optional on-disk vector storage for datasets that exceed RAM, supports quantization for memory reduction, and provides a rich filtering system that applies filters efficiently during (not after) the ANN search.
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct,
    Filter, FieldCondition, MatchValue, Range,
    HnswConfigDiff, ScalarQuantization, ScalarQuantizationConfig, ScalarType,
)
client = QdrantClient(host="localhost", port=6333)
# Create a collection with HNSW configuration.
# Note: hnsw_config and quantization_config are top-level arguments to
# create_collection, not fields of VectorParams.
client.create_collection(
    collection_name="tech_articles",
    vectors_config=VectorParams(
        size=1536,
        distance=Distance.COSINE,
    ),
    hnsw_config=HnswConfigDiff(
        m=16,                       # connections per node: higher = better recall, more memory
        ef_construct=100,           # candidate list size at build time: higher = better index quality
        full_scan_threshold=10_000  # small segments fall back to exact full scan
    ),
    # Scalar quantization: compress vectors from float32 to int8
    # ~4x memory reduction with minimal quality loss
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            quantile=0.99,
            always_ram=True         # keep quantized vectors in RAM
        )
    )
)
# Insert vectors with payload (metadata)
client.upsert(
collection_name="tech_articles",
points=[
PointStruct(
id=1,
vector=[0.1, 0.2, ...], # 1536-dimensional embedding
payload={
"title": "Container Networking Deep Dive",
"category": "infrastructure",
"published_year": 2026,
"word_count": 2100
}
)
]
)
# Search with pre-filtering (filter applied during ANN, not after)
# This is Qdrant's key advantage over pgvector for filtered search
results = client.search(
collection_name="tech_articles",
query_vector=[0.15, 0.25, ...],
query_filter=Filter(
must=[
FieldCondition(
key="category",
match=MatchValue(value="infrastructure")
),
FieldCondition(
key="published_year",
range=Range(gte=2025)
)
]
),
limit=10,
with_payload=True,
search_params={"hnsw_ef": 128} # Increase for better recall at query time
)
Qdrant’s pre-filtering is a meaningful technical differentiator. When you apply a filter after ANN search, you retrieve more vectors than needed and discard many — which requires a larger initial search to guarantee finding enough matching results. Qdrant builds the filter into the graph traversal, which maintains recall quality without inflating the search scope.
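A back-of-the-envelope simulation makes the over-fetch cost concrete (the 5% selectivity figure below is hypothetical): to end up with 10 filtered results from a post-filtering search, you must fetch roughly 10 / 0.05 = 200 ANN candidates.

```python
# Why post-filtering forces over-fetch: with a filter that only 5% of
# vectors pass, a top-10 search must retrieve ~10 / 0.05 = 200 candidates.
n, k = 100_000, 10
selectivity = 0.05
passes_filter = [i % 20 == 0 for i in range(n)]   # deterministic 5% pass rate

def post_filter_survivors(fetch_k: int) -> int:
    # Post-filtering: take the top fetch_k ANN results, then apply the filter.
    # ANN rank order is unrelated to the filter here, so ids 0..fetch_k-1 stand in.
    return sum(passes_filter[:fetch_k])

assert post_filter_survivors(k) == 1                       # top-10 yields just 1 match
assert post_filter_survivors(round(k / selectivity)) == k  # 200 candidates yield 10
```

The 20x over-fetch grows as filters get more selective, which is exactly the regime where filter-aware graph traversal pays off.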
When to use Qdrant: You need a purpose-built vector database, you have strict memory budgets (quantization helps), you need complex metadata filtering that should not sacrifice recall, and you want self-hostable open source with strong production tooling.
Weaviate: Vector Search with a Knowledge Graph Layer
Weaviate positions itself as an “AI-native database” combining vector search with a graph-like object model and built-in ML model integrations. It can automatically vectorize objects at ingest time using configured vectorizers (OpenAI, Cohere, Hugging Face transformers) without requiring pre-generated embeddings.
import weaviate
from weaviate.classes.config import Configure, Property, DataType
from weaviate.classes.query import MetadataQuery
client = weaviate.connect_to_local()
# Create a collection with auto-vectorization
# Weaviate calls the embedding model on ingest — no pre-embedding needed
articles = client.collections.create(
name="TechArticle",
vectorizer_config=Configure.Vectorizer.text2vec_openai(
model="text-embedding-3-small"
),
properties=[
Property(name="title", data_type=DataType.TEXT),
Property(name="content", data_type=DataType.TEXT),
Property(name="category", data_type=DataType.TEXT),
Property(name="wordCount", data_type=DataType.INT),
]
)
# Insert — Weaviate automatically embeds the object's text properties
articles.data.insert({
"title": "Zero Trust Architecture in Practice",
"content": "Walk into any enterprise security conversation...",
"category": "security",
"wordCount": 2100
})
# Semantic search
results = articles.query.near_text(
query="container networking kubernetes",
limit=5,
filters=weaviate.classes.query.Filter.by_property("category").equal("infrastructure"),
return_metadata=MetadataQuery(distance=True, score=True)
)
for obj in results.objects:
print(f"{obj.properties['title']} — distance: {obj.metadata.distance:.3f}")
Weaviate also supports hybrid search — combining vector similarity with BM25 keyword search in a single query, weighted by an alpha parameter. This is valuable for search applications where both semantic similarity and keyword matching matter.
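The alpha blend can be sketched as a convex combination of normalized score lists. This is a simplification for intuition, not Weaviate's exact algorithm (Weaviate offers ranked and relative-score fusion modes):

```python
def hybrid_scores(vector_scores, bm25_scores, alpha=0.5):
    """Blend vector and keyword scores, relative-score-fusion style.

    alpha=1.0 gives pure vector search; alpha=0.0 gives pure BM25.
    Simplified sketch of the idea, not Weaviate's exact implementation.
    """
    def normalize(scores):
        lo, hi = min(scores), max(scores)
        return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

    v, b = normalize(vector_scores), normalize(bm25_scores)
    return [alpha * vi + (1 - alpha) * bi for vi, bi in zip(v, b)]

# Three documents: doc0 is strongest semantically, doc2 on keywords,
# and doc1 is strong on both.
blended = hybrid_scores([0.9, 0.8, 0.1], [2.0, 8.0, 9.0], alpha=0.5)
assert blended.index(max(blended)) == 1   # the doc strong on both wins
```

Normalization matters because BM25 scores and cosine similarities live on different scales; blending raw scores would let one signal silently dominate.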
When to use Weaviate: You want built-in embedding model integration (no separate embedding pipeline), you need hybrid vector + keyword search, or your use case benefits from the object graph model with references between objects.
Pinecone: Managed Scale Without Operational Overhead
Pinecone is a fully managed vector database as a service. There is no infrastructure to manage, no index to tune, and no version to upgrade. The tradeoff is cost at scale and the lack of control over the underlying implementation.
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="YOUR_API_KEY")
# Create a serverless index (no hardware provisioning required)
pc.create_index(
name="tech-articles",
dimension=1536,
metric="cosine",
spec=ServerlessSpec(
cloud="aws",
region="us-east-1"
)
)
index = pc.Index("tech-articles")
# Upsert vectors with metadata
index.upsert(
vectors=[
{
"id": "article-001",
"values": [0.1, 0.2, ...], # 1536-dim embedding
"metadata": {
"title": "Zero Trust Architecture",
"category": "security",
"year": 2026
}
}
],
namespace="production"
)
# Query
results = index.query(
vector=[0.15, 0.25, ...],
filter={"category": {"$eq": "security"}},
top_k=10,
include_metadata=True,
namespace="production"
)
Pinecone’s serverless tier has dramatically improved its cost structure. The old pod-based pricing made it expensive for small workloads; serverless pricing is usage-based (billed per read/write unit plus per-GB storage) and competitive for moderate workloads, though the rates change often enough that you should check current pricing. At very high query volumes (10M+ queries/day), the economics shift and self-hosting Qdrant typically wins.
When to use Pinecone: You want zero infrastructure management, your team does not have the capacity to operate a vector database, you are building an early-stage product and want to defer infrastructure decisions, or you need Pinecone’s SLA and support contracts for enterprise requirements.
Performance Comparison
The ann-benchmarks project provides standardized benchmarks, but production performance depends heavily on your specific workload. Key variables: vector dimensionality, dataset size, filter selectivity, and hardware. Based on commonly reported benchmarks for 1M-vector 1536-dimension datasets on typical cloud hardware:
- Query latency (p99): Qdrant and Pinecone typically achieve under 10ms for unfiltered queries. pgvector on equivalent hardware runs 20–50ms for HNSW queries, and slower still with IVFFlat.
- Memory efficiency: Qdrant with int8 scalar quantization uses ~4x less memory than pgvector’s float32 HNSW index. Critical at scale.
- Filtered query performance: Qdrant’s pre-filtering maintains recall under high filter selectivity where pgvector’s post-filtering approach degrades.
- Throughput: For sustained write-heavy workloads, Qdrant’s Rust implementation handles higher concurrent insertion rates than Weaviate’s JVM-based architecture.
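The memory claim above is easy to sanity-check: int8 scalar quantization stores one byte per dimension instead of four. A generic sketch of the technique (not Qdrant's exact implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
vectors = rng.normal(size=(1_000, 1536)).astype(np.float32)

# Scalar quantization: map each float32 component to a single int8 bucket.
# Clipping at high/low quantiles (cf. quantile=0.99 earlier) bounds outliers.
lo, hi = np.quantile(vectors, 0.01), np.quantile(vectors, 0.99)
scale = (hi - lo) / 255.0
quantized = np.clip(np.round((vectors - lo) / scale) - 128, -128, 127).astype(np.int8)
restored = (quantized.astype(np.float32) + 128) * scale + lo

assert vectors.nbytes == 4 * quantized.nbytes     # exactly 4x smaller
assert np.abs(vectors - restored).mean() < 0.05   # small reconstruction error
```

The reconstruction error is what "minimal quality loss" refers to: distances computed on quantized vectors are slightly noisy, which is why systems often rescore a quantized candidate set against the original float32 vectors.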
The Decision Matrix
- Small dataset (<5M vectors) + existing PostgreSQL + complex SQL joins: pgvector. The operational simplicity outweighs the performance gap at this scale.
- Medium-to-large dataset + complex metadata filtering + self-hosted: Qdrant. Best performance-to-operational-cost ratio for teams who can run containers.
- Need built-in embedding pipeline + hybrid search: Weaviate. Reduces pipeline complexity for teams that do not want to manage embedding generation separately.
- Managed, zero-ops, early-stage product: Pinecone serverless. Pay for convenience while you focus on product-market fit.
Key Takeaways
- pgvector is the right choice when your dataset fits comfortably in PostgreSQL and you need to combine vector search with relational queries. It is not a high-performance vector database.
- Qdrant’s pre-filtering architecture maintains recall quality under selective metadata filters — a meaningful advantage over post-filtering approaches at scale.
- Weaviate’s built-in vectorizer integration removes the embedding pipeline from your architecture, reducing complexity at the cost of less control over the embedding process.
- Pinecone serverless is competitively priced for moderate workloads and eliminates all infrastructure management — the right choice for early-stage products or teams without vector database operations expertise.
- At high query volumes (10M+ queries/day), self-hosted Qdrant typically has better economics than managed Pinecone. Run the numbers for your specific workload before assuming managed is more expensive.
