Vector Databases for Product Search: pgvector, Pinecone, and Weaviate Compared
Choosing a vector database for your product search system? Practical comparison of pgvector (self-hosted), Pinecone (managed), and Weaviate (open source). Trade-offs, benchmarks, and when to use each.
A vector database is the retrieval layer of a RAG system — it stores embeddings of your documents and retrieves the most relevant documents for a given query. The choice of vector database affects latency, cost, operational complexity, and what advanced features (filtering, re-ranking) are available to you.
This article evaluates three popular options for B2B product search, with real-world trade-offs and implementation guidance.
Quick Comparison Table
| Factor | pgvector | Pinecone | Weaviate |
|---|---|---|---|
| Type | Self-hosted (PostgreSQL ext) | Managed service | Open source / managed |
| Cost Model | Infra only | Per-million queries + storage | Infra + operational overhead |
| Setup Time | 1–2 hours | 5 minutes | 2–4 hours (self-hosted) |
| Latency | <50ms (p50) | 50–150ms (p50) | 20–100ms (p50) |
| Scaling | Manual (add PostgreSQL replicas) | Automatic | Manual (Kubernetes) |
| Metadata Filtering | Full SQL support | Scoped to metadata fields | GraphQL filters |
| Hybrid Search | tsvector full-text (no native BM25) | Sparse-dense vectors (client-side BM25 encoding) | Hybrid queries native |
| Best For | Small–medium catalogs, PostgreSQL users | Managed simplicity, high traffic | Advanced filtering, custom deployments |
pgvector: Self-Hosted, Postgres-Native
pgvector is an extension for PostgreSQL that adds a vector data type and approximate nearest-neighbor (ANN) search via IVFFlat and HNSW indexes. If you're already running Postgres for your application database, adding pgvector is straightforward.
Setup and Integration
```sql
-- Install pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create embeddings table
CREATE TABLE product_embeddings (
    id SERIAL PRIMARY KEY,
    product_id VARCHAR(255) UNIQUE NOT NULL,
    embedding vector(1536),  -- For text-embedding-3-small
    chunk_content TEXT,
    metadata JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Create index for ANN search (HNSW)
CREATE INDEX ON product_embeddings USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```

Query:

```sql
-- $1 = query embedding parameter
SELECT
    product_id,
    chunk_content,
    1 - (embedding <=> $1) AS similarity
FROM product_embeddings
WHERE metadata->>'category' = 'valves'  -- Metadata filtering
ORDER BY embedding <=> $1               -- Cosine distance
LIMIT 10;
```

Advantages
Cost: No per-query fees. You pay for your Postgres infrastructure, period. For a product catalog with 100K products generating 10K queries/day, that can mean saving hundreds to thousands of dollars per month versus a managed service.
Metadata Filtering: Full SQL expressiveness. Filter on nested JSON, date ranges, numeric comparisons — anything SQL can express.
Hybrid Search: Combine vector similarity with PostgreSQL's full-text search (using tsvector) in a single query. BM25 scoring is achievable with additional extensions.
```sql
-- Hybrid search: semantic + full-text ($1 = query embedding, $2 = query text)
SELECT
    product_id,
    COALESCE(v.similarity, 0) * 0.7 + COALESCE(f.rank, 0) * 0.3 AS combined_score
FROM (
    SELECT product_id, 1 - (embedding <=> $1) AS similarity
    FROM product_embeddings
    ORDER BY embedding <=> $1  -- take the 100 nearest neighbors, not arbitrary rows
    LIMIT 100
) v
LEFT JOIN (
    SELECT product_id, ts_rank(fts_index, plainto_tsquery($2)) AS rank
    FROM product_fts
    WHERE fts_index @@ plainto_tsquery($2)
) f USING (product_id)
ORDER BY combined_score DESC
LIMIT 10;
```

Operational Control: Data stays in your infrastructure. No vendor lock-in, no API rate limits, no network calls for every query.
Disadvantages
Scaling Complexity: pgvector works well for catalogs up to ~10M vectors on a single Postgres instance (depending on hardware). Beyond that, you need sharding or read replicas, which adds operational complexity. Pinecone handles this transparently.
Latency: A single Postgres instance degrades under concurrent load sooner than a managed service. A well-tuned pgvector setup can answer in under 50ms at p50, but tail latency under heavy concurrency is less predictable than Pinecone's.
Maintenance Burden: You're responsible for backups, upgrades, security patches, replication setup, and disaster recovery. Pinecone handles all of this.
ANN Quality Trade-offs: HNSW indexes in pgvector have tunable parameters (m and ef_construction at build time, hnsw.ef_search at query time). Tuning these for your workload takes empirical testing. Pinecone abstracts this away.
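The query-time knob is the cheapest place to start, since it needs no index rebuild. A minimal sketch (pgvector 0.5.0+; the value shown is illustrative, not a recommendation):

```sql
-- Trade recall for latency at query time (session-level setting)
SET hnsw.ef_search = 100;  -- default is 40; higher = better recall, slower queries
```

Measure recall against an exact-search baseline at a few ef_search values before touching the build-time parameters.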
When to Use pgvector
- Existing Postgres users: If your application already runs on Postgres, adding pgvector to the same database is minimal friction.
- Small–medium catalogs: Up to ~5–10 million vectors, depending on query QPS.
- Cost-sensitive: You need to handle millions of queries and can't justify Pinecone's per-query pricing.
- Custom filtering: Your retrieval logic requires complex SQL filters that don't map cleanly to Pinecone's metadata field syntax.
Operational Checklist
- [ ] PostgreSQL 13+ (required by current pgvector releases; HNSW needs pgvector 0.5.0+)
- [ ] Install pgvector extension
- [ ] Plan embedding dimension (1536 for text-embedding-3-small)
- [ ] Create HNSW index with tuned m and ef_construction
- [ ] Implement batch ingestion for embeddings
- [ ] Set up connection pooling (PgBouncer) for high-concurrency reads
- [ ] Monitor query latency, set up alerting for slow queries
- [ ] Plan replication/backup strategy
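The batch-ingestion item above mostly comes down to splitting upserts into fixed-size groups so transactions stay short and memory stays bounded. A minimal sketch (the `batched` helper and the row shape are assumptions for illustration; with psycopg2 you would hand each batch to `psycopg2.extras.execute_values`):

```python
# Batching sketch for embedding ingestion into pgvector.
# Row shape assumed: (product_id, embedding, chunk_content).
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, List[float], str]

def batched(rows: Iterable[Row], size: int = 500) -> Iterator[List[Row]]:
    """Yield successive batches of at most `size` rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

# Example: 1,250 rows -> batches of 500, 500, 250
rows = [(f"p{i}", [0.0] * 1536, "chunk") for i in range(1250)]
sizes = [len(b) for b in batched(rows, 500)]
print(sizes)  # [500, 500, 250]
```

Batch size is a trade-off: larger batches mean fewer round trips but longer transactions; a few hundred rows per statement is a common starting point.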
Pinecone: Managed Vector Database
Pinecone is a cloud-hosted vector database built from the ground up for semantic search. You create an index, send embeddings, and query it. Scaling and maintenance are handled.
Setup and Integration
```python
from pinecone import Pinecone, ServerlessSpec

# Initialize (current client; the older pinecone.init() API is deprecated)
pc = Pinecone(api_key="your-api-key")

# Create a serverless index (serverless indexes all metadata fields by default,
# so no metadata_config is needed)
pc.create_index(
    name="product-catalog",
    dimension=1536,  # For text-embedding-3-small
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

# Upsert vectors (embedding_1, embedding_2: lists of 1536 floats from your model)
index = pc.Index("product-catalog")
index.upsert(vectors=[
    ("product-1-chunk-0", embedding_1, {"product_id": "1", "category": "valves"}),
    ("product-1-chunk-1", embedding_2, {"product_id": "1", "category": "valves"}),
])

# Query
results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={"category": {"$eq": "valves"}},  # Metadata filter
    include_metadata=True,
)
```

Advantages
Operational Simplicity: No infrastructure to manage. You create an index, configure it, and queries work. Pinecone handles replication, failover, and scaling.
Scaling: Pinecone automatically handles millions of vectors and thousands of QPS. You don't think about sharding or replication.
Built-in Hybrid Search: Pinecone supports sparse-dense queries: you supply a BM25-style sparse vector alongside the dense vector (encoded client-side, for example with the pinecone-text library), and Pinecone merges the scores in a single request.
```python
# Hybrid search: Pinecone's query API has no alpha parameter; weight the dense
# and sparse vectors client-side before querying. The sparse vector comes from a
# BM25 encoder (e.g. pinecone-text's BM25Encoder).
def hybrid_scale(dense, sparse, alpha):
    scaled_sparse = {
        "indices": sparse["indices"],
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    scaled_dense = [v * alpha for v in dense]
    return scaled_dense, scaled_sparse

dense_q, sparse_q = hybrid_scale(query_embedding, bm25_sparse_vector, alpha=0.7)  # 70% dense
results = index.query(vector=dense_q, sparse_vector=sparse_q, top_k=10, include_metadata=True)
```

Query Speed: Pinecone achieves impressive latency (p95: <100ms) through highly optimized infrastructure.
Disadvantages
Cost: Pinecone's pricing is usage-based (read/write units and storage on serverless plans) rather than a flat per-query fee. A catalog with 100K products fielding 10K queries/day runs roughly 300K queries per month; depending on index size and plan, that commonly lands in the hundreds to low thousands of dollars per month. At scale, this becomes expensive.
Metadata Filtering Limits: You can filter on metadata fields, but the filtering syntax is restrictive compared to SQL. Deeply nested boolean logic is awkward, and anything requiring a join against other data is impossible.
No Pure Full-Text Search: If you want to run lexical-only queries without vectors, Pinecone isn't the right tool; a dedicated full-text engine such as Elasticsearch is a better fit for that workload.
Vendor Lock-in: Your embeddings are in Pinecone's indexes. Exporting and migrating to another vector database is non-trivial.
When to Use Pinecone
- Managed simplicity: You don't want to operate Postgres or Kubernetes.
- High-traffic, predictable QPS: Pinecone shines at 1,000+ QPS with consistent latency.
- Hybrid search out of the box: If you want vector + BM25 in a single tool.
- Early-stage product: You're more concerned with speed to market than cost optimization.
Weaviate: Open Source + Managed Hybrid
Weaviate is an open-source vector database written in Go, available both as a self-hosted deployment (Kubernetes) and as a managed cloud service. It's known for strong hybrid search support.
Setup and Integration (Managed Cloud)
```python
import weaviate
from weaviate.auth import AuthApiKey
from weaviate.classes.config import Configure, DataType, Property
from weaviate.classes.query import Filter

# Connect to cloud instance (v4 Python client)
client = weaviate.connect_to_weaviate_cloud(
    cluster_url="https://your-instance.weaviate.network",
    auth_credentials=AuthApiKey("your-api-key"),
)

# Define the collection; text properties are BM25-indexed, enabling hybrid search
client.collections.create(
    name="ProductChunk",
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="product_id", data_type=DataType.TEXT),
        Property(name="category", data_type=DataType.TEXT),
        Property(name="price", data_type=DataType.NUMBER),
    ],
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),  # or bring your own vectors
)

# Query with hybrid search
results = client.collections.get("ProductChunk").query.hybrid(
    query="valve for high pressure applications",
    filters=Filter.by_property("category").equal("valves"),
    alpha=0.7,  # 70% semantic, 30% full-text
    limit=10,
).objects
```

Advantages
Hybrid Search Native: Weaviate's hybrid queries combine BM25 and vector search in a single retrieval operation, with configurable weighting (alpha parameter).
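Weaviate offers more than one fusion strategy; the following pure-Python sketch mirrors the spirit of relative-score fusion (it is an illustration, not Weaviate's actual implementation): min-max normalize each score list to [0, 1], then blend with alpha.

```python
# Illustrative alpha-weighted hybrid fusion: alpha weights the vector side,
# (1 - alpha) the BM25 side; documents missing from a list contribute 0.
def normalize(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_fuse(vector_scores, bm25_scores, alpha=0.7):
    v, b = normalize(vector_scores), normalize(bm25_scores)
    docs = set(v) | set(b)
    return sorted(
        ((doc, alpha * v.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0)) for doc in docs),
        key=lambda x: -x[1],
    )

# "a" wins on semantic similarity; "b" wins on BM25 but alpha=0.7 favors vectors
fused = hybrid_fuse({"a": 0.9, "b": 0.2}, {"b": 12.0, "c": 3.0}, alpha=0.7)
print(fused[0][0])  # "a"
```

Because raw BM25 and cosine scores live on different scales, the normalization step is what makes the alpha weighting meaningful.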
GraphQL Interface: Weaviate exposes a GraphQL API for queries, which is more expressive than a plain REST API for complex filtering and field selection; the newer Python client wraps this in typed methods.
Self-Hosted Option: Run it on your own Kubernetes cluster if you need full control. This eliminates vendor lock-in concerns and can be cost-effective at scale.
Open Source: The codebase is available, so you can audit it, contribute, or maintain a fork if needed.
Disadvantages
Operational Complexity (Self-Hosted): Running Weaviate on Kubernetes requires Kubernetes expertise. Managed Weaviate Cloud is simpler but less common than Pinecone.
Smaller Ecosystem: Fewer pre-built integrations and fewer examples than Pinecone or pgvector.
Latency: Weaviate tends to have slightly higher latency than optimized pgvector setups, though lower than untuned Postgres deployments.
Documentation: While good, not as polished as Pinecone's.
When to Use Weaviate
- Hybrid search is core: You want BM25 + vector search as a first-class feature.
- Self-hosted preference: You want to run it in your own infrastructure on Kubernetes.
- GraphQL preference: Your team is comfortable with GraphQL and wants that interface.
- Open source: You value having the source code accessible.
Practical Decision Framework
Start here if you...
| Scenario | Choice |
|---|---|
| Run PostgreSQL for your app, prefer single-system operations | pgvector |
| Need minimum time-to-market, willing to pay per query | Pinecone |
| Hybrid search is a must-have, want self-hosting option | Weaviate |
| Expect 10K+ queries/day, cost-conscious | pgvector |
| Expect 1,000+ concurrent users, low operational tolerance | Pinecone |
| Building an internal tool with complex filtering | pgvector |
Implementation Reality Check
Most production deployments actually combine multiple systems:
Common Architecture: Use Pinecone for the primary product search (fast, managed, reliable) + maintain PostgreSQL as the source of truth with pgvector as a secondary index for analytics/debugging. Cost trade-off: Pinecone for user-facing queries, pgvector for internal systems.
Advanced Setup: Elasticsearch for BM25 + pgvector for vector search + application-level merge (RRF) = maximum control, maximum complexity.
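The application-level RRF merge mentioned above fits in a few lines. A minimal sketch (`rrf_merge` is a hypothetical helper; k=60 is the conventional constant from the original RRF formulation):

```python
# Reciprocal Rank Fusion: merge ranked ID lists from BM25 and vector search.
# score(d) = sum over lists of 1 / (k + rank_of_d)
def rrf_merge(ranked_lists, k=60):
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["p1", "p2", "p3"]
vect = ["p2", "p4", "p1"]
print(rrf_merge([bm25, vect]))  # p2 and p1 appear in both lists, so they rank first
```

RRF only needs ranks, not scores, which is exactly why it works across engines (Elasticsearch and pgvector here) whose raw scores are not comparable.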
For most B2B product catalogs with 50K–500K products, a single vector database (Pinecone or pgvector) is sufficient. The choice is about operational preference and cost tolerance, not capability.
Turn your product catalog into an AI knowledge base
Axoverna ingests your product data, builds a semantic search index, and gives you an embeddable chat widget — in minutes, not months.