Vector databases became critical infrastructure with generative AI: storing embeddings and running fast similarity search over them is the foundation of RAG, semantic search, and recommendation. Three options dominate in 2026: pgvector (a Postgres extension), Pinecone (managed SaaS), and Qdrant (open source, self-hostable).
TL;DR
- pgvector: Postgres extension, simple, $0 extra hosting if you already run Postgres.
- Pinecone: managed SaaS, massive scalability, free tier then $70+/month.
- Qdrant: open source, self-hosted, performance plus control.
Detailed comparison
| Criterion | pgvector | Pinecone | Qdrant |
|---|---|---|---|
| Model | Postgres extension | Cloud SaaS | Open-source + cloud |
| Setup | 30 sec | 5 min | 1-4h self-host |
| Starter cost | 0 (Postgres exists) | $0 (free tier 5 indexes) | 0 (self-host) |
| 1M vectors scaling | OK | OK | OK |
| 100M vectors scaling | Hard | Excellent | OK |
| 1B vectors scaling | No | OK ($$$) | OK |
| p95 latency | 50-200ms | 20-50ms | 30-100ms |
| Hybrid SQL filters | ✓ excellent | Limited | Good |
| API | SQL | REST | REST + gRPC |
| Lock-in | None | High | None |
When to pick pgvector
Ideal cases:
- Project already runs Postgres
- <10M vectors
- Complex SQL filters (per tenant, date, etc.)
- Tight budget
Setup:
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(1536),  -- must match the embedding model's dimension
  organization_id UUID,
  created_at TIMESTAMP DEFAULT NOW()
);

-- HNSW index for approximate nearest-neighbor search on cosine distance
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- B-tree index for per-tenant filtering
CREATE INDEX ON documents (organization_id);
```
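```ts
import OpenAI from 'openai';
import { PrismaClient } from '@prisma/client';

const openai = new OpenAI();
const prisma = new PrismaClient();

// Embed the document; text-embedding-3-small returns 1536 dimensions.
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: content,
});
// Serialize to pgvector's text form, '[0.1,0.2,...]', before casting to vector.
const embedding = JSON.stringify(response.data[0].embedding);

await prisma.$executeRaw`
  INSERT INTO documents (content, embedding, organization_id)
  VALUES (${content}, ${embedding}::vector, ${orgId}::uuid)
`;

// queryEmbedding: the search text, embedded and serialized the same way as above.
const results = await prisma.$queryRaw`
  SELECT id, content, 1 - (embedding <=> ${queryEmbedding}::vector) AS similarity
  FROM documents
  WHERE organization_id = ${orgId}::uuid
  ORDER BY embedding <=> ${queryEmbedding}::vector
  LIMIT 10
`;
```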
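The `<=>` operator in the query above returns cosine distance, which is why the query converts it with `1 - distance` to get a similarity. As a minimal sketch of what those numbers mean, here is the same arithmetic in plain TypeScript with no database involved. The function names `cosineSimilarity` and `cosineDistance` are illustrative and not part of pgvector.

```typescript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i]!;
    const y = b[i]!;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) {
    throw new Error('Cosine similarity is undefined for a zero vector');
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cosine distance, the quantity the SQL query orders by (smaller means closer).
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1 (same direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
console.log(cosineDistance([1, 0], [-1, 0]));  // 2 (opposite direction)
```

Ordering by distance ascending and ordering by similarity descending give the same ranking, so the `ORDER BY` clause can use the raw operator while the `SELECT` list reports the friendlier similarity score.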
When to pick Pinecone
Ideal cases:
- Scale >50M vectors
- No Postgres
- Multi-region serverless
- Critical latency (<50ms)
Setup:
```ts
import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pc.index('kolonell-docs');

// One namespace per tenant keeps each organization's vectors isolated.
await index.namespace(orgId).upsert([
  {
    id: docId,
    values: embedding,
    // Range operators such as $gte only work on numbers, so store a Unix timestamp (ms).
    metadata: { content, createdAt: Date.now() },
  },
]);

const results = await index.namespace(orgId).query({
  vector: queryEmbedding,
  topK: 10,
  includeMetadata: true,
  // Only documents created on or after 2026-01-01 (UTC).
  filter: { createdAt: { $gte: Date.UTC(2026, 0, 1) } },
});
```
2026 pricing:
- Free: 5 indexes, 100K vectors
- Standard: $70/month (1 pod)
- Enterprise: custom
When to pick Qdrant
Ideal cases:
- Open-source preference
- Self-host for data sovereignty
- Critical performance
- Hybrid search (vector + filters)
Self-hosted setup on Hetzner:
```bash
docker run -d --name qdrant \
  -p 6333:6333 \
  -v qdrant_data:/qdrant/storage \
  qdrant/qdrant:latest
```
```ts
import { QdrantClient } from '@qdrant/js-client-rest';

const client = new QdrantClient({ url: 'https://qdrant.kolonell.com' });

// Run once: the vector size must match the embedding model (1536 for text-embedding-3-small).
await client.createCollection('documents', {
  vectors: {
    size: 1536,
    distance: 'Cosine',
  },
  optimizers_config: { default_segment_number: 2 },
});

await client.upsert('documents', {
  points: [
    {
      id: docId, // Qdrant point IDs must be unsigned integers or UUIDs
      vector: embedding,
      payload: { content, organizationId: orgId },
    },
  ],
});

// Tenant isolation: only points whose payload matches orgId are searched.
const results = await client.search('documents', {
  vector: queryEmbedding,
  limit: 10,
  filter: {
    must: [{ key: 'organizationId', match: { value: orgId } }],
  },
});
```
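Self-host cost: $20-50/month on a Hetzner CX31-class server. Treat the 50M+ vectors capacity figure as an upper bound that depends on dimension, quantization, and on-disk storage: at 1536 float32 dimensions a raw vector is about 6 KB, so 50M vectors come to roughly 307 GB before any compression.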
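Capacity on a given server depends mostly on how many bytes each vector takes. A back-of-the-envelope estimate, as a rough sketch: the helper names `rawVectorBytes` and `toGigabytes` are made up for this example, index overhead and payloads are ignored, and the 1-byte case assumes int8 scalar quantization is enabled.

```typescript
// Raw storage for the vectors alone, ignoring index overhead and payloads.
// bytesPerValue: 4 for float32 (default), 1 for int8 scalar quantization.
function rawVectorBytes(count: number, dimensions: number, bytesPerValue = 4): number {
  return count * dimensions * bytesPerValue;
}

// Decimal gigabytes (1 GB = 1e9 bytes).
function toGigabytes(bytes: number): number {
  return bytes / 1e9;
}

const oneMillion = toGigabytes(rawVectorBytes(1_000_000, 1536));
const fiftyMillion = toGigabytes(rawVectorBytes(50_000_000, 1536));
const fiftyMillionInt8 = toGigabytes(rawVectorBytes(50_000_000, 1536, 1));

console.log(`${oneMillion.toFixed(1)} GB`);       // 6.1 GB
console.log(`${fiftyMillion.toFixed(1)} GB`);     // 307.2 GB
console.log(`${fiftyMillionInt8.toFixed(1)} GB`); // 76.8 GB
```

At 1M vectors the raw data fits comfortably in the RAM of a small server. At 50M it does not, so the choice of dimension and quantization drives the hardware bill.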
Standard RAG implementation
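```ts
async function ragQuery(question: string, orgId: string) {
  // 1. Embed the question with the same model used for the documents.
  const embeddingResponse = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: question,
  });
  const questionEmbedding = embeddingResponse.data[0].embedding;

  // 2. Retrieve the 5 closest chunks for this tenant (searchVector wraps your vector DB).
  const relevantDocs = await searchVector(questionEmbedding, orgId, 5);
  const context = relevantDocs.map(d => d.content).join('\n\n');

  // 3. Ask the model to answer strictly from the retrieved context.
  const response = await anthropic.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    system: 'Answer based on context only. If not in context, say "Not in the documentation".',
    messages: [{
      role: 'user',
      content: `Context:\n${context}\n\nQuestion: ${question}`,
    }],
  });

  // The first content block is text for a plain completion; guard for other block types.
  const first = response.content[0];
  return {
    answer: first?.type === 'text' ? first.text : '',
    sources: relevantDocs.map(d => d.id),
  };
}
```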
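The `searchVector` helper called above is not defined in the snippet. For local tests, a brute-force in-memory stand-in could look like the sketch below. Everything in it is hypothetical (`StoredDoc`, `createInMemorySearch`, the `similarity` field). It expects the raw embedding array, and it assumes unit-length embeddings so that the dot product equals cosine similarity. A real deployment would delegate to pgvector, Pinecone, or Qdrant instead.

```typescript
interface StoredDoc {
  id: string;
  content: string;
  organizationId: string;
  embedding: number[]; // assumed unit-length
}

type SearchVector = (
  embedding: number[],
  orgId: string,
  limit: number,
) => Promise<Array<{ id: string; content: string; similarity: number }>>;

// Dot product; equals cosine similarity when both vectors have length 1.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i]! * b[i]!;
  return sum;
}

// Full scan: filter by tenant, score every document, keep the best `limit`.
function createInMemorySearch(docs: StoredDoc[]): SearchVector {
  return async (embedding, orgId, limit) =>
    docs
      .filter(d => d.organizationId === orgId)
      .map(d => ({ id: d.id, content: d.content, similarity: dot(embedding, d.embedding) }))
      .sort((x, y) => y.similarity - x.similarity)
      .slice(0, limit);
}
```

Because it scores every document, this is the full-scan behaviour that an HNSW index exists to avoid. It is fine for a few thousand test documents but not for production.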
Real case — Africa SaaS support RAG
Performance comparison on a knowledge base of 1M vectors:
| Metric | pgvector | Pinecone | Qdrant |
|---|---|---|---|
| Setup time | 30 min | 1h | 4h |
| Monthly cost | $0 (Postgres exists) | $70 | $25 (Hetzner) |
| p95 latency | 95ms | 35ms | 60ms |
| Tenant filtering | Excellent | Limited | Good |
| Monthly maintenance | 0 (managed Postgres) | 0 | 1-2h |
Best choice in this case: pgvector (existing Postgres plus SQL filters).
Common pitfalls
- Wrong embedding model: text-embedding-3-small (1536 dimensions) is a good default. Don't mix models within one index, because vectors from different models are not comparable.
- No HNSW index: a full scan over 1M vectors takes 5-10 seconds versus about 50 ms with an index.
- No chunking strategy: split long documents into 500-1000 token chunks before embedding.
- No re-ranking: retrieve the top 50 by vector search, then re-rank with a cross-encoder (about +30% quality).
- Embedding cost ignored: $0.10 per 1M tokens adds up. Cache embeddings by document hash.
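The last pitfall, caching embeddings by document hash, can be sketched as a small wrapper around whatever embedding call you use. This is an illustration under assumptions: `EmbedFn` and `withEmbeddingCache` are hypothetical names, and the cache is an in-memory `Map` that a real system would likely replace with a database table or Redis.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical signature: any function that turns text into an embedding.
type EmbedFn = (text: string) => Promise<number[]>;

// Wraps an embedding function with an in-memory cache keyed by content hash.
function withEmbeddingCache(embed: EmbedFn, model: string): EmbedFn {
  const cache = new Map<string, Promise<number[]>>();
  return (text: string) => {
    // Include the model name so vectors from different models never mix.
    const key = createHash('sha256').update(`${model}\n${text}`).digest('hex');
    let pending = cache.get(key);
    if (pending === undefined) {
      pending = embed(text);
      // Drop failed requests so a later call can retry.
      pending.catch(() => cache.delete(key));
      cache.set(key, pending);
    }
    return pending;
  };
}
```

Storing the promise instead of the resolved vector means two concurrent requests for the same text share one API call.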
FAQ
Q: Open-source embeddings vs OpenAI?
A: Sentence-transformers models (e.g. bge-large-en) are free and good. OpenAI is simpler to run and slightly higher quality.
Q: Migrate between vector DBs?
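A: Possible but costly. Vectors can be exported and re-imported as long as you keep the same embedding model; re-embedding is only needed when the model, and with it the dimension, changes.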
Q: How large should chunks be?
A: 500-1000 tokens per chunk with 50-100 tokens of overlap. Test on your domain.
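A minimal sketch of chunking with overlap follows. It is not a production tokenizer: "tokens" here are whitespace-separated words, a rough stand-in for model tokens, and the function name `chunkText` and its defaults (800 and 80) are assumptions picked from within the ranges above.

```typescript
// Splits text into overlapping chunks. "Tokens" are whitespace-separated words,
// an approximation of model tokens (assumption for this sketch).
function chunkText(text: string, chunkSize = 800, overlap = 80): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('Require chunkSize > 0 and 0 <= overlap < chunkSize');
  }
  const words = text.split(/\s+/).filter(w => w.length > 0);
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each chunk's start advances
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    // Stop once this chunk reaches the end, to avoid a trailing fragment
    // that only repeats the overlap.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}

// Ten words, chunks of 4 with 1 word of overlap.
console.log(chunkText('w0 w1 w2 w3 w4 w5 w6 w7 w8 w9', 4, 1));
// [ 'w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9' ]
```

Each chunk repeats the last words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.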
Conclusion
2026 vector DB choice:
- Postgres exists + <10M vectors: pgvector (clear winner)
- Massive scale + multi-region: Pinecone
- Sovereignty + open-source: Qdrant self-host
Migration remains possible later, so if Postgres already exists, start with pgvector.
Mohamed Bah
Fondateur, Kolonell
Passionate about digital and entrepreneurship in Africa, Mohamed has been helping Senegalese businesses with their digital transformation since 2020. Founder of Kolonell, he believes every SME deserves a professional and accessible online presence.