What is pgvector in Supabase?

Quick Answer: pgvector is an open-source Postgres extension that adds a `vector` column type and similarity search operators (cosine distance, L2 distance, inner product) for high-dimensional embeddings. Supabase enables pgvector with a single SQL command and, as of May 2026, supports both IVFFlat and HNSW indexes, so similarity search can run at sub-100ms latency inside the same database that holds application data.

What pgvector Is

pgvector is an open-source Postgres extension, originally written by Andrew Kane and first released in 2021, that adds vector similarity search to Postgres. It introduces a vector(N) column type holding N-dimensional single-precision floating-point vectors, plus distance operators: <-> for L2 (Euclidean) distance, <=> for cosine distance, and <#> for inner product (returned negated, so that smaller values always mean closer matches).
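
As a minimal sketch of the three operators, the query below compares two literal 3-dimensional vectors. The vectors and column aliases are illustrative only, and the first statement is a no-op if the extension is already enabled.

```sql
-- Enable pgvector if it is not already enabled.
create extension if not exists vector;

-- Compare two literal vectors with each of the three distance operators.
select
  '[1,2,3]'::vector <-> '[4,5,6]'::vector as l2_distance,        -- Euclidean distance
  '[1,2,3]'::vector <#> '[4,5,6]'::vector as neg_inner_product,  -- negative inner product: -32 here
  '[1,2,3]'::vector <=> '[4,5,6]'::vector as cosine_distance;    -- 1 - cosine similarity
```

In all three cases a smaller result means a closer match, which is why similarity queries sort in ascending order.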

How Supabase Exposes It

Supabase enables the extension with a single SQL command:

create extension if not exists vector;

Once enabled, application tables can declare embedding columns:

create table documents (
  id bigserial primary key,
  content text,
  embedding vector(1536)
);
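
One common way to query such a table is to wrap the similarity search in a SQL function, which application code can then call as a remote procedure. The sketch below assumes the documents table above; the function name match_documents, its parameters, and the default of 5 results are hypothetical choices for illustration, not a fixed Supabase API.

```sql
-- Hypothetical helper: return the documents closest to a query embedding.
create or replace function match_documents (
  query_embedding vector(1536),   -- must match the dimension of documents.embedding
  match_count int default 5       -- how many neighbours to return
)
returns table (id bigint, content text, similarity float)
language sql stable
as $$
  select
    d.id,
    d.content,
    1 - (d.embedding <=> query_embedding) as similarity  -- cosine similarity
  from documents d
  order by d.embedding <=> query_embedding               -- nearest first
  limit match_count;
$$;
```

Sorting directly on the distance expression, rather than on the derived similarity column, is what allows Postgres to use a vector index for the ordering once one exists.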

Index Types

As of May 2026, pgvector supports two index types:

  • IVFFlat: inverted file index over flat (uncompressed) vectors. Rows are clustered into lists and a query scans only the nearest lists. Fast to build, good recall on small-to-medium datasets
  • HNSW (Hierarchical Navigable Small World): slower to build, faster to query, default on new Supabase projects since 2025

For corpora under roughly 1M rows, IVFFlat is usually sufficient. Above that scale, HNSW typically delivers 3-10x lower query latency at the cost of higher index build time and memory usage.
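
The two index types could be created roughly as follows. This is a sketch assuming the documents table from earlier and cosine distance as the metric; the index names are made up, and the parameter values shown are pgvector's documented defaults or small illustrative numbers rather than tuned recommendations. The two indexes are alternatives, so in practice only one would be created.

```sql
-- Option A: HNSW index for cosine distance.
-- m and ef_construction are written out at their pgvector defaults.
create index documents_embedding_hnsw_idx
  on documents using hnsw (embedding vector_cosine_ops)
  with (m = 16, ef_construction = 64);

-- Option B: IVFFlat index for cosine distance.
-- Build it after the table holds data, since the lists are derived from existing rows.
create index documents_embedding_ivfflat_idx
  on documents using ivfflat (embedding vector_cosine_ops)
  with (lists = 100);

-- Query-time knobs that trade speed for recall (session-level settings).
set hnsw.ef_search = 100;   -- HNSW: size of the candidate list (default 40)
set ivfflat.probes = 10;    -- IVFFlat: number of lists scanned (default 1)
```

The operator class has to match the operator used in queries: vector_cosine_ops pairs with <=>, vector_l2_ops with <->, and vector_ip_ops with <#>.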

Common Use Cases

Typical Supabase + pgvector applications include:

  • Semantic search across documentation or knowledge bases
  • Retrieval-augmented generation (RAG) for chat applications
  • Recommendation systems based on item embeddings
  • Duplicate detection across user-generated content

Why Use pgvector vs a Dedicated Vector DB

Keeping vectors in the same Postgres instance as application data simplifies operations. Backups, restores, and row-level security all use the same database. JOINs across embeddings and structured data (filtering by user_id, tenant_id, or product_category before similarity search) are first-class.
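
To illustrate combining structured filters with similarity search, the sketch below adds a hypothetical tenant_id column to the documents table and finds the documents most similar to an existing one within a single tenant. The column, the tenant value 42, and the document id 1 are all assumptions made for the example.

```sql
-- Hypothetical tenant column; not part of the original table definition.
alter table documents add column if not exists tenant_id bigint;

-- Five documents in tenant 42 most similar to document 1, by cosine distance.
select d.id, d.content
from documents d
where d.tenant_id = 42
  and d.id <> 1                                   -- exclude the reference document itself
order by d.embedding <=> (
  select embedding from documents where id = 1   -- use document 1 as the query vector
)
limit 5;
```

With an approximate index, the filter is applied to the candidates the index returns, so a very selective filter may yield fewer rows than the limit; an ordinary index on the filter column can help in that case.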

The trade-off appears at very large scale: at billions of vectors with sub-50ms latency requirements, dedicated vector databases like Pinecone, Weaviate, or Qdrant typically outperform pgvector. For the majority of production AI applications below that scale, pgvector is the simpler and cheaper choice.

Last updated: | By Rafal Fila
