
Why Postgres Might Already Contain Half Your Infrastructure

Most teams run Postgres for relational data, then reach for Redis for caching, a separate service for vector search, a cron scheduler, a geospatial database, and a message queue. What if four or five of those were already sitting inside your Postgres installation, waiting to be enabled with a single CREATE EXTENSION?

This isn’t hype. The Postgres extension ecosystem has matured to the point where specialized extensions can replace entire external services — with lower operational overhead, simpler architecture, and often comparable performance. This guide covers the extensions actually worth deploying, what they replace, and where they fall short.

pgvector: Kill Your Standalone Vector Database

If you’re building AI-powered features in 2026, you’ve probably looked at Pinecone, Weaviate, or Qdrant. Before you sign up, check whether pgvector covers your use case — it does for most teams.

pgvector adds a vector data type and efficient similarity search (HNSW and IVFFlat indexes) to Postgres. Installation:

# On Ubuntu/Debian (shell)
sudo apt install postgresql-16-pgvector

-- Then in psql
CREATE EXTENSION vector;

Creating a table with embeddings and querying by cosine similarity:

CREATE TABLE documents (
  id        bigserial PRIMARY KEY,
  content   text,
  embedding vector(1536)  -- OpenAI ada-002 dimensions
);

-- Create an HNSW index for fast approximate search
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Find the 5 most similar documents
SELECT id, content,
       1 - (embedding <=> '[0.1,0.2,...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1,0.2,...]'::vector
LIMIT 5;

The killer feature is joining vector search with structured filters — something that external vector databases handle awkwardly:

-- Semantic search within one user's documents
-- (assumes the table also has user_id and created_at columns)
SELECT d.id, d.content,
       1 - (d.embedding <=> $1::vector) AS similarity
FROM documents d
WHERE d.user_id = $2
  AND d.created_at > NOW() - INTERVAL '30 days'
ORDER BY d.embedding <=> $1::vector
LIMIT 10;

This query would require either a pre-filter in a standalone vector DB (losing recall) or a two-step fetch-then-filter approach. In Postgres, it’s a single query with full ACID guarantees.

When pgvector isn’t enough

pgvector starts showing strain above ~5 million vectors with sub-10ms latency requirements. If you’re at that scale, dedicated systems like Qdrant or Weaviate offer better throughput. For most applications — especially early-stage — pgvector is the right starting point.
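Before migrating, it's worth exhausting pgvector's own tuning knobs. For HNSW indexes, the query-time parameter hnsw.ef_search trades latency for recall:

-- Raise HNSW search effort for better recall, at some latency cost
-- (session-level setting; the default is 40)
SET hnsw.ef_search = 100;

Rerun your similarity queries after changing it and measure recall and p99 latency before concluding you've outgrown Postgres.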

pg_cron: Replace Your External Job Scheduler

Cron jobs that touch databases are usually managed externally: a Linux crontab entry that runs a script, an AWS EventBridge rule, a Kubernetes CronJob. Each adds operational overhead and a failure domain. pg_cron runs scheduled jobs directly inside Postgres.

-- Install (after the shared_preload_libraries setup shown below)
CREATE EXTENSION pg_cron;

-- Schedule a cleanup job every night at 3am UTC
SELECT cron.schedule(
  'cleanup-old-sessions',
  '0 3 * * *',
  $$DELETE FROM user_sessions WHERE expires_at < NOW() - INTERVAL '7 days'$$
);

-- Run a data rollup every hour
SELECT cron.schedule(
  'hourly-stats-rollup',
  '0 * * * *',
  $$INSERT INTO hourly_stats SELECT date_trunc('hour', created_at), COUNT(*), SUM(amount)
    FROM orders WHERE created_at >= date_trunc('hour', NOW()) - INTERVAL '1 hour'
    AND created_at < date_trunc('hour', NOW())
    GROUP BY 1$$
);

-- View scheduled jobs
SELECT * FROM cron.job;

-- View recent run history
SELECT * FROM cron.job_run_details ORDER BY start_time DESC LIMIT 20;
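Jobs can be removed by name when no longer needed:

-- Unschedule a job; returns true if the job existed
SELECT cron.unschedule('cleanup-old-sessions');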

pg_cron must be preloaded at server start: add it to shared_preload_libraries in postgresql.conf and restart Postgres:

shared_preload_libraries = 'pg_cron'
cron.database_name = 'myapp'

Practical advantages over external schedulers

  • No network hop — the job runs where the data lives
  • Jobs run in a transaction — failed jobs leave no partial state
  • Job history is in the database — queryable, joinable with other data
  • No separate credentials or connection management

The limitation: pg_cron only runs on the primary node in a replica setup. If the primary fails, scheduled jobs don’t run until failover completes. For truly critical scheduled work, keep an external fallback.

PostGIS: Geographic Queries Without a Separate Geo Service

PostGIS turns Postgres into a full-featured spatial database. If your application stores locations, routes, service areas, or any geographic data, PostGIS eliminates the need for a separate mapping/geo service for most backend queries.

CREATE EXTENSION postgis;

-- Store business locations
CREATE TABLE businesses (
  id      bigserial PRIMARY KEY,
  name    text,
  location geometry(Point, 4326)  -- WGS84 coordinates
);

-- Add a spatial index
CREATE INDEX businesses_location_idx
  ON businesses USING GIST (location);

-- Find all businesses within 500 meters of a point
SELECT name,
       ST_Distance(
         location::geography,
         ST_SetSRID(ST_MakePoint(-73.9847, 40.7490), 4326)::geography
       ) AS distance_meters
FROM businesses
WHERE ST_DWithin(
  location::geography,
  ST_SetSRID(ST_MakePoint(-73.9847, 40.7490), 4326)::geography,
  500
)
ORDER BY distance_meters;

More complex geospatial operations:

-- Check if a point is inside a delivery zone polygon
SELECT z.name AS zone_name
FROM delivery_zones z
WHERE ST_Within(
  ST_SetSRID(ST_MakePoint($1, $2), 4326),
  z.boundary
);
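The query above assumes a delivery_zones table with one polygon per zone. A minimal sketch of what that might look like (table and column names are illustrative, chosen to match the query):

CREATE TABLE delivery_zones (
  id       bigserial PRIMARY KEY,
  name     text,
  boundary geometry(Polygon, 4326)
);

-- A GiST index lets ST_Within and friends use an index scan
CREATE INDEX delivery_zones_boundary_idx
  ON delivery_zones USING GIST (boundary);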

pg_partman: Automated Table Partitioning

Time-series tables grow unbounded without partitioning. pg_partman automates creating and maintaining partitioned tables by time or serial range — replacing what many teams solve with custom scripts or a separate time-series database.

CREATE EXTENSION pg_partman;

-- Create a partitioned events table
CREATE TABLE events (
  id         bigserial,
  event_type text,
  payload    jsonb,
  created_at timestamptz NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);

-- Hand it to pg_partman: create monthly partitions, three in advance
-- (syntax shown is for pg_partman 4.x; 5.x removes p_type and expects
-- standard intervals like '1 month')
SELECT partman.create_parent(
  p_parent_table => 'public.events',
  p_control      => 'created_at',
  p_type         => 'native',
  p_interval     => 'monthly',
  p_premake      => 3
);

-- Configure retention: drop partitions older than 12 months
UPDATE partman.part_config
SET retention = '12 months',
    retention_keep_table = false
WHERE parent_table = 'public.events';

-- Run maintenance (schedule this with pg_cron)
SELECT partman.run_maintenance();
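Since run_maintenance() is plain SQL, pg_cron (covered above) is the natural way to schedule it — a sketch, assuming both extensions live in the same database:

-- Create new partitions and enforce retention once an hour
SELECT cron.schedule(
  'partman-maintenance',
  '0 * * * *',
  $$SELECT partman.run_maintenance()$$
);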

pg_trgm: Full-Text Search and Fuzzy Matching

Before standing up Elasticsearch or Typesense for search, try pg_trgm. It enables trigram-based similarity search and dramatically speeds up LIKE/ILIKE queries on text columns.

CREATE EXTENSION pg_trgm;

-- Create a GIN index for fast LIKE queries
CREATE INDEX products_name_trgm_idx
  ON products USING GIN (name gin_trgm_ops);

-- Fast LIKE search (uses the index)
SELECT * FROM products
WHERE name ILIKE '%wireless headphones%'
ORDER BY similarity(name, 'wireless headphones') DESC
LIMIT 20;

-- Fuzzy search: handles typos
SELECT name, similarity(name, 'airpods pro') AS score
FROM products
WHERE similarity(name, 'airpods pro') > 0.3
ORDER BY score DESC
LIMIT 10;

Combined with Postgres full-text search (tsvector/tsquery), this covers the majority of search use cases without Elasticsearch:

-- Full-text search with ranking
SELECT name, ts_rank(search_vector, query) AS rank
FROM products, to_tsquery('english', 'wireless & headphones') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
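The query above assumes a precomputed search_vector column on products. On Postgres 12+, one low-maintenance way to keep it in sync is a stored generated column — a sketch, assuming products has a text name column as in the earlier examples:

-- Keep search_vector automatically in sync with name
ALTER TABLE products
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (to_tsvector('english', coalesce(name, ''))) STORED;

-- Index it like any other tsvector column
CREATE INDEX products_search_idx
  ON products USING GIN (search_vector);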

Combining Extensions: The Real Power

The compounding benefit is that these extensions work together. A single query can do vector similarity search with geospatial filtering, scheduled via pg_cron, on a partitioned table managed by pg_partman — all with full transactional consistency.

-- Find restaurants near a location that match cuisine preferences (embeddings)
SELECT r.id, r.name,
  ST_Distance(r.location::geography, $1::geography) AS distance_m,
  1 - (r.cuisine_embedding <=> $2::vector) AS preference_match
FROM restaurants r
WHERE ST_DWithin(r.location::geography, $1::geography, 2000)
  AND r.is_active = true
ORDER BY
  (1 - (r.cuisine_embedding <=> $2::vector)) * 0.6
  + (1 - LEAST(ST_Distance(r.location::geography, $1::geography) / 2000, 1)) * 0.4
DESC
LIMIT 10;

Try doing that as a single atomic query across Pinecone, a separate geo service, and your relational database.

The Honest Trade-off Assessment

Postgres extensions are not always the right answer. Here’s an honest breakdown:

  • pgvector vs. Pinecone/Qdrant: Use pgvector under ~5M vectors. Above that, dedicated vector DBs win on throughput.
  • pg_cron vs. Temporal/Airflow: pg_cron is ideal for simple periodic SQL tasks. Complex multi-step workflows with retries and branching need a dedicated orchestrator.
  • PostGIS vs. specialized geo APIs: PostGIS handles backend queries brilliantly. For routing, map rendering, or geocoding, you still need external services.
  • pg_trgm vs. Elasticsearch: pg_trgm covers search for most apps. Elasticsearch wins at large scale, faceted search, and relevance tuning.

Getting Started: Audit Your Current Stack

The exercise worth doing: list every non-Postgres service in your stack and ask whether a Postgres extension covers 80% of the use case. For most teams with under 10 million records per service, the answer is yes far more often than expected.

Start with pgvector if you’re doing any AI/embedding work. Add pg_cron to eliminate your cron scripts. The operational simplicity — one database to monitor, backup, scale — compounds over time. Every service you don’t run is a failure domain you don’t have.

The goal isn’t to run everything in Postgres forever. It’s to avoid premature architectural complexity while you’re still learning what your application actually needs.

By Michael Sun

Founder and Editor-in-Chief of NovVista. Software engineer with hands-on experience in cloud infrastructure, full-stack development, and DevOps. Writes about AI tools, developer workflows, server architecture, and the practical side of technology. Based in China.
