Postgres Extensions You Should Know: pg_stat_statements, TimescaleDB, and Beyond

PostgreSQL ships with a solid core, but extensions are where it transforms from a reliable relational database into something far more versatile. The extension ecosystem is one of Postgres's greatest competitive advantages over MySQL, SQL Server, and even managed databases like DynamoDB or Firestore. You can bolt on time-series capabilities, full-text search in 30 languages, geospatial queries, columnar storage, and real-time analytics — all without leaving Postgres.

But with over 1,000 extensions available through PGXN, GitHub, and vendor repositories, knowing which ones actually matter is half the battle. This guide covers 10 extensions that backend engineers and DBAs should know in 2026, with real configuration examples, performance considerations, and honest assessments of when each one is (and is not) the right tool.

1. pg_stat_statements: The Extension Everyone Should Enable

If you run Postgres in production and have not enabled pg_stat_statements, stop reading and go enable it now. It tracks execution statistics for every SQL statement your database runs — total time, number of calls, rows returned, block I/O, and more.

Setup

# In postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all

# Then restart Postgres and create the extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Finding Your Slowest Queries

SELECT
  query,
  calls,
  round(total_exec_time::numeric, 2) AS total_ms,
  round(mean_exec_time::numeric, 2) AS avg_ms,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

This single query has saved me more hours of debugging than any APM tool. The total_exec_time column tells you where your database spends the most cumulative time — a query that runs in 2ms but executes 500,000 times per day will show up here, and that is exactly the kind of query you want to optimize.

Key Metrics to Watch

mean_exec_time vs. stddev_exec_time: High standard deviation means inconsistent performance — likely due to table bloat, lock contention, or parameter-dependent plan changes.
shared_blks_hit vs. shared_blks_read: The ratio tells you your buffer cache hit rate per query. If shared_blks_read is high relative to shared_blks_hit, that query is hitting disk.
rows / calls: Average rows per execution. If a query returns 50,000 rows on average, you probably need pagination or a more specific WHERE clause.

In Postgres 17, pg_stat_statements gained the toplevel column, which distinguishes between top-level queries and those executed inside functions or procedures. This is significant for teams that use PL/pgSQL heavily — you can now see which function calls are expensive without guessing.

2. TimescaleDB: Time-Series Without a Separate Database

TimescaleDB turns Postgres into a time-series database. It partitions data into “chunks” by time interval and adds time-aware query optimizations, compression, and continuous aggregates.

When to Use It

If you are storing metrics, IoT sensor data, financial tick data, event logs, or any append-heavy workload with a timestamp primary axis, TimescaleDB is worth evaluating before reaching for InfluxDB, QuestDB, or ClickHouse.

Setup and Basic Usage

CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Convert a regular table to a hypertable
CREATE TABLE sensor_readings (
  time        TIMESTAMPTZ NOT NULL,
  sensor_id   INTEGER NOT NULL,
  temperature DOUBLE PRECISION,
  humidity    DOUBLE PRECISION
);

SELECT create_hypertable('sensor_readings', by_range('time'));

Behind the scenes, TimescaleDB partitions sensor_readings into chunks (default: 7-day intervals). Queries that filter on time only scan relevant chunks, which is dramatically faster than a full table scan on a monolithic table with billions of rows.

Compression

TimescaleDB's native compression is one of its strongest features. On typical IoT workloads, expect 10-20x compression ratios:

ALTER TABLE sensor_readings SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'sensor_id',
  timescaledb.compress_orderby = 'time DESC'
);

-- Compress chunks older than 7 days automatically
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');

A team I worked with migrated from InfluxDB 2.x to TimescaleDB 2.14 in mid-2025. Their primary motivation was operational simplicity — they already ran Postgres for their application data, and adding TimescaleDB meant one fewer database to manage. Query performance was comparable for their workload (50,000 inserts/second, sub-second aggregation queries over 90-day windows). Storage costs dropped 40% due to better compression.

Continuous Aggregates

CREATE MATERIALIZED VIEW hourly_temperatures
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 hour', time) AS bucket,
  sensor_id,
  AVG(temperature) AS avg_temp,
  MAX(temperature) AS max_temp,
  MIN(temperature) AS min_temp
FROM sensor_readings
GROUP BY bucket, sensor_id;

-- Auto-refresh every hour
SELECT add_continuous_aggregate_policy('hourly_temperatures',
  start_offset => INTERVAL '3 hours',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour'
);

Continuous aggregates are incrementally maintained materialized views. Unlike standard Postgres materialized views that must be fully refreshed, TimescaleDB only processes new data since the last refresh. This makes them practical for dashboards that need sub-second response times over months of data.

3. pgvector: Vector Search in Postgres

The AI/ML boom made vector databases a hot category, but for most applications, pgvector eliminates the need for a dedicated vector store like Pinecone or Weaviate.

Setup

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  title TEXT,
  content TEXT,
  embedding vector(1536)  -- OpenAI text-embedding-3-small dimension
);

-- Create an HNSW index for fast approximate nearest neighbor search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 200);

Querying

-- Find the 10 most similar documents to a query embedding
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;

As of pgvector 0.8.0 (released January 2026), HNSW index build times improved by roughly 2x over version 0.6, and the extension supports quantization (binary and scalar) to reduce memory usage. For datasets under 10 million vectors, pgvector performs within 10-15% of dedicated vector databases on standard recall@10 benchmarks — and you get the enormous advantage of running vector search alongside your relational data in a single transaction.

When pgvector Falls Short

If you need to index 100M+ vectors with sub-10ms p99 latency, dedicated vector databases still have an edge. Also, pgvector's filtering (WHERE clause + vector similarity) can be slow when the filter is very selective, because the HNSW index does not natively support pre-filtering. The workaround is to use partial indexes or the ivfflat index type with appropriate probes.

4. PostGIS: The Gold Standard for Geospatial

PostGIS has been the de facto standard for geospatial queries in relational databases for over 20 years. If your application stores locations, boundaries, routes, or any geometry, PostGIS is almost certainly the right choice.

CREATE EXTENSION IF NOT EXISTS postgis;

-- Find all restaurants within 2km of a point
SELECT name, ST_Distance(
  location::geography,
  ST_SetSRID(ST_MakePoint(-73.9857, 40.7484), 4326)::geography
) AS distance_meters
FROM restaurants
WHERE ST_DWithin(
  location::geography,
  ST_SetSRID(ST_MakePoint(-73.9857, 40.7484), 4326)::geography,
  2000
)
ORDER BY distance_meters;

PostGIS 3.5 (current stable as of early 2026) added parallel support for more spatial joins and improved GEOS 3.13 integration for faster polygon operations. If you are upgrading from PostGIS 3.2 or earlier, the performance improvement on complex ST_Intersection and ST_Union operations is substantial — benchmarks show 30-50% speedups on multi-polygon datasets.

5. pg_partman: Automated Table Partitioning

Native partitioning in Postgres (available since v10) is powerful but requires manual partition management. pg_partman automates partition creation, retention, and maintenance.

CREATE EXTENSION IF NOT EXISTS pg_partman;

-- Create a partitioned table
CREATE TABLE events (
  id BIGSERIAL,
  created_at TIMESTAMPTZ NOT NULL,
  event_type TEXT,
  payload JSONB
) PARTITION BY RANGE (created_at);

-- Let pg_partman manage it
SELECT partman.create_parent(
  p_parent_table => 'public.events',
  p_control => 'created_at',
  p_interval => 'monthly',
  p_premake => 3
);

The p_premake => 3 parameter tells pg_partman to create partitions 3 months ahead. A background worker job handles creating new partitions and optionally dropping old ones based on a retention policy.

Retention Policies

-- Keep only 12 months of data
UPDATE partman.part_config
SET retention = '12 months',
    retention_keep_table = false
WHERE parent_table = 'public.events';

For high-volume tables (100M+ rows), partitioning with pg_partman is not optional — it is a requirement. Without it, VACUUM operations become painfully slow, index maintenance balloons, and query planning degrades. A table with 2 billion rows and monthly partitions will have queries that touch only the relevant month's partition (typically 150-200M rows), which is a night-and-day difference for both sequential scans and index lookups.

6. pg_cron: In-Database Job Scheduling

Instead of managing cron jobs on your application server or using an external scheduler, pg_cron runs scheduled tasks directly inside Postgres.

CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Vacuum a table every night at 3 AM UTC
SELECT cron.schedule('nightly-vacuum-events', '0 3 * * *',
  $$VACUUM ANALYZE events$$
);

-- Refresh a materialized view every 15 minutes
SELECT cron.schedule('refresh-dashboard-mv', '*/15 * * * *',
  $$REFRESH MATERIALIZED VIEW CONCURRENTLY dashboard_summary$$
);

-- Delete old sessions every hour
SELECT cron.schedule('cleanup-sessions', '0 * * * *',
  $$DELETE FROM sessions WHERE expires_at < NOW() - INTERVAL '24 hours'$$
);

The appeal is simplicity: the schedule lives in the database, executes in the database, and you can monitor job history with SELECT * FROM cron.job_run_details ORDER BY start_time DESC;. No external dependencies.

One caveat: pg_cron runs jobs as the database superuser by default (or the user specified in the job). Long-running jobs can hold connections and potentially block other operations. For heavy ETL work, an external scheduler like Airflow or Dagster is still the better choice.

7. pgaudit: Compliance-Grade Audit Logging

If you operate in healthcare (HIPAA), finance (SOX, PCI-DSS), or government (FedRAMP), you need audit logs that show who executed what SQL and when. pgaudit provides session and object-level audit logging.

# In postgresql.conf
shared_preload_libraries = 'pgaudit'
pgaudit.log = 'write, ddl'
pgaudit.log_relation = on
pgaudit.log_parameter = on

This configuration logs all write operations (INSERT, UPDATE, DELETE) and DDL statements (CREATE, ALTER, DROP), including the table names and query parameters. The logs go to the standard Postgres log destination, which you can ship to your SIEM via Fluentd, Vector, or the Datadog Agent.

A sample audit log entry looks like:

AUDIT: SESSION,1,1,WRITE,INSERT,,public.users,
  "INSERT INTO users (email, name) VALUES ($1, $2)",
  <john@example.com,John Doe>

For object-level auditing (logging access to specific sensitive tables), use the object audit mode:

-- Audit all access to the payments table
SET pgaudit.role = 'auditor';
GRANT SELECT, INSERT, UPDATE, DELETE ON payments TO auditor;

Now every query touching the payments table is logged, regardless of which user executes it. This is the pattern most compliance teams want.

8. pg_repack: Online Table Reorganization

Postgres tables accumulate bloat over time as UPDATE and DELETE operations leave dead tuples. VACUUM reclaims space, but it cannot reorganize the physical layout of the table or reclaim space back to the OS (only VACUUM FULL does that, and it locks the table exclusively).

pg_repack rebuilds tables and indexes online, without exclusive locks:

-- Install
CREATE EXTENSION IF NOT EXISTS pg_repack;

-- Repack a bloated table (run from command line)
pg_repack -d mydb -t orders --no-superuser-check

-- Repack only indexes
pg_repack -d mydb -t orders --only-indexes

Under the hood, pg_repack creates a new copy of the table, replays changes via a trigger, then swaps the old and new tables in a brief lock. The total lock time is typically under 1 second, even for tables with hundreds of millions of rows.

When You Need It

Check table bloat with this query:

SELECT
  schemaname || '.' || tablename AS table_name,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
  pg_size_pretty(
    pg_total_relation_size(schemaname || '.' || tablename) -
    pg_relation_size(schemaname || '.' || tablename)
  ) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 10;

If a table is significantly larger than the actual data it contains (you can estimate this with pgstattuple), pg_repack is the tool to use. In my experience, running pg_repack on heavily-updated tables monthly keeps bloat under control and maintains consistent query performance.

9. HypoPG: Test Indexes Without Creating Them

Creating an index on a 500M-row table takes time and consumes I/O. HypoPG lets you create hypothetical indexes that exist only in the query planner — no actual index is built, but EXPLAIN will use them when planning queries.

CREATE EXTENSION IF NOT EXISTS hypopg;

-- Create a hypothetical index
SELECT * FROM hypopg_create_index(
  'CREATE INDEX ON orders (customer_id, created_at DESC)'
);

-- Check if the planner would use it
EXPLAIN SELECT * FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 10;

-- Clean up hypothetical indexes
SELECT hypopg_reset();

This is invaluable for index planning on production databases. Instead of creating an index (which might take 30 minutes on a large table and consume significant I/O), you can verify the planner would actually use it before committing. I have seen teams create expensive indexes that the planner ignores because the selectivity is not high enough — HypoPG prevents that waste.

10. Citus: Distributed Postgres

Citus turns Postgres into a distributed database by sharding tables across multiple nodes. It is now fully open-source (since Microsoft acquired the company and released it under AGPLv3 in 2022) and ships as an extension, not a fork.

CREATE EXTENSION IF NOT EXISTS citus;

-- Add worker nodes
SELECT citus_set_coordinator_host('coordinator.internal', 5432);
SELECT * FROM citus_add_node('worker1.internal', 5432);
SELECT * FROM citus_add_node('worker2.internal', 5432);

-- Distribute a table
SELECT create_distributed_table('orders', 'tenant_id');

-- Queries are automatically distributed
SELECT tenant_id, COUNT(*), SUM(amount)
FROM orders
WHERE created_at >= '2026-01-01'
GROUP BY tenant_id;

Multi-Tenant SaaS Use Case

Citus shines for multi-tenant applications where you can shard by tenant_id. Queries that filter on tenant_id are routed to a single shard, which means performance scales linearly as you add tenants — you just add more worker nodes.

A SaaS analytics platform I advised migrated from a single Postgres instance to Citus when their largest tenant's data exceeded 500GB. Before Citus, that tenant's dashboard queries took 8-12 seconds. After distributing by tenant_id across 4 worker nodes, the same queries ran in 1-3 seconds. Total migration effort: about 3 weeks, mostly spent adjusting queries that joined distributed and non-distributed tables.

When Not to Use Citus

If your workload does not have a natural distribution key, Citus introduces complexity without proportional benefit. Cross-shard queries (queries that must touch all shards) are slower than the equivalent query on a single well-tuned instance. For analytics workloads without a clear tenant or partition key, consider columnar storage extensions or a dedicated OLAP database instead.

Honorable Mentions

A few more extensions worth knowing about, even if they did not make the top 10:

pgvector + pg_embedding: If pgvector's HNSW index does not fit your memory budget, pg_embedding from Neon offers DiskANN-based indexes that spill to disk more gracefully.
pg_stat_kcache: Extends pg_stat_statements with OS-level metrics (CPU time, filesystem reads/writes). Requires the perf subsystem on Linux.
pgsodium: Column-level encryption using libsodium. Useful for encrypting PII at rest without application-layer changes.
pglogical: Logical replication with more flexibility than built-in logical replication. Supports selective table replication, replication between different Postgres major versions, and bidirectional replication.
pg_hint_plan: Lets you force specific query plans using SQL hints. A last-resort tool for when the planner makes bad decisions and you cannot restructure the query.

Installation and Compatibility Notes

Not all extensions are available on all managed Postgres providers. Here is a quick compatibility matrix for the major platforms as of early 2026:

Extension	AWS RDS	Google Cloud SQL	Azure Flexible	Neon	Supabase
pg_stat_statements	Yes	Yes	Yes	Yes	Yes
TimescaleDB	No (use Timescale Cloud)	No	No	No	Yes
pgvector	Yes	Yes	Yes	Yes	Yes
PostGIS	Yes	Yes	Yes	Yes	Yes
pg_partman	Yes	No	Yes	No	Yes
pg_cron	Yes	No	Yes	No	Yes
pgaudit	Yes	Yes (as pgAudit)	Yes	No	No
pg_repack	No	No	No	No	No
HypoPG	No	No	No	Yes	Yes
Citus	No	No	Yes (native)	No	No

If you self-host Postgres (or use a provider like Crunchy Data or Percona), all of these extensions are available. The managed provider limitations are the primary reason some teams choose self-hosted Postgres — extension availability is a real constraint.

Performance and Resource Considerations

Extensions are not free. Each one loaded via shared_preload_libraries consumes shared memory and adds overhead to the Postgres startup process. A few guidelines:

pg_stat_statements adds negligible overhead (under 1% CPU) and should always be enabled.
TimescaleDB adds memory overhead for chunk management. Plan for an additional 128-256MB of shared memory on busy instances.
pgvector HNSW indexes are memory-resident. A 1M-vector index with 1536 dimensions consumes roughly 6-8GB of RAM. Size your instance accordingly.
pgaudit with verbose logging can increase WAL volume significantly. If you log all SELECT statements on a busy OLTP database, expect 3-5x more log volume.
Citus adds network overhead for distributed queries. Co-locate worker nodes in the same availability zone to minimize latency.

Wrapping Up

The Postgres extension ecosystem is one of the strongest arguments for choosing Postgres as your primary database. Instead of running five specialized databases (relational + time-series + vector + geospatial + analytics), you can often run one Postgres instance with the right extensions.

That said, “Postgres can do everything” is not the same as “Postgres should do everything.” Extensions add complexity, and each one is another dependency to upgrade and maintain. The pragmatic approach is to start with pg_stat_statements (everyone needs it), add extensions as specific requirements arise, and consider a dedicated system only when the extension approach hits clear performance or operational limits.

The extensions covered here — pg_stat_statements, TimescaleDB, pgvector, PostGIS, pg_partman, pg_cron, pgaudit, pg_repack, HypoPG, and Citus — represent the most battle-tested and widely adopted tools in the ecosystem. Master these ten, and you will be equipped to handle most backend engineering challenges without reaching for another database.

ByMichael Sun

1. pg_stat_statements: The Extension Everyone Should Enable

Setup

Finding Your Slowest Queries

Key Metrics to Watch

2. TimescaleDB: Time-Series Without a Separate Database

When to Use It

Setup and Basic Usage

Compression

Continuous Aggregates

3. pgvector: Vector Search in Postgres

Setup

Querying

When pgvector Falls Short

4. PostGIS: The Gold Standard for Geospatial

5. pg_partman: Automated Table Partitioning

Retention Policies

6. pg_cron: In-Database Job Scheduling

7. pgaudit: Compliance-Grade Audit Logging

8. pg_repack: Online Table Reorganization

When You Need It

9. HypoPG: Test Indexes Without Creating Them

10. Citus: Distributed Postgres

Multi-Tenant SaaS Use Case

When Not to Use Citus

Honorable Mentions

Installation and Compatibility Notes

Performance and Resource Considerations

Wrapping Up

By Michael Sun

Related Post

Leave a Reply Cancel reply

You missed