PostgreSQL Performance Tuning: Indexes, Query Optimization & Scaling
Rating
Verdict
PostgreSQL is the best open-source relational database available. Master indexes and EXPLAIN ANALYZE first — 90% of performance issues are solved at the query level before you ever need to scale hardware.
Pros
- EXPLAIN ANALYZE reveals exact bottlenecks
- Partial and composite indexes are extremely flexible
- JSONB with GIN indexes handles semi-structured data brilliantly
- pg_stat_statements gives visibility into slow queries
Cons
- N+1 query problems are easy to miss in ORMs
- Index maintenance has write overhead
- Partitioning requires careful upfront planning
PostgreSQL Performance Tuning: Indexes, Query Optimization & Scaling
PostgreSQL is one of the most capable open-source databases available, but raw capability doesn't equal performance — that requires understanding the query planner, indexing strategies, and the right configuration for your workload. This guide walks through every layer of the PostgreSQL performance stack with practical, runnable examples.
Understanding EXPLAIN ANALYZE
EXPLAIN ANALYZE is the most powerful tool in your PostgreSQL arsenal. It shows the actual execution plan with real timing data:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.id, u.name, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
ORDER BY post_count DESC
LIMIT 20;
Key things to look for in the output:
- Seq Scan — Full table scan, usually means a missing index
- Index Scan — Good, using an index
- Index Only Scan — Best, all data from the index itself
- Hash Join / Nested Loop / Merge Join — Join strategies
- actual time — Where your query is actually spending time
- rows removed by filter — High number = poor selectivity
Indexing Strategies
B-tree Indexes (default)
-- Basic index
CREATE INDEX idx_users_email ON users(email);
-- Composite index (order matters! most selective first)
CREATE INDEX idx_posts_user_status ON posts(user_id, status, created_at DESC);
-- Index on expression
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Enables: WHERE LOWER(email) = 'alice@example.com'
-- Partial index — only index a subset of rows
CREATE INDEX idx_posts_published ON posts(created_at DESC)
WHERE status = 'published';
-- Much smaller index, highly selective for published posts query
GIN Indexes for JSONB and Arrays
-- JSONB column with GIN index
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
attributes JSONB
);
CREATE INDEX idx_products_attributes ON products USING GIN (attributes);
-- Now these are fast:
SELECT * FROM products WHERE attributes @> '{"color": "blue"}';
SELECT * FROM products WHERE attributes ? 'warranty';
SELECT * FROM products WHERE attributes ->> 'brand' = 'Apple';
-- Array column with GIN index
CREATE INDEX idx_posts_tags ON posts USING GIN (tags);
SELECT * FROM posts WHERE tags && ARRAY['postgresql', 'database'];
Covering Indexes
Include all columns needed by a query to enable Index Only Scans:
-- Without covering index: Index Scan + Heap fetch
CREATE INDEX idx_users_email ON users(email);
SELECT id, name FROM users WHERE email = 'alice@example.com';
-- With covering index: Index Only Scan (much faster)
CREATE INDEX idx_users_email_covering ON users(email) INCLUDE (id, name);
Query Optimization Patterns
Eliminating N+1 Queries
-- ✗ BAD: N+1 — 1 query for users + N queries for posts
SELECT id, name FROM users LIMIT 100;
-- Then for each user:
SELECT * FROM posts WHERE user_id = $1;
-- ✓ GOOD: Single query with JOIN
SELECT
u.id,
u.name,
json_agg(
json_build_object('id', p.id, 'title', p.title)
ORDER BY p.created_at DESC
) FILTER (WHERE p.id IS NOT NULL) AS posts
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.id = ANY($1::int[])
GROUP BY u.id, u.name;
CTEs and Subquery Optimization
-- Use CTEs for readability, but watch for optimization fences (PG < 12)
WITH recent_active_users AS MATERIALIZED (
SELECT id FROM users
WHERE last_login > NOW() - INTERVAL '30 days'
AND status = 'active'
),
user_stats AS (
SELECT
user_id,
COUNT(*) as post_count,
MAX(created_at) as last_post
FROM posts
WHERE user_id IN (SELECT id FROM recent_active_users)
GROUP BY user_id
)
SELECT
u.name,
u.email,
s.post_count,
s.last_post
FROM users u
JOIN user_stats s ON s.user_id = u.id
ORDER BY s.post_count DESC;
Window Functions
-- Row number, rank, running totals — all in one pass
SELECT
id,
user_id,
title,
created_at,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn,
COUNT(*) OVER (PARTITION BY user_id) AS total_posts,
SUM(view_count) OVER (PARTITION BY user_id ORDER BY created_at) AS running_views
FROM posts
WHERE status = 'published';
Connection Pooling with PgBouncer
PostgreSQL spawns a process per connection — expensive at scale. PgBouncer maintains a pool of connections to the database and reuses them:
# pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/users.txt
pool_mode = transaction ; Best for most web apps
default_pool_size = 20
max_client_conn = 1000
server_lifetime = 3600
server_idle_timeout = 600
log_connections = 0
log_disconnections = 0
Key PostgreSQL Configuration
# postgresql.conf tuning (16GB RAM server example)
shared_buffers = 4GB # 25% of RAM
effective_cache_size = 12GB # 75% of RAM
work_mem = 64MB # Per-sort/hash operation
maintenance_work_mem = 1GB # For VACUUM, CREATE INDEX
wal_buffers = 64MB
checkpoint_completion_target = 0.9
random_page_cost = 1.1 # SSD storage
effective_io_concurrency = 200 # SSD concurrent I/O
max_connections = 100 # Use PgBouncer for more
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
autovacuum_vacuum_scale_factor = 0.02
autovacuum_analyze_scale_factor = 0.01
Table Partitioning
-- Range partitioning for time-series data (events table)
CREATE TABLE events (
id BIGSERIAL,
user_id INT NOT NULL,
event_type TEXT NOT NULL,
metadata JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Index each partition
CREATE INDEX ON events_2024_01(user_id, created_at);
CREATE INDEX ON events_2024_02(user_id, created_at);
-- Queries against specific date ranges hit only relevant partitions
SELECT * FROM events
WHERE created_at BETWEEN '2024-01-15' AND '2024-01-31'
AND user_id = 42;
Monitoring with pg_stat_statements
-- Enable in postgresql.conf:
-- shared_preload_libraries = 'pg_stat_statements'
-- Find the 10 slowest queries by total execution time
SELECT
query,
calls,
ROUND(total_exec_time::numeric, 2) AS total_ms,
ROUND(mean_exec_time::numeric, 2) AS mean_ms,
ROUND(stddev_exec_time::numeric, 2) AS stddev_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Find queries with high I/O
SELECT query, shared_blks_hit, shared_blks_read,
ROUND(100.0 * shared_blks_hit /
NULLIF(shared_blks_hit + shared_blks_read, 0), 2) AS cache_hit_ratio
FROM pg_stat_statements
WHERE calls > 100
ORDER BY shared_blks_read DESC
LIMIT 10;
Conclusion
PostgreSQL performance optimization is a layered problem: fix queries first (EXPLAIN ANALYZE, proper indexes), then tune configuration (shared_buffers, work_mem), then scale out (read replicas, connection pooling). The 80/20 rule applies strongly — most performance wins come from finding and fixing slow queries with missing indexes, not from hardware upgrades.