Database

PostgreSQL Performance Tuning: Indexes, Query Optimization & Scaling

A
Admin
March 9, 2026 • 6 min read • 1,002 words
9
Overall Scoreout of 10

Rating

9/10

Verdict

PostgreSQL is the best open-source relational database available. Master indexes and EXPLAIN ANALYZE first — 90% of performance issues are solved at the query level before you ever need to scale hardware.

Pros

  • EXPLAIN ANALYZE reveals exact bottlenecks
  • Partial and composite indexes are extremely flexible
  • JSONB with GIN indexes handles semi-structured data brilliantly
  • pg_stat_statements gives visibility into slow queries

Cons

  • N+1 query problems are easy to miss in ORMs
  • Index maintenance has write overhead
  • Partitioning requires careful upfront planning

PostgreSQL Performance Tuning: Indexes, Query Optimization & Scaling

PostgreSQL is one of the most capable open-source databases available, but raw capability doesn't equal performance — that requires understanding the query planner, indexing strategies, and the right configuration for your workload. This guide walks through every layer of the PostgreSQL performance stack with practical, runnable examples.

Understanding EXPLAIN ANALYZE

EXPLAIN ANALYZE is the most powerful tool in your PostgreSQL arsenal. It shows the actual execution plan with real timing data:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT u.id, u.name, COUNT(p.id) as post_count FROM users u LEFT JOIN posts p ON p.user_id = u.id WHERE u.created_at > '2024-01-01' GROUP BY u.id, u.name ORDER BY post_count DESC LIMIT 20;

Key things to look for in the output:

  • Seq Scan — Full table scan, usually means a missing index
  • Index Scan — Good, using an index
  • Index Only Scan — Best, all data from the index itself
  • Hash Join / Nested Loop / Merge Join — Join strategies
  • actual time — Where your query is actually spending time
  • rows removed by filter — High number = poor selectivity

Indexing Strategies

B-tree Indexes (default)

-- Basic index CREATE INDEX idx_users_email ON users(email); -- Composite index (order matters! most selective first) CREATE INDEX idx_posts_user_status ON posts(user_id, status, created_at DESC); -- Index on expression CREATE INDEX idx_users_lower_email ON users(LOWER(email)); -- Enables: WHERE LOWER(email) = 'alice@example.com' -- Partial index — only index a subset of rows CREATE INDEX idx_posts_published ON posts(created_at DESC) WHERE status = 'published'; -- Much smaller index, highly selective for published posts query

GIN Indexes for JSONB and Arrays

-- JSONB column with GIN index CREATE TABLE products ( id SERIAL PRIMARY KEY, name TEXT, attributes JSONB ); CREATE INDEX idx_products_attributes ON products USING GIN (attributes); -- Now these are fast: SELECT * FROM products WHERE attributes @> '{"color": "blue"}'; SELECT * FROM products WHERE attributes ? 'warranty'; SELECT * FROM products WHERE attributes ->> 'brand' = 'Apple'; -- Array column with GIN index CREATE INDEX idx_posts_tags ON posts USING GIN (tags); SELECT * FROM posts WHERE tags && ARRAY['postgresql', 'database'];

Covering Indexes

Include all columns needed by a query to enable Index Only Scans:

-- Without covering index: Index Scan + Heap fetch CREATE INDEX idx_users_email ON users(email); SELECT id, name FROM users WHERE email = 'alice@example.com'; -- With covering index: Index Only Scan (much faster) CREATE INDEX idx_users_email_covering ON users(email) INCLUDE (id, name);

Query Optimization Patterns

Eliminating N+1 Queries

-- ✗ BAD: N+1 — 1 query for users + N queries for posts SELECT id, name FROM users LIMIT 100; -- Then for each user: SELECT * FROM posts WHERE user_id = $1; -- ✓ GOOD: Single query with JOIN SELECT u.id, u.name, json_agg( json_build_object('id', p.id, 'title', p.title) ORDER BY p.created_at DESC ) FILTER (WHERE p.id IS NOT NULL) AS posts FROM users u LEFT JOIN posts p ON p.user_id = u.id WHERE u.id = ANY($1::int[]) GROUP BY u.id, u.name;

CTEs and Subquery Optimization

-- Use CTEs for readability, but watch for optimization fences (PG < 12) WITH recent_active_users AS MATERIALIZED ( SELECT id FROM users WHERE last_login > NOW() - INTERVAL '30 days' AND status = 'active' ), user_stats AS ( SELECT user_id, COUNT(*) as post_count, MAX(created_at) as last_post FROM posts WHERE user_id IN (SELECT id FROM recent_active_users) GROUP BY user_id ) SELECT u.name, u.email, s.post_count, s.last_post FROM users u JOIN user_stats s ON s.user_id = u.id ORDER BY s.post_count DESC;

Window Functions

-- Row number, rank, running totals — all in one pass SELECT id, user_id, title, created_at, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn, COUNT(*) OVER (PARTITION BY user_id) AS total_posts, SUM(view_count) OVER (PARTITION BY user_id ORDER BY created_at) AS running_views FROM posts WHERE status = 'published';

Connection Pooling with PgBouncer

PostgreSQL spawns a process per connection — expensive at scale. PgBouncer maintains a pool of connections to the database and reuses them:

# pgbouncer.ini [databases] myapp = host=127.0.0.1 port=5432 dbname=myapp [pgbouncer] listen_addr = 0.0.0.0 listen_port = 6432 auth_type = scram-sha-256 auth_file = /etc/pgbouncer/users.txt pool_mode = transaction ; Best for most web apps default_pool_size = 20 max_client_conn = 1000 server_lifetime = 3600 server_idle_timeout = 600 log_connections = 0 log_disconnections = 0

Key PostgreSQL Configuration

# postgresql.conf tuning (16GB RAM server example) shared_buffers = 4GB # 25% of RAM effective_cache_size = 12GB # 75% of RAM work_mem = 64MB # Per-sort/hash operation maintenance_work_mem = 1GB # For VACUUM, CREATE INDEX wal_buffers = 64MB checkpoint_completion_target = 0.9 random_page_cost = 1.1 # SSD storage effective_io_concurrency = 200 # SSD concurrent I/O max_connections = 100 # Use PgBouncer for more max_parallel_workers_per_gather = 4 max_parallel_workers = 8 autovacuum_vacuum_scale_factor = 0.02 autovacuum_analyze_scale_factor = 0.01

Table Partitioning

-- Range partitioning for time-series data (events table) CREATE TABLE events ( id BIGSERIAL, user_id INT NOT NULL, event_type TEXT NOT NULL, metadata JSONB, created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() ) PARTITION BY RANGE (created_at); -- Create monthly partitions CREATE TABLE events_2024_01 PARTITION OF events FOR VALUES FROM ('2024-01-01') TO ('2024-02-01'); CREATE TABLE events_2024_02 PARTITION OF events FOR VALUES FROM ('2024-02-01') TO ('2024-03-01'); -- Index each partition CREATE INDEX ON events_2024_01(user_id, created_at); CREATE INDEX ON events_2024_02(user_id, created_at); -- Queries against specific date ranges hit only relevant partitions SELECT * FROM events WHERE created_at BETWEEN '2024-01-15' AND '2024-01-31' AND user_id = 42;

Monitoring with pg_stat_statements

-- Enable in postgresql.conf: -- shared_preload_libraries = 'pg_stat_statements' -- Find the 10 slowest queries by total execution time SELECT query, calls, ROUND(total_exec_time::numeric, 2) AS total_ms, ROUND(mean_exec_time::numeric, 2) AS mean_ms, ROUND(stddev_exec_time::numeric, 2) AS stddev_ms, rows FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10; -- Find queries with high I/O SELECT query, shared_blks_hit, shared_blks_read, ROUND(100.0 * shared_blks_hit / NULLIF(shared_blks_hit + shared_blks_read, 0), 2) AS cache_hit_ratio FROM pg_stat_statements WHERE calls > 100 ORDER BY shared_blks_read DESC LIMIT 10;

Conclusion

PostgreSQL performance optimization is a layered problem: fix queries first (EXPLAIN ANALYZE, proper indexes), then tune configuration (shared_buffers, work_mem), then scale out (read replicas, connection pooling). The 80/20 rule applies strongly — most performance wins come from finding and fixing slow queries with missing indexes, not from hardware upgrades.

#PostgreSQL#Database#Performance#SQL#Backend