SQL Query and performance tuning - DB stats and identifying slow queries

 


📊 Understanding Table Statistics in PostgreSQL (With Simple Examples)

Efficient queries depend on knowing your data. PostgreSQL uses table statistics to build better query plans, and you can use them too: to diagnose slow queries, analyze data distributions, and optimize performance.

Let’s explore how to gather and use these statistics using tools like:

  • ANALYZE

  • pg_stat_statements

  • auto_explain

  • pg_stats


📦 Why Do Table Statistics Matter?

PostgreSQL keeps metadata about:

  • 🔢 Number of rows in each table

  • 💾 Storage usage

  • 🔄 Value frequency (e.g., most common values)

  • 📉 Data distribution (e.g., histograms, skew)

  • ⚖️ Cardinality (how many distinct values a column has)

This info helps the query planner decide how to execute your queries efficiently.
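You can see these statistics at work in any query plan: the planner's row estimates come directly from them. A quick illustration (the table and column names here are only placeholders):

```sql
-- The "rows=" figure in the resulting plan is the planner's estimate,
-- derived from the statistics that ANALYZE collects. If it is wildly
-- off from the real row count, your statistics are stale or too coarse.
EXPLAIN SELECT * FROM users WHERE country = 'US';
```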


⚙️ 1. Collecting Stats with ANALYZE

Run ANALYZE to gather statistics on tables:

ANALYZE your_table;

You can also combine it with VACUUM to clean and analyze at the same time:

VACUUM ANALYZE your_table;

📌 Note: ANALYZE produces no human-readable output; its results are stored internally for the planner. You can inspect them through the pg_stats view.
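If estimates for one column stay poor, you can ask ANALYZE to sample it in more detail by raising its statistics target (the table and column names below are placeholders):

```sql
-- The default target is 100; a higher value means a larger sample,
-- more histogram buckets, and more most-common-value entries.
ALTER TABLE your_table ALTER COLUMN your_column SET STATISTICS 500;
ANALYZE your_table;
```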


🧼 What’s the Role of VACUUM?

The VACUUM command cleans up dead rows left behind after updates/deletes.

✅ Regular VACUUM

  • Reclaims space

  • Does not lock the table

  • Can be run automatically via the autovacuum daemon

VACUUM your_table;

💣 VACUUM FULL

  • Locks the table during operation

  • Rewrites the table entirely to reclaim more space

  • Requires extra disk space (old + new copy during operation)

VACUUM FULL your_table;

🧪 Combine with ANALYZE

Want to clean up and update stats together?

VACUUM FULL ANALYZE your_table;

🔧 Rebuilding Indexes: REINDEX

Normally, you don’t need to manually rebuild indexes. But if you suspect corruption or you’re fixing bloat, you can use:

REINDEX INDEX index_name;
REINDEX TABLE table_name;
REINDEX SCHEMA schema_name;

This rebuilds the indexes from scratch (B-tree or otherwise), freeing unused pages and correcting any inconsistencies.
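One caveat: plain REINDEX takes a lock that blocks writes to the table while it runs. On PostgreSQL 12 and later you can rebuild without blocking writers:

```sql
-- Builds a new index alongside the old one, then swaps them.
-- Slower and needs extra disk space, but writes are not blocked.
REINDEX INDEX CONCURRENTLY index_name;
```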


✅ Best Practices Summary

Task | Command | When to Use
Gather stats | ANALYZE | After bulk inserts or deletes
Clean dead rows | VACUUM | Regular maintenance (or autovacuum)
Reclaim more space | VACUUM FULL | Only when lots of space can be reclaimed
Clean + analyze together | VACUUM FULL ANALYZE | After large deletes + need updated stats
Rebuild indexes | REINDEX | If indexes are corrupted or bloated


🚀 About ANALYZE and VACUUM

Statistics are the silent engine behind PostgreSQL performance. Without them, the planner guesses — and guesses can be wrong.

By regularly using ANALYZE, VACUUM, and REINDEX, you’re giving PostgreSQL the visibility it needs to run queries faster and smarter.


📊 2. View Stats with pg_stats

pg_stats is a readable view into PostgreSQL’s internal statistics.

SELECT * FROM pg_stats WHERE tablename = 'users';

๐Ÿ” Key Columns:

ColumnDescription
attname                 Column name
null_frac                 Fraction of NULLs
n_distinct                 Number of distinct values
most_common_vals                 Most frequent values
histogram_bounds               
                Distribution spread (like a bell curve)


🧪 Example:

SELECT attname, null_frac, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'users' AND attname = 'country';

This helps you answer questions like:

  • Are there too many NULLs?

  • Are most users from one country?

  • Should I create an index on this column?
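To see how dominant the top values are, also query most_common_freqs, which gives the fraction of rows each common value accounts for (the users/country example is again illustrative):

```sql
-- If the top frequency is close to 1.0, a single value dominates the
-- column, and the planner may skip an index for queries matching it.
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'users' AND attname = 'country';
```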


📈 3. Track SQL Performance with pg_stat_statements

Want to know which queries are the slowest?

Enable pg_stat_statements in postgresql.conf:

shared_preload_libraries = 'pg_stat_statements'

Then restart PostgreSQL and run:

CREATE EXTENSION pg_stat_statements;

Query the view:

SELECT query, calls, total_time, rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 5;

(On PostgreSQL 13 and later, the timing column is named total_exec_time.)

🧠 You’ll See:

  • The exact SQL statement

  • Number of times it ran (calls)

  • Total time spent

  • Rows returned

📌 Use this to identify frequent or slow queries to tune.
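Total time favors queries that simply run often. To surface queries that are slow per execution, order by the mean instead, and reset the counters when you want a fresh measurement window (column name as in PostgreSQL 13+):

```sql
-- Slowest queries by average execution time.
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 5;

-- Clear accumulated statistics to start a new measurement window.
SELECT pg_stat_statements_reset();
```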


๐Ÿข 4. Find Slow Queries with auto_explain

You can automatically log query plans for slow queries — without manually running EXPLAIN.

Enable auto_explain:

shared_preload_libraries = 'auto_explain'

Set minimum execution time (e.g., 200ms):

auto_explain.log_min_duration = '200ms'

📄 This logs the plan for any query slower than 200ms in your PostgreSQL log.

Use this in production to catch:

  • Missing indexes

  • Table scans

  • Bad joins
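A minimal sketch of the relevant postgresql.conf settings (values are illustrative; auto_explain can also be loaded for a single session with LOAD 'auto_explain'):

```
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '200ms'   # log plans for queries slower than this
auto_explain.log_analyze = on             # include actual row counts and timings
auto_explain.log_buffers = on             # include buffer usage in the plan
```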


📉 Bonus: Understand Data Skew

Skew = when values aren’t evenly spread. Here are 3 common types:

Skew Type | What It Means
Uniform | Values spread evenly
Negative skew | More values on the higher end
Positive skew | More values on the lower end


💡 Why it matters: if the planner assumes uniform data but the real data is skewed, its row estimates can be far off, leading to bad query plans.

You can spot skew via histograms in pg_stats.
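As an informal illustration (plain Python, not PostgreSQL), you can gauge the direction of skew in a sample by comparing its mean and median: a mean above the median suggests positive skew, below it suggests negative skew:

```python
from statistics import mean, median

def skew_direction(values):
    """Rough skew indicator: compare mean to median."""
    m, med = mean(values), median(values)
    if m > med:
        return "positive"   # long tail toward higher values
    if m < med:
        return "negative"   # long tail toward lower values
    return "roughly symmetric"

# A few large outliers pull the mean above the median: positive skew.
print(skew_direction([1, 1, 2, 2, 3, 50]))   # positive
```

The same intuition applies to the histogram_bounds in pg_stats: bucket boundaries crowded at one end of the range mean the values pile up there.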


✅ Summary: Tools for PostgreSQL Statistics

Tool | Use Case
ANALYZE | Collect or refresh table statistics
pg_stats | View per-column data distribution
pg_stat_statements | Track query performance
auto_explain | Log slow query plans automatically


🚀 Next Steps

Want to improve query speed and understand your data better?

  1. Start by running ANALYZE

  2. Explore pg_stats to learn about your data

  3. Enable pg_stat_statements to find slow queries

  4. Use auto_explain in production to diagnose bad plans




