SQL Query and performance tuning - DB stats and identifying slow queries
๐ Understanding Table Statistics in PostgreSQL (With Simple Examples)
Efficient queries depend on knowing your data. PostgreSQL uses table statistics to build better query plans, and you can use them too, to diagnose slow queries, analyze distributions, and optimize performance.
Let’s explore how to gather and use these statistics using tools like:
-
ANALYZE
-
pg_stat_statements
-
auto_explain
-
pg_stats
๐ฆ Why Do Table Statistics Matter?
PostgreSQL keeps metadata about:
-
๐ข Number of rows in each table
-
๐พ Storage usage
-
๐ Value frequency (e.g., most common values)
-
๐ Data distribution (e.g., histograms, skew)
-
⚖️ Cardinality (how many distinct values a column has)
This info helps the query planner decide how to execute your queries efficiently.
⚙️ 1. Collecting Stats with ANALYZE
Run ANALYZE
to gather statistics on tables:
You can also combine it with VACUUM
to clean and analyze at the same time:
๐ Note: The raw output of
ANALYZE
isn’t human-readable — it’s used internally by the planner. But you can view the results inpg_stats
.
๐งผ What’s the Role of VACUUM
?
The VACUUM
command cleans up dead rows left behind after updates/deletes.
✅ Regular VACUUM
-
Reclaims space
-
Does not lock the table
-
Can be run automatically via the autovacuum daemon
๐ฃ VACUUM FULL
-
Locks the table during operation
-
Rewrites the table entirely to reclaim more space
-
Requires extra disk space (old + new copy during operation)
๐งช Combine with ANALYZE
Want to clean up and update stats together?
๐ง Rebuilding Indexes: REINDEX
Normally, you don’t need to manually rebuild indexes. But if you suspect corruption or you’re fixing bloat, you can use:
This rebuilds B-tree indexes, freeing unused pages and correcting any inconsistencies.
✅ Best Practices Summary
Task | Command | When to Use |
---|---|---|
Gather stats | ANALYZE | After bulk inserts or deletes |
Clean dead rows | VACUUM | Regular maintenance (or autovacuum) |
Reclaim more space | VACUUM FULL | Only when lots of space can be reclaimed |
Clean + analyze together | VACUUM FULL ANALYZE | After large deletes + need updated stats |
Rebuild indexes | REINDEX | If indexes are corrupted or bloated |
๐ About Analyze and Vaccum
Statistics are the silent engine behind PostgreSQL performance. Without them, the planner guesses — and guesses can be wrong.
By regularly using ANALYZE
, VACUUM
, and REINDEX
, you’re giving PostgreSQL the visibility it needs to run queries faster and smarter.
๐ 2. View Stats with pg_stats
pg_stats
is a readable view into PostgreSQL’s internal statistics.
๐ Key Columns:
Column | Description |
---|---|
attname | Column name |
null_frac | Fraction of NULLs |
n_distinct | Number of distinct values |
most_common_vals | Most frequent values |
histogram_bounds | Distribution spread (like a bell curve) |
This helps you answer questions like:
-
Are there too many NULLs?
-
Are most users from one country?
-
Should I create an index on this column?
๐ 3. Track SQL Performance with pg_stat_statements
Want to know which queries are the slowest?
Enable pg_stat_statements
in postgresql.conf
:
Then restart PostgreSQL and run:
Query the view:
๐ง You’ll See:
-
The exact SQL statement
-
Number of times it ran (
calls
) -
Total time spent
-
Rows returned
๐ Use this to identify frequent or slow queries to tune.
๐ข 4. Find Slow Queries with auto_explain
You can automatically log query plans for slow queries — without manually running EXPLAIN
.
Enable auto_explain
:
Set minimum execution time (e.g., 200ms):
๐ This logs the plan for any query slower than 200ms in your PostgreSQL log.
Use this in production to catch:
-
Missing indexes
-
Table scans
-
Bad joins
๐ Bonus: Understand Data Skew
Skew = when values aren’t evenly spread. Here are 3 common types:
Skew Type | What It Means |
---|---|
Uniform | Values spread evenly |
Negative skew | More values on the higher end |
Positive skew | More values on the lower end |
You can spot skew via histograms in pg_stats
.
✅ Summary: Tools for PostgreSQL Statistics
Tool | Use Case |
---|---|
ANALYZE | Collect or refresh table statistics |
pg_stats | View per-column data distribution |
pg_stat_statements | Track query performance |
auto_explain | Log slow query plans automatically |
๐ Next Steps
Want to improve query speed and understand your data better?
-
Start by running
ANALYZE
-
Explore
pg_stats
to learn about your data -
Enable
pg_stat_statements
to find slow queries -
Use
auto_explain
in production to diagnose bad plans
Comments
Post a Comment