SQL Query and performance tuning - Indexing in depth

 


🚀 Indexing in Relational Databases: A Practical Overview with PostgreSQL

One of the most powerful tools for improving query performance in relational databases is indexing. Whether you're dealing with transactional workloads or building analytical pipelines, the right indexing strategy can make a massive difference in speed, scalability, and efficiency.

In this post, we’ll explore:

  • What indexes are and why they matter

  • Types of indexes: B-tree, Bitmap, Hash, and Bloom Filter

  • Practical guidance for PostgreSQL users


🧠 What is an Index?

At its core, an index is a separate data structure that stores a subset of table data in an ordered, searchable format, much like a book index helps you find content faster without scanning every page.

Indexes:

  • Speed up data retrieval

  • Help enforce constraints (like uniqueness)

  • Reduce full table scans

  • Improve cache efficiency (indexes are smaller and more likely to stay in memory)

However, indexes do come at a cost:

  • Extra storage required

  • Overhead during inserts/updates/deletes


🏢 Use Case: A Simple Data Model

Let’s imagine we have below table:

  • staff: core employee data

When querying staff by email, without an index, PostgreSQL would scan every row, slow and inefficient. Add an index on the email column, and the system can directly look up the matching row.


🌳 B-tree Indexes: The Default

B-tree (Balanced Tree) indexes are the most common type in PostgreSQL and most relational databases.

🔧 How it Works:

  • Organizes data in a tree structure with a root and nodes

  • Every value comparison allows the query to halve the search space (like binary search)

  • Lookup time: O(log n)

🧠 When to Use:

  • Columns with high cardinality (many distinct values)

  • Range queries (e.g. >, <, BETWEEN)

  • Default index type for primary keys

✅ Pros:

  • Fast and efficient for a wide range of queries

  • Self-balancing


Before indexing


After Indexing




🧮 Bitmap Indexes: Optimized for Low Cardinality

Bitmap indexes are best when dealing with low cardinality columns (few unique values), such as:

  • gender (M, F)

  • active_status (yes, no)

  • pay_type (hourly, salary, contract)

🔧 How it Works:

  • Each distinct value is mapped to a bit vector

  • Boolean operations (AND, OR, NOT) are extremely fast on bitmaps

🧠 When to Use:

  • Filtering on low cardinality columns

  • Analytical or read-heavy environments (like data warehouses)

Note: PostgreSQL doesn't support static bitmap indexes, but dynamically generates bitmap index scans during query planning when appropriate.


In my example table , it has the column job title which has low cardinality. 
for example, if you filter staff members of a given job title, you will be able to find more than 10 records.
but suppose you choose a column like FirstName, then the cardinality can be high (could be almost unique)

I created the index for job_title. 

CREATE index idx_staff_job_title
ON staff(job_title)

 Nothing fancy. same command used. but let's see whether it is actually created a bitMap index or the default b-tree index.



No special way of bitMap index creation. Since the column is low cardinal one, PostgreSQL simply did it for you..


🔐 Hash Indexes: Simple and Fast for Equality Lookups

Hash indexes use a hash function to map values to fixed-size buckets.

🔧 How it Works:

  • Useful only for equality comparisons (e.g., =), not for ranges

  • Improved in recent PostgreSQL versions for better reliability and performance

🧠 When to Use:

  • Keys with exact match lookups

  • Columns with uniformly distributed values

✅ Pros:

  • Smaller size in memory

  • Very fast equality checks

    Think of a column which we can be used to demonstrate Hash index scenario. 
    A column that a range lookup is highly unlikely. 
    I choose email. Email equality checks are very common. so let's see how you can create a hash index.

    CREATE index idx_staff_email
    ON staff USING HASH(email)
and let see what it shows you when you use explain command.



You can try creating a b-tree index for email and see what perform the best. In my sample table I don't have enough data to show you the different. but with loads of data it will comes in handy.



🌸 Bloom Filter Indexes: Compact and Probabilistic

When you're querying by many combinations of columns and don’t want to create a separate index for each, Bloom filter indexes are a game-changer.

🔧 How it Works:

  • Uses a bit-based, lossy structure

  • Supports fast filtering with some false positives

  • Returns a superset of matching rows that still need filtering afterward

🧠 When to Use:

  • High-dimensional, sparse queries

  • Use cases with many optional filtering criteria

  • Example: Customer analytics by region, loyalty status, sales volume, and frequency

-- Enable Bloom filter extension
CREATE EXTENSION IF NOT EXISTS bloom;

-- Create a Bloom index
CREATE INDEX idx_customer_features_bloom
ON customer_features
USING bloom (c1, c2, c3, c4, c5, c6, c7, c8)
WITH (length = 160);

Bloom filters are extremely space-efficient, but remember they can return false positives. They're ideal when storage overhead matters more than perfect precision during filtering.


🔄 Recap: Choosing the Right Index

Index TypeBest ForRange QueriesSpace EfficientFalse Positives
B-treeGeneral-purpose, high cardinalityModerate
BitmapLow cardinality, analytics
HashEquality lookups
Bloom FilterMulti-column filtering, many combos✅✅✅

🛠 Final Thoughts

Indexes are vital for performant querying, but they’re not one-size-fits-all. Understanding the data and access patterns in your application is key to choosing the right strategy. Whether you're relying on the tried-and-true B-tree, experimenting with bloom filters, or optimizing for equality with hash indexes, PostgreSQL gives you a powerful set of tools to tune performance effectively.


Comments

Popular posts from this blog

Apache Kafka - The basics

Spring: How to deal with circular dependencies