🚀 Indexing in Relational Databases: A Practical Overview with PostgreSQL

One of the most powerful tools for improving query performance in relational databases is indexing. Whether you're dealing with transactional workloads or building analytical pipelines, the right indexing strategy can make a massive difference in speed, scalability, and efficiency.

In this post, we’ll explore:

What indexes are and why they matter
Types of indexes: B-tree, Bitmap, Hash, and Bloom Filter
Practical guidance for PostgreSQL users

🧠 What is an Index?

At its core, an index is a separate data structure that stores a subset of table data in an ordered, searchable format, much like a book index helps you find content faster without scanning every page.

Indexes:

Speed up data retrieval
Help enforce constraints (like uniqueness)
Reduce full table scans
Improve cache efficiency (indexes are smaller and more likely to stay in memory)

However, indexes do come at a cost:

Extra storage required
Overhead during inserts/updates/deletes

🏢 Use Case: A Simple Data Model

Let’s imagine we have below table:

staff: core employee data

When querying staff by email, without an index, PostgreSQL would scan every row, slow and inefficient. Add an index on the email column, and the system can directly look up the matching row.

🌳 B-tree Indexes: The Default

B-tree (Balanced Tree) indexes are the most common type in PostgreSQL and most relational databases.

🔧 How it Works:

Organizes data in a tree structure with a root and nodes
Every value comparison allows the query to halve the search space (like binary search)
Lookup time: O(log n)

🧠 When to Use:

Columns with high cardinality (many distinct values)
Range queries (e.g. >, <, BETWEEN)
Default index type for primary keys

✅ Pros:

Fast and efficient for a wide range of queries
Self-balancing

Before indexing

After Indexing

🧮 Bitmap Indexes: Optimized for Low Cardinality

Bitmap indexes are best when dealing with low cardinality columns (few unique values), such as:

gender (M, F)
active_status (yes, no)
pay_type (hourly, salary, contract)

🔧 How it Works:

Each distinct value is mapped to a bit vector
Boolean operations (AND, OR, NOT) are extremely fast on bitmaps

🧠 When to Use:

Filtering on low cardinality columns
Analytical or read-heavy environments (like data warehouses)

Note: PostgreSQL doesn't support static bitmap indexes, but dynamically generates bitmap index scans during query planning when appropriate.

In my example table , it has the column job title which has low cardinality.
for example, if you filter staff members of a given job title, you will be able to find more than 10 records.
but suppose you choose a column like FirstName, then the cardinality can be high (could be almost unique)

I created the index for job_title.

CREATE index idx_staff_job_title
ON staff(job_title)

Nothing fancy. same command used. but let's see whether it is actually created a bitMap index or the default b-tree index.

No special way of bitMap index creation. Since the column is low cardinal one, PostgreSQL simply did it for you..

🔐 Hash Indexes: Simple and Fast for Equality Lookups

Hash indexes use a hash function to map values to fixed-size buckets.

🔧 How it Works:

Useful only for equality comparisons (e.g., =), not for ranges
Improved in recent PostgreSQL versions for better reliability and performance

🧠 When to Use:

Keys with exact match lookups
Columns with uniformly distributed values

✅ Pros:

Smaller size in memory
Very fast equality checks

Think of a column which we can be used to demonstrate Hash index scenario.
A column that a range lookup is highly unlikely.
I choose email. Email equality checks are very common. so let's see how you can create a hash index.

CREATE index idx_staff_email
ON staff USING HASH(email)

and let see what it shows you when you use explain command.

You can try creating a b-tree index for email and see what perform the best. In my sample table I don't have enough data to show you the different. but with loads of data it will comes in handy.

🌸 Bloom Filter Indexes: Compact and Probabilistic

When you're querying by many combinations of columns and don’t want to create a separate index for each, Bloom filter indexes are a game-changer.

🔧 How it Works:

Uses a bit-based, lossy structure
Supports fast filtering with some false positives
Returns a superset of matching rows that still need filtering afterward

🧠 When to Use:

High-dimensional, sparse queries
Use cases with many optional filtering criteria
Example: Customer analytics by region, loyalty status, sales volume, and frequency

-- Enable Bloom filter extension
CREATE EXTENSION IF NOT EXISTS bloom;

-- Create a Bloom indexCREATE INDEX idx_customer_features_bloomON customer_featuresUSING bloom (c1, c2, c3, c4, c5, c6, c7, c8)WITH (length = 160);

Bloom filters are extremely space-efficient, but remember they can return false positives. They're ideal when storage overhead matters more than perfect precision during filtering.

🔄 Recap: Choosing the Right Index

Index Type	Best For	Range Queries	Space Efficient	False Positives
B-tree	General-purpose, high cardinality	✅	Moderate	❌
Bitmap	Low cardinality, analytics	❌	✅	❌
Hash	Equality lookups	❌	✅	❌
Bloom Filter	Multi-column filtering, many combos	❌	✅✅✅	✅

🛠 Final Thoughts

Indexes are vital for performant querying, but they’re not one-size-fits-all. Understanding the data and access patterns in your application is key to choosing the right strategy. Whether you're relying on the tried-and-true B-tree, experimenting with bloom filters, or optimizing for equality with hash indexes, PostgreSQL gives you a powerful set of tools to tune performance effectively.

Search This Blog

Simply Everything I want to share

SQL Query and performance tuning - Indexing in depth

🚀 Indexing in Relational Databases: A Practical Overview with PostgreSQL

🧠 What is an Index?

🏢 Use Case: A Simple Data Model

🌳 B-tree Indexes: The Default

🔧 How it Works:

🧠 When to Use:

✅ Pros:

🧮 Bitmap Indexes: Optimized for Low Cardinality

🔧 How it Works:

🧠 When to Use:

🔐 Hash Indexes: Simple and Fast for Equality Lookups

🔧 How it Works:

🧠 When to Use:

✅ Pros:

🌸 Bloom Filter Indexes: Compact and Probabilistic

🔧 How it Works:

🧠 When to Use:

🔄 Recap: Choosing the Right Index

🛠 Final Thoughts

Comments

Post a Comment

Popular posts from this blog

Kafka: How to handle duplicate messages

Spring: How to deal with circular dependencies