SQL Query and performance tuning - Indexes


What is Document Indexing & How to automate it?

From Declarative SQL to Procedural Execution Plans

Understanding the Journey Behind Your Queries


๐Ÿง  Why SQL Is So Widely Used

SQL (Structured Query Language) has come a long way from its roots as a tool for querying relational databases. Today, it powers data analytics in platforms like Apache Spark, and even in streaming tools like Apache Kafka.

So, what makes SQL so universal?

It’s Ideal for Tabular Data

Most data in the real world naturally fits into tables:

  • ๐Ÿ› Retailers track customer purchases in structured tables.

  • ๐Ÿฅ Healthcare systems manage patient records in relational databases.

  • ๐Ÿ’ณ Financial institutions store transaction histories in tables.

It's tough to find an industry that doesn't benefit from tabular data, and by extension, SQL.


๐Ÿ“ฃ SQL Is Declarative — But What Does That Mean?

SQL is a declarative language, meaning you state what you want, not how to get it.

Example:

SELECT name, email FROM customers WHERE city = 'Tallinn';

You don’t specify how the database should fetch this data, you just declare the result you want.

This makes SQL different from procedural programming languages like:

  • Java

  • Python

  • C++

In those, you have to define the exact steps to perform operations, such as iterating through arrays or handling logic flow.


๐Ÿงฐ Under the Hood: From SQL to Execution Plan

Although SQL is declarative, the database still needs to figure out how to execute your request. That’s where execution plans come in.

An execution plan is the procedural version of your SQL query. It’s a sequence of low-level operations like:

  • ๐Ÿ” Scans

  • ⚙️ Filters

  • ๐Ÿ”— Joins

Think of it like this:

You say:

“I want a list of customers in Austin.”

The database replies:

“Got it. I’ll scan the customers table, apply a filter for city = 'Austin', then return the relevant columns.”


๐Ÿš€ Query Optimization: Why Efficiency Matters

Just like you'd write efficient code in a procedural language, execution plans need optimization. Different plans can yield the same result, but some are much faster.

๐Ÿงช Example: Filtering Customer Purchases

Let’s say you want to retrieve customer purchases where the ID starts with 3014:

Inefficient Plan:
Scan all 10 million rows to find the matching records.

Optimized Plan:
Use an index to jump straight to the relevant rows, much faster!


๐Ÿ” Beyond Tables: Advanced SQL Features

Although SQL’s foundation is tabular, it also supports more advanced features like:

  • ROLLUPs

  • CUBEs

  • Window functions

These let you create rich summaries and multidimensional reports. While not strictly tabular in appearance, they still operate on table-like structures.


๐Ÿ” Scanning Tables and Indexes

How Databases Read Your Data — Efficiently or Not


๐Ÿ“ฆ What Is a Table Scan?

One of the most fundamental operations in a database execution plan is a scan. It's exactly what it sounds like: going row by row through a table to find relevant data.

๐Ÿง  Simple Definition:

Table Scan = Read each row → Apply filter → Move to next row.

This is known as a linear operation, and it happens whether the data is being read from persistent storage (like disk) or in-memory cache.


๐Ÿงช Example: Table Scan in Action

Let’s say you want to retrieve all orders placed in 2025:

SELECT * FROM orders WHERE order_year = 2025;

If there’s no index on order_year, the database will scan every single row in the orders table, checking the order_year value one at a time.

That’s a full table scan—and it can be expensive for large tables.


๐Ÿ“˜ Full Table Scans: When Are They Okay?

✅ Good for:

  • Small tables

  • Queries that touch most rows (e.g., aggregate reports)

❌ Inefficient when:

  • You only need a few rows

  • The query is run frequently on a large dataset

If you spot a full table scan in your query plan, it’s worth asking:

“Is there a better way to retrieve this data?”


๐ŸŒฟ Enter Indexes: The Fast Lane of Data Access

Indexes are special data structures that make it faster to look up rows without scanning the entire table.

⚙️ How Indexes Work:

  • An index entry stores:

    • A key value (e.g., customer ID, order date)

    • A pointer to the row location in the table

  • Indexes are typically stored in trees, which allow for fast searching.


๐Ÿงฑ Types of Indexes (and When to Use Them)

Index TypeBest ForDescription
B-TreeEquality & Range QueriesMost common type. Used when searching values like age > 30 or name = 'John'.

Hash Index

Equality Queries Only

Fast lookup for exact matches, like email = 'abc@xyz.com'.
Bitmap IndexColumns with Few Unique Values
Great for gender, status, or flags (is_active = true). Used for set operations.

SpecializedCustom Data (e.g., Geospatial)PostgreSQL and others support GIN, GiST, etc., for non-standard data types.

๐ŸŒณ Index Structures: Why Trees?

Trees (especially balanced trees like B-Trees) are efficient for:

  • Keeping data sorted

  • Allowing quick traversal

  • Minimizing disk reads

Think of a B-Tree like an efficient phone book—you don’t flip through every page to find a name. You jump to the right section and narrow down quickly.


๐Ÿง  Columnar Storage: A Special Case

In analytical databases (like Amazon Redshift or Google BigQuery), data is stored in columns rather than rows. This makes it:

  • Highly efficient for queries filtering on just a few columns

  • Better suited for OLAP workloads (e.g., dashboards, BI)

⚠️ However, the content discussed here assumes row-oriented storage, which is the norm in transactional systems like MySQL, PostgreSQL, and Oracle.


๐Ÿ“Œ Best Practices

Use Indexes When:

  • You're filtering on a column that’s used often in WHERE clauses

  • The table is large and scanned frequently

  • The column has sufficient cardinality (many unique values)

Avoid Index Overuse:

  • Too many indexes slow down writes (INSERT/UPDATE/DELETE)

  • Use query execution plans to confirm index usage


๐Ÿงญ Summary: Scan or Index?



OperationSpeedBest For
Full Table Scan  Slow on large tables      Small tables or broad queries
Index Scan               Fast (when applicable)Narrow filters, large datasets



Always check your execution plan. If you're scanning more than necessary, an index might save the day.

Comments

Popular posts from this blog

SQL Query and performance tuning - Indexing in depth

Apache Kafka - The basics

Spring: How to deal with circular dependencies