SQL Query and performance tuning - Indexes

From Declarative SQL to Procedural Execution Plans
Understanding the Journey Behind Your Queries
๐ง Why SQL Is So Widely Used
SQL (Structured Query Language) has come a long way from its roots as a tool for querying relational databases. Today, it powers data analytics in platforms like Apache Spark, and even in streaming tools like Apache Kafka.
So, what makes SQL so universal?
✅ It’s Ideal for Tabular Data
Most data in the real world naturally fits into tables:
-
๐ Retailers track customer purchases in structured tables.
-
๐ฅ Healthcare systems manage patient records in relational databases.
-
๐ณ Financial institutions store transaction histories in tables.
It's tough to find an industry that doesn't benefit from tabular data, and by extension, SQL.
๐ฃ SQL Is Declarative — But What Does That Mean?
SQL is a declarative language, meaning you state what you want, not how to get it.
Example:
You don’t specify how the database should fetch this data, you just declare the result you want.
This makes SQL different from procedural programming languages like:
-
Java
-
Python
-
C++
In those, you have to define the exact steps to perform operations, such as iterating through arrays or handling logic flow.
๐งฐ Under the Hood: From SQL to Execution Plan
Although SQL is declarative, the database still needs to figure out how to execute your request. That’s where execution plans come in.
An execution plan is the procedural version of your SQL query. It’s a sequence of low-level operations like:
-
๐ Scans
-
⚙️ Filters
-
๐ Joins
Think of it like this:
You say:
“I want a list of customers in Austin.”
The database replies:
“Got it. I’ll scan the
customers
table, apply a filter forcity = 'Austin'
, then return the relevant columns.”
๐ Query Optimization: Why Efficiency Matters
Just like you'd write efficient code in a procedural language, execution plans need optimization. Different plans can yield the same result, but some are much faster.
๐งช Example: Filtering Customer Purchases
Let’s say you want to retrieve customer purchases where the ID starts with 3014
:
Inefficient Plan:
Scan all 10 million rows to find the matching records.
Optimized Plan:
Use an index to jump straight to the relevant rows, much faster!
๐ Beyond Tables: Advanced SQL Features
Although SQL’s foundation is tabular, it also supports more advanced features like:
-
ROLLUPs
-
CUBEs
-
Window functions
These let you create rich summaries and multidimensional reports. While not strictly tabular in appearance, they still operate on table-like structures.
๐ Scanning Tables and Indexes
How Databases Read Your Data — Efficiently or Not
๐ฆ What Is a Table Scan?
One of the most fundamental operations in a database execution plan is a scan. It's exactly what it sounds like: going row by row through a table to find relevant data.
๐ง Simple Definition:
Table Scan = Read each row → Apply filter → Move to next row.
This is known as a linear operation, and it happens whether the data is being read from persistent storage (like disk) or in-memory cache.
๐งช Example: Table Scan in Action
Let’s say you want to retrieve all orders placed in 2025:
If there’s no index on order_year
, the database will scan every single row in the orders
table, checking the order_year
value one at a time.
That’s a full table scan—and it can be expensive for large tables.
๐ Full Table Scans: When Are They Okay?
✅ Good for:
-
Small tables
-
Queries that touch most rows (e.g., aggregate reports)
❌ Inefficient when:
-
You only need a few rows
-
The query is run frequently on a large dataset
If you spot a full table scan in your query plan, it’s worth asking:
“Is there a better way to retrieve this data?”
๐ฟ Enter Indexes: The Fast Lane of Data Access
Indexes are special data structures that make it faster to look up rows without scanning the entire table.
⚙️ How Indexes Work:
-
An index entry stores:
-
A key value (e.g., customer ID, order date)
-
A pointer to the row location in the table
-
-
Indexes are typically stored in trees, which allow for fast searching.
๐งฑ Types of Indexes (and When to Use Them)
Index Type | Best For | Description |
---|---|---|
B-Tree | Equality & Range Queries | Most common type. Used when searching values like age > 30 or name = 'John' . |
Hash Index | Equality Queries Only | Fast lookup for exact matches, like email = 'abc@xyz.com' . |
Bitmap Index | Columns with Few Unique Values | Great for gender, status, or flags ( is_active = true ). Used for set operations. |
Specialized | Custom Data (e.g., Geospatial) | PostgreSQL and others support GIN, GiST, etc., for non-standard data types. |
๐ณ Index Structures: Why Trees?
Trees (especially balanced trees like B-Trees) are efficient for:
-
Keeping data sorted
-
Allowing quick traversal
-
Minimizing disk reads
Think of a B-Tree like an efficient phone book—you don’t flip through every page to find a name. You jump to the right section and narrow down quickly.
๐ง Columnar Storage: A Special Case
In analytical databases (like Amazon Redshift or Google BigQuery), data is stored in columns rather than rows. This makes it:
-
Highly efficient for queries filtering on just a few columns
-
Better suited for OLAP workloads (e.g., dashboards, BI)
⚠️ However, the content discussed here assumes row-oriented storage, which is the norm in transactional systems like MySQL, PostgreSQL, and Oracle.
๐ Best Practices
✅ Use Indexes When:
-
You're filtering on a column that’s used often in WHERE clauses
-
The table is large and scanned frequently
-
The column has sufficient cardinality (many unique values)
❌ Avoid Index Overuse:
-
Too many indexes slow down writes (INSERT/UPDATE/DELETE)
-
Use query execution plans to confirm index usage
๐งญ Summary: Scan or Index?
| |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Always check your execution plan. If you're scanning more than necessary, an index might save the day. |
Comments
Post a Comment