SQL Query and performance tuning - Joins in depth

- July 15, 2025

🔗 Joins vs. Subqueries: What to Use and How It Works Behind the Scenes

When you're working with SQL, you'll often face this question:

Should I use a JOIN or a SUBQUERY?

Both get the job done. But behind the scenes, your choice can affect performance, sometimes dramatically. Let’s break down the difference, peek under the hood at how joins actually work, and help you choose the right one.

🧩 What’s the Difference?

Let’s say you have two tables:

customers (id, name, region_id)
regions (id, region_name)

🔁 Option 1: Using a JOIN

SELECT customers.name, regions.region_name
FROM customers
JOIN regions ON customers.region_id = regions.id;

🧠 Option 2: Using a SUBQUERY

SELECT name,
  (SELECT region_name 
   FROM regions 
   WHERE regions.id = customers.region_id) AS region_name
FROM customers;

✅ Both queries return the same result. But what happens under the hood?

🛠️ How Joins Actually Work

Let’s break down three core join algorithms used by SQL engines:

1. 🌀 Nested Loop Join

How it works:

Outer loop → scans one table (driver table)
Inner loop → scans the other table for each row in the outer loop

Visual Example:

for customer in customers:
    for status in statuses:
        if customer.status = status.status:
            return match

Best for:

Small datasets
Flexible: works with =, <, >, BETWEEN, etc.
Indexes can speed this up

Drawback:
Slow for large tables if indexes are missing or data doesn’t fit in memory.

2. 🧮 Hash Join

How it works:

Build a hash table from the smaller input
Scan the larger table and look up matches in the hash

Visual Example:

Build hash(status_id) from status table

for customer in customers:
    lookup customer.status in hash table

Best for:

Large joins with equality conditions
Fast when both tables are big

Drawback:

Only supports equality (=) joins
More memory-intensive than nested loops

3. 🔀 Merge Join (Sort-Merge Join)

How it works:

Sort both tables on the join key
Walk through both in order, like merging two sorted lists

Visual Example:

Sort customers and statuses by status_id

Walk through both tables:
    if match, return row
    if not, advance lower value

Best for:

Pre-sorted or indexed tables
Very efficient on large datasets

Drawback:

Only supports equality (=) joins
Requires sort step if data isn’t already ordered

🔎 So… Should You Use a Join or Subquery?

✅ Use Joins When:

You're combining multiple tables in a straightforward way
You need better performance (joins are often optimized better)
You want to apply filters on multiple tables together

✅ Use Subqueries When:

You need to fetch a single value (scalar subquery)
You're trying to keep query logic simpler and more readable
You're returning aggregates or values that only relate to one row

🎯 Final Takeaway

Use the method that makes your intentions clear.
Then, check the execution plan to optimize.

Modern database engines are smart. Joins and subqueries can often be optimized to run similarly. But knowing how they work helps you write faster and clearer SQL.

📌 Pro Tip: Look for “Covered Indexes”

If your index includes all the columns used in your query (called a covered index), your join might never touch the full table —> big performance boost!

💬 Over to You

Which join type do you use most often? Have you ever switched from a subquery to a join for performance? Share your thoughts in the comments below!

Search This Blog

Simply Everything I want to share

SQL Query and performance tuning - Joins in depth

🔗 Joins vs. Subqueries: What to Use and How It Works Behind the Scenes

🧩 What’s the Difference?

🔁 Option 1: Using a JOIN

🧠 Option 2: Using a SUBQUERY

🛠️ How Joins Actually Work

1. 🌀 Nested Loop Join

2. 🧮 Hash Join

3. 🔀 Merge Join (Sort-Merge Join)

🔎 So… Should You Use a Join or Subquery?

✅ Use Joins When:

✅ Use Subqueries When:

🎯 Final Takeaway

📌 Pro Tip: Look for “Covered Indexes”

💬 Over to You

Comments

Post a Comment

Popular posts from this blog

Kafka: How to handle duplicate messages

SQL Query and performance tuning - Indexing in depth

Apache Kafka - The basics