SQL Query and performance tuning - Joins in depth

 


🔗 Joins vs. Subqueries: What to Use and How It Works Behind the Scenes

When you're working with SQL, you'll often face this question:

Should I use a JOIN or a SUBQUERY?

Both get the job done. But behind the scenes, your choice can affect performance, sometimes dramatically. Let’s break down the difference, peek under the hood at how joins actually work, and help you choose the right one.


🧩 What’s the Difference?

Let’s say you have two tables:

customers (id, name, region_id) regions (id, region_name)

🔁 Option 1: Using a JOIN

SELECT customers.name, regions.region_name FROM customers JOIN regions ON customers.region_id = regions.id;

🧠 Option 2: Using a SUBQUERY

SELECT name, (SELECT region_name FROM regions WHERE regions.id = customers.region_id) AS region_name FROM customers;

✅ Both queries return the same result. But what happens under the hood?


🛠️ How Joins Actually Work

Let’s break down three core join algorithms used by SQL engines:


1. 🌀 Nested Loop Join

How it works:

  • Outer loop → scans one table (driver table)

  • Inner loop → scans the other table for each row in the outer loop

Visual Example:

for customer in customers: for status in statuses: if customer.status = status.status: return match

Best for:

  • Small datasets

  • Flexible: works with =, <, >, BETWEEN, etc.

  • Indexes can speed this up

Drawback:
Slow for large tables if indexes are missing or data doesn’t fit in memory.



2. 🧮 Hash Join

How it works:

  • Build a hash table from the smaller input

  • Scan the larger table and look up matches in the hash

Visual Example:

Build hash(status_id) from status table for customer in customers: lookup customer.status in hash table

Best for:

  • Large joins with equality conditions

  • Fast when both tables are big

Drawback:

  • Only supports equality (=) joins

  • More memory-intensive than nested loops




3. 🔀 Merge Join (Sort-Merge Join)

How it works:

  • Sort both tables on the join key

  • Walk through both in order, like merging two sorted lists

Visual Example:

Sort customers and statuses by status_id Walk through both tables: if match, return row if not, advance lower value

Best for:

  • Pre-sorted or indexed tables

  • Very efficient on large datasets

Drawback:

  • Only supports equality (=) joins

  • Requires sort step if data isn’t already ordered



🔎 So… Should You Use a Join or Subquery?

✅ Use Joins When:

  • You're combining multiple tables in a straightforward way

  • You need better performance (joins are often optimized better)

  • You want to apply filters on multiple tables together

✅ Use Subqueries When:

  • You need to fetch a single value (scalar subquery)

  • You're trying to keep query logic simpler and more readable

  • You're returning aggregates or values that only relate to one row


🎯 Final Takeaway

Use the method that makes your intentions clear.
Then, check the execution plan to optimize.

Modern database engines are smart. Joins and subqueries can often be optimized to run similarly. But knowing how they work helps you write faster and clearer SQL.


📌 Pro Tip: Look for “Covered Indexes”

If your index includes all the columns used in your query (called a covered index), your join might never touch the full table —> big performance boost!


💬 Over to You

Which join type do you use most often? Have you ever switched from a subquery to a join for performance? Share your thoughts in the comments below!

Comments

Popular posts from this blog

SQL Query and performance tuning - Indexing in depth

Apache Kafka - The basics

Spring: How to deal with circular dependencies