SQL Query and performance tuning - Partitioning in Depth

 


๐Ÿ“Š Partitioning in Databases: Horizontal vs. Vertical (With Simple Examples)

Working with large tables in a database can quickly become inefficient. Queries slow down, indexes balloon in size, and deleting old data becomes a headache.

๐Ÿ‘‰ Enter partitioning.
Partitioning is a technique where we split a big table into smaller pieces to improve performance and manageability.

Let’s break down the two main types of partitioning —> Horizontal and Vertical, and explore Range, List, and Hash partitioning methods.


๐Ÿงฑ 1. Horizontal vs. Vertical Partitioning


๐Ÿ”„ Horizontal Partitioning

What is it?
You're splitting rows into separate tables (aka partitions). Each partition holds a subset of the rows.

Think of it like:
Breaking one giant table into several smaller tables based on some criteria like date, region, or category.

๐Ÿ“Œ Use Case:
When querying time-based logs, you usually care about recent data. Why search the whole table if you only need last week?

✅ Benefits:

  • Smaller indexes per partition = faster queries

  • Can drop old data just by deleting a partition

  • Easier to scan recent data (less I/O)


๐Ÿ“ Vertical Partitioning

What is it?
You're splitting columns into separate tables. Each table keeps the same primary key, but only contains certain columns.

Think of it like:
Separating “frequently used” columns from “rarely used” ones.

๐Ÿ“Œ Use Case:
Product tables with 100+ attributes, but most queries only use 4 or 5 columns.

✅ Benefits:

  • More rows fit per block = faster reads

  • Reduced I/O for queries needing fewer columns

  • Works well with analytics and columnar storage


๐Ÿ” 2. Types of Horizontal Partitioning

Now let’s dig deeper into horizontal partitioning methods. There are three major types:


๐Ÿ“† A. Range Partitioning

How it works:
You split data based on ranges of values — usually dates or numbers.

Example:
A measurements table with a date column:

date   temperature     humidity
2025-07-01   22°C     45%
2025-07-08   24°C     40%

๐Ÿ“‚ Partitioned as:

  • measurements_2025_wk1: July 1–7

  • measurements_2025_wk2: July 8–14

PARTITION BY RANGE (measurement_date) ( PARTITION wk1 VALUES LESS THAN ('2025-07-08'), PARTITION wk2 VALUES LESS THAN ('2025-07-15') );

✅ Great for:

  • Time-series data (logs, sensors)

  • Periodic reports (monthly, weekly)


๐Ÿ“‹ B. List Partitioning

How it works:
You split data based on a defined list of values.

Example:
A products table partitioned by category.

id             name  category
1             Shirt     Clothing
2             Laptop     Electronics

๐Ÿ“‚ Partitioned as:
  • products_clothing

  • products_electronics


PARTITION BY LIST (category) ( PARTITION clothing VALUES IN ('Clothing'), PARTITION electronics VALUES IN ('Electronics') );

✅ Great for:

  • Product catalogs

  • Region- or department-specific data

  • Queries that focus on a specific group


๐Ÿ”ข C. Hash Partitioning

How it works:
Data is assigned to partitions using a hash function on a key, like customer_id.

Example:
A web_logs table with millions of sessions.

session_idpage_urltimestamp
101   /home         2025-07-08 08:10AM
202  /products/123         2025-07-08 08:15AM



๐Ÿ“‚ Hash partitions:
PARTITION BY HASH (session_id) PARTITIONS 5;

Each session_id % 5 determines the partition (0–4).

✅ Great for:

  • Evenly distributing data

  • Large, ungrouped datasets

  • Load balancing across partitions


๐ŸŽฏ Which Should You Use?

Use CaseSuggested Partitioning
Time-based data (logs, IoT)⏳ Range
Grouped categories๐Ÿ—‚️ List
Random or evenly spread IDs๐Ÿ”ข Hash
Huge table, many columns๐Ÿ“ Vertical
Need to purge old data fast๐Ÿงน Horizontal



๐Ÿง  Final Thoughts

Partitioning isn't just for giant enterprises. Even mid-sized datasets can benefit from smarter table structures.

Start small:

  • Choose a partition key that aligns with your query patterns

  • Monitor performance before and after

  • Keep partitions manageable (avoid too many!)

Comments

Popular posts from this blog

Apache Kafka - The basics

Spring: How to deal with circular dependencies

Kafka: How to handle duplicate messages