How queries work when table has been partitioned in PostgreSQL
If you partition a table, queries that involve data from multiple partitions work efficiently through a process called partition pruning and smart execution by the database.
Here's how PostgreSQL handles queries that span multiple partitions:
1. Partition Pruning (The Key to Speed) ✂️
The main reason partitioning works is due to partition pruning. This is the database's ability to skip searching through partitions that couldn't possibly contain the data needed for a specific query.
How it Works:
- The Partition Key: When you set up partitioning, you choose a partition key (like a
dateor acustomer_id). The table is broken into pieces (partitions) based on ranges of that key (e.g., one partition for January, one for February). - The Query Filter: When you run a query like:
SELECT * FROM orders WHERE order_date >= '2025-03-01' AND order_date < '2025-04-01'; - The Pruning: PostgreSQL looks at the
WHEREclause (order_dateis in March 2025) and compares it to the partition definitions. It immediately realizes that the partitions for January and February are irrelevant. It prunes those partitions from the search list. - The Search: The database then only searches the relevant partition (
orders_2025_march). Even though the table logically has billions of rows, the database physically only touches a small fraction of them.
If the query includes the partition key, the database only needs to search the relevant pieces.
2. Queries Spanning Multiple Partitions
Sometimes, you need to search across several partitions. This is where the database uses parallelism and a unified view to execute the query quickly.
Unified View
The original "parent" table acts as a unified view over all the smaller partitions. You don't have to write code to check each partition manually.
- Example Query: If you query for data across two months:
SELECT SUM(amount) FROM sales WHERE sales_date >= '2025-01-01' AND sales_date < '2025-03-01'; - Execution:
- PostgreSQL performs pruning, identifying only the
sales_2025_januaryandsales_2025_februarypartitions. - It then treats these two small tables as one temporary table for the query.
- It runs the
SUM(amount)query across the two tables and combines the final results.
- PostgreSQL performs pruning, identifying only the
Aggregate and Join Operations
Queries that involve grouping or joining data work seamlessly, but they might be slower than queries that are pruned to a single partition:
- Aggregates (
GROUP BY): PostgreSQL can run the aggregate function on each relevant partition simultaneously and then combine the results. - Joins: If you join the partitioned table to another table, PostgreSQL will typically join the external table to each required partition individually, which, thanks to the pruning, is still faster than joining against one giant table.
No Partition Key Filter
If you run a query that doesn't include the partition key (e.g., SELECT * FROM orders WHERE customer_name = 'Smith'), the database cannot perform pruning.
In this scenario, the database must scan all partitions. While this is unavoidable, it will still be slower than a pruned query and might approach the performance of querying the original, unpartitioned giant table. This is why choosing an effective partition key is critical.