window function in Postgress
Window functions in PostgreSQL perform a calculation across a set of table rows that are related to the current row, but without collapsing the rows into a single output row (unlike a standard aggregate function with GROUP BY).
This allows you to return individual rows while also displaying a calculated aggregate, ranking, or statistical value that is derived from a defined "window" of related rows.
Core Syntax
A window function is syntactically distinguished by the OVER clause, which defines the "window" or set of rows the function operates on.
$$\text{<function_name>} \text{(\text{
Key Components
| Component | Purpose | Analogy to GROUP BY |
|---|---|---|
function_name() | The specific function (e.g., SUM(), AVG(), ROW_NUMBER(), RANK()). | Standard aggregate function |
OVER | The mandatory clause that marks the function as a window function. | N/A |
PARTITION BY | Divides the rows into smaller groups (partitions). The function is computed separately within each partition. | The GROUP BY clause |
ORDER BY | Specifies the order of rows within each partition, which is crucial for functions like RANK() or for calculating running totals. | The ORDER BY clause in the main query (but applied locally) |
<frame_clause> | Further refines the set of rows within a partition (the "window frame") used for the calculation, often specified with ROWS BETWEEN.... | N/A (creates a "sliding window") |
Use Cases and Common Functions
Window functions are essential for complex analytical queries that would otherwise require cumbersome subqueries or self-joins.
1. Ranking 🏆
These functions assign a numerical rank or sequence number to each row within its partition.
ROW_NUMBER(): Assigns a unique, sequential integer to each row (1, 2, 3, 4...).RANK(): Assigns a rank. Tied rows get the same rank, and the next rank skips numbers (1, 2, 2, 5...).DENSE_RANK(): Assigns a rank. Tied rows get the same rank, and no rank numbers are skipped (1, 2, 2, 3...).
Example: Find the top 3 selling products in each category.
SELECT
product_name,
category,
sales,
RANK() OVER (PARTITION BY category ORDER BY sales DESC) AS rank_in_category
FROM
sales_data
WHERE
(RANK() OVER (PARTITION BY category ORDER BY sales DESC)) <= 3;
2. Analytical and Aggregate Functions 📊
Standard aggregate functions can be used as window functions by adding the OVER clause.
SUM()/AVG()/MAX()/MIN(): Calculate aggregates over the window.- Running Totals: Achieved by using
SUM()withORDER BYand a default or specific frame clause. - Moving Averages: Calculated using
AVG()with aROWS BETWEENframe clause to define a fixed "sliding window."
Example: Calculate a running total of sales over time.
SELECT
order_date,
sale_amount,
SUM(sale_amount) OVER (ORDER BY order_date) AS running_total
FROM
orders;
3. Lag/Lead Functions 🕰️
These functions access the data of a preceding or succeeding row relative to the current row.
LAG(): Returns a column's value from a row before the current row.LEAD(): Returns a column's value from a row after the current row.
Example: Compare today's stock price to yesterday's price.
SELECT
stock_date,
price,
LAG(price, 1) OVER (ORDER BY stock_date) AS previous_day_price
FROM
stock_prices;