The window function is useful when using SQL for analysis. Let me first introduce what is window functions and why we should use them. I would like to summarise the types window functions with some easy examples.
What is a window function
As Per the PostgresSQL documentation:
A window function performs a calculation across a set of table rows that are somehow related to the current row…Behind the scenes, the window function is able to access more than just the current row of the query result.
Window functions applies aggregate and ranking functions over a particular window (set of rows). OVER clause is used with window functions to define that window.
Why to use window functions
- Window functions are simple to use and read.
- It allows you to work with both aggregate and non-aggregate values all at once.
- Window functions increase the efficiency and reduce the complexity of queries.
- Decrease the performance issues. For example, you can use a window function instead of having to do a self-join or cross-join.
How to use window functions
Window functions are initiated with the OVER clause, and are configured using three concepts:
- window partition (PARTITION BY) - Groups rows into partitions
- window ordering (ORDER BY) - Orders rows within those partitions into a particular order
- window frame (ROWS) - Defines the window by use of an offset from the specified row
Types of window functions
- Aggregate Function
- Ranking Function
- Value Function
OVER([PARTITION BY partition_list] [ORDER BY order_list] ROWS frame_couse) AS new_column
- Window_Function : It is the Name of the window function. for example any aggregate or ranking function.
- Expression - Column on which window function is to be applied.
- OVER - To signify that this is a window function.
- PARTITION BY - It divides the rows into partitions.
- partition_list - It is the name of the column(s) we want to partition by
- ORDER BY - It is used so that we can order the rows within each partition. This is optional and does not have to be specified
- order_list- It is the Name of the column(s) we want to order by
- new_column- Name of new column
- table_name-Name of the table
- ROWS- It is use to limit the rows within our partition.[optional]
- frame_clause - frame_clause defines the group of rows that create the window frame. The frame_clause supports the following frames:
- RANGE UNBOUNDED PRECEDING
- RANGE BETWEEN CURRENT ROW AND CURRENT ROW
Various aggregate functions such as AVERAGE(), SUM(), COUNT(), MAX(), MIN() applied over a particular window (set of rows) are called aggregate window functions.
Think about it with a table that has Customer_Id, Product_name, City, Product_sale. Let's call this table the Product_info table.
For example, You want to find the average sale of each Product and order the product by City.
AVG(Product_sale) OVER( PARTITION BY Product_name ORDER BY City desc ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Avg_Sale
The result obtained is
As we can see in above example, the Average Sale of each Product is calculated and displayed in column Avg_Sale. Also ordered by City.
Ranking functions are, RANK(), DENSE_RANK(), ROW_NUMBER()
Suppose you want to assign a number to each product in order of the Product_Sale. Use the ROW_NUMBER() function.
ROW_NUMBER() OVER( PARTITION BY Product_name ORDER BY Product_sale desc ) AS row_num
The result obtained is As we can see in above example, It assigns consecutive integers to all the rows within partition.
For example, If you want to assign a Ranking value for each product according to the product_sale.
RANK() OVER( PARTITION BY City ORDER BY Product_sale desc ) AS Rank
The result obtained is As we can see in above example, we have rank the Product according to Product_sale within each City. It assigns rank to all the rows within every partition. Rank is assigned such that rank 1 given to the first row and rows having same value are assigned same rank.
For example, If you want to assign Dense Rank for each product according to the product_sale.
DENSE_RANK() OVER( PARTITION BY Product_Name ORDER BY Product_sale desc ) AS Dense_Rank
The result obtained is As we can see in above example, It assigns Dense Rank for each product according to the Product_sale. It assigns rank to each row within partition. The difference between RANK() and DENSE_RANK() is that in DENSE_RANK(), for the next rank after two same rank, consecutive integer is used, no rank is skipped.
Value functions are great for extracting values from other rows that might be useful for a report. These functions really help in reporting and summarizing data.
LAG() OR LEAD()
For example , You want create a new column that provides the previous sale and following sale for each Product order by the city.
LAG(product_sale) OVER( PARTITION BY Product_name ORDER BY City) AS Lag,
LEAD(product_sale) OVER( PARTITION BY Product_name ORDER BY City) as lead
The result obtained is LAG or LEAD functions are use to pull values from other rows. LAG can return values from the previous rows whereas LEAD returns values from following rows. Default is what to return if there is no previous one. Default is NULL. Comparing previous or following rows is useful while calculating differences across time.
The SQL window functions are often used by data scientists and data analysts. Most SQL beginners may not really be aware of modern SQL. It is helpful to know how to use window functions which is more advanced sql technique. I have summarised the window functions which I have most frequently use in my work, so hopefully they are applicable to your work as well.