Skip to main content
Version: 2.x

Queries with Window Functions

Window expressions allow application developers to more easily compose complex online analytical processing (OLAP) queries using standard SQL commands. For example, with window expressions, users can calculate moving averages or sums over various intervals, reset aggregations and ranks as selected column values change, and express complex ratios in simple terms.

A window expression represents the application of a <window function> to a <window frame>, which is defined with an OVER() clause. This is comparable to the type of calculation that can be done with an aggregate function and a GROUP BY clause. Unlike aggregate functions, which return a single result value for each group of rows, window functions return a result value for every row, but that value is calculated with respect to the set of rows in the window frame to which the row belongs. The OVER() clause allows dividing the rows into partitions and then further restricting the window frame by specifying which rows preceding or following the current row within its partition to include in the calculation.

Apache Cloudberry does not support specifying a window function as an argument to another window function.

The syntax of a window expression is:

<window_function> ( [<expression> [, ...]] ) [ FILTER ( WHERE <filter_clause> ) ] OVER ( <window_specification> )

Where <window_function> might be a user-defined window function, and <expression> is any value expression that does not contain a window expression, and <window_specification> is:

[<window_name>]
[PARTITION BY <expression> [, ...]]
[[ORDER BY <expression> [ASC | DESC | USING operator] [NULLS {FIRST | LAST}] [, ...]
[ <frame_clause> ]

The optional <frame_clause> can be one of the following:

{ RANGE | ROWS | GROUPS } <frame_start> [ <frame_exclusion> ]
{ RANGE | ROWS | GROUPS } BETWEEN <frame_start> AND <frame_end> [ <frame_exclusion> ]

Where <frame_start> and <frame_end> can be one of the following:

UNBOUNDED PRECEDING
<offset> PRECEDING
CURRENT ROW
<offset> FOLLOWING
UNBOUNDED FOLLOWING

and <frame_exclusion> can be one of the following:

EXCLUDE CURRENT ROW
EXCLUDE GROUP
EXCLUDE TIES
EXCLUDE NO OTHERS

A window expression can appear only in the select list of a SELECT command. For example:

SELECT count(*) OVER(PARTITION BY customer_id), * FROM sales;

If FILTER is specified, then only the input rows for which the <filter_clause> evaluates to true are fed to the window function; other rows are discarded. In a window expression, a FILTER clause can be used only with a <window_function> that is an aggregate function.

In a window expression, the expression must contain an OVER clause. The OVER clause specifies the window frame—the rows to be processed by the window function. This syntactically distinguishes the function from a regular or aggregate function.

In a window aggregate function that is used in a window expression, Apache Cloudberry does not support a DISTINCT clause with multiple input expressions.

A window specification has the following characteristics:

  • The PARTITION BY clause defines the window partitions to which the window function is applied. If omitted, the entire result set is treated as one partition.
  • The ORDER BY clause defines the expression(s) for sorting rows within a window partition. The ORDER BY clause of a window specification is separate and distinct from the ORDER BY clause of a regular query expression. The ORDER BY clause is required for the window functions that calculate rankings, as it identifies the measure(s) for the ranking values. For OLAP aggregations, the ORDER BY clause is required to use window frames (the ROWS, RANGE or GROUPS clause).
note

Columns of data types without a coherent ordering, such as time, are not good candidates for use in the ORDER BY clause of a window specification. Time, with or without a specified time zone, lacks a coherent ordering because addition and subtraction do not have the expected effects. For example, the following is not generally true: x::time < x::time + '2 hour'::interval

  • The <frame_clause> specifies the set of rows constituting the <window frame>, which is a subset of the current partition, for those window functions that act on the frame instead of the whole partition. The set of rows in the frame can vary depending on which row is the current row. The frame can be specified in RANGE, ROWS or GROUPS mode; in each case, it runs from the <frame_start> to the <frame_end>. If <frame_end> is omitted, the end defaults to CURRENT ROW.

  • A <frame_start> of UNBOUNDED PRECEDING means that the frame starts with the first row of the partition, and similarly a <frame_end> of UNBOUNDED FOLLOWING means that the frame ends with the last row of the partition.

  • In RANGE or GROUPS mode, a <frame_start> of CURRENT ROW means the frame starts with the current row's first peer row (a row that the window's ORDER BY clause sorts as equivalent to the current row), while a <frame_end> of CURRENT ROW means the frame ends with the current row's last peer row. In ROWS mode, CURRENT ROW simply means the current row.

  • In the <offset> PRECEDING and <offset> FOLLOWING frame options, the <offset> must be an expression not containing any variables, aggregate functions, or window functions. The meaning of the <offset> depends on the frame mode:

    • In ROWS mode, the <offset> must yield a non-null, non-negative integer, and the option means that the frame starts or ends the specified number of rows before or after the current row.

    • In GROUPS mode, the <offset> again must yield a non-null, non-negative integer, and the option means that the frame starts or ends the specified number of peer groups before or after the current row's peer group, where a peer group is a set of rows that are equivalent in the ORDER BY ordering. (There must be an ORDER BY clause in the window definition to use GROUPS mode).

    • In RANGE mode, these options require that the ORDER BY clause specifies exactly one column. The <offset> specifies the maximum difference between the value of that column in the current row and its value in preceding or following rows of the frame. The data type of the <offset> expression varies depending on the data type of the ordering column. For numeric ordering columns it is typically of the same type as the ordering column, but for datetime ordering columns it is an interval. For example, if the ordering column is of type date or timestamp, one could write RANGE BETWEEN '1 day' PRECEDING AND '10 days' FOLLOWING. The <offset> is still required to be non-null and non-negative, though the meaning of “non-negative” depends on its data type.

    In any case, the distance to the end of the frame is limited by the distance to the end of the partition, so that for rows near the partition ends the frame might contain fewer rows than elsewhere.

  • Notice that in both ROWS and GROUPS mode, 0 PRECEDING and 0 FOLLOWING are equivalent to CURRENT ROW. This normally holds in RANGE mode as well, for an appropriate data-type-specific meaning of “zero”.

  • The <frame_exclusion> option allows rows around the current row to be excluded from the frame, even if they would be included according to the frame start and frame end options. EXCLUDE CURRENT ROW excludes the current row from the frame. EXCLUDE GROUP excludes the current row and its ordering peers from the frame. EXCLUDE TIES excludes any peers of the current row from the frame, but not the current row itself. EXCLUDE NO OTHERS simply specifies explicitly the default behavior of not excluding the current row or its peers.

  • The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer. Without ORDER BY, this means all rows of the partition are included in the window frame, because all rows become peers of the current row.

  • Restrictions are that <frame_start> cannot be UNBOUNDED FOLLOWING, <frame_end> cannot be UNBOUNDED PRECEDING, and the <frame_end> choice cannot appear earlier in the above list than the <frame_start> choice. for example RANGE BETWEEN CURRENT ROW AND value PRECEDING is not allowed.

Window examples

The following examples demonstrate using window functions with partitions and window frames.

Example 1 – Aggregate window function over a partition

The PARTITION BY list in the OVER clause divides the rows into groups, or partitions, that have the same values as the specified expressions.

This example compares employees' salaries with the average salaries for their departments:

SELECT depname, empno, salary, avg(salary) OVER(PARTITION BY depname)
FROM empsalary;
depname | empno | salary | avg
-----------+-------+--------+-----------------------
develop | 9 | 4500 | 5020.0000000000000000
develop | 10 | 5200 | 5020.0000000000000000
develop | 11 | 5200 | 5020.0000000000000000
develop | 7 | 4200 | 5020.0000000000000000
develop | 8 | 6000 | 5020.0000000000000000
personnel | 5 | 3500 | 3700.0000000000000000
personnel | 2 | 3900 | 3700.0000000000000000
sales | 1 | 5000 | 4866.6666666666666667
sales | 3 | 4800 | 4866.6666666666666667
sales | 4 | 4800 | 4866.6666666666666667
(10 rows)

The first three output columns come from the table empsalary, and there is one output row for each row in the table. The fourth column is the average calculated on all rows that have the same depname value as the current row. Rows that share the same depname value constitute a partition, and there are three partitions in this example. The avg function is the same as the regular avg aggregate function, but the OVER clause causes it to be applied as a window function.

You can also put the window specification in a WINDOW clause and reference it in the select list. This example is equivalent to the previous query:

SELECT depname, empno, salary, avg(salary) OVER(mywindow)
FROM empsalary
WINDOW mywindow AS (PARTITION BY depname);

Defining a named window is useful when the select list has multiple window functions using the same window specification.

Example 2 – Ranking window function with an ORDER BY clause

An ORDER BY clause within the OVER clause controls the order in which rows are processed by window functions. The ORDER BY list for the window function does not have to match the output order of the query. This example uses the rank() window function to rank employees' salaries within their departments:

SELECT depname, empno, salary,
rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;
depname | empno | salary | rank
-----------+-------+--------+------
develop | 8 | 6000 | 1
develop | 11 | 5200 | 2
develop | 10 | 5200 | 2
develop | 9 | 4500 | 4
develop | 7 | 4200 | 5
personnel | 2 | 3900 | 1
personnel | 5 | 3500 | 2
sales | 1 | 5000 | 1
sales | 4 | 4800 | 2
sales | 3 | 4800 | 2
(10 rows)

Example 3 – Aggregate function over a row window frame

A RANGE, ROWS or GROUPS clause defines the window frame—a set of rows within a partition—that the window function includes in the calculation. ROWS specifies a physical set of rows to process, for example all rows from the beginning of the partition to the current row.

This example calculates a running total of employee's salaries by department using the sum() function to total rows from the start of the partition to the current row:

SELECT depname, empno, salary,
sum(salary) OVER (PARTITION BY depname ORDER BY salary
ROWS between UNBOUNDED PRECEDING AND CURRENT ROW)
FROM empsalary ORDER BY depname, sum;
depname | empno | salary | sum
-----------+-------+--------+-------
develop | 7 | 4200 | 4200
develop | 9 | 4500 | 8700
develop | 11 | 5200 | 13900
develop | 10 | 5200 | 19100
develop | 8 | 6000 | 25100
personnel | 5 | 3500 | 3500
personnel | 2 | 3900 | 7400
sales | 4 | 4800 | 4800
sales | 3 | 4800 | 9600
sales | 1 | 5000 | 14600
(10 rows)

Example 4 – Aggregate function for a range or groups window frame

RANGE and GROUPS modes specify logical values based on values of the ORDER BY expression in the OVER clause. This example demonstrates the difference between ROWS and RANGE or GROUPS. The frame contains all rows with salary values less than or equal to the current row. Unlike the previous example, for employees with the same salary, the sum is the same and includes the salaries of all of those employees.

SELECT depname, empno, salary,
sum(salary) OVER (PARTITION BY depname ORDER BY salary
RANGE between UNBOUNDED PRECEDING AND CURRENT ROW)
FROM empsalary ORDER BY depname, sum;
depname | empno | salary | sum
-----------+-------+--------+-------
develop | 7 | 4200 | 4200
develop | 9 | 4500 | 8700
develop | 11 | 5200 | 19100
develop | 10 | 5200 | 19100
develop | 8 | 6000 | 25100
personnel | 5 | 3500 | 3500
personnel | 2 | 3900 | 7400
sales | 4 | 4800 | 9600
sales | 3 | 4800 | 9600
sales | 1 | 5000 | 14600
(10 rows)

In this example, because the <frame_start> and <frame_end> are not using an <offset>, RANGE and GROUPS mode provide identical results:

SELECT depname, empno, salary,
sum(salary) OVER (PARTITION BY depname ORDER BY salary
GROUPS between UNBOUNDED PRECEDING AND CURRENT ROW)
FROM empsalary ORDER BY depname, sum;
depname | empno | salary | sum
-----------+-------+--------+-------
develop | 7 | 4200 | 4200
develop | 9 | 4500 | 8700
develop | 11 | 5200 | 19100
develop | 10 | 5200 | 19100
develop | 8 | 6000 | 25100
personnel | 5 | 3500 | 3500
personnel | 2 | 3900 | 7400
sales | 4 | 4800 | 9600
sales | 3 | 4800 | 9600
sales | 1 | 5000 | 14600
(10 rows)

Example 5 – Aggregate function for a range and a groups window frame

This example demonstrates the difference between RANGE and GROUPS. The table sales lists the sales a company's two shops during a period of four days.

When using GROUPS mode, the value of <offset> indicates the number of peer groups before the current row's peer group. The different peer groups are set by the ORDER BY clause, in this case the date. because the query does not specify a value for <frame_end>, the end defaults to CURRENT ROW.

SELECT date, shop, total, sum(total) OVER (PARTITION BY shop ORDER BY date asc GROUPS 2 PRECEDING) 
FROM sales ORDER BY shop, date;
date | shop | total | sum
------------+--------+---------+----------
2022-01-07 | Shop 1 | 3000.00 | 3000.00
2022-01-08 | Shop 1 | 1000.00 | 4000.00
2022-01-09 | Shop 1 | 5000.00 | 11000.00
2022-01-09 | Shop 1 | 2000.00 | 11000.00
2022-01-07 | Shop 2 | 4000.00 | 10000.00
2022-01-07 | Shop 2 | 6000.00 | 10000.00
2022-01-09 | Shop 2 | 7000.00 | 21000.00
2022-01-09 | Shop 2 | 4000.00 | 21000.00
2022-01-10 | Shop 2 | 2000.00 | 23000.00
(9 rows)

In order to get the equivalent result when using the RANGE mode, the <frame_clause> must specify an <offset> using the same data type as the ordering column, in this case date. Using the numberic value 2 will return an error.

SELECT date, shop, total, sum(total) OVER (PARTITION BY shop ORDER BY date asc RANGE '2 days' PRECEDING) 
FROM sales ORDER BY shop, date;
date | shop | total | sum
------------+--------+---------+----------
2022-01-07 | Shop 1 | 3000.00 | 3000.00
2022-01-08 | Shop 1 | 1000.00 | 4000.00
2022-01-09 | Shop 1 | 5000.00 | 11000.00
2022-01-09 | Shop 1 | 2000.00 | 11000.00
2022-01-07 | Shop 2 | 4000.00 | 10000.00
2022-01-07 | Shop 2 | 6000.00 | 10000.00
2022-01-09 | Shop 2 | 7000.00 | 21000.00
2022-01-09 | Shop 2 | 4000.00 | 21000.00
2022-01-10 | Shop 2 | 2000.00 | 13000.00
(9 rows)

Note that the above outputs differ in the last row because RANGE mode uses the entries from the previous two days and there is no entry for 2022-01-08 for Shop 2, but GROUPS mode uses the previous two peer groups, which are 2022-01-07 and 2022-01-09.