Applying Multiple Window Functions On Same Partition

Question

Is it possible to apply multiple window functions to the same partition? (Correct me if I'm not using the right vocabulary)

For example you can do

SELECT name, first_value() over (partition by name order by date) from table1

But is there a way to do something like:

SELECT name, (first_value() as f, last_value() as l (partition by name order by date)) from table1

Where we are applying two functions onto the same window?

Reference: http://postgresql.ro/docs/8.4/static/tutorial-window.html

Ashish · Accepted Answer · 2015-04-30 15:25:17Z

37

Can you not just use the window per selection

Something like

SELECT  name, 
        first_value() OVER (partition by name order by date) as f, 
        last_value() OVER (partition by name order by date) as l 
from table1

Also from your reference you can do it like this

SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC)

edited Apr 30, 2015 at 15:25

Ashish

1412 silver badges9 bronze badges

answered Dec 13, 2009 at 10:29

Adriaan Stander

167k32 gold badges294 silver badges288 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Skippy le Grand Gourou Over a year ago

Is it still the most efficient query when the number of PARTITION BY increases ?

Michal Špondr Over a year ago

You need to use parameter in first_value() and last_value(). I guess it should be date.

v1bri Over a year ago

@SkippyleGrandGourou according to the Postgres documentation, using the exact same PARTITION BY and ORDER BY clauses will guarantee that all window functions will use the same single pass over the data. postgresql.org/docs/9.1/…

gilly3 Over a year ago

According to the documentation, the WINDOW clause "saves typing". 😆

Skippy le Grand Gourou · Accepted Answer · 2017-08-12 18:39:00Z

20

Warning : I don't delete this answer since it seems technically correct and therefore may be helpful, but beware that PARTITION BY bar ORDER BY foo is probably not what you want to do anyway. Indeed, aggregate functions won't compute the partition elements as a whole. That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar) (see proof at the end of the answer).

Though it doesn't improve performance per se, if you use multiple times the same partition, you probably want to use the second syntax proposed by astander, and not only because it's cheaper to write. Here is why.

Consider the following query :

SELECT 
  array_agg(foo)
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar ORDER BY foo) 
FROM 
  foobar;

Since in principle the ordering has no effect on the computation of the average, you might be tempted to use the following query instead (no ordering on the second partition) :

SELECT 
  array_agg(foo) 
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar) 
FROM 
  foobar;

This is a big mistake, as it will take much longer. Proof :

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar ORDER BY foo) FROM foobar;
                                                           QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..254591.76 rows=1724882 width=12) (actual time=969.659..2353.865 rows=1724882 loops=1)
   ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=969.640..1083.039 rows=1724882 loops=1)
         Sort Key: bar, foo
         Sort Method: quicksort  Memory: 130006kB
         ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.027..393.815 rows=1724882 loops=1)
 Total runtime: 2458.969 ms
(6 lignes)

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar) FROM foobar;
                                                              QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..276152.79 rows=1724882 width=12) (actual time=938.733..2958.811 rows=1724882 loops=1)
   ->  WindowAgg  (cost=215781.92..250279.56 rows=1724882 width=12) (actual time=938.699..2033.172 rows=1724882 loops=1)
         ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=938.683..1062.568 rows=1724882 loops=1)
               Sort Key: bar, foo
               Sort Method: quicksort  Memory: 130006kB
               ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.028..377.299 rows=1724882 loops=1)
 Total runtime: 3060.041 ms
(7 lignes)

Now, if you are aware of this issue, of course you will use the same partition everywhere. But when you have ten times or more the same partition and you are updating it over days, it is quite easy to forget to add the ORDER BY clause on a partition which doesn't need it by itself.

Here comes the WINDOW syntax, which will prevent you from such careless mistakes (provided, of course, you're aware it's better to minimize the number of different window functions). The following is strictly equivalent (as far as I can tell from EXPLAIN ANALYZE) to the first query :

SELECT
  array_agg(foo)
    OVER qux,
  avg(baz)
    OVER qux
FROM
  foobar
WINDOW
  qux AS (PARTITION BY bar ORDER BY bar)

Post-warning update :

I understand the statement that "SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)" seems questionable, so here is an example :

# SELECT * FROM foobar;
 foo | bar 
-----+-----
   1 |   1
   2 |   2
   3 |   1
   4 |   2
(4 lines)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar);
 array_agg | avg 
-----------+-----
 {1,3}     |   2
 {1,3}     |   2
 {2,4}     |   3
 {2,4}     |   3
 (4 lines)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar ORDER BY foo);
 array_agg | avg 
-----------+-----
 {1}       |   1
 {1,3}     |   2
 {2}       |   2
 {2,4}     |   3
(4 lines)

edited Aug 12, 2017 at 18:39

answered Oct 1, 2014 at 12:13

Skippy le Grand Gourou

7,8626 gold badges66 silver badges82 bronze badges

7 Comments

Nick Over a year ago

In the warning, it's said, "That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)." Why is it not?

Skippy le Grand Gourou Over a year ago

@Cromax Just run the WINDOW command of my answer on this minimal example : create table foobar(foo float, bar int); insert into foobar values (1,1); insert into foobar values (3,1);, with and without the ORDER BY.

Skippy le Grand Gourou Over a year ago

@Nick I'm not into SQL anymore enough to reliably answer the why, but try the example in my previous comment (maybe add some lines to make it more obvious), the array_agg() output will give some hint.

Cromax Over a year ago

@Skippy Damn, you've changed my understanding of PARTITION, thanks! (For those interested: with ORDER BY it returns rows of {1; 2}, and without it returns {2; 2}.

Nick Over a year ago

THANK YOU. I understand now. Here's another pretty good example: postgresql.org/docs/9.1/static/tutorial-window.html. Then, the ORDER BY sorts the rows within the window, and processes them in that order including only the current and previously seen rows, ignoring rows that are after current or not in the window.

|

Collectives™ on Stack Overflow

Applying Multiple Window Functions On Same Partition

2 Answers 2

4 Comments

Post-warning update :

7 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

4 Comments

Post-warning update :

7 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related