32

Is it possible to apply multiple window functions to the same partition? (Correct me if I'm not using the right vocabulary)

For example you can do

SELECT name, first_value() over (partition by name order by date) from table1

But is there a way to do something like:

SELECT name, (first_value() as f, last_value() as l (partition by name order by date)) from table1

Where we are applying two functions onto the same window?

Reference: http://postgresql.ro/docs/8.4/static/tutorial-window.html

2 Answers

37

Can you not just repeat the window definition for each selected expression?

Something like:

SELECT  name, 
        first_value() OVER (partition by name order by date) as f, 
        last_value() OVER (partition by name order by date) as l 
from table1

Also, as your reference shows, you can name the window once and reuse it:

SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC)
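
As a minimal sketch, here is the named-window form applied to the question's own columns. The table contents are made up for illustration, and passing date to first_value() and last_value() is an assumption about what the asker wants. The explicit frame is there because, with an ORDER BY, the default frame stops at the current row (or its last peer), so last_value() would otherwise not reach the end of the partition.

```sql
-- Hypothetical sample data; only the table and column names come from the question.
CREATE TEMP TABLE table1 (name text, date date);
INSERT INTO table1 VALUES
  ('a', '2020-01-01'),
  ('a', '2020-01-03'),
  ('b', '2020-02-01'),
  ('b', '2020-02-05');

-- Both functions share the single named window w.
SELECT name,
       first_value(date) OVER w AS f,
       last_value(date)  OVER w AS l
FROM table1
WINDOW w AS (PARTITION BY name ORDER BY date
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
```

With this frame, every row of a given name should carry that name's earliest date in f and its latest date in l.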

4 Comments

Is it still the most efficient query when the number of PARTITION BY clauses increases?
You need to pass an argument to first_value() and last_value(). I guess it should be date.
@SkippyleGrandGourou according to the Postgres documentation, using the exact same PARTITION BY and ORDER BY clauses will guarantee that all window functions will use the same single pass over the data. postgresql.org/docs/9.1/…
According to the documentation, the WINDOW clause "saves typing". 😆
20

Warning: I'm not deleting this answer since it seems technically correct and may therefore be helpful, but beware that PARTITION BY bar ORDER BY foo is probably not what you want anyway. With an ORDER BY in the window, aggregate functions are not computed over the partition as a whole: by default the frame only runs from the start of the partition up to the current row (and its peers). That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar) (see the proof at the end of the answer).

Though it doesn't improve performance per se, if you use the same partition multiple times, you probably want to use the second syntax proposed by astander, and not only because it's shorter to write. Here is why.

Consider the following query:

SELECT 
  array_agg(foo)
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar ORDER BY foo) 
FROM 
  foobar;

Since the ordering looks irrelevant to the computation of the average, you might be tempted to use the following query instead (no ORDER BY in the second window):

SELECT 
  array_agg(foo) 
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar) 
FROM 
  foobar;

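This is a mistake, as it will take noticeably longer (about 3.06 s instead of 2.46 s in the runs below): the two different window definitions cost an extra WindowAgg node. Proof: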

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar ORDER BY foo) FROM foobar;
                                                           QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..254591.76 rows=1724882 width=12) (actual time=969.659..2353.865 rows=1724882 loops=1)
   ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=969.640..1083.039 rows=1724882 loops=1)
         Sort Key: bar, foo
         Sort Method: quicksort  Memory: 130006kB
         ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.027..393.815 rows=1724882 loops=1)
 Total runtime: 2458.969 ms
(6 rows)

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar) FROM foobar;
                                                              QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..276152.79 rows=1724882 width=12) (actual time=938.733..2958.811 rows=1724882 loops=1)
   ->  WindowAgg  (cost=215781.92..250279.56 rows=1724882 width=12) (actual time=938.699..2033.172 rows=1724882 loops=1)
         ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=938.683..1062.568 rows=1724882 loops=1)
               Sort Key: bar, foo
               Sort Method: quicksort  Memory: 130006kB
               ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.028..377.299 rows=1724882 loops=1)
 Total runtime: 3060.041 ms
(7 rows)

Now, if you are aware of this issue, you will of course use the same window everywhere. But when the same window appears ten or more times in a query that you keep editing over several days, it is quite easy to forget the ORDER BY clause on a window that doesn't need it by itself.

Here comes the WINDOW syntax, which protects you from such careless mistakes (provided, of course, you're aware that it's better to minimize the number of different window definitions). The following is strictly equivalent (as far as I can tell from EXPLAIN ANALYZE) to the first query:

SELECT
  array_agg(foo)
    OVER qux,
  avg(baz)
    OVER qux
FROM
  foobar
WINDOW
  qux AS (PARTITION BY bar ORDER BY foo)

Post-warning update:

I understand that the statement "SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)" seems questionable, so here is an example:

# SELECT * FROM foobar;
 foo | bar 
-----+-----
   1 |   1
   2 |   2
   3 |   1
   4 |   2
(4 rows)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar);
 array_agg | avg 
-----------+-----
 {1,3}     |   2
 {1,3}     |   2
 {2,4}     |   3
 {2,4}     |   3
(4 rows)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar ORDER BY foo);
 array_agg | avg 
-----------+-----
 {1}       |   1
 {1,3}     |   2
 {2}       |   2
 {2,4}     |   3
(4 rows)
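
If the ordering is needed (for array_agg, say) but the aggregates should still cover the whole partition, one option is to state the frame explicitly. A minimal sketch, assuming the four-row foobar table from the example above, recreated here as a temporary table so the snippet is self-contained (foo is assumed to be a float):

```sql
-- Recreate the sample data from the example (temporary table, illustrative only).
CREATE TEMP TABLE foobar (foo float, bar int);
INSERT INTO foobar VALUES (1, 1), (2, 2), (3, 1), (4, 2);

-- The explicit frame makes every row see its whole partition,
-- even though the window has an ORDER BY.
SELECT
  array_agg(foo) OVER qux,
  avg(foo)       OVER qux
FROM
  foobar
WINDOW
  qux AS (PARTITION BY bar ORDER BY foo
          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY
  bar, foo;
```

With this frame, both rows of bar = 1 should show {1,3} and an average of 2, and both rows of bar = 2 should show {2,4} and an average of 3. That matches the PARTITION BY-only result while keeping a single window definition.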

7 Comments

In the warning, it's said, "That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)." Why is it not?
@Cromax Just run the WINDOW command of my answer on this minimal example : create table foobar(foo float, bar int); insert into foobar values (1,1); insert into foobar values (3,1);, with and without the ORDER BY.
@Nick I'm not into SQL anymore enough to reliably answer the why, but try the example in my previous comment (maybe add some lines to make it more obvious), the array_agg() output will give some hint.
@Skippy Damn, you've changed my understanding of PARTITION, thanks! (For those interested: with ORDER BY it returns rows of {1; 2}, and without it returns {2; 2}.)
THANK YOU. I understand now. Here's another pretty good example: postgresql.org/docs/9.1/static/tutorial-window.html. Then, the ORDER BY sorts the rows within the window, and processes them in that order including only the current and previously seen rows, ignoring rows that are after current or not in the window.