Optimization of simple SQL query with window function

Question

I have a PostgreSQL table constructed as

  a  |  b  |  c
-----+-----+-----
   3 |   2 |   1
   1 |   5 |   1
   8 |   4 |   1
   2 |   5 |   1
   4 |   4 |   2
   2 |   5 |   2
   9 |   3 |   2
   3 |   5 |   3
   2 |   5 |   3
   4 |   4 |   3
   5 |   6 |   3
   9 |   7 |   3

For each value of c, I want to compute the average of a over the rows where b is below a given threshold, e.g. the average of b for that value of c.

Here is my query:

SELECT avg(a) FROM mytable t WHERE b<(SELECT avg(b) FROM mytable WHERE c=t.c) GROUP BY c;
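To make the intended result concrete, here is a small pure-Python sketch of the same logic applied to the sample rows above. The names `ROWS` and `avg_a_below_group_avg_b` are illustrative only. The sketch mirrors the SQL clause by clause and is not meant to replace it.

```python
# Sample rows (a, b, c) copied from the table in the question.
ROWS = [
    (3, 2, 1), (1, 5, 1), (8, 4, 1), (2, 5, 1),
    (4, 4, 2), (2, 5, 2), (9, 3, 2),
    (3, 5, 3), (2, 5, 3), (4, 4, 3), (5, 6, 3), (9, 7, 3),
]

def avg_a_below_group_avg_b(rows):
    """For each c, average a over the rows whose b is below that c's average b."""
    groups = {}
    for a, b, c in rows:
        groups.setdefault(c, []).append((a, b))
    result = {}
    for c in sorted(groups):
        members = groups[c]
        threshold = sum(b for _, b in members) / len(members)  # SELECT avg(b) ... WHERE c = t.c
        kept = [a for a, b in members if b < threshold]        # WHERE b < threshold
        if kept:                                               # a c with no matching row yields no output row
            result[c] = sum(kept) / len(kept)                  # avg(a) ... GROUP BY c
    return result

print(avg_a_below_group_avg_b(ROWS))  # {1: 3.0, 2: 9.0, 3: 3.0}
```

For c = 3, for example, the threshold is 27 / 5 = 5.4, so only the rows with b = 5, 5 and 4 are kept and their a values (3, 2, 4) average to 3.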

I actually have two issues, but I believe they both belong in this single question (answering the first one will also let me update the title):

1. Is there a particular name or expression for this kind of query (I mean, operations on subselections that are reintegrated into the main query, or something like that)? I couldn't even figure out how to search for a solution online. => OK, window functions.
2. This query is very slow; how can I optimize it? I'm using PostgreSQL 9.3.5, and the b values are already sorted in numerical order.

Thanks.

Update: my edit to user17130's answer was rejected, but that answer doesn't work as-is, so here is the working code:

explain select 
   avg(a) 
   from  
   (
       select  
        avg(b) over (partition by c) as b_avg,
        a,
        b,
        c 
        from mytable
    ) as t 
    where b<b_avg 
    group by c;
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 GroupAggregate  (cost=135.34..202.46 rows=67 width=8)
   ->  Subquery Scan on t  (cost=135.34..198.39 rows=647 width=8)
         Filter: ((t.b)::numeric < t.b_avg)
         ->  WindowAgg  (cost=135.34..169.29 rows=1940 width=12)
               ->  Sort  (cost=135.34..140.19 rows=1940 width=12)
                     Sort Key: mytable.c
                     ->  Seq Scan on mytable  (cost=0.00..29.40 rows=1940 width=12)
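As a sanity check, the sketch below runs the original correlated query, the working window-function rewrite and the accepted answer's query as posted against the sample table. It assumes Python's built-in `sqlite3` module as a stand-in for PostgreSQL (window functions need SQLite 3.25 or later), so it checks results only, not PostgreSQL plans or timings. Each query also selects `c` and ends with `ORDER BY c` so the output is readable and deterministic; those additions are not in the original queries.

```python
import sqlite3

# Sample rows (a, b, c) copied from the table in the question.
ROWS = [
    (3, 2, 1), (1, 5, 1), (8, 4, 1), (2, 5, 1),
    (4, 4, 2), (2, 5, 2), (9, 3, 2),
    (3, 5, 3), (2, 5, 3), (4, 4, 3), (5, 6, 3), (9, 7, 3),
]

# Original query: avg(b) is recomputed by a correlated subquery for every row.
CORRELATED = """
    SELECT c, avg(a) FROM mytable t
    WHERE b < (SELECT avg(b) FROM mytable WHERE c = t.c)
    GROUP BY c ORDER BY c
"""

# Working rewrite from the update: avg(b) is computed once per partition.
WINDOWED = """
    SELECT c, avg(a)
    FROM (SELECT avg(b) OVER (PARTITION BY c) AS b_avg, a, b, c FROM mytable) AS t
    WHERE b < b_avg
    GROUP BY c ORDER BY c
"""

# Accepted answer as posted: avg(a) covers the whole partition (it is computed
# before the b < b_avg filter) and there is no GROUP BY c.
ANSWER_AS_POSTED = """
    SELECT c, a_avg
    FROM (SELECT avg(a) OVER (PARTITION BY c) AS a_avg,
                 avg(b) OVER (PARTITION BY c) AS b_avg, c, b
          FROM mytable) AS t
    WHERE b < b_avg
    ORDER BY c
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO mytable (a, b, c) VALUES (?, ?, ?)", ROWS)

correlated = conn.execute(CORRELATED).fetchall()
windowed = conn.execute(WINDOWED).fetchall()
answer_as_posted = conn.execute(ANSWER_AS_POSTED).fetchall()
conn.close()

print(correlated)              # [(1, 3.0), (2, 9.0), (3, 3.0)]
print(windowed == correlated)  # True
print(answer_as_posted)        # [(1, 3.5), (2, 5.0), (3, 4.6), (3, 4.6), (3, 4.6)]
```

On this data the rewrite matches the original, while the answer as posted averages a over every row of each partition and returns one row per matching row instead of one row per c.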

It's much easier to search with the right terms indeed, thanks! I'll update the question accordingly.
– Skippy le Grand Gourou, Commented Sep 17, 2014 at 16:55
Post the EXPLAIN ANALYZE output (preferably via explain.depesz.com). "Very slow" is a bit too vague.
– Jakub Kania, Commented Sep 17, 2014 at 17:02

user17130 · Accepted Answer · 2014-09-17 17:07:34Z


I think this is what you mean to do. With window functions, this needs only one table scan. As you can see from the plans below, your query has a much higher estimated cost than this one (about 66560 versus about 203). Without any selective conditions, you're going to have to scan the table at least once.

 explain select                                                                     
    a_avg
    from
    (
        select
         avg(a) over (partition by c) as a_avg  
        ,avg(b) over (partition by c) as b_avg
        ,c
        ,b
        from mytable
    ) as t
    where b < b_avg
;
                                  QUERY PLAN                                  
──────────────────────────────────────────────────────────────────────────────
 Subquery Scan on t  (cost=135.34..203.24 rows=647 width=32)
   Filter: ((t.b)::numeric < t.b_avg)
   ->  WindowAgg  (cost=135.34..174.14 rows=1940 width=12)
         ->  Sort  (cost=135.34..140.19 rows=1940 width=12)
               Sort Key: mytable.c
               ->  Seq Scan on mytable  (cost=0.00..29.40 rows=1940 width=12)
 Planning time: 0.128 ms
(7 rows)

...

crow@test=# explain SELECT avg(a) FROM mytable t WHERE b<(SELECT avg(b) FROM mytable WHERE c=t.c) GROUP BY c;
                                 QUERY PLAN                                  
─────────────────────────────────────────────────────────────────────────────
 HashAggregate  (cost=66560.08..66560.92 rows=67 width=8)
   Group Key: t.c
   ->  Seq Scan on mytable t  (cost=0.00..66556.85 rows=647 width=8)
         Filter: ((b)::numeric < (SubPlan 1))
         SubPlan 1
           ->  Aggregate  (cost=34.28..34.29 rows=1 width=4)
                 ->  Seq Scan on mytable  (cost=0.00..34.25 rows=10 width=4)
                       Filter: (c = t.c)
 Planning time: 0.191 ms
(9 rows)
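The cost gap between the two plans comes from `SubPlan 1`: in the correlated version it is executed for each outer row, and each execution is a full `Seq Scan on mytable`, so the number of rows read grows roughly with the square of the table size. The window-function plan reads the table once, sorts it, and computes each partition's average a single time. The rough Python sketch below only counts row reads for both strategies to illustrate that scaling; it is not a model of PostgreSQL's cost estimates, and the function names and synthetic data are hypothetical.

```python
# Sample rows (a, b, c) copied from the table in the question.
ROWS = [
    (3, 2, 1), (1, 5, 1), (8, 4, 1), (2, 5, 1),
    (4, 4, 2), (2, 5, 2), (9, 3, 2),
    (3, 5, 3), (2, 5, 3), (4, 4, 3), (5, 6, 3), (9, 7, 3),
]

def correlated_row_visits(rows):
    """Mimic 'SubPlan 1': rescan the whole table to get avg(b) for every outer row.
    Returns (rows read, rows kept by the WHERE clause)."""
    visits, kept = 0, 0
    for _, b, c in rows:                 # outer Seq Scan on mytable t
        visits += 1
        total, count = 0, 0
        for _, b2, c2 in rows:           # inner Seq Scan on mytable, Filter: c = t.c
            visits += 1
            if c2 == c:
                total += b2
                count += 1
        if b < total / count:            # WHERE b < (SELECT avg(b) ...)
            kept += 1
    return visits, kept

def windowed_row_visits(rows):
    """Mimic the WindowAgg plan: compute each partition's avg(b) once, then filter.
    The Sort step is not counted. Returns (rows read, rows kept)."""
    visits, kept = 0, 0
    sums, counts = {}, {}
    for _, b, c in rows:                 # pass 1: avg(b) OVER (PARTITION BY c)
        visits += 1
        sums[c] = sums.get(c, 0) + b
        counts[c] = counts.get(c, 0) + 1
    for _, b, c in rows:                 # pass 2: Filter: b < b_avg
        visits += 1
        if b < sums[c] / counts[c]:
            kept += 1
    return visits, kept

print(correlated_row_visits(ROWS))  # (156, 5)
print(windowed_row_visits(ROWS))    # (24, 5)

# Hypothetical synthetic tables: the gap grows quadratically with the row count.
for n in (100, 1000):
    synthetic = [(i, i % 7, i % 10) for i in range(n)]
    print(n, correlated_row_visits(synthetic)[0], windowed_row_visits(synthetic)[0])
```

Both strategies keep the same 5 rows; only the amount of reading differs, which is what the two `EXPLAIN` cost estimates reflect.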


Your code didn't do exactly what I expected as-is, but your answer helped dramatically, so I edited it with the corrected code.
– Skippy le Grand Gourou

The edit was rejected; see the updated question for the working code.
– Skippy le Grand Gourou
