Postgres query performance - hash aggregate costly operation

Question

I'm running Postgres 9.6.6 and have a relatively simple query. It is slow, and the hash aggregate appears to be the most costly operation:

    select
      program_schedule_source_count.full_name,
      program_schedule_source_count.country,
      sum(program_schedule_source_count.displays) as displays,
      program_schedule_source_count.source_id,
      program_schedule_source_count.source_region
    from program_schedule_source_count
    where program_schedule_source_count.original_title = 'How I Met Your Mother'
    and program_schedule_source_count.show_type in ('SM', 'SE')
    and program_schedule_source_count.start_date between '20200101' and '20200217'
    group by
      program_schedule_source_count.source_id,
      program_schedule_source_count.full_name,
      program_schedule_source_count.country,
      program_schedule_source_count.source_region;

Below is the query plan:

    HashAggregate  (cost=1139676.26..1139725.25 rows=3919 width=46) (actual time=18769.670..18770.066 rows=736 loops=1)
      Group Key: source_id, full_name, country, source_region
      ->  Index Scan using title_date_show_type_country on program_schedule_source_count  (cost=0.70..1139186.45 rows=39185 width=46) (actual time=0.098..18733.005 rows=42654 loops=1)
            Index Cond: ((start_date >= '20200101'::bpchar) AND (start_date <= '20200217'::bpchar) AND ((original_title)::text = 'How I Met Your Mother'::text))
            Filter: (show_type = ANY ('{SM,SE}'::bpchar[]))
    Planning time: 0.223 ms
    Execution time: 18770.252 ms

The table has indexes on all of the fields in the where clause:

    CREATE UNIQUE INDEX program_schedule_pkey ON public.program_schedule_source_count USING btree (source_id, start_date, program_id);
    CREATE INDEX title_date_show_type_country ON public.program_schedule_source_count USING btree (start_date, original_title, release_year, show_type, country);

I have tried changing the order of the columns in the GROUP BY clause, but this did not change the performance. I also tried disabling hash aggregation to see whether that would speed up the query, but it still runs in roughly the same time. I assume adding an index on the GROUP BY columns would not help, because I am not filtering on those columns.

I've read that clustering could help, but the documentation says you cluster a table by an index. Does this mean I would have to create another index on the GROUP BY columns and then cluster the data by it?
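For reference, a minimal sketch of what the documentation describes: CLUSTER physically rewrites the table in the order of an index that already exists, so the command itself does not need a new index on the GROUP BY columns. For this plan, the ordering that could reduce the scattered heap reads is the one the index scan follows (the filtering index), not the GROUP BY columns. The ordering is applied once and is not maintained for rows written later, and CLUSTER holds an ACCESS EXCLUSIVE lock on the table while it runs.

```sql
-- Sketch only: rewrite the table in the order of the existing filtering index.
-- Blocks all reads and writes on the table until the rewrite finishes.
CLUSTER program_schedule_source_count USING title_date_show_type_country;

-- The PostgreSQL docs advise running ANALYZE on a newly clustered table
-- so the planner sees the new physical ordering.
ANALYZE program_schedule_source_count;
```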

Is there an alternative way I can write the query which could make it faster?

Thanks all for your help

The time is spent in the Index Scan; the HashAggregate only adds about 37 ms to the runtime of the statement. You can run explain (analyze, buffers) to see how much data was retrieved during the scan. If you turn on track_io_timing, you can also see how long that I/O took.
– user1822, Commented Mar 26, 2020 at 12:32
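A minimal sketch of that diagnostic, using the query from the question (table prefixes dropped for brevity). With BUFFERS, each plan node reports "Buffers: shared hit=... read=...", i.e. pages found in PostgreSQL's shared buffers versus pages that had to be read from outside them; with track_io_timing on, nodes also report "I/O Timings: read=...". In 9.6 only a superuser can change track_io_timing.

```sql
-- Superuser-only setting in 9.6; adds I/O read times to EXPLAIN (BUFFERS) output.
SET track_io_timing = on;

EXPLAIN (ANALYZE, BUFFERS)
select
  full_name,
  country,
  sum(displays) as displays,
  source_id,
  source_region
from program_schedule_source_count
where original_title = 'How I Met Your Mother'
  and show_type in ('SM', 'SE')
  and start_date between '20200101' and '20200217'
group by source_id, full_name, country, source_region;
```

A large "read" count with a large I/O time on the Index Scan node would confirm that the 18 seconds go into fetching table pages, not into the aggregation.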

Laurenz Albe · Accepted Answer · 2020-03-26 12:36:36Z

Your time is spent in the index scan, and neither of your indexes is a good fit for that query.

Try this one:

    CREATE INDEX ON program_schedule_source_count (start_date, original_title);

If the third condition is always the same, you could make this a partial index by putting that condition into the index's WHERE clause:

    CREATE INDEX ON program_schedule_source_count (start_date, original_title)
       WHERE show_type in ('SM', 'SE');
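On a large table that receives writes while the index is being built, a sketch of the same partial index built with CONCURRENTLY, which avoids blocking writes at the cost of a slower build. The index name is hypothetical, and CREATE INDEX CONCURRENTLY cannot run inside a transaction block.

```sql
-- Hypothetical index name; builds without taking a write-blocking lock.
CREATE INDEX CONCURRENTLY program_schedule_date_title_partial
    ON program_schedule_source_count (start_date, original_title)
    WHERE show_type IN ('SM', 'SE');
```

The planner can only use a partial index when the query's WHERE clause implies the index predicate; here the query filters on exactly show_type in ('SM', 'SE'), so it qualifies.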

answered Mar 26, 2020 at 12:36

Thanks. A question on this - is it then recommended to drop the unsuitable indexes? Or would adding these indexes as well enable the optimiser to choose the best index?

– vinayman, Commented Mar 26, 2020 at 12:39
It is best to drop unneeded indexes. They use space and make all data modification operations slower.

– Laurenz Albe, Commented Mar 26, 2020 at 12:41
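A sketch of how to check which indexes are actually used before dropping any, via the pg_stat_user_indexes statistics view. idx_scan counts scans since the statistics were last reset, on this server only. program_schedule_pkey should stay regardless, because it is a unique index that enforces uniqueness; the DROP below is only an example for the old filtering index, to run if the numbers show nothing else relies on it.

```sql
-- Scan counts and on-disk size of every index on the table.
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'program_schedule_source_count'
ORDER BY idx_scan;

-- Only once you are sure the index is unused; cannot run inside a transaction block.
DROP INDEX CONCURRENTLY title_date_show_type_country;
```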

I think the columns should be reversed: (equality, range) is better than (range, equality). Also, he already has an index with a prefix of (start_date, original_title), so adding another one is not likely to improve much.

– jjanes, Commented Mar 26, 2020 at 16:55
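A sketch of the reversed column order, combined here with the partial-index predicate from the answer (the index name is hypothetical). With start_date first, a B-tree scan has to walk every entry between the two dates for all titles and test original_title on each one; with original_title first, all entries for that title in the date range sit next to each other, so the scan reads only those.

```sql
-- Equality column first, range column second.
-- Hypothetical index name.
CREATE INDEX program_schedule_title_date_partial
    ON program_schedule_source_count (original_title, start_date)
    WHERE show_type IN ('SM', 'SE');
```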
FYI: both of your suggestions helped speed up the query. Thank you for your help.

– vinayman, Commented Mar 31, 2020 at 10:33
