Optimizing this counting query in PostgreSQL

Question

I need to implement a basic facet search sidebar in my app. Unfortunately, I can't use Elasticsearch, Solr, or similar alternatives and am limited to Postgres.

I have 10+ columns ('status', 'classification', 'filing_type'...), and after every search I need to return the count for each distinct value in each column and display them accordingly. I've drafted the SQL below, but it won't take me very far in the long run, as it will slow down massively once the table reaches a high number of rows.

select row_to_json(t) from (
    select 'status' as column, status as value, count(*) from api_articles_mv_temp group by status 
  union
    select 'classification' as column, classification as value, count(*) from api_articles_mv_temp group by classification 
  union 
    select 'filing_type' as column, filing_type as value, count(*) from api_articles_mv_temp group by filing_type
  union
    ...) t;

This yields

 {"column":"classification","value":"State","count":2001}
 {"column":"classification","value":"Territory","count":23}
 {"column":"filing_type","value":"Joint","count":169}
 {"column":"classification","value":"SRO","count":771}
 {"column":"filing_type","value":"Single","count":4238}
 {"column":"status","value":"Updated","count":506}
 {"column":"classification","value":"Federal","count":1612}
 {"column":"status","value":"New","count":3901}

From the query plan, the HashAggregates are slowing it down.

Subquery Scan on t  (cost=2397.58..2397.76 rows=8 width=32) (actual time=212.822..213.022 rows=8 loops=1)
  ->  HashAggregate  (cost=2397.58..2397.66 rows=8 width=186) (actual time=212.780..212.856 rows=8 loops=1)
         Group Key: ('status'::text), api_articles_mv_temp.status, (count(*))
         ->  Append  (cost=799.11..2397.52 rows=8 width=186) (actual time=75.238..212.701 rows=8 loops=1)
               ->  HashAggregate  (cost=799.11..799.13 rows=2 width=44) (actual time=75.221..75.242 rows=2 loops=1)
                     Group Key: api_articles_mv_temp.status
...

Is there a simpler, more optimized way of getting this result?

etsuhisa · Accepted Answer · 2020-08-09 06:22:22Z

Reading api_articles_mv_temp only once may improve performance. Here are two examples you can try.

If the combinations of "column" and "value" are fixed, the query looks like this:

select row_to_json(t) from (
  select "column", "value", count(*) as "count"
  from column_temp left outer join api_articles_mv_temp on
    "value"=
    case "column"
      when 'status' then status
      when 'classification' then classification
      when 'filing_type' then filing_type
    end
  group by "column", "value"
) t;

The column_temp table contains the following records:

column         |value
---------------+----------
status         |New
status         |Updated
classification |State
classification |Territory
classification |SRO
filing_type    |Single
filing_type    |Joint

DB Fiddle
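
For reference, the helper table for this first variant could be set up roughly as follows. This is only a sketch: the answer does not show the table definition, so the text column types and the primary key are assumptions, and they should be adjusted to match the actual types of status, classification, and filing_type.

```sql
-- Assumed definition of the helper table (types are a guess, not from the answer).
-- "column" is a reserved word in PostgreSQL, so it has to be double-quoted.
create table column_temp (
  "column" text not null,
  "value"  text not null,
  primary key ("column", "value")
);

-- The (column, value) pairs listed above.
insert into column_temp ("column", "value") values
  ('status', 'New'),
  ('status', 'Updated'),
  ('classification', 'State'),
  ('classification', 'Territory'),
  ('classification', 'SRO'),
  ('filing_type', 'Single'),
  ('filing_type', 'Joint');
```

One thing to watch: since the query uses a left outer join with count(*), a pair listed here that matches no rows is still reported with a count of 1, not 0. A value that is missing from column_temp is not counted at all.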

If only the "column" names are fixed and the values are not known in advance, the query looks like this:

select row_to_json(t) from (
  select "column",
    case "column"
      when 'status' then status
      when 'classification' then classification
      when 'filing_type' then filing_type
    end as "value",
    sum("count") as "count"
  from column_temp a
    cross join (
      select
        status,
        classification,
        filing_type,
        count(*) as "count"
      from api_articles_mv_temp
      group by
        status,
        classification,
        filing_type) b
  group by "column", "value"
) t;

The column_temp table contains the following records:

column         
---------------
status         
classification 
filing_type

DB Fiddle
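
If keeping a helper table around is inconvenient, the same read-the-table-once idea can probably be expressed with a LATERAL VALUES list instead. This is an alternative sketch, not the query from the answer above, and it has not been benchmarked against it. It assumes PostgreSQL 9.3 or later and that the three columns share a text-compatible type.

```sql
select row_to_json(t) from (
  select v."column", v."value", count(*) as "count"
  from api_articles_mv_temp a
    -- Turn each source row into one (column, value) pair per facet column.
    cross join lateral (
      values
        ('status', a.status),
        ('classification', a.classification),
        ('filing_type', a.filing_type)
    ) as v("column", "value")
  group by v."column", v."value"
) t;
```

Each additional facet column is one more line in the VALUES list. Comparing the plans with EXPLAIN ANALYZE on real data is the only reliable way to tell which variant is faster.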
