1

I would need help in re-writing query to improve performance. I think below query is slow because of OR part and two sub queries for products table (so twice scanning) for the each key in category table.

SELECT key
FROM category c
WHERE (1=1)
AND (  ((EXISTS (SELECT * from products p
                  WHERE p.attribute_key=2 
                   AND p.category_key=c.key 
                   AND ((value && CAST(ARRAY['Active', 'active'] AS text[])))
                   AND p.status='active'
        ))
     OR(NOT EXISTS(SELECT * from products p
                    WHERE p.attribute_key=2 
                     AND p.category_key=c.key 
                     AND p.status='active') 
        ))
    )
AND c.status='active'
AND c.type_key=4

expected output is (9 rows) below

key
1
13
3
6
2
7
4
10
15

Query returns keys from category table if type_key=4 and category.status='active' and

  1. products table has value='Active' or 'active' and attribute_key=2 and status='active. (this is EXISTS part in query)

  2. even if products table doesn't have any attribute_key or has attribute_key!=2 records for type_key=4. (this is OR( NOT EXISTS) part in query) Example: type_key 13 and 3

Query explain analyze plan is below. indexes are used well.

I hope query can be improved by writing another way or changing OR part in query

sample data is here dbfiddle

Aggregate  (cost=3156220.03..3156220.04 rows=1 width=8) (actual time=86100.329..86100.342 rows=1 loops=1)
  ->  Index Scan using category_type_key_status on category c  (cost=0.43..3155906.64 rows=125355 width=0) (actual time=12.618..85925.747 rows=120852 loops=1)
        Index Cond: ((type_key = 4) AND (status = 'active'))
        Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (NOT (SubPlan 3)))
        Rows Removed by Filter: 86879
        SubPlan 1
          ->  Index Scan using products_category_key_attribute_key_status on products p  (cost=0.56..8.59 rows=1 width=0) (actual time=0.332..0.332 rows=1 loops=207731)
                Index Cond: ((category_key = c.key) AND (attribute_key = 2) AND (status = 'active'))
                Filter: (value && '{Active,active}'::text[])
                Rows Removed by Filter: 0
        SubPlan 2
          ->  Gather  (cost=1000.00..1155110.30 rows=8916 width=4) (never executed)
                Workers Planned: 2
                Workers Launched: 0
                ->  Parallel Seq Scan on products p_1  (cost=0.00..1153218.70 rows=3715 width=4) (never executed)
                      Filter: ((value && '{Active,active}'::text[]) AND (attribute_key = 2) AND (status = 'active'))
        SubPlan 3
          ->  Index Only Scan using  products_category_key_attribute_key_status on products p_2  (cost=0.56..8.58 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=86933)
                Index Cond: ((category_key = c.key) AND (attribute_key = 2) AND (status = 'active'))
                Heap Fetches: 11497
Planning Time: 35.808 ms
Planning Time: 35.808 ms

1
  • Why do you use an array to store data? And it's not indexed. Commented Mar 15, 2022 at 20:36

1 Answer 1

2

Both of the predicates on the products table are almost exactly the same. You can simply join this table once and use OR to apply either of the conditions:

SELECT c.key
  FROM category c
  LEFT OUTER
  JOIN products p
    ON p.category_key = c.key
   AND p.status = 'active'
   AND p.attribute_key = 2
 WHERE c.status='active'
   AND c.type_key=4
   AND (   p.category_key IS NULL -- NOT EXISTS
        OR ((value && CAST(ARRAY['Active', 'active'] AS text[])))) -- or value matches
                  

dbfiddle with results

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.