0

PostgreSQL 8.4; Three tables - store (~100k, pk id, fk supplier_id & item_id), supplier(~10 pk supplier_id), item(~1000 pk item_id);

I created the following query to get the data I need:

SELECT store.quantity, store.price, x.supplier_name
FROM store NATURAL JOIN
     (SELECT * FROM item NATURAL JOIN supplier) AS x 
 WHERE store.price > 500 AND store.quantity > 0 AND
       store.quantity < 100 AND
       x.item_name = 'SomeName';

The query plan:

Nested Loop  (cost=20.76..6513.55 rows=8 width=229)
  ->  Hash Join  (cost=20.76..6511.30 rows=8 width=15)
        Hash Cond: (store.item_id = item.item_id)
        ->  Seq Scan on store  (cost=0.00..6459.00 rows=8388 width=23)
              Filter: ((price > 500::numeric) AND (quantity > 0) AND (quantity < 100))
        ->  Hash  (cost=20.75..20.75 rows=1 width=8)
              ->  Seq Scan on item  (cost=0.00..20.75 rows=1 width=8)
                    Filter: ((item_name)::text = 'SomeName'::text)
  ->  Index Scan using supplier_pkey on supplier  (cost=0.00..0.27 rows=1 width=222)
        Index Cond: (supplier.supplier_id = store.supplier_id)

Now the aim is to reduce the cost by more than 30% by optimizing the query itself. The only instances of this problem I found were solved by modifying the table or the server settings, but I am looking to do this by modifying nothing else than the query and that's where I fell short in research.

Clearly the issue to be solved is the Seq Scan, which brings me to thinking I need to arrange it so that the scanning/filtering is applied only to a subset of the store table - but iirc you need to scan the table in any such case, so maybe use something else than a Seq Scan? Index scan isn't going to help since I wouldn't be filtering by the index... I'm puzzled here because this seems more of a choice that the PostgreSQL optimizer makes and not something I can change at will...

(If you're wondering, this was part of an assignment and I'm asking here because I have spent quite a few hours researching the problem failing to find anything relevant, and I just gave up on it, but I'm still curious...)

0

2 Answers 2

1

You can probably fix this with indexes. It is a little hard to tell what the keys are because of the "natural join"s. (I recommend using instead of natural join so you can at least see what keys are being used and if one of the tables is modified, it won't mess up the join.)

I think an index on item(item_name, item_id) would help the query plan.

Sign up to request clarification or add additional context in comments.

3 Comments

Doing that only improves the hash of the item from 20.75 to 8.27, which is insignificant next to the 6000ish seq scan. (I added the relevant indexes that the joins use to the op)
I didn't consider global changes when posting the question, but your suggestion definitely works - the right index to add was store(item_id,price), which resulted in a very different query plan and reduced the cost all the way to 5% of the original :-) (I'm not marking your answer as accepted, though, because I'm still interested if an optimization can be done locally - the op says that anyway )
@user742925 . . . You can give the query a hint to use a hash-based algorithm for the join rather than a nested loop. My guess is that is where the performance bottleneck is likely to be.
1

Will be hard to optimize because it looks nice, try this to avoid subquery :

SELECT 
    store.quantity, 
    store.price, 
    supplier.supplier_name 
FROM store 
    INNER JOIN item
        ON store.item_id = item.item_id
    INNER JOIN supplier
        ON supplier.supplier_id = store.supplier_id
        AND supplier.item_name = 'SomeName'
WHERE 
    store.price > 500 
    AND store.quantity BETWEEN 0 AND 100;

Use BETWEEN it's better.

Also, add indexes on :

  • store.item_id
  • item.item_id
  • supplier.item_name

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.