2

I have the following query:

SELECT a.id, a.col2, b.id, b.col2, c.id, c.col2
FROM a 
  JOIN b on b.fk_a_id = a.id 
  JOIN c on c.fk_a_id = a.id
  INNER JOIN d on d.fk_c_id = c.id 
WHERE a.owner_id NOT IN (1, 3, 100, 41)
GROUP BY a.id, b.id, c.id 
ORDER BY a.created_date desc
LIMIT __ OFFSET __

Indexes on: a.id, a.owner_id, b.id, c.id

However, none of my indexes are being used in this query. I have another similar query with one extra table join that is using indexes as I expect. Any ideas as to why this query is not making use of the indexes?

Edit: added the EXPLAIN ANALYZE output:

"Limit  (cost=7.88..7.89 rows=4 width=243) (actual time=175.824..175.825 rows=10 loops=1)" 
"  ->  Sort  (cost=7.88..7.89 rows=4 width=243) (actual time=175.822..175.822 rows=10 loops=1)" 
"        Sort Key: a.created_date DESC" 
"        Sort Method: quicksort  Memory: 27kB" 
"        ->  HashAggregate  (cost=7.78..7.84 rows=4 width=243) (actual time=175.771..175.778 rows=10 loops=1)" 
"              Group Key: a.id, b.id, c.id" 
"              ->  Hash Join  (cost=5.12..7.75 rows=4 width=243) (actual time=0.072..0.099 rows=20 loops=1)" 
"                    Hash Cond: (a.id = b.fk_a_id)" 
"                    ->  Hash Join  (cost=2.85..5.43 rows=4 width=163) (actual time=0.041..0.063 rows=20 loops=1)" 
"                          Hash Cond: (a.id = d.fk_a_id)" 
"                          ->  Seq Scan on table a  (cost=0.00..2.44 rows=27 width=126) (actual time=0.008..0.025 rows=28 loops=1)" 
"                                Filter: (owner_id <> ALL ('{1,3,100,41}'::bigint[]))" 
"                                Rows Removed by Filter: 1" 
"                          ->  Hash  (cost=2.76..2.76 rows=7 width=53) (actual time=0.027..0.027 rows=3 loops=1)" 
"                                Buckets: 1024  Batches: 1  Memory Usage: 9kB" 
"                                ->  Hash Join  (cost=1.16..2.76 rows=7 width=53) (actual time=0.019..0.023 rows=3 loops=1)" 
"                                      Hash Cond: (c.id = d.fk_c_id)" 
"                                      ->  Seq Scan on table c  (cost=0.00..1.39 rows=39 width=45) (actual time=0.002..0.004 rows=39 loops=1)" 
"                                      ->  Hash  (cost=1.07..1.07 rows=7 width=8) (actual time=0.007..0.007 rows=3 loops=1)" 
"                                            Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                            ->  Seq Scan on table d  (cost=0.00..1.07 rows=7 width=8) (actual time=0.003..0.004 rows=3 loops=1)" 
"                    ->  Hash  (cost=2.12..2.12 rows=12 width=88) (actual time=0.022..0.022 rows=12 loops=1)" 
"                          Buckets: 1024  Batches: 1  Memory Usage: 9kB" 
"                          ->  Seq Scan on table b  (cost=0.00..2.12 rows=12 width=88) (actual time=0.005..0.013 rows=12 loops=1)"
"Planning time: 210.946 ms"
"Execution time: 175.987 ms"
7
  • EXPLAIN your query and post results. Commented Dec 16, 2016 at 17:05
  • 2
    Oh, I know why! The Postgres optimizer thinks that full-table scans are the most performant way to execute the query. Why it thinks that, I don't know. The optimizer has access to lots of information I don't have: the size of the tables, estimated distributions, the exact data types involved, and so on. Commented Dec 16, 2016 at 17:15
  • I have just edited to include EXPLAIN. Commented Dec 16, 2016 at 17:25
  • BTW: the planning time seems rather large (not that it matters). Catalog bloat? Commented Dec 16, 2016 at 17:35
  • Just for comparison, what does the query and plan look like for the one that uses the indexes? Commented Dec 16, 2016 at 18:06

2 Answers

2

An index is used to (quickly) find "a few" rows in "many rows". It's not a magical silver bullet that makes everything go faster.

You don't have enough rows in your tables to make an index lookup efficient. And you are getting nearly all of them, not just a small part.

If you look at the plan you will see that the actual data retrieval never takes more than 0.1 ms (that's a tenth of a millisecond). The Seq Scans for tables c and d only take 0.004 ms each; no index will speed that up for only 4 rows.

Doing that through indexes with random I/O for just 20 or 30 rows would definitely be slower. Depending on the number of columns in the tables, even the 39 rows of the "biggest" table are probably stored in a single block, which means that to read all rows the database only needs a single I/O operation when using a Seq Scan.
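
If you want to check those numbers yourself, the planner's per-table page and row estimates are visible in pg_class; a quick sketch, using the table names from your query (relpages is the number of 8 kB blocks as of the last VACUUM/ANALYZE):

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('a', 'b', 'c', 'd');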

The slowest part of the plan is the HashAggregate, not the reading of the data, so the choice not to use an index seems correct.

What kind of hardware is that? 175 milliseconds to aggregate 20 rows seems extremely slow, as does the planning time of 210 ms. For such a simple statement, the planning time should be more like 1 ms.
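
If you want to follow up on the catalog-bloat suspicion from the comments, a rough sketch is to look at the sizes of the largest system catalogs (there is no authoritative threshold here, but catalogs that are huge relative to a small database would be suspicious):

SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid::regclass)) AS total_size
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE n.nspname = 'pg_catalog'
  AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid::regclass) DESC
LIMIT 10;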


0

There is only one criterion on table A: WHERE a.owner_id NOT IN (1, 3, 100, 41). To me that sounds like "select all records except a few". It would be a lot of work to read all records via an index and still end up with most of them. It's way faster to simply read the table and discard the few excluded records on the fly.

Then, with those many, many A records, we may match many, many B, C, and D records. Again, we'd better not read indexes plus most of the table data, but simply put the data in buckets (hash join). That seems fastest.

So the fact that the optimizer chooses not to use indexes for your query seems like a good idea. It proves that it's doing a good job :-)
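
If you want to see that for yourself, you can temporarily forbid sequential scans in one session and compare the plans. A sketch for experimentation only (never leave the setting off in production; fill in your own LIMIT/OFFSET values):

SET enable_seqscan = off;   -- strongly discourages Seq Scans wherever an index path exists
EXPLAIN ANALYZE
SELECT a.id, a.col2, b.id, b.col2, c.id, c.col2
FROM a
  JOIN b ON b.fk_a_id = a.id
  JOIN c ON c.fk_a_id = a.id
  INNER JOIN d ON d.fk_c_id = c.id
WHERE a.owner_id NOT IN (1, 3, 100, 41)
GROUP BY a.id, b.id, c.id
ORDER BY a.created_date DESC
LIMIT __ OFFSET __;
RESET enable_seqscan;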

The only way I see to speed this up with indexes would be covering indexes, i.e. indexes containing all desired columns:

  • a(owner_id, id, created_date, col2)
  • b(fk_a_id, id, col2)
  • c(fk_a_id, id, col2)
  • d(fk_c_id)

Then we wouldn't have to read the indexes plus the tables, but the indexes alone.
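
In PostgreSQL that could look roughly like the following. This is only a sketch: the index names are made up, and whether you actually get index-only scans also depends on the visibility map being up to date (i.e. on vacuuming):

CREATE INDEX a_owner_cover_idx ON a (owner_id, id, created_date, col2);
CREATE INDEX b_fk_a_cover_idx  ON b (fk_a_id, id, col2);
CREATE INDEX c_fk_a_cover_idx  ON c (fk_a_id, id, col2);
CREATE INDEX d_fk_c_idx        ON d (fk_c_id);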

BTW: As you don't select anything from D, you are presumably either checking for existence, in which case EXISTS or IN would be more readable than a join in my opinion, or you don't need D at all, in which case you can remove it from the query completely.
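
A sketch of that rewrite, assuming the join to D was only there to require a matching row. With the duplicate rows from D gone, the GROUP BY that only collapsed them can probably go too, provided a.id, b.id and c.id are primary keys:

SELECT a.id, a.col2, b.id, b.col2, c.id, c.col2
FROM a
  JOIN b ON b.fk_a_id = a.id
  JOIN c ON c.fk_a_id = a.id
WHERE a.owner_id NOT IN (1, 3, 100, 41)
  AND EXISTS (SELECT 1 FROM d WHERE d.fk_c_id = c.id)
ORDER BY a.created_date DESC
LIMIT __ OFFSET __;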

3 Comments

I am very new to this. I performed a VACUUM on a local database that I was testing on and the indexes are now being used?
Good point. Over time an index can get bloated, and PostgreSQL can then decide that it is less efficient than just using another index or the table itself (at least that seems to be what happened to us). We fixed this with REINDEX INDEX your_index_name (you may want to add CONCURRENTLY). Autovacuum does not seem to prevent this; VACUUM FULL on the table (or database) will also rebuild the indexes.
(Gory details of our issue: PostgreSQL 11.x; B-tree, covering, partial index.)
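
For reference, the commands mentioned here would look roughly like this (index and table names are placeholders):

VACUUM ANALYZE a;                      -- refresh statistics and the visibility map for one table
REINDEX INDEX your_index_name;         -- rebuild a bloated index (takes a strong lock)
-- REINDEX INDEX CONCURRENTLY your_index_name;   -- lock-friendly variant, PostgreSQL 12+ only
-- VACUUM FULL a;                      -- rewrites the table and rebuilds all of its indexes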
