3

We have 2 identical (double precision) columns on the same table with 2 identical indices running 2 identical queries. yet one runs nearly 10* quicker than the other. what's causing this?

1) SELECT MIN("reports"."longitude") AS min_id FROM "reports" WHERE (area2 = 18)

2) SELECT MIN("reports"."latitude") AS min_id FROM "reports" WHERE (area2 = 18)

1 runs in 28ms and 2 runs in >300ms

Here are the 'explains':
1)

Result  (cost=6.07..6.08 rows=1 width=0)"
InitPlan 1 (returns $0)"
  ->  Limit  (cost=0.00..6.07 rows=1 width=8)"
      ->  Index Scan using longitude on reports  (cost=0.00..139617.49 rows=22983 width=8)"
            Index Cond: (longitude IS NOT NULL)"
            Filter: (area2 = 18)"

2)

Result  (cost=5.95..5.96 rows=1 width=0)"
InitPlan 1 (returns $0)"
  ->  Limit  (cost=0.00..5.95 rows=1 width=8)"
      ->  Index Scan using latitude on reports  (cost=0.00..136754.07 rows=22983 width=8)"
            Index Cond: (latitude IS NOT NULL)"
            Filter: (area2 = 18)"

as requested here is the explain analyse output...

1)

Result  (cost=6.07..6.08 rows=1 width=0) (actual time=10.992..10.993 rows=1 loops=1)"
InitPlan 1 (returns $0)"
    ->  Limit  (cost=0.00..6.07 rows=1 width=8) (actual time=10.985..10.986 rows=1 loops=1)"
          ->  Index Scan using longitude on reports  (cost=0.00..139617.49 rows=22983 width=8) (actual time=10.983..10.983 rows=1 loops=1)"
                Index Cond: (longitude IS NOT NULL)"
                Filter: (area2 = 18)"
Total runtime: 11.033 ms"

2)

 Result  (cost=5.95..5.96 rows=1 width=0) (actual time=259.749..259.749 rows=1 loops=1)"
InitPlan 1 (returns $0)"
    ->  Limit  (cost=0.00..5.95 rows=1 width=8) (actual time=259.740..259.740 rows=1 loops=1)"
          ->  Index Scan using latitude on reports  (cost=0.00..136754.07 rows=22983 width=8) (actual time=259.739..259.739 rows=1 loops=1)"
                Index Cond: (latitude IS NOT NULL)"
                Filter: (area2 = 18)"
Total runtime: 259.789 ms"
---------------------

What is going on? How can I get the second query to behave properly and run quickly? Both setups are identical as far as I can tell.

3
  • Did you do a VACUUM FULL ANALYZE before? Did you do an EXPLAIN ANALYZE or just an EXPLAIN? Commented Jul 11, 2012 at 16:27
  • Is there a composite index involved? please show us the table definition, and the output of explain analyze Commented Jul 11, 2012 at 16:37
  • 1
    You really should include the exact table and index definition in a question like this. Which columns are NOT NULL? Index sizes? Does VACUUM FULL ANALYZE change anything? Does CLUSTER reports USING USING idx1 and CLUSTER reports USING USING idx2 change anything? (I would expect it does!) Commented Jul 11, 2012 at 18:45

2 Answers 2

3

First, there is no guarantee that indexes speed queries. Second, when doing performance considerations, you need to run each query multiple times. There is overhead for loading the index and loading pages into the cache that can affect the length of the queries.

I am not a specialist in Postgres, but on thinking about this, I'm not that surprised.

The query plan is looping through the index, finding the corresponding row that matches area2 = 18, and then hopefully stopping at the first one (it is using the index, so it can start at the lowest value and move upwards). This is speculation on how it is working; I don't know that Postgres is doing this exactly.

In any case, what is happening is that the area is much closer to the beginning of the longitude index than the beginning of the latitude index. So, it finds the first matching record there first. If this explanation is correct, it would suggest that the area is relatively west (lower longitude) and relatively north (higher latitude), compared to other things in the database.

By the way, assuming that there are lots of areas, you might get better results with an index on Area2.

Sign up to request clarification or add additional context in comments.

1 Comment

thanks for the input - there actually is an index on area2 also - it just seems postgres is choosing not to make use of it...
2

You are getting an index scan, but the number of records examined depends on how far up the list you have to go to match the area2 condition.

Unless your area2 distribution is strange, to optimize this query you should put composite indices on (area2, latitude) and (area2, longitude). I suspect you will get <10 ms. PG may also be able to combine a separate index on area2 with the existing indices, in lieu of composite indices, using its Bitmap Heap Scan capabilities.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.