I need help with the performance of a query in PostgreSQL. The problem seems to be related to the indexes.

This query:

  • Filters according to type
  • Orders by timestamp, ascending:

SELECT * FROM the_table WHERE type = 'some_type' ORDER BY timestamp LIMIT 20

The indexes:

 CREATE INDEX the_table_timestamp_index ON the_table(timestamp);

 CREATE INDEX the_table_type_index ON the_table(type);

The values of the type field are only ever one of about 11 different strings.
The problem is that the query usually seems to execute in O(log n) time, taking only a few milliseconds, but for some values of type it takes minutes to run.

In these example queries, the first takes only a few milliseconds to run while the second takes over 30 minutes:

SELECT * FROM the_table WHERE type = 'goq' ORDER BY timestamp LIMIT 20
SELECT * FROM the_table WHERE type = 'csp' ORDER BY timestamp LIMIT 20

I suspect, with about 90% certainty, that the indexes we have are not the right ones. I think, after reading this similar question about index performance, that most likely what we need is a composite index, over type and timestamp.

The query plans that I have run are here:

  1. Expected performance, type-specific index (i.e. new index with the type = 'csq' in the WHERE clause).
  2. Slowest, problematic case, indexes as described above.
  3. Fast case, same indexes as above.
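For reference, a type-specific (partial) index like the one in case 1 might look roughly like the sketch below. The index name is hypothetical, and the value 'csp' is taken from the slow example query above rather than from the plan itself.

```sql
-- Partial index: only rows with type = 'csp' are indexed, ordered by timestamp.
-- The index name is made up for illustration.
CREATE INDEX the_table_csp_timestamp_index
    ON the_table ("timestamp")
    WHERE type = 'csp';
```

An index like this can only help queries whose WHERE clause implies type = 'csp', so each type would need its own index.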

Thanks very much for your help! Any pointers will be really appreciated!

  • What is the size of the indexes? And the size of the dataset? Commented Jan 31, 2013 at 21:12

2 Answers

Each of the existing indexes can be used either for the WHERE clause or for the ORDER BY clause, but not for both. With a composite index on the_table(type, timestamp), the same index can be used for both.

My guess is that Postgres decides which index to use based on the statistics it gathers. When it uses the index for the WHERE clause and then has to sort the result, you get really bad performance.

This is just a guess, but it is worth creating the above index to see if that fixes the performance problems.
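A minimal sketch of that experiment, assuming the table and column names from the question. The index name is hypothetical, and 'csp' is the slow value reported in the question.

```sql
-- Composite index: equality column (type) first, then the sort column.
-- The index name is made up for illustration.
CREATE INDEX the_table_type_timestamp_index
    ON the_table (type, "timestamp");

-- Refresh the planner statistics for the table.
ANALYZE the_table;

-- Run the slow query again and compare the plan and timing with the earlier runs.
EXPLAIN ANALYZE
SELECT *
FROM the_table
WHERE type = 'csp'
ORDER BY "timestamp"
LIMIT 20;
```

If the planner picks the new index, it should be able to jump to the entries for the given type and read the first 20 of them already in timestamp order, with no separate sort step.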


All of the EXPLAIN outputs use the timestamp index. That is probably because the cardinality of the type column is so low that a scan on an index on that column is about as expensive as a table scan.
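One possible reason only some types are slow: a scan of the timestamp index discards non-matching rows until it has found 20 rows of the requested type, so a type that is rare or that only appears late in timestamp order forces it to read a large part of the index. The queries below are a rough way to check this, assuming the table and column names from the question. The first one scans the whole table and may itself be slow.

```sql
-- Row count and timestamp range per type (full table scan).
SELECT type,
       count(*)         AS row_count,
       min("timestamp") AS earliest,
       max("timestamp") AS latest
FROM the_table
GROUP BY type
ORDER BY row_count;

-- What the planner currently believes about the distribution of type.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'the_table'
  AND attname = 'type';
```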

The composite index to be created should be:

create index comp_index on the_table ("timestamp", type)

In that order.

2 Comments

Awesome! So the order of the columns in the index makes a difference?
@JuanCarlosCoto . . . In fact, the order does make a difference. By putting timestamp first, the engine cannot use the index for the where clause. The various types will be scattered throughout the index.
