1

I have created in PostgreSQL a table partitioned (see here) by received column. Let's use a toy example:

CREATE TABLE measurement (
    received        timestamp without timezone PRIMARY KEY,
    city_id         int not null,
    peaktemp        int,
    unitsales       int
);

I have created one partition for each month for several years (measurement_y2012m01 ... measurement_y2016m03).

I have noticed that postgresql is not aware of the order of the partitions, so for a query like below:

select * from measurement where ... order by received desc limit 1000;

postgresql performs index scan over all partitions, even though it is very likely that the first 1000 results are located in the latest partition (or the first two or three).

Do you have an idea how to take advantage of partitions for such query? I want to emphasize that where clause may vary, I don't want to hardcode it.

The first idea is to iterate partitions in a proper order until 1000 records are fetched or all partitions are visited. But how to implement it in a flexible way? I want to avoid implementing the aforementioned iteration in the application, but I don't mind if the app needs to call a stored procedure.

Thanks in advance for your help!

Grzegorz

2 Answers 2

1

If you really don't know how many partitions to scan to get your desired 1000 rows in the output you could build up your resultset in a stored procedure and fetch results iterating over partitions until your limit condition is satisfied.

Starting with the most recent partition would be a wise thing to do.

select * from measurement_y2016m03 where ... order by received desc limit 1000;

You could store the immediate resultset in a record and issue a count over it and change the limit dynamically for the next scanned partition, so that if you fetch for example 870 rows in first partition, you could build up a second query with limit 130 and then perform count once again after that and increase the counter if it still doesn't satisfy your 1000 rows condition.

Why Postgres doesn't know when to stop during planning?

Planner is unaware of how many partitions are needed to satisfy your LIMIT clause. Thus, it has to order the entire set by appending results from each partition and then perform a limit (unless it already satisfies this condition during run time). The only way to do this in an SQL statement would be to restrict the lookup only to a few partitions - but that may not be the case for you. Also, increasing work_mem setting may speed things up for you if you're hitting disk during lookups.

Key note

Also, a thing to remember is that when you setup your partitioning, you should have a descending order of mostly accessed partitions. This would speed up your inserts, because Postgres checks conditions one by one and stops on first that satisfies.

Sign up to request clarification or add additional context in comments.

Comments

1

Instead of iterating the partitions, you could guess at the range of received that will satisfy your query and expand it until you get the desired number of rows. Adding the range to WHERE will exclude the unnecessary partitions (assuming you have exclusion constraints set).

Edit

Correct, that's what I meant (could've phrased it better).

Simplicity seems like a pretty reasonable advantage. I don't see the performance being different, either way. This might actually be a little more efficient if you guess reasonably close to the desired range most of the time, but probably won't make a significant difference.

It's also a little more flexible, since you're not relying on the particular partitioning scheme in your query code.

2 Comments

This is more or less the same as iterating partitions but less effective in scenario when searching more than one partition is required. The only advantage is simplicity :) I need a good balance between performance and complexity (performance is more important for me).
Probably I misunderstood your answer at first. I guess you meant that instead of iterating over I should iterate over ranges and join the results of consecutive queries until limit is reached?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.