
I have a fairly simple database model. My table "main" looks like this:

| id (PK) | device_id (int) | msg_type (int) | rawdata (text) | timestamp (date+time) |

Each received message is therefore stored in this table, including its message type, timestamp, the device that sent it, and the raw data.

In addition, for each possible msg_type (approx. 30 in total) I have a separate table storing the parsed raw data. Example for the table "main_type1":

| id (PK) | main_id (FK) | device_id (int) | attribute_1 | attribute_2 | attribute_n |

(The structure differs for each msg_type, and the messages are not equally distributed, meaning some tables are huge and some are small.)

Please note that the device_id is always included within the rawdata, so each table has this column.

Now to my problem:

I used to have queries such as:

    select attribute_1, attribute_2
    from main_type1
    inner join main on main_type1.main_id = main.id
    where main.timestamp > X
      and main.timestamp < Y
      and main.device_id = Z

In the beginning everything was sufficient and fast. But now my database has more than 400,000,000 entries in "main", and queries take up to 15 minutes.

Indexing

I tried to use indexing such as:

CREATE INDEX device_id_index ON main (device_id);

Well, now I can retrieve data much faster from the main table, but it does not help with the joins. My biggest problem is that I stored the timestamp information only in the main table, so I have to join every time... is this a general flaw in my database model? I tried to avoid storing timestamps twice.
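For reference, this is roughly how I have been looking at the query plan (just a sketch; the date range and device ID below are hypothetical placeholder values):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT attribute_1, attribute_2
    FROM main_type1
    INNER JOIN main ON main_type1.main_id = main.id
    WHERE main.timestamp > '2015-01-01'
      AND main.timestamp < '2015-06-01'
      AND main.device_id = 343223;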

Partitioning

Would one solution be to create a separate rawdata table for each device_id by using partitioning? I would then (automatically, of course) create appropriate partitions such as:

main_device_id_343223
main_device_id_4563
main_device_id_92338
main_device_id_4142315

Would this give me a speed advantage for the joins? What other options do I have? For the sake of completeness: I am using PostgreSQL.

  • Partitioning is not a query performance feature; it usually makes query performance worse compared to a good indexing strategy.
  • Why are you indexing on device when your query does not mention device at all?
  • @usr: You are right, I added the missing device_id to the query. Of course I am trying to get data for a specific device. Thanks for pointing that out!
  • Have you tried creating the ideal indexes for this particular query yet? One on each table. Report the perf numbers for that configuration.
  • What do you mean by "ideal indexes"? The instructions are a bit unclear; could you give me more information about which indexes I should use? I would then report the performance numbers.

2 Answers


Since your problem is the execution time of a join, the first thing to do is to try to speed up the query by creating indexes, in the following way:

  1. Indexes that help the join itself, in this case an index on the foreign key column main_id in main_type1 (note that a foreign key declaration does not automatically create an index):

    CREATE INDEX main_type_main_id_index ON main_type1(main_id);
    
  2. Indexes that help in restricting the set of data considered by the query, in this case on the timestamp attribute:

    CREATE INDEX main_timestamp_index ON main(timestamp);
    

You can also consider creating a Partial Index on the timestamp attribute, if your queries only look at a specific subset of the values.
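For example, a partial index on timestamp could look like this (just a sketch; the cutoff date is a hypothetical value you would adapt to the range your queries actually touch):

    -- Index only the rows newer than a chosen cutoff (hypothetical date),
    -- so the index stays small and fits in memory more easily.
    CREATE INDEX main_recent_timestamp_index
        ON main (timestamp)
        WHERE timestamp > '2015-01-01';

Note that the planner will only use such an index for queries whose WHERE clause implies the same condition (e.g. timestamp > '2015-03-01').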

If these indexes do not speed up the query in a significant way, then you should follow @klin's answer.


5 Comments

This looks more like a comment than an answer. If it's an answer, expand it or add more details; otherwise delete it and add it as a comment to the question.
Thanks for the comment @Ram, I expanded the answer since I think it could be a solution for the question.
Thanks for expanding the answer. I edited it to improve it.
Thanks to Renzo and klin. Indexing helped improve the performance; queries now take approx. 1/10 of the original time. However, this is still too long, so I will also try partitioning in the near future.
Ok, but I'd also give partial indexes a chance, in case the timestamps that you use most frequently are a (relatively) small subset of all the timestamps (for instance, if you are making frequent queries starting from a certain time).

I would suggest the following scenario: first, create the indexes proposed by Renzo. If that does not improve performance enough, try using partitions.

From the documentation:

Partitioning can provide several benefits: Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory. (...)

If you use partitioning, all queries that reference a specific device (such as the one in your question) will be much faster. Only queries that span many device_ids (e.g. those containing aggregates) may be slower.
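As a rough sketch of what list partitioning by device_id could look like (this assumes the declarative partitioning syntax available in PostgreSQL 10 and later; on older versions the same idea is implemented with table inheritance and CHECK constraints, and the table name and column types below are guesses based on the question):

    -- Sketch only: partitioned parent table plus one partition per device.
    CREATE TABLE main_partitioned (
        id          bigserial,
        device_id   integer      NOT NULL,
        msg_type    integer      NOT NULL,
        rawdata     text,
        "timestamp" timestamptz  NOT NULL
    ) PARTITION BY LIST (device_id);

    -- Example partitions for the device IDs mentioned in the question:
    CREATE TABLE main_device_id_343223 PARTITION OF main_partitioned
        FOR VALUES IN (343223);
    CREATE TABLE main_device_id_4563 PARTITION OF main_partitioned
        FOR VALUES IN (4563);

A query that filters on a single device_id then only scans one partition, and an index on timestamp within each partition stays much smaller than a global one.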
