
I have a fairly simple database model. My table "main" looks like this:

| id (PK) | device_id (int) | msg_type (int) | rawdata (text) | timestamp (date+time) |

Each received message is therefore stored in this table, including its message type, timestamp, the device that sent it, and the raw data.

In addition, for each possible msg_type (approx. 30 in total) I have a separate table storing the parsed raw data. Example for the table "main_type1":

| id (PK) | main_id (FK) | device_id (int) | attribute_1 | attribute_2 | attribute_n |

(The structure differs for each msg_type, and the messages are not equally distributed, meaning some tables are huge and some are small.)

Please note that the device_id is always included within the rawdata, so each table has this column.

Now to my problem:

I used to have queries such as:

    select attribute_1, attribute_2
    from main_type1
    inner join main on main_type1.main_id = main.id
    where main.timestamp > X
      and main.timestamp < Y
      and main.device_id = Z

In the beginning everything was sufficient and fast. But now my database has more than 400,000,000 entries in "main", and queries take up to 15 minutes.

Indexing

I tried to use indexing such as:

CREATE INDEX device_id_index ON main (device_id);

Well, now I can retrieve data much faster from the main table, but it does not help with the joins. My biggest problem is that I stored the timestamp information only in the main table, so I have to join every time... is this a general flaw in my database model? I tried to avoid storing timestamps twice.
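For reference, this is roughly how I have been looking at the query plan (just a sketch; the date range and device ID below are hypothetical placeholder values):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT attribute_1, attribute_2
    FROM main_type1
    INNER JOIN main ON main_type1.main_id = main.id
    WHERE main.timestamp > '2015-01-01'
      AND main.timestamp < '2015-06-01'
      AND main.device_id = 343223;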

Partitioning

Would one solution be to create a separate rawdata table for each device_id by using partitioning? I would then (automatically, of course) create appropriate partitions such as:

main_device_id_343223
main_device_id_4563
main_device_id_92338
main_device_id_4142315

Would this give me a speed advantage for the joins? What other options do I have? For the sake of completeness: I am using PostgreSQL.

  • Partitioning is not a query performance feature; it usually makes query performance worse compared to a good indexing strategy.
  • Why are you indexing on device when your query does not mention device at all?
  • @usr: You are right, I added the missing device_id to the query. Of course I am trying to get data for a specific device. Thanks for pointing that out!
  • Have you tried creating the ideal indexes for this particular query yet? One on each table. Report the perf numbers for that configuration.
  • What do you mean by "ideal indexes"? The instructions are a bit unclear; could you give me more information about which indexes I should use? I would then report the performance numbers.

2 Answers


Since your problem is the execution time of a join, the first thing to do is to try to speed up the query by creating indexes, in the following way:

  1. Indexes that help the join itself, in this case an index on the foreign key column main_id in main_type1 (note that a foreign key declaration does not automatically create an index):

    CREATE INDEX main_type_main_id_index ON main_type1(main_id);
    
  2. Indexes that help in restricting the set of data considered by the query, in this case on the timestamp attribute:

    CREATE INDEX main_timestamp_index ON main(timestamp);
    

You can also consider creating a Partial Index on the timestamp attribute, if your queries only look at a specific subset of the values.
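For example, a partial index on timestamp could look like this (just a sketch; the cutoff date is a hypothetical value you would adapt to the range your queries actually touch):

    -- Index only the rows newer than a chosen cutoff (hypothetical date),
    -- so the index stays small and fits in memory more easily.
    CREATE INDEX main_recent_timestamp_index
        ON main (timestamp)
        WHERE timestamp > '2015-01-01';

Note that the planner will only use such an index for queries whose WHERE clause implies the same condition (e.g. timestamp > '2015-03-01').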

If these indexes do not speed up the query in a significant way, then you should follow @klin's answer.


5 Comments

This looks more like a comment than an answer. If it's an answer, expand it or add more details; otherwise delete it and add it as a comment to the question.
Thanks for the comment @Ram, I expanded the answer since I think it could be a solution for the question.
Thanks for expanding the answer. I edited it to improve it.
Thanks to Renzo and klin. Indexing helped improve the performance; queries now take approx. 1/10 of the original time. However, this is still too long, so I will also try partitioning in the near future.
Ok, but I'd also give partial indexes a chance, in case the timestamps that you use most frequently are a (relatively) small subset of all the timestamps (for instance, if you are making frequent queries starting from a certain time).

I would suggest the following scenario: first, create the indexes proposed by Renzo. If that does not improve performance enough, try using partitions.

From the documentation:

Partitioning can provide several benefits: Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory. (...)

If you use partitioning, all queries that reference a specific device (such as the one in your question) will be much faster. Only queries that span many device_ids (e.g. those containing aggregates) may be slower.
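As a rough sketch of what list partitioning by device_id could look like (this assumes the declarative partitioning syntax available in PostgreSQL 10 and later; on older versions the same idea is implemented with table inheritance and CHECK constraints, and the table name and column types below are guesses based on the question):

    -- Sketch only: partitioned parent table plus one partition per device.
    CREATE TABLE main_partitioned (
        id          bigserial,
        device_id   integer      NOT NULL,
        msg_type    integer      NOT NULL,
        rawdata     text,
        "timestamp" timestamptz  NOT NULL
    ) PARTITION BY LIST (device_id);

    -- Example partitions for the device IDs mentioned in the question:
    CREATE TABLE main_device_id_343223 PARTITION OF main_partitioned
        FOR VALUES IN (343223);
    CREATE TABLE main_device_id_4563 PARTITION OF main_partitioned
        FOR VALUES IN (4563);

A query that filters on a single device_id then only scans one partition, and an index on timestamp within each partition stays much smaller than a global one.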
