I have a fairly simple database model. My table "main" looks like this:
| id (PK) | device_id (int) | msg_type (int) | rawdata (text) | timestamp (date+time) |
Each received message is stored in this table: the message type, the timestamp, the device that sent it, and the raw data.
In addition, for each possible msg_type (about 30 in total) I have a separate table storing the parsed raw data. Example for the table "main_type1":
| id (PK) | main_id (FK) | device_id (int) | attribute_1 | attribute_2 | attribute_n |
(The structure differs for each msg_type, and the messages are not equally distributed, meaning some tables are huge and some are small.)
Please note that the device_id is always included within the rawdata, so each table has this column.
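For reference, a simplified sketch of the two tables (the column types are simplified, attribute types vary per msg_type):

CREATE TABLE main (
    id        bigserial PRIMARY KEY,
    device_id integer   NOT NULL,
    msg_type  integer   NOT NULL,
    rawdata   text      NOT NULL,
    timestamp timestamp NOT NULL
);

CREATE TABLE main_type1 (
    id          bigserial PRIMARY KEY,
    main_id     bigint    NOT NULL REFERENCES main (id),
    device_id   integer   NOT NULL,
    attribute_1 text,  -- parsed columns; actual types and count differ per msg_type
    attribute_2 text,
    attribute_n text
);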
Now to my problem:
I used to have queries such as:
SELECT attribute_1, attribute_2
FROM main_type1
INNER JOIN main ON main_type1.main_id = main.id
WHERE main.timestamp > X
  AND main.timestamp < Y
  AND main.device_id = Z;
In the beginning everything was sufficient and fast. But now my database has more than 400,000,000 rows in "main", and queries take up to 15 minutes.
Indexing
I tried to use indexing such as:
CREATE INDEX device_id_index ON main (device_id);
Well, now I can retrieve data from the main table much faster, but it does not help with the joins. My biggest problem is that I stored the timestamp only in the main table, so I have to join every time... is this a general flaw in my database model? I tried to avoid storing the timestamp twice.
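Would a composite index matching the WHERE clause, plus an index on the join column of the type tables, be the right direction? Something like this (index names made up by me):

CREATE INDEX main_device_id_timestamp_idx ON main (device_id, timestamp);
CREATE INDEX main_type1_main_id_idx ON main_type1 (main_id);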
Partitioning
Would one solution be to create a new rawdata table for each device_id using partitioning? I would then (automatically, of course) create appropriate partitions such as:
main_device_id_343223
main_device_id_4563
main_device_id_92338
main_device_id_4142315
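I assume this would be declarative list partitioning, roughly like the following sketch (assuming PostgreSQL 11 or newer, since the partition key has to be part of the primary key):

CREATE TABLE main (
    id        bigserial,
    device_id integer   NOT NULL,
    msg_type  integer   NOT NULL,
    rawdata   text      NOT NULL,
    timestamp timestamp NOT NULL,
    PRIMARY KEY (id, device_id)  -- partition key must be included in the PK
) PARTITION BY LIST (device_id);

CREATE TABLE main_device_id_4563
    PARTITION OF main FOR VALUES IN (4563);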
Would this give me a speed advantage for the joins? What other options do I have? For the sake of completeness: I am using PostgreSQL.