This is a follow-up to another question here on SO.
I have these two database tables (more tables omitted):
acquisitions (acq)
- id {PK}
- id_cu {FK}
- datetime
- { Unique Constraint: id_cu - datetime }
data
- id {PK}
- id_acq {FK acquisitions}
- id_meas
- id_elab
- value
All the id columns and datetime are indexed.
Given that I will not change the DB structure, I need to extract data this way:
- rows grouped by datetime
- each column corresponding to data.value for a selected acq.id_cu / data.id_meas / data.id_elab combination (see the note at the bottom of the post)
- empty cells allowed when data is missing for some columns but exists for others at a given datetime
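To make it concrete, the output I'm after looks roughly like this (the values are completely made up, just to show the shape; v1, v2, v3 match the aliases in the query below, one per chosen id_cu / id_meas / id_elab combination):

datetime            | v1   | v2   | v3
2011-03-01 00:00:00 | 10.5 | NULL | 3.2
2011-03-01 00:30:00 | 11.0 | 2.1  | NULL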
My current query is built this way (see SO question):
SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (
SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
UNION
SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
UNION
SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8
) AS T
WHERE datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime
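The idea is that each branch of the UNION puts its value in its own column and NULL everywhere else, and since MAX() ignores NULLs the outer GROUP BY datetime collapses the sparse rows into one row per datetime. Using the made-up values from the sketch above, the derived table T would hold something like this for 00:30 before the outer aggregation:

datetime            | v1   | v2   | v3
2011-03-01 00:30:00 | 11.0 | NULL | NULL
2011-03-01 00:30:00 | NULL | 2.1  | NULL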
This example retrieves just 3 columns but, as I said, there are frequently more than 50 columns.
It works flawlessly, but I'd like to know if it can be optimized for speed.
This is MySQL EXPLAIN EXTENDED for the query above:
+----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 82466 | 100.00 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | acquisitions | ref | PRIMARY,id_cu,ix_acquisitions_id_cu | id_cu | 4 | | 18011 | 100.00 | |
| 2 | DERIVED | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 3 | UNION | acquisitions | ref | PRIMARY,id_cu,ix_acquisitions_id_cu | ix_acquisitions_id_cu | 4 | | 20864 | 100.00 | |
| 3 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 4 | UNION | acquisitions | ref | PRIMARY,id_cu,ix_acquisitions_id_cu | id_cu | 4 | | 31848 | 100.00 | |
| 4 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
8 rows in set, 1 warning (8.24 sec)
Currently, with (edit: checked today) 390k acquisitions and 9.2M data values (and growing), it takes about 10 minutes to extract a table of 59 columns. I'm aware the previous software took up to 1 hour to extract the data.
Thanks for your patience reading till here :)
Update
After Denis' answer I tried his changes 1 and 2 (moving the datetime filter into each subquery and replacing UNION with UNION ALL); this is the new query:
SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (
SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
UNION ALL
SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
UNION ALL
SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
) AS T GROUP BY datetime
and here is the new EXPLAIN EXTENDED:
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 51997 | 100.00 | Using temporary; Using filesort |
| 2 | DERIVED | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 14827 | 100.00 | Using where |
| 2 | DERIVED | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 3 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 18663 | 100.00 | Using where |
| 3 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 4 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 13260 | 100.00 | Using where |
| 4 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
8 rows in set, 1 warning (3.01 sec)
A good gain in performance, without a doubt.
Update (2)
This comes from adding point 3 (a MAX(data.value) ... GROUP BY datetime pre-aggregation inside each subquery):
EXPLAIN EXTENDED SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (
SELECT acquisitions.datetime AS datetime, MAX(data.value) AS v1, NULL AS v2, NULL AS v3
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 1 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime
UNION ALL
SELECT acquisitions.datetime AS datetime, NULL AS v1, MAX(data.value) AS v2, NULL AS v3
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 4 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime
UNION ALL
SELECT acquisitions.datetime AS datetime, NULL AS v1, NULL AS v2, MAX(data.value) AS v3
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 8 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime
) AS T GROUP BY datetime;
and this is the result of EXPLAIN EXTENDED:
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 51997 | 100.00 | Using temporary; Using filesort |
| 2 | DERIVED | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 14827 | 100.00 | Using where |
| 2 | DERIVED | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 3 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 18663 | 100.00 | Using where |
| 3 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 4 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 13260 | 100.00 | Using where |
| 4 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
8 rows in set, 1 warning (3.06 sec)
Just a little slower; is this supposed to pay off with a larger number of columns? I'll try it...
Update (3)
I tried with and without the MAX(data.value) ... GROUP BY datetime pre-aggregation and, on a 60-column query, I get better results with it. Results vary from run to run; this is one of them:
- original query: 9m12.144s
- with Denis' 1 and 2: 4m6.597s
- with Denis' 1, 2 and 3: 4m0.210s
That's about 57% less time.
Update (4)
I tried Andiry's solution but it is way slower than Denis' optimization.
Tested with 3 combinations/columns:
- unoptimized: 1m3s
- Denis' optimization: 1.7s
- Andiry's CASE: 9.3s
I also tested with 12 combinations/columns:
- unoptimized: untested
- Denis' optimization: 3.6s
- Andiry's CASE: 13.7s
Moreover, Andiry's solution also pulls in acquisition datetimes where there is no data for any of the selected combinations, even though data exists for other combinations.
Imagine Control Unit 1 gets data every 30 minutes at :00 and :30 while Control Unit 2 gets it at :15 and :45: I would double the number of rows, with the extra cells just filled with NULLs.
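For reference, the CASE-based pivot I tested was along these lines (my reconstruction of the approach, so the exact query in the answer may differ):

SELECT acq.datetime,
       MAX(CASE WHEN acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1 THEN data.value END) AS v1,
       MAX(CASE WHEN acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6 THEN data.value END) AS v2,
       MAX(CASE WHEN acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8 THEN data.value END) AS v3
FROM acquisitions AS acq
INNER JOIN data ON acq.id = data.id_acq
WHERE acq.datetime >= "2011-03-01 00:00:00" AND acq.datetime <= "2011-04-30 23:59:59"
GROUP BY acq.datetime

Since nothing in the WHERE clause restricts the rows to the selected combinations, every datetime in the range shows up, even when all the columns end up NULL.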
Note:
It's all about a sensor system: there are several control units (one for each id_cu), each with many sensors.
A single sensor is identified by an id_cu / id_meas pair and sends different elaborations for each measure, say MIN (id_elab=1), MAX (id_elab=2), AVERAGE (id_elab=3), INSTANT (id_elab=...) etc., one for each id_elab.
The user is free to pick as many elaborations as he wants, say:
- AVERAGE value (3) of sensor #3 of control unit #1 for a results column, so id_cu=1 / id_meas=3 / id_elab=3
- AVERAGE value (3) of sensor #5 of control unit #1 for a results column, so id_cu=1 / id_meas=5 / id_elab=3
- MIN value (1) of sensor #2 of control unit #4 for another column, so id_cu=4 / id_meas=2 / id_elab=1
- (put any valid id_cu / id_meas / id_elab combination)
- ...
and so on, up to tens of choices...
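Each picked combination ends up as one output column, i.e. one branch of the UNION ALL above. For instance, the first pick would translate into a branch like this (assuming the same pattern as the queries above, with one NULL placeholder per remaining selected column):

SELECT acquisitions.datetime AS datetime,
       MAX(data.value) AS v1,  -- id_cu=1 / id_meas=3 / id_elab=3
       NULL AS v2, NULL AS v3
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 1 AND data.id_meas = 3 AND data.id_elab = 3
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime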
Here's a partial DDL (irrelevant tables excluded):
CREATE TABLE acquisitions (
id INTEGER NOT NULL AUTO_INCREMENT,
id_cu INTEGER NOT NULL,
datetime DATETIME NOT NULL,
PRIMARY KEY (id),
UNIQUE (id_cu, datetime),
FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
);
CREATE TABLE data (
id INTEGER NOT NULL AUTO_INCREMENT,
id_acq INTEGER NOT NULL,
id_meas INTEGER NOT NULL,
id_elab INTEGER NOT NULL,
value FLOAT,
PRIMARY KEY (id),
FOREIGN KEY(id_acq) REFERENCES acquisitions (id) ON DELETE CASCADE
);
CREATE TABLE ctrl_units (
id INTEGER NOT NULL,
name VARCHAR(40) NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE sensors (
id_cu INTEGER NOT NULL,
id_meas INTEGER NOT NULL,
id_elab INTEGER NOT NULL,
name VARCHAR(40) NOT NULL,
`desc` VARCHAR(80),
PRIMARY KEY (id_cu, id_meas),
FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
);
About id_elab in the notes: yes, MIN and MAX refer to different ids. I'll also post the DDL as soon as possible (I'm not on the server now).