0

I have three tables, with the following setup:

Ph
  ID
  time (in the format: datetime)
  Ph

Total suspended solids
  ID
  time (in the format: datetime)
  solids

TEMPERATURE
  ID
  time (in the format: datetime)
  temp

the three values being, temperature, ph and solids are updated seconds apart from each other as per sample below,

ID          time                    temp
123335  2013-03-29 21:34:05         26.56
123336  2013-03-29 21:35:05         26.56
123337  2013-03-29 21:36:05         26.56

ID          time                    solids
113886  2013-03-29 21:34:19         146
113887  2013-03-29 21:35:19         146
113888  2013-03-29 21:36:19         146

ID          time                    Ph
62147   2013-03-29 21:34:04         7.39
62148   2013-03-29 21:35:04         7.43
62149   2013-03-29 21:36:04         7.43

I would like to query mysql to obtain the result below (note I have omitted the seconds from the table)

time                temp    solids  Ph
2013-03-29 21:34    26.56   146     7.39
2013-03-29 21:35    26.56   146     7.43
2013-03-29 21:36    26.56   146     7.43

Please can you help me with the query?

2
  • So the relationship is on yy/mm/dd h/i ? Commented Mar 29, 2013 at 10:54
  • yes the relationship is the time Commented Mar 29, 2013 at 10:59

3 Answers 3

1

Ok, the query optimizer seems to have some problems with my previous answer with large tables. Try this solution instead, it works with dependent subqueries:

SELECT DATE_FORMAT(timeTable.minuteTime, '%Y-%m-%d %k:%i') time,
T2.temp temp,
S2.solids solids,
P2.Ph Ph
FROM
(
    SELECT minuteTime.minuteTime minuteTime,
    ( SELECT MAX(time) FROM temperature WHERE time <= minuteTime.minuteTime AND time >= NOW() - INTERVAL 1 HOUR) tempTime, 
    ( SELECT MAX(time) FROM ph WHERE time <= minuteTime.minuteTime AND time >= NOW() - INTERVAL 1 HOUR) phTime,  
    ( SELECT MAX(time) FROM solids WHERE time <= minuteTime.minuteTime AND time >= NOW() - INTERVAL 1 HOUR) solidsTime
    FROM  
    (
        SELECT time + INTERVAL 59 - SECOND( time ) SECOND minuteTime
        FROM Ph
        WHERE time >= NOW() - INTERVAL 1 HOUR AND time <= NOW()
        UNION SELECT time + INTERVAL 59 - SECOND( time ) SECOND
        FROM solids
        WHERE time >= NOW() - INTERVAL 1 HOUR AND time <= NOW()
        UNION SELECT time + INTERVAL 59 - SECOND( time ) SECOND
        FROM temperature
        WHERE time >= NOW() - INTERVAL 1 HOUR AND time <= NOW()
        GROUP BY 1
    ) minuteTime
) timeTable
LEFT JOIN temperature T2 ON T2.time = timeTable.tempTime
LEFT JOIN solids S2 ON S2.time = timeTable.solidsTime
LEFT JOIN ph P2 ON P2.time = timeTable.phTime
ORDER BY minuteTime ASC

I've setup the three tables with about 800'000 rows of test data, each. On MySQL 5.5.30, the query above runs in about 3.5 seconds and returns 61 result rows. But you absolutely have to have indexes on each of the time columns:

ALTER TABLE `ph` ADD INDEX ( `time` ) ;
ALTER TABLE `solids` ADD INDEX ( `time` ) ;
ALTER TABLE `temperature` ADD INDEX ( `time` ) ;

Otherwise, the query won't terminate. I believe no query handling this amount of data will.

Sign up to request clarification or add additional context in comments.

5 Comments

you son of a Legend, thanks. I really didn't think i will solve this one. and there you go. Thank you
any change I can get this filter below to get values every 4 hours for last 3 days, rather every minute for last hour. where HOUR(time) % 4 = 0 and DATE(time) >= CURDATE()-5 group by hour(time), date(time)
For the new filter that you suggested, I'd replace all 6 occurences of "time >= NOW() - INTERVAL 1 HOUR" by "time >= NOW() - INTERVAL 3 DAY" and all 3 occurences of "SELECT time + INTERVAL 59 - SECOND( time ) SECOND" by "SELECT DATE(time) + INTERVAL (HOUR(time) DIV 4 * 4) HOUR". That should do the trick of getting the measurement times right. A quick check showed that the query ran in about 9.8 secs.
Hey my pleasure, I'm glad to help. Thanks to you for posing such a challenging question ;-)
Thnx Marcellus. VERY MUCH :)
1

There may be a more efficient solution but something like this should work...

DROP TABLE IF EXISTS ph;
CREATE TABLE Ph(time datetime PRIMARY KEY, Ph DECIMAL(5,2) NOT NULL);

DROP TABLE IF EXISTS tss;
CREATE TABLE tss(time datetime PRIMARY KEY,solids int NOT NULL);

DROP TABLE IF EXISTS temperature;
CREATE TABLE temperature(time datetime NOT NULL PRIMARY KEY,temp DECIMAL(5,2) NOT NULL);

INSERT INTO temperature VALUES 
('2013-03-29 21:34:05',26.56),
('2013-03-29 21:35:05',26.56),
('2013-03-29 21:36:05',26.56);

INSERT INTO tss VALUES 
('2013-03-29 21:34:19',146),
('2013-03-29 21:35:19',146),
('2013-03-29 21:36:19',146);

INSERT INTO ph VALUES
('2013-03-29 21:34:04',7.39),
('2013-03-29 21:35:04',7.43),
('2013-03-29 21:36:04',7.43);

SELECT * 
  FROM ph 
  JOIN tss 
    ON DATE_FORMAT(tss.time,'%Y%m%d%h%i') = DATE_FORMAT(ph.time,'%Y%m%d%h%i') 
  JOIN temperature t 
    ON DATE_FORMAT(t.time,'%Y%m%d%h%i') = DATE_FORMAT(ph.time,'%Y%m%d%h%i');
+---------------------+------+---------------------+--------+---------------------+-------+
| time                | Ph   | time                | solids | time                | temp  |
+---------------------+------+---------------------+--------+---------------------+-------+
| 2013-03-29 21:34:04 | 7.39 | 2013-03-29 21:34:19 |    146 | 2013-03-29 21:34:05 | 26.56 |
| 2013-03-29 21:35:04 | 7.43 | 2013-03-29 21:35:19 |    146 | 2013-03-29 21:35:05 | 26.56 |
| 2013-03-29 21:36:04 | 7.43 | 2013-03-29 21:36:19 |    146 | 2013-03-29 21:36:05 | 26.56 |
+---------------------+------+---------------------+--------+---------------------+-------+

CREATE TABLE normalised AS
SELECT ph.time
     , ph.ph
     , tss.solids
     , t.temp 
  FROM ph 
  JOIN tss 
    ON DATE_FORMAT(tss.time,'%Y%m%d%h%i') = DATE_FORMAT(ph.time,'%Y%m%d%h%i') 
  JOIN temperature t 
    ON DATE_FORMAT(t.time,'%Y%m%d%h%i') = DATE_FORMAT(ph.time,'%Y%m%d%h%i');

SELECT * FROM normalised;
+---------------------+------+--------+-------+
| time                | ph   | solids | temp  |
+---------------------+------+--------+-------+
| 2013-03-29 21:34:04 | 7.39 |    146 | 26.56 |
| 2013-03-29 21:35:04 | 7.43 |    146 | 26.56 |
| 2013-03-29 21:36:04 | 7.43 |    146 | 26.56 |
+---------------------+------+--------+-------+

ALTER TABLE normalised ADD PRIMARY KEY(time);

3 Comments

Thanks, but all i want is a query, i do not wish to create any tables to get the result, is there another way
Surely you can figure that out for yourself.
hehehe, thatnks, i am trying as we speak
0

I might have a solution that works without creating temporary tables:

SELECT DATE_FORMAT(timeTable.minuteTime, '%Y-%m-%d %k:%i') time,
T2.temp temp,
S2.solids solids,
P2.Ph Ph
FROM
(
SELECT MinuteTable.minuteTime minuteTime, 
MAX(T.time) tempTime, 
MAX(S.time) solidsTime, 
MAX(P.time) phTime
FROM  
(
SELECT time + INTERVAL 59 - SECOND( time ) SECOND minuteTime
FROM Ph
WHERE time >= NOW() - INTERVAL 1 HOUR
UNION SELECT time + INTERVAL 59 - SECOND( time ) SECOND
FROM solids
WHERE time >= NOW() - INTERVAL 1 HOUR
UNION SELECT time + INTERVAL 59 - SECOND( time ) SECOND
FROM temperature
WHERE time >= NOW() - INTERVAL 1 HOUR
GROUP BY 1
) MinuteTable
LEFT JOIN temperature T ON T.time <= MinuteTable.minuteTime
LEFT JOIN solids S ON S.time <= MinuteTable.minuteTime
LEFT JOIN ph P ON P.time <= MinuteTable.minuteTime
GROUP BY 1
) timeTable
LEFT JOIN temperature T2 ON T2.time = timeTable.tempTime
LEFT JOIN solids S2 ON S2.time = timeTable.solidsTime
LEFT JOIN ph P2 ON P2.time = timeTable.phTime
ORDER BY minuteTime ASC

In order to make this work efficiently with tables with large numbers of rows, you definitely would want to have a UNIQUE INDEX over the time column of each of the three input tables. Even better, if you don't really need the ID column apart from generating auto_increment ID's, you could even make the time columns the PRIMARY KEY.

5 Comments

my tables are huge, and the query takes forever!
how can i limit this query to the last hour
something is wrong, as it keep searching, and connection is dropped
I've edited the answer, please try again. It should run better now. And you absolutely have to have INDEXes over the time columns on all of the tables, otherwise the query won't be anywhere near efficient.
still getting connection dropped

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.