
I am trying to query information from a large table of connection records between a set of clients and servers. Below is sample data from the relevant columns of the table (connection_stats):

+---------------------+-----------+-----------+---------+
| timestamp           | client_id | server_id | status  |
+---------------------+-----------+-----------+---------+
| 2013-07-06 10:40:30 | 100       | 800       | SUCCESS |
| 2013-07-06 10:40:50 | 101       | 801       | FAILED  |
| 2013-07-06 10:42:00 | 100       | 800       | ABORTED |
| 2013-07-06 10:43:30 | 100       | 801       | SUCCESS |
| 2013-07-06 10:56:00 | 100       | 800       | FAILED  |
+---------------------+-----------+-----------+---------+

From this table, I am trying to find every instance of the connection status "ABORTED" that is immediately followed (in timestamp order) by the connection status "FAILED" for the same client_id, server_id pair. I would like to get both records: the one with status "ABORTED" and the one with status "FAILED". There is one such case in the sample data above: for the pair (100, 800), a "FAILED" status comes immediately after "ABORTED".
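To pin down what "immediately followed" means, here is a minimal procedural sketch in Python (an illustration, not SQL): sort each (client_id, server_id) pair's rows by timestamp and compare every row with the next row of the same pair only. The sample rows are the five from the table above; the helper name find_aborted_then_failed is made up for this illustration.

```python
from itertools import groupby
from operator import itemgetter

# The five sample rows from the question: (timestamp, client_id, server_id, status).
# ISO-formatted timestamps sort correctly as plain strings.
ROWS = [
    ("2013-07-06 10:40:30", 100, 800, "SUCCESS"),
    ("2013-07-06 10:40:50", 101, 801, "FAILED"),
    ("2013-07-06 10:42:00", 100, 800, "ABORTED"),
    ("2013-07-06 10:43:30", 100, 801, "SUCCESS"),
    ("2013-07-06 10:56:00", 100, 800, "FAILED"),
]


def find_aborted_then_failed(rows):
    """Return (aborted_row, failed_row) tuples where, within the same
    (client_id, server_id) pair, the very next row by timestamp is FAILED."""
    pair_key = itemgetter(1, 2)
    # Sort by pair, then timestamp, so groupby sees each pair exactly once
    # and the rows inside a pair are in time order.
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    matches = []
    for _, group in groupby(ordered, key=pair_key):
        group = list(group)
        # Compare each row only with its immediate successor in the pair.
        for current, nxt in zip(group, group[1:]):
            if current[3] == "ABORTED" and nxt[3] == "FAILED":
                matches.append((current, nxt))
    return matches


for aborted, failed in find_aborted_then_failed(ROWS):
    print(aborted)
    print(failed)
```

Any other record for the same pair between the two (for example a SUCCESS) breaks the match, which is exactly the "immediately followed" condition the SQL answers below have to express.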

I am a novice in SQL and databases and I am completely lost on this one. Any pointers on how to approach this would be much appreciated.

The database is MySQL.

  • This is one of the harder kinds of things to do in SQL. You need to start with a self-join of the table. I hope that you will, as a SQL beginner, take the trouble to understand the solutions that you get from StackOverflow participants, rather than just plugging one of them into your application software. Commented Jul 6, 2013 at 11:15

7 Answers


Admittedly not very elegant, but this is what I can come up with straight off the bat that works in MySQL (which, before 8.0, has no CTEs or ranking functions) and without a guaranteed unique row id to work with.

SELECT aborted.* FROM Table1 aborted JOIN Table1 failed
  ON aborted.server_id = failed.server_id 
 AND aborted.client_id = failed.client_id
 AND aborted.timestamp < failed.timestamp
LEFT JOIN Table1 filler
  ON filler.server_id = aborted.server_id
 AND filler.client_id = aborted.client_id
 AND aborted.timestamp < filler.timestamp
 AND filler.timestamp < failed.timestamp
WHERE filler.timestamp IS NULL
  AND aborted.status = 'ABORTED' AND failed.status = 'FAILED'
UNION
SELECT failed.* FROM Table1 aborted JOIN Table1 failed
  ON aborted.server_id = failed.server_id
 AND aborted.client_id = failed.client_id
 AND aborted.timestamp < failed.timestamp
LEFT JOIN Table1 filler
  ON filler.server_id = aborted.server_id
 AND filler.client_id = aborted.client_id
 AND aborted.timestamp < filler.timestamp
 AND filler.timestamp < failed.timestamp
WHERE filler.timestamp IS NULL
  AND aborted.status = 'ABORTED' AND failed.status = 'FAILED'

An SQLfiddle to test with.

If you're happy with a single row summarizing both records, you can select just the fields you want from aborted/failed and drop the second half of the UNION entirely (i.e. the query is cut in half).
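As a sketch of that single-row variant, here is the first half of the UNION with columns selected from both aliases, run against the question's sample rows through Python's built-in sqlite3 module. Using SQLite instead of MySQL is an assumption made only so the example is self-contained; the query uses plain joins that MySQL also accepts. The table is named connection_stats as in the question (rather than Table1), and the output column names aborted_at/failed_at are illustrative.

```python
import sqlite3

# The five sample rows from the question: (timestamp, client_id, server_id, status).
SAMPLE = [
    ("2013-07-06 10:40:30", 100, 800, "SUCCESS"),
    ("2013-07-06 10:40:50", 101, 801, "FAILED"),
    ("2013-07-06 10:42:00", 100, 800, "ABORTED"),
    ("2013-07-06 10:43:30", 100, 801, "SUCCESS"),
    ("2013-07-06 10:56:00", 100, 800, "FAILED"),
]

# First half of the UNION only, selecting from both aliases so each
# ABORTED -> FAILED match comes back as one summary row.
SUMMARY_SQL = """
SELECT aborted.client_id, aborted.server_id,
       aborted.timestamp AS aborted_at, failed.timestamp AS failed_at
FROM connection_stats aborted JOIN connection_stats failed
  ON aborted.server_id = failed.server_id
 AND aborted.client_id = failed.client_id
 AND aborted.timestamp < failed.timestamp
LEFT JOIN connection_stats filler
  ON filler.server_id = aborted.server_id
 AND filler.client_id = aborted.client_id
 AND aborted.timestamp < filler.timestamp
 AND filler.timestamp < failed.timestamp
WHERE filler.timestamp IS NULL  -- nothing for this pair falls in between
  AND aborted.status = 'ABORTED' AND failed.status = 'FAILED'
ORDER BY aborted.timestamp
"""


def summarize(rows):
    """Load rows into an in-memory SQLite table and run the summary query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE connection_stats "
            "(timestamp TEXT, client_id INTEGER, server_id INTEGER, status TEXT)"
        )
        conn.executemany("INSERT INTO connection_stats VALUES (?, ?, ?, ?)", rows)
        return conn.execute(SUMMARY_SQL).fetchall()
    finally:
        conn.close()


print(summarize(SAMPLE))
# [(100, 800, '2013-07-06 10:42:00', '2013-07-06 10:56:00')]
```

The trailing ORDER BY also returns the matches in timestamp order.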

Since I got comments on the UNION, here's the same thing using a JOIN, assuming the timestamp is unique per client/server combination (a unique row id would help here):

SELECT * FROM Table1 t JOIN
(
 SELECT 
   aborted.server_id asid, aborted.client_id acid, aborted.timestamp ats,
    failed.server_id fsid,  failed.client_id fcid,  failed.timestamp fts
 FROM Table1 aborted JOIN Table1 failed
   ON aborted.server_id = failed.server_id
  AND aborted.client_id = failed.client_id
  AND aborted.timestamp < failed.timestamp
 LEFT JOIN Table1 filler
   ON filler.server_id = aborted.server_id
  AND filler.client_id = aborted.client_id
  AND aborted.timestamp < filler.timestamp
  AND filler.timestamp < failed.timestamp
 WHERE filler.timestamp IS NULL
   AND aborted.status = 'ABORTED' AND failed.status = 'FAILED'
) u
WHERE t.server_id=asid AND t.client_id=acid AND t.timestamp=ats
   OR t.server_id=fsid AND t.client_id=fcid AND t.timestamp=fts
ORDER BY timestamp

An SQLfiddle to test with.


3 Comments

Thanks! That's way too much SQL for me to handle :). The SQLfiddle test gives me what I was looking for. I will test it on the real db and see. It would have been better if the output rows were sorted by timestamp, but this one will do too.
@soorajmr Sadly, MySQL lacks some functionality to make this query simpler. I agree on it really being too much SQL for such a "simple" query, but MySQL has some catching up to do :)
@wildplasser To get both complete records as separate rows, as the question requests, that's the most straightforward way I can see. Any alternative solutions are welcome :)

I'm answering this question (albeit late) because I want to offer a more general approach. MySQL (before 8.0) does not have a lag() or lead() function, but you can emulate one with a correlated subquery. The idea is to look up the next timestamp for the same client_id/server_id pair and then join back to the original data to get the full record. This lets you pull as many fields as you want from the "next" record. It also lets you express more complicated relationships (say, the "FAILED" has to come within 3 minutes):

select cs.*, csnext.timestamp as nextTimeStamp, csnext.status as nextStatus
from (select cs.*,
             (select timestamp
              from connection_stats cs2
              where cs2.client_id = cs.client_id and
                    cs2.server_id = cs.server_id and
                    cs2.timestamp > cs.timestamp
              order by cs2.timestamp
              limit 1
             ) as Nexttimestamp
      from connection_stats cs
     ) cs join
     connection_stats csnext
     on csnext.client_id = cs.client_id and
        csnext.server_id = cs.server_id and
        csnext.timestamp = cs.nexttimestamp
where cs.status = 'ABORTED' and
      csnext.status = 'FAILED'

The performance of such a query is greatly improved by having an index on connection_stats(client_id, server_id, timestamp).
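Here is a minimal sketch of this approach run end to end, again through Python's sqlite3 module rather than MySQL so it is self-contained, with the suggested index created first. The max_gap_minutes parameter and the index name idx_pair_time are hypothetical additions: the former illustrates the "within 3 minutes" idea using SQLite's datetime(..., '+N minutes'); in MySQL the equivalent condition would be csnext.timestamp <= cs.timestamp + INTERVAL 3 MINUTE.

```python
import sqlite3

# The five sample rows from the question: (timestamp, client_id, server_id, status).
SAMPLE = [
    ("2013-07-06 10:40:30", 100, 800, "SUCCESS"),
    ("2013-07-06 10:40:50", 101, 801, "FAILED"),
    ("2013-07-06 10:42:00", 100, 800, "ABORTED"),
    ("2013-07-06 10:43:30", 100, 801, "SUCCESS"),
    ("2013-07-06 10:56:00", 100, 800, "FAILED"),
]

# The answer's query: a correlated subquery finds the next timestamp for the
# same pair (a hand-rolled lead()), then a join fetches that full record.
BASE_SQL = """
select cs.*, csnext.timestamp as nextTimeStamp, csnext.status as nextStatus
from (select cs.*,
             (select cs2.timestamp
              from connection_stats cs2
              where cs2.client_id = cs.client_id and
                    cs2.server_id = cs.server_id and
                    cs2.timestamp > cs.timestamp
              order by cs2.timestamp
              limit 1
             ) as nexttimestamp
      from connection_stats cs
     ) cs join
     connection_stats csnext
     on csnext.client_id = cs.client_id and
        csnext.server_id = cs.server_id and
        csnext.timestamp = cs.nexttimestamp
where cs.status = 'ABORTED' and
      csnext.status = 'FAILED'
"""


def aborted_then_failed(rows, max_gap_minutes=None):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE connection_stats "
            "(timestamp TEXT, client_id INTEGER, server_id INTEGER, status TEXT)"
        )
        # The suggested index: lets the "next timestamp" lookup seek within one pair.
        conn.execute(
            "CREATE INDEX idx_pair_time ON connection_stats (client_id, server_id, timestamp)"
        )
        conn.executemany("INSERT INTO connection_stats VALUES (?, ?, ?, ?)", rows)
        sql, params = BASE_SQL, ()
        if max_gap_minutes is not None:
            # Hypothetical extra rule: the FAILED must come within N minutes.
            sql += " and csnext.timestamp <= datetime(cs.timestamp, ?)"
            params = (f"+{max_gap_minutes} minutes",)
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()


print(aborted_then_failed(SAMPLE))
print(aborted_then_failed(SAMPLE, max_gap_minutes=3))
```

In the sample data the FAILED comes 14 minutes after the ABORTED, so the second call returns no rows.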

Comments


select * from connection_stats t1, connection_stats t2 where t1.server_id = t2.server_id and t1.status = 'ABORTED' and t2.status = 'FAILED'

1 Comment

That query disregards timestamp completely.

Not very elegant, but it should work. This is based on GROUP_CONCAT():

Demo

SELECT client_id, server_id, GROUP_CONCAT(status ORDER BY timestamp) AS all_statuses
FROM   statuses
GROUP  BY client_id, server_id
HAVING all_statuses LIKE '%ABORTED,FAILED%'
ORDER  BY MIN(timestamp)
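The idea can be sketched in plain Python (an emulation for illustration, not MySQL itself): build each pair's status list in timestamp order, join it with commas as GROUP_CONCAT does by default, and look for the substring "ABORTED,FAILED". Ordering the statuses by timestamp before concatenating is what makes the substring test meaningful. The function name pairs_with_abort_then_fail is made up for this sketch.

```python
from collections import defaultdict

# The five sample rows from the question: (timestamp, client_id, server_id, status).
SAMPLE = [
    ("2013-07-06 10:40:30", 100, 800, "SUCCESS"),
    ("2013-07-06 10:40:50", 101, 801, "FAILED"),
    ("2013-07-06 10:42:00", 100, 800, "ABORTED"),
    ("2013-07-06 10:43:30", 100, 801, "SUCCESS"),
    ("2013-07-06 10:56:00", 100, 800, "FAILED"),
]


def pairs_with_abort_then_fail(rows):
    """Emulates GROUP_CONCAT(status ORDER BY timestamp) filtered with
    LIKE '%ABORTED,FAILED%'. Returns {(client_id, server_id): statuses}."""
    statuses = defaultdict(list)
    # Tuples sort by their first element (the timestamp) first, so each
    # pair's list is built in time order, like ORDER BY inside GROUP_CONCAT.
    for _ts, client_id, server_id, status in sorted(rows):
        statuses[(client_id, server_id)].append(status)
    matches = {}
    for pair, seq in statuses.items():
        joined = ",".join(seq)  # GROUP_CONCAT's default separator is a comma
        if "ABORTED,FAILED" in joined:  # the LIKE '%ABORTED,FAILED%' test
            matches[pair] = joined
    return matches


print(pairs_with_abort_then_fail(SAMPLE))
# {(100, 800): 'SUCCESS,ABORTED,FAILED'}
```

The result identifies the (client_id, server_id) pair, not the two individual rows, so fetching the full ABORTED and FAILED records still needs a join back to the table.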

2 Comments

Hmm... I think I kind of get what you are suggesting. But for every match (an ABORTED, FAILED sequence), I would like to get two complete rows as output: one with status ABORTED and one with status FAILED. This is because there are several other columns in these rows.
I see. In that case I would recommend Joachim Isaksson's answer.

I don't have a MySQL DB to test with, but you might give something like this a shot.

SELECT aborted.*, failed.*
FROM connection_stats aborted
INNER JOIN connection_stats failed
   ON failed.client_id = aborted.client_id AND failed.server_id = aborted.server_id
  AND failed.status = 'FAILED'
  AND failed.timestamp = (SELECT MIN(nexterror.timestamp) FROM connection_stats nexterror
                          WHERE nexterror.client_id = aborted.client_id AND nexterror.server_id = aborted.server_id
                            AND nexterror.timestamp > aborted.timestamp)
WHERE aborted.status = 'ABORTED'

Comments


You can group the statuses per client/server pair and match on the resulting sequence:

SELECT client_id, server_id, GROUP_CONCAT(status ORDER BY `timestamp`) AS abort_fail
FROM   `table`
GROUP  BY client_id, server_id
HAVING abort_fail LIKE '%ABORTED,FAILED%'
ORDER  BY MIN(`timestamp`) DESC

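When using GROUP_CONCAT, keep in mind that its result is truncated to group_concat_max_len bytes (1024 by default), so you may need to raise that limit, for example with SET SESSION group_concat_max_len = 100000;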

Comments

SELECT t0.clientid, t0.serverid
        , t0.logtime AS abort_time
        , t1.logtime AS fail_time
FROM tmp t0
JOIN tmp t1 ON t1.clientid = t0.clientid AND t1.serverid = t0.serverid
        -- t1 after t0
        AND t1.logtime > t0.logtime
WHERE t0.status = 'ABORTED'
AND t1.status = 'FAILED'
        -- no records in between 'aborted' and 'failed'
        -- (not even different 'aborted' and 'failed' records)
AND NOT EXISTS (
        SELECT *
        FROM tmp x
        WHERE x.clientid = t0.clientid AND x.serverid = t0.serverid
        AND x.logtime > t0.logtime
        AND x.logtime < t1.logtime
        )
        ;

UPDATE: if you want to retrieve the two records not joined, but as separate records, you could do:

SELECT t0.*
FROM tmp t0
JOIN (
        SELECT t1.clientid, t1.serverid
        , t1.logtime AS abort_time
        , t2.logtime AS fail_time
        FROM tmp t1
        JOIN tmp t2 ON t2.clientid = t1.clientid AND t2.serverid = t1.serverid
                -- t2 after t1
                AND t2.logtime > t1.logtime
        WHERE t1.status = 'ABORTED'
        AND t2.status = 'FAILED'
                -- no records in between 'aborted' and 'failed'
                -- (not even different 'aborted' and 'failed' records)
        AND NOT EXISTS (
                SELECT *
                FROM tmp x
                WHERE x.clientid = t1.clientid AND x.serverid = t1.serverid
                AND x.logtime > t1.logtime
                AND x.logtime < t2.logtime
                )
        ) two ON two.clientid = t0.clientid AND two.serverid = t0.serverid
                AND (two.abort_time = t0.logtime OR two.fail_time = t0.logtime)
        ;

Or, the same query rewritten with an EXISTS clause, which is sometimes a bit cleaner because the t1 and t2 aliases do not leak into the outer query:

SELECT *
FROM tmp t0
WHERE EXISTS (
        SELECT *
        FROM tmp t1
        JOIN tmp t2 ON t2.clientid = t1.clientid AND t2.serverid = t1.serverid
                -- t2 after t1
                AND t2.logtime > t1.logtime
        WHERE t1.status = 'ABORTED'
        AND t2.status = 'FAILED'
        AND t1.clientid = t0.clientid AND t1.serverid = t0.serverid
        AND (t1.logtime = t0.logtime OR t2.logtime = t0.logtime)
                -- no records in between 'aborted' and 'failed'
                -- (not even different 'aborted' and 'failed' records)
        AND NOT EXISTS (
                SELECT *
                FROM tmp x
                WHERE x.clientid = t1.clientid AND x.serverid = t1.serverid
                AND x.logtime > t1.logtime
                AND x.logtime < t2.logtime
                )
                )
        ;
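To check the EXISTS form end to end, here is a sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module, using this answer's own names (tmp, clientid, serverid, logtime). SQLite standing in for MySQL is an assumption for self-containment; the query uses only syntax both accept. Note the parentheses around the OR: AND binds tighter than OR, so without them the OR would split the whole WHERE clause in two.

```python
import sqlite3

# The question's sample rows, mapped to this answer's column names:
# (logtime, clientid, serverid, status).
SAMPLE = [
    ("2013-07-06 10:40:30", 100, 800, "SUCCESS"),
    ("2013-07-06 10:40:50", 101, 801, "FAILED"),
    ("2013-07-06 10:42:00", 100, 800, "ABORTED"),
    ("2013-07-06 10:43:30", 100, 801, "SUCCESS"),
    ("2013-07-06 10:56:00", 100, 800, "FAILED"),
]

# Keep a row t0 if it is either end of an ABORTED -> FAILED pair (t1, t2)
# with no other record for the same client/server strictly in between.
EXISTS_SQL = """
SELECT *
FROM tmp t0
WHERE EXISTS (
    SELECT *
    FROM tmp t1
    JOIN tmp t2 ON t2.clientid = t1.clientid AND t2.serverid = t1.serverid
               AND t2.logtime > t1.logtime
    WHERE t1.status = 'ABORTED'
      AND t2.status = 'FAILED'
      AND t1.clientid = t0.clientid AND t1.serverid = t0.serverid
      AND (t1.logtime = t0.logtime OR t2.logtime = t0.logtime)
      AND NOT EXISTS (
          SELECT *
          FROM tmp x
          WHERE x.clientid = t1.clientid AND x.serverid = t1.serverid
            AND x.logtime > t1.logtime
            AND x.logtime < t2.logtime
      )
)
ORDER BY t0.logtime
"""


def separate_rows(rows):
    """Return the ABORTED and FAILED records as two separate result rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tmp (logtime TEXT, clientid INTEGER, serverid INTEGER, status TEXT)"
        )
        conn.executemany("INSERT INTO tmp VALUES (?, ?, ?, ?)", rows)
        return conn.execute(EXISTS_SQL).fetchall()
    finally:
        conn.close()


for row in separate_rows(SAMPLE):
    print(row)
```

Because each record is a row of tmp itself, any extra columns in the real table come back unchanged, and the ORDER BY returns them in timestamp order.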

Comments
