My goal is, for each PID, to select the 2 records whose test_sname values are 'want' and 'want2' and that share the same entry_date. I want this for the first 5 entry_dates that include both test_snames, i.e. at most 10 rows per PID.

This is my query for accomplishing this:

queryBuilder = """select PID, test_sname, test_value, units, ref_range, entry_date from labs
   where PID=%s and (test_sname='want' or test_sname='want2') and entry_date in

   (select entry_date from labs where PID=%s and test_sname in ('want', 'want2')
   group by entry_date having count(*) = 2)

   order by entry_date limit 10;""" % (pid, pid)

It works as expected when an entry_date has only two rows that contain a test_sname of 'want' or 'want2'.

PID      |test_sname  |test_value  |units    |entry_date
10000000 | want       |         343 | U/L     | 2008-01-01 01:01:01
10000000 | want2      |      984.34 |         | 2008-01-01 01:01:01
10000000 | NA1        |          56 | %       | 2008-01-01 01:01:01
10000000 | NA2        |         420 | mg/dL   | 2008-01-01 01:01:01
10000000 | NA2        |         420 | mg/dL   | 2008-01-02 01:01:01

10000000 | want       |         343 | U/L     | 2008-01-02 01:01:01
10000000 | want2      |      984.34 |         | 2008-01-02 01:01:01
10000000 | NA1        |          26 | %       | 2008-01-02 01:01:01
10000000 | NA2        |         410 | mg/dL   | 2008-01-02 01:01:01
10000000 | NA2        |         455 | mg/dL   | 2008-01-02 01:01:01

Results of Query (which are correct):

PID      |test_sname  |test_value  |units    |entry_date
10000000 | want       |         343 | U/L     | 2008-01-01 01:01:01
10000000 | want2      |      984.34 |         | 2008-01-01 01:01:01
10000000 | want       |         343 | U/L     | 2008-01-02 01:01:01
10000000 | want2      |      984.34 |         | 2008-01-02 01:01:01

The problem comes when there are multiple rows with a test_sname of 'want' on the same entry_date: the group then has 3 matching rows, so having count(*) = 2 is false and that entry_date is dropped. Data like this returns no results:
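A small reproduction may make the failure concrete. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, so the snippet runs without a server), loads a trimmed copy of the rows for PID 11111111, and compares having count(*) = 2 with count(distinct test_sname) = 2. The helper name matching_dates is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL labs table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE labs (PID INTEGER, test_sname TEXT, "
    "test_value REAL, entry_date TEXT)"
)
conn.executemany(
    "INSERT INTO labs VALUES (?, ?, ?, ?)",
    [
        (11111111, "want", 343, "2009-10-26 07:25:00"),
        (11111111, "want2", 984.34, "2009-10-26 07:25:00"),
        (11111111, "want", 189, "2009-10-26 07:25:00"),
        (11111111, "NA1", 50, "2009-10-26 07:25:00"),
        (11111111, "want", 343, "2009-10-30 07:25:00"),
        (11111111, "want2", 984.34, "2009-10-30 07:25:00"),
        (11111111, "want", 189, "2009-10-30 07:25:00"),
        (11111111, "NA1", 6, "2009-10-30 07:25:00"),
    ],
)

# Same shape as the date subquery in the question; only HAVING varies.
DATES_SQL = """
    SELECT entry_date FROM labs
    WHERE PID = ? AND test_sname IN ('want', 'want2')
    GROUP BY entry_date
    HAVING {having}
    ORDER BY entry_date
"""


def matching_dates(having, pid):
    """Return the entry_dates that satisfy the given HAVING condition."""
    sql = DATES_SQL.format(having=having)
    return [row[0] for row in conn.execute(sql, (pid,))]


# Each date has three matching rows, so the strict count finds nothing.
strict = matching_dates("COUNT(*) = 2", 11111111)
# Counting distinct test names ignores the duplicate 'want' row.
distinct = matching_dates("COUNT(DISTINCT test_sname) = 2", 11111111)

print(strict)
print(distinct)
```

With count(distinct test_sname) the dates qualify again, but the outer query would still return all three matching rows for such a date, so picking one row per test_sname remains a separate step.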

PID      |test_sname  |test_value  |units    |entry_date
11111111 | want       |         343 | U/L     | 2009-10-26 07:25:00
11111111 | want2      |      984.34 |         | 2009-10-26 07:25:00
11111111 | want       |        189 | U/L     | 2009-10-26 07:25:00
11111111 | NA1        |         50 | %       | 2009-10-26 07:25:00
11111111 | NA2        |         40 | mg/dL   | 2009-10-26 07:25:00
11111111 | NA3        |      84.55 |         | 2009-10-26 07:25:00
11111111 | NA4        |        4.5 | thou/uL | 2009-10-26 07:25:00
11111111 | NA5        |       14.6 | g/dL    | 2009-10-26 07:25:00
11111111 | NA6        |       0.96 | mg/dL   | 2009-10-26 07:25:00

11111111 | want       |         343 | U/L     | 2009-10-30 07:25:00
11111111 | want2      |      984.34 |         | 2009-10-30 07:25:00
11111111 | want       |        189 | U/L     | 2009-10-30 07:25:00
11111111 | NA1        |          6 | %       | 2009-10-30 07:25:00
11111111 | NA2        |         40 | mg/dL   | 2009-10-30 07:25:00
11111111 | NA3        |      84.55 |         | 2009-10-30 07:25:00
11111111 | NA4        |        4.5 | thou/uL | 2009-10-30 07:25:00
11111111 | NA5        |       14.6 | g/dL    | 2009-10-30 07:25:00
11111111 | NA6        |       0.96 | mg/dL   | 2009-10-30 07:25:00

As a restriction, I tried putting a limit 2 in the subquery (I know that by itself won't fix the problem), but it gave the error below. I thought I had the most recent version of MySQL, so apparently limit can't be used in this kind of subquery:

This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

I realize there are multiple ways to fix this. I could select ALL the values and then take what I need programmatically with Python, but I'm looking for a MySQL query solution written using the Python mysql-connector. I wouldn't complain about a Python solution though.
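For reference, the Python fallback mentioned above could look roughly like the sketch below. It assumes the rows have already been fetched as (test_sname, test_value, entry_date) tuples ordered by entry_date. The function name pick_pairs and its parameters are hypothetical, and when a test name repeats on a date it keeps the first row seen, which is an assumption about which duplicate is wanted.

```python
from collections import OrderedDict


def pick_pairs(rows, wanted=("want", "want2"), max_dates=5):
    """Keep one row per wanted test name for the first max_dates dates
    on which every wanted test name appears.

    rows: iterable of (test_sname, test_value, entry_date) tuples,
    assumed to be ordered by entry_date already.
    """
    by_date = OrderedDict()
    for row in rows:
        sname, _value, entry_date = row
        if sname in wanted:
            # setdefault keeps the first row seen for each test name on a date.
            by_date.setdefault(entry_date, {}).setdefault(sname, row)

    result = []
    dates_used = 0
    for entry_date, found in by_date.items():
        if dates_used >= max_dates:
            break
        # Only dates that contain every wanted test name qualify.
        if len(found) == len(wanted):
            result.extend(found[name] for name in wanted)
            dates_used += 1
    return result


sample = [
    ("want", 343, "2009-10-26 07:25:00"),
    ("want2", 984.34, "2009-10-26 07:25:00"),
    ("want", 189, "2009-10-26 07:25:00"),
    ("NA1", 50, "2009-10-26 07:25:00"),
    ("want", 343, "2009-10-30 07:25:00"),
    ("want2", 984.34, "2009-10-30 07:25:00"),
    ("want", 189, "2009-10-30 07:25:00"),
]
pairs = pick_pairs(sample)
for pair in pairs:
    print(pair)
```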

I am using Python v3.4.4 with mysql-connector v2.1.3 and MySQL server v5.7.11.

Thanks for your time!

1 Answer

Consider computing a running count within each PID and entry_date group via a correlated subquery, then keeping only the rows where RowNo is 1 or 2. This way you do not need to pass a parameter, as all PIDs are handled in one query. The query below assumes the labs table has a unique identifier, ID:

SELECT * 
FROM
   (SELECT PID, test_sname, test_value, units, ref_range, entry_date,    
           (SELECT count(*) FROM labs sub
            WHERE sub.test_sname in ('want', 'want2')
            AND sub.PID = labs.PID
            AND sub.entry_date = labs.entry_date
            AND sub.ID <= labs.ID) As RowNo
    FROM labs
    WHERE test_sname in ('want', 'want2')
   ) As dT
WHERE dT.RowNo <= 2

#  PID     test_sname   test_value      units   ref_range              entry_date   RowNo
#  10000000      want           33        U/L        4-40     2008-01-01 01:01:01       1
#  10000000     want2        98.34                            2008-01-01 01:01:01       2
#  10000000      want           33        U/L        4-40     2008-01-02 01:01:01       1
#  10000000     want2        98.34                            2008-01-02 01:01:01       2
#  11111111      want           33        U/L      Apr-40     2009-10-26 07:25:00       1
#  11111111     want2        98.34                            2009-10-26 07:25:00       2
#  11111111      want           33        U/L      Apr-40     2009-10-30 07:25:00       1
#  11111111     want2        98.34                            2009-10-30 07:25:00       2
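
To try the running-count idea without a MySQL server, here is a minimal sketch that runs essentially the same query through Python's built-in sqlite3 module. SQLite is an assumption standing in for MySQL, the table is trimmed to a few columns, and the second entry_date is a hypothetical one added only to show how row order affects the result.

```python
import sqlite3

# In-memory SQLite table standing in for MySQL (assumption).
# INTEGER PRIMARY KEY supplies the unique, increasing ID the query relies on.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE labs (ID INTEGER PRIMARY KEY, PID INTEGER, "
    "test_sname TEXT, test_value REAL, entry_date TEXT)"
)
conn.executemany(
    "INSERT INTO labs (PID, test_sname, test_value, entry_date) "
    "VALUES (?, ?, ?, ?)",
    [
        (11111111, "want", 343, "2009-10-26 07:25:00"),
        (11111111, "want2", 984.34, "2009-10-26 07:25:00"),
        (11111111, "want", 189, "2009-10-26 07:25:00"),
        (11111111, "NA1", 50, "2009-10-26 07:25:00"),
        # Hypothetical date where both 'want' rows precede 'want2'.
        (11111111, "want", 343, "2009-11-02 07:25:00"),
        (11111111, "want", 189, "2009-11-02 07:25:00"),
        (11111111, "want2", 984.34, "2009-11-02 07:25:00"),
    ],
)

# RowNo counts the matching rows in the same PID/entry_date group
# whose ID is less than or equal to the current row's ID.
RUNNING_COUNT_SQL = """
SELECT test_sname, test_value, entry_date, RowNo
FROM
   (SELECT ID, test_sname, test_value, entry_date,
           (SELECT count(*) FROM labs sub
            WHERE sub.test_sname IN ('want', 'want2')
            AND sub.PID = labs.PID
            AND sub.entry_date = labs.entry_date
            AND sub.ID <= labs.ID) AS RowNo
    FROM labs
    WHERE test_sname IN ('want', 'want2')
   ) AS dT
WHERE dT.RowNo <= 2
ORDER BY dT.ID
"""

result = conn.execute(RUNNING_COUNT_SQL).fetchall()
for row in result:
    print(row)
```

The filter keeps the two lowest-ID matching rows per PID and entry_date, so which test names come back depends on insertion order: the first date yields 'want' and 'want2', while the hypothetical second date yields two 'want' rows.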