2016-04-29 171 views

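MySQL query - select 2 rows with 2 specific DISTINCT column values

My goal is, for each PID, to select the 2 records whose test_sname values are 'want' and 'want2' and that occurred on the same entry_date. I do this for the first 5 entry_dates that include those test_snames.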

This is the query I came up with to do this:

queryBuilder = 
"""select PID, test_sname, test_value, units, ref_range, entry_date from labs 
    where PID=%s and (test_sname='want' or test_sname='want2') and entry_date in 

    (select entry_date from labs where PID=%s and test_sname in ('want', 'want2') 
    group by entry_date having count(*) = 2) 

    order by entry_date limit 10;""" % (pid, pid) 

It works as expected when an entry_date has exactly two rows whose test_sname is 'want' or 'want2'. Sample data for the query:

PID  |test_sname |test_value |units |entry_date 
10000000 | want  |   343 | U/L  | 2008-01-01 01:01:01 
10000000 | want2  |  984.34 |   | 2008-01-01 01:01:01 
10000000 | NA1  |   56 | %  | 2008-01-01 01:01:01 
10000000 | NA2  |   420 | mg/dL | 2008-01-01 01:01:01 
10000000 | NA2  |   420 | mg/dL | 2008-01-02 01:01:01 

10000000 | want  |   343 | U/L  | 2008-01-02 01:01:01 
10000000 | want2  |  984.34 |   | 2008-01-02 01:01:01 
10000000 | NA1  |   26 | %  | 2008-01-02 01:01:01 
10000000 | NA2  |   410 | mg/dL | 2008-01-02 01:01:01 
10000000 | NA2  |   455 | mg/dL | 2008-01-02 01:01:01 

Result (which is correct):

PID  |test_sname |test_value |units |entry_date 
10000000 | want  |   343 | U/L  | 2008-01-01 01:01:01 
10000000 | want2  |  984.34 |   | 2008-01-01 01:01:01 
10000000 | want  |   343 | U/L  | 2008-01-02 01:01:01 
10000000 | want2  |  984.34 |   | 2008-01-02 01:01:01 

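The problem arises when, for example, there are multiple rows with a test_sname of 'want' on the same entry_date, because having count(*) = 2 no longer matches. No rows are returned for data like this: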

PID  |test_sname |test_value |units |entry_date 
11111111 | want  |   343 | U/L  | 2009-10-26 07:25:00 
11111111 | want2  |  984.34 |   | 2009-10-26 07:25:00 
11111111 | want  |  189 | U/L  | 2009-10-26 07:25:00 
11111111 | NA1  |   50 | %  | 2009-10-26 07:25:00 
11111111 | NA2  |   40 | mg/dL | 2009-10-26 07:25:00 
11111111 | NA3  |  84.55 |   | 2009-10-26 07:25:00 
11111111 | NA4  |  4.5 | thou/uL | 2009-10-26 07:25:00 
11111111 | NA5  |  14.6 | g/dL | 2009-10-26 07:25:00 
11111111 | NA6  |  0.96 | mg/dL | 2009-10-26 07:25:00 

11111111 | want  |   343 | U/L  | 2009-10-30 07:25:00 
11111111 | want2  |  984.34 |   | 2009-10-30 07:25:00 
11111111 | want  |  189 | U/L  | 2009-10-30 07:25:00 
11111111 | NA1  |   6 | %  | 2009-10-30 07:25:00 
11111111 | NA2  |   40 | mg/dL | 2009-10-30 07:25:00 
11111111 | NA3  |  84.55 |   | 2009-10-30 07:25:00 
11111111 | NA4  |  4.5 | thou/uL | 2009-10-30 07:25:00 
11111111 | NA5  |  14.6 | g/dL | 2009-10-30 07:25:00 
11111111 | NA6  |  0.96 | mg/dL | 2009-10-30 07:25:00 
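To make the failure concrete, here is a minimal sketch that reproduces the count on a few of the rows above. It uses Python's built-in sqlite3 module with an in-memory table as a stand-in for the MySQL server (an assumption made only so the example runs anywhere; the table is reduced to the columns that matter here).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL labs table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table labs (PID integer, test_sname text, "
    "test_value real, entry_date text)"
)
rows = [
    (11111111, "want", 343, "2009-10-26 07:25:00"),
    (11111111, "want2", 984.34, "2009-10-26 07:25:00"),
    (11111111, "want", 189, "2009-10-26 07:25:00"),
    (11111111, "NA1", 50, "2009-10-26 07:25:00"),
]
conn.executemany("insert into labs values (?, ?, ?, ?)", rows)

# Same grouping as the inner query, but showing the counts instead of
# filtering on them.
counts = conn.execute(
    """select entry_date, count(*), count(distinct test_sname)
       from labs
       where PID = ? and test_sname in ('want', 'want2')
       group by entry_date""",
    (11111111,),
).fetchall()
print(counts)
```

For this date count(*) is 3 while the number of distinct test_sname values is 2, so the having count(*) = 2 filter drops the date entirely.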

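As a restriction, I tried putting a limit 2 in the subquery (which I know would not solve the problem by itself), but it gave the error below. I think I have the most recent version of MySQL, so apparently I can't use limit in this kind of subquery: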

This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery' 

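I know there are multiple ways to work around this - I could select all the values and then programmatically take what I need in Python, but I am looking for a MySQL query solution that I can run from Python with the MySQL connector. That said, I wouldn't complain about a Python solution.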
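For reference, the "select everything and filter in Python" fallback mentioned above could look roughly like the sketch below. The function name pick_pairs and its max_dates parameter are hypothetical, and it assumes the rows belong to a single PID, arrive ordered by entry_date, and have the same column order as the query (PID, test_sname, test_value, units, ref_range, entry_date). When a test appears more than once on a date, it keeps the first occurrence, which is only one possible rule.

```python
from collections import OrderedDict

WANTED = ("want", "want2")


def pick_pairs(rows, max_dates=5):
    """Return one 'want' and one 'want2' row per entry_date.

    rows: iterable of (PID, test_sname, test_value, units, ref_range,
    entry_date) tuples for a single PID, ordered by entry_date.
    """
    by_date = OrderedDict()
    for row in rows:
        sname, entry_date = row[1], row[5]
        if sname not in WANTED:
            continue
        # Keep only the first row seen for each wanted test on a date.
        by_date.setdefault(entry_date, {}).setdefault(sname, row)

    result = []
    dates_used = 0
    for entry_date, found in by_date.items():
        if dates_used >= max_dates:
            break
        if len(found) < len(WANTED):
            continue  # one of the two tests is missing on this date
        result.extend(found[name] for name in WANTED)
        dates_used += 1
    return result


# Example with a duplicated 'want' row on the same date.
example_rows = [
    (11111111, "want", 343, "U/L", "", "2009-10-26 07:25:00"),
    (11111111, "want2", 984.34, "", "", "2009-10-26 07:25:00"),
    (11111111, "want", 189, "U/L", "", "2009-10-26 07:25:00"),
    (11111111, "NA1", 50, "%", "", "2009-10-26 07:25:00"),
]
pairs = pick_pairs(example_rows)
print(pairs)
```

An OrderedDict is used so that date order is preserved on Python 3.4, where plain dicts do not guarantee insertion order.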

I am using Python v3.4.4 with MySQL connector v2.1.3 and MySQL server v5.7.11.

Thanks for your time!

Answers


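Consider using a running count within your groupings via a correlated subquery, then keep only the rows where RowNo is 1 or 2. This way you will not need to pass parameters, because all PIDs are processed at once. The query below assumes the labs table has a unique identifier column, ID: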

SELECT * 
FROM 
    (SELECT PID, test_sname, test_value, units, ref_range, entry_date,  
      (SELECT count(*) FROM labs sub 
      WHERE sub.test_sname in ('want', 'want2') 
      AND sub.PID = labs.PID 
      AND sub.entry_date = labs.entry_date 
      AND sub.ID <= labs.ID) As RowNo 
    FROM labs 
    WHERE test_sname in ('want', 'want2') 
    ) As dT 
WHERE dT.RowNo <= 2 

# PID  test_sname test_value  units ref_range    entry_date RowNo 
# 10000000  want   33  U/L  4-40  2008-01-01 01:01:01  1 
# 10000000  want2  98.34       2008-01-01 01:01:01  2 
# 10000000  want   33  U/L  4-40  2008-01-02 01:01:01  1 
# 10000000  want2  98.34       2008-01-02 01:01:01  2 
# 11111111  want   33  U/L  Apr-40  2009-10-26 07:25:00  1 
# 11111111  want2  98.34       2009-10-26 07:25:00  2 
# 11111111  want   33  U/L  Apr-40  2009-10-30 07:25:00  1 
# 11111111  want2  98.34       2009-10-30 07:25:00  2
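As a quick check of the running-count idea, the sketch below runs essentially the same query from Python. It uses the built-in sqlite3 module with an in-memory table as a stand-in for MySQL (an assumption made so the example is self-contained), a reduced set of columns, and an added ORDER BY so the output order is deterministic. The ID values are made up for illustration.

```python
import sqlite3

# In-memory SQLite table standing in for the MySQL labs table,
# including the unique ID column the query relies on.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table labs (ID integer primary key, PID integer, "
    "test_sname text, test_value real, entry_date text)"
)
sample = [
    (1, 11111111, "want", 343, "2009-10-26 07:25:00"),
    (2, 11111111, "want2", 984.34, "2009-10-26 07:25:00"),
    (3, 11111111, "want", 189, "2009-10-26 07:25:00"),
    (4, 11111111, "NA1", 50, "2009-10-26 07:25:00"),
]
conn.executemany("insert into labs values (?, ?, ?, ?, ?)", sample)

# RowNo counts the matching rows for the same PID and entry_date
# whose ID is less than or equal to the current row's ID.
query = """
SELECT PID, test_sname, test_value, entry_date, RowNo
FROM
    (SELECT ID, PID, test_sname, test_value, entry_date,
        (SELECT count(*) FROM labs sub
         WHERE sub.test_sname in ('want', 'want2')
           AND sub.PID = labs.PID
           AND sub.entry_date = labs.entry_date
           AND sub.ID <= labs.ID) AS RowNo
     FROM labs
     WHERE test_sname in ('want', 'want2')
    ) AS dT
WHERE dT.RowNo <= 2
ORDER BY dT.ID
"""
picked = conn.execute(query).fetchall()
for row in picked:
    print(row)
```

Note that the running count keeps the first two matching rows by ID for each PID and entry_date. If two 'want' rows happened to have lower IDs than the 'want2' row, both 'want' rows would be returned instead of one of each, so the result depends on insertion order.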