2017-02-13 49 views
2

(SQL) How do I select the right row for each group?

I have the following data:

+------------+-----------+-----------+------------+--------------+ 
| first_name | last_name | family_id | is_primary | is_secondary | 
+------------+-----------+-----------+------------+--------------+ 
| a   | b   |   1 |   1 |   0 | 
| aa   | bb  |   1 |   0 |   0 | 
| c   | d   |   1 |   0 |   0 | 
| cc   | dd  |   1 |   0 |   0 | 
| e   | f   |  10 |   0 |   0 | 
| e   | f   |  10 |   0 |   1 | 
| gg   | hh  |  10 |   0 |   1 | 
| gg   | hh  |  10 |   0 |   0 | 
| gg   | hh  |  10 |   0 |   0 | 
| gg   | hh  |  10 |   0 |   0 | 
+------------+-----------+-----------+------------+--------------+ 

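What I want to do is:

  • Group by family_id (so we will have two groups).
  • For each group, if some rows have is_primary equal to 1, pick one of them at random and output its first_name and last_name as the group's two columns.
  • For each group, if no row has is_primary equal to 1, find a row (any row will do) that has is_secondary equal to 1 and output its first_name and last_name as the group's two columns.

So, based on the logic and data described above, the correct result should be: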

+-----------+------------+-----------+ 
| family_id | first_name | last_name | 
+-----------+------------+-----------+ 
|   1 | a   | b   | 
|  10 | e   | f   | 
+-----------+------------+-----------+ 

or

+-----------+------------+-----------+ 
| family_id | first_name | last_name | 
+-----------+------------+-----------+ 
|   1 | a   | b   | 
|  10 | gg   | hh  | 
+-----------+------------+-----------+ 

How can I write a query that returns the correct result?

Here is the script that creates the test table.

USE tempdb 
GO 
IF OBJECT_ID('dbo.mytable') IS NOT NULL DROP TABLE dbo.mytable; 
CREATE TABLE mytable (
    first_name VARCHAR(2) NOT NULL, 
    last_name VARCHAR(2) NOT NULL, 
    family_id INTEGER NOT NULL, 
    is_primary INTEGER NOT NULL, 
    is_secondary INTEGER NOT NULL); 

INSERT INTO mytable VALUES ('a','b',1,1,0); 
INSERT INTO mytable VALUES ('aa','bb',1,0,0); 
INSERT INTO mytable VALUES ('c','d',1,0,0); 
INSERT INTO mytable VALUES ('cc','dd',1,0,0); 
INSERT INTO mytable VALUES ('e','f',10,0,0); 
INSERT INTO mytable VALUES ('e','f',10,0,1); 
INSERT INTO mytable VALUES ('gg','hh',10,0,1); 
INSERT INTO mytable VALUES ('gg','hh',10,0,0); 
INSERT INTO mytable VALUES ('gg','hh',10,0,0); 
INSERT INTO mytable VALUES ('gg','hh',10,0,0); 
GO 

SELECT * FROM dbo.mytable; 
+0

What have you tried? –

+0

Yes, I tried to solve it but failed. Let me update the question. –

+0

If you want the first result, it takes no effort at all; simply use: select family_id, min(first_name), min(last_name) from mytable group by family_id –

Answers

2

Try this approach:

;with x as (
    select *, row_number() over(partition by family_id order by is_primary desc, is_secondary desc) rn 
    from mytable 
    where is_primary+is_secondary = 1 
) 
select * from x where rn = 1 

(Thanks for the create & insert script.)

Edit: Per the OP's comment (both flags may be 1), change the WHERE clause as follows:

where is_primary = 1 or (is_primary = 0 and is_secondary = 1) 
+0

Since the OP mentioned that is_primary and is_secondary can both be 1, the where condition needs to be changed to >= 1 – Eric

+0

It also isn't a random selection (you can get that by ordering by something non-deterministic, such as RAND) – Caleth

+0

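@Caleth Any select without an explicit ORDER BY clause is non-deterministic, don't you agree? Keep in mind that there is "random" and then there is "random": different levels of "randomness" with different associated costs. By the way, RAND() isn't really random; CHECKSUM(NEWID()) would be better here. – dean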

1

If the selected row has to be random, then use the following:

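-- Uses mytable, the table created by the question's test script.
WITH primary_families AS (
    SELECT family_id 
      ,first_name 
      ,last_name 
      -- PARTITION BY restarts the random numbering for every family,
      -- so r = 1 yields one random row per family_id.
      ,ROW_NUMBER() OVER(PARTITION BY family_id ORDER BY NEWID()) AS r 
    FROM mytable 
    WHERE is_primary = 1 
), 
secondary_families AS (
    SELECT family_id 
      ,first_name 
      ,last_name 
      ,ROW_NUMBER() OVER(PARTITION BY family_id ORDER BY NEWID()) AS r 
    FROM mytable f 
    WHERE is_secondary = 1 
    -- Only families that have no primary row fall back to a secondary row.
    AND NOT EXISTS (
     SELECT 1 
     FROM mytable 
     WHERE family_id = f.family_id 
     AND is_primary = 1 
    ) 
) 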

SELECT f.family_id 
     ,f.first_name 
     ,f.last_name 
FROM primary_families f 
WHERE f.r = 1 

UNION 

SELECT f.family_id 
     ,f.first_name 
     ,f.last_name 
FROM secondary_families f 
WHERE f.r = 1 
0

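This is not an answer to your specific question, just an observation. If I had to develop software or a web application with logic like this, I would move it out of SQL and into whatever programming language is available: retrieve the data set of interest, scan it, group it, and classify it.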
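
For what it's worth, here is a rough sketch of what that could look like in Python. It assumes the rows have already been fetched into plain tuples in the column order of the question's table; the names ROWS and pick_per_family are made up for illustration, and families with neither flag set are simply left out, which is my reading of the question rather than something it states.

```python
import random
from collections import defaultdict

# Rows as (first_name, last_name, family_id, is_primary, is_secondary),
# copied from the test data in the question.
ROWS = [
    ("a", "b", 1, 1, 0),
    ("aa", "bb", 1, 0, 0),
    ("c", "d", 1, 0, 0),
    ("cc", "dd", 1, 0, 0),
    ("e", "f", 10, 0, 0),
    ("e", "f", 10, 0, 1),
    ("gg", "hh", 10, 0, 1),
    ("gg", "hh", 10, 0, 0),
    ("gg", "hh", 10, 0, 0),
    ("gg", "hh", 10, 0, 0),
]


def pick_per_family(rows, rng=random):
    """Return {family_id: (first_name, last_name)}.

    A random primary row wins; a random secondary row is used only when
    the family has no primary row. Families with neither are omitted
    (an assumption, the question does not cover that case).
    """
    primary = defaultdict(list)
    secondary = defaultdict(list)

    # Scan once and classify each row by its flags.
    for first, last, family_id, is_primary, is_secondary in rows:
        if is_primary == 1:
            primary[family_id].append((first, last))
        elif is_secondary == 1:
            secondary[family_id].append((first, last))

    result = {}
    for family_id in sorted(set(primary) | set(secondary)):
        if family_id in primary:
            candidates = primary[family_id]
        else:
            candidates = secondary[family_id]
        result[family_id] = rng.choice(candidates)
    return result


if __name__ == "__main__":
    for family_id, (first, last) in pick_per_family(ROWS).items():
        print(family_id, first, last)
```

Passing a seeded random.Random instance as rng makes the pick repeatable, which can help when testing.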