2015-02-06

Combining GROUP BY and ROW_NUMBER()

Sample data

userid  email_address      login_name  name             Title  org        phone_number_com
======  =================  ==========  ===============  =====  =========  ================
1192    [email protected]  sjobs       Steve Jobs       CEO    Apple      N/A
1274    [email protected]  sjobs       Steve Jobs       CFO    Apple      697-4686
1192    [email protected]  sjobs       Steven jobs      CEO    Apple      604-7126
1885    [email protected]  bgates      Bill Gates       CEO    Microsoft  604-7114
1920    [email protected]  bgates      William Gates    CTR    Microsoft  604-7247
1951    [email protected]  wbuffet     Warren Buffet    CEO    HP         614-9141
1954    [email protected]  wbuffet     W. Buffet        COO    HP         614-7589
1951    [email protected]  wbuffet     Warren S Buffet  CIO    Xerox      614-8874
1956    [email protected]  mzuck       Mark Zuckerberg  CEO    FB         614-8295

QUERY

SELECT * 
FROM 
    (
     SELECT userid, name, login_name, email_address, phone_number_com, 
     ROW_NUMBER() OVER(PARTITION BY [login_name] ORDER BY login_name) Num_Duplicates 
     FROM web_user 
    ) as Rows 
WHERE Num_Duplicates > 1 

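This is my first post, so I hope I am following all the procedures. I get a result set that shows the duplicate rows 2 and 3. I am trying to GROUP BY login_name and show only the row with the highest Num_Duplicates: if a login_name has Num_Duplicates values of 2 and 3, only the row with 3 should be shown. I hope this makes sense! Thanks in advance for any guidance you can provide.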
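To see what the query above actually returns, here is a minimal sketch that runs it on the sample data with Python's built-in `sqlite3` module. This is an assumption for illustration: the question appears to target SQL Server, and SQLite needs version 3.25 or later for window functions. `make_db` is a hypothetical helper, and the email addresses are stored as NULL because they are obfuscated in the sample. The sketch shows that the highest Num_Duplicates in each login_name partition is simply the number of rows for that login_name.

```python
import sqlite3

# Sample rows from the question. The email addresses are obfuscated in the
# source, so they are stored as NULL here.
ROWS = [
    (1192, None, "sjobs", "Steve Jobs", "CEO", "Apple", "N/A"),
    (1274, None, "sjobs", "Steve Jobs", "CFO", "Apple", "697-4686"),
    (1192, None, "sjobs", "Steven jobs", "CEO", "Apple", "604-7126"),
    (1885, None, "bgates", "Bill Gates", "CEO", "Microsoft", "604-7114"),
    (1920, None, "bgates", "William Gates", "CTR", "Microsoft", "604-7247"),
    (1951, None, "wbuffet", "Warren Buffet", "CEO", "HP", "614-9141"),
    (1954, None, "wbuffet", "W. Buffet", "COO", "HP", "614-7589"),
    (1951, None, "wbuffet", "Warren S Buffet", "CIO", "Xerox", "614-8874"),
    (1956, None, "mzuck", "Mark Zuckerberg", "CEO", "FB", "614-8295"),
]


def make_db():
    """Hypothetical helper: an in-memory web_user table with the sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE web_user (userid INTEGER, email_address TEXT, login_name TEXT, "
        "name TEXT, title TEXT, org TEXT, phone_number_com TEXT)"
    )
    con.executemany("INSERT INTO web_user VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return con


con = make_db()

# The question's query. ORDER BY login_name is constant inside each partition,
# so which physical row receives 2 or 3 is arbitrary.
op_rows = con.execute(
    """
    SELECT login_name, Num_Duplicates
    FROM (
        SELECT login_name,
               ROW_NUMBER() OVER (PARTITION BY login_name ORDER BY login_name) AS Num_Duplicates
        FROM web_user
    ) AS numbered
    WHERE Num_Duplicates > 1
    """
).fetchall()

# Highest row number seen for each login_name.
max_rn = {}
for login, n in op_rows:
    max_rn[login] = max(max_rn.get(login, 0), n)

# Plain row count per login_name, restricted to logins that occur more than once.
counts = dict(
    con.execute(
        "SELECT login_name, COUNT(*) FROM web_user GROUP BY login_name HAVING COUNT(*) > 1"
    ).fetchall()
)

print(sorted(op_rows))
# [('bgates', 2), ('sjobs', 2), ('sjobs', 3), ('wbuffet', 2), ('wbuffet', 3)]
print(max_rn == counts)  # True
```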

These are the results I would like the query to output:

userid | email_address | login_name | name | Title | org | phone_number_com | Num_Duplicates
1192 | [email protected] | sjobs | Steve Jobs | CEO | Apple | N/A | 3  
1885 | [email protected] | bgates | Bill Gates | CEO | Microsoft | 604-7114 | 2  
1951 | [email protected] | wbuffet | Warren Buffet | CEO | HP | 614-9149 | 3 

Why do you need the row number? – serakfalcon 2015-02-06 18:08:37


Could you add the results you want? – RezaRahmati 2015-02-06 18:11:55


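Why show only the third one? You are partitioning and ordering by login_name, which means the order within each group is arbitrary and can differ from one execution to the next. So 1, 2, 3... they are all the same. Why show only 3? Why not show only 2, or only 1? – 2015-02-06 18:16:44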

Answers


If I understand correctly what you are trying to do, you would first group by login_name to get the number of duplicates:

SELECT login_name, COUNT(*) AS num_duplicates 
    FROM web_user 
GROUP BY login_name 
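A quick sketch of this GROUP BY on the sample data, again assuming SQLite through Python's `sqlite3` module (version 3.25 or later) and a hypothetical `make_db` helper that loads the question's rows, with the obfuscated emails stored as NULL. Note that it returns one row per login_name, including logins that occur only once (mzuck), so a HAVING COUNT(*) > 1 would be needed to keep only the duplicates.

```python
import sqlite3

# Sample rows from the question; obfuscated email addresses are stored as NULL.
ROWS = [
    (1192, None, "sjobs", "Steve Jobs", "CEO", "Apple", "N/A"),
    (1274, None, "sjobs", "Steve Jobs", "CFO", "Apple", "697-4686"),
    (1192, None, "sjobs", "Steven jobs", "CEO", "Apple", "604-7126"),
    (1885, None, "bgates", "Bill Gates", "CEO", "Microsoft", "604-7114"),
    (1920, None, "bgates", "William Gates", "CTR", "Microsoft", "604-7247"),
    (1951, None, "wbuffet", "Warren Buffet", "CEO", "HP", "614-9141"),
    (1954, None, "wbuffet", "W. Buffet", "COO", "HP", "614-7589"),
    (1951, None, "wbuffet", "Warren S Buffet", "CIO", "Xerox", "614-8874"),
    (1956, None, "mzuck", "Mark Zuckerberg", "CEO", "FB", "614-8295"),
]


def make_db():
    """Hypothetical helper: an in-memory web_user table with the sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE web_user (userid INTEGER, email_address TEXT, login_name TEXT, "
        "name TEXT, title TEXT, org TEXT, phone_number_com TEXT)"
    )
    con.executemany("INSERT INTO web_user VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return con


con = make_db()

# One row per login_name with the number of rows that share it.
counts = con.execute(
    """
    SELECT login_name, COUNT(*) AS num_duplicates
    FROM web_user
    GROUP BY login_name
    ORDER BY login_name  -- added only to make the printed order stable
    """
).fetchall()

print(counts)  # [('bgates', 2), ('mzuck', 1), ('sjobs', 3), ('wbuffet', 3)]
```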

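From here, you could either use a subquery with ROW_NUMBER() (although I would recommend RANK() in case of ties), or you could simply use an aggregate inside a window function: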

SELECT login_name, COUNT(*) AS num_duplicates 
    , RANK() OVER (ORDER BY COUNT(*) DESC) AS rn 
    FROM web_user 
GROUP BY login_name; 

Then wrap it in a subquery to get only the login_name(s) with the most duplicates (per the OP's comment):

SELECT * FROM (
    SELECT login_name, COUNT(*) AS num_duplicates 
     , RANK() OVER (ORDER BY COUNT(*) DESC) AS rn 
     FROM web_user 
    GROUP BY login_name 
) t WHERE rn = 1; 
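The sketch below (SQLite through Python's `sqlite3`, version 3.25 or later, with a hypothetical `make_db` helper loading the question's rows) illustrates why RANK() is the safer choice here: sjobs and wbuffet both have 3 rows, so RANK() gives both of them rank 1, whereas ROW_NUMBER() keeps only one of the tied logins, and which one is not defined.

```python
import sqlite3

# Sample rows from the question; obfuscated email addresses are stored as NULL.
ROWS = [
    (1192, None, "sjobs", "Steve Jobs", "CEO", "Apple", "N/A"),
    (1274, None, "sjobs", "Steve Jobs", "CFO", "Apple", "697-4686"),
    (1192, None, "sjobs", "Steven jobs", "CEO", "Apple", "604-7126"),
    (1885, None, "bgates", "Bill Gates", "CEO", "Microsoft", "604-7114"),
    (1920, None, "bgates", "William Gates", "CTR", "Microsoft", "604-7247"),
    (1951, None, "wbuffet", "Warren Buffet", "CEO", "HP", "614-9141"),
    (1954, None, "wbuffet", "W. Buffet", "COO", "HP", "614-7589"),
    (1951, None, "wbuffet", "Warren S Buffet", "CIO", "Xerox", "614-8874"),
    (1956, None, "mzuck", "Mark Zuckerberg", "CEO", "FB", "614-8295"),
]


def make_db():
    """Hypothetical helper: an in-memory web_user table with the sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE web_user (userid INTEGER, email_address TEXT, login_name TEXT, "
        "name TEXT, title TEXT, org TEXT, phone_number_com TEXT)"
    )
    con.executemany("INSERT INTO web_user VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return con


con = make_db()

# Same query as above; {fn} is swapped between RANK and ROW_NUMBER.
ranked = """
    SELECT login_name, num_duplicates, rn
    FROM (
        SELECT login_name, COUNT(*) AS num_duplicates,
               {fn}() OVER (ORDER BY COUNT(*) DESC) AS rn
        FROM web_user
        GROUP BY login_name
    ) t
    WHERE rn = 1
    ORDER BY login_name
"""

with_rank = con.execute(ranked.format(fn="RANK")).fetchall()
with_row_number = con.execute(ranked.format(fn="ROW_NUMBER")).fetchall()

print(with_rank)  # [('sjobs', 3, 1), ('wbuffet', 3, 1)]
print(len(with_row_number))  # 1: only one of the two tied logins survives
```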

UPDATE, per the question edit:

SELECT userid, name, login_name, email_address, phone_number_com, num_duplicates 
    FROM (
    SELECT userid, name, login_name, email_address, phone_number_com 
     , COUNT(*) OVER (PARTITION BY login_name) AS num_duplicates 
     , ROW_NUMBER() OVER (PARTITION BY login_name ORDER BY userid) AS rn 
     FROM web_user 
) t WHERE num_duplicates > 1 AND rn = 1; 

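What I am doing above is using COUNT(*) as a window function; partitioning by login_name gives the count for each login name. I also partition by login_name for ROW_NUMBER() and order by userid, so that I can return the row with the smallest userid (which appears to be what your desired output does).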
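A minimal sketch of the updated query on the sample data, assuming SQLite through Python's `sqlite3` (version 3.25 or later) and a hypothetical `make_db` helper loading the question's rows. One caveat: sjobs has two rows with userid 1192 and wbuffet has two rows with userid 1951, so ORDER BY userid does not decide which of those rows gets rn = 1. The check therefore only looks at userid, login_name and the count, which the query fully determines.

```python
import sqlite3

# Sample rows from the question; obfuscated email addresses are stored as NULL.
ROWS = [
    (1192, None, "sjobs", "Steve Jobs", "CEO", "Apple", "N/A"),
    (1274, None, "sjobs", "Steve Jobs", "CFO", "Apple", "697-4686"),
    (1192, None, "sjobs", "Steven jobs", "CEO", "Apple", "604-7126"),
    (1885, None, "bgates", "Bill Gates", "CEO", "Microsoft", "604-7114"),
    (1920, None, "bgates", "William Gates", "CTR", "Microsoft", "604-7247"),
    (1951, None, "wbuffet", "Warren Buffet", "CEO", "HP", "614-9141"),
    (1954, None, "wbuffet", "W. Buffet", "COO", "HP", "614-7589"),
    (1951, None, "wbuffet", "Warren S Buffet", "CIO", "Xerox", "614-8874"),
    (1956, None, "mzuck", "Mark Zuckerberg", "CEO", "FB", "614-8295"),
]


def make_db():
    """Hypothetical helper: an in-memory web_user table with the sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE web_user (userid INTEGER, email_address TEXT, login_name TEXT, "
        "name TEXT, title TEXT, org TEXT, phone_number_com TEXT)"
    )
    con.executemany("INSERT INTO web_user VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return con


con = make_db()

# Window COUNT gives the group size on every row; ROW_NUMBER picks one row per group.
rows = con.execute(
    """
    SELECT userid, name, login_name, email_address, phone_number_com, num_duplicates
    FROM (
        SELECT userid, name, login_name, email_address, phone_number_com,
               COUNT(*) OVER (PARTITION BY login_name) AS num_duplicates,
               ROW_NUMBER() OVER (PARTITION BY login_name ORDER BY userid) AS rn
        FROM web_user
    ) t
    WHERE num_duplicates > 1 AND rn = 1
    ORDER BY userid  -- added only to make the printed order stable
    """
).fetchall()

# Keep only the columns the query fully determines: userid, login_name, count.
summary = [(userid, login, n) for userid, _name, login, _email, _phone, n in rows]
print(summary)  # [(1192, 'sjobs', 3), (1885, 'bgates', 2), (1951, 'wbuffet', 3)]
```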


I would add a HAVING COUNT(*) > 2 condition there, so that you really know these are duplicates according to the OP's text – 2015-02-06 18:24:52


I'm pretty sure this is not what Ariel wants. – 2015-02-06 18:27:18


David, I started with the following query: – Ariel 2015-02-06 19:02:40


Hmm, from your description it sounds like you just want something like this (off the top of my head):

SELECT login_name, email_address 
FROM web_user 
GROUP BY login_name, email_address 
HAVING count(*) > 2 

In my results I need to return userid, name, login_name, email_address, phone_number_com. – Ariel 2015-02-06 18:36:59


Just add them as needed, e.g. 'login_name, email_address, MAX(phone_number)', and so on. – 2015-02-06 18:38:28


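I see, but if I GROUP BY all the columns I select, it gives me an inaccurate result. I only need to GROUP BY login_name and still display the other fields I select (i.e. userid, name, login_name, email_address, phone_number_com). – Ariel 2015-02-06 19:12:42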


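The following should give you what you need.

The ROW_NUMBER window function is used to identify the first row for each login_name, and the COUNT window function counts the rows for each login_name.

The outer query then limits the results to the login_names that have more than one row, and returns only the first row for each of them.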

DECLARE @users TABLE 
(
    userid    int 
    , email_address  varchar(100) 
    , login_name  varchar(100) 
    , name    varchar(100) 
    , title    varchar(100) 
    , org    varchar(100) 
    , phone_number_com varchar(100) 
) 

INSERT INTO @users 
VALUES 
(1192, '[email protected]', 'sjobs', 'Steve Jobs', 'CEO', 'Apple', 'N/A') 
, (1274, '[email protected]', 'sjobs', 'Steve Jobs', 'CFO', 'Apple', '697-4686') 
, (1192, '[email protected]', 'sjobs', 'Steven jobs', 'CEO', 'Apple', '604-7126') 
, (1885, '[email protected]', 'bgates', 'Bill Gates', 'CEO', 'Microsoft', '604-7114') 
, (1920, '[email protected]', 'bgates', 'William Gates', 'CTR', 'Microsoft', '604-7247') 
, (1951, '[email protected]', 'wbuffet', 'Warren Buffet', 'CEO', 'HP', '614-9141') 
, (1954, '[email protected]', 'wbuffet', 'W. Buffet', 'COO', 'HP', '614-7589') 
, (1951, '[email protected]', 'wbuffet', 'Warren S Buffet', 'CIO', 'Xerox', '614-8874') 
, (1956, '[email protected]', 'mzuck', 'Mark Zuckerberg', 'CEO', 'FB', '614-8295') 
; 

WITH LoginWithWindowFunction AS 
(
    SELECT 
     * 
     , ROW_NUMBER() OVER(PARTITION BY login_name ORDER BY userid) AS LoginOrder 
     , COUNT(*) OVER(PARTITION BY login_name) AS Num_Duplicates 

    FROM 
     @users 
) 

SELECT 
    userid 
    , email_address 
    , login_name 
    , name 
    , title 
    , org 
    , phone_number_com 
    , Num_Duplicates 

FROM 
    LoginWithWindowFunction 

WHERE 
    LoginOrder = 1 
    AND Num_Duplicates > 1 

ORDER BY 
    userid
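Because two sjobs rows share userid 1192 and two wbuffet rows share userid 1951, ORDER BY userid alone leaves LoginOrder = 1 undetermined for those logins. Below is a sketch of one way to make it deterministic, assuming SQLite through Python's `sqlite3` (version 3.25 or later) and a hypothetical `make_db` helper that loads the question's rows: add a second sort key to the ROW_NUMBER ordering. Using `name` as that key is an arbitrary illustrative choice that happens to reproduce the desired output; with real data, a unique key would be the better tiebreaker.

```python
import sqlite3

# Sample rows from the question; obfuscated email addresses are stored as NULL.
ROWS = [
    (1192, None, "sjobs", "Steve Jobs", "CEO", "Apple", "N/A"),
    (1274, None, "sjobs", "Steve Jobs", "CFO", "Apple", "697-4686"),
    (1192, None, "sjobs", "Steven jobs", "CEO", "Apple", "604-7126"),
    (1885, None, "bgates", "Bill Gates", "CEO", "Microsoft", "604-7114"),
    (1920, None, "bgates", "William Gates", "CTR", "Microsoft", "604-7247"),
    (1951, None, "wbuffet", "Warren Buffet", "CEO", "HP", "614-9141"),
    (1954, None, "wbuffet", "W. Buffet", "COO", "HP", "614-7589"),
    (1951, None, "wbuffet", "Warren S Buffet", "CIO", "Xerox", "614-8874"),
    (1956, None, "mzuck", "Mark Zuckerberg", "CEO", "FB", "614-8295"),
]


def make_db():
    """Hypothetical helper: an in-memory web_user table with the sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE web_user (userid INTEGER, email_address TEXT, login_name TEXT, "
        "name TEXT, title TEXT, org TEXT, phone_number_com TEXT)"
    )
    con.executemany("INSERT INTO web_user VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return con


con = make_db()

# The CTE from the answer, with name added as a hypothetical tiebreaker.
rows = con.execute(
    """
    WITH LoginWithWindowFunction AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY login_name ORDER BY userid, name) AS LoginOrder,
               COUNT(*) OVER (PARTITION BY login_name) AS Num_Duplicates
        FROM web_user
    )
    SELECT userid, login_name, name, title, org, phone_number_com, Num_Duplicates
    FROM LoginWithWindowFunction
    WHERE LoginOrder = 1 AND Num_Duplicates > 1
    ORDER BY userid
    """
).fetchall()

for row in rows:
    print(row)
# (1192, 'sjobs', 'Steve Jobs', 'CEO', 'Apple', 'N/A', 3)
# (1885, 'bgates', 'Bill Gates', 'CEO', 'Microsoft', '604-7114', 2)
# (1951, 'wbuffet', 'Warren Buffet', 'CEO', 'HP', '614-9141', 3)
```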