2011-05-07 197 views

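Delete duplicate rows from a table

I have a unique key, ID, in my table, but one column has duplicate values. How can I get rid of the duplicates and keep only one row for each value, like this: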

With duplicate records:

id | name | surname | 
1 | test | one  | 
2 | test | two  | 
3 | test3 | three | 
4 | test7 | four | 
5 | test | five | 
6 | test11 | eleven | 

Without duplicates:

id | name | surname | 
1 | test | one  | 
3 | test3 | three | 
4 | test7 | four | 
6 | test11 | eleven | 

I Googled this, but it doesn't seem to work:

DELETE ct1 
FROM mytable ct1 
     , mytable ct2 
WHERE ct1.name = ct2.name 
     AND ct1.id < ct2.id 

ERROR: syntax error at or near "ct1" 
LINE 1: DELETE ct1 
       ^

********** Error ********** 

I'm using a Postgres database.
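PostgreSQL rejects the `DELETE ct1 FROM ...` multi-table form (that is MySQL syntax); its own form lists the second table in a `USING` clause. The sketch below assumes the `mytable(id, name, surname)` layout from the question and runs against a temporary copy of the sample data. Note that the comparison is flipped to `ct1.id > ct2.id`, so that the row with the lowest `id` for each name is the one that survives, matching the expected output above.

```sql
-- Scratch copy of the question's sample data in a session-local temp table
-- (pg_temp), so an existing permanent table called mytable is not touched.
DROP TABLE IF EXISTS pg_temp.mytable;
CREATE TEMP TABLE mytable (
    id      integer PRIMARY KEY,
    name    text,
    surname text
);
INSERT INTO mytable (id, name, surname) VALUES
    (1, 'test',   'one'),
    (2, 'test',   'two'),
    (3, 'test3',  'three'),
    (4, 'test7',  'four'),
    (5, 'test',   'five'),
    (6, 'test11', 'eleven');

-- Delete every row for which another row with the same name has a
-- smaller id; the lowest id per name is kept.
DELETE FROM mytable ct1
USING mytable ct2
WHERE ct1.name = ct2.name
  AND ct1.id > ct2.id;

SELECT * FROM mytable ORDER BY id;
```

A row that matches several partners (id 5 pairs with both 1 and 2) is still deleted only once.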


Once you've cleaned up the data, you'll probably want to add a UNIQUE constraint on "name". – 2011-05-08 03:18:33
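The comment's suggestion, sketched under a few assumptions: the table already holds the cleaned-up rows from the question, the constraint name `mytable_name_key` is an arbitrary choice, and `ON CONFLICT DO NOTHING` requires PostgreSQL 9.5 or later. Adding the constraint fails with a unique-violation error if any duplicate names remain, so it also acts as a check that the cleanup worked.

```sql
-- Scratch copy of the already-deduplicated data (session-local temp table).
DROP TABLE IF EXISTS pg_temp.mytable;
CREATE TEMP TABLE mytable (
    id      integer PRIMARY KEY,
    name    text,
    surname text
);
INSERT INTO mytable (id, name, surname) VALUES
    (1, 'test',   'one'),
    (3, 'test3',  'three'),
    (4, 'test7',  'four'),
    (6, 'test11', 'eleven');

-- Errors out if duplicate names are still present.
ALTER TABLE mytable ADD CONSTRAINT mytable_name_key UNIQUE (name);

-- A second 'test' row now violates the constraint; ON CONFLICT DO NOTHING
-- skips it instead of raising an error.
INSERT INTO mytable (id, name, surname)
VALUES (7, 'test', 'seven')
ON CONFLICT DO NOTHING;
```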

Answers

3

You can try running this several times:

delete from mytable where id in (
    select max(id) 
     from mytable 
    group by name 
    having count(1) > 1 
); 

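Run it as many times as the largest number of duplicates any single name has, i.e. one less than the highest number of rows sharing one name (twice for "test" in the sample data).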
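Instead of counting the runs by hand, the same query can be repeated until it removes nothing, using a PL/pgSQL `DO` block (PostgreSQL 9.0 or later). This is a sketch on a temporary copy of the question's data; with that data the first pass deletes id 5, the second deletes id 2, and the third deletes nothing, which ends the loop.

```sql
-- Scratch copy of the question's sample data (session-local temp table).
DROP TABLE IF EXISTS pg_temp.mytable;
CREATE TEMP TABLE mytable (
    id      integer PRIMARY KEY,
    name    text,
    surname text
);
INSERT INTO mytable (id, name, surname) VALUES
    (1, 'test',   'one'),
    (2, 'test',   'two'),
    (3, 'test3',  'three'),
    (4, 'test7',  'four'),
    (5, 'test',   'five'),
    (6, 'test11', 'eleven');

DO $$
DECLARE
    n_deleted bigint;
BEGIN
    LOOP
        -- Each pass removes the highest id of every duplicated name.
        DELETE FROM mytable
        WHERE id IN (
            SELECT max(id)
            FROM mytable
            GROUP BY name
            HAVING count(*) > 1
        );
        -- Stop once a pass finds no duplicates left to delete.
        GET DIAGNOSTICS n_deleted = ROW_COUNT;
        EXIT WHEN n_deleted = 0;
    END LOOP;
END
$$;
```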

Otherwise, you can try this more complex query:

delete from mytable where id in (
    select id from mytable 
    except 
    (
    select min(id) 
     from mytable 
    group by name 
    having count(1) > 1 
    union all 
    select min(id) 
     from mytable 
    group by name 
    having count(1) = 1 
    ) 
); 

Running this query once should delete everything you need. I haven't tested it, though...
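The two branches of the `UNION ALL` above together return `min(id)` for every name, duplicated or not, so the query can be shortened to "delete every row that is not the lowest id for its name". A sketch on a temporary copy of the question's data; it relies on `id` never being NULL (it is the key), because `NOT IN` matches no rows at all when its subquery returns a NULL.

```sql
-- Scratch copy of the question's sample data (session-local temp table).
DROP TABLE IF EXISTS pg_temp.mytable;
CREATE TEMP TABLE mytable (
    id      integer PRIMARY KEY,
    name    text,
    surname text
);
INSERT INTO mytable (id, name, surname) VALUES
    (1, 'test',   'one'),
    (2, 'test',   'two'),
    (3, 'test3',  'three'),
    (4, 'test7',  'four'),
    (5, 'test',   'five'),
    (6, 'test11', 'eleven');

-- Keep only the lowest id for each name, in a single statement.
DELETE FROM mytable
WHERE id NOT IN (
    SELECT min(id)
    FROM mytable
    GROUP BY name
);
```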


The complex query works. Great job, even without having tried it. – 2011-05-07 13:05:07


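Glad to help. For complex grouping like this, I'd recommend learning about window functions, such as RANK, which @Dalen hinted at in the other answer. They're well worth learning. – 2011-05-07 13:07:45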

3

Using RANK. I'm actually not entirely sure about the syntax, since I'm not very good with PostgreSQL; this is just a hint (corrections from anyone are welcome):

DELETE FROM mytable 
WHERE id NOT IN 
(
    SELECT x.id FROM 
    (
     SELECT id, RANK() OVER (PARTITION BY name ORDER BY id ASC) AS r 
     FROM mytable 
    ) x 
    WHERE x.r = 1 
);
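On the syntax question: window functions such as `RANK()` are available from PostgreSQL 8.4 on. Because `id` is unique, `RANK()` and `ROW_NUMBER()` number the rows identically here. A variant that joins the numbered rows through `DELETE ... USING` and removes everything after the first row in each `name` group might look like this; it is a sketch run on a temporary copy of the question's data.

```sql
-- Scratch copy of the question's sample data (session-local temp table).
DROP TABLE IF EXISTS pg_temp.mytable;
CREATE TEMP TABLE mytable (
    id      integer PRIMARY KEY,
    name    text,
    surname text
);
INSERT INTO mytable (id, name, surname) VALUES
    (1, 'test',   'one'),
    (2, 'test',   'two'),
    (3, 'test3',  'three'),
    (4, 'test7',  'four'),
    (5, 'test',   'five'),
    (6, 'test11', 'eleven');

-- Number the rows within each name by ascending id, then delete
-- every row that is not first in its group.
DELETE FROM mytable t
USING (
    SELECT id,
           row_number() OVER (PARTITION BY name ORDER BY id) AS rn
    FROM mytable
) ranked
WHERE t.id = ranked.id
  AND ranked.rn > 1;
```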