2011-09-28 97 views
1

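SQL Server - Find duplicates in a table

We have a large table with close to 100 million rows. Can anyone help with how to find the duplicate data in the table and move it to an archive?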

Table name: CustomerData
Number of fields: 10

The latest record should be kept (it is identified by END_DATE being NULL in the record).


+4

Define "duplicate". The same values in all columns? – Thilo

Answers

3

Do you just need to move the rows where END_DATE is not NULL?

In a single transaction:

INSERT INTO archive (column1, column2, ... column10) 
SELECT column1, column2, ..., column10 
FROM CustomerData 
WHERE END_DATE IS NOT NULL 

DELETE CustomerData 
WHERE END_DATE IS NOT NULL 
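
A minimal sketch of what "in a single transaction" could look like in T-SQL. The five column names are hypothetical stand-ins (the ten real columns are not listed in the question), and the archive table is assumed to already exist with matching columns.

```sql
-- Hypothetical column list: replace with the table's real ten columns.
SET XACT_ABORT ON;  -- any run-time error rolls back the whole transaction

BEGIN TRANSACTION;

-- Copy the closed (non-current) rows into the archive table.
INSERT INTO archive (cust_id, cust_name, address_id, start_date, end_date)
SELECT cust_id, cust_name, address_id, start_date, end_date
FROM CustomerData
WHERE END_DATE IS NOT NULL;

-- Remove the same rows from the live table.
DELETE FROM CustomerData
WHERE END_DATE IS NOT NULL;

COMMIT TRANSACTION;
```

With the transaction around both statements, either the rows end up in the archive and are gone from CustomerData, or nothing changes. On a table of this size it may be worth running the move in smaller batches to keep the transaction log manageable.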
1

Have you tried this solution?

--INSERT Archive (columns) 
SELECT ... 10 columns ... 
FROM CustomerData 
WHERE END_DATE IS NULL 
0

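Assume the CustomerData table structure is: CustomerData(cust_id, CUST_NAME, ADDRESS_ID, START_TIME, END_DATE, ....., 7 other columns);

And assume that two customers sharing the same Address_ID count as duplicates.

To insert into the archive table: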

INSERT INTO archive (column1, column2, ... column10) 
SELECT cust_id, start_Date, ..., End_Date 
FROM CustomerData 
WHERE END_DATE IS NOT NULL 
AND Address_ID IN (
     SELECT Address_ID FROM 
      (
      -- Address_ID values that appear on more than one row
      SELECT Address_ID, COUNT(Address_ID) AS cnt 
      FROM CustomerData 
      GROUP BY Address_ID 
      HAVING COUNT(Address_ID) > 1 
      ) AS dup   -- SQL Server requires an alias on a derived table
     ) 

To delete from the CustomerData table:

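DELETE CustomerData 
WHERE END_DATE IS NOT NULL 
    AND 
    Address_ID IN (
      SELECT Address_ID FROM 
      (
      -- Address_ID values that appear on more than one row
      SELECT Address_ID, COUNT(Address_ID) AS cnt 
      FROM CustomerData 
      GROUP BY Address_ID 
      HAVING COUNT(Address_ID) > 1 
      ) AS dup   -- SQL Server requires an alias on a derived table
     ) 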

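The inner subquery extracts the duplicates based on a shared ADDRESS_ID value, much like grouping on the DeptID column of the employees sample table that ships with Oracle Database.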
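
To inspect the duplicates before moving anything, the inner subquery can be run on its own. This is only a sketch under the same assumption as above, namely that rows sharing an Address_ID count as duplicates; the row_count alias is introduced here for illustration.

```sql
-- List each Address_ID that occurs on more than one row, with its row count.
SELECT Address_ID, COUNT(*) AS row_count
FROM CustomerData
GROUP BY Address_ID
HAVING COUNT(*) > 1
ORDER BY row_count DESC;
```

If "duplicate" should instead mean identical values in several columns, list all of those columns in both the SELECT and the GROUP BY.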