2015-01-20

Delete duplicates based on a condition

For the dataset below, I want to delete the row that has the later timestamp.

**37C1Z2990E5E0 (TRXID) should be UNIQUE** in the dataset below:

    JKLAMMSDF123 20141112 20141117 5000.0 P 1.22 RT101018 *2014-11-12 10:10:26* 37C1Z2990E5E0 101018 
    JKLAMMSDF123 20141110 20141114 5000.0 P 1.22 RT161002 *2014-11-12 10:11:33* 37C1Z2990E5E0 161002 

-- More rows 

You can't have the same PK value twice in one table. Is this a denormalized dataset? – 2015-01-20 21:19:56

Are you only interested in the rows whose timestamp is [BETWEEN](https://msdn.microsoft.com/en-us/library/ms187922.aspx) two other values? – ryanyuyu 2015-01-20 21:20:38

I mean that TRXID should be treated as a unique value, with no duplicates allowed. – SHinny 2015-01-20 21:21:31

Answers

Try this:

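    ;WITH DATA AS
    (
        -- For each TRXID that occurs more than once, find its latest timestamp
        SELECT TRXID, MAX(YourTimestampColumn) AS TS
        FROM YourTable
        GROUP BY TRXID
        HAVING COUNT(*) > 1
    )
    -- Delete the row(s) carrying that latest timestamp
    DELETE T
    FROM YourTable AS T
    INNER JOIN DATA AS D
        ON T.TRXID = D.TRXID
        AND T.YourTimestampColumn = D.TS;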
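A minimal runnable sketch of the same idea, assuming Python's built-in `sqlite3` module and an in-memory SQLite database rather than SQL Server. The table `trx` and its columns `trxid`, `ts` and `code` are hypothetical stand-ins for `YourTable` and `YourTimestampColumn`. SQLite has no `DELETE ... FROM ... JOIN`, so this sketch swaps in a row-value `IN` (SQLite 3.15+) against the same `GROUP BY ... HAVING COUNT(*) > 1` subquery.

```python
import sqlite3

# Hypothetical table "trx": transaction id, timestamp, and one extra column.
# ISO-formatted timestamp strings sort chronologically, so MAX() works on them.
ROWS = [
    ("37C1Z2990E5E0", "2014-11-12 10:10:26", "RT101018"),
    ("37C1Z2990E5E0", "2014-11-12 10:11:33", "RT161002"),
]


def make_table(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trx (trxid TEXT, ts TEXT, code TEXT)")
    conn.executemany("INSERT INTO trx VALUES (?, ?, ?)", rows)
    return conn


def delete_latest_of_each_duplicate(conn):
    # Same logic as the CTE above: for every TRXID that occurs more than once,
    # find its MAX timestamp and delete the row(s) with that timestamp.
    # SQLite lacks DELETE ... JOIN, so a row-value IN is used instead.
    conn.execute("""
        DELETE FROM trx
        WHERE (trxid, ts) IN (
            SELECT trxid, MAX(ts)
            FROM trx
            GROUP BY trxid
            HAVING COUNT(*) > 1
        )
    """)


conn = make_table(ROWS)
delete_latest_of_each_duplicate(conn)
print(conn.execute("SELECT trxid, ts, code FROM trx").fetchall())
# [('37C1Z2990E5E0', '2014-11-12 10:10:26', 'RT101018')]
```

Because each run removes only the row(s) holding the latest timestamp, a TRXID with three copies still has two rows after one pass, and rows tied on the latest timestamp are all deleted together.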

This selects all rows, not only the duplicates... – SHinny 2015-01-20 21:43:18

You can try it now. – dario 2015-01-20 21:47:31

Thanks for the awesome solution. – SHinny 2015-01-21 15:26:59

Select the MIN of the timestamp column and GROUP BY all the other columns.

    SELECT MIN(TIMESTAMP), C1, C2, C3, ...
    FROM YOUR_TABLE
    GROUP BY C1, C2, C3, ...
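A sketch of this approach, again assuming Python's `sqlite3` and a hypothetical `trx` table in which `trade_date` and `code` stand in for `C1, C2, ...`. It is a plain SELECT, so the table itself is not modified, and rows only collapse when every grouped column matches. In the question's sample rows the dates and RT codes differ, so both 37C1Z2990E5E0 rows survive the grouping; the `AAAA` rows are a made-up exact duplicate (apart from the timestamp) that does collapse to its earliest timestamp.

```python
import sqlite3

# Hypothetical table: trade_date and code play the role of "all the other columns".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trx (trxid TEXT, trade_date TEXT, code TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO trx VALUES (?, ?, ?, ?)",
    [
        # The two rows from the question (other columns differ).
        ("37C1Z2990E5E0", "20141112", "RT101018", "2014-11-12 10:10:26"),
        ("37C1Z2990E5E0", "20141110", "RT161002", "2014-11-12 10:11:33"),
        # Hypothetical rows that differ only in the timestamp.
        ("AAAA", "20141112", "RT1", "2014-11-12 09:00:00"),
        ("AAAA", "20141112", "RT1", "2014-11-12 09:05:00"),
    ],
)

# MIN(ts) grouped by every other column: rows identical apart from the
# timestamp collapse into one row carrying the earliest timestamp.
grouped = conn.execute("""
    SELECT MIN(ts), trxid, trade_date, code
    FROM trx
    GROUP BY trxid, trade_date, code
    ORDER BY trxid, trade_date
""").fetchall()

for row in grouped:
    print(row)
```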

I would do this with a window function and a CTE.

To check what the data will look like after the duplicates are removed, use this:

    ;WITH DATA
        AS (SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY TRXID ORDER BY YourTimestampColumn) rn
            FROM YourTable)
    SELECT *
    FROM DATA
    WHERE rn = 1;

To delete the duplicates, use this:

    ;WITH DATA
        AS (SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY TRXID ORDER BY YourTimestampColumn) rn
            FROM YourTable)
    DELETE FROM DATA
    WHERE rn > 1;

This works even if there is more than one duplicate of the same TRXID: only the row with the earliest timestamp is kept.
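A runnable sketch of the ROW_NUMBER approach, assuming Python's `sqlite3` backed by SQLite 3.25 or newer (needed for window functions). The `trx` table, the third 37C1Z2990E5E0 row and the `HYPOTHETICAL01` row are made up for illustration. SQLite cannot DELETE through a CTE the way SQL Server can, so in this sketch the numbered rows are matched back to the table by `rowid`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trx (trxid TEXT, ts TEXT, code TEXT)")
conn.executemany(
    "INSERT INTO trx VALUES (?, ?, ?)",
    [
        ("37C1Z2990E5E0", "2014-11-12 10:10:26", "RT101018"),
        ("37C1Z2990E5E0", "2014-11-12 10:11:33", "RT161002"),
        ("37C1Z2990E5E0", "2014-11-12 10:15:00", "RT999999"),  # hypothetical third copy
        ("HYPOTHETICAL01", "2014-11-13 08:00:00", "RT000001"),  # hypothetical unique row
    ],
)

# Number the rows within each TRXID, earliest timestamp first (rn = 1),
# then delete every row numbered above 1. The window function has to sit in a
# derived table because it cannot appear directly in a WHERE clause.
conn.execute("""
    DELETE FROM trx
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY trxid ORDER BY ts) AS rn
            FROM trx
        )
        WHERE rn > 1
    )
""")

print(conn.execute("SELECT trxid, ts, code FROM trx ORDER BY trxid").fetchall())
# [('37C1Z2990E5E0', '2014-11-12 10:10:26', 'RT101018'), ('HYPOTHETICAL01', '2014-11-13 08:00:00', 'RT000001')]
```

Unlike the MAX-based delete, a single statement here reduces all three copies of 37C1Z2990E5E0 to the earliest one.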