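SQL Server 2005 - delete duplicate records while keeping the first record

I am currently working on a DataImport script that is intended to move data from one database to another. The main problem I have run into is that the table in question contains a lot of duplicate records, with the duplicated fields being product code, language, legislation, brand name, formula and version, i.e. we might have the following in the database: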
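My Test Product, English, UK, Test Brand, Test Formula, 1 (ID 1 - not included in the grouping)
My Test Product, English, UK, Test Brand, Test Formula, 1 (ID 2 - not included in the grouping)
My Test Product, English, UK, Test Brand, Test Formula, 1 (ID 3 - not included in the grouping)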
My Test Product, English, UK, Test Brand, Test Formula, 1 (ID 4 - not included in the grouping)
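As you can see, these records are identical in every respect. My problem is that, as part of the data load script, I want to delete the records with IDs 1, 2 and 3 while keeping the record with ID 4, because it is the most recent record and therefore the one I want to keep. To do this, I have written the following T-SQL script: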
    -- get the list of items where there is at least one duplicate
    DECLARE cDuplicateList CURSOR FOR
    SELECT productcode, languageid, legislationid, brandName, versionnumber, formulaid
    FROM allproducts
    GROUP BY productcode, languageid, legislationid, brandName, versionnumber, formulaid
    HAVING COUNT (*) > 1

    OPEN cDuplicateList

    FETCH cDuplicateList INTO @productCode, @languageId, @legislationId, @brandName, @versionNumber, @formulaId

    -- while there are still duplicates
    WHILE @@FETCH_STATUS=0
    BEGIN
        -- delete from the table where the product ID is in the sub-query, which contains all
        -- of the records apart from the last one
        DELETE FROM AllProducts
        WHERE productId IN
        (
            SELECT productId
            FROM allProducts
            WHERE productCode = @productCode
            AND (languageId = @languageId OR @languageId IS NULL)
            AND (legislationId = @legislationId OR @legislationId IS NULL)
            AND (brandName = @brandName OR @brandName IS NULL)
            AND (versionNumber = @versionNumber OR @versionNumber IS NULL)
            AND (formulaId = @formulaId OR @formulaId IS NULL)
            EXCEPT
            SELECT TOP 1 productId
            FROM allProducts
            WHERE productCode = @productCode
            AND (languageId = @languageId OR @languageId IS NULL)
            AND (legislationId = @legislationId OR @legislationId IS NULL)
            AND (brandName = @brandName OR @brandName IS NULL)
            AND (versionNumber = @versionNumber OR @versionNumber IS NULL)
            AND (formulaId = @formulaId OR @formulaId IS NULL)
        )

        FETCH cDuplicateList INTO @productCode, @languageId, @legislationId, @brandName, @versionNumber, @formulaId
    END
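Now, this does work; it is just incredibly slow, and I can't think of any simple way to make it faster. Does anyone have any ideas on how I can keep the same functionality but make it run faster?

The statement below deletes every duplicate except the row with the highest ID in each group. If you want to see what you are going to delete first, change DELETE to SELECT *:

    -- Table and column names follow the question's allproducts schema;
    -- productId is assumed to be the increasing ID column.
    WITH CTE AS
    (
        SELECT productId, productcode, languageid, legislationid, brandName, formulaid, versionnumber,
               RN = ROW_NUMBER()
                    OVER (
                        PARTITION BY productcode, languageid, legislationid, brandName, formulaid, versionnumber
                        ORDER BY productId DESC)
        FROM allproducts
    )
    DELETE FROM CTE WHERE RN > 1;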
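The preview-then-delete idea can be tried on a scratch table before touching real data. The following is only a sketch under stated assumptions: `#allproducts` is a hypothetical temporary stand-in for `allproducts`, the column types are guesses rather than the real schema, and `productId` is assumed to be an increasing identity column, so the highest value is the most recent row.

```sql
-- Scratch copy of the table; column types are assumptions, not the real schema.
CREATE TABLE #allproducts
(
    productId     INT IDENTITY(1, 1) PRIMARY KEY,
    productcode   VARCHAR(50) NOT NULL,
    languageid    INT NULL,
    legislationid INT NULL,
    brandName     VARCHAR(50) NULL,
    formulaid     INT NULL,
    versionnumber INT NULL
);

-- Four identical rows, mirroring IDs 1 to 4 in the question.
-- UNION ALL is used because SQL Server 2005 has no multi-row VALUES clause.
INSERT INTO #allproducts
    (productcode, languageid, legislationid, brandName, formulaid, versionnumber)
SELECT 'My Test Product', 1, 1, 'Test Brand', 1, 1 UNION ALL
SELECT 'My Test Product', 1, 1, 'Test Brand', 1, 1 UNION ALL
SELECT 'My Test Product', 1, 1, 'Test Brand', 1, 1 UNION ALL
SELECT 'My Test Product', 1, 1, 'Test Brand', 1, 1;

-- Step 1: preview. Lists the rows that would be removed; nothing is deleted.
WITH cte AS
(
    SELECT productId,
           rn = ROW_NUMBER() OVER (
                    PARTITION BY productcode, languageid, legislationid,
                                 brandName, formulaid, versionnumber
                    ORDER BY productId DESC)   -- rn = 1 is the newest row
    FROM #allproducts
)
SELECT productId, rn
FROM cte
WHERE rn > 1
ORDER BY productId;                            -- productId 1, 2 and 3

-- Step 2: the same CTE with DELETE in place of SELECT.
WITH cte AS
(
    SELECT productId,
           rn = ROW_NUMBER() OVER (
                    PARTITION BY productcode, languageid, legislationid,
                                 brandName, formulaid, versionnumber
                    ORDER BY productId DESC)
    FROM #allproducts
)
DELETE FROM cte
WHERE rn > 1;

-- Step 3: only the newest row of the group (productId 4) is left.
SELECT productId FROM #allproducts;

DROP TABLE #allproducts;
```

One design point worth checking against real data: PARTITION BY puts NULLs in the same group, as GROUP BY does, so rows with NULL in a key column are only treated as duplicates of other rows with NULL in that column. The cursor's `OR @languageId IS NULL` style predicates are looser, matching any value when the variable is NULL, so the two approaches may not delete exactly the same rows when NULLs are present.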
Possible duplicate of [How can I remove duplicate rows?](http://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows)