2015-07-09 87 views
8

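NOT EXISTS vs NOT IN: efficiency

I have always believed that NOT EXISTS was the way to go rather than NOT IN. However, I ran a comparison on a query I have been using, and the version with the NOT IN condition actually seems to execute faster. Any insight into why this might be the case, or whether I have simply been making a bad assumption up to this point, would be greatly appreciated!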

QUERY 1:

SELECT DISTINCT 
a.SFAccountID, a.SLXID, a.Name FROM [dbo].[Salesforce_Accounts] a WITH(NOLOCK) 
JOIN _SLX_AccountChannel b WITH(NOLOCK) 
ON a.SLXID = b.ACCOUNTID 
JOIN [dbo].[Salesforce_Contacts] c WITH(NOLOCK) 
ON a.SFAccountID = c.SFAccountID 
WHERE b.STATUS IN ('Active','Customer', 'Current') 
AND c.Primary__C = 0 
AND NOT EXISTS 
(
SELECT 1 FROM [dbo].[Salesforce_Contacts] c2 WITH(NOLOCK) 
WHERE a.SFAccountID = c2.SFAccountID 
AND c2.Primary__c = 1 
); 

QUERY 2:

SELECT 
DISTINCT 
a.SFAccountID FROM [dbo].[Salesforce_Accounts] a WITH(NOLOCK) 
JOIN _SLX_AccountChannel b WITH(NOLOCK) 
ON a.SLXID = b.ACCOUNTID 
JOIN [dbo].[Salesforce_Contacts] c WITH(NOLOCK) 
ON a.SFAccountID = c.SFAccountID 
WHERE b.STATUS IN ('Active','Customer', 'Current') 
AND c.Primary__C = 0 
AND a.SFAccountID NOT IN (SELECT SFAccountID FROM [dbo].[Salesforce_Contacts] WHERE Primary__c = 1 AND SFAccountID IS NOT NULL); 

Actual execution plan for query 1: Execution plan 1

Actual execution plan for query 2: Execution plan 2

TIME/IO statistics:

Query #1 (using NOT EXISTS):

SQL Server parse and compile time: 
    CPU time = 0 ms, elapsed time = 0 ms. 

SQL Server Execution Times: 
    CPU time = 0 ms, elapsed time = 0 ms. 
SQL Server parse and compile time: 
    CPU time = 532 ms, elapsed time = 533 ms. 
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Salesforce_Contacts'. Scan count 2, logical reads 3078, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'INFORMATION'. Scan count 1, logical reads 691, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'ACCOUNT'. Scan count 4, logical reads 567, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Salesforce_Accounts'. Scan count 1, logical reads 680, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times: 
    CPU time = 250 ms, elapsed time = 271 ms. 
SQL Server parse and compile time: 
    CPU time = 0 ms, elapsed time = 0 ms. 

SQL Server Execution Times: 
    CPU time = 0 ms, elapsed time = 0 ms. 

Query #2 (using NOT IN):

SQL Server parse and compile time: 
    CPU time = 0 ms, elapsed time = 0 ms. 

SQL Server Execution Times: 
    CPU time = 0 ms, elapsed time = 0 ms. 
SQL Server parse and compile time: 
    CPU time = 500 ms, elapsed time = 500 ms. 
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Salesforce_Contacts'. Scan count 2, logical reads 3079, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'INFORMATION'. Scan count 1, logical reads 691, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'ACCOUNT'. Scan count 4, logical reads 567, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Salesforce_Accounts'. Scan count 1, logical reads 680, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times: 
    CPU time = 157 ms, elapsed time = 166 ms. 
SQL Server parse and compile time: 
    CPU time = 0 ms, elapsed time = 0 ms. 

SQL Server Execution Times: 
    CPU time = 0 ms, elapsed time = 0 ms. 
+1

See if this helps: http://stackoverflow.com/questions/173041/not-in-vs-not-exists –

+2

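(1) The actual plans look almost identical to me. (2) You need to measure the actual performance of the queries, not the plans' estimated performance. –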

+0

My experience with very large databases has led me to prefer 'IN' over 'EXISTS'. I have also stopped using 'CTE's on their own and use temp tables instead – JamieD77

Answers

1

Try

SELECT DISTINCT a.SFAccountID, a.SLXID, a.Name 
    FROM [dbo].[Salesforce_Accounts] a WITH(NOLOCK) 
    JOIN _SLX_AccountChannel b WITH(NOLOCK) 
    ON a.SLXID = b.ACCOUNTID 
    AND b.STATUS IN ('Active','Customer', 'Current') 
    JOIN [dbo].[Salesforce_Contacts] c WITH(NOLOCK) 
    ON a.SFAccountID = c.SFAccountID 
    AND c.Primary__C = 0 
    LEFT JOIN [dbo].[Salesforce_Contacts] c2 WITH(NOLOCK) 
    on c2.SFAccountID = a.SFAccountID 
    AND c2.Primary__c = 1 
WHERE c2.SFAccountID is null 
+0

Putting this kind of condition in the 'join' rather than the 'where' doesn't make much difference in most cases – JamieD77

+2

@JamieD77 When it does make a difference, it is for the better. I do this for a living. – Paparazzi

+0

This one is faster. Could I get a brief explanation of why it is faster? Thanks! – tchock

1

As far as I know, a NOT IN behaves the same way two nested for loops would.

So, assuming you have two tables, table (1000 records) and tabla (2000 records),

select * from table where table.field not in (select field from tabla) 

is like doing

for (int i = 0; i < 1000; i++) { 
    for (int j = 0; j < 2000; j++) { 
    } 
} 

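which is 1000 * 2000 = 2 million iterations.

The left join with the "tabla.field IS NULL" trick does, again as far as I understand it, only 2000 operations.

Use the left join.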
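To make the arithmetic above concrete, here is a minimal sketch in Python that counts the work done by a naive nested loop and by a hash-based anti-join over the same 1000 and 2000 values. The function names and the sample data are made up for illustration. It says nothing about which strategy a real optimizer picks for NOT IN or for LEFT JOIN; an optimizer is free to use the hash strategy for either form.

```python
# Count the work for two ways of finding rows of `table`
# whose value does not appear in `tabla`.

def nested_loop_anti_join(table, tabla):
    """Naive evaluation: compare every outer row with every inner row."""
    comparisons = 0
    result = []
    for value in table:
        found = False
        for other in tabla:
            comparisons += 1
            if value == other:
                found = True  # keep scanning, like the plain double loop
        if not found:
            result.append(value)
    return result, comparisons


def hash_anti_join(table, tabla):
    """Hash evaluation: build a set once, then probe it once per outer row."""
    operations = 0
    seen = set()
    for other in tabla:
        operations += 1  # one insert per inner row
        seen.add(other)
    result = []
    for value in table:
        operations += 1  # one probe per outer row
        if value not in seen:
            result.append(value)
    return result, operations


table = list(range(1000))       # 1000 records
tabla = list(range(500, 2500))  # 2000 records, half of `table` overlaps

nested_rows, nested_ops = nested_loop_anti_join(table, tabla)
hash_rows, hash_ops = hash_anti_join(table, tabla)

print(nested_ops)                # 2000000
print(hash_ops)                  # 3000
print(nested_rows == hash_rows)  # True
```

Both functions return the same rows; only the amount of work differs, so the cost depends on the join strategy rather than on how the query is spelled.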

+2

In a place with no query optimizer, where everything is in memory, sure. In the real world, not so much... –

+1

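Run the experiment in a local database. Populate it generously, I would say 100000 records per table. Measure the time for both options (NOT IN and left join). What do you get? –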
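One way to run an experiment like this is sketched below, with Python's built-in sqlite3 module standing in for a local database. The table names (tbl, tabla), the index on the inner column, and the half-overlapping data are assumptions of this sketch. SQLite's planner is not SQL Server's, so the timings only show how to set up the measurement and do not predict what SQL Server will do.

```python
import sqlite3
import time


def build_database(rows_per_table):
    """Create two in-memory tables whose values overlap by half."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (field INTEGER NOT NULL)")
    conn.execute("CREATE TABLE tabla (field INTEGER NOT NULL)")
    # Assumption: the inner column is indexed, as a lookup column often is.
    conn.execute("CREATE INDEX ix_tabla_field ON tabla (field)")
    conn.executemany("INSERT INTO tbl (field) VALUES (?)",
                     ((i,) for i in range(rows_per_table)))
    offset = rows_per_table // 2
    conn.executemany("INSERT INTO tabla (field) VALUES (?)",
                     ((i,) for i in range(offset, offset + rows_per_table)))
    conn.commit()
    return conn


QUERIES = {
    "NOT IN": (
        "SELECT COUNT(*) FROM tbl "
        "WHERE field NOT IN (SELECT field FROM tabla)"
    ),
    "LEFT JOIN": (
        "SELECT COUNT(*) FROM tbl "
        "LEFT JOIN tabla ON tabla.field = tbl.field "
        "WHERE tabla.field IS NULL"
    ),
}


def run_experiment(rows_per_table=100_000):
    """Return {query name: (row count, elapsed seconds)} for both forms."""
    conn = build_database(rows_per_table)
    results = {}
    for name, sql in QUERIES.items():
        start = time.perf_counter()
        count = conn.execute(sql).fetchone()[0]
        results[name] = (count, time.perf_counter() - start)
    conn.close()
    return results


results = run_experiment()
for name, (count, elapsed) in results.items():
    print(f"{name}: {count} rows in {elapsed:.3f} s")
```

Both forms should report the same row count. Repeat the run several times before reading anything into the elapsed times.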

0

This assumes you want to find accounts with no primary contact, and that an account can have only one primary contact

SELECT a.SFAccountID, a.SLXID, a.Name 
FROM [dbo].[Salesforce_Accounts] a 
     LEFT JOIN [dbo].[Salesforce_Contacts] c ON a.SFAccountID = c.SFAccountID AND c.Primary__C = 1 
WHERE 
     EXISTS (SELECT * 
       FROM _SLX_AccountChannel b 
       WHERE b.ACCOUNTID = a.SLXID 
        AND b.STATUS IN ('Active', 'Customer', 'Current')) 
     AND c.SFContactID IS NULL 

If you want accounts that have contacts but no primary contact, you can use

SELECT 
    a.SFAccountID , 
    a.SLXID , 
    a.Name 
FROM 
    [dbo].[Salesforce_Accounts] a 
WHERE 
    a.SFAccountID IN (SELECT SFAccountID 
        FROM [Salesforce_Contacts] 
        GROUP BY SFAccountID 
        HAVING SUM(CAST(Primary__c AS INT)) = 0) 

    AND a.SLXID IN (SELECT ACCOUNTID 
        FROM _SLX_AccountChannel 
        WHERE [STATUS] IN ('Active', 'Customer', 'Current')) 
+0

Hey, trash talker. -1: you completely missed [dbo].[Salesforce_Contacts].Primary__C = 0 – Paparazzi

+0

How do you know he only wants accounts that have one non-primary contact? – JamieD77

+1

There is "AND c.Primary__C = 0". I don't know what he wants, but I do know what the query does. – Paparazzi

1

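I think missing indexes cause the difference between the EXISTS() and IN operations.

Although the question does not ask for a better query, for my part I would try to avoid the ambiguity, like this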

SELECT 
    a.SFAccountID, a.SLXID, a.Name 
FROM 
    [dbo].[Salesforce_Accounts] a WITH(NOLOCK) 
    CROSS APPLY 
    (
     SELECT SFAccountID 
     FROM [dbo].[Salesforce_Contacts] WITH(NOLOCK) 
     WHERE SFAccountID = a.SFAccountID 
     GROUP BY SFAccountID 
     HAVING MAX(Primary__C + 0) = 0 -- Assume Primary__C is a bit value 
    ) b 
WHERE 
    -- Actually it is the filtering condition for account channel 
    EXISTS 
    (
     SELECT * FROM _SLX_AccountChannel WITH(NOLOCK) 
     WHERE ACCOUNTID = a.SLXID AND STATUS IN ('Active','Customer', 'Current') 
    ) 
1

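The question is: "Why does NOT IN seem to be faster than NOT EXISTS?"

My answer is: it appears to be faster, but it is the same (in this case).

Did you actually measure the time of both queries and confirm that there is a difference?

Or did you just look at the execution plans?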

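As I understand it, the query cost you see in the screenshots (53% vs 47%) is:

  • an estimated query cost, even though the plan is an actual plan;
  • a query cost, not a time, combined from CPU and IO "costs".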

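It seems that in this particular case the query optimizer generated almost identical plans for both queries. The estimated row counts for some operators in the plans are most likely slightly different, but the actual performance is the same because the plan shape is the same. If the estimated row counts differ, that would produce the difference in estimated query cost that you see.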
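The two forms can only be treated as the same query when the NOT IN subquery cannot return NULL, which is what the "SFAccountID IS NOT NULL" filter in Query 2 guarantees. The sketch below shows that difference on a toy schema, using Python's sqlite3 module as a stand-in. The table names, column names and rows are invented for illustration. NULL handling for NOT IN follows standard SQL here, but nothing in this sketch says anything about SQL Server's plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER);
    CREATE TABLE contacts (account_id INTEGER, is_primary INTEGER);
    INSERT INTO accounts VALUES (1), (2), (3);
    -- Account 1 has a primary contact; one orphan contact has no account.
    INSERT INTO contacts VALUES (1, 1), (2, 0), (NULL, 1);
""")


def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]


not_exists = ids("""
    SELECT a.id FROM accounts a
    WHERE NOT EXISTS (SELECT 1 FROM contacts c
                      WHERE c.account_id = a.id AND c.is_primary = 1)
    ORDER BY a.id""")

# The subquery yields (1, NULL): "id NOT IN (1, NULL)" is never true.
not_in_unfiltered = ids("""
    SELECT a.id FROM accounts a
    WHERE a.id NOT IN (SELECT account_id FROM contacts
                       WHERE is_primary = 1)
    ORDER BY a.id""")

# Same filter as Query 2: NULLs are removed from the subquery.
not_in_filtered = ids("""
    SELECT a.id FROM accounts a
    WHERE a.id NOT IN (SELECT account_id FROM contacts
                       WHERE is_primary = 1 AND account_id IS NOT NULL)
    ORDER BY a.id""")

print(not_exists)         # [2, 3]
print(not_in_unfiltered)  # []
print(not_in_filtered)    # [2, 3]
```

With the NULL filter in place the NOT IN and NOT EXISTS results match, which is consistent with the optimizer producing the same plan shape for both.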

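To see the differences between the plans (if any), I would use a tool like SQL Sentry Plan Explorer. It shows more detail and lets you compare all aspects of the queries more easily.

Rewriting the query to make it faster is a different question, and I am not trying to answer it here.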

0

You can avoid touching/joining Salesforce_Contacts more than once. This is more compact and faster:

SELECT a.SFAccountID, a.SLXID, a.Name 
FROM [dbo].[Salesforce_Accounts] a WITH(NOLOCK) 
JOIN _SLX_AccountChannel b WITH(NOLOCK) 
    ON a.SLXID = b.ACCOUNTID 
JOIN [dbo].[Salesforce_Contacts] c WITH(NOLOCK) 
    ON a.SFAccountID = c.SFAccountID 
WHERE b.STATUS IN ('Active','Customer', 'Current') 
GROUP BY a.SFAccountID, a.SLXID, a.Name 
HAVING MAX(c.Primary__C) = 0 
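A quick way to check that this single-pass GROUP BY form returns the same accounts as the original NOT EXISTS query is to run both on a small data set. The sketch below does that with Python's sqlite3 module. The schema and rows are invented, the account channel join is left out to keep it short, and is_primary is assumed to be a non-null 0/1 integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contacts (account_id INTEGER NOT NULL,
                           is_primary INTEGER NOT NULL);
    INSERT INTO accounts VALUES (1, 'has primary'), (2, 'only non-primary'),
                                (3, 'no contacts'), (4, 'two non-primary');
    INSERT INTO contacts VALUES (1, 1), (1, 0), (2, 0), (4, 0), (4, 0);
""")


def account_ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# Shape of Query 1: join to a non-primary contact, then exclude accounts
# that also have a primary contact.
with_not_exists = account_ids("""
    SELECT DISTINCT a.id
    FROM accounts a
    JOIN contacts c ON c.account_id = a.id
    WHERE c.is_primary = 0
      AND NOT EXISTS (SELECT 1 FROM contacts c2
                      WHERE c2.account_id = a.id AND c2.is_primary = 1)
    ORDER BY a.id""")

# Shape of the GROUP BY rewrite: one pass over contacts per account.
with_group_by = account_ids("""
    SELECT a.id
    FROM accounts a
    JOIN contacts c ON c.account_id = a.id
    GROUP BY a.id
    HAVING MAX(c.is_primary) = 0
    ORDER BY a.id""")

print(with_not_exists)  # [2, 4]
print(with_group_by)    # [2, 4]
```

If Primary__C turns out to be a bit column (the question does not say), SQL Server's MAX does not accept it directly, so the column would need a cast or the "+ 0" trick used in another answer.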

The difference between EXISTS and IN should not be overlooked.