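Big performance loss when using IN (subquery). Why?

I'm using SQL Server 2005, and I've noticed something odd when I filter results with a subquery inside an IN clause. For example, this is my current query, which takes about 70 seconds to run on average: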

select Phone, ZipCode, sum(Calls) as Calls, sum(Sales) as Sales 
from Archive 
where CustomerID = 20 
and ReportDate = '2/3/2011' 
and Phone in (
    select Phone 
    from PlanDetails 
    where Phone is not null 
    and Length is not null 
    and PlannedImp > 0 
    and CustomerID = 20 
    and (StatusID <> 2 and StatusID <> 7) 
    and SubcategoryID = 88 
) 
group by Phone, ZipCode

However, if I split it into two separate queries, each one takes less than 1 second to run.

select Phone 
from PlanDetails 
where Phone is not null 
and Length is not null 
and PlannedImp > 0 
and CustomerID = 20 
and (StatusID <> 2 and StatusID <> 7) 
and SubcategoryID = 88

and

select Phone, ZipCode, sum(Calls) as Calls, sum(Sales) as Sales 
from Archive 
where CustomerID = 20 
and ReportDate = '2/3/2011' 
group by Phone, ZipCode

Finally, if I do this, it returns the same results as the first query, but in about 2-3 seconds:

select Phone 
into #tempTable 
from PlanDetails 
where Phone is not null 
and Length is not null 
and PlannedImp > 0 
and CustomerID = 20 
and (StatusID <> 2 and StatusID <> 7) 
and SubcategoryID = 88 

select Phone, ZipCode, sum(Calls) as Calls, sum(Sales) as Sales 
from Archive 
where CustomerID = 20 
and ReportDate = '2/3/2011' 
and Phone in (
    select Phone 
    from #tempTable 
) 
group by Phone, ZipCode

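Over the past few weeks I've noticed that it isn't only this query that is slow: any query that uses a (somewhat complex) subquery inside an IN clause ruins performance. What is the reason for this?

The only indexes available to these queries are nonclustered indexes on CustomerID on both tables. I looked at the execution plans for the slow and the fast queries, and in both the nonclustered index seek on the Archive table takes by far the highest percentage of the cost (80-90%). The only difference is that this step has a CPU cost of 7.1 in the slow query and 1.7 in the fast one.

Source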

2011-02-04 Jason

When this kind of thing happens (a bad query plan), running sp_updatestats can sometimes fix the problem. – Magnus 2011-02-06 23:39:17
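
As a minimal sketch of the suggestion in this comment: `sp_updatestats` refreshes statistics for the whole current database, while `UPDATE STATISTICS` targets a single table. The table names are the ones from the question; whether stale statistics are actually the cause here is an assumption that would need to be checked against the plan.

```sql
-- Refresh statistics on every user table in the current database.
exec sp_updatestats;

-- Or refresh only the two tables involved, reading all rows
-- instead of a sample (slower, but more accurate).
update statistics Archive with fullscan;
update statistics PlanDetails with fullscan;
```

After updating statistics, rerun the slow query and compare the new plan with the old one; cached plans that depend on the refreshed statistics get recompiled.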

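It depends on the database system, version, settings, and so on, but what usually happens is that the database fails (or refuses) to cache the inner query, so it is executed again on every iteration of the outer query. You are moving the problem from the O(n) efficiency class to O(n^2).

Source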

2011-02-04 17:12:01 TheBuzzSaw

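That is exactly what was happening. After viewing the execution plan as a data table rather than as a graphical plan, I noticed that the subquery approach showed far more executions (~326,000) than the inner join (~14,244). – Jason 2011-02-04 17:48:17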
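
One way to get the plan as a table with per-operator execution counts is `SET STATISTICS PROFILE`, sketched below with the slow query from the question. This is an assumption about how the numbers above could be reproduced, not necessarily the method Jason used.

```sql
set statistics profile on;

-- The query under investigation. After its normal result set, SQL Server
-- returns a second result set with one row per plan operator, including
-- the Rows and Executes columns.
select Phone, ZipCode, sum(Calls) as Calls, sum(Sales) as Sales
from Archive
where CustomerID = 20
and ReportDate = '2/3/2011'
and Phone in (
    select Phone
    from PlanDetails
    where Phone is not null
    and Length is not null
    and PlannedImp > 0
    and CustomerID = 20
    and (StatusID <> 2 and StatusID <> 7)
    and SubcategoryID = 88
)
group by Phone, ZipCode;

set statistics profile off;
```

A very large Executes value on the operator that reads PlanDetails suggests that the inner side is being re-evaluated once per outer row.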

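Quoting IN vs. JOIN vs. EXISTS:

We can now see that, contrary to popular opinion, IN / EXISTS queries are no less efficient than a JOIN query in SQL Server.

In fact, JOIN queries are less efficient on non-indexed tables, because the Semi Join methods allow aggregation and matching against a single hash table, whereas a JOIN needs two steps to do these two operations.

Beyond that, indexes and current table statistics play an important role in how the optimizer decides to execute the query.

Source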
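
To illustrate the point about indexes: the question states that the only indexes are nonclustered ones on CustomerID, so the optimizer has little to work with. The following is a hedged sketch of indexes that would cover the predicates in the question's queries. The index names are hypothetical, and whether these indexes help on the real data has not been measured.

```sql
-- Hypothetical index name. Key columns match the filters on Archive
-- (CustomerID, ReportDate) and the Phone lookup; the included columns
-- let the query be answered from the index alone.
create nonclustered index IX_Archive_CustomerID_ReportDate_Phone
on Archive (CustomerID, ReportDate, Phone)
include (ZipCode, Calls, Sales);

-- Hypothetical index name. Supports the equality filters used in the
-- subquery and returns Phone without touching the base table for it.
create nonclustered index IX_PlanDetails_CustomerID_SubcategoryID_Phone
on PlanDetails (CustomerID, SubcategoryID, Phone);
```

The `include` clause is available from SQL Server 2005 onward, which matches the version mentioned in the question.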

2011-02-04 17:13:26

What happens if you rewrite the query with a join?

select a.Phone, a.ZipCode, sum(a.Calls) as Calls, sum(a.Sales) as Sales 
from Archive a 
    inner join PlanDetails pd 
     on a.CustomerID = pd.CustomerID 
      and a.Phone = pd.Phone 
where a.CustomerID = 20 
    and a.ReportDate = '2/3/2011' 
    and pd.Length is not null 
    and pd.PlannedImp > 0 
    and (pd.StatusID <> 2 and pd.StatusID <> 7) 
    and pd.SubcategoryID = 88 
group by a.Phone, a.ZipCode

Source

2011-02-04 17:14:01

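This runs very quickly (<2 seconds), but I'd really like to figure out why IN (subquery) messes everything up, so that I know when to avoid these kinds of queries. – Jason 2011-02-04 17:32:12

I suggest 2 solutions:
1. Try rewriting the query to use EXISTS instead of IN. If you are on an older version of SQL Server, it may help (if my memory serves me well, before SQL Server 2005 EXISTS and IN generated different execution plans).
2. Try using an INNER JOIN (you could also use a CTE):

select Archive.Phone, Archive.ZipCode, sum(Calls) as Calls, sum(Sales) as Sales 
from Archive 
INNER JOIN 
(
    select DISTINCT Phone -- DISTINCT to avoid duplicates 
    from PlanDetails 
    where Phone is not null 
    and Length is not null 
    and PlannedImp > 0 
    and CustomerID = 20 
    and (StatusID <> 2 and StatusID <> 7) 
    and SubcategoryID = 88 
) XX ON (XX.Phone = Archive.Phone) 
where Archive.CustomerID = 20 and Archive.ReportDate = '2/3/2011' 
group by Archive.Phone, Archive.ZipCode

Personally, I expect the second approach to give you better results.

Source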
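
The first suggestion is not spelled out in the answer, so here is a minimal sketch of what an EXISTS rewrite of the question's query could look like. The aliases `a` and `pd` are introduced for illustration only. The explicit `Phone is not null` filter is dropped because the equality `pd.Phone = a.Phone` can never match a NULL.

```sql
select a.Phone, a.ZipCode, sum(a.Calls) as Calls, sum(a.Sales) as Sales
from Archive a
where a.CustomerID = 20
and a.ReportDate = '2/3/2011'
and exists (
    -- Correlated check: at least one qualifying PlanDetails row
    -- must exist for this Archive row's phone number.
    select 1
    from PlanDetails pd
    where pd.Phone = a.Phone
    and pd.Length is not null
    and pd.PlannedImp > 0
    and pd.CustomerID = 20
    and (pd.StatusID <> 2 and pd.StatusID <> 7)
    and pd.SubcategoryID = 88
)
group by a.Phone, a.ZipCode
```

Unlike a plain inner join, EXISTS cannot inflate the sums when PlanDetails contains several rows for the same phone, so no DISTINCT is needed.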

2011-02-04 17:21:33 a1ex07

Strangely, this takes about the same time as the first query. I think that is because it selects from PlanDetails inside a subquery. – Jason 2011-02-04 17:30:47
