2016-11-09 45 views
0

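I'm new to performance issues, so I'm not sure what my approach should be. How can I improve SQL Server performance for a Hash Match outer join?

This is the query, which takes over 7 minutes to run.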

INSERT INTO SubscriberToEncounterMapping(PatientEncounterID, InsuranceSubscriberID) 
    SELECT 
     PV.PatientVisitId AS PatientEncounterID, 
     InsSub.InsuranceSubscriberID 
    FROM 
     DB1.dbo.PatientVisit PV 
    JOIN 
     DB1.dbo.PatientVisitInsurance PVI ON PV.PatientVisitId = PVI.PatientVisitId 
    JOIN 
     DB1.dbo.PatientInsurance PatIns on PatIns.PatientInsuranceId = PVI.PatientInsuranceId 
    JOIN 
     DB1.dbo.PatientProfile PP On PP.PatientProfileId = PatIns.PatientProfileId 
    LEFT OUTER JOIN 
     DB1.dbo.Guarantor G ON PatIns.PatientProfileId = G.PatientProfileId 
    JOIN 
     Warehouse.dbo.InsuranceSubscriber InsSub ON InsSub.InsuranceCarriersID = PatIns.InsuranceCarriersId 
         AND InsSub.OrderForClaims = PatIns.OrderForClaims 
         AND ((InsSub.GuarantorID = G.GuarantorId) OR (InsSub.GuarantorID IS NULL AND G.GuarantorId IS NULL)) 
    JOIN 
     Warehouse.dbo.Encounter E ON E.PatientEncounterID = PV.PatientVisitId  

The execution plan shows a Hash Match (Right Outer Join) with a cost of 89% in the query:

[screenshot of the execution plan]

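There is no right outer join in the query, so I don't understand where the problem is.

How can I make the query more efficient?

Here are the Hash Match details: [screenshot of the operator properties]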

+0

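First of all: your 'SELECT' list only uses columns from the tables aliased 'PV' and 'InsSub'. Do you *really* need to join all those tables to get those two pieces of information?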

+0

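Can you show the details of the hash match? What is the probe, and what is the output? It's not clear from the screenshot. I'd guess this predicate is causing your problem: '(InsSub.GuarantorID = G.GuarantorId) OR (InsSub.GuarantorID IS NULL AND G.GuarantorId IS NULL)'. You may want to consider using two queries and combining the results. Quite often, an 'OR' predicate like this leads to a suboptimal plan, and two separate queries can make better use of indexes. – GarethD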
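As a side note on reading the operator: a Hash Match has a build (top) input and a probe (bottom) input, and the optimizer is free to swap the two sides of a logical join, so a query written with LEFT OUTER JOIN can show up in the plan as a Right Outer Join. Below is a minimal sketch for capturing the actual plan so the build and probe details can be inspected. The table variables @PatIns and @Guarantor and their rows are hypothetical stand-ins rather than the real schema, and OPTION (HASH JOIN) is there only to make the toy query use a hash join.

```sql
-- Hypothetical toy tables standing in for PatientInsurance and Guarantor.
DECLARE @PatIns TABLE (PatientProfileId int NOT NULL);
DECLARE @Guarantor TABLE (GuarantorId int NOT NULL, PatientProfileId int NOT NULL);

INSERT INTO @PatIns (PatientProfileId) VALUES (1), (2), (3);
INSERT INTO @Guarantor (GuarantorId, PatientProfileId) VALUES (10, 1), (20, 2);

-- Return the actual execution plan as XML alongside the query results.
SET STATISTICS XML ON;

SELECT p.PatientProfileId, g.GuarantorId
FROM @PatIns AS p
LEFT OUTER JOIN @Guarantor AS g
    ON g.PatientProfileId = p.PatientProfileId
OPTION (HASH JOIN);  -- hint used only to get a hash join on tiny inputs

SET STATISTICS XML OFF;
```

In the resulting plan, the Hash Match node's properties list the Hash Keys Build and Hash Keys Probe columns plus any Probe Residual, which is the information asked for above. Whether the operator is labelled Left or Right Outer Join depends on which input the optimizer picks as the build side.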

+0

@GarethD Maybe use EXISTS in the WHERE clause instead of those two predicates in the join? – dfundako

Answers
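The comment does not spell out the rewrite. One related T-SQL idiom, which is not necessarily what was meant here, is a NULL-safe comparison written as EXISTS over INTERSECT: set operators treat two NULLs as equal, so a single predicate can replace the 'a = b OR (a IS NULL AND b IS NULL)' pair. The sketch below uses hypothetical table variables and rows; whether it produces a better plan for the real query is an assumption that would have to be checked against the actual execution plan.

```sql
-- Hypothetical toy tables; only the GuarantorID comparison is modelled.
DECLARE @InsSub TABLE (InsuranceSubscriberID int NOT NULL, GuarantorID int NULL);
DECLARE @Guarantor TABLE (PatientProfileId int NOT NULL, GuarantorId int NULL);

INSERT INTO @InsSub (InsuranceSubscriberID, GuarantorID) VALUES (1, 10), (2, NULL);
INSERT INTO @Guarantor (PatientProfileId, GuarantorId) VALUES (100, 10), (200, NULL);

-- NULL-safe equality: matches 10 = 10 and also NULL with NULL.
SELECT s.InsuranceSubscriberID, g.PatientProfileId
FROM @InsSub AS s
JOIN @Guarantor AS g
    ON EXISTS (SELECT s.GuarantorID INTERSECT SELECT g.GuarantorId);
-- Returns (1, 100) and (2, 200).
```

Applied to the original query, the same EXISTS ... INTERSECT expression would take the place of the OR predicate in the join to InsuranceSubscriber, leaving the other join conditions unchanged.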

1

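To elaborate on my comment, you could try splitting it into two queries: the first matches on GuarantorID, and the second matches when it is NULL in InsuranceSubscriber and either NULL in Guarantor or the record is missing from Guarantor entirely: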

INSERT INTO SubscriberToEncounterMapping(PatientEncounterID, InsuranceSubscriberID) 
SELECT PV.PatientVisitId AS PatientEncounterID, InsSub.InsuranceSubscriberID 
FROM DB1.dbo.PatientVisit PV 
     JOIN DB1.dbo.PatientVisitInsurance PVI 
      ON PV.PatientVisitId = PVI.PatientVisitId 
     JOIN DB1.dbo.PatientInsurance PatIns 
      ON PatIns.PatientInsuranceId = PVI.PatientInsuranceId 
     JOIN DB1.dbo.PatientProfile PP 
      ON PP.PatientProfileId = PatIns.PatientProfileId 
     JOIN DB1.dbo.Guarantor G 
      ON PatIns.PatientProfileId = G.PatientProfileId 
     JOIN Warehouse.dbo.InsuranceSubscriber InsSub 
      ON InsSub.InsuranceCarriersID = PatIns.InsuranceCarriersId 
      AND InsSub.OrderForClaims = PatIns.OrderForClaims 
      AND InsSub.GuarantorID = G.GuarantorId 
     JOIN Warehouse.dbo.Encounter E 
      ON E.PatientEncounterID = PV.PatientVisitId 
UNION ALL 
SELECT PV.PatientVisitId AS PatientEncounterID, InsSub.InsuranceSubscriberID 
FROM DB1.dbo.PatientVisit PV 
     JOIN DB1.dbo.PatientVisitInsurance PVI 
      ON PV.PatientVisitId = PVI.PatientVisitId 
     JOIN DB1.dbo.PatientInsurance PatIns 
      ON PatIns.PatientInsuranceId = PVI.PatientInsuranceId 
     JOIN DB1.dbo.PatientProfile PP 
      ON PP.PatientProfileId = PatIns.PatientProfileId 
     JOIN Warehouse.dbo.InsuranceSubscriber InsSub 
      ON InsSub.InsuranceCarriersID = PatIns.InsuranceCarriersId 
      AND InsSub.OrderForClaims = PatIns.OrderForClaims 
      AND InsSub.GuarantorID IS NULL 
     JOIN Warehouse.dbo.Encounter E 
      ON E.PatientEncounterID = PV.PatientVisitId 
WHERE NOT EXISTS 
     ( SELECT 1 
      FROM DB1.dbo.Guarantor G 
      WHERE PatIns.PatientProfileId = G.PatientProfileId 
      AND  G.GuarantorId IS NOT NULL -- InsSub.GuarantorID is always NULL in this branch, so test the Guarantor row instead
     ); 
+0

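This is definitely a lot faster! But it returns different records than the original query, so I'll have to hold off for now. Still, this is definitely the way to go!!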
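One way to track down which records differ is to diff the two result sets with EXCEPT in both directions. The sketch below assumes the output of each version of the query has first been captured into a table; @Original and @Rewritten and their rows are hypothetical placeholders for those captured results.

```sql
-- Hypothetical captured outputs of the original and the rewritten query.
DECLARE @Original  TABLE (PatientEncounterID int, InsuranceSubscriberID int);
DECLARE @Rewritten TABLE (PatientEncounterID int, InsuranceSubscriberID int);

INSERT INTO @Original  VALUES (1, 100), (2, 200);
INSERT INTO @Rewritten VALUES (1, 100), (2, 200), (3, 300);

-- Rows present in one result set but not in the other.
SELECT 'only in original' AS Source, d.*
FROM (SELECT * FROM @Original EXCEPT SELECT * FROM @Rewritten) AS d
UNION ALL
SELECT 'only in rewrite', d.*
FROM (SELECT * FROM @Rewritten EXCEPT SELECT * FROM @Original) AS d;

-- EXCEPT removes duplicates, so compare the row counts as well.
SELECT (SELECT COUNT(*) FROM @Original)  AS OriginalRows,
       (SELECT COUNT(*) FROM @Rewritten) AS RewrittenRows;
```

With the sample rows above, the first query reports (3, 300) as 'only in rewrite'. If the EXCEPT output is empty but the counts differ, the difference is in duplicate rows rather than in distinct values.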

-2

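I would reorder the joins based on each join's ability to reduce the number of records returned. Whichever join reduces the number of records returned will improve efficiency. Then perform the outer join. Also, table locking can always be an issue, so add (NOLOCK) to prevent records from being locked.

Maybe something like this will work with a little tweaking.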

INSERT INTO SubscriberToEncounterMapping (
    PatientEncounterID 
    , InsuranceSubscriberID 
    ) 
SELECT PV.PatientVisitId AS PatientEncounterID 
    , InsSub.InsuranceSubscriberID 
FROM DB1.dbo.PatientVisit PV WITH (NOLOCK) 
INNER JOIN Warehouse.dbo.Encounter E WITH (NOLOCK) 
    ON E.PatientEncounterID = PV.PatientVisitId 
INNER JOIN DB1.dbo.PatientVisitInsurance PVI WITH (NOLOCK) 
    ON PV.PatientVisitId = PVI.PatientVisitId 
INNER JOIN DB1.dbo.PatientInsurance PatIns WITH (NOLOCK) 
    ON PatIns.PatientInsuranceId = PVI.PatientInsuranceId 
INNER JOIN DB1.dbo.PatientProfile PP WITH (NOLOCK) 
    ON PP.PatientProfileId = PatIns.PatientProfileId 
INNER JOIN Warehouse.dbo.InsuranceSubscriber InsSub WITH (NOLOCK) 
    ON InsSub.InsuranceCarriersID = PatIns.InsuranceCarriersId 
     AND InsSub.OrderForClaims = PatIns.OrderForClaims 
LEFT JOIN DB1.dbo.Guarantor G WITH (NOLOCK) 
    ON PatIns.PatientProfileId = G.PatientProfileId 
     AND (
      (InsSub.GuarantorID = G.GuarantorId) 
      OR (
       InsSub.GuarantorID IS NULL 
       AND G.GuarantorId IS NULL 
       ) 
      ) 
+1

How does adding NOLOCK affect the hash join operator in the execution plan? – dfundako

+1

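The order in which joins are written has no bearing on the order in which they are executed (unless you use 'OPTION (FORCE ORDER)'), so this makes no difference. You may also want to read [Bad habits: Putting NOLOCK everywhere](https://blogs.sentryone.com/aaronbertrand/bad-habits-nolock-everywhere/). It is not a magic performance fix and should be used sparingly, generally only by people who understand and are aware of the risks. – GarethD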
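To illustrate the point about written order, here is a small sketch using hypothetical table variables @A, @B and @C. The first two queries are logically the same inner join written in different orders, so the optimizer is free to choose the same plan for both. The third adds the FORCE ORDER query hint, which asks SQL Server to keep the join order as written.

```sql
-- Hypothetical toy tables.
DECLARE @A TABLE (Id int PRIMARY KEY);
DECLARE @B TABLE (Id int PRIMARY KEY);
DECLARE @C TABLE (Id int PRIMARY KEY);

INSERT INTO @A (Id) VALUES (1), (2), (3);
INSERT INTO @B (Id) VALUES (2), (3);
INSERT INTO @C (Id) VALUES (3);

-- Written order A, B, C.
SELECT a.Id
FROM @A AS a
JOIN @B AS b ON b.Id = a.Id
JOIN @C AS c ON c.Id = a.Id;

-- Same logical query, written order C, B, A.
SELECT a.Id
FROM @C AS c
JOIN @B AS b ON b.Id = c.Id
JOIN @A AS a ON a.Id = c.Id;

-- Only with this hint is the written join order preserved.
SELECT a.Id
FROM @A AS a
JOIN @B AS b ON b.Id = a.Id
JOIN @C AS c ON c.Id = a.Id
OPTION (FORCE ORDER);
```

All three queries return the same single row (Id = 3). Comparing their actual plans is the way to see whether the written order changed anything.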

+0

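I have found that join order does matter. If you leave the optimizer to do its job, it may or may not optimize the joins on its own. Agreed that NOLOCK may not be needed or ideal, but if something is locked, the query will run faster by not waiting on the locks. If it doesn't help, remove the hints. The hash match will always be there, but reducing the size of the record sets going into the operation should help. – KH1229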