2010-01-06 161 views
1

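SQLite query runs 10 times slower than the MS Access query

I have an 800MB MS Access database that I migrated to SQLite (the SQLite database is about 330MB after migration). The structure of the database is as follows: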

Occurrence has 1,600,000 records. The table looks like this:

CREATE TABLE Occurrence 
(
SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER, 
OccurrenceTypeID INTEGER, Period INTEGER, HasSucceeded BOOL, 
PRIMARY KEY (SimulationID, SimRunID, OccurrenceID) 
) 

It has the following indexes:

CREATE INDEX "Occurrence_HasSucceeded_idx" ON "Occurrence" ("HasSucceeded" ASC) 

CREATE INDEX "Occurrence_OccurrenceID_idx" ON "Occurrence" ("OccurrenceID" ASC) 

CREATE INDEX "Occurrence_SimRunID_idx" ON "Occurrence" ("SimRunID" ASC) 

CREATE INDEX "Occurrence_SimulationID_idx" ON "Occurrence" ("SimulationID" ASC) 

OccurrenceParticipant has 3.4 million records. The table looks like this:

CREATE TABLE OccurrenceParticipant 
(
SimulationID INTEGER,  SimRunID INTEGER, OccurrenceID  INTEGER, 
RoleTypeID  INTEGER,  ParticipantID INTEGER 
) 

It has the following indexes:

CREATE INDEX "OccurrenceParticipant_OccurrenceID_idx" ON "OccurrenceParticipant" ("OccurrenceID" ASC) 

CREATE INDEX "OccurrenceParticipant_ParticipantID_idx" ON "OccurrenceParticipant" ("ParticipantID" ASC) 

CREATE INDEX "OccurrenceParticipant_RoleType_idx" ON "OccurrenceParticipant" ("RoleTypeID" ASC) 

CREATE INDEX "OccurrenceParticipant_SimRunID_idx" ON "OccurrenceParticipant" ("SimRunID" ASC) 

CREATE INDEX "OccurrenceParticipant_SimulationID_idx" ON "OccurrenceParticipant" ("SimulationID" ASC) 

InitialParticipant has 130 records. The structure of the table is:

CREATE TABLE InitialParticipant 
(
ParticipantID INTEGER PRIMARY KEY,  ParticipantTypeID INTEGER, 
ParticipantGroupID  INTEGER 
) 

The table has the following indexes:

CREATE INDEX "initialpart_participantTypeID_idx" ON "InitialParticipant" ("ParticipantGroupID" ASC) 

CREATE INDEX "initialpart_ParticipantID_idx" ON "InitialParticipant" ("ParticipantID" ASC) 

ParticipantGroup has 22 records. It looks like this:

CREATE TABLE ParticipantGroup (
ParticipantGroupID INTEGER, ParticipantGroupTypeID  INTEGER, 
Description varchar (50),  PRIMARY KEY( ParticipantGroupID ) 
) 

The table has the following index:

CREATE INDEX "ParticipantGroup_ParticipantGroupID_idx" ON "ParticipantGroup" ("ParticipantGroupID" ASC) 

tmpSimArgs has 18 records. It has the following structure:

CREATE TABLE tmpSimArgs (SimulationID varchar, SimRunID int(10)) 

with the following indexes:

CREATE INDEX tmpSimArgs_SimRunID_idx ON tmpSimArgs(SimRunID ASC) 

CREATE INDEX tmpSimArgs_SimulationID_idx ON tmpSimArgs(SimulationID ASC) 

The table "tmpPartArgs" has 80 records. It has the following structure:

CREATE TABLE tmpPartArgs(participantID INT) 

and the following index:

CREATE INDEX tmpPartArgs_participantID_idx ON tmpPartArgs(participantID ASC) 

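I have a query involving multiple inner joins. The problem I am facing is that the Access version of the query takes about one second, whereas the SQLite version of the same query takes 10 seconds (about 10 times slower!). Migrating back to Access is not possible for me; SQLite is my only option.

I am new to writing database queries, so these queries may look silly. Please point out anything you see that is wrong or naive.

The Access query is (the whole query takes 1 second to execute):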

SELECT ParticipantGroup.Description, Occurrence.SimulationID, Occurrence.SimRunID, Occurrence.Period, Count(OccurrenceParticipant.ParticipantID) AS CountOfParticipantID FROM 
( 
    ParticipantGroup INNER JOIN InitialParticipant ON ParticipantGroup.ParticipantGroupID = InitialParticipant.ParticipantGroupID 
) INNER JOIN 
(
tmpPartArgs INNER JOIN 
    (
    (
     tmpSimArgs INNER JOIN Occurrence ON (tmpSimArgs.SimRunID = Occurrence.SimRunID) AND (tmpSimArgs.SimulationID = Occurrence.SimulationID) 
    ) INNER JOIN OccurrenceParticipant ON (Occurrence.OccurrenceID = OccurrenceParticipant.OccurrenceID) AND (Occurrence.SimRunID = OccurrenceParticipant.SimRunID) AND (Occurrence.SimulationID = OccurrenceParticipant.SimulationID) 
) ON tmpPartArgs.participantID = OccurrenceParticipant.ParticipantID 
) ON InitialParticipant.ParticipantID = OccurrenceParticipant.ParticipantID WHERE (((OccurrenceParticipant.RoleTypeID)=52 Or (OccurrenceParticipant.RoleTypeID)=49)) AND Occurrence.HasSucceeded = True GROUP BY ParticipantGroup.Description, Occurrence.SimulationID, Occurrence.SimRunID, Occurrence.Period; 

The SQLite query is as follows (this query takes about 10 seconds):

SELECT ij1.Description, ij2.occSimulationID, ij2.occSimRunID, ij2.Period, Count(ij2.occpParticipantID) AS CountOfParticipantID FROM 
(
    SELECT ip.ParticipantGroupID AS ipParticipantGroupID, ip.ParticipantID AS ipParticipantID, ip.ParticipantTypeID, pg.ParticipantGroupID AS pgParticipantGroupID, pg.ParticipantGroupTypeID, pg.Description FROM ParticipantGroup as pg INNER JOIN InitialParticipant AS ip ON pg.ParticipantGroupID = ip.ParticipantGroupID 
) AS ij1 INNER JOIN 
(
    SELECT tpa.participantID AS tpaParticipantID, ij3.* FROM tmpPartArgs AS tpa INNER JOIN 
    (
     SELECT ij4.*, occp.SimulationID as occpSimulationID, occp.SimRunID AS occpSimRunID, occp.OccurrenceID AS occpOccurrenceID, occp.ParticipantID AS occpParticipantID, occp.RoleTypeID FROM 
      (
       SELECT tsa.SimulationID AS tsaSimulationID, tsa.SimRunID AS tsaSimRunID, occ.SimulationID AS occSimulationID, occ.SimRunID AS occSimRunID, occ.OccurrenceID AS occOccurrenceID, occ.OccurrenceTypeID, occ.Period, occ.HasSucceeded FROM tmpSimArgs AS tsa INNER JOIN Occurrence AS occ ON (tsa.SimRunID = occ.SimRunID) AND (tsa.SimulationID = occ.SimulationID) 
     ) AS ij4 INNER JOIN OccurrenceParticipant AS occp ON (occOccurrenceID =  occpOccurrenceID) AND (occSimRunID = occpSimRunID) AND (occSimulationID = occpSimulationID) 
    ) AS ij3 ON tpa.participantID = ij3.occpParticipantID 
) AS ij2 ON ij1.ipParticipantID = ij2.occpParticipantID WHERE (((ij2.RoleTypeID)=52 Or (ij2.RoleTypeID)=49)) AND ij2.HasSucceeded = 1 GROUP BY ij1.Description, ij2.occSimulationID, ij2.occSimRunID, ij2.Period; 

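I don't know what I am doing wrong here. I have all the indexes, but I suspect I am missing some key index that would do the trick. Interestingly, before migrating, my 'research' on SQLite suggested that it is faster, smaller, and better than Access in every respect. Yet I can't seem to get SQLite to run this query faster than Access. I repeat that I am new to SQLite and obviously don't have much insight or experience, so if any learned soul could help me out, it would be much appreciated.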
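One way to check the "missing key index" suspicion is to ask SQLite how it plans a lookup. The sketch below uses Python's built-in sqlite3 module on an empty copy of OccurrenceParticipant. The three-column index and its name `occp_sim_run_occ_idx` are assumptions made for illustration, not something proposed in the thread. The lookup mirrors the three columns used in the join to OccurrenceParticipant, and SQLite generally uses at most one index per table in a join, so the single-column indexes above cannot be combined for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE OccurrenceParticipant (
        SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
        RoleTypeID INTEGER, ParticipantID INTEGER
    )
""")

# The three-column lookup that the join against Occurrence performs per row.
lookup = """
    SELECT ParticipantID FROM OccurrenceParticipant
    WHERE SimulationID = ? AND SimRunID = ? AND OccurrenceID = ?
"""

def plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[-1] for row in rows)

plan_before = plan(lookup, (1, 1, 1))

# Hypothetical composite index covering all three join columns.
conn.execute("""
    CREATE INDEX occp_sim_run_occ_idx
    ON OccurrenceParticipant (SimulationID, SimRunID, OccurrenceID)
""")
plan_after = plan(lookup, (1, 1, 1))

print(plan_before)  # a full table scan, since no index exists yet
print(plan_after)   # a search that names occp_sim_run_occ_idx
```

The exact wording of the plan text differs between SQLite versions, so compare index names rather than whole strings. Running the same `EXPLAIN QUERY PLAN` on the full query against the real database would show which of the existing indexes are actually used.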

+1

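This query makes my head hurt. I don't understand why you are doing all the queries (sub-selects). Can you explain in English (rather than SQL) what you are trying to return from the query? That would make it easier to answer your question. – JohnFx 2010-01-06 23:37:25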

+0

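I will explain in English what each sub-select statement does. Since this comment box only holds 600 characters, I am posting the explanation as an answer to my question. – 2010-01-07 00:11:38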

Answers

0

I have come up with a smaller, scaled-down version of the query. Hopefully it is clearer than my previous one.

SELECT5 * FROM 
(
SELECT4 * FROM ParticipantGroup as pg INNER JOIN InitialParticipant AS ip ON pg.ParticipantGroupID = ip.ParticipantGroupID 
) AS ij1 INNER JOIN 
(
    SELECT3 * FROM tmpPartArgs AS tpa INNER JOIN 
     (
      SELECT2 * FROM 
       (
        SELECT1 * FROM tmpSimArgs AS tsa INNER JOIN Occurrence AS occ ON (tsa.SimRunID = occ.SimRunID) AND (tsa.SimulationID = occ.SimulationID) 
      ) AS ij4 INNER JOIN OccurrenceParticipant AS occp ON (occOccurrenceID =  occpOccurrenceID) AND (occSimRunID = occpSimRunID) AND (occSimulationID = occpSimulationID) 
    ) AS ij3 ON tpa.participantID = ij3.occpParticipantID 
) AS ij2 ON ij1.ipParticipantID = ij2.occpParticipantID WHERE (((ij2.RoleTypeID)=52 Or (ij2.RoleTypeID)=49)) AND ij2.HasSucceeded = 1 

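The application I am working on is a simulation application, and to make sense of the query above I think a brief description of the application is needed. Let us assume there is a planet with some initial resources and living agents. The planet is allowed to exist for 1000 years, and the actions performed by the agents are monitored and stored in the database. After 1000 years the planet is destroyed and re-created with the same initial resources and living agents as the first time. This (creation and destruction) is repeated 18 times, and all the actions of all agents performed during those 1000 years are stored in the database. Our entire experiment therefore consists of 18 re-creations, which together are called a 'Simulation'. Each of the 18 times the planet is re-created is called a run, and each of the 1000 years of a run is called a period. So a 'Simulation' contains 18 runs, and each run contains 1000 periods.

At the start of each run we assign the 'Simulation' an initial set of knowledge items and dynamic agents that can interact with each other and with the items. Knowledge items are stored by agents in knowledge stores. The knowledge store is also considered a participating entity in our simulation, but this concept (knowledge stores) is not important here. I have tried to describe every SELECT statement and the tables involved in detail.

SELECT1: I think this query could be replaced by just the 'Occurrence' table, since it does not do much. The Occurrence table stores the different actions taken by agents in every period of every simulation run of a particular 'Simulation'. Normally each 'Simulation' contains 18 runs, and each run consists of 1000 periods. An agent may take an action in every period of every run in the 'Simulation'. The Occurrence table does not, however, store any details about the agents that perform the actions. The Occurrence table may store data related to multiple 'Simulations'.

SELECT2: This query simply returns the details of the actions performed in every period of every run of a 'Simulation', along with details of all the participants of the 'Simulation', such as their respective ParticipantIDs. The OccurrenceParticipant table stores a record for every participating entity of the Simulation, including agents, knowledge stores, knowledge items, etc.

SELECT3: This query returns only those records from the pseudo table ij3 that are due to agents and knowledge items. All the records in ij3 concerning knowledge items will be filtered out.

SELECT4: This query attaches the 'Description' field to every record of 'InitialParticipant'. Note that the 'Description' column is an output column of the entire query. The InitialParticipant table contains a record for every agent and every knowledge item initially assigned to the 'Simulation'.

SELECT5: This final query returns the records for which the RoleType of the participating entity (which may be an agent or a knowledge item) is 49 or 52.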

+3

Why not just edit your question instead of posting this as an answer? – 2010-01-07 03:31:21

2

I have reformatted your code (using my home-brew SQL formatter) to hopefully make it easier for others to read.

Reformatted query:

SELECT 
    ij1.Description, 
    ij2.occSimulationID, 
    ij2.occSimRunID, 
    ij2.Period, 
    Count(ij2.occpParticipantID) AS CountOfParticipantID 

FROM (

    SELECT 
     ip.ParticipantGroupID AS ipParticipantGroupID, 
     ip.ParticipantID AS ipParticipantID, 
     ip.ParticipantTypeID, 
     pg.ParticipantGroupID AS pgParticipantGroupID, 
     pg.ParticipantGroupTypeID, 
     pg.Description 

    FROM ParticipantGroup AS pg 

    INNER JOIN InitialParticipant AS ip 
      ON pg.ParticipantGroupID = ip.ParticipantGroupID 

) AS ij1 

INNER JOIN (

    SELECT 
     tpa.participantID AS tpaParticipantID, 
     ij3.* 

    FROM tmpPartArgs AS tpa 

    INNER JOIN (

     SELECT 
      ij4.*, 
      occp.SimulationID AS occpSimulationID, 
      occp.SimRunID AS occpSimRunID, 
      occp.OccurrenceID AS occpOccurrenceID, 
      occp.ParticipantID AS occpParticipantID, 
      occp.RoleTypeID 

     FROM (

      SELECT 
       tsa.SimulationID AS tsaSimulationID, 
       tsa.SimRunID AS tsaSimRunID, 
       occ.SimulationID AS occSimulationID, 
       occ.SimRunID AS occSimRunID, 
       occ.OccurrenceID AS occOccurrenceID, 
       occ.OccurrenceTypeID, 
       occ.Period, 
       occ.HasSucceeded 

      FROM tmpSimArgs AS tsa 

      INNER JOIN Occurrence AS occ 
        ON (tsa.SimRunID = occ.SimRunID) 
        AND (tsa.SimulationID = occ.SimulationID) 

     ) AS ij4 

     INNER JOIN OccurrenceParticipant AS occp 
       ON (occOccurrenceID = occpOccurrenceID) 
       AND (occSimRunID = occpSimRunID) 
       AND (occSimulationID = occpSimulationID) 

    ) AS ij3 
     ON tpa.participantID = ij3.occpParticipantID 

) AS ij2 
    ON ij1.ipParticipantID = ij2.occpParticipantID 

WHERE (

    (

     (ij2.RoleTypeID) = 52 
     OR 
     (ij2.RoleTypeID) = 49 

    ) 

) 
    AND ij2.HasSucceeded = 1 

GROUP BY 
    ij1.Description, 
    ij2.occSimulationID, 
    ij2.occSimRunID, 
    ij2.Period; 

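Like JohnFx (above), I was confused by the derived views. I don't think they are actually needed, especially since they are all inner joins. So below I have tried to reduce the complexity. Please review it and test the performance. I had to use a cross join with tmpSimArgs, since it is only joined to Occurrence; I assume this is the desired behavior.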

SELECT 
    pg.Description, 
    occ.SimulationID, 
    occ.SimRunID, 
    occ.Period, 
    COUNT(occp.ParticipantID) AS CountOfParticipantID 

FROM ParticipantGroup AS pg 

INNER JOIN InitialParticipant AS ip 
     ON pg.ParticipantGroupID = ip.ParticipantGroupID 

CROSS JOIN tmpSimArgs AS tsa 

INNER JOIN Occurrence AS occ 
     ON tsa.SimRunID = occ.SimRunID 
     AND tsa.SimulationID = occ.SimulationID 

INNER JOIN OccurrenceParticipant AS occp 
     ON occ.OccurrenceID = occp.OccurrenceID 
     AND occ.SimRunID = occp.SimRunID 
     AND occ.SimulationID = occp.SimulationID 
     AND ip.ParticipantID = occp.ParticipantID -- the ij1/ij2 join condition from the original query

INNER JOIN tmpPartArgs AS tpa 
     ON tpa.participantID = occp.ParticipantID 

WHERE occ.HasSucceeded = 1 
    AND (occp.RoleTypeID = 52 OR occp.RoleTypeID = 49) 

GROUP BY 
    pg.Description, 
    occ.SimulationID, 
    occ.SimRunID, 
    occ.Period; 
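To sanity-check a flattened query like this before timing it on the real data, it can be run against a handful of hand-made rows. The following is a minimal sketch using Python's sqlite3 module. The table definitions are trimmed to the columns the query touches, tmpSimArgs is declared with INTEGER columns for simplicity, and the sample rows and group names are invented. The query includes the `ip.ParticipantID = occp.ParticipantID` condition that joins ij1 and ij2 in the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ParticipantGroup (ParticipantGroupID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE InitialParticipant (ParticipantID INTEGER PRIMARY KEY, ParticipantGroupID INTEGER);
CREATE TABLE Occurrence (SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
                         Period INTEGER, HasSucceeded BOOL,
                         PRIMARY KEY (SimulationID, SimRunID, OccurrenceID));
CREATE TABLE OccurrenceParticipant (SimulationID INTEGER, SimRunID INTEGER,
                                    OccurrenceID INTEGER, RoleTypeID INTEGER,
                                    ParticipantID INTEGER);
CREATE TABLE tmpSimArgs (SimulationID INTEGER, SimRunID INTEGER);
CREATE TABLE tmpPartArgs (participantID INTEGER);

-- Invented sample data.
INSERT INTO ParticipantGroup VALUES (1, 'Group A'), (2, 'Group B');
INSERT INTO InitialParticipant VALUES (10, 1), (20, 2);
INSERT INTO tmpSimArgs VALUES (1, 1);
INSERT INTO tmpPartArgs VALUES (10), (20);
INSERT INTO Occurrence VALUES
  (1, 1, 100, 5, 1), (1, 1, 101, 5, 1), (1, 1, 102, 6, 0), (1, 2, 103, 5, 1);
INSERT INTO OccurrenceParticipant VALUES
  (1, 1, 100, 52, 10),  -- counted: Group A, period 5
  (1, 1, 101, 49, 10),  -- counted: Group A, period 5
  (1, 1, 101, 52, 20),  -- counted: Group B, period 5
  (1, 1, 101,  7, 20),  -- ignored: RoleTypeID is neither 49 nor 52
  (1, 1, 102, 52, 10),  -- ignored: the occurrence did not succeed
  (1, 2, 103, 52, 10);  -- ignored: run 2 is not listed in tmpSimArgs
""")

flat_query = """
SELECT pg.Description, occ.SimulationID, occ.SimRunID, occ.Period,
       COUNT(occp.ParticipantID) AS CountOfParticipantID
FROM ParticipantGroup AS pg
INNER JOIN InitialParticipant AS ip
     ON pg.ParticipantGroupID = ip.ParticipantGroupID
CROSS JOIN tmpSimArgs AS tsa
INNER JOIN Occurrence AS occ
     ON tsa.SimRunID = occ.SimRunID
     AND tsa.SimulationID = occ.SimulationID
INNER JOIN OccurrenceParticipant AS occp
     ON occ.OccurrenceID = occp.OccurrenceID
     AND occ.SimRunID = occp.SimRunID
     AND occ.SimulationID = occp.SimulationID
     AND ip.ParticipantID = occp.ParticipantID
INNER JOIN tmpPartArgs AS tpa
     ON tpa.participantID = occp.ParticipantID
WHERE occ.HasSucceeded = 1
    AND (occp.RoleTypeID = 52 OR occp.RoleTypeID = 49)
GROUP BY pg.Description, occ.SimulationID, occ.SimRunID, occ.Period
"""

# GROUP BY does not guarantee an order, so sort before comparing.
rows = sorted(conn.execute(flat_query).fetchall())
print(rows)  # [('Group A', 1, 1, 5, 2), ('Group B', 1, 1, 5, 1)]
```

Note that in SQLite, CROSS JOIN also tells the planner not to reorder the two tables it connects, so it can affect the join order as well as the result.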
0

I suggest moving the ij2.RoleTypeID filter from the outermost query into ij3, using IN instead of OR, and moving the HasSucceeded filter into the ij4 query.
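As a rough illustration of that suggestion, the sketch below (Python's sqlite3 module, with heavily simplified two-table versions of Occurrence and OccurrenceParticipant and invented rows) checks that filtering inside the derived tables with IN returns the same rows as filtering in the outermost query with OR. It only demonstrates that the two forms are equivalent; whether the rewrite is faster would have to be measured on the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the real tables; the rows are invented.
CREATE TABLE Occurrence (OccurrenceID INTEGER PRIMARY KEY, HasSucceeded BOOL);
CREATE TABLE OccurrenceParticipant (OccurrenceID INTEGER, RoleTypeID INTEGER,
                                    ParticipantID INTEGER);
INSERT INTO Occurrence VALUES (1, 1), (2, 0), (3, 1);
INSERT INTO OccurrenceParticipant VALUES
  (1, 52, 10), (1, 7, 11), (2, 49, 10), (3, 49, 12);
""")

# Filters applied in the outermost query, as in the question.
outer_filter = """
SELECT j.ParticipantID FROM (
    SELECT occ.HasSucceeded, occp.RoleTypeID, occp.ParticipantID
    FROM Occurrence AS occ
    INNER JOIN OccurrenceParticipant AS occp
         ON occ.OccurrenceID = occp.OccurrenceID
) AS j
WHERE (j.RoleTypeID = 52 OR j.RoleTypeID = 49) AND j.HasSucceeded = 1
ORDER BY j.ParticipantID
"""

# Filters pushed into the inner queries, with IN instead of OR.
inner_filter = """
SELECT j.ParticipantID FROM (
    SELECT occp.ParticipantID
    FROM (SELECT OccurrenceID FROM Occurrence WHERE HasSucceeded = 1) AS occ
    INNER JOIN OccurrenceParticipant AS occp
         ON occ.OccurrenceID = occp.OccurrenceID
    WHERE occp.RoleTypeID IN (49, 52)
) AS j
ORDER BY j.ParticipantID
"""

rows_outer = conn.execute(outer_filter).fetchall()
rows_inner = conn.execute(inner_filter).fetchall()
print(rows_outer)  # [(10,), (12,)]
print(rows_inner)  # [(10,), (12,)]
```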