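Join multiple tables with the same structure but different data

2017-06-13

I am trying to join 8 tables, and because each table has more than 500,000 rows, the query is very slow. What is the best way to join these tables?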

All the tables have this structure:

data_temprature:

+--------+---------+-------+------------------+
| ID_geo | NAME    | Value | Date             |
+--------+---------+-------+------------------+
| 10005  | Madrid  | 32    | 2017-06-12 08:00 |
| 10005  | Madrid  | 25    | 2017-06-12 09:00 |
| 12701  | Paris   | 23    | 2017-06-12 08:00 |
| 13006  | Tokyo   | 25    | 2017-06-12 11:00 |
| 11132  | Sevilla | 27    | 2017-06-12 16:00 |
| 21333  | London  | 22    | 2017-06-12 17:00 |
+--------+---------+-------+------------------+

data_WeatherSimbol

+--------+---------+-------+------------------+
| ID_geo | NAME    | Value | Date             |
+--------+---------+-------+------------------+
| 10005  | Madrid  | A+    | 2017-06-12 08:00 |
| 10005  | Madrid  | A     | 2017-06-12 09:00 |
| 12701  | Paris   | A-    | 2017-06-12 08:00 |
| 13006  | Tokyo   | C-    | 2017-06-12 11:00 |
| 11132  | Sevilla | I+    | 2017-06-12 16:00 |
| 21333  | London  | D-    | 2017-06-12 17:00 |
+--------+---------+-------+------------------+

I want to write a join that produces this result:

+--------+---------+-------------+----------+------------------+
| ID_geo | NAME    | Temperature | Simboles | Date             |
+--------+---------+-------------+----------+------------------+
| 10005  | Madrid  | 32          | A+       | 2017-06-12 08:00 |
| 10005  | Madrid  | 25          | A        | 2017-06-12 09:00 |
| 12701  | Paris   | 23          | A-       | 2017-06-12 08:00 |
| 13006  | Tokyo   | 25          | C-       | 2017-06-12 11:00 |
| 11132  | Sevilla | 27          | I+       | 2017-06-12 16:00 |
| 21333  | London  | 22          | D-       | 2017-06-12 17:00 |
+--------+---------+-------------+----------+------------------+

Thanks.

UPDATE: real data provided.

Execution plan: https://files.fm/u/b4besk27

This is the query:

SELECT 
    cielo.data_value AS cielo, 
    lluv.data_value AS lluvia, 
    temp.data_value AS temp, 
    vientos.data_value AS viento, 
    tmin.data_value AS tempmin, 
    tmax.data_value AS tempmax, 
    cielo.data_date AS DiaPrev 
FROM 
    data_cielo AS cielo 
INNER JOIN data_lluvia AS lluv ON cielo.data_geo = lluv.data_geo 
INNER JOIN data_presion AS pres ON cielo.data_geo = pres.data_geo 
INNER JOIN data_temp AS temp ON cielo.data_geo = temp.data_geo 
LEFT JOIN data_tempmax AS tmax ON cielo.data_geo = tmax.data_geo 
LEFT JOIN data_tempmin AS tmin ON cielo.data_geo = tmin.data_geo 
INNER JOIN data_viento AS vientos ON cielo.data_geo = vientos.data_geo 

WHERE 
    cielo.data_date = lluv.data_date 
AND pres.data_date = cielo.data_date 
AND vientos.data_date = pres.data_date 
AND temp.data_date = vientos.data_date 
AND cielo.data_geo = 46
ORDER BY cielo.data_date;

And this is the result:

E+ 0.0461028 29.6937088 S2 19.408 36.39 2017-06-13 12:00:00.000 
E+ 0.0461028 29.6937088 S2 21.422 36.39 2017-06-13 12:00:00.000 
E+ 0.0461028 29.6937088 S2 19.408 37.853 2017-06-13 12:00:00.000 
E+ 0.0461028 29.6937088 S2 21.422 37.853 2017-06-13 12:00:00.000 
E+ 0.0461028 30.7593854 S2 19.408 36.39 2017-06-13 13:00:00.000 
E+ 0.0461028 30.7593854 S2 21.422 36.39 2017-06-13 13:00:00.000 
E+ 0.0461028 30.7593854 S2 19.408 37.853 2017-06-13 13:00:00.000 
E+ 0.0461028 30.7593854 S2 21.422 37.853 2017-06-13 13:00:00.000 
A+ 0.0461028 31.6310774 SSW2 19.408 36.39 2017-06-13 14:00:00.000 
A+ 0.0461028 31.6310774 SSW2 21.422 36.39 2017-06-13 14:00:00.000 
A+ 0.0461028 31.6310774 SSW2 19.408 37.853 2017-06-13 14:00:00.000 
A+ 0.0461028 31.6310774 SSW2 21.422 37.853 2017-06-13 14:00:00.000 
A 0.0461028 32.2647927 S2 19.408 36.39 2017-06-13 15:00:00.000 
A 0.0461028 32.2647927 S2 21.422 36.39 2017-06-13 15:00:00.000 
A 0.0461028 32.2647927 S2 19.408 37.853 2017-06-13 15:00:00.000 

It shouldn't produce this. The result I need is, as I said, one row per hour with the data values for temperature, pressure, precipitation, sky, ...

IMO, this is a bad design with no normalization at all. –

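@PrabhatG That's because the data is inserted from txt files into 8 tables (8 meteorological variables). I don't know why they designed it this way, but that's how it is. Any suggestions? –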

Try creating an index on ID_Geo. That will reduce the query execution time. – Debabrata

Answers

0

I think you can just join on the geo ID and the date:

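-- both tables name their measurement column Value, so alias each one
select t.ID_geo, t.NAME, t.Value as Temperature, ws.Value as Simboles, t.[Date]
from data_temprature t join
    data_WeatherSimbol ws
    on t.ID_geo = ws.ID_geo and t.[Date] = ws.[Date];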

That is exactly the problem: it is super slow. –

Why would the 'join' be "super slow"? –

I guess it is because of the many joins. Would it be better to create a view over these tables? Or to handle it with a clustered index? –

0

Try this:

;With data_temprature(ID_geo,NAME,Value,[Date]) 
AS 
(
SELECT 10005 , 'Madrid' , 32 , '2017-06-12 08:00' Union all 
SELECT 10005 , 'Madrid' , 25 , '2017-06-12 09:00' Union all 
SELECT 12701 , 'Paris' , 23 , '2017-06-12 08:00' Union all 
SELECT 13006 , 'Tokyo' , 25 , '2017-06-12 11:00' Union all 
SELECT 11132 , 'Sevilla' , 27 , '2017-06-12 16:00' Union all 
SELECT 21333 , 'London' , 22 , '2017-06-12 17:00' 
) 
,data_WeatherSimbol(ID_geo,NAME,Value,[Date]) 
AS 
(
SELECT 10005 , 'Madrid' , 'A+' , '2017-06-12 08:00' Union all 
SELECT 10005 , 'Madrid' , 'A' , '2017-06-12 09:00' Union all 
SELECT 12701 , 'Paris' , 'A-' , '2017-06-12 08:00' Union all 
SELECT 13006 , 'Tokyo' , 'C-' , '2017-06-12 11:00' Union all 
SELECT 11132 , 'Sevilla' , 'I+' , '2017-06-12 16:00' Union all 
SELECT 21333 , 'London' , 'D-' , '2017-06-12 17:00' 
) 
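SELECT ID_geo,
     NAME,
     Temperature,
     Symboles,
     [Date] From
(
SELECT t.ID_geo ,
     t.NAME ,
     t.Value AS Temperature,
     w.Value AS Symboles,t.[Date] ,
     -- number the rows per location and timestamp so duplicates can be dropped
     ROW_NUMBER()OVER(PARTITION BY t.ID_geo,t.[Date] ORDER BY t.[Date]) AS Rno
FROM data_temprature t
INNER join data_WeatherSimbol w
-- match on both the location and the timestamp, not on the location alone
On t.ID_geo=w.ID_geo AND t.[Date]=w.[Date]
)Dt
WHERE Dt.Rno=1
ORDER BY ID_geo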
0

Neither [ID_geo] nor [Date] seems to be unique enough on its own for the join, so:

  1. Create an index on both columns on all the tables, like

    create index IX_data_temprature on data_temprature ([ID_geo], [Date])

  2. Join all the tables on both [ID_geo] and [Date]
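
Applied to the tables from the question's real query, the two steps above could look roughly like this. It is a sketch rather than a tested solution: the index names are made up for illustration, and it assumes that every table stores data_date at the same hourly timestamps. If data_tempmax and data_tempmin hold one value per day instead, their join conditions would have to compare the calendar day rather than the full timestamp.

```sql
-- Step 1: one composite index per table (index names are illustrative).
CREATE INDEX IX_data_cielo_geo_date   ON data_cielo   (data_geo, data_date);
CREATE INDEX IX_data_lluvia_geo_date  ON data_lluvia  (data_geo, data_date);
CREATE INDEX IX_data_presion_geo_date ON data_presion (data_geo, data_date);
CREATE INDEX IX_data_temp_geo_date    ON data_temp    (data_geo, data_date);
CREATE INDEX IX_data_tempmax_geo_date ON data_tempmax (data_geo, data_date);
CREATE INDEX IX_data_tempmin_geo_date ON data_tempmin (data_geo, data_date);
CREATE INDEX IX_data_viento_geo_date  ON data_viento  (data_geo, data_date);

-- Step 2: every join matches on BOTH the location and the timestamp,
-- so each hour of data_cielo pairs with at most one row per table
-- (assuming one row per location and timestamp in each table).
SELECT
    cielo.data_value   AS cielo,
    lluv.data_value    AS lluvia,
    temp.data_value    AS temp,
    vientos.data_value AS viento,
    tmin.data_value    AS tempmin,
    tmax.data_value    AS tempmax,
    cielo.data_date    AS DiaPrev
FROM data_cielo AS cielo
INNER JOIN data_lluvia AS lluv
        ON lluv.data_geo = cielo.data_geo AND lluv.data_date = cielo.data_date
INNER JOIN data_presion AS pres
        ON pres.data_geo = cielo.data_geo AND pres.data_date = cielo.data_date
INNER JOIN data_temp AS temp
        ON temp.data_geo = cielo.data_geo AND temp.data_date = cielo.data_date
INNER JOIN data_viento AS vientos
        ON vientos.data_geo = cielo.data_geo AND vientos.data_date = cielo.data_date
LEFT JOIN data_tempmax AS tmax
        ON tmax.data_geo = cielo.data_geo AND tmax.data_date = cielo.data_date
LEFT JOIN data_tempmin AS tmin
        ON tmin.data_geo = cielo.data_geo AND tmin.data_date = cielo.data_date
WHERE cielo.data_geo = 46
ORDER BY cielo.data_date;
```

In the question's query the two LEFT JOINs match on data_geo only, which would explain why each hour appears four times: every tempmin row is combined with every tempmax row for that location.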

0

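Most of the query's cost is caused by RID lookups.

A RID lookup is used when the index does not cover the query (SQL Server has to look up the values in the table because they are not included in the index) and the index is nonclustered.

The query will probably be faster if you use a covering index; you have probably not included the values in the index. More about INCLUDE can be found in the Microsoft docs.

It may also help to change the nonclustered index into a clustered index.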
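
As a rough illustration of a covering index for one of the tables in the question, the statement below keys the index on the join columns and carries data_value along with INCLUDE. The index name is hypothetical, and the column list assumes the query reads only data_geo, data_date and data_value from this table, as the posted query does.

```sql
-- Illustrative name; repeat for each data_* table used in the join.
-- data_geo and data_date are the seek/join keys; data_value is stored
-- at the leaf level so the query never has to go back to the table.
CREATE NONCLUSTERED INDEX IX_data_temp_geo_date_cover
    ON data_temp (data_geo, data_date)
    INCLUDE (data_value);
```

With the value stored in the index itself, the RID lookup should disappear from the execution plan for that table.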
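
A minimal sketch of the clustered alternative, again with a made-up index name. RID lookups only occur on tables without a clustered index, so this assumes data_temp currently has none; a table can have only one.

```sql
-- Illustrative name. The table rows themselves are stored in
-- (data_geo, data_date) order, so a seek on these columns reaches
-- data_value directly without a separate lookup.
CREATE CLUSTERED INDEX CIX_data_temp_geo_date
    ON data_temp (data_geo, data_date);
```

Building a clustered index rewrites the whole table, so on 500,000+ rows per table it is worth doing outside of the regular load window.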
