2009-11-11

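Self-join, cross join, and grouping

I have a table of temperature samples collected over time from several sources, and I want to find the minimum, maximum, and average temperature across all sources at set time intervals. At first glance this is easy to do, like so: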

SELECT MIN(temp), MAX(temp), AVG(temp) FROM samples GROUP BY time; 

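However, things get more complicated (and this is where I'm stumped!) if sources drop in and out: rather than ignoring a missing source for the interval in question, I want to use that source's last known temperature for the missing sample. Using datetimes and building time intervals (say, every minute) from samples that are unevenly distributed over time complicates things further.

I think it should be possible to produce the result with a self-join on the samples table, where the time from the first table is greater than or equal to the time from the second table, and then compute the aggregate values over the rows grouped by source. However, I'm having trouble working out how to actually do this.

Here's my test table: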

+------+--------+------+
| time | source | temp |
+------+--------+------+
|    1 | a      |   20 |
|    1 | b      |   18 |
|    1 | c      |   23 |
|    2 | b      |   21 |
|    2 | c      |   20 |
|    2 | a      |   18 |
|    3 | a      |   16 |
|    3 | c      |   13 |
|    4 | c      |   15 |
|    4 | a      |    4 |
|    4 | b      |   31 |
|    5 | b      |   10 |
|    5 | c      |   16 |
|    5 | a      |   22 |
|    6 | a      |   18 |
|    6 | b      |   17 |
|    7 | a      |   20 |
|    7 | b      |   19 |
+------+--------+------+

INSERT INTO samples (time, source, temp) VALUES
    (1, 'a', 20), (1, 'b', 18), (1, 'c', 23),
    (2, 'b', 21), (2, 'c', 20), (2, 'a', 18),
    (3, 'a', 16), (3, 'c', 13),
    (4, 'c', 15), (4, 'a', 4), (4, 'b', 31),
    (5, 'b', 10), (5, 'c', 16), (5, 'a', 22),
    (6, 'a', 18), (6, 'b', 17),
    (7, 'a', 20), (7, 'b', 19);

To do my max, min, and average calculations, I think I want an intermediate table that looks like this:

+------+--------+------+
| time | source | temp |
+------+--------+------+
|    1 | a      |   20 |
|    1 | b      |   18 |
|    1 | c      |   23 |
|    2 | b      |   21 |
|    2 | c      |   20 |
|    2 | a      |   18 |
|    3 | a      |   16 |
|    3 | b      |   21 |
|    3 | c      |   13 |
|    4 | c      |   15 |
|    4 | a      |    4 |
|    4 | b      |   31 |
|    5 | b      |   10 |
|    5 | c      |   16 |
|    5 | a      |   22 |
|    6 | a      |   18 |
|    6 | b      |   17 |
|    6 | c      |   16 |
|    7 | a      |   20 |
|    7 | b      |   19 |
|    7 | c      |   16 |
+------+--------+------+
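
For illustration only (this is not SQL and not part of the original post), the gap-filling rule behind that intermediate table can be sketched in Python: walk the intervals in order and carry each source's last known temperature forward. The names `SAMPLES`, `fill_forward`, and `aggregate` are made up for this sketch, and it assumes a source only counts from its first reading onward.

```python
from collections import defaultdict

# Test data from the question: (time, source, temp).
SAMPLES = [
    (1, 'a', 20), (1, 'b', 18), (1, 'c', 23),
    (2, 'b', 21), (2, 'c', 20), (2, 'a', 18),
    (3, 'a', 16), (3, 'c', 13),
    (4, 'c', 15), (4, 'a', 4), (4, 'b', 31),
    (5, 'b', 10), (5, 'c', 16), (5, 'a', 22),
    (6, 'a', 18), (6, 'b', 17),
    (7, 'a', 20), (7, 'b', 19),
]


def fill_forward(samples):
    """Return {time: {source: temp}}, filling gaps with the last known temp."""
    by_time = defaultdict(dict)
    for time, source, temp in samples:
        by_time[time][source] = temp

    last_known = {}  # most recent reading seen so far, per source
    filled = {}
    for time in sorted(by_time):
        last_known.update(by_time[time])  # fresh readings win
        filled[time] = dict(last_known)   # silent sources keep their old value
    return filled


def aggregate(filled):
    """Return {time: (min, max, avg)} over the filled readings."""
    return {
        time: (min(temps.values()), max(temps.values()),
               sum(temps.values()) / len(temps))
        for time, temps in filled.items()
    }


if __name__ == "__main__":
    for time, stats in aggregate(fill_forward(SAMPLES)).items():
        print(time, stats)
```

The filled rows for (3, b), (6, c), and (7, c) are the ones a SQL solution has to reproduce.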

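The following query gets me close to what I want, but it takes the temperature value from the source's first result rather than the most recent one at the given time interval (in my real table the columns are dt and mac rather than time and source):

SELECT s.dt AS sdt, ss.mac, ss.temp, MAX(ss.dt) AS maxdt
FROM (SELECT DISTINCT dt FROM samples) AS s
CROSS JOIN samples AS ss
WHERE s.dt >= ss.dt
GROUP BY sdt, mac
HAVING maxdt <= s.dt
ORDER BY sdt ASC, maxdt ASC;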

+-----+-----+------+-------+
| sdt | mac | temp | maxdt |
+-----+-----+------+-------+
|   1 | a   |   20 |     1 |
|   1 | c   |   23 |     1 |
|   1 | b   |   18 |     1 |
|   2 | a   |   20 |     2 |
|   2 | c   |   23 |     2 |
|   2 | b   |   18 |     2 |
|   3 | b   |   18 |     2 |
|   3 | a   |   20 |     3 |
|   3 | c   |   23 |     3 |
|   4 | a   |   20 |     4 |
|   4 | c   |   23 |     4 |
|   4 | b   |   18 |     4 |
|   5 | a   |   20 |     5 |
|   5 | c   |   23 |     5 |
|   5 | b   |   18 |     5 |
|   6 | c   |   23 |     5 |
|   6 | a   |   20 |     6 |
|   6 | b   |   18 |     6 |
|   7 | c   |   23 |     5 |
|   7 | b   |   18 |     7 |
|   7 | a   |   20 |     7 |
+-----+-----+------+-------+

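Update: chadhoc (great name, by the way!) gave a nice solution that unfortunately doesn't work in MySQL, since MySQL doesn't support the FULL JOIN he uses. Fortunately, I believe a simple UNION is an effective replacement: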

-- Unify the original samples with the missing values that we've calculated
(
    SELECT time, source, temp
    FROM samples
)
UNION
(
    -- Pull all the time/source combinations that we are missing from the sample set, along with the temp
    -- from the last sampled interval for the same time/source combination if we do not have one
    SELECT a.time, a.source,
        (SELECT t2.temp
         FROM samples AS t2
         WHERE t2.time < a.time AND t2.source = a.source
         ORDER BY t2.time DESC
         LIMIT 1) AS temp
    FROM
    (
        -- All values we want to get should be a cross of time/source
        SELECT t1.time, s1.source
        FROM (SELECT DISTINCT time FROM samples) AS t1
        CROSS JOIN (SELECT DISTINCT source FROM samples) AS s1
    ) AS a
    LEFT JOIN samples s
        ON a.time = s.time
        AND a.source = s.source
    WHERE s.source IS NULL
)
ORDER BY time, source;
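
As a rough check of the second half of that UNION (the part that finds the missing time/source combinations), here is a sketch that runs it against the test data using Python's built-in sqlite3 module. SQLite is only standing in for MySQL here, which is an assumption on my part, and the names `ROWS`, `MISSING_SQL`, and `missing_rows` are invented for the example.

```python
import sqlite3

# Test data from the question: (time, source, temp).
ROWS = [
    (1, 'a', 20), (1, 'b', 18), (1, 'c', 23),
    (2, 'b', 21), (2, 'c', 20), (2, 'a', 18),
    (3, 'a', 16), (3, 'c', 13),
    (4, 'c', 15), (4, 'a', 4), (4, 'b', 31),
    (5, 'b', 10), (5, 'c', 16), (5, 'a', 22),
    (6, 'a', 18), (6, 'b', 17),
    (7, 'a', 20), (7, 'b', 19),
]

# Anti-join: every time/source pair with no sample, plus the last earlier temp.
MISSING_SQL = """
SELECT a.time, a.source,
       (SELECT t2.temp
          FROM samples AS t2
         WHERE t2.time < a.time AND t2.source = a.source
         ORDER BY t2.time DESC
         LIMIT 1) AS temp
  FROM (SELECT t1.time AS time, s1.source AS source
          FROM (SELECT DISTINCT time FROM samples) AS t1
         CROSS JOIN (SELECT DISTINCT source FROM samples) AS s1) AS a
  LEFT JOIN samples AS s
    ON a.time = s.time AND a.source = s.source
 WHERE s.source IS NULL
 ORDER BY a.time, a.source
"""


def missing_rows():
    """Load the test data into an in-memory database and return the filled gaps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (time INTEGER, source TEXT, temp INTEGER)")
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(MISSING_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in missing_rows():
        print(row)
```

With the test data this should yield exactly the three rows that the desired intermediate table adds to the raw samples.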

Update 2: MySQL gives the following EXPLAIN output for chadhoc's code:

+------+--------------------+------------+------+---------------+------+---------+------+------+-----------------------------+
| id   | select_type        | table      | type | possible_keys | key  | key_len | ref  | rows | Extra                       |
+------+--------------------+------------+------+---------------+------+---------+------+------+-----------------------------+
|    1 | PRIMARY            | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 |                             |
|    2 | UNION              | <derived4> | ALL  | NULL          | NULL | NULL    | NULL |   21 |                             |
|    2 | UNION              | s          | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using where                 |
|    4 | DERIVED            | <derived6> | ALL  | NULL          | NULL | NULL    | NULL |    3 |                             |
|    4 | DERIVED            | <derived5> | ALL  | NULL          | NULL | NULL    | NULL |    7 |                             |
|    6 | DERIVED            | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using temporary             |
|    5 | DERIVED            | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using temporary             |
|    3 | DEPENDENT SUBQUERY | t2         | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using where; Using filesort |
| NULL | UNION RESULT       | <union1,2> | ALL  | NULL          | NULL | NULL    | NULL | NULL | Using filesort              |
+------+--------------------+------------+------+---------------+------+---------+------+------+-----------------------------+

I was able to get Charles's code working like this:

SELECT T.time, S.source,
    COALESCE(
        D.temp,
        (
            SELECT temp FROM samples
            WHERE source = S.source AND time = (
                SELECT MAX(time)
                FROM samples
                WHERE source = S.source
                    AND time < T.time
            )
        )
    ) AS temp
FROM (SELECT DISTINCT time FROM samples) AS T
CROSS JOIN (SELECT DISTINCT source FROM samples) AS S
LEFT JOIN samples AS D
    ON D.source = S.source AND D.time = T.time

Its EXPLAIN output is:

+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------+
| id | select_type        | table      | type | possible_keys | key  | key_len | ref  | rows | Extra           |
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------+
|  1 | PRIMARY            | <derived5> | ALL  | NULL          | NULL | NULL    | NULL |    3 |                 |
|  1 | PRIMARY            | <derived4> | ALL  | NULL          | NULL | NULL    | NULL |    7 |                 |
|  1 | PRIMARY            | D          | ALL  | NULL          | NULL | NULL    | NULL |   18 |                 |
|  5 | DERIVED            | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using temporary |
|  4 | DERIVED            | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using temporary |
|  2 | DEPENDENT SUBQUERY | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using where     |
|  3 | DEPENDENT SUBQUERY | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using where     |
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------+

Answers

1

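I think you'd get better performance in MySQL using ranking/windowing functions, but unfortunately I don't know those as well as the T-SQL implementation. Here is an ANSI-compliant solution that works, though: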

-- Full join across the sample set and anything missing from the sample set, pulling the missing temp first if we do not have one 
select coalesce(c1.[time], c2.[time]) as dt, coalesce(c1.source, c2.source) as source, coalesce(c2.temp, c1.temp) as temp 
from samples c1 
full join (-- Pull all the time/source combinations that we are missing from the sample set, along with the temp 
      -- from the last sampled interval for the same time/source combination if we do not have one 
      select a.time, a.source, 
        (select top 1 t2.temp from samples t2 where t2.time < a.time and t2.source = a.source order by t2.time desc) as temp 
      from  
       ( -- All values we want to get should be a cross of time/samples 
        select t1.[time], s1.source 
        from 
        (select distinct [time] from samples) as t1 
        cross join 
        (select distinct source from samples) as s1 
       ) a 
      left join samples s 
      on a.[time] = s.time 
      and a.source = s.source 
      where s.source is null 
     ) c2 
on c1.time = c2.time 
and c1.source = c2.source 
order by dt, source 
0

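I know this looks complicated, but it's formatted to explain itself... It should work... Hopefully you only have three sources... If you have an arbitrary number of sources, it won't work... in that case, see the second query... EDIT: removed the first attempt.

EDIT: If you don't know the sources ahead of time, you have to do something that creates an intermediate result set to "fill in" the missing values... something like this:

2nd EDIT: Removed the need for COALESCE by moving the logic that retrieves each source's latest temp reading from the Select clause into the join condition.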

Select Z.Time, Max(Z.Temp) MaxTemp,
    Min(Z.Temp) MinTemp, Avg(Z.Temp) AvgTemp
From
    (Select T.Time, S.Source, D.Temp
    From (Select Distinct Time From Samples) T
    Cross Join
        (Select Distinct Source From Samples) S
    Left Join Samples D
        On D.Source = S.Source
        And D.Time =
            -- latest reading from this source at or before the interval
            (Select Max(Time)
            From Samples
            Where Source = S.Source
                And Time <= T.Time)) Z
Group By Z.Time
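
To see what a query of this shape produces on the question's test data, here is a sketch that runs it through Python's built-in sqlite3 module. SQLite standing in for MySQL or SQL Server is an assumption, the outer query refers to the derived table by its alias, and the names `ROWS`, `AGG_SQL`, and `interval_stats` are invented for the example.

```python
import sqlite3

# Test data from the question: (time, source, temp).
ROWS = [
    (1, 'a', 20), (1, 'b', 18), (1, 'c', 23),
    (2, 'b', 21), (2, 'c', 20), (2, 'a', 18),
    (3, 'a', 16), (3, 'c', 13),
    (4, 'c', 15), (4, 'a', 4), (4, 'b', 31),
    (5, 'b', 10), (5, 'c', 16), (5, 'a', 22),
    (6, 'a', 18), (6, 'b', 17),
    (7, 'a', 20), (7, 'b', 19),
]

# For every time/source pair, join to that source's latest sample at or
# before the interval, then aggregate per interval.
AGG_SQL = """
SELECT z.time, MIN(z.temp), MAX(z.temp), AVG(z.temp)
  FROM (SELECT t.time AS time, s.source AS source, d.temp AS temp
          FROM (SELECT DISTINCT time FROM samples) AS t
         CROSS JOIN (SELECT DISTINCT source FROM samples) AS s
          LEFT JOIN samples AS d
            ON d.source = s.source
           AND d.time = (SELECT MAX(p.time)
                           FROM samples AS p
                          WHERE p.source = s.source
                            AND p.time <= t.time)) AS z
 GROUP BY z.time
 ORDER BY z.time
"""


def interval_stats():
    """Return [(time, min, max, avg), ...] for the test data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (time INTEGER, source TEXT, temp INTEGER)")
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(AGG_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in interval_stats():
        print(row)
```

At time 6, for example, source c has no sample, so its reading of 16 from time 5 is carried forward and the interval aggregates over 18, 17, and 16.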

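Thanks, Charles, but your solution assumes that all the sources are known ahead of time. Do you have any suggestions for when they aren't? – pr1001 2009-11-11 22:28:29

If you don't know the sources, then add another SQL query... – 2009-11-12 00:50:57

After changing IsNull to COALESCE, I was able to get the query working on my MySQL database. Thanks. – pr1001 2009-11-12 01:24:02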