2

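Postgres: getting the max and min values, and the timestamps at which they occur

I'm running Postgres 9.2 and have a table of temperatures and timestamps, with one timestamp per minute in milliseconds epoch time: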

weather=# \d weather_data 
     Table "public.weather_data" 
    Column |  Type  | Modifiers 
-------------+--------------+----------- 
timestamp | bigint  | not null 
sensor_id | integer  | not null 
temperature | numeric(4,1) | 
humidity | integer  | 
date  | date   | not null 
Indexes: 
    "weather_data_pkey" PRIMARY KEY, btree ("timestamp", sensor_id) 
    "weather_data_date_idx" btree (date) 
    "weather_data_humidity_idx" btree (humidity) 
    "weather_data_sensor_id_idx" btree (sensor_id) 
    "weather_data_temperature_idx" btree (temperature) 
    "weather_data_time_idx" btree ("timestamp") 
Foreign-key constraints: 
    "weather_data_sensor_id_fkey" FOREIGN KEY (sensor_id) REFERENCES weather_sensors(sensor_id) 

weather=# select * from weather_data order by timestamp desc; 
    timestamp | sensor_id | temperature | humidity | date  
---------------+-----------+-------------+----------+------------ 
1483272420000 |   2 |  22.3 |  57 | 2017-01-01 
1483272420000 |   1 |  24.9 |  53 | 2017-01-01 
1483272360000 |   2 |  22.3 |  57 | 2017-01-01 
1483272360000 |   1 |  24.9 |  58 | 2017-01-01 
1483272300000 |   2 |  22.4 |  57 | 2017-01-01 
1483272300000 |   1 |  24.9 |  57 | 2017-01-01 
[...] 

I have this existing query that gets the high and low for each day, but not the specific time at which the high or low occurred:

WITH t AS (
    SELECT date, highest, lowest 
    FROM (
     SELECT date, max(temperature) AS highest 
     FROM weather_data 
     WHERE sensor_id = (SELECT sensor_id FROM weather_sensors WHERE sensor_name = 'outdoor') 
     GROUP BY date 
     ORDER BY date ASC 
    ) h 
    INNER JOIN (
     SELECT date, min(temperature) AS lowest 
     FROM weather_data 
     WHERE sensor_id = (SELECT sensor_id FROM weather_sensors WHERE sensor_name = 'outdoor') 
     GROUP BY date 
     ORDER BY date ASC 
    ) l 
    USING (date) 
    ORDER BY date DESC 
) 
SELECT * from t ORDER BY date ASC; 

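There are a bit over two million rows in the database, and the query takes ~1.2 seconds to run, which isn't too bad. I now want to get the specific time at which the high or low occurred. I came up with this using window functions, which does work but takes ~5.6 seconds: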

SELECT h.date, high_time, high_temp, low_time, low_temp FROM (
    SELECT date, high_temp, high_time FROM (
     SELECT date, temperature AS high_temp, timestamp AS high_time, row_number() 
     OVER (PARTITION BY date ORDER BY temperature DESC, timestamp DESC) 
     FROM weather_data 
     WHERE sensor_id = (SELECT sensor_id FROM weather_sensors WHERE sensor_name = 'outdoor') 
    ) highs 
    WHERE row_number = 1 
) h 
INNER JOIN (
    SELECT * FROM (
     SELECT date, temperature AS low_temp, timestamp AS low_time, row_number() 
     OVER (PARTITION BY date ORDER BY temperature ASC, timestamp DESC) 
     FROM weather_data 
     WHERE sensor_id = (SELECT sensor_id FROM weather_sensors WHERE sensor_name = 'outdoor') 
    ) lows 
    WHERE row_number = 1 
) l 
ON h.date = l.date 
ORDER BY h.date ASC; 

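Is there some relatively simple addition I can make to the first query that won't add a massive amount of execution time? I assume there is, but I think I'm at the point where I've been looking at this problem for too long!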

+1

Possible duplicate of [PostgreSQL - fetch the row which has the Max value for a column](http://stackoverflow.com/questions/586781/postgresql-fetch-the-row-which-has-the-max-value-for-a-column) – Joe

+1

Unrelated, but: the "order by" in the derived tables of the first query is useless

+0

@a_horse_with_no_name Noted, thanks! – VirtualWolf

Answers

2
SELECT 
     DISTINCT ON (zdate) zdate 
     , first_value(ztimestamp) OVER www AS stamp_at_min 
     , first_value(temperature) OVER www AS tmin 
     , last_value(ztimestamp) OVER www AS stamp_at_max 
     , last_value(temperature) OVER www AS tmax 
FROM weather_data 
WHERE sensor_id = 2 
WINDOW www AS (PARTITION BY zdate ORDER BY temperature, ztimestamp 
       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
       ) 
     ; 

  • I prefixed the date and timestamp column names with z (zdate and ztimestamp)
  • I added ztimestamp to the ordering as a tie-breaker
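
A side note on the explicit frame clause in the query above, as a minimal sketch on made-up values (the inline table `v` and its columns `d` and `temp` are hypothetical, not part of the question's schema). With an ORDER BY in the window and no frame clause, the default frame ends at the current row, so last_value() returns the current row's value rather than the partition's maximum. Spelling out ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING makes it see the whole partition:

```sql
-- Hypothetical three-row sample: one "day" (d = 1) with three temperatures.
SELECT d
     , temp
       -- Default frame (up to the current row): just echoes temp for each row.
     , last_value(temp) OVER (PARTITION BY d ORDER BY temp) AS last_default_frame
       -- Full-partition frame: the highest temp of the partition on every row.
     , last_value(temp) OVER (PARTITION BY d ORDER BY temp
                              ROWS BETWEEN UNBOUNDED PRECEDING
                                       AND UNBOUNDED FOLLOWING) AS last_full_frame
FROM (VALUES (1, 10.0), (1, 12.5), (1, 11.0)) AS v(d, temp);
```

Also worth noting: because the window is ordered by temperature and then ztimestamp, ties resolve to the earliest timestamp for the minimum and the latest timestamp for the maximum.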
+0

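Works great, thank you! Are there any extra index-related tricks that could speed it up (it takes about 3.7 seconds to run), or is this the kind of thing where there isn't much left to optimise? – VirtualWolf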

+0

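Your table basically has two candidate keys: your PK and possibly {zdate, sensor_id, temperature, ...}, which is not exactly unique. Anyway, I think you should get rid of the single-column indexes. zdate *could* be functionally dependent on ztimestamp (which *could* be a timestamp instead of an int) – wildplasser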
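
As a rough sketch of what that index advice could look like in practice (the index name and the column order below are assumptions, not something stated in the thread, and the column names are the ones from the question's table rather than the z-prefixed ones):

```sql
-- Hypothetical composite index: the equality column (sensor_id) first,
-- then the partition column, then the two sort columns of the window.
CREATE INDEX weather_data_sensor_date_temp_idx
    ON weather_data (sensor_id, "date", temperature, "timestamp");

-- weather_data_time_idx duplicates the leading column of the primary key
-- ("timestamp", sensor_id), so it is one candidate for removal.
-- Left commented out: check what other queries rely on it first.
-- DROP INDEX weather_data_time_idx;
```

Whether the planner actually picks the new index up for the window query is something only EXPLAIN ANALYZE on the real data can tell.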

+0

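Get _rid_ of the single-column indexes? Interesting. I have some other (simpler) unrelated queries that I run against this table, and I'm guessing they would end up very slow without the indexes, no? – VirtualWolf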

2

This does the same as your second query, but only requires a single scan over the weather_data table:

select date, 
     max(case when high_rn = 1 then timestamp end) as high_time, 
     max(case when high_rn = 1 then temperature end) as high_temp, 
     max(case when low_rn = 1 then timestamp end) as low_time, 
     max(case when low_rn = 1 then temperature end) as low_temp 
from (
    select timestamp, temperature, date, 
     row_number() OVER (PARTITION BY date ORDER BY temperature DESC, timestamp DESC) as high_rn, 
     row_number() OVER (PARTITION BY date ORDER BY temperature ASC, timestamp DESC) as low_rn 
    from weather_data 
    where sensor_id = ... 
) t 
where (high_rn = 1 or low_rn = 1) 
group by date; 

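It uses conditional aggregation to do a crosstab (aka "pivot") on the result of the inner query, which only contains the lowest and highest temperatures.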
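
In case the pivot step is unclear, here is the same idea in isolation, as a small sketch on made-up rows (the inline table `v` and its `kind` column are hypothetical). Each CASE returns NULL for rows that don't match, and max() ignores NULLs, so every output column picks its value from exactly the matching row:

```sql
-- Two input rows per day collapse into one output row per day.
SELECT d
     , max(CASE WHEN kind = 'high' THEN temp END) AS high_temp
     , max(CASE WHEN kind = 'low'  THEN temp END) AS low_temp
FROM (VALUES
        (DATE '2017-01-01', 'high', 24.9)
      , (DATE '2017-01-01', 'low',  22.3)
     ) AS v(d, kind, temp)
GROUP BY d;
```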


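Unrelated, but: date and timestamp are horrible names for columns. Firstly because they are keywords, but more importantly because they don't document what the column actually contains. Is it a "due date"? A "reading date"? A "processing date"?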
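
A possible rename, purely as a sketch: reading_date and reading_time_ms are made-up names, and every existing query would need updating to match. The last statement also shows one way to turn the epoch-milliseconds value into a readable timestamp with to_timestamp(), which expects seconds:

```sql
-- Hypothetical names; pick whatever describes the readings best.
ALTER TABLE weather_data RENAME COLUMN "date" TO reading_date;
ALTER TABLE weather_data RENAME COLUMN "timestamp" TO reading_time_ms;

-- Dividing by 1000.0 converts milliseconds to (fractional) seconds.
SELECT reading_date
     , to_timestamp(reading_time_ms / 1000.0) AS reading_time
     , temperature
FROM weather_data
ORDER BY reading_time_ms DESC
LIMIT 5;
```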

+0

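Thanks! This one takes 5.2 seconds to run, versus 3.7 seconds for the one above. The columns hold the time and date at which a particular temperature reading was taken, so I guess "reading date" and "reading time". It's a personal project that only I work on (it just keeps track of the current temperatures inside and outside my house). :) – VirtualWolf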

+0

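Oh, I just remembered that I needed to add a 'temperature != 21.8', because the temperature sensor occasionally goes weird and sends a value of 21.8 to my app. After adding a subquery for the window functions to @wildplasser's query, and adding a simple "where temperature != 21.8" to yours, they now both run within about 100ms of each other! – VirtualWolf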
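
For anyone landing here later, one way the filtered version of @wildplasser's query might look. This is a sketch, not the exact query that was timed above: it uses the column names from the question, reuses the 'outdoor' sensor lookup from the original queries, and puts the filter straight into the WHERE clause, which is applied before the window functions are evaluated:

```sql
SELECT DISTINCT ON ("date") "date"
     , first_value("timestamp") OVER w AS low_time
     , first_value(temperature) OVER w AS low_temp
     , last_value("timestamp")  OVER w AS high_time
     , last_value(temperature)  OVER w AS high_temp
FROM weather_data
WHERE sensor_id = (SELECT sensor_id FROM weather_sensors
                   WHERE sensor_name = 'outdoor')
  -- Drops the sensor's bogus readings; note that <> also drops rows
  -- where temperature is NULL.
  AND temperature <> 21.8
WINDOW w AS (PARTITION BY "date" ORDER BY temperature, "timestamp"
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY "date";
```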
