Moving from Influx to Postgres, need tips

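I have been using Influx to store our time series data. It was cool while it worked, but after about a month it stopped working and I couldn't figure out why. (Similar to this issue: https://github.com/influxdb/influxdb/issues/1386)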

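Maybe Influx will be great one day, but for now I need something more stable. I'm thinking of Postgres. Our data comes from many sensors, and each sensor has a sensor ID. So I'm thinking of structuring our data like this: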

(PK), sensorId (string), time (timestamp), value (float)

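Influx is built for time series data, so it probably has some built-in optimizations. Do I need to do the optimizations myself to make Postgres efficient? More specifically, I have these questions: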

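1. Influx has the notion of a 'series', and creating a new series is cheap. So I had a separate series for each sensor. Should I create a separate Postgres table for each sensor?
2. How should I set up indexes to make queries fast? A typical query is: select all data for sensor123 for the last 3 days.
3. Should I use timestamp or integer for the time column?
4. How do I set a retention policy? E.g. delete data older than one week.
5. Will Postgres scale horizontally? Can I set up an EC2 cluster for data replication and load balancing?
6. Can I downsample in Postgres? I have read in some articles that I can use date_trunc, but it seems I can't date_trunc to a specific interval, e.g. 25 seconds.
7. Are there any other caveats I have missed?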

Thanks in advance!

Update: Storing the time column as a big integer is faster than storing it as a timestamp. Am I doing something wrong?

Storing it as timestamp:

postgres=# explain analyze select * from test where sensorid='sensor_0'; 

Bitmap Heap Scan on test (cost=3180.54..42349.98 rows=75352 width=25) (actual time=10.864..19.604 rows=51840 loops=1) 
    Recheck Cond: ((sensorid)::text = 'sensor_0'::text) 
    Heap Blocks: exact=382 
    -> Bitmap Index Scan on sensorindex (cost=0.00..3161.70 rows=75352 width=0) (actual time=10.794..10.794 rows=51840 loops=1) 
     Index Cond: ((sensorid)::text = 'sensor_0'::text) 
Planning time: 0.118 ms 
Execution time: 22.984 ms 

postgres=# explain analyze select * from test where sensorid='sensor_0' and addedtime > to_timestamp(1430939804); 

Bitmap Heap Scan on test (cost=2258.04..43170.41 rows=50486 width=25) (actual time=22.375..27.412 rows=34833 loops=1) 
    Recheck Cond: (((sensorid)::text = 'sensor_0'::text) AND (addedtime > '2015-05-06 15:16:44-04'::timestamp with time zone)) 
    Heap Blocks: exact=257 
    -> Bitmap Index Scan on sensorindex (cost=0.00..2245.42 rows=50486 width=0) (actual time=22.313..22.313 rows=34833 loops=1) 
     Index Cond: (((sensorid)::text = 'sensor_0'::text) AND (addedtime > '2015-05-06 15:16:44-04'::timestamp with time zone)) 
Planning time: 0.362 ms 
Execution time: 29.290 ms

Storing it as big integer:

postgres=# explain analyze select * from test where sensorid='sensor_0'; 


Bitmap Heap Scan on test (cost=3620.92..42810.47 rows=85724 width=25) (actual time=12.450..19.615 rows=51840 loops=1) 
    Recheck Cond: ((sensorid)::text = 'sensor_0'::text) 
    Heap Blocks: exact=382 
    -> Bitmap Index Scan on sensorindex (cost=0.00..3599.49 rows=85724 width=0) (actual time=12.359..12.359 rows=51840 loops=1) 
     Index Cond: ((sensorid)::text = 'sensor_0'::text) 
Planning time: 0.130 ms 
Execution time: 22.331 ms 

postgres=# explain analyze select * from test where sensorid='sensor_0' and addedtime > 1430939804472; 


Bitmap Heap Scan on test (cost=2346.57..43260.12 rows=52489 width=25) (actual time=10.113..14.780 rows=31839 loops=1) 
    Recheck Cond: (((sensorid)::text = 'sensor_0'::text) AND (addedtime > 1430939804472::bigint)) 
    Heap Blocks: exact=235 
    -> Bitmap Index Scan on sensorindex (cost=0.00..2333.45 rows=52489 width=0) (actual time=10.059..10.059 rows=31839 loops=1) 
     Index Cond: (((sensorid)::text = 'sensor_0'::text) AND (addedtime > 1430939804472::bigint)) 
Planning time: 0.154 ms 
Execution time: 16.589 ms

Source

2015-05-03 user1657624

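Your question is **way too broad**: it touches on multiple issues instead of following the SO practice of asking one specific programming question and stating what you have tried yourself. I suggest you edit this post, ask specific questions in the appropriate forums, and post the remaining questions separately (e.g. Q.5 belongs on dba.stackexchange). – Patrick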

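With only a single run of each version, 16ms vs. 29ms does not prove that "*integer is faster than timestamp*". The (small) difference is most likely caused by caching or other things going on in the system (for example, you should repeat the statements using 'explain (analyze, verbose, buffers)') –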
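
The repeated measurement suggested in this comment might look like the sketch below. It assumes the asker's `test` table (columns `sensorid` and `addedtime`) from the question already exists. Running it several times and comparing the buffer counts (shared hits versus reads) helps separate cache effects from real differences between the two column types.

```sql
-- Assumes the asker's table "test" (columns sensorid, addedtime) already exists.
-- Run the statement several times; the first run often pays for cold caches.
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT *
FROM test
WHERE sensorid = 'sensor_0'
  AND addedtime > to_timestamp(1430939804);
```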

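I repeated the statement several times and integer was always faster than timestamp. However, if I don't call to_timestamp(1430939804) in the query but convert the value beforehand, it is just as fast as integer. Maybe to_timestamp is called multiple times and isn't optimized? – user1657624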

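You should not create a table per sensor. Instead, add a field to your table that identifies which series each row belongs to. You can also use another table to describe additional attributes of each series. If a data point can belong to more than one series, then you need a different structure altogether.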
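
A minimal sketch of that layout, assuming one row per reading and one row per sensor. All table and column names here (`sensors`, `readings`, `description`) are illustrative and not taken from the asker's schema.

```sql
-- Hypothetical names: one row per sensor, holding its descriptive attributes.
CREATE TABLE sensors (
    sensor_id   text PRIMARY KEY,
    description text
);

-- One shared table for all readings; sensor_id says which series a point belongs to.
CREATE TABLE readings (
    sensor_id   text             NOT NULL REFERENCES sensors (sensor_id),
    recorded_at timestamptz      NOT NULL,
    value       double precision NOT NULL
);
```

Adding a sensor then means inserting one row into `sensors` rather than creating a new table, which keeps the cost of a new series low.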

For the query you describe in Q2, an index on your recorded_at column should work (time is an SQL reserved keyword, so it is best avoided as a column name).
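
A sketch of the Q2 query with a supporting index. Note the assumptions: the table and column names are illustrative, and the index shown is the composite (sensor_id, recorded_at) variant that the asker later reports using in the comments, not the single-column index mentioned above.

```sql
-- Illustrative table; names are assumptions, not the asker's actual schema.
CREATE TABLE IF NOT EXISTS readings (
    sensor_id   text             NOT NULL,
    recorded_at timestamptz      NOT NULL,
    value       double precision NOT NULL
);

-- Composite index: the equality column first, the range column second.
CREATE INDEX IF NOT EXISTS readings_sensor_time_idx
    ON readings (sensor_id, recorded_at);

-- All data for one sensor over the last 3 days.
SELECT recorded_at, value
FROM readings
WHERE sensor_id = 'sensor123'
  AND recorded_at >= now() - interval '3 days'
ORDER BY recorded_at;
```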

You should use TIMESTAMP WITH TIME ZONE as the data type for your time column.
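
If the application already produces epoch values, as the asker's bigint test suggests, they can be converted at insert or query time. A minimal sketch; the millisecond value is the one used in the question's test query, and the column aliases are illustrative.

```sql
-- Epoch milliseconds (as in the asker's bigint test) to timestamptz.
SELECT to_timestamp(1430939804472 / 1000.0) AS recorded_at;

-- And back: timestamptz to epoch milliseconds.
SELECT (extract(epoch FROM timestamptz '2015-05-06 19:16:44.472+00') * 1000)::bigint
       AS epoch_ms;
```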

Retention is up to you.
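
One simple way to handle retention yourself is a periodic DELETE. The sketch below assumes an illustrative `readings` table and that some external scheduler such as cron runs the statement; neither is part of the original question.

```sql
-- Illustrative table; names are assumptions.
CREATE TABLE IF NOT EXISTS readings (
    sensor_id   text             NOT NULL,
    recorded_at timestamptz      NOT NULL,
    value       double precision NOT NULL
);

-- Remove points older than one week; intended to be run periodically.
DELETE FROM readings
WHERE recorded_at < now() - interval '7 days';
```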

Postgres has several options for sharding and replication. That is a big topic.

I'm not sure I understand your goal in #6, but I'm sure you can figure something out.
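
For what it's worth, one possible reading of #6 is averaging readings into fixed 25-second buckets, which date_trunc cannot express directly. A sketch using epoch arithmetic, assuming an illustrative `readings` table and that an average is the aggregate wanted:

```sql
-- Illustrative table; names are assumptions.
CREATE TABLE IF NOT EXISTS readings (
    sensor_id   text             NOT NULL,
    recorded_at timestamptz      NOT NULL,
    value       double precision NOT NULL
);

-- Average value per sensor in 25-second buckets.
-- floor(epoch / 25) * 25 snaps each timestamp down to the start of its bucket.
SELECT sensor_id,
       to_timestamp(floor(extract(epoch FROM recorded_at) / 25) * 25) AS bucket_start,
       avg(value) AS avg_value
FROM readings
WHERE recorded_at >= now() - interval '3 days'
GROUP BY sensor_id, bucket_start
ORDER BY sensor_id, bucket_start;
```

On PostgreSQL 14 and later, the built-in `date_bin` function offers similar bucketing relative to a chosen origin.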

Source

2015-05-03 23:55:18 Bill

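Thanks Bill. If I have hundreds of sensors each sending a data point to Postgres every 5 seconds, do you see any potential performance issues? (That is about 8 million points per day.) And what if I need to select one series out of hundreds, each containing hundreds of thousands of points? I will run some tests, but I also value your opinion. – user1657624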

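You may want to look at something like this: http://zaiste.net/2014/07/table_inheritance_and_partitioning_with_postgresql/ You can partition on your time column. Then, to delete old data, you just drop the appropriate child table. I think you will find that Postgres will surprise you performance-wise. – Bill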
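
A rough sketch of the inheritance-based partitioning that comment points to, with one child table per week. The table names, the weekly granularity, and the sample row are all assumptions for illustration. Newer PostgreSQL versions (10 and later) also offer declarative partitioning with PARTITION BY RANGE.

```sql
-- Parent table; queries go against this one. Names are illustrative.
CREATE TABLE measurements (
    sensor_id   text             NOT NULL,
    recorded_at timestamptz      NOT NULL,
    value       double precision NOT NULL
);

-- One child table per week; the CHECK constraint lets the planner skip
-- children that cannot match a query's time range (constraint exclusion).
CREATE TABLE measurements_2015_w18 (
    CHECK (recorded_at >= '2015-04-27 00:00:00+00'
       AND recorded_at <  '2015-05-04 00:00:00+00')
) INHERITS (measurements);

CREATE INDEX ON measurements_2015_w18 (sensor_id, recorded_at);

-- Insert directly into the child (a trigger on the parent could route rows instead).
INSERT INTO measurements_2015_w18
VALUES ('sensor_0', '2015-04-28 12:00:00+00', 1.5);

-- Retention: dropping a whole child avoids a large row-by-row DELETE.
DROP TABLE measurements_2015_w18;
```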

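I tested with 100 sensors, adding 50k points for each, and the performance is very reasonable. The only part that is a bit slow is the bulk insert into the database. I created an index on (sensorId, recorded_at), then used COPY to insert the points, and it took 2 minutes to add them all. Is that normal? Another thing is that storing recorded_at as a big integer is faster than storing it as a timestamp. I have updated the original question with the psql output. – user1657624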
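
One thing that may be worth trying for the slow bulk insert is loading the data first and building the index afterwards, since building an index on already-loaded data is generally faster than maintaining it during the load. A sketch under stated assumptions: the table name `readings_bulk` and the file path are hypothetical, and the file is assumed to be CSV readable by the database server.

```sql
-- Illustrative bulk load; table name and file path are hypothetical.
CREATE TABLE readings_bulk (
    sensor_id   text             NOT NULL,
    recorded_at timestamptz      NOT NULL,
    value       double precision NOT NULL
);

-- Load first, with no index on the table yet.
-- Server-side COPY reads a file on the database server;
-- psql's \copy reads a file on the client instead.
COPY readings_bulk (sensor_id, recorded_at, value)
FROM '/tmp/points.csv' WITH (FORMAT csv);

-- Build the index once, after the data is in place.
CREATE INDEX readings_bulk_sensor_time_idx
    ON readings_bulk (sensor_id, recorded_at);

-- Refresh planner statistics after the load.
ANALYZE readings_bulk;
```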
