Postgres partition pruning

I have a large table in Postgres.

The table is named bigtable and has these columns:

category_id | capture_time | col1 | col2 | ... | colN
integer     | timestamp    | xxx  | xxx  | ... | xxx

I have partitioned the table on category_id modulo 10 and on the date part of the capture_time column.

The partition tables look like this:

CREATE TABLE myschema.bigtable_d000h0(
    CHECK (category_id%10=0 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02') 
) INHERITS (myschema.bigtable); 

CREATE TABLE myschema.bigtable_d000h1(
    CHECK (category_id%10=1 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02') 
) INHERITS (myschema.bigtable);

When I run a query that uses category_id and capture_time in the WHERE clause, the partitions are not pruned as I expected:

explain select * from bigtable where capture_time >= '2012-01-01' and capture_time < '2012-01-02' and category_id=100; 

"Result (cost=0.00..9476.87 rows=1933 width=216)" 
" -> Append (cost=0.00..9476.87 rows=1933 width=216)" 
"  -> Seq Scan on bigtable (cost=0.00..0.00 rows=1 width=210)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h0 bigtable (cost=0.00..1921.63 rows=1923 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h1 bigtable (cost=0.00..776.93 rows=1 width=218)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h2 bigtable (cost=0.00..974.47 rows=1 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h3 bigtable (cost=0.00..1351.92 rows=1 width=214)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h4 bigtable (cost=0.00..577.04 rows=1 width=217)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h5 bigtable (cost=0.00..360.67 rows=1 width=219)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h6 bigtable (cost=0.00..1778.18 rows=1 width=214)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h7 bigtable (cost=0.00..315.82 rows=1 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h8 bigtable (cost=0.00..372.06 rows=1 width=219)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h9 bigtable (cost=0.00..1048.16 rows=1 width=215)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"

However, if I add the exact modulo condition (category_id%10=0) to the WHERE clause, it works perfectly:

explain select * from bigtable where capture_time >= '2012-01-01' and capture_time < '2012-01-02' and category_id=100 and category_id%10=0; 

"Result (cost=0.00..2154.09 rows=11 width=215)" 
" -> Append (cost=0.00..2154.09 rows=11 width=215)" 
"  -> Seq Scan on bigtable (cost=0.00..0.00 rows=1 width=210)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))" 
"  -> Seq Scan on bigtable_d000h0 bigtable (cost=0.00..2154.09 rows=10 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))"

Is there any way to make partition pruning work without adding the modulo condition to every query?

2012-04-03 Dojo

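Which version are you using? I think the planner got some partitioning improvements in 9.x. – 2012-04-03 17:31:16

You could make the constraint a little simpler: CHECK (category_id % 10 = 1 AND date_trunc('month', capture_time) = '2012-01-01'::date) – 2012-04-03 17:40:16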

@a_horse_with_no_name I am using 9.1 – Dojo 2012-04-03 17:47:19

For anyone with the same problem: I have concluded that the simplest way out is to change the queries to include the modulo condition category_id%10=0.
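
One way to keep that extra condition mechanical is to derive it from the same literal used in the equality test. This is only a sketch, and it assumes the planner folds the constant expression 100 % 10 down to 0 before it compares the WHERE clause with the CHECK constraints, so confirm it with EXPLAIN on your own version:

```sql
-- Sketch: repeat the partitioning expression next to the equality test.
-- Assumption: "100 % 10" is simplified to 0 at plan time, so the clause
-- has the same form as the CHECK constraint (category_id % 10 = 0).
EXPLAIN
SELECT *
FROM bigtable
WHERE capture_time >= '2012-01-01'
  AND capture_time <  '2012-01-02'
  AND category_id = 100
  AND category_id % 10 = 100 % 10;
```

The application only has to substitute one value in two places, rather than compute the bucket itself.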

2012-04-08 06:38:33 Dojo

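The thing is: for exclusion constraints PostgreSQL will create an implicit index. In your case this index will be a partial one, because you are using an expression on the column, not just its value. And it is stated in the documentation (look for Example 11-2):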

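PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query's WHERE condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time.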

Hence your results: the query has to contain exactly the same expression that was used when creating the CHECK constraint.

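For hash-based partitioning I prefer one of two approaches:

add a field that can take one of a limited set of values (10 in this case), preferably one that already exists by design;
specify the hash range the same way you specify the timestamp range: MINVALUE <= category_id < MAXVALUE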
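
The two approaches above might look roughly like this. It is a sketch rather than a tested schema: the column category_bucket, the child table names, and the range bounds 0 and 1000 are all made up for illustration, and the two variants are alternatives, not steps to run together.

```sql
-- Variant 1: a plain column with a small set of values.
-- Assumption: category_bucket is hypothetical and must be filled by the
-- application or a trigger (for example with category_id % 10).
ALTER TABLE myschema.bigtable ADD COLUMN category_bucket smallint;

CREATE TABLE myschema.bigtable_b0_d000 (
    CHECK (category_bucket = 0
           AND capture_time >= DATE '2012-01-01'
           AND capture_time <  DATE '2012-01-02')
) INHERITS (myschema.bigtable);

-- Variant 2: a range on category_id itself, written like the date range.
-- Assumption: the bounds 0 and 1000 are illustrative only.
CREATE TABLE myschema.bigtable_r0_d000 (
    CHECK (category_id >= 0 AND category_id < 1000
           AND capture_time >= DATE '2012-01-01'
           AND capture_time <  DATE '2012-01-02')
) INHERITS (myschema.bigtable);
```

In both variants the CHECK constraint compares a bare column with constants, so a query such as category_bucket = 0 or category_id = 100 has a form the planner can compare against the constraint directly.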

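You can also create two-level partitioning:

on the first level you create 10 partitions based on the category_id hash;
on the second level you create the necessary number of partitions based on date ranges.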
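
A minimal sketch of that two-level layout with inheritance, assuming hypothetical table names bigtable_h0 and bigtable_h0_d000 (only the first bucket and the first day are shown):

```sql
-- Level 1: one child per hash bucket of category_id.
CREATE TABLE myschema.bigtable_h0 (
    CHECK (category_id % 10 = 0)
) INHERITS (myschema.bigtable);

-- Level 2: one grandchild per day, inheriting from its bucket table.
-- The bucket CHECK constraint is inherited along with the columns.
CREATE TABLE myschema.bigtable_h0_d000 (
    CHECK (capture_time >= DATE '2012-01-01'
           AND capture_time <  DATE '2012-01-02')
) INHERITS (myschema.bigtable_h0);
```

The modulo expression still appears in the first-level constraint, so queries against the top-level table would still need the matching category_id % 10 condition for bucket pruning.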

That said, I always try to partition on a single column only; it is easier to manage.

2012-04-03 19:57:02 vyegorov

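Thanks for your input. The code I posted implements 2-level partitioning with 1-level inheritance. Performance-wise, it runs faster than actual 2-level inheritance. I know the other way (the one you suggest) should be faster, because fewer tables are checked at the first level, and at the next level only the tables inheriting from the qualifying first-level table have to be scanned. But in practice it is slower. – Dojo 2012-04-08 06:24:00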

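What slows partitioning down is the partition pruning logic, not the actual table scans. In both cases the optimizer prunes the tables correctly, but with 2-level inheritance it takes longer to decide which partitions to prune. – Dojo 2012-04-08 06:27:40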

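One interesting thing about 2-level partitioning is that you can query a first-level table directly, and pruning the second-level partitions then takes less time. I could use this to archive data in a way that does not hurt performance. – Dojo 2012-04-08 06:31:43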
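
Querying a first-level table directly could look like the following. This assumes a hypothetical first-level child named myschema.bigtable_h0 that holds the rows with category_id % 10 = 0 and has per-day children beneath it:

```sql
-- Sketch: start from the bucket table instead of the top-level parent,
-- so the planner only has to consider that bucket's date partitions.
-- Assumption: myschema.bigtable_h0 is a hypothetical first-level child.
EXPLAIN
SELECT *
FROM myschema.bigtable_h0
WHERE capture_time >= '2012-01-01'
  AND capture_time <  '2012-01-02'
  AND category_id = 100;
```

The caller picks the bucket table itself, so no modulo condition is needed in the WHERE clause.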
