
Postgres partition pruning

I have a large table in Postgres.

The table name is bigtable, and its columns are:

integer |timestamp |xxx |xxx |...|xxx 
category_id|capture_time|col1|col2|...|colN 

I have partitioned the table on category_id modulo 10 and on the date part of the capture_time column.

The partition tables look like this:

CREATE TABLE myschema.bigtable_d000h0(
    CHECK (category_id%10=0 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02') 
) INHERITS (myschema.bigtable); 

CREATE TABLE myschema.bigtable_d000h1(
    CHECK (category_id%10=1 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02') 
) INHERITS (myschema.bigtable); 
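Writing one such table per bucket and per day by hand gets tedious. Below is a minimal Python sketch that generates the DDL for all ten buckets of one day, following the pattern above. It assumes that the `dNNN` part of the name is a zero-based day index (the question does not say what it encodes), and `day_partitions_ddl` is a hypothetical helper name.

```python
from datetime import date, timedelta


def day_partitions_ddl(day: date, day_index: int, buckets: int = 10) -> list:
    """Return one CREATE TABLE statement per category_id bucket for `day`.

    Mirrors the question's scheme: a CHECK on category_id % buckets plus a
    half-open [day, day + 1) range on capture_time, inheriting from bigtable.
    """
    next_day = day + timedelta(days=1)
    statements = []
    for bucket in range(buckets):
        # Assumed naming: d<zero-padded day index>h<bucket>, e.g. bigtable_d000h0
        name = f"myschema.bigtable_d{day_index:03d}h{bucket}"
        statements.append(
            f"CREATE TABLE {name}(\n"
            f"    CHECK (category_id%{buckets}={bucket} "
            f"AND capture_time >= DATE '{day.isoformat()}' "
            f"AND capture_time < DATE '{next_day.isoformat()}')\n"
            f") INHERITS (myschema.bigtable);"
        )
    return statements


if __name__ == "__main__":
    for stmt in day_partitions_ddl(date(2012, 1, 1), 0):
        print(stmt)
```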

When I run a query that uses category_id and capture_time in the WHERE clause, the partitions are not pruned as expected.

explain select * from bigtable where capture_time >= '2012-01-01' and capture_time < '2012-01-02' and category_id=100; 

"Result (cost=0.00..9476.87 rows=1933 width=216)" 
" -> Append (cost=0.00..9476.87 rows=1933 width=216)" 
"  -> Seq Scan on bigtable (cost=0.00..0.00 rows=1 width=210)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h0 bigtable (cost=0.00..1921.63 rows=1923 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h1 bigtable (cost=0.00..776.93 rows=1 width=218)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h2 bigtable (cost=0.00..974.47 rows=1 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h3 bigtable (cost=0.00..1351.92 rows=1 width=214)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h4 bigtable (cost=0.00..577.04 rows=1 width=217)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h5 bigtable (cost=0.00..360.67 rows=1 width=219)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h6 bigtable (cost=0.00..1778.18 rows=1 width=214)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h7 bigtable (cost=0.00..315.82 rows=1 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h8 bigtable (cost=0.00..372.06 rows=1 width=219)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h9 bigtable (cost=0.00..1048.16 rows=1 width=215)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 

But if I add the exact modulo condition (category_id%10=0) to the WHERE clause, it works perfectly:

explain select * from bigtable where capture_time >= '2012-01-01' and capture_time < '2012-01-02' and category_id=100 and category_id%10=0; 

"Result (cost=0.00..2154.09 rows=11 width=215)" 
" -> Append (cost=0.00..2154.09 rows=11 width=215)" 
"  -> Seq Scan on bigtable (cost=0.00..0.00 rows=1 width=210)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))" 
"  -> Seq Scan on bigtable_d000h0 bigtable (cost=0.00..2154.09 rows=10 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))" 

Is there any way to make partition pruning work correctly without adding the modulo condition to every query?


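Which version are you using? I think the planner got some partitioning-related improvements in 9.x. – 2012-04-03 17:31:16

You could make the constraint a little simpler: 'CHECK(category_id%10 = 1 AND date_trunc('month',capture_time)='2012-01-01'::date)' – 2012-04-03 17:40:16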


@a_horse_with_no_name I am using 9.1 – Dojo 2012-04-03 17:47:19

Answers


For anyone who has the same problem: I concluded that the simplest approach is to change the queries to include the modulo condition, e.g. category_id%10=0.
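If the modulo condition has to go into every query, it can be derived in application code rather than typed by hand. A minimal sketch, assuming queries are built in Python; `pg_mod` and `bigtable_query` are hypothetical names. Two details matter: PostgreSQL's integer `%` keeps the sign of the dividend (`-7 % 10` is `-7` in PostgreSQL, while Python's `%` gives `3`), and the values are inlined as literals so the planner sees constants when it plans the query.

```python
from datetime import date


def pg_mod(a: int, b: int) -> int:
    """Integer remainder with PostgreSQL semantics: the result takes the
    sign of the dividend, unlike Python's %, which follows the divisor."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def bigtable_query(category_id: int, start: date, end: date, buckets: int = 10) -> str:
    """Build the SELECT with the redundant modulo predicate added, so that
    constraint exclusion can match it against the partitions' CHECK clauses."""
    cid = int(category_id)  # coerce to int so inlining cannot inject SQL
    bucket = pg_mod(cid, buckets)
    return (
        "SELECT * FROM bigtable "
        f"WHERE capture_time >= '{start.isoformat()}' "
        f"AND capture_time < '{end.isoformat()}' "
        f"AND category_id = {cid} "
        f"AND category_id%{buckets} = {bucket}"
    )


if __name__ == "__main__":
    print(bigtable_query(100, date(2012, 1, 1), date(2012, 1, 2)))
```

The extra predicate is redundant for correctness (it follows from `category_id = 100`); it exists only so the planner finds the same expression that appears in the CHECK constraints.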


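The thing is: to decide whether a partition can be skipped, constraint exclusion uses the same limited predicate-proving logic that PostgreSQL applies to partial indexes. In your case the CHECK constraint is written on an expression of the column (category_id%10), not on the column's value itself. And as the documentation states (look for Example 11-2):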

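PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query's WHERE condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time.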

Hence your result: the query must contain exactly the same expression that you used when creating the CHECK constraint.
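To see why `category_id = 100` cannot exclude a partition whose CHECK says `category_id%10 = 1`, here is a deliberately tiny toy model of the refutation step. It is not PostgreSQL's implementation, which is far more general; it only keeps the two properties the quote describes: comparisons are reasoned about only when both sides refer to the same expression (compared here as a plain string), and simple comparisons against constants are evaluated. `Cmp` and `refutes` are hypothetical names.

```python
import operator
from dataclasses import dataclass

OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}


@dataclass(frozen=True)
class Cmp:
    expr: str   # left-hand side, compared only by exact string equality
    op: str     # key into OPS
    value: int  # constant right-hand side


def refutes(where: list, check: list) -> bool:
    """Toy constraint exclusion: True if some equality in WHERE makes some
    conjunct of the CHECK constraint false, so the partition can be skipped."""
    for w in where:
        if w.op != "=":
            continue  # the toy only reasons from equalities
        for c in check:
            # No algebra: 'category_id' and 'category_id % 10' are unrelated here.
            if w.expr == c.expr and not OPS[c.op](w.value, c.value):
                return True
    return False


if __name__ == "__main__":
    query = [Cmp("category_id", "=", 100)]
    h1 = [Cmp("category_id % 10", "=", 1)]
    print(refutes(query, h1))                                      # False: scanned
    print(refutes(query + [Cmp("category_id % 10", "=", 0)], h1))  # True: pruned
```

The same toy also shows why a range on the plain column needs no extra predicate: with `check = [Cmp("category_id", ">=", 1000)]`, `refutes(query, check)` returns `True`.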

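For hash-based partitioning I prefer a two-step approach:

  • add a field that can take one of a limited set of values (10 in your case), preferably one that already exists by design;
  • specify the hash ranges the same way you specify the timestamp ranges: MINVALUE <= category_id < MAXVALUE.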
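A sketch of the range form: instead of `category_id%10`, each partition's CHECK uses a plain range on `category_id`, which the planner can refute from `category_id = 100` through simple inequality reasoning, so queries need no extra predicate. This assumes ids fall in `[0, max_id)`; `range_bounds`, `range_partition_ddl`, the `_<yyyymmdd>_r<i>` table suffix and the value of `max_id` are hypothetical. Ids outside that range would need an extra catch-all partition, and contiguous ranges spread rows evenly only if the ids themselves are evenly distributed.

```python
from datetime import date, timedelta


def range_bounds(max_id: int, parts: int) -> list:
    """Split [0, max_id) into `parts` contiguous half-open ranges (lo, hi)."""
    edges = [i * max_id // parts for i in range(parts + 1)]
    return list(zip(edges[:-1], edges[1:]))


def range_partition_ddl(day: date, max_id: int, parts: int = 10) -> list:
    """One CREATE TABLE per id range for `day`; the constraints use only plain
    comparisons on category_id and capture_time."""
    next_day = day + timedelta(days=1)
    return [
        f"CREATE TABLE myschema.bigtable_{day:%Y%m%d}_r{i}(\n"
        f"    CHECK (category_id >= {lo} AND category_id < {hi} "
        f"AND capture_time >= DATE '{day.isoformat()}' "
        f"AND capture_time < DATE '{next_day.isoformat()}')\n"
        f") INHERITS (myschema.bigtable);"
        for i, (lo, hi) in enumerate(range_bounds(max_id, parts))
    ]


if __name__ == "__main__":
    for stmt in range_partition_ddl(date(2012, 1, 1), max_id=1000):
        print(stmt)
```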

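You can also create two-level partitioning:

  • on the first level, create 10 partitions based on a hash of category_id;
  • on the second level, create as many partitions as you need based on date ranges.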
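A sketch of that two-level layout with table inheritance: each first-level table carries the category constraint, and each per-day child inherits from it. A parent's CHECK constraints are inherited by its children, so a day table ends up with both conditions. The first level here uses the range form of the category constraint so that plain `category_id = ...` queries can prune it; to keep the original modulo scheme, the first-level CHECK would be `category_id%10=<i>` instead. The `_c<i>` and `_<yyyymmdd>` suffixes and the `two_level_ddl` helper are hypothetical.

```python
from datetime import date, timedelta


def two_level_ddl(days: list, max_id: int, parts: int = 10) -> list:
    """DDL for a two-level inheritance tree:
    bigtable -> one table per category_id range -> one table per day."""
    edges = [i * max_id // parts for i in range(parts + 1)]
    stmts = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        parent = f"myschema.bigtable_c{i}"
        # Level 1: category constraint only.
        stmts.append(
            f"CREATE TABLE {parent}(CHECK (category_id >= {lo} AND category_id < {hi})) "
            f"INHERITS (myschema.bigtable);"
        )
        for d in days:
            # Level 2: date range only; the parent's CHECK is inherited as well.
            stmts.append(
                f"CREATE TABLE {parent}_{d:%Y%m%d}("
                f"CHECK (capture_time >= DATE '{d.isoformat()}' "
                f"AND capture_time < DATE '{(d + timedelta(days=1)).isoformat()}')) "
                f"INHERITS ({parent});"
            )
    return stmts


if __name__ == "__main__":
    for stmt in two_level_ddl([date(2012, 1, 1), date(2012, 1, 2)], max_id=1000):
        print(stmt)
```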

That said, I always try to partition on only one column, because it is easier to manage.


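Thanks for your input. The code I posted does two-level partitioning with a single level of inheritance. Performance-wise, it runs faster than real two-level inheritance. I know the other way (the one you suggest) should be faster, because fewer tables are checked at the first level, and at the next level only the tables inheriting from the qualifying first-level tables need to be scanned. But in practice it is slower. – Dojo 2012-04-08 06:24:00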


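What slows down the partitioned setup is the partition-pruning logic, not the actual table scans. In both cases the optimizer prunes the tables correctly, but with two-level inheritance it takes longer to decide which partitions to prune. – Dojo 2012-04-08 06:27:40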


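One interesting thing about two-level partitioning is that you can query a first-level table directly, and pruning its second-level partitions then takes less time. I can use this to archive data in a way that does not hurt performance. – Dojo 2012-04-08 06:31:43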