Slow query in Postgres: bad query plan
SELECT
    class_service.name AS "classServiceName",
    market.name AS "marketName",
    market_pricing.day_x AS "dayX",
    station_1.iata AS "odDestination",
    coalesce(market_pricing.availability, -1) AS "marketAvailability",
    station_2.iata AS "odOrigin"
FROM market_pricing
JOIN class_service ON class_service.id = market_pricing.class_service_id
JOIN market ON market.id = market_pricing.market_id
JOIN od ON market.id = od.market_id
JOIN train_stop AS train_stop_1 ON train_stop_1.id = od.stop_destination_id
JOIN station AS station_1 ON train_stop_1.station_id = station_1.id
JOIN train_stop AS train_stop_2 ON train_stop_2.id = od.stop_origin_id
JOIN station AS station_2 ON train_stop_2.station_id = station_2.id
JOIN train ON train.id = market.train_id
WHERE train.departure_date IN ('2016-01-16')
  AND train.train_number IN (2967)
Basically I am just joining a bunch of tables with a filter condition on one of them. The query returns only a small number of rows (~2000) because the condition is highly selective.
When I run EXPLAIN on the query in Postgres, I get this plan:
Hash Join (cost=29575.77..905867.89 rows=849 width=32)
Hash Cond: (market.train_id = train.id)
-> Hash Join (cost=29567.45..810779.82 rows=25352335 width=36)
Hash Cond: (market_pricing.market_id = market.id)
-> Hash Join (cost=1.99..232335.84 rows=6578983 width=14)
Hash Cond: (market_pricing.class_service_id = class_service.id)
-> Seq Scan on market_pricing (cost=0.00..141872.83 rows=6578983 width=16)
-> Hash (cost=1.44..1.44 rows=44 width=6)
-> Seq Scan on class_service (cost=0.00..1.44 rows=44 width=6)
-> Hash (cost=27373.77..27373.77 rows=107895 width=34)
-> Hash Join (cost=12462.88..27373.77 rows=107895 width=34)
Hash Cond: (train_stop_2.station_id = station_2.id)
-> Hash Join (cost=12459.97..25887.30 rows=107895 width=34)
Hash Cond: (train_stop_1.station_id = station_1.id)
-> Hash Join (cost=12457.06..24400.84 rows=107895 width=34)
Hash Cond: (od.market_id = market.id)
-> Hash Join (cost=11596.08..21228.71 rows=109529 width=12)
Hash Cond: (od.stop_origin_id = train_stop_2.id)
-> Hash Join (cost=5798.04..11642.00 rows=109529 width=12)
Hash Cond: (od.stop_destination_id = train_stop_1.id)
-> Seq Scan on od (cost=0.00..2055.29 rows=109529 width=12)
-> Hash (cost=3005.24..3005.24 rows=170224 width=8)
-> Seq Scan on train_stop train_stop_1 (cost=0.00..3005.24 rows=170224 width=8)
-> Hash (cost=3005.24..3005.24 rows=170224 width=8)
-> Seq Scan on train_stop train_stop_2 (cost=0.00..3005.24 rows=170224 width=8)
-> Hash (cost=510.99..510.99 rows=27999 width=22)
-> Seq Scan on market (cost=0.00..510.99 rows=27999 width=22)
-> Hash (cost=1.85..1.85 rows=85 width=8)
-> Seq Scan on station station_1 (cost=0.00..1.85 rows=85 width=8)
-> Hash (cost=1.85..1.85 rows=85 width=8)
-> Seq Scan on station station_2 (cost=0.00..1.85 rows=85 width=8)
-> Hash (cost=8.31..8.31 rows=1 width=4)
-> Index Scan using train_unique on train (cost=0.29..8.31 rows=1 width=4)
Index Cond: ((departure_date = '2016-01-16'::date) AND (train_number = 2967))
I am no expert on query planning, but my guess is that the expensive part is that Postgres hashes entire tables (~2 million rows) only to join them against a single row on the other side, when it should instead use a nested loop, which would be much faster in this case. The statistics used in the query plan look accurate. What is the reason behind this behavior?
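One way to test the nested-loop hypothesis (an illustrative experiment, not from the original post) is to disable hash joins for a single transaction and compare plans; `enable_hashjoin` is a standard Postgres planner setting:

```sql
-- Diagnostic experiment only: never leave this disabled in production.
-- SET LOCAL confines the change to the current transaction.
BEGIN;
SET LOCAL enable_hashjoin = off;  -- steer the planner toward nested loops / merge joins
EXPLAIN ANALYZE
SELECT ...;                       -- the full query from above
ROLLBACK;
```

If the nested-loop plan really is much faster, the problem lies in cost estimation or the join-order search, not in missing statistics.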
EDIT
EXPLAIN ANALYZE
Hash Join (cost=29575.77..905867.89 rows=849 width=32) (actual time=919.433..20674.305 rows=2028 loops=1)
Hash Cond: (market.train_id = train.id)
-> Hash Join (cost=29567.45..810779.82 rows=25352335 width=36) (actual time=861.335..17606.129 rows=24711872 loops=1)
Hash Cond: (market_pricing.market_id = market.id)
-> Hash Join (cost=1.99..232335.84 rows=6578983 width=14) (actual time=0.085..5699.519 rows=6845943 loops=1)
Hash Cond: (market_pricing.class_service_id = class_service.id)
-> Seq Scan on market_pricing (cost=0.00..141872.83 rows=6578983 width=16) (actual time=0.020..2463.255 rows=6845943 loops=1)
-> Hash (cost=1.44..1.44 rows=44 width=6) (actual time=0.045..0.045 rows=44 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 2kB
-> Seq Scan on class_service (cost=0.00..1.44 rows=44 width=6) (actual time=0.016..0.032 rows=44 loops=1)
-> Hash (cost=27373.77..27373.77 rows=107895 width=34) (actual time=861.166..861.166 rows=107132 loops=1)
Buckets: 8192 Batches: 2 Memory Usage: 3549kB
-> Hash Join (cost=12462.88..27373.77 rows=107895 width=34) (actual time=217.318..814.250 rows=107132 loops=1)
Hash Cond: (train_stop_2.station_id = station_2.id)
-> Hash Join (cost=12459.97..25887.30 rows=107895 width=34) (actual time=217.237..776.679 rows=107132 loops=1)
Hash Cond: (train_stop_1.station_id = station_1.id)
-> Hash Join (cost=12457.06..24400.84 rows=107895 width=34) (actual time=217.162..739.602 rows=107132 loops=1)
Hash Cond: (od.market_id = market.id)
-> Hash Join (cost=11596.08..21228.71 rows=109529 width=12) (actual time=188.590..578.450 rows=107132 loops=1)
Hash Cond: (od.stop_origin_id = train_stop_2.id)
-> Hash Join (cost=5798.04..11642.00 rows=109529 width=12) (actual time=106.059..312.845 rows=107132 loops=1)
Hash Cond: (od.stop_destination_id = train_stop_1.id)
-> Seq Scan on od (cost=0.00..2055.29 rows=109529 width=12) (actual time=0.006..41.699 rows=107132 loops=1)
-> Hash (cost=3005.24..3005.24 rows=170224 width=8) (actual time=105.850..105.850 rows=171096 loops=1)
Buckets: 16384 Batches: 2 Memory Usage: 3357kB
-> Seq Scan on train_stop train_stop_1 (cost=0.00..3005.24 rows=170224 width=8) (actual time=0.005..45.071 rows=171096 loops=1)
-> Hash (cost=3005.24..3005.24 rows=170224 width=8) (actual time=82.340..82.340 rows=171096 loops=1)
Buckets: 16384 Batches: 2 Memory Usage: 3357kB
-> Seq Scan on train_stop train_stop_2 (cost=0.00..3005.24 rows=170224 width=8) (actual time=0.007..37.142 rows=171096 loops=1)
-> Hash (cost=510.99..510.99 rows=27999 width=22) (actual time=28.538..28.538 rows=29839 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 1606kB
-> Seq Scan on market (cost=0.00..510.99 rows=27999 width=22) (actual time=0.004..16.594 rows=29839 loops=1)
-> Hash (cost=1.85..1.85 rows=85 width=8) (actual time=0.054..0.054 rows=85 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 4kB
-> Seq Scan on station station_1 (cost=0.00..1.85 rows=85 width=8) (actual time=0.003..0.026 rows=85 loops=1)
-> Hash (cost=1.85..1.85 rows=85 width=8) (actual time=0.063..0.063 rows=85 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 4kB
-> Seq Scan on station station_2 (cost=0.00..1.85 rows=85 width=8) (actual time=0.006..0.032 rows=85 loops=1)
-> Hash (cost=8.31..8.31 rows=1 width=4) (actual time=0.094..0.094 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Index Scan using train_unique on train (cost=0.29..8.31 rows=1 width=4) (actual time=0.087..0.090 rows=1 loops=1)
Index Cond: ((departure_date = '2016-01-16'::date) AND (train_number = 2967))
Planning time: 12.338 ms
Execution time: 20676.057 ms
EDIT 2
I noticed that changing the join order fixes it, but I don't understand why. I thought Postgres reorders joins internally to pick the best order.
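A likely explanation (my assumption, not confirmed in the post): this query joins 9 relations, which exceeds the default `join_collapse_limit` of 8. Beyond that limit, Postgres stops flattening explicit JOINs into a single search space and partly follows the join order as written, which is why reordering the joins by hand changes the plan. Raising the limit lets the planner consider all orders:

```sql
-- join_collapse_limit defaults to 8; this query joins 9 relations,
-- so by default the planner does not search every join order.
SHOW join_collapse_limit;

-- Session-local change: allow the planner to reorder all 9 relations,
-- then re-run EXPLAIN ANALYZE on the query to compare plans.
SET join_collapse_limit = 16;
```

The trade-off is planning time: the join-order search grows rapidly with the number of relations, which is exactly why the limit exists.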
You could run EXPLAIN ANALYZE. That adds actual timings, not just the plan estimates. – Thilo
I have added the EXPLAIN ANALYZE output. –
Do you have all the indexes that a nested loop join would need? – Thilo