Slow query in Postgres: bad query plan
SELECT
    class_service.name AS "classServiceName",
    market.name AS "marketName",
    market_pricing.day_x AS "dayX",
    station_1.iata AS "odDestination",
    coalesce(market_pricing.availability, -1) AS "marketAvailability",
    station_2.iata AS "odOrigin"
FROM market_pricing
JOIN class_service ON class_service.id = market_pricing.class_service_id
JOIN market ON market.id = market_pricing.market_id
JOIN od ON market.id = od.market_id
JOIN train_stop AS train_stop_1 ON train_stop_1.id = od.stop_destination_id
JOIN station AS station_1 ON train_stop_1.station_id = station_1.id
JOIN train_stop AS train_stop_2 ON train_stop_2.id = od.stop_origin_id
JOIN station AS station_2 ON train_stop_2.station_id = station_2.id
JOIN train ON train.id = market.train_id
WHERE train.departure_date IN ('2016-01-16')
  AND train.train_number IN (2967)
Basically I am just joining a bunch of tables with a filter condition on one of them. The query returns only a small number of rows (~2000) because the condition is highly selective.
When I run EXPLAIN on the query in Postgres, I get this plan:
Hash Join (cost=29575.77..905867.89 rows=849 width=32)
Hash Cond: (market.train_id = train.id)
-> Hash Join (cost=29567.45..810779.82 rows=25352335 width=36)
Hash Cond: (market_pricing.market_id = market.id)
-> Hash Join (cost=1.99..232335.84 rows=6578983 width=14)
Hash Cond: (market_pricing.class_service_id = class_service.id)
-> Seq Scan on market_pricing (cost=0.00..141872.83 rows=6578983 width=16)
-> Hash (cost=1.44..1.44 rows=44 width=6)
-> Seq Scan on class_service (cost=0.00..1.44 rows=44 width=6)
-> Hash (cost=27373.77..27373.77 rows=107895 width=34)
-> Hash Join (cost=12462.88..27373.77 rows=107895 width=34)
Hash Cond: (train_stop_2.station_id = station_2.id)
-> Hash Join (cost=12459.97..25887.30 rows=107895 width=34)
Hash Cond: (train_stop_1.station_id = station_1.id)
-> Hash Join (cost=12457.06..24400.84 rows=107895 width=34)
Hash Cond: (od.market_id = market.id)
-> Hash Join (cost=11596.08..21228.71 rows=109529 width=12)
Hash Cond: (od.stop_origin_id = train_stop_2.id)
-> Hash Join (cost=5798.04..11642.00 rows=109529 width=12)
Hash Cond: (od.stop_destination_id = train_stop_1.id)
-> Seq Scan on od (cost=0.00..2055.29 rows=109529 width=12)
-> Hash (cost=3005.24..3005.24 rows=170224 width=8)
-> Seq Scan on train_stop train_stop_1 (cost=0.00..3005.24 rows=170224 width=8)
-> Hash (cost=3005.24..3005.24 rows=170224 width=8)
-> Seq Scan on train_stop train_stop_2 (cost=0.00..3005.24 rows=170224 width=8)
-> Hash (cost=510.99..510.99 rows=27999 width=22)
-> Seq Scan on market (cost=0.00..510.99 rows=27999 width=22)
-> Hash (cost=1.85..1.85 rows=85 width=8)
-> Seq Scan on station station_1 (cost=0.00..1.85 rows=85 width=8)
-> Hash (cost=1.85..1.85 rows=85 width=8)
-> Seq Scan on station station_2 (cost=0.00..1.85 rows=85 width=8)
-> Hash (cost=8.31..8.31 rows=1 width=4)
-> Index Scan using train_unique on train (cost=0.29..8.31 rows=1 width=4)
Index Cond: ((departure_date = '2016-01-16'::date) AND (train_number = 2967))
I am no expert on query planning, but my guess is that the expensive part is that Postgres hashes entire tables (~2 million rows) only to join them against a single row on the other side, when it should instead use a nested loop, which would be much faster in this case. The statistics used in the query plan look accurate. What is the reason behind this behavior?
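One way to test the nested-loop hypothesis (an illustrative experiment, not from the original post) is to disable hash joins for a single transaction and compare plans; `enable_hashjoin` is a standard Postgres planner setting:

```sql
-- Diagnostic experiment only: never leave this disabled in production.
-- SET LOCAL confines the change to the current transaction.
BEGIN;
SET LOCAL enable_hashjoin = off;  -- steer the planner toward nested loops / merge joins
EXPLAIN ANALYZE
SELECT ...;                       -- the full query from above
ROLLBACK;
```

If the nested-loop plan really is much faster, the problem lies in cost estimation or the join-order search, not in missing statistics.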
EDIT
EXPLAIN ANALYZE
Hash Join (cost=29575.77..905867.89 rows=849 width=32) (actual time=919.433..20674.305 rows=2028 loops=1)
Hash Cond: (market.train_id = train.id)
-> Hash Join (cost=29567.45..810779.82 rows=25352335 width=36) (actual time=861.335..17606.129 rows=24711872 loops=1)
Hash Cond: (market_pricing.market_id = market.id)
-> Hash Join (cost=1.99..232335.84 rows=6578983 width=14) (actual time=0.085..5699.519 rows=6845943 loops=1)
Hash Cond: (market_pricing.class_service_id = class_service.id)
-> Seq Scan on market_pricing (cost=0.00..141872.83 rows=6578983 width=16) (actual time=0.020..2463.255 rows=6845943 loops=1)
-> Hash (cost=1.44..1.44 rows=44 width=6) (actual time=0.045..0.045 rows=44 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 2kB
-> Seq Scan on class_service (cost=0.00..1.44 rows=44 width=6) (actual time=0.016..0.032 rows=44 loops=1)
-> Hash (cost=27373.77..27373.77 rows=107895 width=34) (actual time=861.166..861.166 rows=107132 loops=1)
Buckets: 8192 Batches: 2 Memory Usage: 3549kB
-> Hash Join (cost=12462.88..27373.77 rows=107895 width=34) (actual time=217.318..814.250 rows=107132 loops=1)
Hash Cond: (train_stop_2.station_id = station_2.id)
-> Hash Join (cost=12459.97..25887.30 rows=107895 width=34) (actual time=217.237..776.679 rows=107132 loops=1)
Hash Cond: (train_stop_1.station_id = station_1.id)
-> Hash Join (cost=12457.06..24400.84 rows=107895 width=34) (actual time=217.162..739.602 rows=107132 loops=1)
Hash Cond: (od.market_id = market.id)
-> Hash Join (cost=11596.08..21228.71 rows=109529 width=12) (actual time=188.590..578.450 rows=107132 loops=1)
Hash Cond: (od.stop_origin_id = train_stop_2.id)
-> Hash Join (cost=5798.04..11642.00 rows=109529 width=12) (actual time=106.059..312.845 rows=107132 loops=1)
Hash Cond: (od.stop_destination_id = train_stop_1.id)
-> Seq Scan on od (cost=0.00..2055.29 rows=109529 width=12) (actual time=0.006..41.699 rows=107132 loops=1)
-> Hash (cost=3005.24..3005.24 rows=170224 width=8) (actual time=105.850..105.850 rows=171096 loops=1)
Buckets: 16384 Batches: 2 Memory Usage: 3357kB
-> Seq Scan on train_stop train_stop_1 (cost=0.00..3005.24 rows=170224 width=8) (actual time=0.005..45.071 rows=171096 loops=1)
-> Hash (cost=3005.24..3005.24 rows=170224 width=8) (actual time=82.340..82.340 rows=171096 loops=1)
Buckets: 16384 Batches: 2 Memory Usage: 3357kB
-> Seq Scan on train_stop train_stop_2 (cost=0.00..3005.24 rows=170224 width=8) (actual time=0.007..37.142 rows=171096 loops=1)
-> Hash (cost=510.99..510.99 rows=27999 width=22) (actual time=28.538..28.538 rows=29839 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 1606kB
-> Seq Scan on market (cost=0.00..510.99 rows=27999 width=22) (actual time=0.004..16.594 rows=29839 loops=1)
-> Hash (cost=1.85..1.85 rows=85 width=8) (actual time=0.054..0.054 rows=85 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 4kB
-> Seq Scan on station station_1 (cost=0.00..1.85 rows=85 width=8) (actual time=0.003..0.026 rows=85 loops=1)
-> Hash (cost=1.85..1.85 rows=85 width=8) (actual time=0.063..0.063 rows=85 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 4kB
-> Seq Scan on station station_2 (cost=0.00..1.85 rows=85 width=8) (actual time=0.006..0.032 rows=85 loops=1)
-> Hash (cost=8.31..8.31 rows=1 width=4) (actual time=0.094..0.094 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Index Scan using train_unique on train (cost=0.29..8.31 rows=1 width=4) (actual time=0.087..0.090 rows=1 loops=1)
Index Cond: ((departure_date = '2016-01-16'::date) AND (train_number = 2967))
Planning time: 12.338 ms
Execution time: 20676.057 ms
EDIT 2
I noticed that changing the join order fixes it, but I don't understand why. I thought Postgres reorders joins internally to pick the best order.
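A likely explanation (my assumption, not confirmed in the post): this query joins 9 relations, which exceeds the default `join_collapse_limit` of 8. Beyond that limit, Postgres stops flattening explicit JOINs into a single search space and partly follows the join order as written, which is why reordering the joins by hand changes the plan. Raising the limit lets the planner consider all orders:

```sql
-- join_collapse_limit defaults to 8; this query joins 9 relations,
-- so by default the planner does not search every join order.
SHOW join_collapse_limit;

-- Session-local change: allow the planner to reorder all 9 relations,
-- then re-run EXPLAIN ANALYZE on the query to compare plans.
SET join_collapse_limit = 16;
```

The trade-off is planning time: the join-order search grows rapidly with the number of relations, which is exactly why the limit exists.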
You could run EXPLAIN ANALYZE. That adds actual timings, not just the plan estimates. – Thilo
I have added the EXPLAIN ANALYZE output. –
Do you have all the indexes that a nested loop join would need? – Thilo