2016-12-06

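I have a batch process that finds the near_link for each avl position. The avl positions are random, but roughly normally distributed around cities. The problem is that the first batches take a long time, while later batches are much faster. How can I warm up / prepare the table and index statistics?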

The map does not change, so my guess is that some statistics get created, because the same map is searched for x, y over and over.

So how can I help create those statistics before the batch starts? Or how can I check what is happening under the hood?

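The thing is, I get these results when the batch runs on its own, and I'm worried that on the production server the statistics won't be as good, because the map receives all kinds of other requests there.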

-- Executing query: 
SELECT * FROM avl_db.process_near_link(); 

NOTICE: Duration in seconds= 163.4609 , Rows= 400 
NOTICE: Duration in seconds= 68.73396 , Rows= 400 
NOTICE: Duration in seconds= 36.93196 , Rows= 400 
NOTICE: Duration in seconds= 17.58829 , Rows= 400 
NOTICE: Duration in seconds= 12.94885 , Rows= 400 
NOTICE: Duration in seconds= 9.509757 , Rows= 400 

Total query runtime: 05:09 minutes -- 2400 rows 
1 row retrieved. 

-- Executing query: 
SELECT * FROM avl_db.process_near_link(); 

NOTICE: Duration in seconds= 8.03767 , Rows= 400 
NOTICE: Duration in seconds= 8.51031 , Rows= 400 
NOTICE: Duration in seconds= 5.45953 , Rows= 400 
NOTICE: Duration in seconds= 4.08547 , Rows= 400 
NOTICE: Duration in seconds= 4.19483 , Rows= 400 
NOTICE: Duration in seconds= 3.85986 , Rows= 400 

Total query runtime: 34.1 secs -- 2400 rows 
1 row retrieved. 

-- Executing query: 
SELECT * FROM avl_db.process_near_link(); 

NOTICE: Duration in seconds= 3.66540 , Rows= 400 
NOTICE: Duration in seconds= 3.55134 , Rows= 400 
NOTICE: Duration in seconds= 3.17400 , Rows= 400 
NOTICE: Duration in seconds= 3.06982 , Rows= 400 
NOTICE: Duration in seconds= 2.96954 , Rows= 400 
NOTICE: Duration in seconds= 3.05310 , Rows= 400 
NOTICE: Duration in seconds= 2.88948 , Rows= 400 
NOTICE: Duration in seconds= 2.77269 , Rows= 400 
NOTICE: Duration in seconds= 2.88940 , Rows= 400 
NOTICE: Duration in seconds= 2.94150 , Rows= 400 
NOTICE: Duration in seconds= 2.84522 , Rows= 400 
NOTICE: Duration in seconds= 2.86770 , Rows= 400 
NOTICE: Duration in seconds= 2.74608 , Rows= 400 

Total query runtime: 39.4 secs -- 5200 
1 row retrieved. 

This is the batch query:

UPDATE avl_db.avl_pool a
SET near_link = map.get_near_link(sq.x, sq.y, sq.azimuth),
    has_link = true
FROM (
    -- next 400 positions that have no link yet
    SELECT avl_id, x, y, azimuth
    FROM avl_db.avl_pool
    WHERE NOT has_link
    ORDER BY avl_id
    LIMIT 400
    ) sq
WHERE a.avl_id = sq.avl_id;

Explain Plan

"Update on avl_pool a (cost=0.84..3395.28 rows=400 width=151) (actual time=2779.889..2779.889 rows=0 loops=1)" 
" -> Nested Loop (cost=0.84..3395.28 rows=400 width=151) (actual time=11.253..2738.711 rows=400 loops=1)" 
"  -> Subquery Scan on sq (cost=0.42..34.28 rows=400 width=80) (actual time=6.882..8.496 rows=400 loops=1)" 
"    -> Limit (cost=0.42..30.28 rows=400 width=28) (actual time=6.871..7.964 rows=400 loops=1)" 
"     -> Index Scan using avl_pool_pkey on avl_pool (cost=0.42..29185.30 rows=391017 width=28) (actual time=6.869..7.873 rows=400 loops=1)" 
"       Filter: (NOT has_link)" 
"       Rows Removed by Filter: 10800" 
"  -> Index Scan using avl_pool_pkey on avl_pool a (cost=0.42..8.14 rows=1 width=79) (actual time=0.003..0.029 rows=1 loops=400)" 
"    Index Cond: (avl_id = sq.avl_id)" 
"Planning time: 0.372 ms" 
"Execution time: 2779.970 ms" 

Answer

I would say that you are experiencing the effects of caching.

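During the first run the data has to be fetched from disk; later runs benefit from data that is already cached (mostly blocks of the avl_pool_pkey index, plus the table blocks accessed during previous updates).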
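
If cold caches are indeed the cause, one way to warm them before the batch starts is the pg_prewarm contrib extension (PostgreSQL 9.4 or later). The following is a minimal sketch, assuming the extension is available on the server. avl_db.avl_pool and avl_pool_pkey come from the question; the relations that map.get_near_link reads are not shown there, so map.links and map.links_geom_idx are hypothetical placeholders.

```sql
-- Requires the pg_prewarm contrib module to be installed on the server.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load the table and its primary-key index into shared buffers.
-- Each call returns the number of blocks it prewarmed.
SELECT pg_prewarm('avl_db.avl_pool');
SELECT pg_prewarm('avl_db.avl_pool_pkey');

-- Hypothetical names: replace them with the relations that
-- map.get_near_link actually reads.
SELECT pg_prewarm('map.links');
SELECT pg_prewarm('map.links_geom_idx');
```

Prewarming probably only helps as long as the relations fit into shared_buffers, and on a busy production server other requests may evict the blocks again.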

You can verify this with EXPLAIN (ANALYZE, BUFFERS), which tells you how many blocks were read from disk and how many were found in the cache.
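
As a sketch of that check, applied to the batch query from the question: because EXPLAIN ANALYZE really executes the statement, the example wraps it in a transaction and rolls back so that the rows stay unprocessed.

```sql
BEGIN;

-- EXPLAIN ANALYZE runs the UPDATE for real; the ROLLBACK below
-- discards its changes after the plan has been printed.
EXPLAIN (ANALYZE, BUFFERS)
UPDATE avl_db.avl_pool a
SET near_link = map.get_near_link(sq.x, sq.y, sq.azimuth),
    has_link = true
FROM (
    SELECT avl_id, x, y, azimuth
    FROM avl_db.avl_pool
    WHERE NOT has_link
    ORDER BY avl_id
    LIMIT 400
    ) sq
WHERE a.avl_id = sq.avl_id;

ROLLBACK;
```

In the output, plan nodes carry a "Buffers:" line. "shared hit" counts blocks found in PostgreSQL's shared buffers, and "read" counts blocks that had to be requested from the operating system, which may still serve them from its own file cache. Comparing these numbers between a first and a later run should show whether caching explains the difference.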