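MySQL stored procedure - processing millions of records - how to speed up
I have a stored procedure (below) that I wrote to process roughly 25 million records. It takes a location, a distance (25 miles), and the number of records to assign (12). It finds every record inside the boundary defined by that 25-mile distance, then assigns up to 12 of those records to a user who does not have any records yet. A user can have only one record per category, so each of a user's records has a different category.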
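The assignment rule described above can be sketched in memory as follows. This is only an illustration under assumptions: the records in the area are given as hypothetical `(pin_id, category_id)` pairs, the users without records as a plain list of IDs, and when a category has several records the sketch simply takes the first one in input order. The function name `assign_pins` is hypothetical and is not part of the original procedure.

```python
from typing import Dict, Iterable, List, Tuple


def assign_pins(
    pins_in_area: Iterable[Tuple[int, int]],  # hypothetical (pin_id, category_id) pairs
    free_users: Iterable[int],                # hypothetical IDs of users with no records yet
    numrecords: int = 12,
) -> Dict[int, List[int]]:
    """Give each free user up to `numrecords` pins, at most one pin per category."""
    if numrecords < 1:
        raise ValueError("numrecords must be at least 1")
    remaining = list(pins_in_area)
    users = iter(free_users)
    assignments: Dict[int, List[int]] = {}
    while remaining:
        user_id = next(users, None)
        if user_id is None:  # ran out of users: leave the rest unassigned
            break
        seen_categories = set()
        batch: List[int] = []
        for pin_id, category_id in remaining:
            # Skip categories this user already has a pin for.
            if category_id not in seen_categories:
                seen_categories.add(category_id)
                batch.append(pin_id)
                if len(batch) == numrecords:
                    break
        assignments[user_id] = batch
        # Assigned pins leave the pool, like the DELETE from the work table.
        taken = set(batch)
        remaining = [p for p in remaining if p[0] not in taken]
    return assignments
```

Each pass of the outer `while` corresponds to one iteration of the procedure's `WHILE` loop (one user), and the inner `for` plays the role of the cursor's `REPEAT ... FETCH` loop. For example, `assign_pins([(1, 10), (2, 10), (3, 20), (4, 30)], [100, 101], numrecords=2)` gives user 100 pins 1 and 3 and user 101 pins 2 and 4.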
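The stored procedure works well. The only problem is that it takes a very long time. To speed things up, I created 8 processors in total, identical except for their work tables (POSTSINAREATBL[1-8]), and ran them in parallel. The scripts have been running for 4 days and have processed only 3.5 million of the 25 million records.
I'm hoping someone has some insight and can help me speed this up. I really need all the records processed within the next 1-2 days, and at the current rate it will take nearly a month!
Also, with the 8 scripts running, my CPU is at 99.8%, so I'm at maximum capacity.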
DELIMITER $$
CREATE PROCEDURE `get_pins_in_boundaries`(IN mylon double, IN mylat double, IN dist int, IN numrecords int)
BEGIN
  declare isDone INT;
  declare lat double;
  declare lng double;
  declare lon1 double;
  declare lon2 double;
  declare lat1 double;
  declare lat2 double;
  declare this_iter_pin_id int;
  declare use_this_user_id int;
  DECLARE num_results_in_area int;
  -- one pin per category, at most numrecords of them
  -- (MIN() makes the chosen pin deterministic and valid under ONLY_FULL_GROUP_BY)
  DECLARE cur_posts_to_assign_to_user CURSOR FOR
    select MIN(pin_id) from POSTSINAREATBL group by category_id limit numrecords;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET isDone = 1;

  IF mylon = 0.000000 OR mylat = 0.000000 THEN
    SELECT CONCAT('complete') AS results;
  ELSE
    SET lat = mylat;
    SET lng = mylon;

    -- calculate lon and lat for the rectangle
    -- (dist is in miles, matching the 3959-mile Earth radius below;
    --  one degree of latitude is about 69 miles)
    set lon1 = lng - dist/ABS(COS(RADIANS(lat)) * 69);
    set lon2 = lng + dist/ABS(COS(RADIANS(lat)) * 69);
    set lat1 = lat - dist/69;
    set lat2 = lat + dist/69;

    -- create the work table if needed and store records matching the criteria
    CREATE TABLE IF NOT EXISTS POSTSINAREATBL(
      pin_id BIGINT NOT NULL,
      category_id BIGINT NOT NULL,
      distance DECIMAL(6,1)
    );
    INSERT INTO POSTSINAREATBL (pin_id, category_id, distance)
      SELECT pin_id, category_id,
             (3959 * acos(cos(radians(lat)) * cos(radians(latitude)) * cos(radians(longitude) - radians(lng)) + sin(radians(lat)) * sin(radians(latitude)))) as distance
      FROM skoovy_prd.pins
      WHERE longitude between lon1 and lon2
        and latitude between lat1 and lat2
        and user_id = 0;

    select count(*) INTO num_results_in_area from POSTSINAREATBL;
    WHILE num_results_in_area > 0 DO
      -- next user who has not been processed yet (NOT EXISTS, unlike NOT IN,
      -- keeps working if posts_users_processed ever contains a NULL user_id)
      SET use_this_user_id = (SELECT u.user_id FROM skoovy_prd.users u
                              WHERE NOT EXISTS (SELECT 1 FROM skoovy_prd.posts_users_processed p
                                                WHERE p.user_id = u.user_id)
                              LIMIT 1);
      INSERT INTO skoovy_prd.posts_users_processed (user_id) VALUES (use_this_user_id);
      SET isDone = 0;
      OPEN cur_posts_to_assign_to_user;
      REPEAT
        FETCH cur_posts_to_assign_to_user INTO this_iter_pin_id;
        -- on NOT FOUND this_iter_pin_id still holds the previous pin,
        -- so only update and count when a row was actually fetched
        IF NOT isDone THEN
          UPDATE skoovy_prd.pins SET pins.user_id = use_this_user_id WHERE pins.pin_id = this_iter_pin_id;
          DELETE FROM POSTSINAREATBL WHERE pin_id = this_iter_pin_id;
          SET num_results_in_area = num_results_in_area - 1;
        END IF;
      UNTIL isDone END REPEAT;
      CLOSE cur_posts_to_assign_to_user;
    END WHILE;
    TRUNCATE TABLE POSTSINAREATBL;
    SELECT CONCAT('complete') AS results;
  END IF;
END$$
DELIMITER ;
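The rectangle and distance math in the procedure can be checked outside MySQL with a small Python sketch. It assumes `dist` is in miles, as the 25-mile example and the 3959-mile Earth radius in the distance formula imply, so the rectangle uses roughly 69 miles per degree of latitude. The function names `bounding_box` and `distance_miles` are hypothetical; `distance_miles` mirrors the spherical-law-of-cosines expression in the `INSERT ... SELECT`, with a clamp added so rounding cannot push `acos` out of its domain (the SQL version has no such clamp).

```python
import math
from typing import Tuple

EARTH_RADIUS_MILES = 3959.0  # same constant as the procedure's distance formula
MILES_PER_DEGREE_LAT = 69.0  # approximate length of one degree of latitude


def bounding_box(lon: float, lat: float, dist: float) -> Tuple[float, float, float, float]:
    """Return (lon1, lon2, lat1, lat2) of a rectangle enclosing a `dist`-mile radius."""
    # Degrees of longitude shrink with cos(latitude), so widen the box accordingly.
    dlon = dist / abs(math.cos(math.radians(lat)) * MILES_PER_DEGREE_LAT)
    dlat = dist / MILES_PER_DEGREE_LAT
    return lon - dlon, lon + dlon, lat - dlat, lat + dlat


def distance_miles(lat: float, lng: float, latitude: float, longitude: float) -> float:
    """Great-circle distance in miles via the spherical law of cosines."""
    cos_angle = (
        math.cos(math.radians(lat)) * math.cos(math.radians(latitude))
        * math.cos(math.radians(longitude) - math.radians(lng))
        + math.sin(math.radians(lat)) * math.sin(math.radians(latitude))
    )
    # Clamp to [-1, 1]: floating-point error can otherwise make acos fail.
    return EARTH_RADIUS_MILES * math.acos(max(-1.0, min(1.0, cos_angle)))
```

For `bounding_box(-87.6, 41.9, 25)`, the top edge `lat2` lies about 25 miles from the centre according to `distance_miles`, so the rectangle and the distance formula agree. Using the kilometre-based constant of about 111 km per degree while `dist` is still in miles would give a half-height of 25/111.04 ≈ 0.225 degrees of latitude, roughly 15.5 miles instead of 25.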
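Thanks. I have run some tests, and they don't show a performance "problem" in the query that selects data within the geolocation bounds. – kambythet 2014-09-27 00:12:25
If you want to find the bottleneck in your stored procedure, run EXPLAIN on each query in the procedure. Once you've found the problem, just optimize it with indexes. http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html – xardas 2014-09-28 03:39:46
Done. It's not a matter of missing indexes, indexes not being used properly, and so on. It's that there are millions of records, and I'm trying to find out whether there is a more efficient way to do what the current proc does, and do it quickly. – kambythet 2014-09-29 15:54:49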