2013-03-20

I have two tables. How can I insert or update data from one table into the other (PostgreSQL)?

tmp_stat: 
date, site_id, ip, block_id, count 
Primary Key (date, site_id, ip, block_id) 

main_stat: 
date, site_id, ip, block_id, count 
Primary Key (date, site_id, ip, block_id) 

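I need to insert rows from tmp_stat into main_stat when no row with the same key (date, site_id, etc.) exists there yet, and update the count when it already exists, as fast as possible.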

tmp_stat contains about 500,000 rows; main_stat contains millions.

See here: http://stackoverflow.com/q/1109061/330315 – 2013-03-20 07:39:45

And see the documentation: http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_9016.htm#i2081218 – 2013-03-20 08:17:37

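@a_horse_with_no_name I'd like to see an example of inserting/updating from one table into another. I don't want to loop over 500K rows; I'm hoping there is a faster way to do this. – varan 2013-03-20 14:47:46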

Answers

I'm building on gsimes's answer, based on how I understand the question.

with agg_tmp_stat as (
    select date, site_id, ip, block_id, sum(count)::integer as count
    from tmp_stat
    group by 1, 2, 3, 4
), upd as (
    update main_stat t
    set count = t.count + s.count
    from agg_tmp_stat s
    where
     (t.date, t.site_id, t.ip, t.block_id)
     = (s.date, s.site_id, s.ip, s.block_id)
    returning s.date, s.site_id, s.ip, s.block_id
)
insert into main_stat (date, site_id, ip, block_id, count)
select s.date, s.site_id, s.ip, s.block_id, s.count
from
    agg_tmp_stat s
    left join
    upd on
     upd.date = s.date
     and upd.site_id = s.site_id
     and upd.ip = s.ip
     and upd.block_id = s.block_id
where upd.date is null;

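Basically, it aggregates the temporary table, adds the resulting counts to the ones already in main_stat, and inserts the keys that are not there yet.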
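Since PostgreSQL 9.5 (released after this thread), the same aggregate-then-add logic can be written as a single INSERT ... ON CONFLICT DO UPDATE. This is a minimal sketch, assuming the column names from the question; the column types and the sample rows are made up, and temporary tables stand in for the real ones:

```sql
-- Stand-in tables: column names follow the question, but the types are
-- assumptions and the rows are made-up sample data.
CREATE TEMP TABLE main_stat (
    date     date,
    site_id  integer,
    ip       text,
    block_id integer,
    count    integer NOT NULL,
    PRIMARY KEY (date, site_id, ip, block_id)
);
CREATE TEMP TABLE tmp_stat (LIKE main_stat INCLUDING ALL);

INSERT INTO main_stat VALUES ('2013-03-20', 1, '10.0.0.1', 7, 10);
INSERT INTO tmp_stat VALUES
    ('2013-03-20', 1, '10.0.0.1', 7, 5),  -- key already in main_stat
    ('2013-03-20', 1, '10.0.0.2', 7, 3);  -- new key

-- Aggregate tmp_stat, insert keys not yet in main_stat, and add the
-- incoming count to keys that already exist (PostgreSQL 9.5 or later).
INSERT INTO main_stat AS t (date, site_id, ip, block_id, count)
SELECT date, site_id, ip, block_id, sum(count)::integer
FROM tmp_stat
GROUP BY date, site_id, ip, block_id
ON CONFLICT (date, site_id, ip, block_id)
DO UPDATE SET count = t.count + EXCLUDED.count;
```

The GROUP BY matters: ON CONFLICT DO UPDATE raises an error if one statement tries to update the same target row twice, so duplicate keys in the source have to be collapsed first. With the primary key on tmp_stat it is only a safeguard.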

Would the following work?

WITH upd AS (
    UPDATE main_stat t
     SET count = s.count
     FROM tmp_stat s
    WHERE t.date = s.date
      AND t.site_id = s.site_id
      AND t.ip = s.ip
      AND t.block_id = s.block_id
RETURNING s.date, s.site_id, s.ip, s.block_id, s.count
)
INSERT INTO main_stat
    SELECT s.date, s.site_id, s.ip, s.block_id, s.count
     FROM tmp_stat s
     LEFT JOIN upd ON (upd.date = s.date AND upd.site_id = s.site_id AND upd.ip = s.ip AND upd.block_id = s.block_id)
     WHERE upd.date IS NULL
;

Update:

It looks like this only works on PostgreSQL 9.1 or newer.

Using the row comparison suggested in the comments, WHERE (t.date, t.site_id, t.ip, t.block_id) = (s.date, s.site_id, s.ip, s.block_id), seems to give better performance.

WITH upd AS (
    UPDATE main_stat t
     SET count = s.count
     FROM tmp_stat s
    WHERE (t.date, t.site_id, t.ip, t.block_id) = (s.date, s.site_id, s.ip, s.block_id)
RETURNING s.date, s.site_id, s.ip, s.block_id
)
INSERT INTO main_stat
    SELECT s.date, s.site_id, s.ip, s.block_id, s.count
     FROM tmp_stat s
     LEFT JOIN upd
      ON (upd.date = s.date
       AND upd.site_id = s.site_id
       AND upd.ip = s.ip
       AND upd.block_id = s.block_id)
     WHERE upd.date IS NULL
;

What happens here is that the UPDATE runs inside a CTE, and the CTE returns the key columns of the rows it updated.

The INSERT then uses those updated keys to filter tmp_stat, so that only new records are inserted.

Dimitri Fontaine covers some concurrency caveats in this blog entry.
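One common way to sidestep those concurrency issues (not necessarily the one the blog entry recommends) is to lock out other writers on main_stat while the statement runs. SHARE ROW EXCLUSIVE conflicts with the ROW EXCLUSIVE lock that INSERT/UPDATE/DELETE take and with itself, while plain SELECTs keep working. A sketch with temporary stand-in tables; the column types and rows are made up:

```sql
-- Stand-in tables: column names follow the question, but the types are
-- assumptions and the rows are made-up sample data.
CREATE TEMP TABLE main_stat (
    date     date,
    site_id  integer,
    ip       text,
    block_id integer,
    count    integer NOT NULL,
    PRIMARY KEY (date, site_id, ip, block_id)
);
CREATE TEMP TABLE tmp_stat (LIKE main_stat INCLUDING ALL);
INSERT INTO main_stat VALUES ('2013-03-20', 1, '10.0.0.1', 7, 10);
INSERT INTO tmp_stat VALUES ('2013-03-20', 1, '10.0.0.1', 7, 5),
                            ('2013-03-20', 1, '10.0.0.2', 7, 3);

BEGIN;
-- Blocks concurrent INSERT/UPDATE/DELETE on main_stat (and other sessions
-- taking this same lock) until COMMIT; readers are not blocked.
LOCK TABLE main_stat IN SHARE ROW EXCLUSIVE MODE;

WITH upd AS (
    UPDATE main_stat t
       SET count = s.count
      FROM tmp_stat s
     WHERE (t.date, t.site_id, t.ip, t.block_id)
         = (s.date, s.site_id, s.ip, s.block_id)
    RETURNING s.date, s.site_id, s.ip, s.block_id
)
INSERT INTO main_stat (date, site_id, ip, block_id, count)
SELECT s.date, s.site_id, s.ip, s.block_id, s.count
  FROM tmp_stat s
  LEFT JOIN upd USING (date, site_id, ip, block_id)
 WHERE upd.date IS NULL;
COMMIT;
```

The trade-off is that concurrent loads into main_stat are serialized for the duration of the transaction.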

More information about CTEs can be found in the PostgreSQL documentation.

+1; I'd suggest 'WHERE (t.date, t.site_id, t.ip, t.block_id) = (s.date, s.site_id, s.ip, s.block_id)' and 'LEFT JOIN upd USING (date, site_id, ip, block_id)'. Or possibly 'SELECT ... FROM tmp_stat s EXCEPT SELECT ... FROM upd' – 2013-03-20 22:53:16
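The EXCEPT variant from the comment, written out as a sketch. Temporary tables stand in for the real ones; the column types and sample rows are assumptions:

```sql
-- Stand-in tables: column names follow the question, but the types are
-- assumptions and the rows are made-up sample data.
CREATE TEMP TABLE main_stat (
    date     date,
    site_id  integer,
    ip       text,
    block_id integer,
    count    integer NOT NULL,
    PRIMARY KEY (date, site_id, ip, block_id)
);
CREATE TEMP TABLE tmp_stat (LIKE main_stat INCLUDING ALL);
INSERT INTO main_stat VALUES ('2013-03-20', 1, '10.0.0.1', 7, 10);
INSERT INTO tmp_stat VALUES ('2013-03-20', 1, '10.0.0.1', 7, 5),
                            ('2013-03-20', 1, '10.0.0.2', 7, 3);

-- Overwrite count for keys that exist, then insert every tmp_stat row
-- the UPDATE did not touch (PostgreSQL 9.1 or later).
WITH upd AS (
    UPDATE main_stat t
       SET count = s.count
      FROM tmp_stat s
     WHERE (t.date, t.site_id, t.ip, t.block_id)
         = (s.date, s.site_id, s.ip, s.block_id)
    RETURNING s.date, s.site_id, s.ip, s.block_id, s.count
)
INSERT INTO main_stat (date, site_id, ip, block_id, count)
SELECT date, site_id, ip, block_id, count FROM tmp_stat
EXCEPT
SELECT date, site_id, ip, block_id, count FROM upd;
```

Because EXCEPT compares whole rows, upd has to return every column the first SELECT produces, including count; returning only the key columns would make the two sides incompatible.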

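This seems like a simple EXISTS query. If the key columns are indexed (the primary key on (date, site_id, ip, block_id) already provides such an index), it should be fast enough.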

Example:

-- insert missing rows with a count of 0; the UPDATE below then adds
-- the counts from tmp_stat to new and existing rows alike
INSERT INTO main_stat (date, site_id, ip, block_id, count)
SELECT date, site_id, ip, block_id, 0 FROM tmp_stat tmp
WHERE NOT EXISTS (SELECT 1 FROM main_stat main
          WHERE tmp.date = main.date
          AND tmp.site_id = main.site_id
          AND tmp.ip  = main.ip
          AND tmp.block_id = main.block_id
       );
-- update count for existing rows
UPDATE main_stat main
SET count = main.count + (SELECT tmp.count FROM tmp_stat tmp
          WHERE tmp.date = main.date
          AND tmp.site_id = main.site_id
          AND tmp.ip  = main.ip
          AND tmp.block_id = main.block_id
          LIMIT 1)
WHERE EXISTS (SELECT 1 FROM tmp_stat tmp
          WHERE tmp.date = main.date
          AND tmp.site_id = main.site_id
          AND tmp.ip  = main.ip
          AND tmp.block_id = main.block_id);
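To confirm that the key columns are indexed, the pg_indexes system view lists a table's indexes. A sketch against a temporary stand-in for main_stat (the column types are assumptions):

```sql
-- Stand-in for the question's table; column types are assumptions.
CREATE TEMP TABLE main_stat (
    date     date,
    site_id  integer,
    ip       text,
    block_id integer,
    count    integer NOT NULL,
    PRIMARY KEY (date, site_id, ip, block_id)
);

-- The primary key is backed by a unique B-tree index (named
-- main_stat_pkey by default), which the EXISTS lookups can use.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'main_stat';
```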