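I need to save a lot of entities to a database. Saving one entity involves adding rows to several tables, and the keys auto-generated by inserting a row into one table are used to insert a row into another table. This logic led me to create and use a stored procedure. Calling the stored procedure separately for each entity (i.e. via statement.execute(...)) works fine, except that there are billions of entities to save, so I am trying to do it in batches. With batching, however, executing the batch throws an org.postgresql.util.PSQLException with the message 'A result was returned when none was expected.'
My stored procedure looks like this: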
CREATE OR REPLACE FUNCTION insertSentence(warcinfoID varchar, recordID varchar, sentence varchar,
    sent_timestamp bigint, sect_ids smallint[]) RETURNS void AS $$
DECLARE
    warcinfoIdId integer := 0;
    recordIdId integer := 0;
    sentId integer := 0;
    id integer := 0;
BEGIN
    SELECT warcinfo_id_id INTO warcinfoIdId FROM warcinfo_id WHERE warcinfo_id_value = warcinfoID;
    IF NOT FOUND THEN
        INSERT INTO warcinfo_id (warcinfo_id_value) VALUES (warcinfoID)
            RETURNING warcinfo_id_id INTO STRICT warcinfoIdId;
    END IF;
    SELECT record_id_id INTO recordIdId FROM record_id WHERE record_id_value = recordID;
    IF NOT FOUND THEN
        INSERT INTO record_id (record_id_value) VALUES (recordID)
            RETURNING record_id_id INTO STRICT recordIdId;
    END IF;
    LOOP
        SELECT sent_id INTO sentId FROM sentence_text
            WHERE md5(sent_text) = md5(sentence) AND sent_text = sentence;
        EXIT WHEN FOUND;
        BEGIN
            INSERT INTO sentence_text (sent_text) VALUES (sentence) RETURNING sent_id INTO STRICT sentId;
        EXCEPTION WHEN unique_violation THEN
            sentId := 0;
        END;
    END LOOP;
    INSERT INTO sentence_occurrence (warcinfo_id, record_id, sent_id, timestamp, sect_ids)
        VALUES (warcinfoIdId, recordIdId, sentId, TO_TIMESTAMP(sent_timestamp), sect_ids)
        RETURNING entry_id INTO STRICT id;
END;
$$ LANGUAGE plpgsql;
And the Scala code looks like this:
def partition2DB(iterator: Iterator[(String, String, String, Long, Array[Int])]): Unit = {
  Class.forName(driver)
  val conn = DriverManager.getConnection(connectionString)
  try {
    val statement = conn.createStatement()
    var i = 0
    iterator.foreach(r => {
      i += 1
      statement.addBatch(
        "select insertSentence('%s', '%s', '%s', %d, '{%s}');".format(
          r._1, r._2, r._3.replaceAll("'", "''"), r._4, r._5.mkString(","))
      )
      if (i % 1000 == 0) statement.executeBatch()
    })
    if (i % 1000 != 0) statement.executeBatch()
  } catch {
    case e: SQLException => println("exception caught: " + e.getNextException());
  } finally {
    conn.close
  }
}
Strangely, even though statement.executeBatch() throws an exception, the entities added to the batch before it are saved. So this workaround makes things work:
def partition2DB(iterator: Iterator[(String, String, String, Long, Array[Int])]): Unit = {
  Class.forName(driver)
  val conn = DriverManager.getConnection(connectionString)
  try {
    var statement = conn.createStatement()
    var i = 0
    iterator.foreach(r => {
      i += 1
      statement.addBatch(
        "select insertSentence('%s', '%s', '%s', %d, '{%s}');".format(
          r._1, r._2, r._3.replaceAll("'", "''"), r._4, r._5.mkString(","))
      )
      if (i % 1000 == 0) {
        i = 0
        try {
          statement.executeBatch()
        } catch {
          case e: SQLException => statement = conn.createStatement()
        }
      }
    })
    if (i % 1000 != 0) {
      try {
        statement.executeBatch()
      } catch {
        case e: SQLException => statement = conn.createStatement()
      }
    }
  } catch {
    case e: SQLException => println("exception caught: " + e.getNextException());
  } finally {
    conn.close
  }
}
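However, I would rather not rely on the undocumented PostgreSQL behavior I am currently using. I see that other people have run into this problem as well: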
- https://www.postgresql.org/message-id/[email protected]
- http://grokbase.com/t/postgresql/pgsql-jdbc/113g9ygydb/problem-with-executebatch-and-a-result-was-returned-when-none-was-expected
Can anyone suggest a solution?
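One possible way to avoid the exception without swallowing it, offered as a sketch rather than a tested fix: executeBatch() expects every statement to produce an update count, while a select that calls the function produces a result set, which is presumably what triggers the error. Calling execute() per row inside one explicit transaction and committing every N rows sidesteps that check while still avoiding a commit per entity. The object name, the arrayLiteral helper, and the commitEvery parameter are made up for illustration; the `?::smallint[]` cast assumes the array is sent as a text literal.

```scala
import java.sql.{DriverManager, SQLException}

object TransactionalInsert {
  // Hypothetical helper: renders section ids as a PostgreSQL array literal such as {1,2,3}.
  def arrayLiteral(ids: Array[Int]): String = ids.mkString("{", ",", "}")

  def partition2DB(connectionString: String,
                   rows: Iterator[(String, String, String, Long, Array[Int])],
                   commitEvery: Int = 1000): Unit = {
    val conn = DriverManager.getConnection(connectionString)
    try {
      // One transaction per chunk instead of one implicit commit per call.
      conn.setAutoCommit(false)
      // Bound parameters replace the manual quote escaping of the original code.
      val stmt = conn.prepareStatement(
        "select insertSentence(?, ?, ?, ?, ?::smallint[])")
      var pending = 0
      rows.foreach { case (warcinfoId, recordId, sentence, timestamp, sectIds) =>
        stmt.setString(1, warcinfoId)
        stmt.setString(2, recordId)
        stmt.setString(3, sentence)
        stmt.setLong(4, timestamp)
        stmt.setString(5, arrayLiteral(sectIds))
        // execute() accepts statements that return a result set, unlike executeBatch().
        stmt.execute()
        pending += 1
        if (pending == commitEvery) {
          conn.commit()
          pending = 0
        }
      }
      if (pending > 0) conn.commit()
    } catch {
      case e: SQLException =>
        conn.rollback() // discard the partially written chunk
        throw e
    } finally {
      conn.close()
    }
  }
}
```

This still costs one network round trip per entity, so it reduces commit overhead but not latency.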
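Nice work. You would get a bigger improvement if the inserts operated on sets of inputs instead of being called one by one, but this should already be an improvement. Ideally, you would use PgJDBC's CopyManager to load a temporary table and then process the temporary table.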
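The CopyManager suggestion could look roughly like the sketch below. It assumes the PgJDBC driver is on the classpath, that no field is null or contains a NUL character, and that a temporary table named sentence_staging (a made-up name, as are CopyStaging, escapeField, copyLine, and chunkSize) is acceptable. The final step still calls insertSentence once per staged row, only server-side in a single statement; a truly set-based rewrite of the function is not shown.

```scala
import java.io.StringReader
import java.sql.Connection
import org.postgresql.PGConnection

object CopyStaging {
  type Row = (String, String, String, Long, Array[Int])

  // Escapes one field for COPY's default text format (tab-delimited, backslash escapes).
  def escapeField(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '\\' => sb.append("\\\\")
      case '\t' => sb.append("\\t")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case c    => sb.append(c)
    }
    sb.toString
  }

  // Renders one entity as a single COPY text line, ending with a newline.
  def copyLine(row: Row): String = {
    val (warcinfoId, recordId, sentence, timestamp, sectIds) = row
    Seq(escapeField(warcinfoId), escapeField(recordId), escapeField(sentence),
        timestamp.toString, sectIds.mkString("{", ",", "}")).mkString("\t") + "\n"
  }

  def load(conn: Connection, rows: Iterator[Row], chunkSize: Int = 10000): Unit = {
    val st = conn.createStatement()
    try {
      // Assumed staging layout: one column per function argument.
      st.execute(
        "create temp table sentence_staging (warcinfo_id varchar, record_id varchar, " +
        "sentence varchar, sent_timestamp bigint, sect_ids smallint[])")
      val copy = conn.unwrap(classOf[PGConnection]).getCopyAPI
      // Stream the rows in chunks so the whole input never sits in memory at once.
      rows.grouped(chunkSize).foreach { chunk =>
        copy.copyIn("copy sentence_staging from stdin",
                    new StringReader(chunk.map(copyLine).mkString))
      }
      // Process the staged rows in one statement; the result is one empty row per staged row.
      st.execute(
        "select insertSentence(warcinfo_id, record_id, sentence, sent_timestamp, sect_ids) " +
        "from sentence_staging")
    } finally {
      st.close()
    }
  }
}
```

A caller would open the connection as in the question and pass it in, for example `CopyStaging.load(conn, iterator)`.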