2017-05-09 252 views
0

我们面临着一个很奇怪的问题,当我们使用Liquibase执行以下变更:liquibase无法更新DB和/或很慢

<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.3.xsd"> 

    <changeSet id="add companyinfo.identifier" author="domi"> 
     <addColumn tableName="companyinfo"> 
      <column name="identifier" type="VARCHAR(50)" /> 
     </addColumn> 
    </changeSet> 

    <changeSet id="add uk companyinfo.identifier" author="domi"> 
     <addUniqueConstraint columnNames="identifier" 
           tableName="companyinfo" 
           constraintName="uk_companyinfo_identifier"/> 
    </changeSet> 

</databaseChangeLog> 

变更集添加新列(identifier),以表companyinfo并在同一列上定义一个唯一约束。该表有约1'500'000条记录。

这些变更集针对MySQL(版本5.5.54)执行,org.liquibase:liquibase-maven-plugin:3.5.3:update maven插件/目标由Jenkins管道脚本触发。 (Jenkins和DB位于美国东海岸)。 我们使用JDBC驱动程序是:mysql:mysql-connector-java:5.1.41

通常我们没有任何问题liquibase,但是当我们运行这个变更,我们总是恰好2个小时后获得类似这样的错误:

[INFO] --- liquibase-maven-plugin:3.5.3:update (default) @ persistence --- 
[INFO] ------------------------------------------------------------------------ 
[INFO] Executing on Database: jdbc:mysql://yyyy.xxxx.net:3306/my_db?characterEncoding=utf8 
INFO 5/2/17 5:11 PM: liquibase: Successfully acquired change log lock 
INFO 5/2/17 5:11 PM: liquibase: Reading from DATABASECHANGELOG 
SEVERE 5/2/17 7:11 PM: liquibase: src/main/resources/db/db.changelog-master.xml: db/changelog/db.changelog-companyinfo_identifier.xml::add companyinfo.identifier::domi: Change Set db/changelog/db.changelog-companyinfo_identifier.xml::add add companyinfo.identifier::domi failed. Error: Communications link failure 

The last packet successfully received from the server was 7,200,055 milliseconds ago. The last packet sent successfully to the server was 7,200,054 milliseconds ago. [Failed SQL: ALTER TABLE companyinfo ADD identifier VARCHAR(50) NULL] 
INFO 5/2/17 7:11 PM: liquibase: db/changelog/db.changelog-companyinfo_identifier.xml::add companyinfo.identifier::domi: Successfully released change log lock 
SEVERE 5/2/17 7:11 PM: liquibase: db/changelog/db.changelog-companyinfo_identifier.xml::add companyinfo.identifier::domi: Could not release lock 
liquibase.exception.LockException: liquibase.exception.DatabaseException: Error executing SQL UPDATE DATABASECHANGELOGLOCK SET LOCKED = 0, LOCKEDBY = NULL, LOCKGRANTED = NULL WHERE ID = 1: No operations allowed after connection closed. 
    at liquibase.lockservice.StandardLockService.releaseLock(StandardLockService.java:283) 
    at liquibase.Liquibase.update(Liquibase.java:218) 
    at liquibase.Liquibase.update(Liquibase.java:192) 
    at liquibase.Liquibase.update(Liquibase.java:335) 
    at org.liquibase.maven.plugins.LiquibaseUpdate.doUpdate(LiquibaseUpdate.java:33) 
    at org.liquibase.maven.plugins.AbstractLiquibaseUpdateMojo.performLiquibaseTask(AbstractLiquibaseUpdateMojo.java:30) 
    at org.liquibase.maven.plugins.AbstractLiquibaseMojo.execute(AbstractLiquibaseMojo.java:394) 
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134) 
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) 
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) 
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) 
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116) 
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80) 
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51) 
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128) 
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307) 
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193) 
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106) 
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:862) 
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:286) 
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:197) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) 
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) 
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) 
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) 
Caused by: liquibase.exception.DatabaseException: Error executing SQL UPDATE DATABASECHANGELOGLOCK SET LOCKED = 0, LOCKEDBY = NULL, LOCKGRANTED = NULL WHERE ID = 1: No operations allowed after connection closed. 
    at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:68) 
    at liquibase.executor.jvm.JdbcExecutor.update(JdbcExecutor.java:231) 
    at liquibase.executor.jvm.JdbcExecutor.update(JdbcExecutor.java:205) 
    at liquibase.lockservice.StandardLockService.releaseLock(StandardLockService.java:267) 
    ... 28 more 
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No operations allowed after connection closed. 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:425) 
    at com.mysql.jdbc.Util.getInstance(Util.java:408) 
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:918) 
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:897) 
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:886) 
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860) 
    at com.mysql.jdbc.ConnectionImpl.throwConnectionClosedException(ConnectionImpl.java:1198) 
    at com.mysql.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:1193) 
    at com.mysql.jdbc.ConnectionImpl.createStatement(ConnectionImpl.java:2388) 
    at com.mysql.jdbc.ConnectionImpl.createStatement(ConnectionImpl.java:2372) 
    at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:52) 
    ... 31 more 
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure 

The last packet successfully received from the server was 7,200,055 milliseconds ago. The last packet sent successfully to the server was 7,200,054 milliseconds ago. 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:425) 

当我运行的命令与我的笔记本电脑上执行的Liquibase相同,不是通过JDBC,而是通过连接到上述相同数据库的SequelPro MySql客户端(瑞士的客户端,美国东海岸的DB)运行相同的命令,它成功完成,没有任何问题15分钟。

这些Liquibase执行语句:

-- ********************************************************************* 
-- Update Database Script 
-- ********************************************************************* 
-- Change Log: src/main/resources/db/db.changelog-master.xml 
-- Ran at: 5/3/17 7:55 AM 
-- Against: [email protected]@jdbc:mysql://yyyy.xxxx.net:3306/my_db?characterEncoding=utf8 
-- Liquibase version: 3.5.3 
-- ********************************************************************* 

-- Lock Database 
UPDATE DATABASECHANGELOGLOCK SET LOCKED = 1, LOCKEDBY = 'xxxxx (192.168.1.24)', LOCKGRANTED = '2017-05-03 07:55:44.564' WHERE ID = 1 AND LOCKED = 0; 

-- Changeset db/changelog/db.changelog-companyinfo_identifier.xml::add companyinfo.identifier::domi 
ALTER TABLE companyinfo ADD identifier VARCHAR(50) NULL; 

INSERT INTO DATABASECHANGELOG (ID, AUTHOR, FILENAME, DATEEXECUTED, ORDEREXECUTED, MD5SUM, DESCRIPTION, COMMENTS, EXECTYPE, CONTEXTS, LABELS, LIQUIBASE, DEPLOYMENT_ID) VALUES ('add companyinfo.identifier', 'domi', 'db/changelog/db.changelog-companyinfo_identifier.xml', NOW(), 838, '7:b2d3082917bf3ff3aecb6cbc363a5e9c', 'add companyinfo.identifier', '', 'EXECUTED', NULL, NULL, '3.5.3', '3790945685'); 

-- Changeset db/changelog/db.changelog-companyinfo_identifier.xml::add uk companyinfo.identifier::domi 
ALTER TABLE companyinfo ADD CONSTRAINT uk_companyinfo_identifier UNIQUE (identifier); 

INSERT INTO DATABASECHANGELOG (ID, AUTHOR, FILENAME, DATEEXECUTED, ORDEREXECUTED, MD5SUM, DESCRIPTION, COMMENTS, EXECTYPE, CONTEXTS, LABELS, LIQUIBASE, DEPLOYMENT_ID) VALUES ('add uk companyinfo.identifier', 'domi', 'db/changelog/db.changelog-companyinfo_identifier.xml', NOW(), 839, '7:5d98affa45f814b9ad32bc9c954ed32b', 'addUniqueConstraint constraintName=uk_companyinfo_identifier, tableName=companyinfo', '', 'EXECUTED', NULL, NULL, '3.5.3', '3790945685'); 

-- Release Database Lock 
UPDATE DATABASECHANGELOGLOCK SET LOCKED = 0, LOCKEDBY = NULL, LOCKGRANTED = NULL WHERE ID = 1; 

据我所知,在DB没有超时组将匹配2个小时的限制:

  • wait_timeout设置为28.8K秒(8小时)
  • interactive_timeout:28800
  • salve_net_timeout:3600(1小时)
  • connect_timeout:10
  • lock_wait_timeout:31536000
  • net_read_timeout:30
  • net_write_timeout:60
  • innodb_lock_wait_timeout:50

我觉得有两个问题,这很有意思:

  • 什么导致liquibase失败后失败两个小时?
  • 为什么DDL执行速度比liquibase慢得多?

...但也许这两个问题都有相同的答案


更新:我已经执行完全相同的行家/ liquibase命令,我从詹金斯触发现在从我的本地ENV(也连接到相同的远程数据库) - 并猜测:它的作用就像一个魅力,根本没有问题 - 命令在20分钟内完成!

mvn -f pom.xml process-resources -Pupdate-db -Dliquibase.username=xxx -Dliquibase.password=xxxx -Dliquibase.url=jdbc:mysql://yyyy.xxxx.net:3306/my_db:3306/yooture_ci?characterEncoding=utf8 

所以我最后的猜测是我们的CI基础设施和数据库之间必然存在一些导致问题的东西,但是什么?它的工作原理永远,只是没有在这种情况下...

回答

0

我们终于找到了问题...

问题是由默认的AWS NAT配置造成的。 The AWS documentation说:

如果那是使用NAT网关的连接闲置5分钟或 以上,连接超时

这是非常低,它肯定打破了大多数操作系统出来的默认配置那里。