2017-04-04 93 views

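ArangoDB stops and does not restart after dev-xvdb times out

I have ArangoDB 3.1.16 installed on an AWS C4 instance. I have a Foxx service that I am trying to run in production. On average it receives 10 packets of 200 octets per second and returns 20 packets of 200 octets per second.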

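Every time I start my process, the Foxx service runs with consistent performance for about an hour and then suddenly stops. I can no longer reach my Foxx API: every request fails with a connection timeout error, and nothing is printed in the Foxx logs. I can no longer reach the web interface either: the page fails to load.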

After a minute or so, the Foxx logs show me the error message "ArangoError 18: lock timeout".

After a few more minutes, the logs report requests that are usually fast but took a very long time (WARNING {queries} slow query: took: 1770.862498).

Using "journalctl -xe", I learned that after some foreign IPs tried to connect, I get "Job dev-xvdb.device/start timed out".

I managed to restart arangod using:

ps -eaf |grep arangod 
sudo kill # 
sudo apt-get --reinstall install arangodb3=3.1.16 

How can I fix this recurring problem?

"journalctl -xe" gives me:

Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Failed with result 'exit-code'.
-- Subject: Unit arangodb3.service has begun start-up 
-- Defined-By: systemd 
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel 
-- 
-- Unit arangodb3.service has begun starting up. 
Apr 04 15:03:10 my-ip arangodb3[11481]: * Starting arango database server arangod 
Apr 04 15:03:10 my-ip arangodb3[11481]: * database version check failed, maybe you need to run 'upgrade'? 
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Control process exited, code=exited status=1 
Apr 04 15:03:10 my-ip systemd[1]: Failed to start LSB: arangodb. 
-- Subject: Unit arangodb3.service has failed 
-- Defined-By: systemd 
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel 
-- 
-- Unit arangodb3.service has failed. 
-- 
-- The result is failed. 
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Unit entered failed state. 
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Failed with result 'exit-code'. 
Apr 04 15:03:10 my-ip sudo[11346]: pam_unix(sudo:session): session closed for user root 
Apr 04 15:03:17 my-ip sshd[11502]: Did not receive identification string from UNKNOWN IP 1 
Apr 04 15:03:21 my-ip sshd[11503]: Connection closed by UNKNOWN IP 2 port 54736 [preauth] 
Apr 04 15:03:21 my-ip sshd[11507]: Did not receive identification string from UNKNOWN IP 2 
Apr 04 15:03:21 my-ip sshd[11506]: fatal: Unable to negotiate with UNKNOWN IP 2 port 54730: no matching host key type found. Their offer: ssh-dss [preauth] 
Apr 04 15:03:21 my-ip sshd[11504]: Connection closed by UNKNOWN IP 2 port 54732 [preauth] 
Apr 04 15:03:22 my-ip sshd[11505]: Connection closed by UNKNOWN IP 2 port 54734 [preauth] 
Apr 04 15:03:40 my-ip systemd[1]: dev-xvdb.device: Job dev-xvdb.device/start timed out. 
Apr 04 15:03:40 my-ip systemd[1]: Timed out waiting for device dev-xvdb.device. 
-- Subject: Unit dev-xvdb.device has failed 
-- Defined-By: systemd 
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel 
-- 
-- Unit dev-xvdb.device has failed. 
-- 
-- The result is timeout. 
Apr 04 15:03:40 my-ip systemd[1]: Dependency failed for File System Check on /dev/xvdb. 
-- Subject: Unit systemd-fsck@dev-xvdb.service has failed
-- Defined-By: systemd 
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel 
-- 
-- Unit systemd-fsck@dev-xvdb.service has failed.
-- 
-- The result is dependency. 
Apr 04 15:03:40 my-ip systemd[1]: Dependency failed for /mnt. 
-- Subject: Unit mnt.mount has failed 
-- Defined-By: systemd 
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel 
-- 
-- Unit mnt.mount has failed. 
-- 
-- The result is dependency. 
Apr 04 15:03:40 my-ip systemd[1]: mnt.mount: Job mnt.mount/start failed with result 'dependency'. 
Apr 04 15:03:40 my-ip systemd[1]: systemd-fsck@dev-xvdb.service: Job systemd-fsck@dev-xvdb.service/start failed with result 'dependency'.
Apr 04 15:03:40 my-ip systemd[1]: dev-xvdb.device: Job dev-xvdb.device/start failed with result 'timeout'. 

I tried:

sudo curl --dump - -X GET http://127.0.0.1:8529/_api/version && echo 

It gives me:

HTTP/1.1 401 Unauthorized 
Www-Authenticate: Bearer token_type="JWT", realm="ArangoDB" 
Server: ArangoDB 
Connection: Keep-Alive 
Content-Type: text/plain; charset=utf-8 
Content-Length: 0 

I tried:

ps auxw | fgrep arangod 

It gives me:

root  10439 0.0 0.1 82772 8664 ?  Ss 10:09 0:00 /usr/sbin/arangod --uid arangodb --gid arangodb --pid-file /var/run/arangodb/arangod.pid --temp.path /var/tmp/arangod --log.foreground-tty false --supervisor 
arangodb 10440 5.7 94.5 12901776 7242340 ? Sl 10:09 16:36 /usr/sbin/arangod --uid arangodb --gid arangodb --pid-file /var/run/arangodb/arangod.pid --temp.path /var/tmp/arangod --log.foreground-tty false --supervisor 
ubuntu 11339 0.0 0.0 12916 1000 pts/0 R+ 14:59 0:00 grep -F --color=auto arangod 

Restarting arangod gives me:

2017-04-04T15:01:16Z [11344] INFO ArangoDB 3.1.16 [linux] 64bit, using VPack 0.1.30, ICU 54.1, V8 5.0.71.39, OpenSSL 1.0.2g 1 Mar 2016 
2017-04-04T15:01:16Z [11344] INFO using SSL options: SSL_OP_CIPHER_SERVER_PREFERENCE, SSL_OP_TLS_ROLLBACK_BUG 
2017-04-04T15:01:16Z [11344] FATAL could not open shutdown file '/var/log/arangodb3/restart/SHUTDOWN': internal error 

'service arangodb3 restart' gives me (after a short wait):

Job for arangodb3.service failed because the control process exited with error code. See "systemctl status arangodb3.service" and "journalctl -xe" for details. 

'systemctl status arangodb3.service' gives me:

arangodb3.service - LSB: arangodb 
Loaded: loaded (/etc/init.d/arangodb3; bad; vendor preset: enabled) 
Active: failed (Result: exit-code) since Tue 2017-04-04 15:03:10 UTC; 34s ago 
Docs: man:systemd-sysv-generator(8) 
Process: 11352 ExecStop=/etc/init.d/arangodb3 stop (code=exited, status=0/SUCCESS) 
Process: 11481 ExecStart=/etc/init.d/arangodb3 start (code=exited, status=1/FAILURE)
Tasks: 83
Memory: 6.5G
CPU: 73ms
CGroup: /system.slice/arangodb3.service 
├─10439 /usr/sbin/arangod --uid arangodb --gid arangodb --pid-file /var/run/arangodb/arangod.pid --temp.path /var/tmp/arangod --log.foreground-tty false --supervisor 
└─10440 /usr/sbin/arangod --uid arangodb --gid arangodb --pid-file /var/run/arangodb/arangod.pid --temp.path /var/tmp/arangod --log.foreground-tty false --supervisor 
Apr 04 15:03:10 my-ip systemd[1]: Starting LSB: arangodb... 
Apr 04 15:03:10 my-ip arangodb3[11481]: * Starting arango database server arangod 
Apr 04 15:03:10 my-ip arangodb3[11481]: * database version check failed, maybe you need to run 'upgrade'? 
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Control process exited, code=exited status=1 
Apr 04 15:03:10 my-ip systemd[1]: Failed to start LSB: arangodb. 
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Unit entered failed state. 

1 Answer

From your log output, it looks as if the mounted disk volume has disappeared.

If the storage underneath any kind of database disappears, there is no reasonable way for it to keep working.

So the effect you are seeing is that ArangoDB can no longer work with its data; from its point of view, the data simply is not there anymore.
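To confirm this on the machine itself, one option is to check whether the device node and the mount point are still there before restarting anything. The following is only a minimal sketch: the paths `/dev/xvdb` and `/mnt` are taken from the journal output in the question, and the function name `storage_status` is made up for illustration.

```python
import os


def storage_status(device="/dev/xvdb", mount_point="/mnt"):
    """Report whether the block device node exists and the mount point is mounted.

    The default paths come from the journal output in the question; adjust them
    to wherever the ArangoDB data directory actually lives.
    """
    return {
        "device_present": os.path.exists(device),
        "mounted": os.path.ismount(mount_point),
    }


if __name__ == "__main__":
    # Prints a small dict with the two boolean checks.
    print(storage_status())
```

If `device_present` or `mounted` comes back false, restarting or reinstalling arangod is unlikely to help until the volume is attached and mounted again.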

Another effect that others have observed is the I/O credits on AWS running dry, which could also be the cause of what you are seeing above.

https://aws.amazon.com/blogs/aws/new-burst-balance-metric-for-ec2s-general-purpose-ssd-gp2-volumes/

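If I understand it correctly, you get more credits if you choose a larger volume size. If that does not help, you may need to scale down your test scenario, or choose a different hosting option that is not limited in I/O operations.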
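To get a feeling for how volume size relates to the credit balance, here is a rough back-of-the-envelope sketch. It assumes the gp2 figures as I understand them from the AWS documentation (a baseline of 3 IOPS per GiB with a floor of 100 IOPS, bursts of up to 3,000 IOPS, and a full bucket of 5.4 million I/O credits); please verify these against the current AWS docs. The function names and the example load are hypothetical, and the question does not say which volume type or size is actually in use.

```python
def gp2_baseline_iops(size_gib, per_gib=3, floor=100):
    """Baseline IOPS of a gp2 volume under the assumed 3 IOPS/GiB rule."""
    return max(floor, per_gib * size_gib)


def seconds_until_empty(size_gib, sustained_iops, bucket=5_400_000, burst_cap=3000):
    """Seconds until a full credit bucket is drained at a constant load.

    Returns None when the load does not exceed the baseline, because the
    bucket then never drains.
    """
    baseline = gp2_baseline_iops(size_gib)
    load = min(sustained_iops, burst_cap)  # a volume cannot burst above the cap
    drain_per_second = load - baseline
    if drain_per_second <= 0:
        return None
    return bucket / drain_per_second


if __name__ == "__main__":
    # Purely illustrative numbers: an 8 GiB volume under 1,600 IOPS of load.
    print(seconds_until_empty(8, 1600))    # 3600.0
    # A 100 GiB volume bursting at the full 3,000 IOPS.
    print(seconds_until_empty(100, 3000))  # 2000.0
```

Under these assumptions a small volume can run at full speed for a while and then drop to its baseline once the bucket is empty, which is why a larger volume (higher baseline, slower drain) can help.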


Thanks! I will try a server with better I/O and report back on this. – user6403833
