2012-03-30 151 views
3

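Jenkins slave hung / Jenkins wedged

We have an intermittent problem where slaves hang after the job itself has completed. In the post-processing steps (?), what we see in the console log is the following line: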

Description set: vap_current_iter-2012_03_29_19_01_03 

And then nothing. Normally, it looks like this:

Description set: prod_pull-2012_03_28_19_01_03 
Notifying upstream build armada_Launch_prod_pull #13 of job completion 
Project armada_Launch_prod_pull still waiting for 1 builds to complete 
Notifying upstream projects of job completion 
Notifying upstream of completion: armada_Launch_prod_pull #13 
Finished: SUCCESS 

I set up a logger for hudson.model.Run, and it currently shows this:

at java.lang.Thread.run(Thread.java:619) 

Mar 30, 2012 12:44:00 PM hudson.model.Run run 
INFO: galleon_allUnit #1134 main build action completed: SUCCESS 
Mar 30, 2012 12:44:00 PM hudson.model.Run setResult 
FINE: galleon_allUnit #1134 : result is set to SUCCESS 
java.lang.Exception 
    at hudson.model.Run.setResult(Run.java:352) 
    at hudson.model.Run.run(Run.java:1410) 
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) 
    at hudson.model.ResourceController.execute(ResourceController.java:88) 
    at hudson.model.Executor.run(Executor.java:238) 

This is repeated for every hung slave.
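For readers who want to reproduce this kind of output: Jenkins logs through java.util.logging, so a logger for hudson.model.Run amounts to raising that logger's level to FINE and attaching a handler that also accepts FINE. The sketch below shows this in plain Java as an illustration only. The class and method names (`RunLoggerSetup`, `enableFineLogging`) are hypothetical, and on a real installation the logger would normally be configured through the Jenkins log recorder UI rather than in code.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Jenkins: shows the java.util.logging
// calls that correspond to "setting up a logger" at FINE level.
class RunLoggerSetup {

    // Raise the named logger to FINE and attach a console handler that
    // also lets FINE records through (handlers default to INFO).
    static Logger enableFineLogging(String loggerName) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(Level.FINE);

        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE);
        logger.addHandler(handler);
        return logger;
    }

    public static void main(String[] args) {
        Logger logger = enableFineLogging("hudson.model.Run");
        // Passing a Throwable attaches a stack trace to the record, which is
        // how a bare java.lang.Exception can show up under a FINE message.
        logger.log(Level.FINE, "example : result is set to SUCCESS", new Exception());
    }
}
```

Under this reading, the bare java.lang.Exception under the FINE record is most likely a diagnostic stack trace recording who called setResult, not an error in its own right.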

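The master Hudson log has no additional information.

Disconnecting the slave has no effect.

Attempting an orderly shutdown of Jenkins has no effect either (Jenkins actually seems to hang on shutdown).

The only way we have found to recover is to kill the Tomcat process.

The thread dump of one of the slaves (they are all the same) is: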

Thread Dump 
Channel reader thread: channel 

"Channel reader thread: channel" Id=9 Group=main RUNNABLE (in native) 
    at java.io.FileInputStream.readBytes(Native Method) 
    at java.io.FileInputStream.read(FileInputStream.java:199) 
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) 
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237) 
    - locked [email protected] 
    at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249) 
    at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542) 
    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297) 
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351) 
    at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030) 


main 

"main" Id=1 Group=main WAITING on [email protected] 
    at java.lang.Object.wait(Native Method) 
    - waiting on [email protected] 
    at java.lang.Object.wait(Object.java:485) 
    at hudson.remoting.Channel.join(Channel.java:766) 
    at hudson.remoting.Launcher.main(Launcher.java:420) 
    at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366) 
    at hudson.remoting.Launcher.run(Launcher.java:206) 
    at hudson.remoting.Launcher.main(Launcher.java:168) 


Ping thread for channel [email protected]:channel 

"Ping thread for channel [email protected]:channel" Id=10 Group=main TIMED_WAITING 
    at java.lang.Thread.sleep(Native Method) 
    at hudson.remoting.PingThread.run(PingThread.java:86) 


Pipe writer thread: channel 

"Pipe writer thread: channel" Id=12 Group=main WAITING on java.u[email protected]14263ed 
    at sun.misc.Unsafe.park(Native Method) 
    - waiting on java.u[email protected]14263ed 
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) 
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) 
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) 
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) 
    at java.lang.Thread.run(Thread.java:619) 


pool-1-thread-267 

"pool-1-thread-267" Id=285 Group=main RUNNABLE 
    at sun.management.ThreadImpl.dumpThreads0(Native Method) 
    at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374) 
    at hudson.Functions.getThreadInfos(Functions.java:872) 
    at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:93) 
    at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:89) 
    at hudson.remoting.UserRequest.perform(UserRequest.java:118) 
    at hudson.remoting.UserRequest.perform(UserRequest.java:48) 
    at hudson.remoting.Request$2.run(Request.java:287) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
    at java.lang.Thread.run(Thread.java:619) 

    Number of locked synchronizers = 1 
    - [email protected] 


Finalizer 

"Finalizer" Id=3 Group=system WAITING on [email protected] 
    at java.lang.Object.wait(Native Method) 
    - waiting on [email protected] 
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116) 
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132) 
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) 


Reference Handler 

"Reference Handler" Id=2 Group=system WAITING on [email protected] 
    at java.lang.Object.wait(Native Method) 
    - waiting on [email protected] 
    at java.lang.Object.wait(Object.java:485) 
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) 


Signal Dispatcher 

"Signal Dispatcher" Id=4 Group=system RUNNABLE 
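The pool-1-thread-267 frames above show that this dump was produced through the JVM's management API (`dumpAllThreads`). As a minimal sketch of collecting a comparable dump from inside any Java process, assuming nothing about Jenkins internals, the standard `ThreadMXBean` can be used directly. The class name `ThreadDumpSketch` is hypothetical, and the output format only approximates the one shown above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical standalone helper: prints name, id, state and stack of
// every live thread in the current JVM.
class ThreadDumpSketch {

    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Only ask for lock details if this JVM supports reporting them.
        boolean monitors = bean.isObjectMonitorUsageSupported();
        boolean synchronizers = bean.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : bean.dumpAllThreads(monitors, synchronizers)) {
            out.append('"').append(info.getThreadName()).append('"')
               .append(" Id=").append(info.getThreadId())
               .append(' ').append(info.getThreadState());
            if (info.getLockName() != null) {
                // The object this thread is blocked or waiting on, if any.
                out.append(" on ").append(info.getLockName());
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Taking such a dump on the master as well as on the slave would show which side is blocked; the slave dump above only shows threads that are idle or waiting for input.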

Any ideas on how to better recover from this, or prevent it, would be appreciated.

+0

Nasty. What OS is this? – 2012-03-30 19:44:03

+0

Looks like a bug. [Report it](https://wiki.jenkins-ci.org/display/JENKINS/Issue+Tracking). – 2012-03-31 20:52:15

+0

We are running Linux (RHEL 5) on all boxes. – 2012-04-03 13:56:08

Answers

0

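Honestly, we wrote a script that restarts Jenkins every day at 4 PM. We found that our breakage occurred at around 3 AM and took roughly half an hour to show up. Since we started restarting the server at that time, we have not seen any further hangs. It is a way to prevent the problem, although it obviously does not "fix" it!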

+0

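We tried that, with no luck. Nothing short of stopping Tomcat and waiting 10 to 15 minutes before restarting fixes it. And since the goal here is round-the-clock operation, a daily restart is not an option. – 2012-04-03 13:56:47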