2016-09-20 113 views
1

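Kubernetes replication controller integration tests failing

I'm seeing the following Kubernetes integration tests fail fairly consistently, about 90% of the time, on RHEL 7.2, Fedora 24, and CentOS 7.1: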

test/integration/garbagecollector 
test/integration/replicationcontroller 

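They appear to fail because of etcd. My online searches lead me to believe an apiserver problem may also be involved. My setup is simple: I install and start docker, install go, clone the kubernetes repo from GitHub, run hack/install-etcd.sh from the repo and add etcd to my PATH, get ginkgo, gomega, and go-bindata, and then run 'make test-integration'. I don't change anything by hand or add any custom files or configuration. Has anyone run into these problems and found a solution? The only mentions of this issue I have found online treat it as a flake, with no solution listed, but I hit it on nearly every test run. Snippets of the errors are below; I can provide more if needed: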

Garbage collector:

\*many lines from garbagecollector.go that look good* 

I0920 14:42:39.725768 11823 garbagecollector.go:479] create storage for resource { v1 secrets} 

I0920 14:42:39.725786 11823 garbagecollector.go:479] create storage for resource { v1 serviceaccounts} 

I0920 14:42:39.725803 11823 garbagecollector.go:479] create storage for resource { v1 services} 

I0920 14:43:09.565529 11823 trace.go:61] Trace "List *rbac.ClusterRoleList" (started 2016-09-20 14:42:39.565113203 -0400 EDT): 

[2.564µs] [2.564µs] About to list etcd node 

[30.000353492s] [30.000350928s] Etcd node listed 

[30.000361771s] [8.279µs] END 

E0920 14:43:09.566770 11823 cacher.go:258] unexpected ListAndWatch error: pkg/storage/cacher.go:198: Failed to list *rbac.RoleBinding: client: etcd cluster is unavailable or misconfigured 

\*repeats over and over with different thing failed to list* 

Replication controller:

I0920 14:35:16.907283 10482 replication_controller.go:481] replication controller worker shutting down 

I0920 14:35:16.907293 10482 replication_controller.go:481] replication controller worker shutting down 

I0920 14:35:16.907298 10482 replication_controller.go:481] replication controller worker shutting down 

I0920 14:35:16.907303 10482 replication_controller.go:481] replication controller worker shutting down 

I0920 14:35:16.907307 10482 replication_controller.go:481] replication controller worker shutting down 

E0920 14:35:16.948417 10482 util.go:45] Metric for replication_controller already registered 

--- FAIL: TestUpdateLabelToBeAdopted (30.07s) 

replicationcontroller_test.go:270: Failed to create replication controller rc: Timeout: request did not complete within allowed duration 

E0920 14:44:06.820506 12053 storage_rbac.go:116] unable to initialize clusterroles: client: etcd cluster is unavailable or misconfigured 

There are no files in /var/log whose names even start with kube.

Thanks in advance!

+1

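Does the etcd log on your master show anything interesting? The "etcd cluster is unavailable or misconfigured" message suggests that something may be going wrong in etcd.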

+0

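While the integration tests are running, I get the following: "cluster is healthy" and "member ce2a822cea30bfca is healthy: got healthy result from http://127.0.0.1:2379". But once the tests start failing, I get: "cluster may be unhealthy: failed to list members", "Error: client: etcd cluster is unavailable or misconfigured", "error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout", and "error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused". I tried running etcdctl --no-sync, but it didn't help.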
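
To see when etcd stops responding during a test run, the client endpoint can be polled directly. The sketch below is not from the thread: it assumes etcd serves a `/health` route on its client URL that returns JSON with a `health` field, and the function name and timeout are made up for illustration.

```python
import json
import urllib.error
import urllib.request


def etcd_is_healthy(endpoint="http://127.0.0.1:2379", timeout=2.0):
    """Return True if GET <endpoint>/health reports a healthy member.

    Assumption: the response body is JSON such as {"health": "true"}.
    Any connection error, timeout, HTTP error, or unparsable body
    counts as unhealthy.
    """
    try:
        with urllib.request.urlopen(endpoint + "/health", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return False
    if not isinstance(body, dict):
        return False
    # Some versions report the value as a string, others as a boolean.
    return str(body.get("health")).lower() == "true"


if __name__ == "__main__":
    print("etcd healthy:", etcd_is_healthy())
```

Calling this in a loop next to the test run would show roughly when the endpoint at 127.0.0.1:2379 starts timing out.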

+0

I also found this output in the failing tests: etcdserver: 80% of the file descriptors are used [used = 886, limit = 1024]
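
The numbers in that warning can be checked: 886 of 1024 descriptors is about 86.5%, which is over the 80% mark the message names. The sketch below pulls the two numbers out of such a line and, on Linux, counts the descriptors a process currently holds by listing `/proc/<pid>/fd`. The function names are illustrative, and the `/proc` layout is a Linux assumption.

```python
import os
import re

# Matches the bracketed part of the warning, e.g. "[used = 886, limit = 1024]".
_FD_WARNING = re.compile(r"\[used\s*=\s*(\d+),\s*limit\s*=\s*(\d+)\]")


def fd_usage_ratio(line):
    """Return used/limit parsed from a warning line, or None if absent."""
    match = _FD_WARNING.search(line)
    if match is None:
        return None
    used, limit = int(match.group(1)), int(match.group(2))
    return used / limit


def count_open_fds(pid):
    """Count open descriptors of `pid` via /proc (Linux only).

    Reading another user's process normally requires root.
    """
    return len(os.listdir("/proc/{}/fd".format(pid)))


if __name__ == "__main__":
    warning = ("etcdserver: 80% of the file descriptors are used "
               "[used = 886, limit = 1024]")
    print("usage: {:.1%}".format(fd_usage_ratio(warning)))
```

Running `count_open_fds` against the etcd process ID while the tests run would show whether the count really climbs toward the limit.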

Answer

0

I increased the limit on the number of file descriptors and haven't seen this problem since. So, to go ahead and resolve this question
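
The answer does not say how the limit was raised. A common approach is to run `ulimit -n` in the shell that starts etcd and the tests. The sketch below shows the same idea with Python's `resource` module: it raises the soft open-file limit of the current process, which child processes started afterwards inherit. The function name and the target of 4096 are illustrative assumptions, not values from the thread.

```python
import resource


def raise_nofile_soft_limit(target):
    """Raise this process's soft open-file limit toward `target`.

    The soft limit is never lowered and never set above the hard limit.
    Returns the resulting (soft, hard) pair.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    if hard == resource.RLIM_INFINITY:
        wanted = target
    else:
        wanted = min(target, hard)
    if wanted > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # 4096 is an arbitrary example value.
    print("open-file limits (soft, hard):", raise_nofile_soft_limit(4096))
```

Because the limit is per process, it has to be raised before etcd is launched, either in the launching shell or in a wrapper like this one that then starts etcd as a child.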