Kubernetes DNS fails to list services and endpoints. I am getting errors from the DNS add-on service on my Kubernetes cluster.
If I run this command I can see that the kube-dns pod keeps restarting:
kubectl get pods --namespace=kube-system -o wide
When I fetch the logs with:
kubectl logs kube-dns-v9-7mi17 -c kube2sky --namespace=kube-system
I get the following, repeated many times:
E0305 04:39:30.837572 1 reflector.go:136] Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints: dial tcp 10.3.0.1:443: i/o timeout
E0305 04:39:30.948322 1 reflector.go:136] Failed to list *api.Service: Get https://10.3.0.1:443/api/v1/services: dial tcp 10.3.0.1:443: i/o timeout
E0305 04:40:01.838219 1 reflector.go:136] Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints: dial tcp 10.3.0.1:443: i/o timeout
E0305 04:40:01.948954 1 reflector.go:136] Failed to list *api.Service: Get https://10.3.0.1:443/api/v1/services: dial tcp 10.3.0.1:443: i/o timeout
The kubernetes service has been assigned a virtual IP, but the kubernetes endpoints hold the real IP of the API server. Shouldn't the DNS service try to contact the API server using the endpoint IP rather than the virtual IP?
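One way to see the two addresses side by side (a diagnostic sketch, assuming kubectl is configured against this cluster) is to compare the kubernetes service's cluster IP with the endpoints behind it:

```shell
# Guarded so it is a no-op on machines without kubectl. On this cluster,
# the service should show CLUSTER-IP 10.3.0.1, while the endpoints list
# the API server's real address.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get svc kubernetes --namespace=default
  kubectl get endpoints kubernetes --namespace=default
fi
```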
This is the definition I used to create the DNS service:
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.3.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
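After applying a definition like the one above, one sanity check (again assuming kubectl access) is to confirm that the service's selector actually matches the DNS pods, i.e. that the kube-dns service has endpoints:

```shell
if command -v kubectl >/dev/null 2>&1; then
  # Pods matched by the service selector k8s-app=kube-dns.
  kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
  # If the selector matches nothing, ENDPOINTS shows <none>.
  kubectl get endpoints kube-dns --namespace=kube-system
fi
```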
And this is the replication controller for DNS:
apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v9
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    version: v9
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v9
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v9
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: etcd
        image: gcr.io/google_containers/etcd:2.0.9
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        command:
        - /usr/local/bin/etcd
        - -data-dir
        - /var/etcd/data
        - -listen-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -advertise-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -initial-cluster-token
        - skydns-etcd
        volumeMounts:
        - name: etcd-storage
          mountPath: /var/etcd/data
      - name: kube2sky
        image: gcr.io/google_containers/kube2sky:1.11
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        args:
        # command = "/kube2sky"
        - -domain=cluster.local
      - name: skydns
        image: gcr.io/google_containers/skydns:2015-03-11-001
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        args:
        # command = "/skydns"
        - -machines=http://localhost:4001
        - -addr=0.0.0.0:53
        - -domain=cluster.local.
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 1
          timeoutSeconds: 5
      - name: healthz
        image: gcr.io/google_containers/exechealthz:1.0
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
        args:
        - -cmd=nslookup kubernetes.default.svc.cluster.local localhost >/dev/null
        - -port=8080
        ports:
        - containerPort: 8080
          protocol: TCP
      volumes:
      - name: etcd-storage
        emptyDir: {}
      dnsPolicy: Default
Using the endpoint IP would defeat the very purpose of creating a service. kube2sky is supposed to use the service IP, and kube-proxy on the node should insert rules outside the container that intercept requests to the service IP and send them to the appropriate endpoint. So the logs in your container should show the service IP. Is kube-proxy running on the node? If so, do you have a rule intercepting traffic to 10.3.0.10 (look for -d 10.3.0.10 in iptables-save)? –
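The iptables check suggested in the comment can be run on the node like this (a sketch; 10.3.0.1 is the API service IP that kube2sky is timing out against in the logs above, and iptables-save usually requires root):

```shell
# Look for a kube-proxy NAT rule that intercepts traffic to the
# API service IP from the timeout errors. -F matches the literal
# string, and -- keeps grep from treating "-d" as an option.
if command -v iptables-save >/dev/null 2>&1; then
  iptables-save | grep -F -- "-d 10.3.0.1" \
    || echo "no iptables rule found for 10.3.0.1"
fi
```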
@PrashanthB I don't know what happened; I ended up recreating the cluster and it works fine now. The first time around I had some problems with the certificates, which may have caused the issue. Thanks. – user1845791