2017-09-02 83 views

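Kubelet's cAdvisor metrics endpoint does not reliably return all metrics

I'm having a problem with cAdvisor where not all metrics are reliably returned when I query its metrics endpoint. Specifically, querying container_fs_limit_bytes{device=~"^/dev/.*$",id="/",kubernetes_io_hostname=~"^.*"} through Prometheus often shows results for only a small subset of the nodes in my Kubernetes cluster. This happens when the metric in question has not been exported for more than 5 minutes (so the series has gone stale), but I don't understand why the endpoint doesn't return every metric each time it is queried successfully.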

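Curling the endpoint repeatedly shows that certain metrics are returned only some of the time. As a result, the Prometheus query above returns data for all nodes only if every node had a scrape that included the metric within the last 5 minutes.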
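To quantify how intermittent a metric is, the repeated curl check can be automated. The following is a minimal sketch, assuming each scrape body has already been saved as text in the Prometheus exposition format; the helper names `metric_names` and `presence_ratio` are hypothetical and not part of any library.

```python
import re

# Metric names in the Prometheus text format start the sample line and
# consist of letters, digits, underscores and colons.
_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(exposition_text):
    """Return the set of metric names that appear in one scrape body."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def presence_ratio(scrapes, metric):
    """Return the fraction of scrape bodies that contain `metric`."""
    if not scrapes:
        return 0.0
    hits = sum(1 for text in scrapes if metric in metric_names(text))
    return hits / len(scrapes)
```

Feeding it a few dozen consecutive responses from one node and calling `presence_ratio(scrapes, "container_fs_limit_bytes")` gives a rough idea of how often that metric is dropped; a value well below 1.0 would be consistent with the behaviour described above.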

One workaround is to average over a time window longer than 5 minutes, but that is not ideal.
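As an illustration of that workaround, the sketch below wraps a selector in PromQL's `avg_over_time` and builds a URL for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The 10-minute window, the base URL and the helper names are assumptions chosen for the example; the window simply needs to be longer than the gaps between scrapes that include the metric.

```python
from urllib.parse import urlencode


def averaged_query(selector, window="10m"):
    """Wrap a series selector in avg_over_time over the given window."""
    return f"avg_over_time({selector}[{window}])"


def instant_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Selector taken from the question; the window is an assumed value.
selector = 'container_fs_limit_bytes{device=~"^/dev/.*$",id="/"}'
query = averaged_query(selector, window="10m")

# "http://prometheus:9090" is a placeholder address, not from the question.
url = instant_query_url("http://prometheus:9090", query)
```

The averaged expression hides the gaps rather than fixing them, and it smooths over real changes in the value, which is why it is only a stopgap.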

kubectl version:

Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:48:23Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"} 
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3+coreos.0", GitCommit:"42de91f04e456f7625941a6c4aaedaa69708be1b", GitTreeState:"clean", BuildDate:"2017-08-07T19:44:31Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"} 

Prometheus version: 1.7.1

Prometheus configuration:

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 1m
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093
    scheme: http
    timeout: 10s
rule_files:
- /etc/prometheus-rules/alert.rules
scrape_configs:
- job_name: kubernetes-nodes
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: node
    namespaces:
      names: []
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: false
  relabel_configs:
  - source_labels: []
    separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: []
    separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}:4194/proxy/metrics
    action: replace
  metric_relabel_configs:
  - source_labels: [id]
    separator: ;
    regex: ^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$
    target_label: rkt_container_name
    replacement: ${2}-${1}
    action: replace
  - source_labels: [id]
    separator: ;
    regex: ^/system\.slice/(.+)\.service$
    target_label: systemd_service_name
    replacement: ${1}
    action: replace

Answers


This is a known bug in how cAdvisor uses the Prometheus client library.