在诊断Kubernetes集群问题的时候,我们经常注意到集群中某一节点在闪烁*,而这通常是随机的且以奇怪的方式发生。这就是为什么我们一直需要一种工具,它可以测试一个节点与另一个节点之间的可达性,并以Prometheus度量形式呈现结果。有了这个工具,我们还希望在Grafana中创建图表并快速定位发生故障的节点(并在必要时将该节点上所有Pod进行重新调度并进行必要的维护)。
“闪烁”这里我是指某个节点随机变为“NotReady”但之后又恢复正常的某种行为。例如部分流量可能无法到达相邻节点上的Pod。
为什么会发生这种情况?常见原因之一是数据中心交换机中的连接问题。例如,我们曾经在Hetzner中设置一个vswitch,其中一个节点已无法通过该vswitch端口使用,并且恰好在本地网络上完全不可访问。
我们的最后一个要求是可直接在Kubernetes中运行此服务,因此我们将能够通过Helm图表部署所有内容。(例如在使用Ansible的情况下,我们必须为各种环境中的每个角色定义角色:AWS、GCE、裸机等)。由于我们尚未找到针对此环境的现成解决方案,因此我们决定自己来实现。
脚本和配置
我们解决方案的主要组件是一个脚本,该脚本监视每个节点的.status.addresses值。如果某个节点的该值已更改(例如添加了新节点),则我们的脚本使用Helm value方式将节点列表以ConfigMap的形式传递给Helm图表:
apiVersion: v1
kind: ConfigMap
metadata:
name: ping-exporter-config
namespace: d8-system
data:
nodes.json: >
{{ .Values.pingExporter.targets | toJson }}
.Values.pingExporter.targets类似以下:
"cluster_targets":[{"ipAddress":"192.168.191.11","name":"kube-a-3"},{"ipAddress":"192.168.191.12","name":"kube-a-2"},{"ipAddress":"192.168.191.22","name":"kube-a-1"},{"ipAddress":"192.168.191.23","name":"kube-db-1"},{"ipAddress":"192.168.191.9","name":"kube-db-2"},{"ipAddress":"51.75.130.47","name":"kube-a-4"}],"external_targets":[{"host":"8.8.8.8","name":"google-dns"},{"host":"youtube.com"}]}
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
下面是Python脚本:
#!/usr/bin/env python3
import subprocess
import prometheus_client
import re
import statistics
import os
import json
import glob
import better_exchook
import datetime
better_exchook.install()
FPING_CMDLINE = "/usr/sbin/fping -p 1000 -C 30 -B 1 -q -r 1".split(" ")
FPING_REGEX = re.compile(r"^(\S*)\s*: (.*)$", re.MULTILINE)
CONFIG_PATH = "/config/targets.json"
registry = prometheus_client.CollectorRegistry()
prometheus_exceptions_counter = \
prometheus_client.Counter('kube_node_ping_exceptions', 'Total number of exceptions', [], registry=registry)
prom_metrics_cluster = {"sent": prometheus_client.Counter('kube_node_ping_packets_sent_total',
'ICMP packets sent',
['destination_node', 'destination_node_ip_address'],
registry=registry),
"received": prometheus_client.Counter('kube_node_ping_packets_received_total',
'ICMP packets received',
['destination_node', 'destination_node_ip_address'],
registry=registry),
"rtt": prometheus_client.Counter('kube_node_ping_rtt_milliseconds_total',
'round-trip time',
['destination_node', 'destination_node_ip_address'],
registry=registry),
"min": prometheus_client.Gauge('kube_node_ping_rtt_min', 'minimum round-trip time',
['destination_node', 'destination_node_ip_address'],
registry=registry),
"max": prometheus_client.Gauge('kube_node_ping_rtt_max', 'maximum round-trip time',
['destination_node', 'destination_node_ip_address'],
registry=registry),
"mdev": prometheus_client.Gauge('kube_node_ping_rtt_mdev',
'mean deviation of round-trip times',
['destination_node', 'destination_node_ip_address'],
registry=registry)}
prom_metrics_external = {"sent": prometheus_client.Counter('external_ping_packets_sent_total',
'ICMP packets sent',
['destination_name', 'destination_host'],
registry=registry),
"received": prometheus_client.Counter('external_ping_packets_received_total',
'ICMP packets received',
['destination_name', 'destination_host'],
registry=registry),
"rtt": prometheus_client.Counter('external_ping_rtt_milliseconds_total',
'round-trip time',
['destination_name', 'destination_host'],
registry=registry),
"min": prometheus_client.Gauge('external_ping_rtt_min', 'minimum round-trip time',
['destination_name', 'destination_host'],
registry=registry),
"max": prometheus_client.Gauge('external_ping_rtt_max', 'maximum round-trip time',
['destination_name', 'destination_host'],
registry=registry),
"mdev": prometheus_client.Gauge('external_ping_rtt_mdev',
'mean deviation of round-trip times',
['destination_name', 'destination_host'],
registry=registry)}
def validate_envs():
envs = {"MY_NODE_NAME": os.getenv("MY_NODE_NAME"), "PROMETHEUS_TEXTFILE_DIR": os.getenv("PROMETHEUS_TEXTFILE_DIR"),
"PROMETHEUS_TEXTFILE_PREFIX": os.getenv("PROMETHEUS_TEXTFILE_PREFIX")}
for k, v in envs.items():
if not v:
raise ValueError("{} environment variable is empty".format(k))
return envs
@prometheus_exceptions_counter.count_exceptions()
def compute_results(results):
computed = {}
matches = FPING_REGEX.finditer(results)
for match in matches:
host = match.group(1)
ping_results = match.group(2)
if "duplicate" in ping_results:
continue
splitted = ping_results.split(" ")
if len(splitted) != 30:
raise ValueError("ping returned wrong number of results: \"{}\"".format(splitted))
positive_results = [float(x) for x in splitted if x != "-"]
if len(positive_results) > 0:
computed[host] = {"sent": 30, "received": len(positive_results),
"rtt": sum(positive_results),
"max": max(positive_results), "min": min(positive_results),
"mdev": statistics.pstdev(positive_results)}
else:
computed[host] = {"sent": 30, "received": len(positive_results), "rtt": 0,
"max": 0, "min": 0, "mdev": 0}
if not len(computed):
raise ValueError("regex match\"{}\" found nothing in fping output \"{}\"".format(FPING_REGEX, results))
return computed
@prometheus_exceptions_counter.count_exceptions()
def call_fping(ips):
cmdline = FPING_CMDLINE + ips
process = subprocess.run(cmdline, stdout=subprocess.PIPE,
stderr=subprocess.STDOUT, universal_newlines=True)
if process.returncode == 3:
raise ValueError("invalid arguments: {}".format(cmdline))
if process.returncode == 4:
raise OSError("fping reported syscall error: {}".format(process.stderr))
return process.stdout
envs = validate_envs()
files = glob.glob(envs["PROMETHEUS_TEXTFILE_DIR"] + "*")
for f in files:
os.remove(f)
labeled_prom_metrics = {"cluster_targets": [], "external_targets": []}
while True:
with open(CONFIG_PATH, "r") as f:
config = json.loads(f.read())
config["external_targets"] = [] if config["external_targets"] is None else config["external_targets"]
for target in config["external_targets"]:
target["name"] = target["host"] if "name" not in target.keys() else target["name"]
if labeled_prom_metrics["cluster_targets"]:
for metric in labeled_prom_metrics["cluster_targets"]:
if (metric["node_name"], metric["ip"]) not in [(node["name"], node["ipAddress"]) for node in config['cluster_targets']]:
for k, v in prom_metrics_cluster.items():
v.remove(metric["node_name"], metric["ip"])
if labeled_prom_metrics["external_targets"]:
for metric in labeled_prom_metrics["external_targets"]:
if (metric["target_name"], metric["host"]) not in [(target["name"], target["host"]) for target in config['external_targets']]:
for k, v in prom_metrics_external.items():
v.remove(metric["target_name"], metric["host"])
labeled_prom_metrics = {"cluster_targets": [], "external_targets": []}
for node in config["cluster_targets"]:
metrics = {"node_name": node["name"], "ip": node["ipAddress"], "prom_metrics": {}}
for k, v in prom_metrics_cluster.items():
metrics["prom_metrics"][k] = v.labels(node["name"], node["ipAddress"])
labeled_prom_metrics["cluster_targets"].append(metrics)
for target in config["external_targets"]:
metrics = {"target_name": target["name"], "host": target["host"], "prom_metrics": {}}
for k, v in prom_metrics_external.items():
metrics["prom_metrics"][k] = v.labels(target["name"], target["host"])
labeled_prom_metrics["external_targets"].append(metrics)
out = call_fping([prom_metric["ip"] for prom_metric in labeled_prom_metrics["cluster_targets"]] + \
[prom_metric["host"] for prom_metric in labeled_prom_metrics["external_targets"]])
computed = compute_results(out)
for dimension in labeled_prom_metrics["cluster_targets"]:
result = computed[dimension["ip"]]
dimension["prom_metrics"]["sent"].inc(computed[dimension["ip"]]["sent"])
dimension["prom_metrics"]["received"].inc(computed[dimension["ip"]]["received"])
dimension["prom_metrics"]["rtt"].inc(computed[dimension["ip"]]["rtt"])
dimension["prom_metrics"]["min"].set(computed[dimension["ip"]]["min"])
dimension["prom_metrics"]["max"].set(computed[dimension["ip"]]["max"])
dimension["prom_metrics"]["mdev"].set(computed[dimension["ip"]]["mdev"])
for dimension in labeled_prom_metrics["external_targets"]:
result = computed[dimension["host"]]
dimension["prom_metrics"]["sent"].inc(computed[dimension["host"]]["sent"])
dimension["prom_metrics"]["received"].inc(computed[dimension["host"]]["received"])
dimension["prom_metrics"]["rtt"].inc(computed[dimension["host"]]["rtt"])
dimension["prom_metrics"]["min"].set(computed[dimension["host"]]["min"])
dimension["prom_metrics"]["max"].set(computed[dimension["host"]]["max"])
dimension["prom_metrics"]["mdev"].set(computed[dimension["host"]]["mdev"])
prometheus_client.write_to_textfile(
envs["PROMETHEUS_TEXTFILE_DIR"] + envs["PROMETHEUS_TEXTFILE_PREFIX"] + envs["MY_NODE_NAME"] + ".prom", registry)
- 1.
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.
- 95.
- 96.
- 97.
- 98.
- 99.
- 100.
- 101.
- 102.
- 103.
- 104.
- 105.
- 106.
- 107.
- 108.
- 109.
- 110.
- 111.
- 112.
- 113.
- 114.
- 115.
- 116.
- 117.
- 118.
- 119.
- 120.
- 121.
- 122.
- 123.
- 124.
- 125.
- 126.
- 127.
- 128.
- 129.
- 130.
- 131.
- 132.
- 133.
- 134.
- 135.
- 136.
- 137.
- 138.
- 139.
- 140.
- 141.
- 142.
- 143.
- 144.
- 145.
- 146.
- 147.
- 148.
- 149.
- 150.
- 151.
- 152.
- 153.
- 154.
- 155.
- 156.
- 157.
- 158.
- 159.
- 160.
- 161.
- 162.
- 163.
- 164.
- 165.
- 166.
- 167.
- 168.
- 169.
- 170.
- 171.
- 172.
- 173.
- 174.
- 175.
- 176.
- 177.
- 178.
- 179.
- 180.
- 181.
- 182.
- 183.
- 184.
- 185.
- 186.
- 187.
- 188.
- 189.
- 190.
- 191.
- 192.
- 193.
- 194.
该脚本在每个Kubernetes节点上运行,并且每秒两次发送ICMP数据包到Kubernetes集群的所有实例。收集的结果会存储在文本文件中。
该脚本会包含在Docker镜像中:
FROM python:3.6-alpine3.8
COPY rootfs /
WORKDIR /app
RUN pip3 install --upgrade pip && pip3 install -r requirements.txt && apk add --no-cache fping
ENTRYPOINT ["python3", "/app/ping-exporter.py"]
- 1.
- 2.
- 3.
- 4.
- 5.
另外,我们还创建了一个ServiceAccount和一个具有唯一权限的对应角色用于获取节点列表(这样我们就可以知道它们的IP地址):
apiVersion: v1
kind: ServiceAccount
metadata:
name: ping-exporter
namespace: d8-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: d8-system:ping-exporter
rules:
- apiGroups: [""]
resources: ["nodes"]
verbs: ["list"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: d8-system:kube-ping-exporter
subjects:
- kind: ServiceAccount
name: ping-exporter
namespace: d8-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: d8-system:ping-exporter
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
最后,我们需要DaemonSet来运行在集群中的所有实例:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: ping-exporter
namespace: d8-system
spec:
updateStrategy:
type: RollingUpdate
selector:
matchLabels:
name: ping-exporter
template:
metadata:
labels:
name: ping-exporter
spec:
terminationGracePeriodSeconds: 0
tolerations:
- operator: "Exists"
hostNetwork: true
serviceAccountName: ping-exporter
priorityClassName: cluster-low
containers:
- image: private-registry.flant.com/ping-exporter/ping-exporter:v1
name: ping-exporter
env:
- name: MY_NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: PROMETHEUS_TEXTFILE_DIR
value: /node-exporter-textfile/
- name: PROMETHEUS_TEXTFILE_PREFIX
value: ping-exporter_
volumeMounts:
- name: textfile
mountPath: /node-exporter-textfile
- name: config
mountPath: /config
volumes:
- name: textfile
hostPath:
path: /var/run/node-exporter-textfile
- name: config
configMap:
name: ping-exporter-config
imagePullSecrets:
- name: private-registry
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
该解决方案的最后操作细节是:
- Python脚本执行时,其结果(即存储在主机上/var/run/node-exporter-textfile目录中的文本文件)将传递到DaemonSet类型的node-exporter。
- node-exporter使用--collector.textfile.directory /host/textfile参数启动,这里的/host/textfile是hostPath目录/var/run/node-exporter-textfile。(你可以点击这里了解关于node-exporter中文本文件收集器的更多信息。)
- 最后node-exporter读取这些文件,然后Prometheus从node-exporter实例上收集所有数据。
那么结果如何?
现在该来享受期待已久的结果了。指标创建之后,我们可以使用它们,当然也可以对其进行可视化。以下可以看到它们是怎样的。
首先,有一个通用选择器可让我们在其中选择节点以检查其“源”和“目标”连接。你可以获得一个汇总表,用于在Grafana仪表板中指定的时间段内ping选定节点的结果:
以下是包含有关选定节点的组合统计信息的图形:
另外,我们有一个记录列表,其中每个记录都链接到在“源”节点中选择的每个特定节点的图:
如果将记录展开,你将看到从当前节点到目标节点中已选择的所有其他节点的详细ping统计信息:
下面是相关的图形:
节点之间的ping出现问题的图看起来如何?
如果你在现实生活中观察到类似情况,那就该进行故障排查了!
最后,这是我们对外部主机执行ping操作的可视化效果:
我们可以检查所有节点的总体视图,也可以仅检查任何特定节点的图形:
当你观察到仅影响某些特定节点的连接问题时,这可能会有所帮助。