networking - Prometheus Pod cannot call the apiserver endpoint

Reposted by: 行者123 Updated: 2023-12-02 12:28:10

I am trying to set up a monitoring stack (prometheus + alertmanager + node_exporter, etc.) via helm install stable/prometheus on a Raspberry Pi k8s cluster that I built (1 master node + 3 worker nodes).

I managed to get all the required Pods running.

pi-monitoring-prometheus-alertmanager-767cd8bc65-89hxt   2/2     Running            0          131m    10.17.2.56      kube2   <none>           <none>
pi-monitoring-prometheus-node-exporter-h86gt             1/1     Running            0          131m    192.168.1.212   kube2   <none>           <none>
pi-monitoring-prometheus-node-exporter-kg957             1/1     Running            0          131m    192.168.1.211   kube1   <none>           <none>
pi-monitoring-prometheus-node-exporter-x9wgb             1/1     Running            0          131m    192.168.1.213   kube3   <none>           <none>
pi-monitoring-prometheus-pushgateway-799d4ff9d6-rdpkf    1/1     Running            0          131m    10.17.3.36      kube1   <none>           <none>
pi-monitoring-prometheus-server-5d989754b6-gp69j         2/2     Running            0          98m     10.17.1.60      kube3   <none>           <none>

However, after port-forwarding to the Prometheus server port 9090 and navigating to the Targets page, I realized that none of the node_exporters were registered.

Looking through the logs, I found this:

level=error ts=2020-04-12T05:15:05.083Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:333: Failed to list *v1.Node: Get https://10.18.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.18.0.1:443: i/o timeout"
level=error ts=2020-04-12T05:15:05.084Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:299: Failed to list *v1.Service: Get https://10.18.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.18.0.1:443: i/o timeout"
level=error ts=2020-04-12T05:15:05.084Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:261: Failed to list *v1.Endpoints: Get https://10.18.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.18.0.1:443: i/o timeout"
level=error ts=2020-04-12T05:15:05.085Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:262: Failed to list *v1.Service: Get https://10.18.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.18.0.1:443: i/o timeout"
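
The "dial tcp ... i/o timeout" errors above mean the TCP connection to the apiserver service address never completes. A minimal sketch of a check that could be run from inside a pod to reproduce this, assuming Python is available in the container (the function name can_connect is hypothetical and not part of any tool mentioned here):

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection raises OSError (including timeouts and refusals) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling can_connect("10.18.0.1", 443) with the address taken from the log lines would be expected to return False on an affected pod; comparing the result from pods on different nodes may help narrow down where the path breaks.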

Question: Why can't the Prometheus Pod call the apiserver endpoint? I am not sure where the configuration went wrong.

Following the debug guide, I realized that individual nodes cannot resolve services on other nodes.

I have spent the past day troubleshooting and reading various sources, but honestly I don't even know where to begin.

These are the Pods running in the kube-system namespace. I hope this gives a better idea of how my system is set up.

pi@kube4:~ $ kubectl get pods -n kube-system -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP              NODE    NOMINATED NODE   READINESS GATES
coredns-66bff467f8-nzvq8        1/1     Running   0          13d   10.17.0.2       kube4   <none>           <none>
coredns-66bff467f8-z7wdb        1/1     Running   0          13d   10.17.0.3       kube4   <none>           <none>
etcd-kube4                      1/1     Running   0          13d   192.168.1.214   kube4   <none>           <none>
kube-apiserver-kube4            1/1     Running   2          13d   192.168.1.214   kube4   <none>           <none>
kube-controller-manager-kube4   1/1     Running   2          13d   192.168.1.214   kube4   <none>           <none>
kube-flannel-ds-arm-8g9fb       1/1     Running   1          13d   192.168.1.212   kube2   <none>           <none>
kube-flannel-ds-arm-c5qt9       1/1     Running   0          13d   192.168.1.214   kube4   <none>           <none>
kube-flannel-ds-arm-q5pln       1/1     Running   1          13d   192.168.1.211   kube1   <none>           <none>
kube-flannel-ds-arm-tkmn6       1/1     Running   1          13d   192.168.1.213   kube3   <none>           <none>
kube-proxy-4zjjh                1/1     Running   0          13d   192.168.1.213   kube3   <none>           <none>
kube-proxy-6mk2z                1/1     Running   0          13d   192.168.1.211   kube1   <none>           <none>
kube-proxy-bbr8v                1/1     Running   0          13d   192.168.1.212   kube2   <none>           <none>
kube-proxy-wfsbm                1/1     Running   0          13d   192.168.1.214   kube4   <none>           <none>
kube-scheduler-kube4            1/1     Running   3          13d   192.168.1.214   kube4   <none>           <none>

Best answer

The Flannel documentation states:

NOTE: If kubeadm is used, then pass --pod-network-cidr=10.244.0.0/16 to kubeadm init to ensure that the podCIDR is set.

This is because, by default, the Flannel ConfigMap is configured to work on "Network": "10.244.0.0/16".

You configured kubeadm with --pod-network-cidr=10.17.0.0/16, so the same network now needs to be set in the Flannel ConfigMap kube-flannel-cfg, as follows:

kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.17.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
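
To illustrate the mismatch, here is a small sketch that parses a net-conf.json value and reports which pod IPs fall outside its "Network" range. The pod IPs are the alertmanager, pushgateway and server addresses from the question (the node-exporter pods show node addresses, so they are left out). The helper name pods_outside_network is hypothetical, and the default net-conf.json string is an assumption based on the default network quoted above.

```python
import ipaddress
import json


def pods_outside_network(net_conf_json, pod_ips):
    """Return the pod IPs that are not inside the "Network" CIDR of net-conf.json."""
    network = ipaddress.ip_network(json.loads(net_conf_json)["Network"])
    return [ip for ip in pod_ips if ipaddress.ip_address(ip) not in network]


# Assumed default Flannel setting versus the corrected one from the ConfigMap above.
FLANNEL_DEFAULT = '{"Network": "10.244.0.0/16", "Backend": {"Type": "vxlan"}}'
FIXED = '{"Network": "10.17.0.0/16", "Backend": {"Type": "vxlan"}}'

# Pod IPs taken from the pod listing in the question.
POD_IPS = ["10.17.2.56", "10.17.3.36", "10.17.1.60"]

print(pods_outside_network(FLANNEL_DEFAULT, POD_IPS))  # all three IPs are outside
print(pods_outside_network(FIXED, POD_IPS))            # []
```

With the default network every pod IP is reported as outside the range; with "Network": "10.17.0.0/16" the list is empty, which is the state the ConfigMap change aims for.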

Thanks to @kitt for the debugging help.

Regarding networking - Prometheus Pod cannot call the apiserver endpoint, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61168194/
