- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
在kubernetes集群中使用已安装的Helm
和Prometheus
的Grafana
:
helm install stable/prometheus
helm install stable/grafana
alertmanage
服务。
alert rules
,
CPU
设置一些
memory
和配置并发送电子邮件,而无需创建其他yaml文件?
configmap
到
alertmanager
的介绍:
https://github.com/kubernetes/charts/tree/master/stable/prometheus#configmap-files
stable/prometheus
的源代码以查看其作用。从
values.yaml
文件中,我发现:
serverFiles:
alerts: ""
rules: ""
prometheus.yml: |-
rule_files:
- /etc/config/rules
- /etc/config/alerts
scrape_configs:
- job_name: prometheus
static_configs:
- targets:
- localhost:9090
https://github.com/kubernetes/charts/blob/master/stable/prometheus/values.yaml#L600
rules
和
alertmanager
。但是不要清除此块:
rule_files:
- /etc/config/rules
- /etc/config/alerts
serverFiles:
alert: ""
rules: ""
alert rules
中设置
alertmanager
和
values.yaml
配置后:
## Prometheus server ConfigMap entries
##
serverFiles:
alerts: ""
rules: |-
#
# CPU Alerts
#
ALERT HighCPU
IF ((sum(node_cpu{mode=~"user|nice|system|irq|softirq|steal|idle|iowait"}) by (instance, job)) - ( sum(node_cpu{mode=~"idle|iowait"}) by (instance,job) ) ) / (sum(node_cpu{mode=~"user|nice|system|irq|softirq|steal|idle|iowait"}) by (instance, job)) * 100 > 95
FOR 10m
LABELS { service = "backend" }
ANNOTATIONS {
summary = "High CPU Usage",
description = "This machine has really high CPU usage for over 10m",
}
# TEST
ALERT APIHighRequestLatency
IF api_http_request_latencies_second{quantile="0.5"} >1
FOR 1m
ANNOTATIONS {
summary = "High request latency on {{$labels.instance }}",
description = "{{ $labels.instance }} has amedian request latency above 1s (current value: {{ $value }}s)",
}
helm install prometheus/
进行安装。
port-forward
组件的
alertmanager
:
export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=alertmanager" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace default port-forward $POD_NAME 9093
http://127.0.0.1:9003
,得到以下消息:
Forwarding from 127.0.0.1:9093 -> 9093
Handling connection for 9093
Handling connection for 9093
E0122 17:41:53.229084 7159 portforward.go:331] an error occurred forwarding 9093 -> 9093: error forwarding port 9093 to pod 6614ee96df545c266e5fff18023f8f7c87981f3340ee8913acf3d8da0e39e906, uid : exit status 1: 2018/01/22 08:37:54 socat[31237.140275133073152] E connect(5, AF=2 127.0.0.1:9093, 16): Connection refused
Handling connection for 9093
E0122 17:41:53.243511 7159 portforward.go:331] an error occurred forwarding 9093 -> 9093: error forwarding port 9093 to pod 6614ee96df545c266e5fff18023f8f7c87981f3340ee8913acf3d8da0e39e906, uid : exit status 1: 2018/01/22 08:37:54 socat[31238.140565602109184] E connect(5, AF=2 127.0.0.1:9093, 16): Connection refused
E0122 17:41:53.246011 7159 portforward.go:331] an error occurred forwarding 9093 -> 9093: error forwarding port 9093 to pod 6614ee96df545c266e5fff18023f8f7c87981f3340ee8913acf3d8da0e39e906, uid : exit status 1: 2018/01/22 08:37:54 socat[31239.140184300869376] E connect(5, AF=2 127.0.0.1:9093, 16): Connection refused
Handling connection for 9093
Handling connection for 9093
E0122 17:41:53.846399 7159 portforward.go:331] an error occurred forwarding 9093 -> 9093: error forwarding port 9093 to pod 6614ee96df545c266e5fff18023f8f7c87981f3340ee8913acf3d8da0e39e906, uid : exit status 1: 2018/01/22 08:37:55 socat[31250.140004515874560] E connect(5, AF=2 127.0.0.1:9093, 16): Connection refused
E0122 17:41:53.847821 7159 portforward.go:331] an error occurred forwarding 9093 -> 9093: error forwarding port 9093 to pod 6614ee96df545c266e5fff18023f8f7c87981f3340ee8913acf3d8da0e39e906, uid : exit status 1: 2018/01/22 08:37:55 socat[31251.140355466835712] E connect(5, AF=2 127.0.0.1:9093, 16): Connection refused
Handling connection for 9093
E0122 17:41:53.858521 7159 portforward.go:331] an error occurred forwarding 9093 -> 9093: error forwarding port 9093 to pod 6614ee96df545c266e5fff18023f8f7c87981f3340ee8913acf3d8da0e39e906, uid : exit status 1: 2018/01/22 08:37:55 socat[31252.140268300003072] E connect(5, AF=2 127.0.0.1:9093, 16): Connection refused
kubectl describe po illocutionary-heron-prometheus-alertmanager-587d747b9c-qwmm6
时,得到了:
Name: illocutionary-heron-prometheus-alertmanager-587d747b9c-qwmm6
Namespace: default
Node: minikube/192.168.99.100
Start Time: Mon, 22 Jan 2018 17:33:54 +0900
Labels: app=prometheus
component=alertmanager
pod-template-hash=1438303657
release=illocutionary-heron
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"illocutionary-heron-prometheus-alertmanager-587d747b9c","uid":"f...
Status: Running
IP: 172.17.0.10
Created By: ReplicaSet/illocutionary-heron-prometheus-alertmanager-587d747b9c
Controlled By: ReplicaSet/illocutionary-heron-prometheus-alertmanager-587d747b9c
Containers:
prometheus-alertmanager:
Container ID: docker://0808a3ecdf1fa94b36a1bf4b8f0d9d2933bc38afa8b25e09d0d86f036ac3165b
Image: prom/alertmanager:v0.9.1
Image ID: docker-pullable://prom/alertmanager@sha256:ed926b227327eecfa61a9703702c9b16fc7fe95b69e22baa656d93cfbe098320
Port: 9093/TCP
Args:
--config.file=/etc/config/alertmanager.yml
--storage.path=/data
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 22 Jan 2018 17:55:24 +0900
Finished: Mon, 22 Jan 2018 17:55:24 +0900
Ready: False
Restart Count: 9
Readiness: http-get http://:9093/%23/status delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data from storage-volume (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-h5b8l (ro)
prometheus-alertmanager-configmap-reload:
Container ID: docker://b4a349bf7be4ea78abe6899ad0173147f0d3f6ff1005bc513b2c0ac726385f0b
Image: jimmidyson/configmap-reload:v0.1
Image ID: docker-pullable://jimmidyson/configmap-reload@sha256:2d40c2eaa6f435b2511d0cfc5f6c0a681eeb2eaa455a5d5ac25f88ce5139986e
Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://localhost:9093/-/reload
State: Running
Started: Mon, 22 Jan 2018 17:33:56 +0900
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-h5b8l (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: illocutionary-heron-prometheus-alertmanager
Optional: false
storage-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: illocutionary-heron-prometheus-alertmanager
ReadOnly: false
default-token-h5b8l:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-h5b8l
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 29m (x2 over 29m) default-scheduler PersistentVolumeClaim is not bound: "illocutionary-heron-prometheus-alertmanager"
Normal Scheduled 29m default-scheduler Successfully assigned illocutionary-heron-prometheus-alertmanager-587d747b9c-qwmm6 to minikube
Normal SuccessfulMountVolume 29m kubelet, minikube MountVolume.SetUp succeeded for volume "config-volume"
Normal SuccessfulMountVolume 29m kubelet, minikube MountVolume.SetUp succeeded for volume "pvc-fa84b197-ff4e-11e7-a584-0800270fb7fc"
Normal SuccessfulMountVolume 29m kubelet, minikube MountVolume.SetUp succeeded for volume "default-token-h5b8l"
Normal Started 29m kubelet, minikube Started container
Normal Created 29m kubelet, minikube Created container
Normal Pulled 29m kubelet, minikube Container image "jimmidyson/configmap-reload:v0.1" already present on machine
Normal Started 29m (x3 over 29m) kubelet, minikube Started container
Normal Created 29m (x4 over 29m) kubelet, minikube Created container
Normal Pulled 29m (x4 over 29m) kubelet, minikube Container image "prom/alertmanager:v0.9.1" already present on machine
Warning BackOff 9m (x91 over 29m) kubelet, minikube Back-off restarting failed container
Warning FailedSync 4m (x113 over 29m) kubelet, minikube Error syncing pod
alertmanager
文件中的
values.yaml
配置:
## alertmanager ConfigMap entries
##
alertmanagerFiles:
alertmanager.yml: |-
global:
resolve_timeout: 5m
smtp_smarthost: smtp.gmail.com:587
smtp_from: sender@gmail.com
smtp_auth_username: sender@gmail.com
smtp_auth_password: sender_password
receivers:
- name: default-receiver
email_configs:
- to: target_email@gmail.com
route:
group_wait: 10s
group_interval: 5m
receiver: default-receiver
repeat_interval: 3h
alertmanagerFiles:
alertmanager.yml: |-
global:
# slack_api_url: ''
receivers:
- name: default-receiver
# slack_configs:
# - channel: '@you'
# send_resolved: true
route:
group_wait: 10s
group_interval: 5m
receiver: default-receiver
repeat_interval
email_configs
配置方法。
最佳答案
alerts
文件的rules
组中的serverFiles
和values.yaml
键安装在/etc/config
文件夹中的Prometheus容器中。您可以在其中放置所需的配置(例如,从链接的博客文章中汲取灵感),Prometheus将使用它来处理警报。
例如,可以这样设置一个简单的规则:
serverFiles:
alerts: |
ALERT cpu_threshold_exceeded
IF (100 * (1 - avg by(job)(irate(node_cpu{mode='idle'}[5m])))) > 80
FOR 300s
LABELS {
severity = "warning",
}
ANNOTATIONS {
summary = "CPU usage > 80% for {{ $labels.job }}",
description = "CPU usage avg for last 5m: {{ $value }}",
}
关于kubernetes - 如何配置Helm在kubernetes上安装的alertmanager?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/48374858/
我只是不喜欢 Logback 的 XML 或 Groovy 配置,而更喜欢用 Java 进行配置(这也是因为我将在初始化后的不同时间在运行时更改配置)。 似乎对 Logback 进行 Java 配置的
我的 sphinx 配置是: ================================ config/sphinx.yml development: bin_path: "/usr/loc
我们计划在生产服务器中部署我们的系统。我有兴趣了解更多有关优化网站性能的信息。 Sitecore 有哪些优化建议? (缓存,网络配置中的其他设置) 我们可以在 IIS 中做哪些优化? 找不到关于这些主
我有一个 Django 应用程序,可以处理网站的两个(或更多)部分,例如网站的“admin”和“api”部分。我还为网站的其余部分提供了普通的 html 页面,其中不需要 Django。 例如,我希望
我刚刚开始研究Docker。我有一个 Node 应用程序,可以调整大小和图像,然后在完成后向 aws 发送 SQS 消息。我已成功创建应用程序的 docker 镜像,并从本地计算机复制它,但遇到了无法
如何配置 checkstyle(在 Ant nt Maven 中)任务?我尝试了一点,但没有正确收到报告。这是我的 Ant 脚本。
我正在使用 Quartz 和 Spring 框架重写一个遗留项目。原始配置是 XML 格式,现在我将其转换为 Java Config。 xml 配置使用 jobDetail 设置触发器 bean 的作
tl;rd: 使用主键对数据库进行分区 索引大小问题。 数据库大小每天增长约 1-3 GB 突袭设置。 您有使用 Hypertable 的经验吗? 长版: 我刚刚建立/购买了一个家庭服务器: 至强 E
在安装 gcp 应用程序后,我们尝试使用 GCP 的图形 api 配置 Azure Active Directory saml 配置。我们正在遵循相同的 AWS graph api saml 设置 U
我刚刚了解了 spring security 并想使用 java hibernate 配置连接到数据库,但我发现的示例或教程很少。我通过使用 xml 配置找到了更多。我在这里使用 Spring 4.0
我们最近切换到 Java 8 以使用 java.time API(LocalDate、LocalDateTime,...)。因此,我们将 Hibernate 依赖项更新到版本 4.3.10。我们编写了
欢迎访问我的GitHub 这里分类和汇总了欣宸的全部原创(含配套源码):https://github.com/zq2599/blog_demos 本篇概览 本文是《quarkus实战》系列的第六篇,咱
我是 NGINX 的新手,我正在尝试对我们的 ERP 网络服务器进行负载平衡。我有 3 个网络服务器在由 websphere 提供支持的端口 80 上运行,这对我来说是一个黑盒子: * web01.e
我们想使用 gerrit 进行代码审查,但我们在 webview 中缺少一些设置。 是否可以禁止提交者审查/验证他们自己的 提交? 是否有可能两个审稿人给 +1 一个累积它 到+2,以便可以提交? 谢
配置根据运行模式应用于 AEM 实例。在多个运行模式和多个配置的情况下,AEM 如何确定要选择的配置文件?假设以下配置在 AEM 项目中可用, /apps /myproject - con
我正在使用 Neo4j 服务器。我遇到了负载相对较低的问题。但是,响应时间相当长。我认为为请求提供服务的线程数太少了。有没有办法调整为 HTTP 请求提供服务的线程池的大小。那可能吗? 最佳答案 线程
我在/etc/default/celeryd 中有以下配置 CELERYD_NODES = "worker1 worker2 worker3" CELERYD_CHDIR = "path to pro
Plone 在其页面中显示来 self 的母语(巴西葡萄牙语)的特殊字符。但是,当我使用我创建的 spt 页面时,它会显示转义序列,例如: Educa\xc3\xa7\xc3\xa3o 代替 Educ
我正在尝试开始使用 Emacs/Clojure。安装 emacs 扩展的正确方法是什么。我正在尝试安装以下插件: https://bitbucket.org/kotarak/vimclojure 我已
我有一个简单的 C 项目结构: proj/ src/ docs/ build/ tests/ lib/ 尝试编写合适的 CMake 文件。 到目前为止我的尝试:http://pas
我是一名优秀的程序员,十分优秀!