I noticed that a system pod (gke-metrics-agent) on our GKE cluster is running out of memory. I tried editing the DaemonSet YAML to raise the memory request to 200Mi and the memory limit to 200Mi, but it will not let me apply the change: the DaemonSet is recreated with the default value of 50Mi, just as before. (pod status screenshot)
Please help me increase the memory resources of gke-metrics-agent.
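For reference, an edit along these lines keeps being reverted (a sketch of the attempt described above; 200Mi is the value I tried to set):
kubectl -n kube-system edit daemonset gke-metrics-agent
# set resources.requests.memory and resources.limits.memory of the
# gke-metrics-agent container to 200Mi, but the DaemonSet comes back with 50Mi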
Best answer
In general, CrashLoopBackOff means the container keeps crashing after every restart; you can follow the documentation to troubleshoot CrashLoopBackOff issues.
Your edit keeps being reverted because GKE manages that DaemonSet and reconciles it back to the default manifest. A possible workaround for the OOM kills of gke-metrics-agent is therefore to raise the memory limit of the gke-metrics-agent pod yourself: disable GKE monitoring and deploy a custom metrics-agent manifest to the cluster. This lets you adjust the memory resources of gke-metrics-agent so it stops being killed.
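Before switching to a custom manifest, it is worth confirming that the restarts really are memory related. A quick check (the pod name below is a placeholder; the component label is assumed to match the manifests further down):
kubectl -n kube-system get pods -l component=gke-metrics-agent
kubectl -n kube-system describe pod <gke-metrics-agent-pod> | grep -A 5 "Last State"
If the last container state shows Reason: OOMKilled, the memory limit is the bottleneck.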
To deploy the custom agent, save the manifest below as metrics-agent.yaml and follow these steps:
# Set these to match your cluster
CLUSTER=<cluster_name>
PROJECT=<project>
LOCATION=<location>
# Disable GKE's built-in monitoring (logging stays enabled)
gcloud container clusters update $CLUSTER --zone=$LOCATION --project=$PROJECT --monitoring-service=none --logging-service=logging.googleapis.com/kubernetes
# Substitute the cluster name and location into the manifest and apply it
sed -u -e 's/{{.ClusterName}}/'${CLUSTER}'/g' -e 's/{{.Location}}/'${LOCATION}'/g' metrics-agent.yaml | kubectl apply -f -
---
apiVersion: v1
kind: ConfigMap
metadata:
name: gke-metrics-agent-conf
namespace: default
data:
gke-metrics-agent-config: |
receivers:
prometheus:
use_start_time_metric: true
config:
scrape_configs:
- job_name: "kubelet"
scrape_interval: 60s
static_configs:
- targets: ["$KUBELET_HOST:10255"]
metric_relabel_configs:
- source_labels: [ __name__ ]
target_label: gke_component_name
replacement: "nodes/kubelet"
- job_name: "kubelet-prober"
scrape_interval: 60s
static_configs:
- targets: ["$KUBELET_HOST:10255"]
metrics_path: /metrics/probes
metric_relabel_configs:
- source_labels: [__name__]
regex: "prober_probe_total|process_start_time_seconds"
action: keep
- source_labels: [ __name__ ]
target_label: gke_component_name
replacement: "nodes/kubelet"
- job_name: "addons"
scrape_interval: 60s
kubernetes_sd_configs:
- role: pod
namespaces:
names:
- kube-system
selectors:
- role: pod
field: "spec.nodeName=$NODE_NAME"
relabel_configs:
- source_labels: [ __meta_kubernetes_pod_container_port_name ]
regex: ".*metrics"
action: keep
- source_labels: [ __meta_kubernetes_pod_annotationpresent_components_gke_io_component_name ]
regex: true
action: keep
- source_labels: [ __meta_kubernetes_pod_annotationpresent_monitoring_gke_io_path, __meta_kubernetes_pod_annotation_monitoring_gke_io_path ]
regex: "true;(.*)"
target_label: __metrics_path__
- source_labels: [ __meta_kubernetes_pod_name ]
target_label: pod
- source_labels: [ __meta_kubernetes_pod_container_name ]
target_label: container
- source_labels: [ __meta_kubernetes_namespace ]
target_label: namespace
- source_labels: [ __meta_kubernetes_pod_annotation_components_gke_io_component_name ]
target_label: gke_component_name
replacement: "addons/${ARG1}"
- source_labels: [ gke_component_name ]
target_label: gke_component_name
regex: "(.*)-(.*)"
replacement: "${ARG1}_${ARG2}"
- source_labels: [ gke_component_name ]
target_label: gke_component_name
regex: "(.*)-(.*)"
replacement: "${ARG1}_${ARG2}"
- job_name: "coredns"
scrape_interval: 60s
static_configs:
- targets: ["$KUBELET_HOST:9253"]
metric_relabel_configs:
- source_labels: [ __name__ ]
target_label: gke_component_name
replacement: "nodes/coredns"
- job_name: "coredns-nodecache"
scrape_interval: 60s
static_configs:
- targets: ["$KUBELET_HOST:9353"]
metric_relabel_configs:
- source_labels: [ __name__ ]
target_label: gke_component_name
replacement: "nodes/coredns"
- job_name: "node"
scrape_interval: 60s
static_configs:
- targets: ["$KUBELET_HOST:10231"]
metric_relabel_configs:
- source_labels: [ __name__ ]
target_label: gke_component_name
replacement: "net/cluster/node"
kubenode:
endpoint: "http://$KUBELET_HOST:10255"
scrape_interval: 60s
cluster_name: {{.ClusterName}}
location: {{.Location}}
node_name: "$NODE_NAME"
kubernetes_service_host: "$KUBERNETES_SERVICE_HOST"
exporters:
stackdriver:
endpoint: monitoring.googleapis.com:443
skip_create_metric_descriptor: true
processors:
resource:
type: "host"
labels:
cloud.zone: {{.Location}}
host.name: "$NODE_NAME"
k8s.cluster.name: {{.ClusterName}}
metrics_export:
common_prefix: "kubernetes.io/internal"
detect_container_metrics: true
allowed_labels:
- "project"
- "location"
- "cluster_name"
- "node_name"
- "namespace"
- "pod"
- "container"
export_map:
- "kubernetes.io/internal/nodes/kubelet/process_start_time_seconds":
drop: true
- "kubernetes.io/internal/nodes/kubelet/kubelet_docker_operations_total":
allowed_labels:
- "operation_type"
export_name: "kubernetes.io/internal/nodes/kubelet/docker_operations_total"
export_as_int: true
- "kubernetes.io/internal/nodes/kubelet/kubelet_docker_operations_errors_total":
allowed_labels:
- "operation_type"
export_name: "kubernetes.io/internal/nodes/kubelet/docker_operations_errors_total"
export_as_int: true
- "kubernetes.io/internal/nodes/kubelet/kubelet_runtime_operations_total":
allowed_labels:
- "operation_type"
export_name: "kubernetes.io/internal/nodes/kubelet/runtime_operations_total"
export_as_int: true
- "kubernetes.io/internal/nodes/kubelet/kubelet_runtime_operations_errors_total":
allowed_labels:
- "operation_type"
export_name: "kubernetes.io/internal/nodes/kubelet/runtime_operations_errors_total"
export_as_int: true
- "kubernetes.io/internal/nodes/kubelet/rest_client_requests_total":
allowed_labels:
- "code"
- "method"
- "host"
export_as_int: true
- "kubernetes.io/internal/nodes/kubelet/storage_operation_duration_seconds":
allowed_labels:
- "volume_plugin"
- "operation_name"
- "kubernetes.io/internal/nodes/kubelet/kubelet_network_plugin_operations_duration_seconds":
allowed_labels:
- "operation_type"
export_name: "kubernetes.io/internal/nodes/kubelet/network_plugin_operations_duration_seconds"
- "kubernetes.io/internal/nodes/kubelet/storage_operation_errors_total":
allowed_labels:
- "volume_plugin"
- "operation_name"
export_as_int: true
- "kubernetes.io/internal/nodes/kubelet/storage_operation_status_count":
allowed_labels:
- "volume_plugin"
- "operation_name"
- "status"
export_as_int: true
- "kubernetes.io/internal/nodes/kubelet/prober_probe_total":
allowed_labels:
- "container"
- "namespace"
- "pod"
- "pod_uid"
- "result"
- "probe_type"
export_as_int: true
is_container_metric: true
- "kubernetes.io/internal/nodes/coredns/process_start_time_seconds":
drop: true
- "kubernetes.io/internal/nodes/coredns/coredns_cache_drops_total":
allowed_labels:
- "server"
export_name: "kubernetes.io/internal/nodes/coredns/cache_drops_total"
- "kubernetes.io/internal/nodes/coredns/coredns_cache_hits_total":
allowed_labels:
- "server"
- "type"
export_name: "kubernetes.io/internal/nodes/coredns/cache_hits_total"
export_as_int: true
- "kubernetes.io/internal/nodes/coredns/coredns_cache_misses_total":
allowed_labels:
- "server"
export_name: "kubernetes.io/internal/nodes/coredns/cache_misses_total"
export_as_int: true
- "kubernetes.io/internal/nodes/coredns/coredns_cache_prefetch_total":
allowed_labels:
- "server"
export_name: "kubernetes.io/internal/nodes/coredns/cache_prefetch_total"
- "kubernetes.io/internal/nodes/coredns/coredns_cache_size":
allowed_labels:
- "server"
- "type"
export_name: "kubernetes.io/internal/nodes/coredns/cache_size"
export_as_int: true
- "kubernetes.io/internal/nodes/coredns/coredns_dns_request_count_total":
allowed_labels:
- "family"
- "proto"
- "server"
- "zone"
export_name: "kubernetes.io/internal/nodes/coredns/dns_request_count_total"
export_as_int: true
- "kubernetes.io/internal/nodes/coredns/coredns_dns_request_duration_seconds":
allowed_labels:
- "server"
- "zone"
export_name: "kubernetes.io/internal/nodes/coredns/dns_request_duration_seconds"
- "kubernetes.io/internal/nodes/coredns/coredns_dns_request_type_count_total":
allowed_labels:
- "server"
- "type"
- "zone"
export_name: "kubernetes.io/internal/nodes/coredns/dns_request_type_count_total"
export_as_int: true
- "kubernetes.io/internal/nodes/coredns/coredns_dns_response_rcode_count_total":
allowed_labels:
- "rcode"
- "server"
- "zone"
export_name: "kubernetes.io/internal/nodes/coredns/dns_response_rcode_count_total"
export_as_int: true
- "kubernetes.io/internal/nodes/coredns/coredns_forward_healthcheck_failure_count_total":
allowed_labels:
- "to"
export_name: "kubernetes.io/internal/nodes/coredns/forward_healthcheck_failure_count_total"
export_as_int: true
- "kubernetes.io/internal/nodes/coredns/coredns_forward_request_count_total":
allowed_labels:
- "to"
export_name: "kubernetes.io/internal/nodes/coredns/forward_request_count_total"
export_as_int: true
- "kubernetes.io/internal/nodes/coredns/coredns_forward_request_duration_seconds":
allowed_labels:
- "to"
export_name: "kubernetes.io/internal/nodes/coredns/forward_request_duration_seconds"
- "kubernetes.io/internal/nodes/coredns/coredns_forward_response_rcode_count_total":
allowed_labels:
- "rcode"
- "to"
export_name: "kubernetes.io/internal/nodes/coredns/forward_response_rcode_count_total"
export_as_int: true
- "kubernetes.io/internal/nodes/coredns/coredns_forward_sockets_open":
allowed_labels:
- "to"
export_name: "kubernetes.io/internal/nodes/coredns/forward_sockets_open"
export_as_int: true
- "kubernetes.io/internal/nodes/coredns/coredns_health_request_duration_seconds":
allowed_labels: []
export_name: "kubernetes.io/internal/nodes/coredns/health_request_duration_seconds"
export_as_int: true
- "kubernetes.io/internal/nodes/coredns/coredns_panic_count_total":
allowed_labels: []
export_name: "kubernetes.io/internal/nodes/coredns/dns_panic_count_total"
export_as_int: true
- "kubernetes.io/internal/nodes/coredns/nodecache_setup_errors_total":
allowed_labels:
- "errortype"
export_name: "kubernetes.io/internal/nodes/coredns/nodecache_setup_errors_total"
- "kubernetes.io/internal/net/cluster/node/process_start_time_seconds":
drop: true
- "kubernetes.io/internal/net/cluster/node/conntrack_entries":
allowed_labels: []
export_as_int: true
- "kubernetes.io/internal/net/cluster/node/conntrack_error_count":
allowed_labels:
- "type"
export_as_int: true
- "kubernetes.io/internal/net/cluster/node/num_inuse_sockets":
allowed_labels:
- "protocol"
export_as_int: true
- "kubernetes.io/internal/net/cluster/node/num_tw_sockets":
allowed_labels: []
export_as_int: true
- "kubernetes.io/internal/net/cluster/node/socket_memory":
allowed_labels: []
export_as_int: true
- "kubernetes.io/internal/addons/kubedns/process_start_time_seconds":
drop: true
- "kubernetes.io/internal/addons/kubedns/skydns_skydns_dns_request_count_total":
allowed_labels:
- "system"
export_name: "kubernetes.io/internal/addons/kubedns/skydns_dns_request_count_total"
export_as_int: true
- "kubernetes.io/internal/addons/kubedns/skydns_skydns_dns_request_duration_seconds":
allowed_labels:
- "system"
export_name: "kubernetes.io/internal/addons/kubedns/skydns_dns_request_duration_seconds"
- "kubernetes.io/internal/addons/kubedns/skydns_skydns_dns_response_size_bytes":
allowed_labels:
- "system"
export_name: "kubernetes.io/internal/addons/kubedns/skydns_dns_response_size_bytes"
- "kubernetes.io/internal/addons/kubedns/skydns_skydns_dns_error_count_total":
allowed_labels:
- "system"
- "cause"
export_name: "kubernetes.io/internal/addons/kubedns/skydns_dns_error_count_total"
export_as_int: true
- "kubernetes.io/internal/addons/kubedns/skydns_skydns_dns_cachemiss_count_total":
allowed_labels:
- "cache"
export_name: "kubernetes.io/internal/addons/kubedns/skydns_dns_cachemiss_count_total"
export_as_int: true
extensions:
observability:
endpoint: monitoring.googleapis.com:443
prefix: "kubernetes.io/internal/addons/gke_otelsvc"
resource:
type: "k8s_container"
labels:
location: {{.Location}}
cluster_name: {{.ClusterName}}
pod_name: "$POD_NAME"
namespace_name: "$POD_NAMESPACE"
container_name: "gke-metrics-agent"
service:
extensions:
- observability
pipelines:
metrics/kube:
receivers:
- kubenode
exporters:
- stackdriver
metrics/prom:
receivers:
- prometheus
processors:
- resource
- metrics_export
exporters:
- stackdriver
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: gke-metrics-agent
namespace: default
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
annotations:
apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
kubernetes.io/description: Policy used by the gke-metrics-agent addon.
seccomp.security.alpha.kubernetes.io/allowedProfileNames: runtime/default,docker/default
seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
name: gce.gke-metrics-agent
labels:
kubernetes.io/cluster-service: 'true'
spec:
privileged: false
allowPrivilegeEscalation: false
volumes:
- 'hostPath'
- 'secret'
- 'configMap'
allowedHostPaths:
- pathPrefix: /etc/ssl/certs
hostNetwork: true
hostIPC: false
hostPID: false
runAsUser:
rule: 'RunAsAny'
seLinux:
rule: 'RunAsAny'
supplementalGroups:
rule: 'RunAsAny'
fsGroup:
rule: 'RunAsAny'
readOnlyRootFilesystem: false
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: gke-metrics-agent
rules:
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- pods
verbs:
- list
- watch
- apiGroups:
- policy
resourceNames:
- gce.gke-metrics-agent
resources:
- podsecuritypolicies
verbs:
- use
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: gke-metrics-agent
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: gke-metrics-agent
subjects:
- kind: ServiceAccount
name: gke-metrics-agent
namespace: default
---
# linux deployment
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: gke-metrics-agent
namespace: default
labels:
k8s-app: gke-metrics-agent
component: gke-metrics-agent
spec:
selector:
matchLabels:
k8s-app: gke-metrics-agent
component: gke-metrics-agent
template:
metadata:
labels:
k8s-app: gke-metrics-agent
component: gke-metrics-agent
spec:
nodeSelector:
kubernetes.io/os: linux
tolerations:
- effect: NoExecute
operator: Exists
- effect: NoSchedule
operator: Exists
hostNetwork: true
serviceAccount: gke-metrics-agent
containers:
- name: gke-metrics-agent
image: "gcr.io/gke-release/gke-metrics-agent:0.1.3-gke.0"
resources:
requests:
memory: 50Mi
cpu: 3m
limits:
memory: 70Mi
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: KUBELET_HOST
value: "127.0.0.1"
- name: ARG1
value: "${1}"
- name: ARG2
value: "${2}"
- name: WINDOWS_JOB_ACTION
value: "drop"
command:
- "/otelsvc"
- "--config=/conf/gke-metrics-agent-config.yaml"
- "--metrics-level=NONE"
volumeMounts:
- name: gke-metrics-agent-config-vol
mountPath: /conf
- name: ssl-certs
mountPath: /etc/ssl/certs
readOnly: true
volumes:
- configMap:
name: gke-metrics-agent-conf
items:
- key: gke-metrics-agent-config
path: gke-metrics-agent-config.yaml
name: gke-metrics-agent-config-vol
- name: ssl-certs
hostPath:
path: /etc/ssl/certs
---
# windows deployment
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: gke-metrics-agent-windows
namespace: default
labels:
k8s-app: gke-metrics-agent
component: gke-metrics-agent
spec:
selector:
matchLabels:
k8s-app: gke-metrics-agent
component: gke-metrics-agent
template:
metadata:
labels:
k8s-app: gke-metrics-agent
component: gke-metrics-agent
spec:
nodeSelector:
kubernetes.io/os: windows
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
- effect: NoSchedule
key: node.kubernetes.io/os
operator: Equal
value: windows
serviceAccount: gke-metrics-agent
containers:
- name: gke-metrics-agent
image: "gke.io/gke-release/gke-metrics-agent-windows:0.3.1-gke.2"
resources:
requests:
cpu: 5m
memory: 200Mi
limits:
memory: 200Mi
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: KUBELET_HOST
valueFrom:
fieldRef:
fieldPath: status.hostIP
- name: KUBERNETES_SERVICE_HOST
value: "kubernetes.default.svc.cluster.local"
- name: ARG1
value: "${1}"
- name: ARG2
value: "${2}"
- name: WINDOWS_JOB_ACTION
value: "keep"
command:
- "c:\\otelsvc.exe"
- "--config=/conf/gke-metrics-agent-config.yaml"
- "--metrics-level=NONE"
volumeMounts:
- name: gke-metrics-agent-config-vol
mountPath: /conf
volumes:
- configMap:
name: gke-metrics-agent-conf
items:
- key: gke-metrics-agent-config
path: gke-metrics-agent-config.yaml
name: gke-metrics-agent-config-vol
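After applying the manifest, a quick sanity check that the custom agent is running (assuming the default namespace used in the manifest above):
kubectl -n default get ds gke-metrics-agent
kubectl -n default get pods -l component=gke-metrics-agent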
Note: you can edit the memory request and limit of the Linux DaemonSet in the manifest above as needed.
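For example, to get the 200Mi you originally wanted, the resources block of the gke-metrics-agent container in the Linux DaemonSet could look like this (illustrative values; size them to what your nodes can spare):
resources:
  requests:
    memory: 200Mi
    cpu: 3m
  limits:
    memory: 200Mi
Because this manifest is applied by you rather than managed by GKE, the change is not reconciled away.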
To roll back to GKE-managed monitoring later, delete the custom resources:
sed -u -e 's/{{.ClusterName}}/'${CLUSTER}'/g' -e 's/{{.Location}}/'${LOCATION}'/g' metrics-agent.yaml | kubectl delete -f -
or
kubectl delete ds gke-metrics-agent
kubectl delete ds gke-metrics-agent-windows
kubectl delete cm gke-metrics-agent-conf
kubectl delete sa gke-metrics-agent
and then re-enable the built-in monitoring service:
gcloud container clusters update $CLUSTER --zone=$LOCATION --project=$PROJECT --monitoring-service=monitoring.googleapis.com/kubernetes --logging-service=logging.googleapis.com/kubernetes
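Once the built-in monitoring service is re-enabled, GKE redeploys its managed gke-metrics-agent DaemonSet in kube-system; you can confirm with:
kubectl -n kube-system get ds | grep gke-metrics-agent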
Regarding google-kubernetes-engine - GKE system pod gke-metrics-agent OOMKilled, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67668808/