gpt4 book ai didi

linux - K8s 没有杀死我的 airflow 网络服务器 pod

转载 作者:塔克拉玛干 更新时间:2023-11-03 02:04:15 25 4
gpt4 key购买 nike

我在 k8s 容器中运行 Airflow 。

网络服务器遇到 DNS 错误(无法将我的数据库的 url 转换为 ip)并且网络服务器工作人员被杀死。

让我感到困扰的是,k8s 并没有试图杀死 pod 并在其位置上启动一个新的 pod。

Pod 日志输出:

OperationalError: (psycopg2.OperationalError) could not translate host name "my.dbs.url" to address: Temporary failure in name resolution
[2017-12-01 06:06:05 +0000] [2202] [INFO] Worker exiting (pid: 2202)
[2017-12-01 06:06:05 +0000] [2186] [INFO] Worker exiting (pid: 2186)
[2017-12-01 06:06:05 +0000] [2190] [INFO] Worker exiting (pid: 2190)
[2017-12-01 06:06:05 +0000] [2194] [INFO] Worker exiting (pid: 2194)
[2017-12-01 06:06:05 +0000] [2198] [INFO] Worker exiting (pid: 2198)
[2017-12-01 06:06:06 +0000] [13] [INFO] Shutting down: Master
[2017-12-01 06:06:06 +0000] [13] [INFO] Reason: Worker failed to boot.

k8s 状态为 RUNNING,但是当我在 k8s UI 中打开一个 exec shell 时,我得到以下输出(gunicorn 似乎意识到它已经死了):

root@webserver-373771664-3h4v9:/# ps -Al
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 S 0 1 0 0 80 0 - 107153 - ? 00:06:42 /usr/local/bin/
4 Z 0 13 1 0 80 0 - 0 - ? 00:01:24 gunicorn: maste <defunct>
4 S 0 2206 0 0 80 0 - 4987 - ? 00:00:00 bash
0 R 0 2224 2206 0 80 0 - 7486 - ? 00:00:00 ps

以下是我部署的 YAML:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: webserver
namespace: airflow
spec:
replicas: 1
template:
metadata:
labels:
app: airflow-webserver
spec:
volumes:
- name: webserver-dags
emptyDir: {}
containers:
- name: airflow-webserver
image: my.custom.image :latest
imagePullPolicy: Always
resources:
requests:
cpu: 100m
limits:
cpu: 500m
ports:
- containerPort: 80
protocol: TCP
env:
- name: AIRFLOW_HOME
value: /var/lib/airflow
- name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
valueFrom:
secretKeyRef:
name: db1
key: sqlalchemy_conn
volumeMounts:
- mountPath: /var/lib/airflow/dags/
name: webserver-dags
command: ["airflow"]
args: ["webserver"]
- name: docker-s3-to-backup
image: my.custom.image:latest
imagePullPolicy: Always
resources:
requests:
cpu: 50m
limits:
cpu: 500m
env:
- name: ACCESS_KEY
valueFrom:
secretKeyRef:
name: aws
key: access_key_id
- name: SECRET_KEY
valueFrom:
secretKeyRef:
name: aws
key: secret_access_key
- name: S3_PATH
value: s3://my-s3-bucket/dags/
- name: DATA_PATH
value: /dags/
- name: CRON_SCHEDULE
value: "*/5 * * * *"
volumeMounts:
- mountPath: /dags/
name: webserver-dags
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: webserver
namespace: airflow
spec:
scaleTargetRef:
apiVersion: apps/v1beta1
kind: Deployment
name: webserver
minReplicas: 2
maxReplicas: 20
targetCPUUtilizationPercentage: 75
---
apiVersion: v1
kind: Service
metadata:
labels:
name: webserver
namespace: airflow
spec:
type: NodePort
ports:
- port: 80
selector:
app: airflow-webserver

最佳答案

您需要定义 Readiness 和 liveness 探针 Kubernetes 来检测 POD 状态。

就像本页记录的那样。 https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-a-tcp-liveness-probe

 - containerPort: 8080
readinessProbe:
tcpSocket:
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
livenessProbe:
tcpSocket:
port: 8080
initialDelaySeconds: 15
periodSeconds: 20

关于linux - K8s 没有杀死我的 airflow 网络服务器 pod,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/47603238/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com