kubernetes - kube-dns keeps restarting with kubernetes on coreos


I installed Kubernetes on Container Linux by CoreOS alpha (1353.1.0) with hyperkube v1.5.5_coreos.0, using my fork of the coreos-kubernetes installation scripts at https://github.com/kfirufk/coreos-kubernetes.

I have two Container Linux machines:

  • coreos-2.tux-in.com, which resolves to 192.168.1.2, as the controller
  • coreos-3.tux-in.com, which resolves to 192.168.1.3, as the worker

kubectl get pods --all-namespaces returns:
    NAMESPACE       NAME                                       READY     STATUS    RESTARTS   AGE
    ceph            ceph-mds-2743106415-rkww4                  0/1       Pending   0          1d
    ceph            ceph-mon-check-3856521781-bd6k5            1/1       Running   0          1d
    kube-lego       kube-lego-3323932148-g2tf4                 1/1       Running   0          1d
    kube-system     calico-node-xq6j7                          2/2       Running   0          1d
    kube-system     calico-node-xzpp2                          2/2       Running   4560       1d
    kube-system     calico-policy-controller-610849172-b7xjr   1/1       Running   0          1d
    kube-system     heapster-v1.3.0-beta.0-2754576759-v1f50    2/2       Running   0          1d
    kube-system     kube-apiserver-192.168.1.2                 1/1       Running   0          1d
    kube-system     kube-controller-manager-192.168.1.2        1/1       Running   1          1d
    kube-system     kube-dns-3675956729-r7hhf                  3/4       Running   3924       1d
    kube-system     kube-dns-autoscaler-505723555-l2pph        1/1       Running   0          1d
    kube-system     kube-proxy-192.168.1.2                     1/1       Running   0          1d
    kube-system     kube-proxy-192.168.1.3                     1/1       Running   0          1d
    kube-system     kube-scheduler-192.168.1.2                 1/1       Running   1          1d
    kube-system     kubernetes-dashboard-3697905830-vdz23      1/1       Running   1246       1d
    kube-system     monitoring-grafana-4013973156-m2r2v        1/1       Running   0          1d
    kube-system     monitoring-influxdb-651061958-2mdtf        1/1       Running   0          1d
    nginx-ingress   default-http-backend-150165654-s4z04       1/1       Running   2          1d
So I can see that kube-dns-3675956729-r7hhf keeps restarting. kubectl describe pod kube-dns-3675956729-r7hhf --namespace=kube-system returns:
    Name:           kube-dns-3675956729-r7hhf
    Namespace:      kube-system
    Node:           192.168.1.2/192.168.1.2
    Start Time:     Sat, 11 Mar 2017 17:54:14 +0000
    Labels:         k8s-app=kube-dns
                    pod-template-hash=3675956729
    Status:         Running
    IP:             10.2.67.243
    Controllers:    ReplicaSet/kube-dns-3675956729
    Containers:
      kubedns:
        Container ID:   rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:kubedns
        Image:          gcr.io/google_containers/kubedns-amd64:1.9
        Image ID:       rkt://sha512-c7b7c9c4393bea5f9dc5bcbe1acf1036c2aca36ac14b5e17fd3c675a396c4219
        Ports:          10053/UDP, 10053/TCP, 10055/TCP
        Args:
          --domain=cluster.local.
          --dns-port=10053
          --config-map=kube-dns
          --v=2
        Limits:
          memory:       170Mi
        Requests:
          cpu:          100m
          memory:       70Mi
        State:          Running
          Started:      Sun, 12 Mar 2017 17:47:41 +0000
        Last State:     Terminated
          Reason:       Completed
          Exit Code:    0
          Started:      Sun, 12 Mar 2017 17:46:28 +0000
          Finished:     Sun, 12 Mar 2017 17:47:02 +0000
        Ready:          False
        Restart Count:  981
        Liveness:       http-get http://:8080/healthz-kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
        Readiness:      http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
        Volume Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
        Environment Variables:
          PROMETHEUS_PORT:  10055
      dnsmasq:
        Container ID:   rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:dnsmasq
        Image:          gcr.io/google_containers/kube-dnsmasq-amd64:1.4.1
        Image ID:       rkt://sha512-8c5f8b40f6813bb676ce04cd545c55add0dc8af5a3be642320244b74ea03f872
        Ports:          53/UDP, 53/TCP
        Args:
          --cache-size=1000
          --no-resolv
          --server=127.0.0.1#10053
          --log-facility=-
        Requests:
          cpu:          150m
          memory:       10Mi
        State:          Running
          Started:      Sun, 12 Mar 2017 17:47:41 +0000
        Last State:     Terminated
          Reason:       Completed
          Exit Code:    0
          Started:      Sun, 12 Mar 2017 17:46:28 +0000
          Finished:     Sun, 12 Mar 2017 17:47:02 +0000
        Ready:          True
        Restart Count:  981
        Liveness:       http-get http://:8080/healthz-dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
        Volume Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
        Environment Variables:  <none>
      dnsmasq-metrics:
        Container ID:   rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:dnsmasq-metrics
        Image:          gcr.io/google_containers/dnsmasq-metrics-amd64:1.0.1
        Image ID:       rkt://sha512-ceb3b6af1cd67389358be14af36b5e8fb6925e78ca137b28b93e0d8af134585b
        Port:           10054/TCP
        Args:
          --v=2
          --logtostderr
        Requests:
          memory:       10Mi
        State:          Running
          Started:      Sun, 12 Mar 2017 17:47:41 +0000
        Last State:     Terminated
          Reason:       Completed
          Exit Code:    0
          Started:      Sun, 12 Mar 2017 17:46:28 +0000
          Finished:     Sun, 12 Mar 2017 17:47:02 +0000
        Ready:          True
        Restart Count:  981
        Liveness:       http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
        Volume Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
        Environment Variables:  <none>
      healthz:
        Container ID:   rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:healthz
        Image:          gcr.io/google_containers/exechealthz-amd64:v1.2.0
        Image ID:       rkt://sha512-3a85b0533dfba81b5083a93c7e091377123dac0942f46883a4c10c25cf0ad177
        Port:           8080/TCP
        Args:
          --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
          --url=/healthz-dnsmasq
          --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
          --url=/healthz-kubedns
          --port=8080
          --quiet
        Limits:
          memory:       50Mi
        Requests:
          cpu:          10m
          memory:       50Mi
        State:          Running
          Started:      Sun, 12 Mar 2017 17:47:41 +0000
        Last State:     Terminated
          Reason:       Completed
          Exit Code:    0
          Started:      Sun, 12 Mar 2017 17:46:28 +0000
          Finished:     Sun, 12 Mar 2017 17:47:02 +0000
        Ready:          True
        Restart Count:  981
        Volume Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
        Environment Variables:  <none>
    Conditions:
      Type           Status
      Initialized    True
      Ready          False
      PodScheduled   True
    Volumes:
      default-token-zqbdp:
        Type:       Secret (a volume populated by a Secret)
        SecretName: default-token-zqbdp
    QoS Class:      Burstable
    Tolerations:    CriticalAddonsOnly=:Exists
    No events.
which shows that the kubedns container (kubedns-amd64:1.9) is the one with Ready: False.
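A Restart Count of 981 with Exit Code 0 / Reason: Completed means the containers are terminating cleanly about once a minute rather than crashing. Two ways to dig further, sketched here on the assumption that the rkt runtime surfaces container logs to kubectl (if it does not, journalctl -m as used further below is the fallback):

    # Fetch logs from the previous (terminated) kubedns instance:
    kubectl logs kube-dns-3675956729-r7hhf --namespace=kube-system --container=kubedns --previous

    # Look for probe-failure or kill events recorded against the pod
    # (events expire after roughly an hour, hence the "No events." above):
    kubectl get events --namespace=kube-system | grep kube-dns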
Here is my kube-dns-de.yaml file:
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: kube-dns
      namespace: kube-system
      labels:
        k8s-app: kube-dns
        kubernetes.io/cluster-service: "true"
    spec:
      strategy:
        rollingUpdate:
          maxSurge: 10%
          maxUnavailable: 0
      selector:
        matchLabels:
          k8s-app: kube-dns
      template:
        metadata:
          labels:
            k8s-app: kube-dns
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
            scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
        spec:
          containers:
          - name: kubedns
            image: gcr.io/google_containers/kubedns-amd64:1.9
            resources:
              limits:
                memory: 170Mi
              requests:
                cpu: 100m
                memory: 70Mi
            livenessProbe:
              httpGet:
                path: /healthz-kubedns
                port: 8080
                scheme: HTTP
              initialDelaySeconds: 60
              timeoutSeconds: 5
              successThreshold: 1
              failureThreshold: 5
            readinessProbe:
              httpGet:
                path: /readiness
                port: 8081
                scheme: HTTP
              initialDelaySeconds: 3
              timeoutSeconds: 5
            args:
            - --domain=cluster.local.
            - --dns-port=10053
            - --config-map=kube-dns
            # This should be set to v=2 only after the new image (cut from 1.5) has
            # been released, otherwise we will flood the logs.
            - --v=2
            env:
            - name: PROMETHEUS_PORT
              value: "10055"
            ports:
            - containerPort: 10053
              name: dns-local
              protocol: UDP
            - containerPort: 10053
              name: dns-tcp-local
              protocol: TCP
            - containerPort: 10055
              name: metrics
              protocol: TCP
          - name: dnsmasq
            image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4.1
            livenessProbe:
              httpGet:
                path: /healthz-dnsmasq
                port: 8080
                scheme: HTTP
              initialDelaySeconds: 60
              timeoutSeconds: 5
              successThreshold: 1
              failureThreshold: 5
            args:
            - --cache-size=1000
            - --no-resolv
            - --server=127.0.0.1#10053
            - --log-facility=-
            ports:
            - containerPort: 53
              name: dns
              protocol: UDP
            - containerPort: 53
              name: dns-tcp
              protocol: TCP
            # see: https://github.com/kubernetes/kubernetes/issues/29055 for details
            resources:
              requests:
                cpu: 150m
                memory: 10Mi
          - name: dnsmasq-metrics
            image: gcr.io/google_containers/dnsmasq-metrics-amd64:1.0.1
            livenessProbe:
              httpGet:
                path: /metrics
                port: 10054
                scheme: HTTP
              initialDelaySeconds: 60
              timeoutSeconds: 5
              successThreshold: 1
              failureThreshold: 5
            args:
            - --v=2
            - --logtostderr
            ports:
            - containerPort: 10054
              name: metrics
              protocol: TCP
            resources:
              requests:
                memory: 10Mi
          - name: healthz
            image: gcr.io/google_containers/exechealthz-amd64:v1.2.0
            resources:
              limits:
                memory: 50Mi
              requests:
                cpu: 10m
                memory: 50Mi
            args:
            - --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
            - --url=/healthz-dnsmasq
            - --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
            - --url=/healthz-kubedns
            - --port=8080
            - --quiet
            ports:
            - containerPort: 8080
              protocol: TCP
          dnsPolicy: Default
And here is my kube-dns-svc.yaml:
    apiVersion: v1
    kind: Service
    metadata:
      name: kube-dns
      namespace: kube-system
      labels:
        k8s-app: kube-dns
        kubernetes.io/cluster-service: "true"
        kubernetes.io/name: "KubeDNS"
    spec:
      selector:
        k8s-app: kube-dns
      clusterIP: 10.3.0.10
      ports:
      - name: dns
        port: 53
        protocol: UDP
      - name: dns-tcp
        port: 53
        protocol: TCP
Any information regarding this issue would be greatly appreciated!
Update: rkt list --full 2> /dev/null | grep kubedns shows:
    744a4579-0849-4fae-b1f5-cb05d40f3734    kubedns             gcr.io/google_containers/kubedns-amd64:1.9      sha512-c7b7c9c4393b running 2017-03-22 22:14:55.801 +0000 UTC   2017-03-22 22:14:56.814 +0000 UTC
journalctl -m _MACHINE_ID=744a45790849b1f5cb05d40f3734 provides:
    Mar 22 22:17:58 kube-dns-3675956729-sthcv kubedns[8]: E0322 22:17:58.619254       8 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: connect: network is unreachable
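"network is unreachable" when dialing the apiserver's service IP (10.3.0.1) points at routing inside the pod rather than at the apiserver itself. A minimal check from inside the running kubedns container, assuming the rkt UUID from rkt list above (substitute your own) and that ip and wget exist in the image:

    # Enter the kubedns app of the running rkt pod:
    rkt enter --app=kubedns 744a4579-0849-4fae-b1f5-cb05d40f3734 /bin/sh

    # Inside the container:
    ip route                       # is there a default route via the CNI interface?
    ip addr                        # does eth0 carry an IP from the 10.2.0.0/16 pod range?
    wget -q -O - https://10.3.0.1  # even a TLS/auth error proves routing works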
I tried adding - --proxy-mode=userspace to /etc/kubernetes/manifests/kube-proxy.yaml, but the result was the same.

kubectl get svc --all-namespaces provides:
    NAMESPACE       NAME                   CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
    ceph            ceph-mon               None         <none>        6789/TCP        1h
    default         kubernetes             10.3.0.1     <none>        443/TCP         1h
    kube-system     heapster               10.3.0.2     <none>        80/TCP          1h
    kube-system     kube-dns               10.3.0.10    <none>        53/UDP,53/TCP   1h
    kube-system     kubernetes-dashboard   10.3.0.116   <none>        80/TCP          1h
    kube-system     monitoring-grafana     10.3.0.187   <none>        80/TCP          1h
    kube-system     monitoring-influxdb    10.3.0.214   <none>        8086/TCP        1h
    nginx-ingress   default-http-backend   10.3.0.233   <none>        80/TCP          1h
kubectl get cs provides:
    NAME                 STATUS    MESSAGE              ERROR
    controller-manager   Healthy   ok
    scheduler            Healthy   ok
    etcd-0               Healthy   {"health": "true"}
My kube-proxy.yaml has the following content:
    apiVersion: v1
    kind: Pod
    metadata:
      name: kube-proxy
      namespace: kube-system
      annotations:
        rkt.alpha.kubernetes.io/stage1-name-override: coreos.com/rkt/stage1-fly
    spec:
      hostNetwork: true
      containers:
      - name: kube-proxy
        image: quay.io/coreos/hyperkube:v1.5.5_coreos.0
        command:
        - /hyperkube
        - proxy
        - --cluster-cidr=10.2.0.0/16
        - --kubeconfig=/etc/kubernetes/controller-kubeconfig.yaml
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: "ssl-certs"
        - mountPath: /etc/kubernetes/controller-kubeconfig.yaml
          name: "kubeconfig"
          readOnly: true
        - mountPath: /etc/kubernetes/ssl
          name: "etc-kube-ssl"
          readOnly: true
        - mountPath: /var/run/dbus
          name: dbus
          readOnly: false
      volumes:
      - hostPath:
          path: "/usr/share/ca-certificates"
        name: "ssl-certs"
      - hostPath:
          path: "/etc/kubernetes/controller-kubeconfig.yaml"
        name: "kubeconfig"
      - hostPath:
          path: "/etc/kubernetes/ssl"
        name: "etc-kube-ssl"
      - hostPath:
          path: /var/run/dbus
        name: dbus
That's all the valuable information I could find. Any ideas? :)
Update 2

iptables-save output on the controller Container Linux machine: http://pastebin.com/2GApCj0n
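For a service IP like 10.3.0.10 to be reachable, kube-proxy (in its default iptables mode) must have installed DNAT rules for it. A quick thing to look for in that iptables-save output, with the expected shape sketched from a stock kube-proxy setup (the chain-name hashes will differ):

    iptables-save | grep 10.3.0.10
    # Expected, roughly:
    # -A KUBE-SERVICES -d 10.3.0.10/32 -p udp -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
    # -A KUBE-SERVICES -d 10.3.0.10/32 -p tcp -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
    # If no KUBE-SERVICES entries exist for the cluster IPs at all, kube-proxy
    # is not programming the service network.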
Update 3

I ran curl on the controller node:

    # curl https://10.3.0.1 --insecure
    Unauthorized

which means it can reach it fine. I just didn't add enough parameters for the request to be authorized, right?
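For what it's worth, Unauthorized is indeed the expected response to an unauthenticated request: it proves TLS and routing to the apiserver work from the node. To make an authorized request, client credentials have to be passed along; the paths below follow the coreos-kubernetes layout and may differ on this cluster:

    curl --cacert /etc/kubernetes/ssl/ca.pem \
         --cert /etc/kubernetes/ssl/admin.pem \
         --key /etc/kubernetes/ssl/admin-key.pem \
         https://10.3.0.1/version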
Update 4

Thanks to @jaxxstorm, I removed the calico manifests, updated their quay/cni and quay/node versions, and reinstalled them.

Now kubedns keeps restarting, but I think calico works now, because this is the first time it tried to install kubedns on the worker node instead of the controller node. Also, when I rkt enter the kubedns pod and try wget https://10.3.0.1, I get:
    # wget https://10.3.0.1
    Connecting to 10.3.0.1 (10.3.0.1:443)
    wget: can't execute 'ssl_helper': No such file or directory
    wget: error getting response: Connection reset by peer
which clearly shows there is some kind of response. That's good, right?
Now kubectl get pods --all-namespaces shows:
    kube-system     kube-dns-3675956729-ljz2w                  4/4       Running             88         42m
So... 4/4 ready, but it keeps restarting.

Here is the output of kubectl describe pod kube-dns-3675956729-ljz2w --namespace=kube-system: http://pastebin.com/Z70U331G

So it can't connect to http://10.2.47.19:8081/readiness, which I guess is the kubedns pod IP, since it uses port 8081. I'm not sure how to investigate this any further.
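Two things worth testing by hand at this point, sketched under the assumption that 10.2.47.19 (taken from the describe output in the pastebin) is the kube-dns pod IP:

    # From the node the pod runs on: does the readiness endpoint answer
    # the same request the kubelet makes?
    curl -v http://10.2.47.19:8081/readiness

    # Does the pod answer DNS queries at all? Port 53 on the pod IP is
    # served by the dnsmasq container, which forwards to kubedns:
    nslookup kubernetes.default.svc.cluster.local 10.2.47.19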
Thanks for everything!

Best Answer

kube-dns has a readiness probe that attempts a DNS resolution through the service IP of kube-dns. Could there be a problem with your service network?

See the answer and solution here:
    kubernetes service IPs not reachable
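A concrete way to run the test the answer suggests is a throwaway pod that exercises the service network end to end; a minimal sketch (the busybox image is an assumption, any image with nslookup works):

    kubectl run -i -t dnstest --image=busybox --restart=Never -- sh
    # Inside the container:
    nslookup kubernetes.default.svc.cluster.local 10.3.0.10  # via the kube-dns service IP
    wget -q -O - https://10.3.0.1                            # via the apiserver service IP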

Regarding "kubernetes - kube-dns keeps restarting with kubernetes on coreos", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42637493/
