
kubernetes - Node doesn't get certificates when trying to join a cluster with `kubeadm`


I was able to bootstrap the master node for a Kubernetes deployment using kubeadm, but I'm getting an error during the kubelet-start phase of kubeadm join:

kubeadm --v=5 join phase kubelet-start 192.168.1.198:6443 --token x4drpl.ie61lm4vrqyig5vg     --discovery-token-ca-cert-hash sha256:hjksdhjsakdhjsakdhajdka --node-name media-server         
W0118 23:53:28.414247 22327 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
I0118 23:53:28.414383 22327 initconfiguration.go:103] detected and using CRI socket: /var/run/dockershim.sock
I0118 23:53:28.414476 22327 join.go:441] [preflight] Discovering cluster-info
I0118 23:53:28.414744 22327 token.go:188] [discovery] Trying to connect to API Server "192.168.1.198:6443"
I0118 23:53:28.416434 22327 token.go:73] [discovery] Created cluster-info discovery client, requesting info from "https://192.168.1.198:6443"
I0118 23:53:28.433749 22327 token.go:134] [discovery] Requesting info from "https://192.168.1.198:6443" again to validate TLS against the pinned public key
I0118 23:53:28.446096 22327 token.go:152] [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.1.198:6443"
I0118 23:53:28.446130 22327 token.go:194] [discovery] Successfully established connection with API Server "192.168.1.198:6443"
I0118 23:53:28.446163 22327 discovery.go:51] [discovery] Using provided TLSBootstrapToken as authentication credentials for the join process
I0118 23:53:28.446186 22327 join.go:455] [preflight] Fetching init configuration
I0118 23:53:28.446197 22327 join.go:493] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I0118 23:53:28.461658 22327 interface.go:400] Looking for default routes with IPv4 addresses
I0118 23:53:28.461682 22327 interface.go:405] Default route transits interface "eno2"
I0118 23:53:28.462107 22327 interface.go:208] Interface eno2 is up
I0118 23:53:28.462180 22327 interface.go:256] Interface "eno2" has 2 addresses :[192.168.1.113/24 fe80::225:90ff:febe:5aaf/64].
I0118 23:53:28.462205 22327 interface.go:223] Checking addr 192.168.1.113/24.
I0118 23:53:28.462217 22327 interface.go:230] IP found 192.168.1.113
I0118 23:53:28.462228 22327 interface.go:262] Found valid IPv4 address 192.168.1.113 for interface "eno2".
I0118 23:53:28.462238 22327 interface.go:411] Found active IP 192.168.1.113
I0118 23:53:28.462284 22327 kubelet.go:107] [kubelet-start] writing bootstrap kubelet config file at /etc/kubernetes/bootstrap-kubelet.conf
I0118 23:53:28.463384 22327 kubelet.go:115] [kubelet-start] writing CA certificate at /etc/kubernetes/pki/ca.crt
I0118 23:53:28.465766 22327 kubelet.go:133] [kubelet-start] Stopping the kubelet
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.

Unfortunately, an error has occurred:
    timed out waiting for the condition

This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
timed out waiting for the condition
error execution phase kubelet-start
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
/workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
/workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:422
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
/workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).BindToCommand.func1.1
/workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:348
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
/workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:826
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
/workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
/workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
/workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:203
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1357

Now, looking at the kubelet logs with journalctl -xeu kubelet:

Jan 19 00:04:38 media-server systemd[23817]: kubelet.service: Executing: /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --cgroup-driver=cgroupfs
Jan 19 00:04:38 media-server kubelet[23817]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jan 19 00:04:38 media-server kubelet[23817]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jan 19 00:04:38 media-server kubelet[23817]: I0119 00:04:38.706834 23817 server.go:416] Version: v1.17.1
Jan 19 00:04:38 media-server kubelet[23817]: I0119 00:04:38.707261 23817 plugins.go:100] No cloud provider specified.
Jan 19 00:04:38 media-server kubelet[23817]: I0119 00:04:38.707304 23817 server.go:821] Client rotation is on, will bootstrap in background
Jan 19 00:04:38 media-server kubelet[23817]: E0119 00:04:38.709106 23817 bootstrap.go:240] unable to read existing bootstrap client config: invalid configuration: [unable to read client-cert /var/lib/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/kubelet/pki/kubelet-client-current.pem: no such file or directory, unable to read client-key /var/lib/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/kubelet/pki/kubelet-client-current.pem: no such file or directory]
Jan 19 00:04:38 media-server kubelet[23817]: F0119 00:04:38.709153 23817 server.go:273] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
Jan 19 00:04:38 media-server systemd[1]: kubelet.service: Child 23817 belongs to kubelet.service.
Jan 19 00:04:38 media-server systemd[1]: kubelet.service: Main process exited, code=exited, status=255/EXCEPTION

Interestingly, kubelet-client-current.pem is not found on the worker that is trying to join; in fact, the only files in /var/lib/kubelet/pki are kubelet.{crt,key}.

If I run the following command on the node trying to join, I find that all the certificates are missing:

# kubeadm alpha certs check-expiration
W0119 00:06:35.088034 24017 validation.go:28] Cannot validate kube-proxy config - no validator is available
W0119 00:06:35.088082 24017 validation.go:28] Cannot validate kubelet config - no validator is available
CERTIFICATE EXPIRES RESIDUAL TIME CERTIFICATE AUTHORITY EXTERNALLY MANAGED
!MISSING! admin.conf
!MISSING! apiserver
!MISSING! apiserver-etcd-client
!MISSING! apiserver-kubelet-client
!MISSING! controller-manager.conf
!MISSING! etcd-healthcheck-client
!MISSING! etcd-peer
!MISSING! etcd-server
!MISSING! front-proxy-client
!MISSING! scheduler.conf

Error checking external CA condition for ca certificate authority: failure loading certificate for API server: failed to load certificate: couldn't load the certificate file /etc/kubernetes/pki/apiserver.crt: open /etc/kubernetes/pki/apiserver.crt: no such file or directory
To see the stack trace of this error execute with --v=5 or higher

The only file in /etc/kubernetes/pki is ca.crt.

Both the master and the worker have kubeadm and kubelet at version 1.17.1, so a version mismatch is unlikely.

Possibly unrelated, but also a likely source of errors: both the worker and the master have Docker configured with Cgroup Driver: systemd, yet for some reason the kubelet is being passed --cgroup-driver=cgroupfs.
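
A minimal sketch of getting the kubelet onto Docker's systemd cgroup driver, assuming the stray flag comes from one of the kubelet environment files and that /var/lib/kubelet/config.yaml does not already contain a cgroupDriver key (the field exists in kubelet.config.k8s.io/v1beta1; command-line flags take precedence over the config file, so the flag has to go first):

# 1. Find where --cgroup-driver=cgroupfs is still being injected; flags on
#    the command line override config.yaml, so any stray entry must go:
grep -r 'cgroup-driver' /etc/default/kubelet /var/lib/kubelet/kubeadm-flags.env /etc/systemd/system/kubelet.service.d/ 2>/dev/null

# 2. Declare the driver in the KubeletConfiguration instead (skip this if a
#    cgroupDriver key is already present, to avoid a duplicate key):
echo 'cgroupDriver: systemd' >> /var/lib/kubelet/config.yaml

# 3. Reload unit files and restart the kubelet:
systemctl daemon-reload && systemctl restart kubelet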

What is causing this problem? And more importantly, how do I fix it so that the node joins the master successfully?

Edit: more information

On the worker, the systemd drop-in file is:

~# cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf 
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
#Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
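
Note that in the journal above the kubelet was started with --cgroup-driver given twice, and the last occurrence of a repeated flag wins. Given the ordering in the ExecStart line ($KUBELET_KUBEADM_ARGS before $KUBELET_EXTRA_ARGS), the trailing --cgroup-driver=cgroupfs would most plausibly come from /etc/default/kubelet, which is worth checking (the contents below are hypothetical, shown only to illustrate what to look for):

~# cat /etc/default/kubelet
# (hypothetical output, for illustration) a line like the following would
# override the systemd driver set in kubeadm-flags.env:
KUBELET_EXTRA_ARGS=--cgroup-driver=cgroupfs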

The kubelet service unit:

~# cat /etc/systemd/system/multi-user.target.wants/kubelet.service 
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/home/

[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target

And the kubelet config.yaml:

~# cat /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s

Contents of /var/lib/kubelet/kubeadm-flags.env on the worker and the master:

Worker:

KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1"

Master:

KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --resolv-conf=/run/systemd/resolve/resolv.conf"

Both the master and the worker run Docker 18.09, with identical configuration files:

~$ cat /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "data-root": "/opt/var/docker/"
}
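
To confirm that Docker really is using the systemd driver on both machines, the daemon can be queried directly; docker info supports a Go-template --format flag:

~$ docker info --format '{{.CgroupDriver}}'
systemd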

Best Answer

I think the kubelet service on the worker node is unable to authenticate to the API server because the bootstrap token has expired. Could you regenerate the token on the master node and then try running the kubeadm join command on the worker again?

CMD:  kubeadm token create --print-join-command
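
For context, tokens created by kubeadm init expire after 24 hours by default. A short sketch of checking and regenerating the token on the master:

# On the master: list existing bootstrap tokens and their expiry times.
kubeadm token list

# Create a fresh token and print the complete join command to run on the worker.
kubeadm token create --print-join-command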

Regarding "kubernetes - Node doesn't get certificates when trying to join a cluster with `kubeadm`", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59826535/
