kubernetes - kubeadm v1.18.2 with CRI-O version 1.18.2 fails to start the master node from a private repository on CentOS 7/RHEL 7

Reposted. Author: 行者123. Updated: 2023-12-02 12:11:27

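Description
I am fairly new to Kubernetes. I can run my cluster with the default socket (/var/run/dockershim.sock), but when I tried pulling images from my private repository through the CRI-O socket, I noticed that the pull speed is not even comparable.
I am trying to configure all nodes to use the CRI-O socket, but I cannot start the master node with it.
I followed the Kubernetes documentation "Configuring each kubelet in your cluster using kubeadm" and the CRI-O documentation on GitHub.
Unfortunately I cannot get it to work, because it seems to ignore the private repository flag.
Steps to reproduce the issue: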

  • Start the master node with the following init command (using a private repository):

    kubeadm init \
    --upload-certs \
    --cri-socket=/var/run/crio/crio.sock \
    --node-name=my_node_name \
    --image-repository=my.private.repo \
    --pod-network-cidr=10.96.0.0/16 \
    --kubernetes-version=v1.18.2 \
    --control-plane-endpoint=ip:6443 \
    --apiserver-cert-extra-sans=ip \
    --apiserver-advertise-address=ip
  • Run as root or with sudo: journalctl -xeu crio -f
  • Observe the log sample below, in debug or info mode

  • Describe the results you received:
    Sample log from CRI-O in debug mode:
    Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043499089+02:00" level=debug msg="Trying to access \"k8s.gcr.io/pause:3.2\"" file="docker/docker_image_src.go:68"
    Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043547722+02:00" level=debug msg="Credentials not found" file="config/config.go:123"
    Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043576124+02:00" level=debug msg="Using registries.d directory /etc/containers/registries.d for sigstore configuration" file="docker/lookaside.go:51"
    Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043706369+02:00" level=debug msg=" Using \"default-docker\" configuration" file="docker/lookaside.go:169"
    Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043736378+02:00" level=debug msg=" No signature storage configuration found for k8s.gcr.io/pause:3.2" file="docker/lookaside.go:174"
    Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043769424+02:00" level=debug msg="Looking for TLS certificates and private keys in /etc/docker/certs.d/k8s.gcr.io" file="tlsclientconfig/tlsclientconfig.go:21"
    Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043858410+02:00" level=debug msg="GET https://k8s.gcr.io/v2/" file="docker/docker_client.go:516"
    Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.046154250+02:00" level=debug msg="Ping https://k8s.gcr.io/v2/ err Get \"https://k8s.gcr.io/v2/\": dial tcp 10.254.3.15:443: connect: connection refused (&url.Error{Op:\"Get\", URL:\"https://k8s.gcr.io/v2/\", Err:(*net.OpError)(0xc00084d5e0)})" file="docker/docker_client.go:708"
    Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.046239456+02:00" level=debug msg="GET https://k8s.gcr.io/v1/_ping" file="docker/docker_client.go:516"
    Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.048653448+02:00" level=debug msg="Ping https://k8s.gcr.io/v1/_ping err Get \"https://k8s.gcr.io/v1/_ping\": dial tcp 10.254.3.15:443: connect: connection refused (&url.Error{Op:\"Get\", URL:\"https://k8s.gcr.io/v1/_ping\", Err:(*net.OpError)(0xc0006b0690)})" file="docker/docker_client.go:735"
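The log above shows CRI-O contacting k8s.gcr.io rather than the private repository. A minimal sketch for pulling that fact out of a longer log follows; the function name crio_accessed_images is hypothetical, and it assumes the debug lines keep the "Trying to access" wording shown above.

```shell
# crio_accessed_images: read CRI-O debug log lines on stdin and print the
# unique image references that CRI-O tried to access.
# Assumption: the log format matches the sample above, where the image name
# is wrapped in escaped quotes (\"...\").
crio_accessed_images() {
  grep -o 'Trying to access \\"[^\\"]*' | sed 's/.*\\"//' | sort -u
}

# Possible usage on the node (not executed here):
#   journalctl -u crio --no-pager | crio_accessed_images
```

If the output lists k8s.gcr.io images while --image-repository points elsewhere, the pull is not going through the private repository.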
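    Describe the results you expected:
    The node starts using the CRI-O socket.
    Additional information you deem important (e.g. the issue happens only occasionally):
    If I start the node with the default socket, for example: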
    # kubeadm init \
    --upload-certs \
    --cri-socket=/var/run/dockershim.sock \
    --node-name=my_node_name \
    --image-repository=my.private.repo \
    --pod-network-cidr=10.96.0.0/16 \
    --kubernetes-version=v1.18.2 \
    --control-plane-endpoint=ip:6443 \
    --apiserver-cert-extra-sans=ip \
    --apiserver-advertise-address=ip
    W0630 20:24:33.223266 29033 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
    [init] Using Kubernetes version: v1.18.2
    [preflight] Running pre-flight checks
    [preflight] Pulling images required for setting up a Kubernetes cluster
    [preflight] This might take a minute or two, depending on the speed of your internet connection
    [preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
    [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [kubelet-start] Starting the kubelet
    [certs] Using certificateDir folder "/etc/kubernetes/pki"
    [certs] Using existing ca certificate authority
    [certs] Using existing apiserver certificate and key on disk
    [certs] Using existing apiserver-kubelet-client certificate and key on disk
    [certs] Using existing front-proxy-ca certificate authority
    [certs] Using existing front-proxy-client certificate and key on disk
    [certs] Using existing etcd/ca certificate authority
    [certs] Using existing etcd/server certificate and key on disk
    [certs] Using existing etcd/peer certificate and key on disk
    [certs] Using existing etcd/healthcheck-client certificate and key on disk
    [certs] Using existing apiserver-etcd-client certificate and key on disk
    [certs] Using the existing "sa" key
    [kubeconfig] Using kubeconfig folder "/etc/kubernetes"
    [kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
    [kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/kubelet.conf"
    [kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/controller-manager.conf"
    [kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/scheduler.conf"
    [control-plane] Using manifest folder "/etc/kubernetes/manifests"
    [control-plane] Creating static Pod manifest for "kube-apiserver"
    [control-plane] Creating static Pod manifest for "kube-controller-manager"
    W0630 20:24:35.839949 29033 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
    [control-plane] Creating static Pod manifest for "kube-scheduler"
    W0630 20:24:35.841420 29033 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
    [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
    [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
    [apiclient] All control plane components are healthy after 11.003647 seconds
    [upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
    [kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
    [upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
    [upload-certs] Using certificate key:
    key
    [mark-control-plane] Marking the node hostname as control-plane by adding the label "node-role.kubernetes.io/master=''"
    [mark-control-plane] Marking the node hostname as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
    [bootstrap-token] Using token: token
    [bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
    [bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
    [bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
    [bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
    [bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
    [bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
    [kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
    [addons] Applied essential addon: CoreDNS
    [addons] Applied essential addon: kube-proxy

    Your Kubernetes control-plane has initialized successfully!

    To start using your cluster, you need to run the following as a regular user:

    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config

    You should now deploy a pod network to the cluster.
    Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
    https://kubernetes.io/docs/concepts/cluster-administration/addons/

    You can now join any number of the control-plane node running the following command on each as root:

    kubeadm join ip:6443 --token token \
    --discovery-token-ca-cert-hash sha256:hash \
    --control-plane --certificate-key key

    Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
    As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
    "kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

    Then you can join any number of worker nodes by running the following on each as root:

    kubeadm join ip:6443 --token token \
    --discovery-token-ca-cert-hash sha256:hash
    If I start the node with the CRI-O socket:
    # kubeadm init \
    --upload-certs \
    --cri-socket=/var/run/crio/crio.sock \
    --node-name=my_node_name \
    --image-repository=my.private.repo \
    --pod-network-cidr=10.96.0.0/16 \
    --kubernetes-version=v1.18.2 \
    --control-plane-endpoint=ip:6443 \
    --apiserver-cert-extra-sans=ip \
    --apiserver-advertise-address=ip
    W0630 20:32:33.827957 2916 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
    [init] Using Kubernetes version: v1.18.2
    [preflight] Running pre-flight checks
    [preflight] Pulling images required for setting up a Kubernetes cluster
    [preflight] This might take a minute or two, depending on the speed of your internet connection
    [preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
    [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [kubelet-start] Starting the kubelet
    [certs] Using certificateDir folder "/etc/kubernetes/pki"
    [certs] Generating "ca" certificate and key
    [certs] Generating "apiserver" certificate and key
    [certs] apiserver serving cert is signed for DNS names [hostname kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.96.134.57 10.96.134.57 10.96.134.57]
    [certs] Generating "apiserver-kubelet-client" certificate and key
    [certs] Generating "front-proxy-ca" certificate and key
    [certs] Generating "front-proxy-client" certificate and key
    [certs] Generating "etcd/ca" certificate and key
    [certs] Generating "etcd/server" certificate and key
    [certs] etcd/server serving cert is signed for DNS names [hostname localhost] and IPs [10.96.134.57 127.0.0.1 ::1]
    [certs] Generating "etcd/peer" certificate and key
    [certs] etcd/peer serving cert is signed for DNS names [hostname localhost] and IPs [10.96.134.57 127.0.0.1 ::1]
    [certs] Generating "etcd/healthcheck-client" certificate and key
    [certs] Generating "apiserver-etcd-client" certificate and key
    [certs] Generating "sa" key and public key
    [kubeconfig] Using kubeconfig folder "/etc/kubernetes"
    [kubeconfig] Writing "admin.conf" kubeconfig file
    [kubeconfig] Writing "kubelet.conf" kubeconfig file
    [kubeconfig] Writing "controller-manager.conf" kubeconfig file
    [kubeconfig] Writing "scheduler.conf" kubeconfig file
    [control-plane] Using manifest folder "/etc/kubernetes/manifests"
    [control-plane] Creating static Pod manifest for "kube-apiserver"
    [control-plane] Creating static Pod manifest for "kube-controller-manager"
    W0630 20:32:37.829806 2916 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
    [control-plane] Creating static Pod manifest for "kube-scheduler"
    W0630 20:32:37.830826 2916 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
    [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
    [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
    Unfortunately, an error has occurred:
    timed out waiting for the condition

    This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'

    Additionally, a control plane component may have crashed or exited when started by the container runtime.
    To troubleshoot, list all containers using your preferred container runtimes CLI.

    Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
    - 'crictl --runtime-endpoint /var/run/crio/crio.sock ps -a | grep kube | grep -v pause'
    Once you have found the failing container, you can inspect its logs with:
    - 'crictl --runtime-endpoint /var/run/crio/crio.sock logs CONTAINERID'

    error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
    To see the stack trace of this error execute with --v=5 or higher
    I can see that localhost is listening on port 10248:
    # curl -sSL http://localhost:10248/healthz
    ok
    CRI-O socket sample (as described in the documentation):
    # curl -v --unix-socket /var/run/crio/crio.sock http://localhost/info | jq
    * About to connect() to localhost port 80 (#0)
    * Trying /var/run/crio/crio.sock...
    * Failed to set TCP_KEEPIDLE on fd 3
    * Failed to set TCP_KEEPINTVL on fd 3
    % Total % Received % Xferd Average Speed Time Time Time Current
    Dload Upload Total Spent Left Speed
    0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to localhost (/var/run/crio/crio.sock) port 80 (#0)
    > GET /info HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: localhost
    > Accept: */*
    >
    < HTTP/1.1 200 OK
    < Content-Type: application/json
    < Date: Tue, 30 Jun 2020 18:36:35 GMT
    < Content-Length: 240
    <
    { [data not shown]
    100 240 100 240 0 0 144k 0 --:--:-- --:--:-- --:--:-- 234k
    * Connection #0 to host localhost left intact
    {
      "storage_driver": "overlay2",
      "storage_root": "/var/lib/containers/storage",
      "cgroup_driver": "systemd",
      "default_id_mappings": {
        "uids": [
          {
            "container_id": 0,
            "host_id": 0,
            "size": 4294967295
          }
        ],
        "gids": [
          {
            "container_id": 0,
            "host_id": 0,
            "size": 4294967295
          }
        ]
      }
    }
    Output of kubelet status:
    # systemctl status kubelet -l
    ● kubelet.service - kubelet: The Kubernetes Node Agent
    Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
    └─10-kubeadm.conf
    Active: active (running) since Tue 2020-06-30 20:39:49 CEST; 6s ago
    Docs: https://kubernetes.io/docs/
    Main PID: 8502 (kubelet)
    Tasks: 15
    Memory: 20.1M
    CGroup: /system.slice/kubelet.service
    └─8502 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --hostname-override=hostname

    Jun 30 20:39:55 hostname kubelet[8502]: I0630 20:39:55.369441 8502 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
    Jun 30 20:39:55 hostname kubelet[8502]: I0630 20:39:55.399015 8502 kubelet_node_status.go:70] Attempting to register node hostname
    Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.403707 8502 kubelet.go:2267] node "hostname" not found
    Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.503871 8502 kubelet.go:2267] node "hostname" not found
    Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.604115 8502 kubelet.go:2267] node "hostname" not found
    Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.704324 8502 kubelet.go:2267] node "hostname" not found
    Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.769448 8502 kubelet_node_status.go:92] Unable to register node "hostname" with API server: Post https://ip:6443/api/v1/nodes: dial tcp ip:6443: connect: connection refused
    Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.805779 8502 kubelet.go:2267] node "hostname" not found
    Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.906014 8502 kubelet.go:2267] node "hostname" not found
    Jun 30 20:39:56 hostname kubelet[8502]: E0630 20:39:56.007272 8502 kubelet.go:2267] node "hostname" not found
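    As far as I understand, the network errors are irrelevant: I have not started a network container yet, so errors are expected at this point.
    Output of crio --version: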
    # crio --version
    crio version 1.18.2
    Version: 1.18.2
    GitCommit: 7f261aeebffed079b4475dde8b9d602b01973d33
    GitTreeState: clean
    BuildDate: 2020-06-18T21:05:27Z
    GoVersion: go1.14
    Compiler: gc
    Platform: linux/amd64
    Linkmode: static
    Output of kubelet --version:
    # kubelet --version
    Kubernetes v1.18.2
    Linux OS version:
    # cat /etc/redhat-release
    Red Hat Enterprise Linux Server release 7.8 (Maipo)
    Additional environment details (AWS, VirtualBox, physical, etc.):
    Installed on bare-metal nodes.
    Sample kubelet file:
    # cat /etc/default/kubelet
    KUBELET_EXTRA_ARGS=--feature-gates="AllAlpha=false,RunAsGroup=true" --container-runtime=remote --cgroup-driver=systemd --container-runtime-endpoint='unix:///var/run/crio/crio.sock' --runtime-request-timeout=5m
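    Update: I raised a ticket on GitHub, "Kubernetes v1.18.2 with crio version 1.18.2 failing to sync with kubelet on RH7 #3915". It looks like a bug: CRI-O does not honor the remote repository and tries to pull from the default repository, k8s.gcr.io. I will update the ticket as soon as I have more information.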

    Best answer

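    So the problem is not exactly a bug in CRI-O, as we (and the CRI-O dev team) initially thought. Rather, a lot of configuration needs to be applied if you want to use CRI-O as the CRI for Kubernetes together with a private repository.
    I will not repeat the CRI-O configuration here, because it is already documented in the ticket I raised with the team: Kubernetes v1.18.2 with crio version 1.18.2 failing to sync with kubelet on RH7 #3915.
    The first configuration to apply is the container registries from which the images will be pulled: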

    $ cat /etc/containers/registries.conf
    [[registry]]
    prefix = "k8s.gcr.io"
    insecure = false
    blocked = false
    location = "k8s.gcr.io"

    [[registry.mirror]]
    location = "my.private.repo"
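As a quick sanity check on the file above, the following sketch prints the mirror locations it declares. The function name list_registry_mirrors is hypothetical, and the parsing assumes the simple one-key-per-line TOML layout shown in the sample; it is not a general TOML parser.

```shell
# list_registry_mirrors [FILE]: print every location declared under a
# [[registry.mirror]] table. FILE defaults to /etc/containers/registries.conf.
# Assumption: one key per line, values in double quotes, as in the sample above.
list_registry_mirrors() {
  awk '
    /^[[:space:]]*\[\[registry\.mirror\]\]/ { in_mirror = 1; next }  # entering a mirror table
    /^[[:space:]]*\[/                       { in_mirror = 0 }        # any other table header ends it
    in_mirror && /^[[:space:]]*location[[:space:]]*=/ {
      gsub(/.*=[[:space:]]*"|".*/, "")   # keep only the quoted value
      print
    }
  ' "${1:-/etc/containers/registries.conf}"
}
```

After editing the file, restarting CRI-O (systemctl restart crio) is the safe way to make sure it is re-read. A pull such as crictl --runtime-endpoint unix:///var/run/crio/crio.sock pull k8s.gcr.io/pause:3.2 should then be served through the mirror, assuming the private repository actually hosts that image.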
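    CRI-O recommends passing this configuration to the kubelet as a flag (see haircommander/cri-o-kubeadm), but for me that alone was not enough.
    I went back to the Kubernetes manual, which recommends passing the setting not as a kubelet flag but in the file /var/lib/kubelet/config.yaml at run time. For me this is not possible, because the node needs to start with the CRI-O socket and not any other socket (see "Configure cgroup driver used by kubelet on control-plane node").
    So I managed to get it up and running by passing this flag in the config file sample below: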
    $ cat /tmp/config.yaml
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: InitConfiguration
    localAPIEndpoint:
      advertiseAddress: 1.2.3.4
      bindPort: 6443
    nodeRegistration:
      criSocket: unix:///var/run/crio/crio.sock
      name: node.name
      taints:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
    ---
    apiServer:
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta2
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      local:
        dataDir: /var/lib/etcd
    controlPlaneEndpoint: 1.2.3.4:6443
    imageRepository: my.private.repo
    kind: ClusterConfiguration
    kubernetesVersion: v1.18.2
    networking:
      dnsDomain: cluster.local
      podSubnet: 10.85.0.0/16
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
    ---
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    cgroupDriver: systemd
    The user can then simply start the master/worker node with the flag --config <file.yml>, and the node will start successfully.
    I hope the information here helps others.
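One way to put the config file to use is sketched below: list the images kubeadm resolves, pre-pull them, then run init. The wrapper name bootstrap_from_config and the KUBEADM override variable are hypothetical conveniences; the kubeadm subcommands and flags themselves (config images list, config images pull, init, --config, --upload-certs) are standard kubeadm.

```shell
# bootstrap_from_config FILE: list, pre-pull, then init from a kubeadm config file.
# Set KUBEADM=echo to preview the commands without running them (hypothetical
# dry-run switch added for this sketch).
bootstrap_from_config() {
  config=${1:?usage: bootstrap_from_config <config.yaml>}
  kubeadm_bin=${KUBEADM:-kubeadm}
  # Show which images kubeadm resolves; they should point at imageRepository.
  "$kubeadm_bin" config images list --config "$config" || return 1
  # Pre-pull them so registry problems surface before init starts.
  "$kubeadm_bin" config images pull --config "$config" || return 1
  # Initialise the control plane; --upload-certs mirrors the original command line.
  "$kubeadm_bin" init --config "$config" --upload-certs
}

# Preview only (not executed here):
#   KUBEADM=echo bootstrap_from_config /tmp/config.yaml
```

Pre-pulling first keeps a registry or mirror misconfiguration from showing up only as a timeout in the wait-control-plane phase, as in the failed run in the question.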

    Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/62675268/
