
debian-stretch - DRBD comes up with Connected Diskless/Diskless status after reboot


After an unattended power failure I am facing a major problem: after every reboot, DRBD comes up with Connected Diskless/Diskless status.

Main problems:

  • dump-md responds: Found meta data is "unclean"
  • the apply-al command terminates with exit code 20 and the message: open(/dev/nvme0n1p1) failed: Device or resource busy
  • the device (/dev/nvme0n1p1) used by the DRBD resource cannot be opened exclusively


About the environment:

This DRBD resource normally serves as block storage for LVM; it is configured as (shared LVM) storage for a Proxmox VE 5.3-8 cluster. LVM is layered on top of the DRBD block device, but in the LVM configuration on the DRBD hosts the device underneath DRBD (/dev/nvme0n1p1) is filtered out (see /etc/lvm/lvm.conf below).

The device underneath DRBD is a PCIe NVMe device.

It has some extra attributes shown by systemctl:
root@pmx0:~# systemctl list-units | grep nvme
sys-devices-pci0000:00-0000:00:01.1-0000:0c:00.0-nvme-nvme0-nvme0n1-nvme0n1p1.device loaded active plugged /sys/devices/pci0000:00/0000:00:01.1/0000:0c:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
sys-devices-pci0000:00-0000:00:01.1-0000:0c:00.0-nvme-nvme0-nvme0n1.device loaded active plugged /sys/devices/pci0000:00/0000:00:01.1/0000:0c:00.0/nvme/nvme0/nvme0n1

Other storage devices listed by systemctl (ordinary SAS disks) look a bit different:
root@pmx0:~# systemctl list-units | grep sdb
sys-devices-pci0000:00-0000:00:01.0-0000:0b:00.0-host0-target0:2:1-0:2:1:0-block-sdb-sdb1.device loaded active plugged PERC_H710 1
sys-devices-pci0000:00-0000:00:01.0-0000:0b:00.0-host0-target0:2:1-0:2:1:0-block-sdb-sdb2.device loaded active plugged PERC_H710 2
sys-devices-pci0000:00-0000:00:01.0-0000:0b:00.0-host0-target0:2:1-0:2:1:0-block-sdb.device loaded active plugged PERC_H710

Listing the NVMe device's /sys/devices/... directory with ls:
root@pmx0:~# ls /sys/devices/pci0000:00/0000:00:01.1/0000:0c:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
alignment_offset dev discard_alignment holders inflight partition power ro size start stat subsystem trace uevent

What did NOT help:
  • rebooting again
  • restarting the DRBD service
  • drbdadm detach/disconnect/attach and service restarts
  • the nfs-kernel-server service is not configured on these DRBD nodes (so there is no NFS server to deconfigure)

After some investigation:

    dump-md response: Found meta data is "unclean", please apply-al first
    apply-al command terminated with exit code 20 with this message: open(/dev/nvme0n1p1) failed: Device or resource busy

It seems the problem is that the device (/dev/nvme0n1p1) used by my DRBD resource cannot be opened exclusively.
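A quick sanity check (my own hedged sketch, not from the original post): a plain, non-exclusive read-only open of the device still works, which narrows the failure down to the exclusive open that drbdmeta performs, i.e. some kernel-side holder is claiming the partition:

    # blockdev (util-linux) opens the device read-only without O_EXCL,
    # so it succeeds even while drbdmeta's exclusive open fails:
    blockdev --getsize64 /dev/nvme0n1p1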



Failing DRBD commands:
    root@pmx0:~# drbdadm attach r0
    open(/dev/nvme0n1p1) failed: Device or resource busy
    Operation canceled.
    Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated with exit code 20
    root@pmx0:~# drbdadm apply-al r0
    open(/dev/nvme0n1p1) failed: Device or resource busy
    Operation canceled.
    Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated with exit code 20

    root@pmx0:~# drbdadm dump-md r0
    open(/dev/nvme0n1p1) failed: Device or resource busy

    Exclusive open failed. Do it anyways?
    [need to type 'yes' to confirm] yes

    Found meta data is "unclean", please apply-al first
    Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal dump-md' terminated with exit code 255

DRBD service status/commands:
    root@pmx0:~# drbd-overview
    0:r0/0 Connected Secondary/Secondary Diskless/Diskless
    root@pmx0:~# drbdadm dstate r0
    Diskless/Diskless
    root@pmx0:~# drbdadm disconnect r0
    root@pmx0:~# drbd-overview
    0:r0/0 . . .
    root@pmx0:~# drbdadm detach r0
    root@pmx0:~# drbd-overview
    0:r0/0 . . .
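(Side note, an assumption based on drbdadm(8) rather than the original post: after disconnect + detach the resource is effectively down, and "drbdadm up" is the usual shorthand to bring it back, bundling attach and connect. Here it fails the same way attach does, as shown next.)

    drbdadm up r0    # shorthand for attach + connect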

Trying to re-attach resource r0:
    root@pmx0:~# drbdadm attach r0
    open(/dev/nvme0n1p1) failed: Device or resource busy
    Operation canceled.
    Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated with exit code 20
    root@pmx0:~# drbdadm apply-al r0
    open(/dev/nvme0n1p1) failed: Device or resource busy
    Operation canceled.
    Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated with exit code 20

lsof and fuser show zero output:
    root@pmx0:~# lsof /dev/nvme0n1p1
    root@pmx0:~# fuser /dev/nvme0n1p1
    root@pmx0:~# fuser /dev/nvme0n1
    root@pmx0:~# lsof /dev/nvme0n1

Resource disk partitioning and LVM configuration:
    root@pmx0:~# fdisk -l /dev/nvme0n1
    Disk /dev/nvme0n1: 1.9 TiB, 2048408248320 bytes, 4000797360 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: dos
    Disk identifier: 0x59762e31

    Device Boot Start End Sectors Size Id Type
    /dev/nvme0n1p1 2048 3825207295 3825205248 1.8T 83 Linux
    root@pmx0:~# pvs
    PV VG Fmt Attr PSize PFree
    /dev/sdb2 pve lvm2 a-- 135.62g 16.00g
    root@pmx0:~# vgs
    VG #PV #LV #SN Attr VSize VFree
    pve 1 3 0 wz--n- 135.62g 16.00g
    root@pmx0:~# lvs
    LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
    data pve twi-a-tz-- 75.87g 0.00 0.04
    root pve -wi-ao---- 33.75g
    swap pve -wi-ao---- 8.00g
    root@pmx0:~# vi /etc/lvm/lvm.conf
    root@pmx0:~# cat /etc/lvm/lvm.conf | grep nvm
    filter = [ "r|/dev/nvme0n1p1|", "a|/dev/sdb|", "a|sd.*|", "a|drbd.*|", "r|.*|" ]
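One caveat worth checking here (my assumption, not something confirmed in the post): filter only applies to the LVM tools themselves, while scanning done through lvmetad/udev is governed by global_filter, so a device rejected only in filter can still be scanned and grabbed at boot. A sketch of the corresponding line in the devices section:

    # /etc/lvm/lvm.conf, devices { } section:
    global_filter = [ "r|/dev/nvme0n1p1|", "a|/dev/sdb|", "a|sd.*|", "a|drbd.*|", "r|.*|" ]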

DRBD resource configuration:
    root@pmx0:~# cat /etc/drbd.d/r0.res
    resource r0 {
        protocol C;
        startup {
            wfc-timeout 0;  # non-zero wfc-timeout can be dangerous (http://forum.proxmox.com/threads/3465-Is-it-safe-to-use-wfc-timeout-in-DRBD-configuration)
            degr-wfc-timeout 300;
            become-primary-on both;
        }
        net {
            cram-hmac-alg sha1;
            shared-secret "*********";
            allow-two-primaries;
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
            #data-integrity-alg crc32c;  # has to be enabled only for test and disabled for production use (check man drbd.conf, section "NOTES ON DATA INTEGRITY")
        }
        on pmx0 {
            device /dev/drbd0;
            disk /dev/nvme0n1p1;
            address 10.0.20.15:7788;
            meta-disk internal;
        }
        on pmx1 {
            device /dev/drbd0;
            disk /dev/nvme0n1p1;
            address 10.0.20.16:7788;
            meta-disk internal;
        }
        disk {
            # no-disk-barrier and no-disk-flushes should be applied only to systems with non-volatile (battery backed) controller caches.
            # Follow links for more information:
            # http://www.drbd.org/users-guide-8.3/s-throughput-tuning.html#s-tune-disable-barriers
            # http://www.drbd.org/users-guide/s-throughput-tuning.html#s-tune-disable-barriers
            no-disk-barrier;
            no-disk-flushes;
        }
    }
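(A small verification step, assuming stock drbd-utils behaviour: drbdadm dump parses the configuration and prints the resource back, which is a quick way to confirm the file is syntactically sound after editing.)

    drbdadm dump r0    # parses /etc/drbd.d/r0.res and prints the resource if it is valid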

The other node:
    root@pmx1:~# drbd-overview
    0:r0/0 Connected Secondary/Secondary Diskless/Diskless

Every other command response and configuration on this node looks the same as shown above for node pmx0...

Debian and DRBD versions:
    root@pmx0:~# uname -a
    Linux pmx0 4.15.18-10-pve #1 SMP PVE 4.15.18-32 (Sat, 19 Jan 2019 10:09:37 +0100) x86_64 GNU/Linux
    root@pmx0:~# cat /etc/debian_version
    9.8
    root@pmx0:~# dpkg --list| grep drbd
    ii drbd-utils 8.9.10-2 amd64 RAID 1 over TCP/IP for Linux (user utilities)
    root@pmx0:~# lsmod | grep drbd
    drbd 364544 1
    lru_cache 16384 1 drbd
    libcrc32c 16384 2 dm_persistent_data,drbd
    root@pmx0:~# modinfo drbd
    filename: /lib/modules/4.15.18-10-pve/kernel/drivers/block/drbd/drbd.ko
    alias: block-major-147-*
    license: GPL
    version: 8.4.10
    description: drbd - Distributed Replicated Block Device v8.4.10
    author: Philipp Reisner <phil@linbit.com>, Lars Ellenberg <lars@linbit.com>
    srcversion: 9A7FB947BDAB6A2C83BA0D4
    depends: lru_cache,libcrc32c
    retpoline: Y
    intree: Y
    name: drbd
    vermagic: 4.15.18-10-pve SMP mod_unload modversions
    parm: allow_oos:DONT USE! (bool)
    parm: disable_sendpage:bool
    parm: proc_details:int
    parm: minor_count:Approximate number of drbd devices (1-255) (uint)
    parm: usermode_helper:string

Mounts:
    root@pmx0:~# cat /proc/mounts
    sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
    proc /proc proc rw,relatime 0 0
    udev /dev devtmpfs rw,nosuid,relatime,size=24679656k,nr_inodes=6169914,mode=755 0 0
    devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
    tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=4940140k,mode=755 0 0
    /dev/mapper/pve-root / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
    securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
    tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
    tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
    tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
    cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
    pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
    cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
    cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
    cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
    cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
    cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
    cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
    cgroup /sys/fs/cgroup/rdma cgroup rw,nosuid,nodev,noexec,relatime,rdma 0 0
    cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
    cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
    cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
    cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
    systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=39,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=20879 0 0
    debugfs /sys/kernel/debug debugfs rw,relatime 0 0
    hugetlbfs /dev/hugepages hugetlbfs rw,relatime,pagesize=2M 0 0
    mqueue /dev/mqueue mqueue rw,relatime 0 0
    sunrpc /run/rpc_pipefs rpc_pipefs rw,relatime 0 0
    configfs /sys/kernel/config configfs rw,relatime 0 0
    fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
    /dev/sda1 /mnt/intelSSD700G ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
    lxcfs /var/lib/lxcfs fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
    /dev/fuse /etc/pve fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other 0 0
    10.0.0.15:/samba/shp /mnt/pve/bckNFS nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.0.15,mountvers=3,mountport=42772,mountproto=udp,local_lock=none,addr=10.0.0.15 0 0

Best answer

I ran into a similar problem with the same error messages, although my setup is slightly different: the LVM logical volume used for DRBD storage sits on top of an MD RAID1. I don't know what caused the problem (the system froze and I had to cold boot), but the following commands helped me find and resolve the "busy" issue.

The code below is from this blog:

    dmsetup info -c    # find the major and minor numbers of the problematic device (253 and 2 in my case)

    ls -la /sys/dev/block/253\:2/holders

The holders directory contains a link to /dev/dm-9.

For all the other DRBD devices, the holders entries point to something like drbd3 -> ../../drbd3.

So (warning: I don't know what damage this could cause; it worked for me):
    dmsetup remove /dev/dm-9

    drbdadm up RESOURCE
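Adapted to the NVMe-backed resource from the question, the same recipe would look roughly like this (hedged: the mapping name is hypothetical, verify with dmsetup info -c before removing anything):

    ls -la /sys/class/block/nvme0n1p1/holders   # e.g. a stale dm-9 -> ../../dm-9 instead of drbd0
    dmsetup info -c                             # look up the name behind that major:minor pair
    dmsetup remove <name-of-stale-mapping>      # hypothetical placeholder -- double-check first!
    drbdadm up r0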

Original question on Stack Overflow: https://stackoverflow.com/questions/55127490/
