
centos - Restarting nodes that are in the down state

Reposted. Author: 行者123. Updated: 2023-12-04 00:25:41

After a power outage, my nodes are in the down state:

sinfo -a

PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
partMain  up    infinite  4     down* node[001-004]
part1*    up    infinite  3     down* node[002-004]
part2     up    infinite  1     down* node001

I ran these commands:
/etc/init.d/slurm stop
/etc/init.d/slurm start

sinfo -a
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
partMain  up    infinite  4     down  node[001-004]
part1*    up    infinite  3     down  node[002-004]
part2     up    infinite  1     down  node001

How can I bring my nodes back up?

sinfo -R
REASON          USER  TIMESTAMP            NODELIST
Not responding  root  2019-07-23T08:40:25  node[001-004]
$ scontrol update nodename=node001 state=idle    
$ scontrol update nodename=node[001-004] state=resume

# the state changes to idle* for a few seconds, then returns to down*

$ service --status-all | grep 'slurm'
slurmctld (pid 24000) is running...
slurmdbd (pid 4113) is running...


$ systemctl status -l slurm
● slurm.service - LSB: slurm daemon management
Loaded: loaded (/etc/rc.d/init.d/slurm; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2019-07-24 13:45:38 CEST; 257ms ago
Docs: man:systemd-sysv-generator(8)
Process: 30094 ExecStop=/etc/rc.d/init.d/slurm stop (code=exited, status=1/FAILURE)
Process: 30061 ExecStart=/etc/rc.d/init.d/slurm start (code=exited, status=0/SUCCESS)
Main PID: 30069 (code=exited, status=1/FAILURE)

Best answer

After starting the daemons, try this:
scontrol update nodename=node001 state=idle
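
In `sinfo` output, a trailing `*` on the state marks nodes that are not responding, so it can help to check which rows are still `down*` before resetting anything. The following is only a rough sketch, not part of the original answer: `down_nodes` is a hypothetical helper that assumes the default six-column `sinfo -a` layout shown in the question (STATE in column 5, NODELIST in column 6). It only prints the `scontrol` command it would run, so nothing is changed on the cluster.

```shell
#!/bin/sh
# Hypothetical helper: read `sinfo -a` style rows on stdin and print each
# node list whose STATE starts with "down", plus a flag derived from the
# trailing "*" (which sinfo uses for nodes that are not responding).
# Assumes the column layout from the question: STATE is field 5, NODELIST field 6.
down_nodes() {
    awk '$5 ~ /^down/ { print $6, ($5 ~ /\*$/ ? "not-responding" : "responding") }' |
        LC_ALL=C sort -u
}

# Dry run on the head node; skipped entirely when Slurm is not installed.
if command -v sinfo >/dev/null 2>&1; then
    sinfo -a | down_nodes | while read -r nodes flag; do
        if [ "$flag" = "responding" ]; then
            echo "would run: scontrol update nodename=$nodes state=resume"
        else
            echo "$nodes is not responding; check the Slurm daemon on those nodes first"
        fi
    done
fi
```

If a node list is still reported as not responding, resetting its state is unlikely to stick, which matches the behaviour described in the question (idle* for a few seconds, then down* again).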

Regarding "centos - Restarting nodes that are in the down state", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57144602/
