
linux - How do I find the processes associated with an sbatch job?

Reposted. Author: 行者123. Updated: 2023-12-03 09:49:54

When I start a job with sbatch on a multi-node system, some processes are started on the relevant nodes.

How can I find out which processes (process IDs) are running on those nodes as a result of the sbatch run?

I checked the Slurm documentation but did not find any command that shows the associated processes (I looked at scontrol and sstat, for example).

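The idea is to find the process IDs, then use Linux tools to debug processes that are "stuck" (i.e. producing no output, etc.) and possibly find out what a specific process is doing.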

Best answer

What you are looking for is scontrol listpids. From the scontrol manpage:

listpids [job_id[.step_id]] [NodeName]

Print a listing of the process IDs in a job step (if JOBID.STEPID is provided), or all of the job steps in a job (if job_id is provided), or all of the job steps in all of the jobs on the local node (if job_id is not provided or job_id is "*"). This will work only with processes on the node on which scontrol is run, and only for those processes spawned by Slurm and their descendants. Note that some Slurm configurations (ProctrackType value of pgid) are unable to identify all processes associated with a job or job step. Note that the NodeName option is only really useful when you have multiple slurmd daemons running on the same host machine. Multiple slurmd daemons on one host are, in general, only used by Slurm developers.



Simply SSH to the compute node and run scontrol listpids. It prints a table that maps each PID to its job ID.
[root@node003 ~]# scontrol listpids | column -t
PID JOBID STEPID LOCALID GLOBALID
269852 68706234 batch 0 0
269998 68706234 batch - -
[etc.]

I use the column command here to align the columns and make the output easier to read.
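To go from that table to the PIDs of a single job, the output can be filtered on the JOBID column. The following is only a minimal sketch: `pids_for_job` is a hypothetical helper name, not a Slurm command, and it assumes the five-column layout shown above (PID first, JOBID second).

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): reads `scontrol listpids`
# output on stdin and prints the PIDs whose JOBID column matches $1.
# Assumes the column layout shown above: PID JOBID STEPID LOCALID GLOBALID.
pids_for_job() {
    # The header line is skipped automatically because "JOBID" never
    # equals a numeric job ID.
    awk -v job="$1" '$2 == job { print $1 }'
}

# Example use on a compute node (needs Slurm, so it is left commented out):
#   scontrol listpids | pids_for_job 68706234
```

Each PID printed this way can then be examined with ordinary Linux tools on that node, for example `ps -o pid,stat,wchan,cmd -p PID` or `strace -p PID`, which is what the question intended for diagnosing a stuck process. As the manpage excerpt notes, listpids only reports processes on the node where scontrol runs, so this has to be repeated on every node of the job.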

For "linux - How do I find the processes associated with an sbatch job?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61118592/
