python - Foolproof cross-platform process kill daemon

Reposted by: IT王子. Updated: 2023-10-29 00:14:45

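I have some Python automation that spawns telnet sessions, which I log with the Linux `script` command; each logging session has two `script` process IDs (a parent and a child).

I need to solve a problem: if the Python automation script dies, the `script` sessions never close on their own. For some reason this is much harder than it should be.

So far I have implemented watchdog.py (see the bottom of the question), which daemonizes itself and polls the Python automation script's PID in a loop. When it sees the automation PID disappear from the server's process table, it tries to kill the `script` sessions.

My problems are:

`script` sessions always spawn two separate processes, one of which is the parent of the other.
watchdog.py does not kill the child `script` sessions if I start the `script` sessions from the automation script (see the automation example below).

Automation example (`reproduce_bug.py`)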

import pexpect as px
from subprocess import Popen
import code
import time
import sys
import os

def read_pid_and_telnet(_child, addr):
    time.sleep(0.1) # Give the OS time to write the PIDFILE
    # Read the PID in the PIDFILE
    fh = open('PIDFILE', 'r')
    pid = int(''.join(fh.readlines()))
    fh.close()
    time.sleep(0.1)
    # Clean up the PIDFILE
    os.remove('PIDFILE')
    _child.expect(['#', r'\$'], timeout=3)
    _child.sendline('telnet %s' % addr)
    return str(pid)

pidlist = list()
child1 = px.spawn("""bash -c 'echo $$ > PIDFILE """
    """&& exec /usr/bin/script -f LOGFILE1.txt'""")
pidlist.append(read_pid_and_telnet(child1, '10.1.1.1'))

child2 = px.spawn("""bash -c 'echo $$ > PIDFILE """
    """&& exec /usr/bin/script -f LOGFILE2.txt'""")
pidlist.append(read_pid_and_telnet(child2, '10.1.1.2'))

cmd = "python watchdog.py -o %s -k %s" % (os.getpid(), ','.join(pidlist))
Popen(cmd.split(' '))
print "I started the watchdog with:\n   %s" % cmd

time.sleep(0.5)
raise RuntimeError, "Simulated script crash.  Note that script child sessions are hung"

Here is an example of what happens when I run the automation above... Note that PID 30017 spawns 30018 and PID 30020 spawns 30021. All of these PIDs are `script` sessions.

[mpenning@Hotcoffee Network]$ python reproduce_bug.py 
I started the watchdog with:
   python watchdog.py -o 30016 -k 30017,30020
Traceback (most recent call last):
  File "reproduce_bug.py", line 35, in <module>
    raise RuntimeError, "Simulated script crash.  Note that script child sessions are hung"
RuntimeError: Simulated script crash.  Note that script child sessions are hung
[mpenning@Hotcoffee Network]$

After I run the automation above, all the child `script` sessions are still running.

[mpenning@Hotcoffee Models]$ ps auxw | grep script
mpenning 30018  0.0  0.0  15832   508 ?        S    12:08   0:00 /usr/bin/script -f LOGFILE1.txt
mpenning 30021  0.0  0.0  15832   516 ?        S    12:08   0:00 /usr/bin/script -f LOGFILE2.txt
mpenning 30050  0.0  0.0   7548   880 pts/8    S+   12:08   0:00 grep script
[mpenning@Hotcoffee Models]$

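I am running the automation under Python 2.6.6 on a Debian Squeeze Linux system (uname -a: Linux Hotcoffee 2.6.32-5-amd64 #1 SMP Mon Jan 16 16:22:28 UTC 2012 x86_64 GNU/Linux).

Question:

It seems that the daemon does not survive the crash of the process that spawned it. How can I fix watchdog.py so that it closes all the `script` sessions if the automation dies (as in the example above)?

A watchdog.py log that illustrates the problem (sadly, the PIDs do not line up with the original question)...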

[mpenning@Hotcoffee ~]$ cat watchdog.log 
2012-02-22,15:17:20.356313 Start watchdog.watch_process
2012-02-22,15:17:20.356541     observe pid = 31339
2012-02-22,15:17:20.356643     kill pids = 31352,31356
2012-02-22,15:17:20.356730     seconds = 2
[mpenning@Hotcoffee ~]$

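Resolution

The problem was essentially a race condition. When I tried to kill the "parent" `script` processes, they had already died at the same time as the automation...

To solve the problem... first, the watchdog daemon needed to identify the entire list of children to be killed before it starts polling the observed PID (my original script tried to identify the children after the observed PID had crashed). Next, I had to modify the watchdog daemon to allow for the possibility that some `script` processes die together with the observed PID.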
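
The two changes described above can be sketched as follows. This is a minimal illustration rather than the author's code: it assumes Python 3 and a recent psutil release (where `Process.children()` replaces the older `get_children()` call used in watchdog.py), and the helper names `snapshot_tree`, `terminate_all`, and `watch` are hypothetical.

```python
import time

import psutil  # third-party dependency (assumed installed)


def snapshot_tree(pids):
    """Return psutil.Process objects for pids and all their descendants.

    PIDs that no longer exist are skipped instead of raising.
    """
    procs = {}
    for pid in pids:
        try:
            parent = psutil.Process(pid)
            family = [parent] + parent.children(recursive=True)
        except psutil.NoSuchProcess:
            continue  # already gone, nothing to kill later
        for proc in family:
            procs[proc.pid] = proc
    return list(procs.values())


def terminate_all(procs, grace=3.0):
    """Send SIGTERM to each process, then SIGKILL whatever survives."""
    for proc in procs:
        try:
            proc.terminate()
        except psutil.NoSuchProcess:
            pass  # died together with the observed PID
    gone, alive = psutil.wait_procs(procs, timeout=grace)
    for proc in alive:
        try:
            proc.kill()
        except psutil.NoSuchProcess:
            pass
    return gone, alive


def watch(observe_pid, kill_pids, seconds=2):
    """Snapshot the victims first, then poll, then kill."""
    victims = snapshot_tree(kill_pids)     # step 1: before polling starts
    while psutil.pid_exists(observe_pid):  # step 2: poll the observed PID
        time.sleep(seconds)
    return terminate_all(victims)          # step 3: tolerate dead victims
```

Taking the snapshot before the polling loop is what removes the race: the child PIDs are recorded while their parents are still alive, so they can still be signalled after the parents have exited.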

watchdog.py:

#!/usr/bin/python
"""
Implement a cross-platform watchdog daemon, which observes a PID and kills 
other PIDs if the observed PID dies.

Example:
--------

watchdog.py -o 29322 -k 29345,29346,29348 -s 2

The command checks PID 29322 every 2 seconds and kills PIDs 29345, 29346, 29348 
and their children, if PID 29322 dies.

Requires:
----------

 * https://github.com/giampaolo/psutil
 * http://pypi.python.org/pypi/python-daemon
"""
from optparse import OptionParser
import datetime as dt
import signal
import daemon
import logging
import psutil
import time
import sys
import os

class MyFormatter(logging.Formatter):
    converter=dt.datetime.fromtimestamp
    def formatTime(self, record, datefmt=None):
        ct = self.converter(record.created)
        if datefmt:
            s = ct.strftime(datefmt)
        else:
            t = ct.strftime("%Y-%m-%d %H:%M:%S")
            s = "%s,%03d" % (t, record.msecs)
        return s

def check_pid(pid):        
    """ Check For the existence of a unix / windows pid."""
    try:
        os.kill(pid, 0)   # Kill 0 raises OSError, if pid isn't there...
    except OSError:
        return False
    else:
        return True

def kill_process(logger, pid):
    try:
        psu_proc = psutil.Process(pid)
    except Exception as e:
        logger.debug('Caught Exception ["%s"] while looking up PID %s' % (e, pid))
        return False

    logger.debug('Sending SIGTERM to %s' % repr(psu_proc))
    psu_proc.send_signal(signal.SIGTERM)
    psu_proc.wait(timeout=None)
    return True

def watch_process(observe, kill, seconds=2):
    """Kill the process IDs listed in 'kill', when 'observe' dies."""
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)
    logfile = logging.FileHandler('%s/watchdog.log' % os.getcwd())
    logger.addHandler(logfile)
    formatter = MyFormatter(fmt='%(asctime)s %(message)s',datefmt='%Y-%m-%d,%H:%M:%S.%f')
    logfile.setFormatter(formatter)


    logger.debug('Start watchdog.watch_process')
    logger.debug('    observe pid = %s' % observe)
    logger.debug('    kill pids = %s' % kill)
    logger.debug('    seconds = %s' % seconds)
    children = list()

    # Get PIDs of all child processes...
    for childpid in kill.split(','):
        children.append(childpid)
        p = psutil.Process(int(childpid))
        for subpsu in p.get_children():
            children.append(str(subpsu.pid))

    # Poll observed PID...
    while check_pid(int(observe)):
        logger.debug('Poll PID: %s is alive.' % observe)
        time.sleep(seconds)
    logger.debug('Poll PID: %s is *dead*, starting kills of %s' % (observe, ', '.join(children)))

    for pid in children:
        # kill all child processes...
        kill_process(logger, int(pid))
    sys.exit(0) # Exit gracefully

def run(observe, kill, seconds):

    with daemon.DaemonContext(detach_process=True, 
        stdout=sys.stdout,
        working_directory=os.getcwd()):
        watch_process(observe=observe, kill=kill, seconds=seconds)

if __name__=='__main__':
    parser = OptionParser()
    parser.add_option("-o", "--observe", dest="observe", type="int",
                      help="PID to be observed", metavar="INT")
    parser.add_option("-k", "--kill", dest="kill",
                      help="Comma separated list of PIDs to be killed", 
                      metavar="TEXT")
    parser.add_option("-s", "--seconds", dest="seconds", default=2, type="int",
                      help="Seconds to wait between observations (default = 2)", 
                      metavar="INT")
    (options, args) = parser.parse_args()
    run(options.observe, options.kill, options.seconds)
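
A note on `check_pid` in the listing above: probing with `os.kill(pid, 0)` is a POSIX idiom. According to the Python documentation for `os.kill`, on Windows any signal value other than `CTRL_C_EVENT` and `CTRL_BREAK_EVENT` terminates the target process, so the probe is not safe there; `psutil.pid_exists()` is one portable alternative. On POSIX the probe can also fail with "permission denied" when the process exists but belongs to another user, which the original treats as "not there". The sketch below assumes Python 3 on a POSIX system, and the name `pid_alive` is hypothetical.

```python
import os


def pid_alive(pid):
    """POSIX-only liveness probe using signal 0 (nothing is delivered)."""
    if pid <= 0:
        # 0 and negative values address process groups, not one process
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True   # EPERM: the process exists but is owned by someone else
    return True
```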

Best answer

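Your problem is that `script` is not detached from the automation script after it is spawned, so it keeps running as a child process, and once the parent dies nothing is left to manage it.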
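
One possible reading of "detached" is to start each child in its own session and process group, so that the whole group can be signalled later through a single ID. The sketch below shows that reading only; it is an assumption rather than something the answer spells out. It uses only the standard library, assumes Python 3 on a POSIX system, and the names `spawn_detached` and `kill_group` are hypothetical.

```python
import os
import signal
import subprocess


def spawn_detached(cmd):
    """Start cmd as the leader of a new session and process group."""
    return subprocess.Popen(cmd, start_new_session=True)


def kill_group(proc, sig=signal.SIGTERM):
    """Signal every process in proc's group; do nothing if it is gone."""
    try:
        # A session leader's process group ID equals its own PID.
        os.killpg(proc.pid, sig)
    except ProcessLookupError:
        pass  # the group no longer exists
```

Processes that the child forks stay in the same group unless they create a session or group of their own, so one `kill_group` call reaches them without listing their PIDs individually.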

To handle the Python script exiting, you can use the atexit module. To monitor child processes exiting, you can use os.wait or handle the SIGCHLD signal.
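
The atexit suggestion can be sketched as follows. This is a minimal illustration assuming Python 3; the registry `_children` and the helpers `spawn` and `cleanup` are hypothetical names. Note that atexit handlers run on normal interpreter exit, including exit caused by an uncaught exception such as the simulated crash in reproduce_bug.py, but not when the process is killed by a signal Python does not handle or when `os._exit()` is called, so this approach does not replace an external watchdog.

```python
import atexit
import subprocess
import sys

_children = []  # hypothetical registry of child processes we started


def spawn(cmd):
    """Start cmd and remember it so cleanup() can find it later."""
    proc = subprocess.Popen(cmd)
    _children.append(proc)
    return proc


def cleanup(timeout=3.0):
    """Terminate every remembered child that is still running."""
    for proc in _children:
        if proc.poll() is None:  # still alive
            proc.terminate()     # SIGTERM on POSIX
    for proc in _children:
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()          # escalate if SIGTERM was ignored
            proc.wait()


# Run cleanup() automatically when the interpreter exits.
atexit.register(cleanup)
```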

Regarding python - Foolproof cross-platform process kill daemon, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/9400724/
