Python Luigi with Docker - threading/signal issue

Reposted by: 行者123 · Updated: 2023-11-28 19:02:23


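Overview

We are using Luigi to build a pipeline inside a Docker container. This is my first time using Luigi, and while trying to get it running I keep hitting a Python threading/signal error.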

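What we are building
We have a container that runs a setup.py script as its entry point. The script imports my Luigi tasks, but its main job is to open a PubSub channel to Google Cloud services. When it receives a message on that channel, it kicks off a series of tasks.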

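The error
I am calling Luigi directly from Python and have tried variations of this command:
luigi.build([GetImageSet()], workers=2, local_scheduler=True, no_lock=True)
and I get this error:
ValueError: signal only works in main thread

Background on signal and Luigi

From the Python signal module documentation:
signal.signal: This function can only be called from the main thread; attempting to call it from other threads will cause a ValueError exception to be raised.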
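The restriction quoted above can be reproduced without Luigi or Docker. The following is a minimal sketch using only the standard library; it uses SIGTERM purely for portability (Luigi's handler is for SIGUSR1), and the helper name `try_install_handler` is made up for illustration.

```python
import signal
import threading


def try_install_handler():
    """Try to register a signal handler; return the error text, or None on success."""
    try:
        # Registering any handler from a non-main thread is rejected by CPython.
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
        return None
    except ValueError as exc:
        return str(exc)


errors = []

# Run the attempt on a background thread, which is the situation a
# subscriber callback is in when it calls into code that installs handlers.
worker = threading.Thread(target=lambda: errors.append(try_install_handler()))
worker.start()
worker.join()

print(errors[0])
```

The printed message is the same ValueError text that appears in the traceback, which suggests the problem is which thread calls luigi.build rather than anything specific to the container.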

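From the Luigi worker.py script:
Luigi provides a no_install_shutdown_handler flag (default false): 'If true, the SIGUSR1 shutdown handler will not be installed on the worker'. This is also where the error occurs (line 538). The script checks that the no_install_shutdown_handler config flag is false (the default) before calling signal.signal(). So far I have not managed to get Luigi to read my client.cfg file and set that flag to true, and Docker may be the culprit.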
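One thing worth checking is which section of client.cfg the flag lives in. As far as I can tell from luigi/worker.py, no_install_shutdown_handler is declared on the `worker` config class, so Luigi would look for it under [worker]; that is my reading, not something the question confirms, so verify it against the installed version. The Dockerfile comment later in the question says the flag was set under [core]. The sketch below is a small diagnostic built on the standard-library configparser; `find_flag_sections` and the sample text are hypothetical.

```python
import configparser


def find_flag_sections(cfg_text, option="no_install_shutdown_handler"):
    """Return the names of the sections in cfg_text that define `option`."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return [name for name in parser.sections() if parser.has_option(name, option)]


# Sample mirroring the config described in the question's Dockerfile comment.
sample_cfg = """
[core]
no_install_shutdown_handler: true
"""

sections = find_flag_sections(sample_cfg)
print(sections)
if "worker" not in sections:
    # Assumption: Luigi reads this flag from [worker]; check your Luigi version.
    print("flag is not under [worker]; Luigi may be ignoring it")
```

Running the same helper on the real file at LUIGI_CONFIG_PATH would show whether the flag is being set where Luigi expects it.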

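From the Luigi interface.py script:
"If you don't want to run luigi from the command line, you can use the methods defined in this module to programmatically run luigi." In this script I could supply a custom worker scheduler factory, but I have not been able to work out how yet.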

Local vs. global Luigi scheduler
Luigi offers two scheduler options for running tasks. The local ...

Dockerfile issues: In this container's Dockerfile I install Luigi via pip but do not do much else. After reviewing this and this docker/luigi implementation on GitHub, I am starting to worry that my Dockerfile does not do enough.

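Possible reasons I think the error is happening

The pub-sub channel subscriber is non-blocking, so I am doing something that may be quite bad to keep the main thread from exiting while we wait for messages in the background. This seems like a likely source of my threading problem.

The no_install_shutdown_handler flag is not being set to True successfully. Setting it would hopefully sidestep the error, but it is not necessarily what I want.

The local task scheduler. I should be using the global scheduler instead of the local one. I will have to get this working for production eventually anyway...

Running the script from Python instead of the command line.

Using luigi.build. Perhaps I should use luigi.run instead, but according to the Running from Python documentation page, build "is useful if you want to get some dynamic parameters from another source, such as database, or provide additional logic before you start tasks." That sounds like a good fit for my use case (triggering tasks after receiving a message on the pub-sub channel, where the message carries the variables needed to run the first task).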

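Am I doing it wrong anyway?
If you have any suggestions for implementing the system I have described, please let me know. I will also post my Dockerfile and setup.py attempts on request.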

Some code samples

Here is the Dockerfile

# ./Dockerfile
# sfm-base:test is the container with tensorflow & our python sfm-library. It installs Ubuntu, Python, pip etc.
FROM sfm-base:test
LABEL maintainer "---@---.io"

# Install luigi, google-cloud w/ pip in module mode
RUN python -m pip install luigi && \
python -m pip install --upgrade google-cloud

# for now at least, start.sh just calls setup.py and sets google credentials. Ignore that chmod bit if it's bad I don't know.
COPY start.sh /usr/local/bin
RUN chmod -R 755 "/usr/local/bin/start.sh"
ENTRYPOINT [ "start.sh" ]

WORKDIR /src
COPY . .

# this was my attempt at setting the Luigi client.cfg in the container
# all I'm having the config do for now is set [core] no_install_shutdown_handler: true
ENV LUIGI_CONFIG_PATH /src/client.cfg

Here is setup.py (edited for SO)

# setup.py
from google.cloud import pubsub_v1
from time import sleep
import luigitasks
import luigi
import logging
import json

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(
    'servicename', 'pubsubcommand')

# Example task. These are actually in luigitasks.py
class GetImageSet(luigi.Task):
    uri = luigi.Parameter(default='')

    def requires(self):
        return []

    def output(self):
        # write zip to local
        return

    def run(self):
        # use the URI to retrieve the ImageSet.zip from the bucket
        logging.info('Running getImageSet')

# Pubsub message came in
def onMessageReceived(message):
    print('Received message: {}'.format(message))

    if message.attributes:
        for key in message.attributes:
            if key == 'ImageSetUri':
                value = message.attributes.get(key)
                # Kick off the pipeline starting with the GetImageSet Task
                # I've tried setting no_lock, take_lock, local_scheduler...
                # General flags to try and prevent the thread issues
                luigi.build([GetImageSet()], workers=3, local_scheduler=True, no_lock=False)
                message.ack()

subscriber.subscribe(subscription_path, callback=onMessageReceived)

# The subscriber is non-blocking, so I keep the main thread from
# exiting to allow it to process messages in the background. Is this
# the cause of my woes?
print('Listening for messages on {}'.format(subscription_path))
while True:
    sleep(60)

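Best answer

This happens because subscriber.subscribe starts a background thread and runs the callback on it. When that thread calls luigi.build, Luigi tries to install its signal handler from a non-main thread and the exception is raised.

The best solution here is to use subscriber.pull to read the pub-sub messages from the main thread. See the example in the docs.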
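A related pattern, which is not the subscriber.pull approach recommended above but serves the same goal, is to keep the streaming callback and have it only hand the message to the main thread through a queue. The sketch below uses only the standard library: `run_pipeline` is a hypothetical stand-in for the luigi.build call, the background thread simulates the subscriber delivering a message, and the URI is a placeholder.

```python
import queue
import threading

inbox = queue.Queue()
callback_thread_ids = []


def on_message(uri):
    # Runs on the subscriber's background thread: enqueue only,
    # never start the pipeline from here.
    callback_thread_ids.append(threading.get_ident())
    inbox.put(uri)


def run_pipeline(uri):
    # Hypothetical stand-in for luigi.build([...], local_scheduler=True).
    # Returns the id of the thread that ran it so the hand-off can be checked.
    return threading.get_ident()


def drain(handler, count):
    """Process `count` queued messages on the calling thread."""
    return [handler(inbox.get()) for _ in range(count)]


# Simulate one message arriving on a background thread.
producer = threading.Thread(
    target=on_message, args=("gs://example-bucket/ImageSet.zip",)
)
producer.start()

# The main thread blocks on the queue, which also replaces the sleep loop.
pipeline_thread_ids = drain(run_pipeline, count=1)
producer.join()
```

In a real subscriber the queue would carry the whole message object, so that the main thread can acknowledge it only after the pipeline has finished.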

Regarding Python Luigi with Docker - threading/signal issue, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51295521/
