
Python Kafka: multiprocessing vs. threading

Reposted. Author: 行者123. Updated: 2023-11-28 21:39:38

I can consume messages in a separate thread using KafkaConsumer.

However, when I use multiprocessing.Process instead of threading.Thread, I get an error:

OSError: [Errno 9] Bad file descriptor

This question and the documentation suggest that consuming messages in parallel with multiprocessing is possible. Can someone share a working example?

Edit

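Here is some sample code. Sorry, the original code is too complicated, so I created this example, which I hope conveys what is happening. This code works fine if I use threading.Thread instead of multiprocessing.Process.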

from multiprocessing import Process

from kafka import KafkaConsumer


class KafkaWrapper:
    def __init__(self):
        # Runs in the parent process: main() builds ServiceA/ServiceB
        # (and therefore this consumer) before starting any Process.
        self.consumer = KafkaConsumer(bootstrap_servers='my.server.com')

    def consume(self, topic):
        self.consumer.subscribe([topic])
        for message in self.consumer:
            print(message.value)


class ServiceInterface:
    def __init__(self):
        self.kafka_wrapper = KafkaWrapper()

    def start(self, topic):
        self.kafka_wrapper.consume(topic)


class ServiceA(ServiceInterface):
    pass


class ServiceB(ServiceInterface):
    pass


def main():
    serviceA = ServiceA()
    serviceB = ServiceB()

    jobs = []
    # The code works fine if I use threading.Thread here instead of Process
    jobs.append(Process(target=serviceA.start, args=("my-topic",)))
    jobs.append(Process(target=serviceB.start, args=("my-topic",)))

    for job in jobs:
        job.start()

    for job in jobs:
        job.join()


if __name__ == "__main__":
    main()

Here is the error I see (again, my actual code differs from the example above; it works fine with threading.Thread but fails with multiprocessing.Process):

  File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "service_interface.py", line 58, in start
    self._kafka_wrapper.start_consuming(self.service_object_id)
  File "kafka_wrapper.py", line 141, in start_consuming
    for message in self._consumer:
  File "venv/lib/python3.6/site-packages/kafka/consumer/group.py", line 1082, in __next__
    return next(self._iterator)
  File "venv/lib/python3.6/site-packages/kafka/consumer/group.py", line 1022, in _message_generator
    self._client.poll(timeout_ms=poll_ms, sleep=True)
  File "venv/lib/python3.6/site-packages/kafka/client_async.py", line 556, in poll
    responses.extend(self._poll(timeout, sleep=sleep))
  File "venv/lib/python3.6/site-packages/kafka/client_async.py", line 573, in _poll
    ready = self._selector.select(timeout)
  File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/selectors.py", line 577, in select
    kev_list = self._kqueue.control(None, max_ev, timeout)
OSError: [Errno 9] Bad file descriptor

(Both processes print this same traceback; in the original output the two were interleaved.)

Best answer

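Kafka consumers can be either multi-process or multi-threaded (make sure the client library you use correctly supports Kafka consumer groups, which early versions of Kafka required); the choice is up to you.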
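Whichever model you pick, the usual pattern is one consumer per worker, with all workers sharing a group_id so that Kafka divides the topic's partitions among them. The following is a minimal sketch, assuming kafka-python (the library in the traceback); consume_in_group, start_workers and the group name used below are hypothetical. kafka-python's documentation also notes that KafkaConsumer is not thread-safe, so threads need their own consumers as well.

```python
def consume_in_group(topic, bootstrap_servers, group_id):
    # kafka-python; imported inside the worker only so this sketch can be
    # loaded without the library installed.
    from kafka import KafkaConsumer

    # Each worker owns its own consumer. All workers pass the same group_id,
    # so the group coordinator assigns each one a disjoint share of the
    # topic's partitions.
    consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap_servers,
                             group_id=group_id)
    try:
        for message in consumer:
            print(message.partition, message.value)
    finally:
        consumer.close()


def start_workers(worker_cls, count, target, args):
    # threading.Thread and multiprocessing.Process accept the same
    # target/args keywords and both provide start()/join(), so this one
    # launcher works with either class.
    workers = [worker_cls(target=target, args=args) for _ in range(count)]
    for worker in workers:
        worker.start()
    return workers
```

For example, start_workers(multiprocessing.Process, 2, consume_in_group, ("my-topic", "my.server.com", "my-group")) followed by join() on each returned worker; passing threading.Thread instead changes nothing else. Workers beyond the topic's partition count receive no partitions and stay idle.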

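However, if you want to use processes, the Kafka client library has to take care to stay fork-safe: the underlying TCP connections it uses (to the Kafka servers) must not be shared by more than one process. That is why you get the connection error. In the traceback above, the call that fails is the selector's kqueue.control(): on macOS a kqueue is not inherited by a child created with fork(), so the consumer the child copied from its parent holds a selector descriptor that is no longer valid there.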
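One generic way for a client object to be fork-safe is to remember which process created its connections and rebuild them when it finds itself running in a different one. The class below is a hypothetical sketch of that idea; ForkAwareResource and its factory argument are illustrative names, not part of kafka-python or any other Kafka client, and it does not describe what kafka-python does internally.

```python
import os


class ForkAwareResource:
    """Hypothetical helper that keeps at most one resource per process.

    The resource (for example a client holding TCP sockets) is built on
    first use, and rebuilt whenever the current PID differs from the PID
    that built it, so a forked child never uses its parent's connections.
    """

    def __init__(self, factory):
        self._factory = factory
        self._resource = None
        self._pid = None

    def get(self):
        pid = os.getpid()
        if self._resource is None or self._pid != pid:
            # First use, or we are now in a forked child: build a fresh
            # resource instead of touching the copy inherited from the parent.
            self._resource = self._factory()
            self._pid = pid
        return self._resource
```

In the question's KafkaWrapper, __init__ could store ForkAwareResource(lambda: KafkaConsumer(bootstrap_servers='my.server.com')) and consume() could call .get(). This only helps with the fork start method (the default for the macOS Python 3.6 in the traceback); under spawn the object is pickled, and a lambda factory cannot be.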

As a workaround, do not create the KafkaConsumer before starting the processes; create it inside each process instead.
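Applied to the question's example, the workaround means KafkaWrapper stores only configuration, and the consumer is built in consume(), which runs inside the child. A minimal sketch, assuming kafka-python; the bootstrap_servers constructor parameter and the defaults of main() are illustrative additions, not from the original code.

```python
from multiprocessing import Process


class KafkaWrapper:
    def __init__(self, bootstrap_servers):
        # Store configuration only. This object is built in the parent and
        # copied into each child, so it must not own any sockets yet.
        self.bootstrap_servers = bootstrap_servers

    def consume(self, topic):
        # Imported here only so the sketch loads without kafka-python;
        # importing the library opens no connections, so a top-level
        # import would work too.
        from kafka import KafkaConsumer

        # Created inside the child process: the consumer's connections and
        # selector belong to this process alone.
        consumer = KafkaConsumer(bootstrap_servers=self.bootstrap_servers)
        try:
            consumer.subscribe([topic])
            for message in consumer:
                print(message.value)
        finally:
            consumer.close()


class ServiceInterface:
    def __init__(self, bootstrap_servers):
        self.kafka_wrapper = KafkaWrapper(bootstrap_servers)

    def start(self, topic):
        self.kafka_wrapper.consume(topic)


class ServiceA(ServiceInterface):
    pass


class ServiceB(ServiceInterface):
    pass


def main(bootstrap_servers="my.server.com", topic="my-topic"):
    services = [ServiceA(bootstrap_servers), ServiceB(bootstrap_servers)]
    jobs = [Process(target=s.start, args=(topic,)) for s in services]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
```

Call main() from under an if __name__ == "__main__": guard, as in the question. Because the services now hold only a string, they also survive the pickling that the spawn start method performs.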

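Another approach is to fetch messages with a single thread or process and hand the actual processing to a separate pool of worker processes.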
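A minimal sketch of that split, assuming kafka-python for fetching and concurrent.futures for the pool: only the calling process touches Kafka, and the workers receive plain message payloads. handle() is a hypothetical placeholder for the real work (it assumes UTF-8 payloads), and dispatch() and run() are illustrative names.

```python
from concurrent.futures import ProcessPoolExecutor


def handle(value):
    # Hypothetical placeholder for the real CPU-heavy work: decode the
    # payload (assumed UTF-8) and upper-case it.
    return value.decode("utf-8").upper()


def dispatch(values, executor, chunksize=1):
    # Workers only ever see plain bytes, never the consumer or its sockets.
    # executor.map preserves input order.
    return list(executor.map(handle, values, chunksize=chunksize))


def run(topic, bootstrap_servers, batch_size=100, workers=4):
    # kafka-python; imported here only so the sketch loads without it.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap_servers)
    try:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            while True:
                # poll() returns {TopicPartition: [ConsumerRecord, ...]}.
                batches = consumer.poll(timeout_ms=1000, max_records=batch_size)
                values = [rec.value for recs in batches.values() for rec in recs]
                for result in dispatch(values, pool):
                    print(result)
    finally:
        consumer.close()
```

dispatch() accepts any concurrent.futures executor, so a ThreadPoolExecutor can stand in when the work is I/O-bound. As with any process pool, call run() from under an if __name__ == "__main__": guard so that spawn-based workers can import handle().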

A similar question about Python Kafka multiprocessing vs. threading can be found on Stack Overflow: https://stackoverflow.com/questions/46491616/
