python - Assistance with Python multithreading

Reposted · Author: 行者123 · Updated: 2023-11-28 21:58:07

Currently I have a list of URLs to grab content from, and I'm doing it serially. I'd like to change this to grab them in parallel. Here is some pseudocode. Is the design sound? I understand that .start() starts the thread; however, my database is not being updated. Do I need to use q.get()? Thanks.

import threading
import Queue

q = Queue.Queue()

def do_database(url):
    """ grab url then input to database """
    webdata = grab_url(url)
    try:
        insert_data_into_database(webdata)
    except:
        ....
    else:
        < do I need to do anything with the queue after each db operation is done? >

def put_queue(q, url):
    q.put(do_database(url))

for myfiles in currentdir:
    url = myfiles + some_other_string
    t = threading.Thread(target=put_queue, args=(q, url))
    t.daemon = True
    t.start()

Best Answer

It's odd that you put things into q but never take anything out of q. What is the purpose of q? Moreover, since do_database() doesn't return anything, the only thing q.put(do_database(url)) can possibly do is put None into q.
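To see this concretely, a minimal Python 3 snippet (the `queue` module is the Python 3 name for `Queue`; `do_nothing` is a hypothetical stand-in for the asker's `do_database`) shows that `q.put(f(url))` calls `f` immediately in the calling thread and enqueues only its return value:

```python
import queue

def do_nothing(url):
    """Stand-in for do_database(): does its work but returns nothing."""
    pass  # falls off the end, so it implicitly returns None

q = queue.Queue()
q.put(do_nothing("http://example.com"))  # do_nothing runs NOW, in this thread
item = q.get()
print(item)  # None -- the queue held only the return value, never the URL
```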

The way these things usually work is that descriptions of work to be done are added to a queue, and then a fixed number of threads take turns pulling things off the queue. You probably don't want to create an unbounded number of threads ;-)
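On Python 3, the standard library packages up this fixed-size pool idea as `concurrent.futures.ThreadPoolExecutor`; a short sketch, with `grab_url` stubbed out as an assumption (the real version would download and insert into the database):

```python
from concurrent.futures import ThreadPoolExecutor

def grab_url(url):
    """Stub standing in for the real download step."""
    return "webdata for " + url

urls = ["http://example.com/%d" % i for i in range(10)]

# A pool of 5 worker threads takes turns pulling URLs off an internal queue.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(grab_url, urls))

print(len(results))  # 10 -- one result per URL, in input order
```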

Here's a pretty complete, but untested, sketch:

import threading
import Queue

NUM_THREADS = 5  # whatever

q = Queue.Queue()
END_OF_DATA = object()  # a unique object

class Worker(threading.Thread):
    def run(self):
        while True:
            url = q.get()
            if url is END_OF_DATA:
                break
            webdata = grab_url(url)
            try:
                # Does your database support concurrent updates
                # from multiple threads?  If not, need to put
                # this in a "with some_global_mutex:" block.
                insert_data_into_database(webdata)
            except:
                pass  # ....

threads = [Worker() for _ in range(NUM_THREADS)]
for t in threads:
    t.start()

for myfiles in currentdir:
    url = myfiles + some_other_string
    q.put(url)

# Give each thread an END_OF_DATA marker.
for _ in range(NUM_THREADS):
    q.put(END_OF_DATA)

# Shut down cleanly.  `daemon` is way overused.
for t in threads:
    t.join()
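The same sketch runs as-is under Python 3 once `Queue` is renamed `queue`; a self-contained version below stubs out `grab_url` and replaces the database with a lock-guarded list (both assumptions, standing in for the asker's real functions) so the worker-pool / sentinel pattern can actually be executed end to end:

```python
import queue
import threading

NUM_THREADS = 5
END_OF_DATA = object()       # unique sentinel; compared with `is`

q = queue.Queue()
database = []                # stand-in for the real database
db_mutex = threading.Lock()  # serializes updates to the shared store

def grab_url(url):
    """Stub for the real download step."""
    return "webdata for " + url

class Worker(threading.Thread):
    def run(self):
        while True:
            url = q.get()
            if url is END_OF_DATA:
                break        # this worker's shutdown marker
            webdata = grab_url(url)
            with db_mutex:
                database.append(webdata)

threads = [Worker() for _ in range(NUM_THREADS)]
for t in threads:
    t.start()

for i in range(20):
    q.put("http://example.com/page%d" % i)

# One END_OF_DATA marker per thread, so every worker exits its loop.
for _ in range(NUM_THREADS):
    q.put(END_OF_DATA)

for t in threads:
    t.join()

print(len(database))  # 20 -- every URL processed exactly once
```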

Regarding python - Assistance with Python multithreading, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18940469/
