
python - Queue.join() does not unblock

Reposted; author: 太空狗; updated: 2023-10-30 01:21:32

I'm trying to write a Python script that crawls a website in parallel. I built a prototype that crawls to a given depth.

However, join() doesn't seem to work, and I don't know why.

Here is my code:

from threading import Thread
import Queue
import urllib2
import re
from BeautifulSoup import *
from urlparse import urljoin


def doWork():
    while True:
        try:
            myUrl = q_start.get(False)
        except:
            continue
        try:
            c = urllib2.urlopen(myUrl)
        except:
            continue
        soup = BeautifulSoup(c.read())
        links = soup('a')
        for link in links:
            if 'href' in dict(link.attrs):
                url = urljoin(myUrl, link['href'])
                if url.find("'") != -1: continue
                url = url.split('#')[0]
                if url[0:4] == 'http':
                    print url
                    q_new.put(url)




q_start = Queue.Queue()

q_new = Queue.Queue()



for i in range(20):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()


q_start.put("http://google.com")
print "loading"
q_start.join()
print "end"

Best answer

join() will block until task_done() has been called as many times as items have been enqueued.
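That contract can be seen in a minimal sketch. This is modernized to Python 3, where the module is named queue rather than Queue; the logic is otherwise the same as in Python 2:

```python
import queue
import threading

q = queue.Queue()

def worker():
    while True:
        item = q.get()   # blocks until an item is available
        # ... process the item here ...
        q.task_done()    # exactly one task_done() per completed get()

t = threading.Thread(target=worker, daemon=True)
t.start()

for i in range(5):
    q.put(i)

q.join()  # returns only once task_done() has been called 5 times
print("all tasks done")
```

If the worker never called task_done(), q.join() here would block forever, which is exactly the symptom in the question.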

You never call task_done(), so join() blocks forever. In the code you provided, the right place to call it is at the end of each iteration of your doWork loop:

def doWork():
    while True:
        task = q_start.get(False)
        ...
        for subtask in processed(task):
            ...
        q_start.task_done()  # tell the producer we completed a task
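Putting the fix together, here is a runnable Python 3 sketch of the corrected worker pattern. It avoids the network entirely: fetch() is a hypothetical stand-in for the urllib2.urlopen + BeautifulSoup link extraction, and a blocking get() replaces the busy loop on get(False). The key point is the try/finally, which guarantees task_done() is paired with every get() even if processing a URL raises:

```python
import queue
import threading

q_start = queue.Queue()
results = []

def fetch(url):
    # Hypothetical stand-in for downloading a page and extracting its links.
    return [url + "/a", url + "/b"] if url.count("/") < 3 else []

def doWork():
    while True:
        my_url = q_start.get()        # block until work arrives; no spinning
        try:
            for link in fetch(my_url):
                results.append(link)
        finally:
            q_start.task_done()       # always pair with the get(), even on error

for _ in range(4):
    threading.Thread(target=doWork, daemon=True).start()

q_start.put("http://example.com")
q_start.join()   # now unblocks: every get() was matched by a task_done()
print(len(results))
```

Because task_done() runs in a finally block, a page that fails to parse still counts as finished, so join() cannot deadlock on an exception.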

Regarding "python - Queue.join() does not unblock", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/30805216/
