
python - ImportError with gevent and the requests async module


I am writing a simple script that:

  1. loads a big list of URLs;
  2. fetches the content of every URL, making concurrent HTTP requests with requests' async module (a minimal sketch of this step follows the list);
  3. parses the page content with lxml to check whether a given link is present in the page;
  4. if the link is present on the page, saves some information about that page in a ZODB database.
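Below is a minimal, self-contained sketch of steps 2 and 3, using the same async.get / async.map calls and the same lxml link check as the full script further down; the URL list and the "example.org" target domain are just placeholders.

from requests import async   # note: in later requests releases this API was split out into the grequests package
from lxml import html

def has_link(response):
    # parse the fetched page and report whether any href points at the target domain
    doc = html.document_fromstring(response.content)
    return any(attribute == "href" and "example.org" in link
               for element, attribute, link, pos in doc.iterlinks())

urls = ["http://example.com/", "http://example.net/"]      # placeholder URLs
requests_list = [async.get(url, timeout=10.0) for url in urls]
for response in async.map(requests_list, size=10):         # at most 10 concurrent requests
    if response is not None and response.status_code == 200:
        print response.url, has_link(response)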

When I test the script with 4 or 5 URLs it works fine, and when the script ends I only get this message:

 Exception KeyError: KeyError(45989520,) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored

But when I try to check roughly 24,000 URLs, it fails towards the end of the list (when about 400 URLs are still left to check) with the following error:

Traceback (most recent call last):
File "check.py", line 95, in <module>
File "/home/alex/code/.virtualenvs/linka/local/lib/python2.7/site-packages/requests/async.py", line 83, in map
File "/home/alex/code/.virtualenvs/linka/local/lib/python2.7/site-packages/gevent-1.0b2-py2.7-linux-x86_64.egg/gevent/greenlet.py", line 405, in joinall
ImportError: No module named queue
Exception KeyError: KeyError(45989520,) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored

I have tried every version of gevent available on PyPI, and I also downloaded the latest version (1.0b2) from the gevent repository and installed it.

I cannot figure out why this happens, or why it only happens when I check a large batch of URLs. Any suggestions?

Here is the entire script:

from requests import async, defaults
from lxml import html
from urlparse import urlsplit
from gevent import monkey
from BeautifulSoup import UnicodeDammit
from ZODB.FileStorage import FileStorage
from ZODB.DB import DB
import transaction
import persistent
import random

storage = FileStorage('Data.fs')
db = DB(storage)
connection = db.open()
root = connection.root()
monkey.patch_all()
defaults.defaults['base_headers']['User-Agent'] = "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
defaults.defaults['max_retries'] = 10


def save_data(source, target, anchor):
    root[source] = persistent.mapping.PersistentMapping(dict(target=target, anchor=anchor))
    transaction.commit()


def decode_html(html_string):
    converted = UnicodeDammit(html_string, isHTML=True)
    if not converted.unicode:
        raise UnicodeDecodeError(
            "Failed to detect encoding, tried [%s]",
            ', '.join(converted.triedEncodings))
    # print converted.originalEncoding
    return converted.unicode


def find_link(html_doc, url):
    decoded = decode_html(html_doc)
    doc = html.document_fromstring(decoded.encode('utf-8'))
    for element, attribute, link, pos in doc.iterlinks():
        if attribute == "href" and link.startswith('http'):
            netloc = urlsplit(link).netloc
            if "example.org" in netloc:
                return (url, link, element.text_content().strip())
    else:
        return False


def check(response):
    if response.status_code == 200:
        html_doc = response.content
        result = find_link(html_doc, response.url)
        if result:
            source, target, anchor = result
            # print "Source: %s" % source
            # print "Target: %s" % target
            # print "Anchor: %s" % anchor
            # print
            save_data(source, target, anchor)
    global todo
    todo = todo - 1
    print todo


def load_urls(fname):
    with open(fname) as fh:
        urls = set([url.strip() for url in fh.readlines()])
    urls = list(urls)
    random.shuffle(urls)
    return urls


if __name__ == "__main__":

    urls = load_urls('urls.txt')
    rs = []
    todo = len(urls)
    print "Ready to analyze %s pages" % len(urls)
    for url in urls:
        rs.append(async.get(url, hooks=dict(response=check), timeout=10.0))
    responses = async.map(rs, size=100)
    print "DONE."

Best Answer

I am not sure what the source of your problem is, but why isn't monkey.patch_all() at the top of the file?

Could you try putting

from gevent import monkey; monkey.patch_all()

at the top of your main program and see whether it fixes anything?
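For reference, here is a minimal sketch of the suggested reordering, built only from the imports and setup code already in the script above; the rest of the script stays unchanged.

# Patch the standard library (threading, socket, select, ...) before any other
# module is imported, so that every later import sees the gevent-aware versions.
from gevent import monkey; monkey.patch_all()

from requests import async, defaults
from lxml import html
from urlparse import urlsplit
from BeautifulSoup import UnicodeDammit
from ZODB.FileStorage import FileStorage
from ZODB.DB import DB
import transaction
import persistent
import random

# Only after the patching is done, open the ZODB storage and set the defaults.
storage = FileStorage('Data.fs')
db = DB(storage)
connection = db.open()
root = connection.root()
defaults.defaults['base_headers']['User-Agent'] = "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
defaults.defaults['max_retries'] = 10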

Regarding "python - ImportError with gevent and the requests async module", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10267086/
