
python - Mechanize + asynchronous browser calls

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:59:52

I am looking for a way to issue a large number of asynchronous web requests without waiting for each response.

This is my current code:

import mechanize
from mechanize._opener import urlopen
from mechanize._form import ParseResponse
from multiprocessing import Pool

brow = mechanize.Browser()
brow.open('https://website.com')

#Login
brow.select_form(nr = 0)

brow.form['username'] = 'user'
brow.form['password'] = 'password'
brow.submit()

while True:
    # Asynchronously open the page until some state is fulfilled
    brow.open('https://website.com/needthiswebsite')

The problem with the code above is that if I try to open two browsers, bro2 has to wait for bro1 to finish before it can start (the call is blocking).

bro1.open('https://website.com/needthiswebsite')
bro2.open('https://website.com/needthiswebsite')

Attempted solution:

# PSEUDO-CODE

from multiprocessing import Pool

# Global state: keep requesting while this is True
state = True

def openWebsite(browser):
    result = browser.open('https://website.com/needthiswebsite')
    # something() and WHATIWANT are placeholders for the real check
    if result.something() == WHATIWANT:
        return True
    return False

def updateState(found):
    # Callback: stop the loop once one call returns a positive answer
    global state
    if found:
        state = False

pool = Pool(processes=1)
while state:
    # Asynchronously open the page until some state is fulfilled.
    # I spam this function until one of the calls gives a positive answer.
    result = pool.apply_async(openWebsite, [brow1], callback=updateState)

I am trying to implement a solution like the one in the answers to the Stack Overflow question "Asynchronous method call in Python?".

The problem is that I get an error when I try to use pool.apply_async(brow.open()).

Error message:

PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

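I have tried many ways to fix the PicklingError, but nothing seems to work.

  • Is it possible to do this with Mechanize?
  • Should I switch to another library, such as urllib2 or something similar?

Any help would be greatly appreciated :)

Best answer

A mechanize.Browser object is not picklable, so it cannot be passed to pool.apply_async (or to any other call that has to send the object to a child process):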

>>> b = mechanize.Browser()
>>> import pickle
>>> pickle.dumps(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  File "/usr/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 725, in save_inst
    save(stuff)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/usr/lib/python2.7/pickle.py", line 663, in _batch_setitems
    save(v)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 600, in save_list
    self._batch_appends(iter(obj))
  File "/usr/lib/python2.7/pickle.py", line 615, in _batch_appends
    save(x)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 725, in save_inst
    save(stuff)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/usr/lib/python2.7/pickle.py", line 663, in _batch_setitems
    save(v)
  File "/usr/lib/python2.7/pickle.py", line 306, in save
    rv = reduce(self.proto)
  File "/usr/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle instancemethod objects
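The same check can be wrapped in a small helper and run before handing anything to a process pool. The sketch below is only an illustration: `StubBrowser` and `is_picklable` are hypothetical names that do not come from mechanize, and the stub holds a `threading.Lock` merely because that is a convenient standard-library object that pickle refuses to serialize.

```python
import pickle
import threading


class StubBrowser(object):
    """Hypothetical stand-in for mechanize.Browser (not the real class).

    It holds a lock, which pickle cannot serialize, so pickling it fails
    in the same general way as pickling a Browser does.
    """

    def __init__(self):
        self._lock = threading.Lock()


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


if __name__ == "__main__":
    # A plain URL string can cross a process boundary; the stub cannot.
    print(is_picklable("https://website.com/needthiswebsite"))
    print(is_picklable(StubBrowser()))
```

Anything that fails this check has to be created inside the child process instead of being passed to it.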

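The simplest approach is to create the Browser instance inside each child process rather than in the parent:

import mechanize

def openWebsite(url):
    # Create the Browser inside the worker so it never has to be pickled
    brow = mechanize.Browser()
    brow.open('https://website.com')

    # Login
    brow.select_form(nr=0)

    brow.form['username'] = 'user'
    brow.form['password'] = 'password'
    brow.submit()

    result = brow.open(url)
    # something() and WHATIWANT are placeholders from the question
    if result.something() == WHATIWANT:
        return True
    return False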
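One possible way to wire such a worker into a pool is sketched below. It is a variation, not the answer's exact code: it uses the `initializer` argument of `multiprocessing.Pool` so that each worker process builds and logs in its own browser once, rather than once per request. `StubBrowser`, its `login` method, `init_worker`, `check_urls` and the "WANTED" marker are all hypothetical stand-ins so the example runs without mechanize or a network; in real code `init_worker` would contain the mechanize login steps shown above.

```python
from multiprocessing import Pool

# Per-process global; each worker fills in its own copy in init_worker().
_browser = None


class StubBrowser(object):
    """Hypothetical stand-in for mechanize.Browser so the sketch runs offline."""

    def __init__(self):
        self.logged_in = False

    def login(self, username, password):
        # A real worker would open the login page, fill the form and submit.
        self.logged_in = True

    def open(self, url):
        # Pretend that URLs ending in "ready" are the ones we are waiting for.
        if self.logged_in and url.endswith("ready"):
            return "WANTED"
        return "other"


def init_worker(username, password):
    """Runs once in every worker process: build and log in a browser there."""
    global _browser
    _browser = StubBrowser()
    _browser.login(username, password)


def open_website(url):
    """Runs in a worker; only the URL string and the result cross processes."""
    return url, _browser.open(url) == "WANTED"


def check_urls(urls, processes=2):
    """Check all URLs in parallel and return a list of (url, found) pairs."""
    pool = Pool(processes=processes, initializer=init_worker,
                initargs=("user", "password"))
    try:
        return pool.map(open_website, urls)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    for url, found in check_urls(["https://website.com/a",
                                  "https://website.com/ready"]):
        print("%s %s" % (url, found))
```

Only strings, tuples and booleans are sent between processes here, so nothing unpicklable ever has to be serialized.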

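Ideally you would log in with a Browser object in the parent process and then issue parallel requests across several processes, but making the object picklable would probably take a lot of effort, if it is possible at all. Even if you managed to remove the instancemethod object that causes the current error, there are likely more unpicklable objects inside Browser beyond that one.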
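If separate processes are not a hard requirement, threads are one alternative worth considering, because threads share the parent's memory and nothing is pickled. The sketch below uses `multiprocessing.pool.ThreadPool` with a hypothetical `StubBrowser`. Whether a single real mechanize.Browser can safely be shared between threads is an assumption this answer does not confirm, so verify it first or give each thread its own instance.

```python
import threading
from multiprocessing.pool import ThreadPool


class StubBrowser(object):
    """Hypothetical stand-in for an already logged-in mechanize.Browser."""

    def __init__(self):
        # Holding a lock makes the stub unpicklable, as the real Browser is.
        self._unpicklable = threading.Lock()

    def open(self, url):
        # Pretend that URLs ending in "ready" are the ones we are waiting for.
        return "WANTED" if url.endswith("ready") else "other"


def open_website(browser, url):
    """Runs in a worker thread; the browser is shared, not copied."""
    return url, browser.open(url) == "WANTED"


def check_urls(browser, urls, threads=4):
    """Submit every URL without blocking, then collect the results in order."""
    pool = ThreadPool(threads)
    try:
        pending = [pool.apply_async(open_website, (browser, url))
                   for url in urls]
        return [p.get() for p in pending]
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    shared = StubBrowser()  # "logged in" once, in the parent
    for url, found in check_urls(shared, ["https://website.com/a",
                                          "https://website.com/ready"]):
        print("%s %s" % (url, found))
```

Because the work is dominated by waiting on the network, threads can overlap the requests even though they run in a single process.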

For more on python - Mechanize + asynchronous browser calls, see the similar question on Stack Overflow: https://stackoverflow.com/questions/25665987/
