
python - Twisted Python getPage

Reposted. Author: 太空狗. Updated: 2023-10-30 01:47:23

I tried to get support on this, but I'm completely confused.

Here is my code:


from twisted.internet import reactor
from twisted.web.client import getPage
from twisted.web.error import Error
from twisted.internet.defer import DeferredList
from sys import argv

class GrabPage:
    def __init__(self, page):
        self.page = page

    def start(self, *args):
        if args == ():
            # We apparently don't need authentication for this
            d1 = getPage(self.page)
        else:
            if len(args) == 2:
                # We have our login information
                d1 = getPage(self.page, headers={"Authorization": " ".join(args)})
            else:
                raise Exception('Missing parameters')

        d1.addCallback(self.pageCallback)
        dl = DeferredList([d1])
        d1.addErrback(self.errorHandler)
        dl.addCallback(self.listCallback)

    def errorHandler(self,result):
        # Bad thingy!
        pass

    def pageCallback(self, result):
        return result

    def listCallback(self, result):
        print result

a = GrabPage('http://www.google.com')
data = a.start() # Not the HTML

I want to get the HTML that is handed to pageCallback when I call start(). This has been a PITA for me. Thanks! Sorry for my poor coding.

Best answer

You're missing the basics of how Twisted works. Everything revolves around the reactor, and you never even run it. Think of the reactor like this:

[Figure: Reactor Loop (source: krondo.com)]

Until you start the reactor, setting up Deferreds only chains them together; there are no events to fire them.
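To make that concrete, here is a toy event loop in plain Python 3. It is only a sketch of the idea and not how Twisted's reactor is implemented; ToyReactor, call_later and on_page are made-up names for illustration. The point is that scheduling a callback just records it, and nothing runs until the loop is started.

```python
from collections import deque


class ToyReactor:
    """Toy stand-in for an event loop; NOT Twisted's reactor."""

    def __init__(self):
        self._pending = deque()  # queued (function, argument) events
        self.running = False

    def call_later(self, func, arg):
        # Scheduling only records the event; nothing executes yet.
        self._pending.append((func, arg))

    def stop(self):
        self.running = False

    def run(self):
        # Events are dispatched only while the loop is running.
        self.running = True
        while self.running and self._pending:
            func, arg = self._pending.popleft()
            func(arg)
        self.running = False


log = []
toy = ToyReactor()


def on_page(body):
    # Hypothetical callback: records the "page" and stops the loop.
    log.append("callback got %r" % body)
    toy.stop()


toy.call_later(on_page, "<html>fake page</html>")
log.append("scheduled, loop not started")
toy.run()
log.append("loop finished")
print(log)
```

The "scheduled" entry lands in the log before the callback entry, which mirrors the question's code: start() returns right away, long before any page data exists.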

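I suggest you give the Twisted Intro by Dave Peticolas a read. It's a quick read, and it really fills in the missing information that the Twisted documentation doesn't provide.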

Anyway, here is the most basic possible usage example of getPage:

from twisted.web.client import getPage
from twisted.internet import reactor

url = 'http://aol.com'

def print_and_stop(output):
    print output
    if reactor.running:
        reactor.stop()

if __name__ == '__main__':
    print 'fetching', url
    d = getPage(url)
    d.addCallback(print_and_stop)
    reactor.run()

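Since getPage returns a Deferred, I add the callback print_and_stop to its callback chain. After that, I start the reactor. The reactor fires getPage, which then fires print_and_stop, which prints the data from aol.com and then stops the reactor.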

Edited to show a working example of the OP's code:

########### Imports carried over from the question:
from twisted.internet import reactor
from twisted.web.client import getPage
from twisted.internet.defer import DeferredList

class GrabPage:
    def __init__(self, page):
        self.page = page
        ########### I added this:
        self.data = None

    def start(self, *args):
        if args == ():
            # We apparently don't need authentication for this
            d1 = getPage(self.page)
        else:
            if len(args) == 2:
                # We have our login information
                d1 = getPage(self.page, headers={"Authorization": " ".join(args)})
            else:
                raise Exception('Missing parameters')

        d1.addCallback(self.pageCallback)
        dl = DeferredList([d1])
        d1.addErrback(self.errorHandler)
        dl.addCallback(self.listCallback)

    def errorHandler(self,result):
        # Bad thingy!
        pass

    def pageCallback(self, result):
        ########### I added this, to hold the data:
        self.data = result
        return result

    def listCallback(self, result):
        print result
        # Added for effect:
        if reactor.running:
            reactor.stop()

a = GrabPage('http://google.com')
########### Just call it without assigning to data
#data = a.start() # Not the HTML
a.start()

########### I added this:
if not reactor.running:
    reactor.run()

########### Reference the data attribute from the class
data = a.data
print '------REACTOR STOPPED------'
print
########### First 100 characters of a.data:
print '------a.data[:100]------'
print data[:100]

Regarding python - Twisted Python getPage, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/2671780/
