gpt4 book ai didi

Scrapy 教程异常

转载 作者:行者123 更新时间:2023-12-04 05:40:29 25 4
gpt4 key购买 nike

我正在关注 Scrapy 教程文档 http://media.readthedocs.org/pdf/scrapy/0.14/scrapy.pdf并且我已经验证 items.py 和 dmoz_spider.py 输入(不是剪切和粘贴)正确。

对我来说,第一个“嗯……”部分是这个指令:

This is the code for our first Spider; save it in a file named dmoz_spider.py under the dmoz/spiders directory



我使用的是最新版本的 Ubuntu,但没有创建 dmoz 文件夹,所以我将此代码放入 ~/tutorial/tutorial/spiders.txt 中。 (这是我的第一个错误吗?)

所以这是我的 dmoz_spider.py 脚本:
from scrapy.spider import BaseSpider

class DmozSpider(BaseSpider):
name = "dmoz"
allowed_domains = ["dmoz.org"]
start_urls = [
"http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
"http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
]

def parse(self, response):
filename = response.url.split("/")[-2]
open(filename, 'wb').write(response.body)

在我的终端中,我输入
scrapy crawl dmoz

我明白了:
2012-10-08 13:20:22-0700 [scrapy] INFO: Scrapy 0.12.0.2546 started (bot: tutorial)
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled extensions: TelnetConsole, SpiderContext, WebService, CoreStats, MemoryUsage, CloseSpider
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled scheduler middlewares: DuplicatesFilterMiddleware
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, DownloaderStats
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled item pipelines:
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2012-10-08 13:20:22-0700 [dmoz] INFO: Spider opened
2012-10-08 13:20:22-0700 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
2012-10-08 13:20:22-0700 [dmoz] ERROR: Spider error processing <http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: <None>)
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1178, in mainLoop
self.runUntilCurrent()
File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 800, in runUntilCurrent
call.func(*call.args, **call.kw)
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback
self._startRunCallbacks(result)
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/usr/lib/python2.7/dist-packages/scrapy/spider.py", line 62, in parse
raise NotImplementedError
exceptions.NotImplementedError:

2012-10-08 13:20:22-0700 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
2012-10-08 13:20:22-0700 [dmoz] ERROR: Spider error processing <http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: <None>)
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1178, in mainLoop
self.runUntilCurrent()
File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 800, in runUntilCurrent
call.func(*call.args, **call.kw)
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback
self._startRunCallbacks(result)
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/usr/lib/python2.7/dist-packages/scrapy/spider.py", line 62, in parse
raise NotImplementedError
exceptions.NotImplementedError:

2012-10-08 13:20:22-0700 [dmoz] INFO: Closing spider (finished)
2012-10-08 13:20:22-0700 [dmoz] INFO: Spider closed (finished)

在我的搜索中,我看到其他人说可能没有安装扭曲......但是如果我使用 Scrapy 的 Ubuntu 软件包安装程序,它不会安装吗?

提前致谢!

最佳答案

BaseSpider 中的 parse 方法被调用而不是您的方法,因为您没有正确覆盖 parse 方法。您的缩进是错误的,因此 parse 被声明为 DmozSpider 类之外的函数。欢迎使用 python :)

这与twisted 无关,我可以看到twisted 在回溯中,所以它显然是安装的。

关于Scrapy 教程异常,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/12789024/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com