
python - Error when running a Scrapy web crawler

Reposted. Author: 行者123. Updated: 2023-11-28 16:37:07

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        for sel in response.xpath('//ul/li'):
            title = sel.xpath('a/text()').extract()
            link = sel.xpath('a/@href').extract()
            desc = sel.xpath('text()').extract()
            print title, link, desc

However, when I try to run the spider, I get the following error message:

[example] ERROR: Spider error processing <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/base.py", line 1178, in mainLoop
    self.runUntilCurrent()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/base.py", line 800, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 368, in callback
    self._startRunCallbacks(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 464, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 551, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/andy2/Documents/Python/tutorial/tutorial/spiders/example.py", line 18, in parse
    print title, link, desc
exceptions.NameError: global name 'link' is not defined

Is there anything I can do to make this code work?

Can anyone help me?

Thanks!!!

Best answer

You need to instantiate a Selector and pass the response to it as an argument. Your import is also incorrect. Here is a fixed version of the spider:

from scrapy.selector import Selector
from scrapy.spider import Spider


class ExampleSpider(Spider):
    name = "example"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        # Wrap the response in a Selector so XPath queries can be run on it.
        sel = Selector(response)
        # Each <li> is queried with paths relative to that list item.
        for li in sel.xpath('//ul/li'):
            title = li.xpath('a/text()').extract()
            link = li.xpath('a/@href').extract()
            desc = li.xpath('text()').extract()
            print title, link, desc
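
The per-item extraction in the loop above can be tried offline without Scrapy. The following is only a minimal sketch, assuming Python 3 and the standard library's xml.etree.ElementTree in place of Scrapy's Selector; the SAMPLE_HTML markup, the example.com URLs, and the extract_items helper are hypothetical and are not part of the original question or answer. It shows the same pattern: select every list item, then read the link text, the href, and the trailing description relative to that item.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a directory listing page.
SAMPLE_HTML = """
<html><body>
  <ul>
    <li><a href="http://example.com/book-a">Book A</a> - An introductory text.</li>
    <li><a href="http://example.com/book-b">Book B</a> - A reference manual.</li>
  </ul>
</body></html>
"""


def extract_items(markup):
    """Return a list of (title, link, desc) tuples, one per linked <li>."""
    root = ET.fromstring(markup)
    items = []
    # ".//ul/li" matches every <li> that is a direct child of any <ul>.
    for li in root.findall(".//ul/li"):
        a = li.find("a")
        if a is None:
            continue  # skip list items that contain no link
        title = (a.text or "").strip()
        link = a.get("href", "")
        # ElementTree stores the text that follows </a> in a.tail.
        desc = (a.tail or "").strip()
        items.append((title, link, desc))
    return items


if __name__ == "__main__":
    for title, link, desc in extract_items(SAMPLE_HTML):
        print(title, link, desc)
```

ElementTree only accepts well-formed markup and supports a limited XPath subset, so this sketch is useful for checking the extraction logic on a small snippet rather than as a replacement for the spider.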

Regarding "python - Error when running a Scrapy web crawler", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24502315/
