python - How do I fix "TypeError: 'WikipediaItem' object does not support item assignment"?


Reposted by: 太空宇宙. Updated: 2023-11-03 14:28:25

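I am new to both Python and Scrapy. I am trying to scrape data from Wikipedia, but without success. Every time I run scrapy crawl wiki, I get "TypeError: 'WikipediaItem' object does not support item assignment". How can I fix this and successfully scrape the details from Wikipedia?

Anyway, here is my code: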

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from wikipedia.items import WikipediaItem

class WikipediaItem(BaseSpider):
    name = "wiki"
    allowed_domains = ["wikipedia.org"]
    start_urls = ["http://en.wikipedia.org/wiki/Main_Page"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//table[@id="mp-upper"]/tr')
        items = []
        for site in sites:
            item = WikipediaItem()
            item['title'] = site.select('.//a[@class="MainPageBG"]/text()').extract()
            item['link'] = site.select('.//a[@class="MainPageBG"]').extract()
            item['details'] = site.select('.//p/text()').extract()
            items.append(item)
        return items

This is the output I get:

2013-04-18 23:56:54+0800 [scrapy] INFO: Scrapy 0.14.4 started (bot: wikipedia)
2013-04-18 23:56:54+0800 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, MemoryUsage, SpiderState
2013-04-18 23:56:54+0800 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-04-18 23:56:54+0800 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware    
2013-04-18 23:56:54+0800 [scrapy] DEBUG: Enabled item pipelines: 
2013-04-18 23:56:54+0800 [wiki] INFO: Spider opened
2013-04-18 23:56:54+0800 [wiki] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-04-18 23:56:54+0800 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2013-04-18 23:56:54+0800 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2013-04-18 23:56:56+0800 [wiki] DEBUG: Crawled (200) <GET http://en.wikipedia.org/wiki/Main_Page> (referer: None)
2013-04-18 23:56:56+0800 [wiki] ERROR: Spider error processing <GET http://en.wikipedia.org/wiki/Main_Page>
    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1178, in mainLoop
        self.runUntilCurrent()
      File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 800, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 368, in callback
        self._startRunCallbacks(result)
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 464, in _startRunCallbacks
        self._runCallbacks()
    --- <exception caught here> ---
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 551, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/home/jean/wiki/wikipedia/spiders/wikipedia_spider.py", line 17, in parse
        item['title'] = row.select('.//a[@class="MainPageBG"]/text()').extract()
    exceptions.TypeError: 'WikipediaItem' object does not support item assignment
2013-04-18 23:56:56+0800 [wiki] INFO: Closing spider (finished)
2013-04-18 23:56:56+0800 [wiki] INFO: Dumping spider stats:
    {'downloader/request_bytes': 215,
     'downloader/request_count': 1,    
     'downloader/request_method_count/GET': 1,
     'downloader/response_bytes': 17762,
     'downloader/response_count': 1,
     'downloader/response_status_count/200': 1,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2013, 4, 18, 15, 56, 56, 244255),    
     'scheduler/memory_enqueued': 1,
     'spider_exceptions/TypeError': 1,
     'start_time': datetime.datetime(2013, 4, 18, 15, 56, 54, 592948)}
2013-04-18 23:56:56+0800 [wiki] INFO: Spider closed (finished)
2013-04-18 23:56:56+0800 [scrapy] INFO: Dumping global stats:
    {'memusage/max': 28065792, 'memusage/startup': 28065792}

Here is my items.py:

from scrapy.item import Item, Field

class WikipediaItem(Item):
    title = Field()
    link = Field()
    details = Field()

Best answer

You gave your spider class the same name as the WikipediaItem you imported:

from wikipedia.items import WikipediaItem

class WikipediaItem(BaseSpider):
    # ...

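As a result, the WikipediaItem() call in parse instantiates your BaseSpider subclass rather than the item class defined in wikipedia.items: the class statement rebinds the name and shadows the import. A spider object does not support item assignment, hence the TypeError. You probably want to rename the class: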

class WikipediaSpider(BaseSpider):
    # ...
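
The name collision can be reproduced without Scrapy. The following is only a minimal sketch: DictLikeItem and FakeSpiderBase are hypothetical stand-ins for scrapy.item.Item and BaseSpider, not the real classes, and try_assign is a helper introduced purely for illustration. It shows that item assignment works while the name still refers to the item class and fails once a class of the same name is defined afterwards.

```python
# Scrapy-free sketch of the name collision. All class names except
# WikipediaItem are hypothetical stand-ins, not Scrapy APIs.

class DictLikeItem(object):
    """Stand-in for scrapy.item.Item: supports item assignment."""

    def __init__(self):
        self._values = {}

    def __setitem__(self, key, value):
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]


class FakeSpiderBase(object):
    """Stand-in for BaseSpider: defines no __setitem__."""


def try_assign(cls):
    """Instantiate cls, try item assignment, return the error text or None."""
    obj = cls()
    try:
        obj['title'] = 'Main Page'
    except TypeError as exc:
        return str(exc)
    return None


# Plays the role of `from wikipedia.items import WikipediaItem`.
WikipediaItem = DictLikeItem

before = try_assign(WikipediaItem)  # the name still points at the item class


# A class statement with the same name rebinds WikipediaItem in this module.
class WikipediaItem(FakeSpiderBase):
    name = "wiki"


after = try_assign(WikipediaItem)   # the name now points at the spider class

print(before)
print(after)
```

Renaming the second class (for example to WikipediaSpider, as suggested above) leaves the imported name untouched, so the assignment inside parse reaches the real item class again.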

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/16088022/
