python - How do I get rid of the exceptions.TypeError error?

Reposted by: 行者123 · Updated: 2023-11-30 23:17:40

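I am writing a crawler with Scrapy. One thing I want it to do is compare the root domain of the current web page with the root domain of each link on that page. If the domains differ, it should go ahead and extract the data. This is my current code: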

class MySpider(Spider):
    name = 'smm'
    allowed_domains = ['*']
    start_urls = ['http://en.wikipedia.org/wiki/Social_media']
    def parse(self, response):
        items = []
        for link in response.xpath("//a"):
            #Extract the root domain for the main website from the canonical URL
            hostname1 = link.xpath('/html/head/link[@rel=''canonical'']').extract()
            hostname1 = urlparse(hostname1).hostname
            #Extract the root domain for the link
            hostname2 = link.xpath('@href').extract()
            hostname2 = urlparse(hostname2).hostname
            #Compare if the root domain of the website and the root domain of the link are different.
            #If so, extract the items & build the dictionary 
            if hostname1 != hostname2:
                item = SocialMediaItem()
                item['SourceTitle'] = link.xpath('/html/head/title').extract()
                item['TargetTitle'] = link.xpath('text()').extract()
                item['link'] = link.xpath('@href').extract()
                items.append(item)
        return items

However, when I run it, I get this error:

Traceback (most recent call last):
  File "C:\Anaconda\lib\site-packages\twisted\internet\base.py", line 1201, in mainLoop
    self.runUntilCurrent()
  File "C:\Anaconda\lib\site-packages\twisted\internet\base.py", line 824, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "C:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 382, in callback
    self._startRunCallbacks(result)
  File "C:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 490, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "C:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "E:\Usuarios\Daniel\GitHub\SocialMedia-Web-Scraper\socialmedia\socialmedia\spiders\SocialMedia.py", line 16, in parse
    hostname1 = urlparse(hostname1).hostname
  File "C:\Anaconda\lib\urlparse.py", line 143, in urlparse
    tuple = urlsplit(url, scheme, allow_fragments)
  File "C:\Anaconda\lib\urlparse.py", line 176, in urlsplit
    cached = _parse_cache.get(key, None)
exceptions.TypeError: unhashable type: 'list'

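Can anyone help me fix this error? I think it has something to do with list keys, but I don't know how to solve it. Thank you very much!

Dani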

Best answer

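There are a few problems here:

- There is no need to compute hostname1 inside the loop, because it always selects the same link element, even when called on a sub-selector. The XPath expression starts with /, so it is absolute rather than relative, which is what you need here.
- The XPath expression for hostname1 is malformed and matches nothing, which is why taking only the first element, as Kevin suggested, raises an error. The expression contains two consecutive single quotes where it needs either an escaped single quote or a double-quoted string.
- You are selecting the link element itself (the one with rel='canonical') when you should be selecting its @href attribute. The XPath expression should be changed to reflect this.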
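
The quoting problem and the TypeError itself can both be reproduced in plain Python, without Scrapy. The sketch below is only an illustration: `parse_cache`, `cache_lookup_error` and the key layout are hypothetical stand-ins for the cache lookup shown in the last frame of the traceback, not the actual internals of the urlparse module.

```python
# 1) Quoting: '' closes one string literal and opens the next, so Python
#    concatenates the pieces and the quote characters disappear.
broken = '/html/head/link[@rel=''canonical'']'      # as written in the question
fixed = "/html/head/link[@rel='canonical']/@href"   # quoted value, selects @href

print(broken)  # /html/head/link[@rel=canonical]
print(fixed)   # /html/head/link[@rel='canonical']/@href

# 2) The TypeError: the traceback ends in `_parse_cache.get(key, None)`,
#    a dictionary lookup. A list cannot be part of a dictionary key.
parse_cache = {}  # hypothetical stand-in for the cache used by urlparse


def cache_lookup_error(url):
    """Return the TypeError message raised by the lookup, or None if it succeeds."""
    key = (url, "", True)  # hypothetical key layout, for illustration only
    try:
        parse_cache.get(key, None)
    except TypeError as exc:
        return str(exc)
    return None


extracted = ["http://en.wikipedia.org/wiki/Social_media"]  # .extract() returns a list
print(cache_lookup_error(extracted))     # unhashable type: 'list'
print(cache_lookup_error(extracted[0]))  # None
```

Without the quotes, XPath reads canonical as the name of a child element rather than as a string, so the predicate matches no link element. Passing the first element of the extracted list, rather than the list itself, is what avoids the unhashable-key error.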

With these problems fixed, the code might look like this (untested):

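    def parse(self, response):
        items = []
        # The canonical URL is the same for every link, so compute it once.
        # extract() returns a list; take its first element before parsing.
        hostname1 = response.xpath("/html/head/link[@rel='canonical']/@href").extract()[0]
        hostname1 = urlparse(hostname1).hostname

        for link in response.xpath("//a"):
            # Fall back to '' for <a> elements that have no href attribute.
            hostname2 = (link.xpath('@href').extract() or [''])[0]
            hostname2 = urlparse(hostname2).hostname
            #Compare and extract
            if hostname1 != hostname2:
                ...
        return items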
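
The comparison logic can also be tried on its own. The sketch below is an illustration under stated assumptions, not the answer's code: it uses Python 3's `urllib.parse` (the question runs on Python 2, where the module is `urlparse`), the helper names `first_or_empty` and `is_external` are hypothetical, and resolving relative links with `urljoin` is an addition that the answer does not mention. Without that step, a relative href such as `/wiki/Blog` has no hostname and would compare as different from the page's own domain.

```python
from urllib.parse import urljoin, urlparse


def first_or_empty(values):
    """Return the first extracted value, or '' when the list is empty."""
    return (values or [""])[0]


def is_external(page_url, href):
    """Return True if href points to a different hostname than page_url."""
    absolute = urljoin(page_url, href)  # resolve relative links against the page
    return urlparse(absolute).hostname != urlparse(page_url).hostname


page = "http://en.wikipedia.org/wiki/Social_media"
# Lists shaped like what link.xpath('@href').extract() would return (assumed sample data).
hrefs = [["/wiki/Blog"], ["http://www.example.com/report"], []]

for values in hrefs:
    href = first_or_empty(values)
    print(repr(href), is_external(page, href))
```

Here only the `www.example.com` link is reported as external; the relative link and the empty href both resolve to the page's own hostname.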

Regarding "python - How do I get rid of the exceptions.TypeError error?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27231751/
