Scrapy: extracting commented-out (hidden) content

Reposted. Author: 行者123. Updated: 2023-12-01 23:23:44

How can I extract content from inside an HTML comment with Scrapy?

For example, how can I extract "Yellow" from the following snippet:

<div class="fruit">
<div class="infos">
<h2 class="Name">Banana</h2>
<span class="edible">Edible: Yes</span>
</div>
<!--
<p class="color">Yellow</p>
-->
</div>

Best answer

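You can use an XPath expression such as //comment() to select the comment nodes, then strip the <!-- and --> markers and parse what remains as HTML with a new Selector.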
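The marker-stripping step is just a regular expression. A rough sketch in plain Python (no Scrapy needed; the function name comment_bodies is hypothetical) shows one caveat: when the pattern is run over raw HTML rather than over one comment node at a time, it should be non-greedy (.*?), otherwise a single match runs from the first <!-- to the last -->.

```python
import re

# Non-greedy (.*?) so that each match stops at the nearest "-->" when the
# input contains more than one comment. DOTALL lets "." match newlines.
COMMENT_RE = re.compile(r'<!--(.*?)-->', re.DOTALL)


def comment_bodies(html):
    """Return the text between each <!-- and --> pair, markers stripped."""
    return COMMENT_RE.findall(html)


doc = '<div><!-- <p class="color">Yellow</p> --><span>x</span><!-- b --></div>'
print(comment_bodies(doc))
# [' <p class="color">Yellow</p> ', ' b ']
```

Inside the Scrapy session, the greedy pattern is harmless because .re() is applied to each comment node separately, so each input string contains exactly one comment.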

Example scrapy shell session:

paul@wheezy:~$ scrapy shell 
...
In [1]: doc = """<div class="fruit">
...: <div class="infos">
...: <h2 class="Name">Banana</h2>
...: <span class="edible">Edible: Yes</span>
...: </div>
...: <!--
...: <p class="color">Yellow</p>
...: -->
...: </div>"""

In [2]: from scrapy.selector import Selector

In [4]: selector = Selector(text=doc, type="html")

In [5]: import re

In [6]: regex = re.compile(r'<!--(.*)-->', re.DOTALL)

In [7]: selector.xpath('//comment()').re(regex)
Out[7]: [u'\n <p class="color">Yellow</p>\n ']

In [8]: comment = selector.xpath('//comment()').re(regex)[0]

In [9]: commentsel = Selector(text=comment, type="html")

In [10]: commentsel.css('p.color')
Out[10]: [<Selector xpath=u"descendant-or-self::p[@class and contains(concat(' ', normalize-space(@class), ' '), ' color ')]" data=u'<p class="color">Yellow</p>'>]

In [11]: commentsel.css('p.color').extract()
Out[11]: [u'<p class="color">Yellow</p>']

In [12]: commentsel.css('p.color::text').extract()
Out[12]: [u'Yellow']
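If you want the same two-step approach outside Scrapy (for instance, to unit-test the logic without installing it), here is a minimal sketch that swaps Scrapy's Selector for Python's standard-library html.parser. HTMLParser.handle_comment already receives the comment body without the <!-- and --> markers, so no regex is needed. The names CommentCollector, ClassTextCollector and extract_hidden are hypothetical, and the sketch assumes matching elements are not nested inside each other.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the body of every <!-- ... --> comment (markers removed)."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


class ClassTextCollector(HTMLParser):
    """Collects text inside <tag class="... cls ..."> elements (no nesting)."""

    def __init__(self, tag, cls):
        super().__init__()
        self.tag, self.cls = tag, cls
        self.inside = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.cls in classes:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.texts.append(data)


def extract_hidden(html, tag, cls):
    """Step 1: find comments. Step 2: parse each comment body as HTML."""
    comments = CommentCollector()
    comments.feed(html)
    comments.close()
    results = []
    for body in comments.comments:
        parser = ClassTextCollector(tag, cls)
        parser.feed(body)
        parser.close()
        results.extend(parser.texts)
    return results


doc = """<div class="fruit">
<div class="infos">
<h2 class="Name">Banana</h2>
<span class="edible">Edible: Yes</span>
</div>
<!--
<p class="color">Yellow</p>
-->
</div>"""

print(extract_hidden(doc, "p", "color"))  # ['Yellow']
```

Because only comment bodies are parsed in step 2, visible elements such as <h2 class="Name"> are ignored; in a real spider you would normally stay with Scrapy's selectors as shown in the session.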

This question, "Scrapy: extracting commented-out (hidden) content", was originally asked on Stack Overflow: https://stackoverflow.com/questions/21830812/