Scrapy HtmlXPathSelector

Reposted by: 行者123 | Updated: 2023-12-04 16:35:17

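I'm just trying out Scrapy and trying to get a basic spider working. I know I'm probably missing something simple, but I've tried everything I can think of.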

The error I get is:

line 11, in JustASpider
    sites = hxs.select('//title/text()')
NameError: name 'hxs' is not defined

My code is very basic at the moment, but I still can't find where I'm going wrong. Thanks for your help!

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class JustASpider(BaseSpider):
    name = "google.com"
    start_urls = ["http://www.google.com/search?hl=en&q=search"]


    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//title/text()')
        for site in sites:
            print site.extract()


SPIDER = JustASpider()

Best answer

The code looks quite old. I suggest using this instead:

from scrapy import Spider
from scrapy.selector import Selector

class JustASpider(Spider):
    name = "googlespider"
    allowed_domains = ["google.com"]
    start_urls = ["http://www.google.com/search?hl=en&q=search"]

    def parse(self, response):
        # Selector replaces the old HtmlXPathSelector.
        sel = Selector(response)
        # .xpath() replaces the old .select(); .extract() returns a list of strings.
        sites = sel.xpath('//title/text()').extract()
        print(sites)
        # No loop is needed here: extract() already returns the text of
        # every matching title element, so there is nothing left to extract per item.

Hope that helps, and here is a good example to follow.
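
A side note on the traceback in the question: "in JustASpider" names the class body, not parse, which suggests the failing line was executed at class level. One possible cause, and this is only an assumption since the posted code looks correctly indented, is inconsistent indentation (for example mixed tabs and spaces) in the original file. The sketch below uses a hypothetical BrokenSpider class to reproduce that kind of error in plain Python.

```python
import traceback

error_message = None
frame_name = None

try:
    class BrokenSpider:
        def parse(self, response):
            # hxs is a local variable of parse() only.
            hxs = response

        # Dedented by mistake: this line runs while the class body is
        # being executed, where no name hxs exists.
        sites = hxs
except NameError as exc:
    error_message = str(exc)
    # The innermost traceback frame is the class body, named after the class.
    frame_name = traceback.extract_tb(exc.__traceback__)[-1].name

print(error_message)
print(frame_name)
```

If the traceback names the class rather than the method, checking the indentation of the reported line is a reasonable first step.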

Regarding Scrapy HtmlXPathSelector, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12254740/
