
Scrapy HtmlXPathSelector

Reposted. Author: 行者123. Updated: 2023-12-04 16:35:17

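I'm just trying out Scrapy and attempting to get a basic spider working. I know I'm probably missing something obvious, but I've tried everything I can think of.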

The error I get is:

line 11, in JustASpider
sites = hxs.select('//title/text()')
NameError: name 'hxs' is not defined

My code is very basic at the moment, but I still can't seem to find where I'm going wrong. Thanks for your help!
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class JustASpider(BaseSpider):
    name = "google.com"
    start_urls = ["http://www.google.com/search?hl=en&q=search"]


    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//title/text()')
        for site in sites:
            print site.extract()


SPIDER = JustASpider()

Best answer

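The code looks outdated: BaseSpider and HtmlXPathSelector have since been deprecated in favor of Spider and Selector. I suggest using this instead: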

from scrapy import Spider
from scrapy.selector import Selector

class JustASpider(Spider):
    name = "googlespider"
    allowed_domains = ["google.com"]
    start_urls = ["http://www.google.com/search?hl=en&q=search"]

    def parse(self, response):
        sel = Selector(response)
        # extract() returns a list with the text of every matching node
        sites = sel.xpath('//title/text()').extract()
        print(sites)
        # for site in sites: (I don't know why you want to loop to extract the text of the title element)
        #     print(site.extract())

Hope this helps, and here is a good example to follow.
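
As for the NameError itself: the traceback frame reads "in JustASpider" rather than "in parse", which suggests the failing line ran while the class body was being defined, not inside the method. One possible cause (a guess, the question does not confirm it) is inconsistent indentation, for example mixed tabs and spaces. The sketch below reproduces the same kind of error with hypothetical names (build_broken_class, Broken) and no Scrapy at all.

```python
def build_broken_class():
    """Define a class where one statement has slipped out of the method body."""

    class Broken(object):
        def parse(self, response):
            hxs = response.upper()  # hxs is local to parse()

        # This line sits at class-body level, so it runs at class-definition
        # time, where no name hxs exists.
        sites = hxs

    return Broken


try:
    build_broken_class()
    error_message = None
except NameError as exc:
    error_message = str(exc)

print(error_message)
```

If that is what happened, re-indenting the body of parse consistently (spaces only) should make the error go away.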

Regarding Scrapy HtmlXPathSelector, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/12254740/
