
python - Scrapy Spider not following Request callback


I have read Scrapy: Follow link to get additional Item data? and followed it, but it does not work. It is probably a simple mistake, so here is the source code of my spider.

import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector

class MySpider1(Spider):
    name = "timeanddate"
    allowed_domains = ["http://www.timeanddate.com"]
    start_urls = (
        'http://www.timeanddate.com/holidays/',
    )

    def parse(self, response):
        countries = Selector(response).xpath('//div[@class="fixed"]//li/a[contains(@href, "/holidays/")]')

        for item in countries:
            link = item.xpath('@href').extract()[0]
            country = item.xpath('text()').extract()[0]

            linkToFollow = self.allowed_domains[0] + link + "/#!hol=1"

            print link          # link
            print country       # text in a HTML tag
            print linkToFollow

            request = scrapy.Request(linkToFollow, callback=self.parse_page2)

    def parse_page2(self, response):
        print "XXXXXX"
        hxs = HtmlXPathSelector(response)

        print hxs

I am also trying to get the list of all the holidays for each country; that is why I need to visit the second page.

I do not understand why parse_page2 is never called.

Best answer

I was able to get your example working using Link Extractors.

Here is an example:

#-*- coding: utf-8 -*-
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.lxmlhtml import LxmlLinkExtractor

class TimeAndDateSpider(CrawlSpider):
    name = "timeanddate"
    allowed_domains = ["timeanddate.com"]
    start_urls = [
        "http://www.timeanddate.com/holidays/",
    ]

    rules = (
        Rule(LxmlLinkExtractor(restrict_xpaths=('//div[@class="fixed"]//li/a[contains(@href, "/holidays/")]',)),
             callback='second_page'),
    )

    # 2nd page
    def second_page(self, response):
        print "second page - %s" % response.url

I will keep trying to get the Request callback example working as well.
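For what it is worth, the usual cause of this symptom is twofold: parse() builds the Request but never yields it (Scrapy only schedules requests that the callback returns or yields), and allowed_domains contains a full URL instead of a bare domain, so the offsite middleware would filter the request anyway. A minimal sketch of the original spider with those two fixes applied (untested; it assumes the same page structure as in the question):

import urlparse

import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector

class MySpider1(Spider):
    name = "timeanddate"
    # bare domain only -- a URL with a scheme here makes the
    # offsite middleware drop every request as "offsite"
    allowed_domains = ["timeanddate.com"]
    start_urls = (
        'http://www.timeanddate.com/holidays/',
    )

    def parse(self, response):
        countries = Selector(response).xpath('//div[@class="fixed"]//li/a[contains(@href, "/holidays/")]')

        for item in countries:
            link = item.xpath('@href').extract()[0]
            country = item.xpath('text()').extract()[0]
            print country

            # turn the relative href into an absolute URL
            url = urlparse.urljoin(response.url, link)

            # the original code created the Request but never
            # returned it; yielding it hands it to the scheduler
            yield scrapy.Request(url, callback=self.parse_page2)

    def parse_page2(self, response):
        print "parse_page2 called - %s" % response.url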

Regarding "python - Scrapy Spider not following Request callback", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28614496/
