python - Scrapy spider is not scraping the correct div

Reposted. Author: 行者123. Updated: 2023-12-01 02:04:49

import scrapy


class rottenTomatoesSpider(scrapy.Spider):
    name = "movieList"
    start_urls = [
        'https://www.rottentomatoes.com/'
    ]

    def parse(self, response):
        for movieList in response.xpath('//div[@id="homepage-opening-this-week"]'):
            yield {
                'score': response.css('td.left_col').extract_first(),
                'title': response.css('td.middle_col').extract_first(),
                'openingDate': response.css('td.right_col right').extract_first()
            }

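Instead of the div I asked for, the spider is scraping <div id='homepage-tv-top'>.

I assume it is the shared homepage- prefix that confuses the script. Does anyone know a workaround?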

Best answer

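You need to iterate over each tr, and inside the for loop call the selectors on movieList instead of response. response.css(...) searches the entire page, so extract_first() returns the first matching cell anywhere in the document rather than the one inside the div you selected. The shared homepage- prefix is not the cause.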

for movieList in response.xpath('//div[@id="homepage-opening-this-week"]//tr'):
    yield {
        'score': "".join(movieList.css('td.left_col *::text').extract()),
        'title': "".join(movieList.css('td.middle_col *::text').extract()),
        'openingDate': "".join(movieList.css('td.right_col *::text').extract())
    }
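
The effect of querying the whole page versus querying each row can be reproduced without Scrapy or network access. The sketch below is only an illustration of the scoping idea: it swaps in the standard library's xml.etree.ElementTree for Scrapy's selectors, the markup in PAGE is made up (the real page structure may differ), and the helper names cell_text, rows_document_wide, and rows_scoped are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified markup; an assumption, not the real Rotten Tomatoes page.
PAGE = """
<html><body>
  <div id="homepage-tv-top">
    <table>
      <tr><td class="left_col">90%</td><td class="middle_col">Some Show</td></tr>
    </table>
  </div>
  <div id="homepage-opening-this-week">
    <table>
      <tr><td class="left_col">75%</td><td class="middle_col">Movie A</td><td class="right_col">Mar 9</td></tr>
      <tr><td class="left_col">60%</td><td class="middle_col">Movie B</td><td class="right_col">Mar 16</td></tr>
    </table>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())


def cell_text(scope, css_class):
    """Text of the first <td> with this class found anywhere under scope."""
    cell = scope.find(f".//td[@class='{css_class}']")
    return "".join(cell.itertext()).strip() if cell is not None else None


def target_rows():
    """All <tr> elements inside the 'opening this week' div."""
    div = root.find(".//div[@id='homepage-opening-this-week']")
    return div.findall(".//tr")


def rows_document_wide():
    """Like the question: loop over the rows but search from the document root."""
    return [
        {"score": cell_text(root, "left_col"), "title": cell_text(root, "middle_col")}
        for _ in target_rows()
    ]


def rows_scoped():
    """Like the answer: search relative to each row."""
    return [
        {"score": cell_text(row, "left_col"), "title": cell_text(row, "middle_col")}
        for row in target_rows()
    ]


print(rows_document_wide())  # every item repeats the first cells on the page
print(rows_scoped())         # one item per row of the target div
```

In the document-wide version every item carries the values from homepage-tv-top, because that div comes first in the page. That matches the symptom described in the question.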

Regarding "python - Scrapy spider is not scraping the correct div", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49188298/
