
shell - Scrapy: data retrieved with XPath in the shell, but not in the item

Reposted. Author: 行者123. Updated: 2023-12-03 17:36:40

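I am using Scrapy to build a simple web scraper that collects a football team's results from the BBC website. The relevant HTML on the page (http://www.bbc.com/sport/football/teams/bolton-wanderers/results) looks like this: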

<tr class="report" id="match-row-EFBO755964">
<td class="statistics show" title="Show latest match stats">
<button>Show</button>
</td>
<td class="match-competition"> Championship </td>
<td class="match-details teams">
<p>
<span class="team-home teams"> <a href="/sport/football/teams/huddersfield-town">Huddersfield</a> </span>
<span class="score"> <abbr title="Score"> 2-1 </abbr> </span>
<span class="team-away teams"> <a href="/sport/football/teams/bolton-wanderers">Bolton</a> </span>
</p>
</td>
<td class="match-date"> Sun 28 Dec </td>
<td class="time"> Full time </td>
<td class="status"> <a class="report" href="/sport/football/30566395">Report</a>
</td>
</tr>


When I scrape it in the Scrapy shell, the output is as follows:

$ scrapy shell http://www.bbc.com/sport/football/teams/bolton-wanderers/results

>>> response.selector.xpath('//tr[@class="report"]/td[@class="match-date"]/text()').extract()
[u' Sun 28 Dec ', u' Fri 26 Dec ', u' Fri 19 Dec ', u' Sat 13 Dec ',...]
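The shell query starts with //, so it searches the whole document and does not depend on which node it is evaluated from. Below is a minimal sketch of that behaviour. It uses Python's standard-library xml.etree.ElementTree as a stand-in for Scrapy's selectors, which is an assumption for illustration only: ElementTree needs well-formed markup, supports only a subset of XPath, and has no text() step, so the sketch reads .text instead. The two-row table is a hypothetical, trimmed fixture modeled on the HTML above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed copy of the results table from the question.
# ElementTree stands in for Scrapy's selector here (assumption): it needs
# well-formed markup and supports only a subset of XPath (no text() step).
HTML = """
<table>
  <tr class="report" id="match-row-EFBO755964">
    <td class="match-competition"> Championship </td>
    <td class="match-date"> Sun 28 Dec </td>
  </tr>
  <tr class="report">
    <td class="match-competition"> Championship </td>
    <td class="match-date"> Fri 26 Dec </td>
  </tr>
</table>
"""

root = ET.fromstring(HTML.strip())

# ".//" searches every descendant of the start node. Started from the root,
# that covers the whole table, much like the shell's "//tr[...]" query.
dates = [td.text for td in root.findall(".//tr[@class='report']/td[@class='match-date']")]
print(dates)  # [' Sun 28 Dec ', ' Fri 26 Dec ']
```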


However, when I use the same XPath in the spider, I cannot get these dates.
Here is the item:

import scrapy

class resultsItem(scrapy.Item):
    date = scrapy.Field()
    homeTeam = scrapy.Field()
    score = scrapy.Field()
    awayTeam = scrapy.Field()


Here is the spider:

class resultsSpider(scrapy.Spider):
    name = "results"
    allowed_domains = ["bbc.com"]
    start_urls = ["http://www.bbc.com/sport/football/teams/bolton-wanderers/results"]

    def parse(self, response):
        for sel in response.xpath('//tr[@class="report"]'):
            game = resultsItem()
            game['homeTeam'] = sel.xpath('td[@class="match-details teams"]/p/span[@class="team-home teams"]/a/text()').extract()
            game['score'] = sel.xpath('td[@class="match-details teams"]/p/span[@class="score"]/abbr/text()').extract()
            game['awayTeam'] = sel.xpath('td[@class="match-details teams"]/p/span[@class="team-away teams"]/a/text()').extract()
            game['date'] = response.xpath('td[@class="match-date"]/text()').extract()

            yield game


Finally, the output JSON:

[{"date": [], "awayTeam": ["Bolton"], "homeTeam": ["Huddersfield"], "score": [" 2-1 "]},
{"date": [], "awayTeam": ["Blackburn"], "homeTeam": ["Bolton"], "score": [" 2-1 "]},...


Why can't I get the dates in the spider, even though the same XPath works in the shell?

Best answer

Shouldn't it be

game['date'] = sel.xpath('td[@class="match-date"]/text()').extract()


instead of

game['date'] = response.xpath('td[@class="match-date"]/text()').extract()


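since you are inside this loop?

for sel in response.xpath('//tr[@class="report"]'):

A relative XPath such as td[@class="match-date"] is evaluated against the node it is called on. Inside the loop, sel is the current <tr> row, so the path finds that row's match-date cell. Called on response, the path is resolved from the top of the document, which has no td children, so the result is an empty list; that is why "date" is [] in the JSON output. The shell query worked because it starts with //, which searches the whole document.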
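A minimal sketch of the difference, again using Python's standard-library xml.etree.ElementTree as a stand-in for Scrapy's selectors (an assumption for illustration; the fixture is a hypothetical, trimmed copy of the row from the question). The same relative path finds nothing when searched from the table root, but finds the cell when searched from each row. The hypothetical helper first_text also strips surrounding whitespace and returns a single string instead of a list. That is a design choice of this sketch, not what .extract() returns.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed fixture based on the row shown in the question.
# ElementTree is a stand-in for Scrapy's selectors (assumption): it needs
# well-formed markup and supports only a subset of XPath (no text() step).
HTML = """
<table>
  <tr class="report">
    <td class="match-details teams">
      <p>
        <span class="team-home teams"> <a href="/sport/football/teams/huddersfield-town">Huddersfield</a> </span>
        <span class="score"> <abbr title="Score"> 2-1 </abbr> </span>
        <span class="team-away teams"> <a href="/sport/football/teams/bolton-wanderers">Bolton</a> </span>
      </p>
    </td>
    <td class="match-date"> Sun 28 Dec </td>
  </tr>
</table>
"""

root = ET.fromstring(HTML.strip())

# The bug: a relative path is resolved from the node it is called on.
# The table root has only <tr> children, so no <td> matches.
print(root.findall("td[@class='match-date']"))  # []

def first_text(node, path):
    """Hypothetical helper: stripped text of the first match, or None."""
    found = node.find(path)
    return found.text.strip() if found is not None and found.text else None

games = []
for row in root.findall(".//tr[@class='report']"):
    # The fix: every path is relative to the current row, like sel.xpath(...).
    details = "td[@class='match-details teams']/p"
    games.append({
        "homeTeam": first_text(row, details + "/span[@class='team-home teams']/a"),
        "score": first_text(row, details + "/span[@class='score']/abbr"),
        "awayTeam": first_text(row, details + "/span[@class='team-away teams']/a"),
        "date": first_text(row, "td[@class='match-date']"),
    })

print(games)
# [{'homeTeam': 'Huddersfield', 'score': '2-1', 'awayTeam': 'Bolton', 'date': 'Sun 28 Dec'}]
```

In the real spider, the equivalent fix is the one-line change the answer proposes: call sel.xpath(...) instead of response.xpath(...) for the date.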

A similar question about "shell - Scrapy: data retrieved with XPath in the shell, but not in the item" can be found on Stack Overflow: https://stackoverflow.com/questions/27707366/
