
python - Extracting links from a web page with Python Scrapy

Reposted. Author: 行者123. Updated: 2023-12-01 04:47:29

I'm a Python beginner, and I'm using Scrapy to extract links from the following web page: http://www.basketball-reference.com/leagues/NBA_2015_games.html

This is the code I wrote:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from basketball.items import BasketballItem

class BasketballSpider(CrawlSpider):

    name = 'basketball'
    allowed_domains = ['basketball-reference.com/']
    start_urls = ['http://www.basketball-reference.com/leagues/NBA_2015_games.html']
    rules = [Rule(LinkExtractor(allow=['http://www.basketball-reference.com/boxscores/^\w+$']), 'parse_item')]

    def parse_item(self, response):
        item = BasketballItem()
        item['url'] = response.url
        return item

I ran this code from the command prompt, but the output file it created contains no links. Can anyone help?

Best answer

The links are not being found. Fix the regular expression in your rule:

rules = [
    Rule(LinkExtractor(allow=r'boxscores/\w+'))
]
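To see why the question's pattern finds nothing, here is a minimal sketch using only Python's `re` module. It assumes, as in Scrapy's LinkExtractor, that each `allow` pattern is tested with `re.search` against the absolute URL of every extracted link. The box-score URL below is a hypothetical example of the page's link format, not something taken from the question.

```python
import re

# Hypothetical example of a box-score link URL (assumed format).
url = "http://www.basketball-reference.com/boxscores/201410280LAL.html"

# Pattern from the question: "^" in the middle of the pattern can only
# match at position 0 of the string, so this can never match a full URL.
question_pattern = r"http://www.basketball-reference.com/boxscores/^\w+$"

# Even without the stray "^", "\w+$" fails: the URL ends in ".html",
# and "." is not a word character.
no_caret_pattern = r"boxscores/\w+$"

# Pattern from the answer: unanchored, so re.search finds it anywhere.
answer_pattern = r"boxscores/\w+"

question_match = re.search(question_pattern, url)
no_caret_match = re.search(no_caret_pattern, url)
answer_match = re.search(answer_pattern, url)

print(question_match)          # None
print(no_caret_match)          # None
print(answer_match.group(0))   # boxscores/201410280LAL
```

The answer's pattern works because it only has to occur somewhere in the URL; it makes no claim about what comes before `boxscores/` or after the word characters.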

Also, you don't have to set `callback` when it is `parse_item`; that is the default.

And `allow` can also be given as a single string rather than a list.

For this question about extracting links from a web page with Python Scrapy, see the similar question on Stack Overflow: https://stackoverflow.com/questions/29118470/
