
python - Scrapy: honor rel=nofollow

Reposted. Author: 太空宇宙. Updated: 2023-11-04 09:05:11

Can Scrapy ignore rel="nofollow" links? Looking at sgml.py in Scrapy 0.22, it looks like it can.

How do I enable it?
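To illustrate what a link's "nofollow" flag captures, here is a minimal sketch using only the standard library's html.parser. It is not Scrapy's implementation; NofollowLinkParser and extract_links are hypothetical names. It treats an <a> tag as nofollow when "nofollow" is one of the space- or comma-separated tokens in its rel attribute.

```python
from html.parser import HTMLParser


class NofollowLinkParser(HTMLParser):
    """Hypothetical helper: collect (href, nofollow) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return  # anchors without href are not links to follow
        # rel may be missing, empty, or hold several tokens, e.g. "external nofollow"
        rel = attrs.get("rel") or ""
        nofollow = "nofollow" in rel.replace(",", " ").lower().split()
        self.links.append((href, nofollow))


def extract_links(html):
    """Return a list of (href, nofollow) tuples found in the HTML string."""
    parser = NofollowLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


page = '<a href="/a">A</a> <a href="/b" rel="external nofollow">B</a>'
print(extract_links(page))  # [('/a', False), ('/b', True)]
```

Scrapy's link extractors expose the equivalent information as the nofollow attribute of each extracted Link, which is what the answer below filters on.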

Best answer

Paul was right. This is how I did it:

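```python
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Rule
rules = (  # class attribute on a CrawlSpider subclass
    # Extract and follow all links; 'parse_page' handles each response, and 'links_processor' filters the links before they are followed
    Rule(LinkExtractor(allow=('', '/')), follow=True, callback='parse_page', process_links='links_processor'),
)
```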

Here is the actual function (I'm new to Python, and I'm sure there is a better way to remove items in a for loop without creating a new list):

```python
def links_processor(self, links):
    # Hook into link processing for the current page, so that
    # links marked rel="nofollow" are not followed
    ret_links = list()
    if links:
        for link in links:
            if not link.nofollow:
                ret_links.append(link)
    return ret_links
```
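On the "better way" question: mutating a list while iterating over it is error-prone, and the list the hook returns is what gets followed, so building a new list is the idiomatic approach. A list comprehension does it in one line. The sketch below assumes a minimal stand-in Link dataclass in place of scrapy.link.Link so it runs without Scrapy; in the spider the function would keep its self parameter.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Stand-in for scrapy.link.Link, keeping only the fields used here."""
    url: str
    nofollow: bool = False


def links_processor(links):
    # Keep only links that were not marked rel="nofollow";
    # "links or []" mirrors the original's "if links:" guard for None/empty input
    return [link for link in (links or []) if not link.nofollow]


links = [Link("/a"), Link("/b", nofollow=True), Link("/c")]
print([link.url for link in links_processor(links)])  # ['/a', '/c']
```

If the caller's list really must be changed in place, slice assignment (links[:] = [...]) replaces its contents, though it still builds a temporary list internally.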

Simple as that.

Regarding python - Scrapy: honor rel=nofollow, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21392222/
