python - How to set the upper bound for a Scrapy spider's ReturnsContract

Reposted. Author: 行者123. Updated: 2023-12-01 08:44:01

I want to limit the number of items found on each page.

I found this documentation, which seems to fit my needs:

class scrapy.contracts.default.ReturnsContract

This contract (@returns) sets lower and upper bounds for the items and
requests returned by the spider. The upper bound is optional:

@returns item(s)|request(s) [min [max]]

But I don't understand how to use this class. In my spider, I tried adding

ReturnsContract.__setattr__("max",10)

but it didn't work. Am I missing something?

Best answer

Spider Contracts are meant for testing, not for controlling your data extraction logic.

Testing spiders can get particularly annoying and while nothing prevents you from writing unit tests the task gets cumbersome quickly. Scrapy offers an integrated way of testing your spiders by the means of contracts.

This allows you to test each callback of your spider by hardcoding a sample url and check various constraints for how the callback processes the response. Each contract is prefixed with an @ and included in the docstring.

For your purpose, you can simply set the upper limit in your extraction logic, for example:

response.xpath('//my/xpath').extract()[:10]
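To show where that slice would sit, here is a rough sketch of a callback that yields at most a fixed number of items per page. The constant MAX_ITEMS_PER_PAGE and the item shape are assumptions for illustration, and the callback is written as a plain function here; in a real spider it would be a method taking `self` as well.

```python
# Hypothetical per-page cap; the value 10 mirrors the slice in the answer.
MAX_ITEMS_PER_PAGE = 10


def parse(response):
    """Yield at most MAX_ITEMS_PER_PAGE items from one response."""
    # extract() returns a plain list of strings, so slicing caps its length.
    values = response.xpath('//my/xpath').extract()
    for value in values[:MAX_ITEMS_PER_PAGE]:
        # Assumed item shape: a dict with a single field.
        yield {"value": value}
```

Because the slice is applied to the extracted list, a page with fewer matches than the cap simply yields all of them.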

Source: a similar question about python - how to set the upper bound for a Scrapy spider's ReturnsContract, found on Stack Overflow: https://stackoverflow.com/questions/53380064/
