
python - Scrapy - Activating an Item Pipeline component - the ITEM_PIPELINES setting

Reposted. Author: 行者123. Updated: 2023-11-28 22:44:09

The Scrapy documentation contains the following:

Activating an Item Pipeline component

To activate an Item Pipeline component you must add its class to the ITEM_PIPELINES setting, like in the following example:

ITEM_PIPELINES = {
    'myproject.pipelines.PricePipeline': 300,
    'myproject.pipelines.JsonWriterPipeline': 800,
}

The integer values you assign to classes in this setting determine the order they run in- items go through pipelines from order number low to high. It’s customary to define these numbers in the 0-1000 range.

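I don't understand the last paragraph, mainly the part "determine the order they run in- items go through pipelines from order number low to high". Could you explain it in other words? Why are numbers used at all, and how should I choose values within the 0-1000 range?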

Best answer

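Because a dictionary in Python was historically an unordered collection (insertion order is guaranteed only from Python 3.7 onward) and ITEM_PIPELINES must be a dictionary (like many other settings, for example SPIDER_MIDDLEWARES), you need some explicit way to define the order in which the pipelines are applied. That is why you assign each pipeline you define a number between 0 and 1000. Pipelines with lower numbers run first, so only the relative order of the numbers matters, not their absolute values.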
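To make the numbering concrete, here is a minimal sketch in plain Python (it does not call Scrapy). It reuses the two pipelines from the documentation example and adds a third, ValidatePipeline, which is a hypothetical name introduced only for illustration. Sorting by the assigned numbers shows the order in which items would pass through the pipelines, regardless of the order in which the entries are written.

```python
# The two pipelines from the documentation example, plus one hypothetical
# addition (ValidatePipeline is an assumed name, not part of the original).
ITEM_PIPELINES = {
    "myproject.pipelines.JsonWriterPipeline": 800,
    "myproject.pipelines.PricePipeline": 300,
    "myproject.pipelines.ValidatePipeline": 500,
}

# Sort the class paths by their assigned number, lowest first.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)

for path in run_order:
    print(ITEM_PIPELINES[path], path)
```

Leaving gaps between the numbers (300, 500, 800 rather than 1, 2, 3) is what makes it possible to slot a new pipeline between two existing ones without renumbering them.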

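FYI, if you look at the Scrapy source code, you will find the build_component_list() function, which is called for each setting such as ITEM_PIPELINES. It sorts by the dictionary values and turns the dictionary you defined in ITEM_PIPELINES into a list (an ordered collection):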

# Imports this excerpt relies on
from operator import itemgetter
import six

def build_component_list(base, custom):
    """Compose a component list based on a custom and base dict of components
    (typically middlewares or extensions), unless custom is already a list, in
    which case it's returned.
    """
    if isinstance(custom, (list, tuple)):
        return custom
    # Start from the built-in defaults, then apply the user's overrides
    compdict = base.copy()
    compdict.update(custom)
    # Skip components whose value is None, then sort by the order number
    items = (x for x in six.iteritems(compdict) if x[1] is not None)
    return [x[0] for x in sorted(items, key=itemgetter(1))]
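The same merge-and-sort logic can be tried outside Scrapy. The following is a simplified Python 3 sketch, not Scrapy's current implementation: order_components and builtin.DefaultPipeline are hypothetical names used only for this illustration, and the six dependency is replaced with dict.items().

```python
from operator import itemgetter


def order_components(base, custom):
    """Simplified Python 3 sketch of the merge-and-sort logic quoted above."""
    # A list or tuple is already ordered, so it is returned unchanged.
    if isinstance(custom, (list, tuple)):
        return custom
    # Start from the defaults, then let the custom settings override them.
    merged = dict(base)
    merged.update(custom)
    # Drop entries whose value is None, then sort by the order number.
    enabled = ((path, order) for path, order in merged.items() if order is not None)
    return [path for path, _ in sorted(enabled, key=itemgetter(1))]


# Hypothetical default component, assumed only for this example.
base = {"builtin.DefaultPipeline": 100}
custom = {
    "myproject.pipelines.JsonWriterPipeline": 800,
    "myproject.pipelines.PricePipeline": 300,
    "builtin.DefaultPipeline": None,  # filtered out by the None check
}

ordered = order_components(base, custom)
print(ordered)
```

With these inputs the default component is filtered out because its value is None, and the remaining two pipelines come back with PricePipeline (300) ahead of JsonWriterPipeline (800).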

Regarding python - Scrapy - Activating an Item Pipeline component - the ITEM_PIPELINES setting, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29892547/
