
Python - Categorizing items in a list based on their occurrence in a dictionary of lists

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:01:33

I have a dataset like this (simplified):

foods_dict = {}
foods_dict['fruit'] = ['apple', 'orange', 'plum']
foods_dict['veg'] = ['cabbage', 'potato', 'carrot']

I have a list of items that I want to categorize:

items = ['orange', 'potato', 'cabbage', 'plum', 'farmer', 'egg']

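I would like to split the entries of the items list into smaller lists according to where they occur in foods_dict. I think these sub-lists should actually be sets, since I do not want any duplicates in them.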

My first pass at the code looks like this:

fruits = set()
veggies = set()
others = set()
for item in items:
    if item in foods_dict.get('fruit'):
        fruits.add(item)
    elif item in foods_dict.get('veg'):
        veggies.add(item)
    else:
        others.add(item)

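But this seems really inefficient and unnecessarily verbose to me. My question is: how can this code be improved? I suspect a list comprehension could be useful here, but I am not sure how to handle the number of lists.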

Best answer

For an efficient solution, you want to avoid explicit loops as much as possible:

items = set(items)
fruits = set(foods_dict['fruit']) & items
veggies = set(foods_dict['veg']) & items
others = items - fruits - veggies

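This will almost certainly be faster than using an explicit loop. In particular, if the fruit list is long, evaluating item in foods_dict['fruit'] is time-consuming, because a membership test on a list scans it element by element.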
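The question also mentions not knowing the number of lists in advance. The same set operations extend to any number of categories; the following is a minimal sketch, where the helper name `categorize` and its return shape are assumptions made for illustration and do not come from the original answer. It still loops over the categories, but not over the items.

```python
def categorize(items, categories):
    """Split items into one set per category plus a set of leftovers.

    `categories` maps a category name to an iterable of its members.
    (Hypothetical helper, not part of the original answer.)
    """
    remaining = set(items)
    result = {}
    for name, members in categories.items():
        # Set intersection replaces the per-item membership test.
        result[name] = set(members) & remaining
    # Anything that matched no category at all is left over.
    matched = set().union(*result.values())
    return result, remaining - matched


foods_dict = {
    'fruit': ['apple', 'orange', 'plum'],
    'veg': ['cabbage', 'potato', 'carrot'],
}
items = ['orange', 'potato', 'cabbage', 'plum', 'farmer', 'egg']

grouped, others = categorize(items, foods_dict)
print(grouped)
print(others)
```

One behavioral difference is worth noting: with the if/elif loop from the question, an item listed under both 'fruit' and 'veg' would land only in fruits, whereas with intersections it appears in both sets.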


A very simple benchmark of the solutions proposed so far:

In [5]: %%timeit
   ...: items2 = set(items)
   ...: fruits = set(foods_dict['fruit']) & items2
   ...: veggies = set(foods_dict['veg']) & items2
   ...: others = items2 - fruits - veggies
   ...:
1000000 loops, best of 3: 1.75 us per loop

In [6]: %%timeit
   ...: fruits = set()
   ...: veggies = set()
   ...: others = set()
   ...: for item in items:
   ...:     if item in foods_dict.get('fruit'):
   ...:         fruits.add(item)
   ...:     elif item in foods_dict.get('veg'):
   ...:         veggies.add(item)
   ...:     else:
   ...:         others.add(item)
   ...:
100000 loops, best of 3: 2.57 us per loop

In [7]: %%timeit
   ...: veggies = set(elem for elem in items if elem in foods_dict['veg'])
   ...: fruits = set(elem for elem in items if elem in foods_dict['fruit'])
   ...: others = set(items) - veggies - fruits
   ...:
100000 loops, best of 3: 3.34 us per loop

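Of course, you should run some tests with "real input" before choosing. I do not know how many elements your problem involves, and the timings can change considerably as the input grows. In any case, my experience is that, at least in CPython, explicit loops tend to be slower than using only built-in operations.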
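Outside IPython, a comparison like this can be run with the standard-library timeit module. The following is only a sketch: the function names with_loop, with_sets and make_input are hypothetical, and the generated data is a stand-in that should be replaced with input resembling your real workload.

```python
import timeit


def with_loop(items, foods_dict):
    """Explicit loop with list membership tests, as in the question."""
    fruits, veggies, others = set(), set(), set()
    for item in items:
        if item in foods_dict['fruit']:
            fruits.add(item)
        elif item in foods_dict['veg']:
            veggies.add(item)
        else:
            others.add(item)
    return fruits, veggies, others


def with_sets(items, foods_dict):
    """Set intersection and difference, as in the answer."""
    items2 = set(items)
    fruits = set(foods_dict['fruit']) & items2
    veggies = set(foods_dict['veg']) & items2
    return fruits, veggies, items2 - fruits - veggies


def make_input(size):
    """Hypothetical stand-in data: even numbers are 'fruit', odd are 'veg'."""
    foods_dict = {
        'fruit': list(range(0, size, 2)),
        'veg': list(range(1, size, 2)),
    }
    items = list(range(5, size, 13))
    return items, foods_dict


for size in (100, 2000):
    items, foods_dict = make_input(size)
    # Both versions must agree before their timings are worth comparing.
    assert with_loop(items, foods_dict) == with_sets(items, foods_dict)
    for func in (with_loop, with_sets):
        seconds = timeit.timeit(lambda: func(items, foods_dict), number=10)
        per_call_us = seconds / 10 * 1e6
        print(f"size={size:>5} {func.__name__:<10} {per_call_us:9.1f} us per call")
```

The absolute numbers depend on the machine and the Python version, so only the trend across input sizes is meaningful.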


Edit2: an example with larger input:

In [9]: foods_dict = {}
   ...: foods_dict['fruit'] = list(range(0, 10000, 2))
   ...: foods_dict['veg'] = list(range(1, 10000, 2))

In [10]: items = list(range(5, 10000, 13))  # some odd, some even

In [11]: %%timeit
    ...: fruits = set()
    ...: veggies = set()
    ...: others = set()
    ...: for item in items:
    ...:     if item in foods_dict.get('fruit'):
    ...:         fruits.add(item)
    ...:     elif item in foods_dict.get('veg'):
    ...:         veggies.add(item)
    ...:     else:
    ...:         others.add(item)
    ...:
10 loops, best of 3: 68.8 ms per loop

In [12]: %%timeit
    ...: veggies = set(elem for elem in items if elem in foods_dict['veg'])
    ...: fruits = set(elem for elem in items if elem in foods_dict['fruit'])
    ...: others = set(items) - veggies - fruits
    ...:
10 loops, best of 3: 99.9 ms per loop

In [13]: %%timeit
    ...: items2 = set(items)
    ...: fruits = set(foods_dict['fruit']) & items2
    ...: veggies = set(foods_dict['veg']) & items2
    ...: others = items2 - fruits - veggies
    ...:
1000 loops, best of 3: 445 us per loop

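As you can see, with this input using only built-in operations is roughly 150 times faster than the explicit loop (445 us versus 68.8 ms).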
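If the if/elif precedence of the original loop matters, a middle ground is to keep the loop but test membership against sets built once up front. This is a sketch of that variant, not something measured in the answer; the names fruit_set and veg_set are introduced here for illustration.

```python
foods_dict = {
    'fruit': list(range(0, 10000, 2)),
    'veg': list(range(1, 10000, 2)),
}
items = list(range(5, 10000, 13))  # some odd, some even

# Build each lookup set once, outside the loop.
fruit_set = set(foods_dict['fruit'])
veg_set = set(foods_dict['veg'])

fruits, veggies, others = set(), set(), set()
for item in items:
    if item in fruit_set:    # hash lookup instead of scanning a list
        fruits.add(item)
    elif item in veg_set:
        veggies.add(item)
    else:
        others.add(item)
```

This keeps the "first matching category wins" behavior while removing the linear scan that each list membership test performs.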

This post on categorizing items in a list based on their occurrence in a dictionary of lists is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/19166170/
