
python - Speed issues when converting a list to a dictionary

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:30:19

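I'm running into some speed issues when converting a list to a dictionary, where the following operation takes up about 90% of the total running time: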

def list2dict(list_):
    return_dict = {}

    for idx, word in enumerate(list_):
        if word in return_dict:
            raise ValueError("duplicate string found in list: %s" % (word))
        return_dict[word] = idx

    return return_dict

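I'm having a hard time seeing what is causing this. Do you see any obvious bottleneck in the code, or have any suggestions on how to speed it up?

Thanks.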

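Best answer

EDIT:

I figured I'd put this at the top since it's the bigger finding: it turns out that a small tweak to the OP's code gives a noticeable performance improvement.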

def list2dict(list_):    # OLD
    return_dict = {}
    for idx, word in enumerate(list_):
        if word in return_dict: # this comparison happens on every iteration!
            raise ValueError("duplicate string found in list: %s" % (word))
        return_dict[word] = idx
    return return_dict

def list2dictNEW(list_): # NEW HOTNESS
    return_dict = {}
    for idx, word in enumerate(list_):
        return_dict[word] = idx # overwrite if you want to, because...
    if len(return_dict) == len(list_): return return_dict
    # if the lengths aren't the same, something got overwritten so we
    # won't return. If they ARE the same, toss it back with only one
    # compare (rather than n compares in the original)
    else: raise ValueError("There were duplicates in list {}".format(list_))

DEMO:
>>> from timeit import timeit
>>> timeit(lambda: list2dictNEW(TEST))
1.9117132451798682
>>> timeit(lambda: list2dict(TEST))
2.2543816669587216
# gains of a third of a second per million iterations!
# that's a 15.2% speed boost
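One thing the length check gives up is the name of the offending word: the original raised as soon as it saw a duplicate and reported it, while the new version only knows that something was overwritten. A minimal sketch of one way to get that back, assuming it is acceptable to do extra work only on the failure path (the name list2dict_checked and the use of collections.Counter are illustrative, not part of the original answer):

```python
from collections import Counter

def list2dict_checked(list_):  # hypothetical name, not from the original answer
    # Fast path: build the mapping in one pass; duplicates overwrite silently.
    return_dict = {word: idx for idx, word in enumerate(list_)}
    if len(return_dict) == len(list_):
        return return_dict
    # Slow path, reached only when something was overwritten:
    # count occurrences to report which words were duplicated.
    dupes = [word for word, count in Counter(list_).items() if count > 1]
    raise ValueError(
        "duplicate string(s) found in list: %s" % ", ".join(map(str, dupes))
    )
```

The happy path costs the same single length comparison as before; the Counter pass runs only when an error is about to be raised anyway.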

There's no obvious answer, but you could try something like this:

def list2dict(list_):
    return_dict = dict()
    for idx,word in enumerate(list_):
        # note: on a duplicate this keeps the first index instead of raising
        return_dict.setdefault(word,idx)
    return return_dict
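Note that setdefault changes the behaviour as well as the speed: it never raises, and a duplicated word keeps its first index. A small illustration of the difference from plain assignment, using a made-up three-item list:

```python
words = ['a', 'b', 'a']  # illustrative input with one duplicate

first_wins = {}
for idx, word in enumerate(words):
    first_wins.setdefault(word, idx)   # leaves an existing key untouched

last_wins = {}
for idx, word in enumerate(words):
    last_wins[word] = idx              # overwrites an existing key

print(first_wins)  # {'a': 0, 'b': 1}
print(last_wins)   # {'a': 2, 'b': 1}
```

Neither loop detects the duplicate by itself, so if duplicates must be rejected, a check such as comparing lengths is still needed afterwards.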

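You could also build a set and call list.index, since you said the list is fairly small, but I'm guessing that would be slower rather than faster, because every list.index call is a linear scan of the list. It would need profiling to be sure (using timeit.timeit).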

def list2dict(list_):
    set_ = set(list_)
    return {word:list_.index(word) for word in set_}

I took the liberty of running some timings on a set of test data. Here are the results:

TEST = ['a','b','c','d','e','f','g','h','i','j'] # 10 items

def list2dictA(list_): # build set and index word
    set_ = set(list_)
    return {word:list_.index(word) for word in set_}

def list2dictB(list_): # setdefault over enumerate(list)
    return_dict = dict()
    for idx,word in enumerate(list_):
        return_dict.setdefault(word,idx)
    return return_dict

def list2dictC(list_): # dict comp over enumerate(list)
    return_dict = {word:idx for idx,word in enumerate(list_)}
    if len(return_dict) == len(list_):
        return return_dict
    else:
        raise ValueError("Duplicate string found in list")

def list2dictD(list_): # Original example from Question
    return_dict = {}
    for idx, word in enumerate(list_):
        if word in return_dict:
            raise ValueError("duplicate string found in list: %s" % (word))
        return_dict[word] = idx
    return return_dict

>>> timeit(lambda: list2dictA(TEST))
5.336584700190931
>>> timeit(lambda: list2dictB(TEST))
2.7587691306531
>>> timeit(lambda: list2dictC(TEST))
2.1609074989233292
>>> timeit(lambda: list2dictD(TEST))
2.2543816669587216
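The figures above come from timeit's default of one million calls per measurement and will vary by machine and Python version. To rerun the comparison yourself, something like the following self-contained sketch may help. It redefines two of the variants under the hypothetical names with_check and with_len, and the best_time helper and its call counts are assumptions chosen to keep the run short:

```python
from timeit import repeat

TEST = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']  # 10 items

def with_check(list_):
    # Membership test on every iteration, as in the question.
    return_dict = {}
    for idx, word in enumerate(list_):
        if word in return_dict:
            raise ValueError("duplicate string found in list: %s" % (word))
        return_dict[word] = idx
    return return_dict

def with_len(list_):
    # Dict comprehension plus a single length comparison at the end.
    return_dict = {word: idx for idx, word in enumerate(list_)}
    if len(return_dict) != len(list_):
        raise ValueError("Duplicate string found in list")
    return return_dict

def best_time(func, data, number=20_000, rounds=5):
    # Run `rounds` measurements of `number` calls each and keep the fastest,
    # which is the least disturbed by other activity on the machine.
    return min(repeat(lambda: func(data), number=number, repeat=rounds))

for func in (with_check, with_len):
    print(func.__name__, best_time(func, TEST))
```

Taking the minimum of several rounds, rather than a single run, makes small differences like the ones reported here easier to trust.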

Regarding python - speed issues when converting a list to a dictionary, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22053558/
