python - How to efficiently find the first list that contains a concept in Python

Reposted. Author: 行者123. Updated: 2023-12-04 10:59:48

I have 5 lists, as follows.

list1 = [[111, ["food", "fruits", "vegetables"]], [112, ["mango", "apples", "grapes", "pears", "passion fruit"]]]
list2 = [[110, ["transport", "car", "van", "bus", "jeep"]], [109, ["trams", "trains", "passenger", "driver"]], [108, ["traffic", "lights"]]]
list3 = [[111, ["book", "letters", "library", "reading"]], [112, ["education", "jobs", "companies", "salary"]]]
list4 = [[111, ["food", "curry", "spices", "rice", "fruits", "vegetables"]], [112, ["fruits", "vegetables", "farms", "farmers"]]]
list5 = [[111, ["food", "industry", "delivery"]], [112, ["fresh", "curry", "food", "pears", "passion fruit"]]]

I also have a list of concepts.
myconcepts = ["fruits", "curry"]

For each concept in myconcepts, I want to find the first list of lists that contains it, i.e.
"fruits" -> list1
"curry" -> list4

I am currently using the following code to do this.
mylists = [list1, list2, list3, list4, list5]
for concept in myconcepts:
    initial_list = ""   # reset for every concept
    counting = 1        # 1-based position of the list being checked

    for mylist in mylists:
        for item in mylist:
            if concept in item[1]:
                initial_list = str(counting)
                break

        if len(initial_list) > 0:
            break
        else:
            counting = counting + 1
    print(counting)

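This works for small datasets. However, my real dataset is huge: nearly 25 lists, each with close to 5 million records, and my concept list has about 15,000 entries. As a result, my code takes a long time to run. Is there a more efficient way to do this in Python?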

I am happy to provide more details if needed.

Best answer

Here is an approach using sets, which makes membership tests with `in` faster than searching a list.

list1 = [[111, ["food", "fruits", "vegetables"]], [112, ["mango", "apples", "grapes", "pears", "passion fruit"]]]
list2 = [[110, ["transport", "car", "van", "bus", "jeep"]], [109, ["trams", "trains", "passenger", "driver"]], [108, ["traffic", "lights"]]]
list3 = [[111, ["book", "letters", "library", "reading"]], [112, ["education", "jobs", "companies", "salary"]]]
list4 = [[111, ["food", "curry", "spices", "rice", "fruits", "vegetables"]], [112, ["fruits", "vegetables", "farms", "farmers"]]]
list5 = [[111, ["food", "industry", "delivery"]], [112, ["fresh", "curry", "food", "pears", "passion fruit"]]]

myconcepts = ["fruits", "curry"]

# convert each record's word list to a frozenset, keeping one list of sets per input list
flatsets = [[frozenset(l[1]) for l in lists] for lists in [list1, list2, list3, list4, list5]]

# return the index of the first list whose records contain the concept, or None if absent
def get_idx(setlist, concept):
    for ix_f, fsets in enumerate(setlist):
        for fset in fsets:
            if concept in fset:
                return ix_f
    return None

# generate a list holding the index of each concept
ix_concepts = [None for _ in myconcepts]
for ix_c, c in enumerate(myconcepts):
    ix_concepts[ix_c] = get_idx(flatsets, c)

# show result (assumes every concept was found, i.e. no index is None)
listnames = ['list1', 'list2', 'list3', 'list4', 'list5']
for i, c in enumerate(myconcepts):
    print(f"concept '{c}' found first in {listnames[ix_concepts[i]]}")
# concept 'fruits' found first in list1
# concept 'curry' found first in list4

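However, given the volume of your data (15k * 25 * 5M), I don't think this is a 1:1 solution to your actual problem. As already mentioned here, it needs more sophisticated data preparation. Also, I expect the current search, which I would put at O(N²) (ignoring the time needed to flatten the lists and so on), to eat up a lot of time, because every concept triggers its own scan over the records.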
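
One possible form of that data preparation, offered only as a rough sketch and not as part of the original answer: instead of scanning the records once per concept, walk the data a single time and note the first list in which each wanted concept shows up. The function name `first_list_index` and the early exit are my own assumptions for illustration, and the sketch assumes each record has the `[id, [words...]]` shape from the question.

```python
def first_list_index(mylists, concepts):
    """Map each concept to the 1-based position of the first list containing it.

    Hypothetical helper: concepts that never occur are simply missing
    from the returned dict.
    """
    wanted = set(concepts)   # set gives constant-time membership tests on average
    first_seen = {}          # concept -> 1-based position of its first list
    for position, mylist in enumerate(mylists, start=1):
        for _record_id, words in mylist:
            for word in words:
                if word in wanted and word not in first_seen:
                    first_seen[word] = position
        if len(first_seen) == len(wanted):
            break            # every concept located, skip the remaining lists
    return first_seen


# sample data copied from the question
list1 = [[111, ["food", "fruits", "vegetables"]], [112, ["mango", "apples", "grapes", "pears", "passion fruit"]]]
list2 = [[110, ["transport", "car", "van", "bus", "jeep"]], [109, ["trams", "trains", "passenger", "driver"]], [108, ["traffic", "lights"]]]
list3 = [[111, ["book", "letters", "library", "reading"]], [112, ["education", "jobs", "companies", "salary"]]]
list4 = [[111, ["food", "curry", "spices", "rice", "fruits", "vegetables"]], [112, ["fruits", "vegetables", "farms", "farmers"]]]
list5 = [[111, ["food", "industry", "delivery"]], [112, ["fresh", "curry", "food", "pears", "passion fruit"]]]
myconcepts = ["fruits", "curry"]

result = first_list_index([list1, list2, list3, list4, list5], myconcepts)
for concept in myconcepts:
    print(f"concept '{concept}' found first in list{result[concept]}")
```

With this layout every word is visited at most once, so the cost grows with the total number of words rather than with the number of words times the number of concepts. Whether that is fast enough for 25 lists of about 5 million records each is something you would have to measure on the real data.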

Regarding "python - How to efficiently find the first list that contains a concept in Python", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58891155/
