
python - How to efficiently check whether an element is in a list of lists in Python

Reposted. Author: 行者123. Updated: 2023-11-28 21:31:08

I have a list, as shown below.

mylist = [[5274919, ["report", "porcelain", "firing", "technic"]], [5274920, ["implantology", "dentistry"]], [52749, ["method", "recognition", "long", "standing", "root", "perforation", "molar"]], [5274923, ["exogenic", "endogenic", "cause", "tooth", "jaw", "anomaly", "method", "method", "standing"]]]

I also have a list of concepts, as follows.

myconcepts = ["method", "standing"]

I want to count how many records in mylist each concept in myconcepts appears in, i.e.:

"method" = 2 times in records (i.e. in `52749` and `5274923`)
"standing" = 2 times in records

My current code is as follows.

for concept in myconcepts:
    mycounting = 0  # reset the counter for each concept
    for item in mylist:
        if concept in item[1]:
            mycounting = mycounting + 1
    print(mycounting)

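However, my real mylist is very large, with about 5 million records, and the myconcepts list contains about 10,000 concepts.

With my current code, it takes nearly 1 minute to get the count for a single concept, which is very slow.

What is the most efficient way to do this in Python?

For testing purposes, I have attached a small part of my dataset here: https://drive.google.com/file/d/1z6FsBtLyDZClod9hK8nK4syivZToa7ps/view?usp=sharing

I am happy to provide more details if needed.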

Best answer

Adapted from Method 3 at https://www.geeksforgeeks.org/python-count-the-sublists-containing-given-element-in-a-list/

from itertools import chain 
from collections import Counter

mylist = [[5274919, ["report", "porcelain", "firing", "technic"]], [5274920, ["implantology", "dentistry"]], [52749, ["method", "recognition", "long", "standing", "root", "perforation", "molar"]], [5274923, ["exogenic", "endogenic", "cause", "tooth", "jaw", "anomaly", "method", "method", "standing"]]]

myconcepts = ["method", "standing"]

def countList(lst, x):
    """Count the number of sublists in which item x appears."""
    return Counter(chain.from_iterable(set(i[1]) for i in lst))[x]

# Use a dictionary comprehension to apply countList to each concept
result = {x: countList(mylist, x) for x in myconcepts}
print(result)  # {'method': 2, 'standing': 2}

Revised current method (the counts are computed only once)

def count_occurences(lst):
    """Count, for each item, the number of sublists in which it appears."""
    return Counter(chain.from_iterable(set(i[1]) for i in lst))

cnts = count_occurences(mylist)
result = {x: cnts[x] for x in myconcepts}
print(result)  # {'method': 2, 'standing': 2}
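When only the listed concepts matter, the single-pass idea can be narrowed so that other words are never counted at all. The following is a minimal sketch of that variant, not part of the original answer; the function name count_concepts_only is made up for illustration.

```python
from collections import Counter

mylist = [
    [5274919, ["report", "porcelain", "firing", "technic"]],
    [5274920, ["implantology", "dentistry"]],
    [52749, ["method", "recognition", "long", "standing", "root", "perforation", "molar"]],
    [5274923, ["exogenic", "endogenic", "cause", "tooth", "jaw", "anomaly", "method", "method", "standing"]],
]
myconcepts = ["method", "standing"]

def count_concepts_only(lst, concepts):
    """Hypothetical helper: count, per concept, the records that contain it."""
    wanted = set(concepts)
    counts = Counter()
    for _record_id, words in lst:
        # The intersection removes duplicates within a record and drops
        # every word that is not a concept of interest.
        counts.update(wanted.intersection(words))
    # Concepts that never occur are reported with a count of 0.
    return {c: counts[c] for c in concepts}

result = count_concepts_only(mylist, myconcepts)
print(result)  # {'method': 2, 'standing': 2}
```

This keeps the Counter no larger than the concept list, which may help when the records contain far more distinct words than there are concepts.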

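Performance (posted methods compared in a Jupyter Notebook)

The results show that this method and the one posted by Barmar are close (36 µs vs. 42 µs).

The revision to the current method cuts the time roughly in half (from 36 µs to 19 µs). With a larger number of concepts (the question has more than 1,000), this improvement should be more significant.

However, on this small sample the original method is faster, at 2.55 µs per loop.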
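These timings come from the four-record sample, so they say little about behaviour at scale. The nested loops cost roughly records × concepts membership tests, while the single pass touches each record once, so one would expect the ranking to change as the inputs grow. A rough sketch for checking this on synthetic data follows; the sizes, the vocabulary and the function names nested_loops and single_pass are assumptions chosen for illustration, not taken from the original post.

```python
import random
import timeit
from collections import Counter
from itertools import chain

random.seed(0)  # fixed seed so the synthetic data is reproducible

# Synthetic data: sizes and vocabulary are arbitrary assumptions.
vocab = ["w%d" % i for i in range(2000)]
records = [[i, random.choices(vocab, k=8)] for i in range(5000)]
concepts = vocab[:200]

def nested_loops(lst, concepts):
    """Original approach: scan every record once per concept."""
    result = {}
    for concept in concepts:
        count = 0
        for item in lst:
            if concept in item[1]:
                count += 1
        result[concept] = count
    return result

def single_pass(lst, concepts):
    """Revised approach: build all counts in one pass, then look them up."""
    cnts = Counter(chain.from_iterable(set(i[1]) for i in lst))
    return {x: cnts[x] for x in concepts}

# Both approaches must agree before their timings are compared.
assert nested_loops(records, concepts) == single_pass(records, concepts)

t_nested = timeit.timeit(lambda: nested_loops(records, concepts), number=1)
t_single = timeit.timeit(lambda: single_pass(records, concepts), number=1)
print("nested loops: %.4f s, single pass: %.4f s" % (t_nested, t_single))
```

The absolute numbers depend on the machine; the point of the sketch is only the relative trend as the record and concept counts are increased.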

Method 1 (current method)

%timeit { x:countList(mylist, x) for x in myconcepts}
#10000 loops, best of 3: 36.6 µs per loop

Revised current method:

%%timeit
cnts = count_occurences(mylist)
result = {x:cnts[x] for x in myconcepts}
# 10000 loops, best of 3: 19.4 µs per loop

Method 2 (from Barmar's post)

%%timeit
r = collections.Counter(flatten(mylist))
{i:r.get(i, 0) for i in myconcepts}
# 10000 loops, best of 3: 42.7 µs per loop
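The flatten helper used in Method 2 is not shown in this post. A minimal sketch that makes the cell runnable is given here, under the assumption that flatten yields each distinct word of every record once; the actual helper in Barmar's post may differ.

```python
import collections

mylist = [
    [5274919, ["report", "porcelain", "firing", "technic"]],
    [5274920, ["implantology", "dentistry"]],
    [52749, ["method", "recognition", "long", "standing", "root", "perforation", "molar"]],
    [5274923, ["exogenic", "endogenic", "cause", "tooth", "jaw", "anomaly", "method", "method", "standing"]],
]
myconcepts = ["method", "standing"]

def flatten(lst):
    """Assumed helper: yield each distinct word of every record once."""
    for _record_id, words in lst:
        # set() ensures a word repeated inside one record counts once.
        yield from set(words)

r = collections.Counter(flatten(mylist))
result = {i: r.get(i, 0) for i in myconcepts}
print(result)  # {'method': 2, 'standing': 2}
```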

Method 3 (original method)

%%timeit

result = {}
for concept in myconcepts:
    mycounting = 0
    for item in mylist:
        if concept in item[1]:
            mycounting = mycounting + 1
    result[concept] = mycounting
# 100000 loops, best of 3: 2.55 µs per loop

Regarding "python - How to efficiently check whether an element is in a list of lists in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58827430/
