
python - Fastest way to remove items from a list that match substrings - Python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:03:05

What is the fastest way to remove the items of a list that contain any of the substrings in a set?

For example,

the_list = ['Donald John Trump (born June 14, 1946) is an American businessman, television personality',
'and since June 2015, a candidate for the Republican nomination for President of the United States in the 2016 election.',
'He is the chairman and president of The Trump Organization and the founder of Trump Entertainment Resorts.',
'Trumps career',
'branding efforts',
'personal life',
'and outspoken manner have made him a celebrity.',
'Trump is a native of New York City and a son of Fred Trump, who inspired him to enter real estate development.',
'While still attending college he worked for his fathers firm',
'Elizabeth Trump & Son. Upon graduating in 1968 he joined the company',
'and in 1971 was given control, renaming the company The Trump Organization.',
'Since then he has built hotels',
'casinos',
'golf courses',
'and other properties',
'many of which bear his name. He is a major figure in the American business scene and has received prominent media exposure']

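The real list is much longer than this (millions of string elements), and I want to remove every element that contains any of the strings in a set, for example,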

{"Donald Trump", "Trump Organization","Donald J. Trump", "D.J. Trump", "dump", "dd"} 

What is the fastest way to do this? Is a loop the fastest option?

Best answer
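For reference, the plain loop the question asks about can be sketched as a list comprehension that tests each item against every substring with Python's `in` operator. The function name `remove_matching_naive` and the shortened sample data are illustrative only. The work grows with the number of items times the number of substrings, which is the nested-loop cost the answer compares against.

```python
def remove_matching_naive(items, substrings):
    """Keep only the items that contain none of the given substrings.

    Every substring may be checked against every item, so the cost grows
    with len(items) * len(substrings).
    """
    subs = tuple(substrings)  # materialize once; sets and tuples both iterate fine
    return [item for item in items if not any(sub in item for sub in subs)]


if __name__ == "__main__":
    # Shortened, hypothetical sample taken from the question's data.
    sample = [
        "He is the chairman and president of The Trump Organization "
        "and the founder of Trump Entertainment Resorts.",
        "casinos",
        "golf courses",
    ]
    blocked = {"Donald Trump", "Trump Organization", "Donald J. Trump",
               "D.J. Trump", "dump", "dd"}
    print(remove_matching_naive(sample, blocked))  # → ['casinos', 'golf courses']
```

`any()` short-circuits, so each item stops being checked at its first matching substring.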

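The Aho-Corasick algorithm was designed for exactly this task. Its clear advantage is a running time linear in the input, O(n+m), far lower than the roughly O(n*m) of nested loops, where n is the total length of the strings to look for and m is the total length of the strings being searched.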
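A minimal pure-Python sketch of the idea follows, assuming the substrings are plain literal strings: build a trie of the substrings, add failure links in a breadth-first pass, then scan each list item once, stopping at its first match. The class `AhoCorasick` and the helper `remove_matching` are hypothetical names written for this illustration, not the implementation the answer links to. In CPython a per-character Python loop may well be slower in wall-clock time than the naive `in` checks (which run in C) unless the substring set is large, so this shows the algorithm's structure rather than a tuned implementation.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton for "does any pattern occur?" queries.

    Hypothetical illustration. Nodes are integer ids into parallel lists;
    node 0 is the root of the trie.
    """

    def __init__(self, patterns):
        self.goto = [{}]       # goto[node][char] -> child node id
        self.fail = [0]        # failure link of each node
        self.output = [False]  # True if a pattern ends here (directly or via fail links)
        for pattern in patterns:
            self._insert(pattern)
        self._build_failure_links()

    def _insert(self, pattern):
        node = 0
        for ch in pattern:
            nxt = self.goto[node].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.output.append(False)
                self.goto[node][ch] = nxt
            node = nxt
        self.output[node] = True

    def _build_failure_links(self):
        # BFS: a node's failure link points to the longest proper suffix of
        # its path that is also a path in the trie. Depth-1 nodes keep fail=0.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # A pattern ending at the failure target also ends here.
                self.output[child] = self.output[child] or self.output[self.fail[child]]

    def contains_any(self, text):
        """Return True as soon as any pattern occurs in text (single pass)."""
        if self.output[0]:  # an empty pattern matches every string
            return True
        node = 0
        for ch in text:
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            if self.output[node]:
                return True
        return False


def remove_matching(items, substrings):
    """Hypothetical helper: keep items containing none of the substrings."""
    automaton = AhoCorasick(substrings)
    return [item for item in items if not automaton.contains_any(item)]


if __name__ == "__main__":
    blocked = {"Donald Trump", "Trump Organization", "Donald J. Trump",
               "D.J. Trump", "dump", "dd"}
    sample = [
        "and in 1971 was given control, renaming the company The Trump Organization.",
        "casinos",
        "a dd b",
    ]
    print(remove_matching(sample, blocked))  # → ['casinos']
```

The automaton is built once for the whole set, so each list item is scanned character by character a single time regardless of how many substrings are in the set.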

There is a good Python implementation of Aho-Corasick with an explanation. The Python Package Index also has several implementations, but I haven't looked at them.

For more on python - Fastest way to remove items from a list that match substrings - Python, see the similar question on Stack Overflow: https://stackoverflow.com/questions/35738965/
