gpt4 book ai didi

python - 如何确定不带空格的字符串中列表中的匹配项数?

转载 作者:行者123 更新时间:2023-12-04 03:35:14 25 4
gpt4 key购买 nike

我有一个字符串列表

fruits = ["apple", "orange", "grape"]

和一个缺少空格的字符串

str = "ilikeapplesandoranges"

计算 str 中有多少来自 fruits 的单词的有效方法是什么?

对于上面的示例,它将是 2,因为我们有 ilike[apple]sandorangesilikeapplesand[orange]s

对于像 ilikebroccoli 这样的字符串,它将是 0。

我知道您可以执行 str.count(word) 但对列表中的每个单词执行此操作效率不高。

最佳答案

实际上,str.count(word) 的 C 语言速度很难超越。

我会使用 sum(your_string.count(e) for e in your_list) 并完成它。

这里是答案的快速基准:

import time 

def li_count(s, li):
return sum(s.count(e) for e in li)

import re

def li_regex(s,li):
return len(re.findall('|'.join(f'(?=({t}))' for t in li), s))

def li_split(s, li):
return sum(len(s.split(e))-1 for e in li)

def cmpthese(funcs, args=(), cnt=10, rate=True, micro=True, deepcopy=True):
from copy import deepcopy
"""Generate a Perl style function benchmark"""
def pprint_table(table):
"""Perl style table output"""
def format_field(field, fmt='{:,.0f}'):
if type(field) is str: return field
if type(field) is tuple: return field[1].format(field[0])
return fmt.format(field)

def get_max_col_w(table, index):
return max([len(format_field(row[index])) for row in table])

col_paddings=[get_max_col_w(table, i) for i in range(len(table[0]))]
for i,row in enumerate(table):
# left col
row_tab=[row[0].ljust(col_paddings[0])]
# rest of the cols
row_tab+=[format_field(row[j]).rjust(col_paddings[j]) for j in range(1,len(row))]
print(' '.join(row_tab))

results={}
for i in range(cnt):
for f in funcs:
if args:
local_args=deepcopy(args)
start=time.perf_counter_ns()
f(*local_args)
stop=time.perf_counter_ns()
results.setdefault(f.__name__, []).append(stop-start)
results={k:float(sum(v))/len(v) for k,v in results.items()}
fastest=sorted(results,key=results.get, reverse=True)
table=[['']]
if rate: table[0].append('rate/sec')
if micro: table[0].append('\u03bcsec/pass')
table[0].extend(fastest)
for e in fastest:
tmp=[e]
if rate:
tmp.append('{:,}'.format(int(round(float(cnt)*1000000.0/results[e]))))

if micro:
tmp.append('{:,.1f}'.format(results[e]/float(cnt)))

for x in fastest:
if x==e: tmp.append('--')
else: tmp.append('{:.1%}'.format((results[x]-results[e])/results[e]))
table.append(tmp)

pprint_table(table)

if __name__=='__main__':
import sys
print(sys.version)

fruits = ["apple", "orange", "grape"]
s = "ilikeapplesandorangesandapples"

cases=(
('small list, small string', 1, 1),
('large list, small string', 2000, 1),
('small list, large string', 1, 2000),
('med list, med string', 500, 500)
)
for txt, x, y in cases:
print(f'\n{txt}:')
args=(s*y,fruits*x)
cmpthese([li_count,li_regex, li_split],args)

结果是:

3.9.1 (default, Feb  3 2021, 07:38:02) 
[Clang 12.0.0 (clang-1200.0.32.29)]

small list, small string:
rate/sec μsec/pass li_regex li_split li_count
li_regex 516 1,939.0 -- -92.4% -93.4%
li_split 6,810 146.8 1220.4% -- -12.5%
li_count 7,785 128.5 1409.4% 14.3% --

large list, small string:
rate/sec μsec/pass li_regex li_split li_count
li_regex 0 39,162,529.8 -- -99.7% -99.8%
li_split 9 106,144.2 36795.6% -- -21.8%
li_count 12 83,023.3 47070.6% 27.8% --

small list, large string:
rate/sec μsec/pass li_regex li_split li_count
li_regex 2 453,577.1 -- -93.9% -98.0%
li_split 36 27,754.5 1534.2% -- -67.6%
li_count 111 8,985.6 4947.8% 208.9% --

med list, med string:
rate/sec μsec/pass li_regex li_split li_count
li_regex 0 1,166,950,320.1 -- -99.7% -99.9%
li_split 0 3,418,900.8 34032.3% -- -67.4%
li_count 1 1,114,163.6 104637.8% 206.9% --

在所有情况下,str.count 方法都是最快的 -- 通常是数量级。

关于python - 如何确定不带空格的字符串中列表中的匹配项数?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/67036208/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com