
python - Simplify/optimize a chain of for-loops

Reposted. Author: 太空狗. Updated: 2023-10-29 16:56:00

I have a chain of for-loops that works on an original list of strings and then gradually filters the list as it goes down the chain, e.g.:

import re

# Regex to check that a cap exists in the string.
pattern1 = re.compile(r'\d.*?[A-Z].*?[a-z]')
vocab = ['dog', 'lazy', 'the', 'fly'] # Imagine it's a longer list.

def check_no_caps(s):
    return None if re.match(pattern1, s) else s

def check_nomorethan_five(s):
    return s if len(s) <= 5 else None

def check_in_vocab_plus_x(s,x):
    # s and x are both str.
    return None if s not in vocab else s+x

slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
# filter with check_no_caps
slist = [check_no_caps(s) for s in slist]
# filter no more than 5.
slist = [check_nomorethan_five(s) for s in slist if s is not None]
# filter in vocab
slist = [check_in_vocab_plus_x(s, str(i)) for i,s in enumerate(slist) if s is not None]

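The above is just an example; in reality my functions for manipulating the strings are more complicated, but they do return either the original string or a manipulated one.

I could use generators instead of lists and do something like this: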

slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
# filter with check_no_caps and no more than 5.
slist = (check_nomorethan_five(s1)
         for s1 in (check_no_caps(s0) for s0 in slist)
         if s1 is not None)
# filter in vocab
slist = [check_in_vocab_plus_x(s, str(i)) for i,s in enumerate(slist) if s is not None]

Or in one crazy nested generator:

slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
slist = (check_in_vocab_plus_x(s2, str(i))
         for i, s2 in enumerate(check_nomorethan_five(s1)
                                for s1 in (check_no_caps(s0) for s0 in slist)
                                if s1 is not None)
         if s2 is not None)

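There must be a better way. Is there a way to make the for-loop chain faster?

Is there a way to do this with map, reduce and filter? Would it be faster?

Imagine that my original list is very, very large, say 10 billion items. And my functions are not as simple as the ones above; they do some computation and manage about 1,000 calls per second.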

Best answer

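First of all, consider the overall process you apply to the strings. You take some strings and apply certain functions to each of them. Then you clean up the list. Assume for the moment that all the functions you apply to a string run in constant time (that's not true, but it doesn't matter for now). In your solution, you iterate through the list applying one function (that's O(N)). Then you take the next function and iterate again (another O(N)), and so on. So the obvious way to speed things up is to reduce the number of loops. That's not hard.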

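The next thing to do is to try to optimize your functions. For example, you use a regexp to check whether a string has capital letters, but there is str.islower (it returns True if all cased characters in the string are lowercase and there is at least one cased character, False otherwise).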
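To make the behaviour of str.islower concrete, here is a small sketch. The sample strings are made up for illustration and are not from the question. Note the edge cases: a string with no cased characters at all (digits only, or empty) is not considered lowercase.

```python
# Hypothetical sample strings, chosen to show the edge cases of str.islower.
samples = ['dog', 'Dog', 'dog42', '42', '']

# Map each sample to the result of str.islower().
results = {s: s.islower() for s in samples}

for s, ok in results.items():
    print(repr(s), ok)
```

So if digits-only or empty strings must pass your check, islower alone is not a drop-in replacement and you would need an extra condition.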

So here is a first attempt at simplifying and speeding up the code:

vocab = ['dog', 'lazy', 'the', 'fly'] # Imagine it's a longer list.

# note that the first two functions can be combined into one
def no_caps_and_length(s):
    return s if s.islower() and len(s)<=5 else None

# this one is more complicated and cannot be merged with the first two
# (not really, but as you say, some functions are rather complicated)
def check_in_vocab_plus_x(s, x=''):
    # s and x are both str; x defaults to '' so the pipeline below
    # can call this function with a single argument.
    return None if s not in vocab else s+x

# now let's introduce a function that pipes a string through all the functions you need
def pipe_through_funcs(s):
    # yeah, here we have only two, but there could be more
    funcs = [no_caps_and_length, check_in_vocab_plus_x]
    for func in funcs:
        if s is None: return s
        s = func(s)
    return s

slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
# final step (list() because filter returns a lazy iterator in Python 3):
slist = list(filter(lambda a: a is not None, map(pipe_through_funcs, slist)))

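There is one more thing that could be improved. Currently you iterate through the list modifying elements and then filter them out. But it may be faster to filter first and then modify. Like this: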

vocab = ['dog', 'lazy', 'the', 'fly'] # Imagine it's a longer list.

# make a function that does all the checks for filtering
# you can make a big expression and return its result,
# or a sequence of ifs, or anything in-between,
# it won't affect performance,
# but make sure you put cheaper checks first
def my_filter(s):
    if len(s)>5: return False
    if not s.islower(): return False
    if s not in vocab: return False
    # maybe more checks here
    return True

# now we need a modifying function
# there is a concern: if you need indices as they were in the original list
# you might need to think of some way to pass them here
# as you iterate through the filtered list
def modify(s,x):
    s += x
    # maybe more actions
    return s

slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
# final step: modify takes two arguments, so pair each string with its
# index in the original list and filter before modifying
slist = [modify(s, str(i)) for i, s in enumerate(slist) if my_filter(s)]
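For a very large input, the same filter-then-modify idea can be written as a generator so that nothing is materialized until it is consumed. This is only a sketch under assumptions: the names VOCAB, keep and process are hypothetical, the vocabulary is stored in a frozenset (my substitution, for constant-time membership tests) rather than the list used above, and the index appended is the position in the original input.

```python
from itertools import islice

# Assumption: a frozenset instead of the list, so `in` does not scan the vocabulary.
VOCAB = frozenset(['dog', 'lazy', 'the', 'fly'])

def keep(s):
    """All filtering checks in one place, cheaper checks first."""
    return len(s) <= 5 and s.islower() and s in VOCAB

def process(strings):
    """Lazily yield modified strings, tagged with their original index."""
    for i, s in enumerate(strings):
        if keep(s):
            yield s + str(i)

words = ['the', 'dog', 'jumps', 'over', 'the', 'fly']

# Pull only the first three results; the rest of the input is never touched.
first_three = list(islice(process(words), 3))
print(first_three)
```

Because process is lazy, it makes a single pass over the input and can be fed from a file or another generator instead of an in-memory list.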

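Also note that in some cases generators, map and the like can be faster, but that is not always true. I believe that if the number of items you filter out is substantial, a for-loop with append may be faster. I can't guarantee it will be faster, but you could try something like this: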

initial_list = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
new_list = []
for s in initial_list:
    processed = pipe_through_funcs(s)
    if processed is not None: new_list.append(processed)
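Since which variant wins depends on your data and functions, the safest approach is to measure. The following is a rough sketch using the standard timeit module; process is a hypothetical stand-in for your real pipeline function, and the repeated six-word list is made-up test data, so the timings only illustrate the method.

```python
import timeit

VOCAB = ['dog', 'lazy', 'the', 'fly']

def process(s):
    """Hypothetical stand-in for the real pipeline: return the string or None."""
    return s if s.islower() and len(s) <= 5 and s in VOCAB else None

def with_map_filter(items):
    # map + filter variant
    return list(filter(lambda a: a is not None, map(process, items)))

def with_loop(items):
    # plain for-loop with append
    out = []
    for s in items:
        r = process(s)
        if r is not None:
            out.append(r)
    return out

# Made-up test data: the six sample words repeated many times.
data = ['the', 'dog', 'jumps', 'over', 'the', 'fly'] * 1000

# Both variants must agree before their timings are worth comparing.
assert with_map_filter(data) == with_loop(data)

for fn in (with_map_filter, with_loop):
    seconds = timeit.timeit(lambda: fn(data), number=20)
    print(f'{fn.__name__}: {seconds:.4f}s')
```

Run it on a realistic sample of your own data; the ratio of filtered-out items and the cost of each function can change the result.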

Regarding python - Simplify/optimize a chain of for-loops, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38424004/
