python - Simplifying/optimizing a chain of for loops - 6ren


Reposted by: 太空狗 · Updated: 2023-10-29 16:56:00


I have a chain of for loops that starts from an original list of strings and gradually filters the list as it goes down the chain, e.g.:

import re

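# Regex intended to detect a capital letter. As written, re.match only succeeds when the
# string starts with a digit that is later followed by an uppercase and then a lowercase letter.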
pattern1 = re.compile(r'\d.*?[A-Z].*?[a-z]')
vocab = ['dog', 'lazy', 'the', 'fly'] # Imagine it's a longer list.

def check_no_caps(s):
    return None if re.match(pattern1, s) else s

def check_nomorethan_five(s):
    return s if len(s) <= 5 else None

def check_in_vocab_plus_x(s,x):
    # s and x are both str.
    return None if s not in vocab else s+x

slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
# filter with check_no_caps
slist = [check_no_caps(s) for s in slist]
# filter no more than 5.
slist = [check_nomorethan_five(s) for s in slist if s is not None]
# filter in vocab
slist = [check_in_vocab_plus_x(s, str(i)) for i,s in enumerate(slist) if s is not None]

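The above is just an example. In reality my string-manipulating functions are more complicated, but they do return either the original string or a manipulated one.

I could use generators instead of lists and do something like this: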

slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
# filter with check_no_caps and no more than 5.
slist = (check_nomorethan_five(s1)
         for s1 in map(check_no_caps, slist) if s1 is not None)
# filter in vocab
slist = [check_in_vocab_plus_x(s, str(i)) for i,s in enumerate(slist) if s is not None]

Or in one crazy nested generator:

slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
slist = (check_in_vocab_plus_x(s2, str(i))
         for i, s2 in enumerate(
             check_nomorethan_five(s1)
             for s1 in map(check_no_caps, slist) if s1 is not None)
         if s2 is not None)

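There must be a better way. Is there a way to make the chain of for loops faster?

Is there a way to do this with map, reduce and filter? Would it be faster?

Imagine my original list is very, very large, on the order of 10 billion strings. My functions are not as simple as the ones above; they do some computation and manage around 1,000 calls per second.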

Best answer

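First, consider the overall process you apply to the strings. You take some strings and apply certain functions to each of them. Then you clean up the list. Assume for the moment that every function you apply runs in constant time (that is not true, but it doesn't matter for now). In your solution, you iterate over the list applying one function (that is O(N)). Then you take the next function and iterate again (another O(N)), and so on. So the obvious way to speed things up is to reduce the number of loops. That isn't hard.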

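The next thing to do is to optimize the functions themselves. For example, you use a regex to check whether a string contains capital letters, but there is str.islower (it returns True if all cased characters in the string are lowercase and there is at least one cased character, and False otherwise).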
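
To make that definition concrete, here is a small sketch of how str.islower behaves on a few strings. The sample values are made up for illustration and do not come from the question. Note that it is not a drop-in replacement for the regex above: it rejects strings with any uppercase letter, but also strings that have no cased characters at all.

```python
# Sample strings chosen for illustration only; they are not from the question.
samples = ['dog', 'Dog', 'dog1', '123', '']

# str.islower() is True only if the string has at least one cased character
# and every cased character is lowercase.
results = {s: s.islower() for s in samples}

for s, ok in results.items():
    print(repr(s), ok)
```

If purely numeric or empty strings should pass the check, a different test (for example comparing s with s.lower()) may fit better.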

So here is a first attempt at simplifying and speeding up the code:

vocab = ['dog', 'lazy', 'the', 'fly'] # Imagine it's a longer list.

# note that first two functions can be combined in one
def no_caps_and_length(s):
    return s if s.islower() and len(s)<=5 else None

# this one is more complicated and cannot be merged with first two
# (not really, but as you say, some functions are rather complicated)
def check_in_vocab_plus_x(s,x):
    # s and x are both str.
    return None if s not in vocab else s+x

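# now let's introduce a function that would pipe a string through all functions you need
def pipe_through_funcs(s, x=''):
    # yeah, here we have only two, but could be more;
    # the lambda supplies the second argument that check_in_vocab_plus_x requires
    funcs = [no_caps_and_length, lambda t: check_in_vocab_plus_x(t, x)]
    for func in funcs:
        if s is None:
            return s
        s = func(s)
    return s

slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
# final step: each string is paired with its original index as the suffix;
# list() materializes the result, since filter is lazy in Python 3
suffixes = map(str, range(len(slist)))
slist = list(filter(lambda a: a is not None, map(pipe_through_funcs, slist, suffixes)))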
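
If the input is too large to hold in memory, the same single-pass idea can be kept lazy with a generator function. The following is a minimal sketch and not part of the original answer: stream_pipeline is a hypothetical helper name, and the two check functions are restated so the block runs on its own.

```python
vocab = ['dog', 'lazy', 'the', 'fly']

def no_caps_and_length(s):
    return s if s.islower() and len(s) <= 5 else None

def check_in_vocab_plus_x(s, x):
    return None if s not in vocab else s + x

def stream_pipeline(strings):
    """Hypothetical helper: yield processed strings one at a time."""
    for i, s in enumerate(strings):
        s = no_caps_and_length(s)
        if s is None:
            continue  # dropped by the first check, so skip the second one
        s = check_in_vocab_plus_x(s, str(i))
        if s is not None:
            yield s

# Any iterable works as input, including a file object or another generator.
result = list(stream_pipeline(['the', 'dog', 'jumps', 'over', 'the', 'fly']))
print(result)
```

Calling list() here is only for display. With a very large input you would iterate over stream_pipeline(...) directly, so only one string is held at a time.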

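There is possibly one more thing that can be improved. Currently you iterate over the list modifying elements and then filter the results. It might be faster to filter first and modify afterwards, like this: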

vocab = ['dog', 'lazy', 'the', 'fly'] # Imagine it's a longer list.

# make a function that does all the checks for filtering
# you can make a big expression and return its result,
# or a sequence of ifs, or anything in-between,
# it won't affect performance,
# but make sure you put cheaper checks first
def my_filter(s):
    if len(s)>5: return False
    if not s.islower(): return False
    if s not in vocab: return False
    # maybe more checks here
    return True

# now we need modifying function
# there is a concern: if you need indices as they were in original list
# you might need to think of some way to pass them here
# as you iterate through filtered out list
def modify(s,x):
    s += x
    # maybe more actions
    return s

slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
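# final step: filter first, then modify only the strings that survive
# (the suffix is the position among the survivors, not in the original list)
slist = [modify(s, str(i)) for i, s in enumerate(filter(my_filter, slist))]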
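
Since s not in vocab scans the whole list on every call, a longer vocabulary may be worth converting to a set, where membership tests take constant time on average. A minimal sketch under that assumption follows; the name vocab_set is introduced here and does not appear in the original answer.

```python
vocab = ['dog', 'lazy', 'the', 'fly']
vocab_set = frozenset(vocab)  # built once, reused for every lookup

def my_filter(s):
    # cheap checks first, the set lookup last
    return len(s) <= 5 and s.islower() and s in vocab_set

slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
kept = [s for s in slist if my_filter(s)]
print(kept)
```

For a four-word vocabulary the difference is negligible; it only starts to matter as the vocabulary grows.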

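Also note that generators, map and the like can be faster in some cases, but that is not always true. I believe that if the number of items you filter out is large, a plain for loop with append might be faster. I can't guarantee that it will be, but you could try something like this: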

initial_list = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
new_list = []
for s in initial_list:
    processed = pipe_through_funcs(s)
    # "is not None" is the idiomatic identity test for None
    if processed is not None:
        new_list.append(processed)
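
Because none of these variants is guaranteed to win, measuring on your own data is the safest way to choose. Below is a rough sketch using the standard timeit module. The helper names (keep, with_loop, with_filter, with_comprehension) and the synthetic data are assumptions for illustration. Timings vary by machine, so none are shown.

```python
import timeit

vocab = frozenset(['dog', 'lazy', 'the', 'fly'])
data = ['the', 'dog', 'jumps', 'over', 'the', 'fly'] * 1000  # synthetic input

def keep(s):
    return len(s) <= 5 and s.islower() and s in vocab

def with_loop(items):
    out = []
    for s in items:
        if keep(s):
            out.append(s)
    return out

def with_filter(items):
    return list(filter(keep, items))

def with_comprehension(items):
    return [s for s in items if keep(s)]

# All variants must agree before their timings are worth comparing.
assert with_loop(data) == with_filter(data) == with_comprehension(data)

for fn in (with_loop, with_filter, with_comprehension):
    seconds = timeit.timeit(lambda: fn(data), number=20)
    print(f'{fn.__name__}: {seconds:.4f} s')
```

Replace keep and data with your real functions and a representative sample of your input. The ranking can change with how expensive the functions are and how many items get filtered out.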

Regarding python - simplifying/optimizing a chain of for loops, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38424004/
