python - Get all possible strings between 2 substrings in Python

Reposted. Author: 行者123. Updated: 2023-12-04 15:19:37

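The question of getting the string between two markers has been asked before, but the existing answers don't cover certain conditions. In my case, I may have a string like this: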

subject = '"lorem ipsum", "foo", "baz", "bar", "lorem ipsum", "bar", "ipsum", "foo", "baz", "bar"'

I want to extract every instance of text between foo", " and ", "bar. The traditional "get between" answer is:

import re
result = re.findall('foo", "(.*)", "bar', subject)
print(result)

This returns only 1 result string:

'baz", "bar", "lorem ipsum", "bar", "ipsum", "foo", "baz'

But what I want returned is a list of all possible "between" strings, for example:

[
'baz',
'baz", "bar", "lorem ipsum',
'baz", "bar", "lorem ipsum", "bar", "ipsum", "foo", "baz',
'baz'
]
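
For comparison, switching the greedy `.*` to the lazy `.*?` does not produce this list either. The following is a minimal sketch using the same subject string; it only illustrates that `re.findall` reports non-overlapping matches, so the longer combinations never appear.

```python
import re

subject = '"lorem ipsum", "foo", "baz", "bar", "lorem ipsum", "bar", "ipsum", "foo", "baz", "bar"'

# Greedy: .* runs to the last possible end marker, so only one match is found.
greedy = re.findall(r'foo", "(.*)", "bar', subject)

# Lazy: .*? stops at the nearest end marker; scanning then resumes after that match.
lazy = re.findall(r'foo", "(.*?)", "bar', subject)

print(greedy)
print(lazy)
```

The lazy version finds the two short 'baz' results but skips every combination that spans an earlier end marker.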

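So, given two substrings (start and end) and a subject string, how do I get all possible substrings between start and end in subject? The most computationally efficient solution is of course preferred.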

Best answer

Here is one workable approach that I think is fairly efficient.

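  • Use re.finditer to find the indices/spans of the start and end patterns
  • Build all reasonable combinations of those indices/spans
  • Slice out the results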
import re

subject = '"lorem ipsum", "foo", "baz", "bar", "lorem ipsum", "bar", "ipsum", "foo", "baz", "bar"'

# spans of the "start" pattern
sSpans = [match.span() for match in re.finditer('foo", "', subject)]
# spans of the "end" pattern
eSpans = [match.span() for match in re.finditer('", "bar"', subject)]
# all possible combinations of "between" spans:
# from the end of a start match to the beginning of an end match
spans = [(s[1], e[0]) for s in sSpans for e in eSpans]
# keep only reasonable spans where end > start
spans = [(s, e) for s, e in spans if e > s]
# slice out the "between" strings
result = [subject[s:e] for s, e in spans]
for r in result:
    print(r)

It can also be compressed into one line:

result = [subject[s.span()[1]:e.span()[0]] for s in re.finditer('foo", "' ,subject) for e in re.finditer('", "bar"',subject) if e.span()[0] > s.span()[1]]
for r in result: print(r)
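
The same idea can be wrapped in a reusable function. The sketch below is not part of the original answer: `all_between` is a hypothetical helper name, `re.escape` is added on the assumption that the markers should match literally, and `bisect_right` replaces the `e > s` filter by skipping end markers that begin at or before the start marker's end. The number of results can still grow with the product of start and end matches, so this only trims the filtering work.

```python
import re
from bisect import bisect_right


def all_between(subject, start, end):
    """Return every substring lying between an occurrence of `start`
    and a later occurrence of `end` (hypothetical helper)."""
    # re.escape makes both markers match literally, even if they
    # contain regex metacharacters such as '.' or '('.
    start_ends = [m.end() for m in re.finditer(re.escape(start), subject)]
    end_starts = [m.start() for m in re.finditer(re.escape(end), subject)]

    results = []
    for s in start_ends:
        # end_starts is sorted, so a binary search finds the first end
        # marker that begins strictly after position s.
        first = bisect_right(end_starts, s)
        results.extend(subject[s:e] for e in end_starts[first:])
    return results


subject = '"lorem ipsum", "foo", "baz", "bar", "lorem ipsum", "bar", "ipsum", "foo", "baz", "bar"'
result = all_between(subject, 'foo", "', '", "bar"')
for r in result:
    print(r)
```

Like the original, this drops empty "between" strings; using `bisect_left` instead would keep them.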

Regarding python - Get all possible strings between 2 substrings in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63621276/
