
python - Separate a string into the contents of parentheses, brackets, and flat text

Reposted. Author: 行者123. Updated: 2023-11-30 23:35:17

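I need a way, in Python, to take a text string and separate its contents into a list, split by three kinds of segment: outermost parentheses, outermost brackets, and plain text, preserving the original syntax.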

For example, given the string

(([a] b) c ) [d] (e) f

the expected output would be this list:

['(([a] b) c )', '[d]', '(e)', ' f']

I tried a few things with regular expressions, such as

\[.+?\]|\(.+?\)|[\w+ ?]+

which gives me

>>> re.findall(r'\[.+?\]|\(.+?\)|[\w+ ?]+', '(([a] b) c ) [d] (e) f')
['(([a] b)', ' c ', ' ', '[d]', ' ', '(e)', ' f']

(the c ends up in the wrong list item)

I also tried the greedy version of it,

\[.+\]|\(.+\)|[\w+ ?]+

but it falls short when the string contains separate groups delimited by the same type of bracket:

>>> re.findall(r'\[.+\]|\(.+\)|[\w+ ?]+', '(([a] b) c ) [d] (e) f')
['(([a] b) c ) [d] (e)', ' f']
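Both patterns fail for the same reason: the lazy form stops at the first closing character and the greedy form runs to the last one, and neither keeps track of nesting depth. A regular expression can still be used for tokenizing, with the depth tracked separately. The following is only a minimal sketch of that idea; the names TOKEN and split_top_level are hypothetical, and it assumes opening and closing characters are balanced without checking that each closer matches the type of its opener.

```python
import re

# Hypothetical tokenizer: either one bracket character or a run of non-bracket text.
TOKEN = re.compile(r'[()\[\]]|[^()\[\]]+')
OPENERS = {'(', '['}
CLOSERS = {')', ']'}

def split_top_level(text):
    """Group tokens into top-level segments by counting nesting depth."""
    parts, current, depth = [], [], 0
    for tok in TOKEN.findall(text):
        if tok in OPENERS:
            # A new top-level group starts: flush any pending plain text first.
            if depth == 0 and current:
                parts.append(''.join(current))
                current = []
            depth += 1
            current.append(tok)
        elif tok in CLOSERS:
            depth -= 1
            if depth < 0:
                raise ValueError("closing character without an opener")
            current.append(tok)
            # Back at depth 0: the outermost group is complete.
            if depth == 0:
                parts.append(''.join(current))
                current = []
        else:
            current.append(tok)
    if current:
        parts.append(''.join(current))
    return parts

print(split_top_level('(([a] b) c ) [d] (e) f'))
```

Whitespace between groups is returned as its own segment here, so the result is not yet identical to the expected list.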

Then I moved away from regular expressions and used a stack:

def parenthetic_contents(string):
    stack = []
    for i, c in enumerate(string):
        if c == '(' or c == '[':
            stack.append(i)
        elif c == ')' or c == ']':
            start = stack.pop()
            yield (len(stack), string[start:i + 1])

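This works great for parentheses and brackets, except that I have no way to retrieve the flat text (or maybe I do, but I don't know about it?):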

>>> list(parenthetic_contents('(([a] b) c ) [d] (e) f'))
[(2, '[a]'), (1, '([a] b)'), (0, '(([a] b) c )'), (0, '[d]'), (0, '(e)')]
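One way the generator above might be extended to report flat text as well is to remember where the last top-level group ended and emit the slice in between. This is a sketch only, not the approach taken in the answer; the name top_level_parts and the text_start variable are hypothetical, and balanced input is assumed (an unmatched closer would raise IndexError from stack.pop()).

```python
def top_level_parts(string):
    """Yield outermost bracketed groups and the plain text between them."""
    stack = []
    text_start = 0  # index where the current run of top-level text begins
    for i, c in enumerate(string):
        if c in '([':
            # Entering a top-level group: emit any plain text that preceded it.
            if not stack and i > text_start:
                yield string[text_start:i]
            stack.append(i)
        elif c in ')]':
            start = stack.pop()
            if not stack:
                # The outermost group just closed: emit it in full.
                yield string[start:i + 1]
                text_start = i + 1
    # Whatever remains after the last group is trailing plain text.
    if text_start < len(string):
        yield string[text_start:]

print(list(top_level_parts('(([a] b) c ) [d] (e) f')))
```

Only depth-0 results are yielded, so the inner groups such as '[a]' no longer appear on their own.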

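I'm not familiar with pyparsing. At first glance nestedExpr() looked like it would do the trick, but it takes only one pair of delimiters (either () or [], but not both), which doesn't work for me.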

I'm out of ideas now. Any suggestions are welcome.

Best answer

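Only very lightly tested (and the output includes the whitespace segments). As in @Marius's answer (and per the general rule that matching parentheses requires a PDA), I use a stack. However, I have a little extra paranoia built in to mine.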

def paren_matcher(string, opens, closes):
    """Yield (in order) the parts of a string that are contained
    in matching parentheses. That is, upon encountering an "open
    parenthesis" character (one in <opens>), we require a
    corresponding "close parenthesis" character (the corresponding
    one from <closes>) to close it.

    If there are embedded <open>s they increment the count and
    also require corresponding <close>s. If an <open> is closed
    by the wrong <close>, we raise a ValueError.
    """
    stack = []
    if len(opens) != len(closes):
        raise TypeError("opens and closes must have the same length")
    # could make sure that no closes[i] is present in opens, but
    # won't bother here...

    result = []
    for char in string:
        # If it's an open parenthesis, push corresponding closer onto stack.
        pos = opens.find(char)
        if pos >= 0:
            if result and not stack:  # yield accumulated pre-paren stuff
                yield ''.join(result)
                result = []
            result.append(char)
            stack.append(closes[pos])
            continue
        result.append(char)
        # If it's a close parenthesis, match it up.
        pos = closes.find(char)
        if pos >= 0:
            if not stack or stack[-1] != char:
                raise ValueError("unbalanced parentheses: %s" %
                                 ''.join(result))
            stack.pop()
            if not stack:  # final paren closed
                yield ''.join(result)
                result = []
    if stack:
        raise ValueError("unclosed parentheses: %s" % ''.join(result))
    if result:
        yield ''.join(result)

print(list(paren_matcher('(([a] b) c ) [d] (e) f', '([', ')]')))
print(list(paren_matcher('foo (bar (baz))', '(', ')')))
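Because whitespace between groups is yielded as its own segment, the first call produces a few more items than the list the question asks for. If that matters, whitespace-only segments could be filtered out afterwards. A minimal sketch, with drop_blank as a hypothetical helper name and the raw list written out by hand to keep the snippet self-contained:

```python
def drop_blank(parts):
    """Keep only segments that contain at least one non-whitespace character."""
    return [p for p in parts if p.strip()]

# Segments as yielded for '(([a] b) c ) [d] (e) f', including the
# whitespace-only items between top-level groups.
raw = ['(([a] b) c )', ' ', '[d]', ' ', '(e)', ' f']

print(drop_blank(raw))
```

Segments that mix whitespace and text, such as ' f', are kept unchanged, which matches the expected output in the question.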

Regarding "python - Separate a string into the contents of parentheses, brackets, and flat text", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/17479446/
