
python - Parsing multiline text with the Parsimonious Python library

Reposted · Author: 太空宇宙 · Updated: 2023-11-03 15:58:05

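I'm trying to parse multiline text with the Python parsimonious library. I've been playing with it for a while and can't figure out how to handle newlines effectively. An example is below; the single-line behavior makes sense. I saw this comment from Erik Rose on a parsimonious issue, but I couldn't figure out how to implement it without errors. Any tips appreciated...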

singleline_text = '''\
FIRST something cool'''

multiline_text = '''\
FIRST something very
cool
SECOND more awesomeness
'''

from parsimonious.grammar import Grammar

grammar = Grammar(
"""
bin = ORDER spaces description
ORDER = 'FIRST' / 'SECOND'
spaces = ~'\s*'
description = ~'[A-z0-9 ]*'
""")

This works for the single-line input; print(grammar.parse(singleline_text)) gives:

<Node called "bin" matching "FIRST   something cool">
    <Node called "ORDER" matching "FIRST">
        <Node matching "FIRST">
    <RegexNode called "spaces" matching " ">
    <RegexNode called "description" matching "something cool">

But the multiline input runs into a problem I couldn't solve from the link above; print(grammar.parse(multiline_text)) gives:

---------------------------------------------------------------------------
IncompleteParseError Traceback (most recent call last)
<ipython-input-123-c346891dc883> in <module>()
----> 1 print(grammar.parse(multiline_text))

/Users/me/anaconda3/lib/python3.6/site-packages/parsimonious/grammar.py in parse(self, text, pos)
121 """
122 self._check_default_rule()
--> 123 return self.default_rule.parse(text, pos=pos)
124
125 def match(self, text, pos=0):

/Users/me/anaconda3/lib/python3.6/site-packages/parsimonious/expressions.py in parse(self, text, pos)
110 node = self.match(text, pos=pos)
111 if node.end < len(text):
--> 112 raise IncompleteParseError(text, node.end, self)
113 return node
114

IncompleteParseError: Rule 'bin' matched in its entirety, but it didn't consume all the text. The non-matching portion of the text begins with '
cool
SECOND' (line 1, column 23).

Here is one thing I tried that didn't work:

grammar2 = Grammar(
"""
bin = ORDER spaces description newline
ORDER = 'FIRST' / 'SECOND'
spaces = ~'\s*'
description = ~'[A-z0-9 \n]*'
newline = ~r'#[^\r\n]*'
""")

print(grammar2.parse(multiline_text))

(truncated from a 211-line stack trace):

ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 4))

---------------------------------------------------------------------------
SyntaxError Traceback (most recent call last)

...


VisitationError: SyntaxError: EOL while scanning string literal (<unknown>, line 1)

Parse tree:
<Node called "spaceless_literal" matching "'[A-z0-9
]*'">  <-- *** We were here. ***
    <RegexNode matching "'[A-z0-9
]*'">

Best answer

It looks like you need to repeat the bin element in your grammar:

from parsimonious.grammar import Grammar

grammar = Grammar(
r"""
one = bin +
bin = ORDER spaces description newline
ORDER = 'FIRST' / 'SECOND'
newline = ~"\n*"
spaces = ~"\s*"
description = ~"[a-z0-9 ]*"i  # [A-z] would also match [ \ ] ^ _ and the backtick
""")

With that, you can parse something like this:

multiline_text = '''\
FIRST something very cool
SECOND more awesomeness
SECOND even better
'''

Regarding python - Parsing multiline text with the Parsimonious Python library, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/42107496/
