
python - PLY - returning multiple tokens

Reposted. Author: 太空宇宙. Updated: 2023-11-04 03:37:29

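As far as I know, the technique for lexing Python source code is:

  • When the current line's indentation level is less than the previous line's, produce a DEDENT. If several INDENTs are being closed, produce several DEDENTs.
  • When the end of input is reached, produce DEDENT(s) for any INDENT(s) that are still open.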
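
The two rules above can be sketched without any lexer at all, working only on the indentation width of each line. This is a minimal illustration, not PLY code; the function name indentation_events and the idea of feeding it a list of widths are assumptions made for the example.

```python
def indentation_events(widths):
    """Return the INDENT/DEDENT events for a sequence of per-line indentation widths."""
    stack = [0]          # indentation levels that are currently open
    events = []
    for width in widths:
        if width > stack[-1]:
            # Deeper than the enclosing level: open one new level.
            stack.append(width)
            events.append('INDENT')
        else:
            # Shallower: close levels until we are back at a known one.
            while width < stack[-1]:
                stack.pop()
                events.append('DEDENT')
            if width != stack[-1]:
                raise IndentationError('dedent to width %d matches no outer level' % width)
    # End of input: close every INDENT that is still open.
    events.extend(['DEDENT'] * (len(stack) - 1))
    return events


events = indentation_events([0, 4, 8, 0])
```

Each INDENT is matched by exactly one DEDENT, whether the level is closed by a later line or by the end of input.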

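Now, using PLY:

  • How do I return multiple tokens from a t_definition?
  • How do I write a t_definition that is called when EOF is reached? A simple \Z doesn't work: PLY complains that it matches the empty string.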

Best answer

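As far as I know, PLY does not implement a push-parser interface, which is the easiest way to solve this problem with bison. However, it is easy to inject your own lexer wrapper, which can manage a queue of DEDENT tokens.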

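A minimal lexer implementation needs to implement a token() method, which returns an object with type and value attributes. (If your parser uses other token attributes, such as PLY's lineno and lexpos, you need those too, but I won't worry about that here.)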
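
To make that interface concrete, here is a minimal sketch of an object that satisfies it. Token and ListLexer are hypothetical names invented for this example; the lexer simply replays a prepared list rather than scanning text, and it is not something PLY generates.

```python
from collections import namedtuple

# Hypothetical token type with only the two attributes named above.
Token = namedtuple('Token', 'type value')


class ListLexer(object):
    """Hypothetical stand-in lexer that replays a prepared list of tokens."""

    def __init__(self, tokens):
        self._tokens = iter(tokens)

    def token(self):
        # Return the next token, or None once the input is exhausted.
        return next(self._tokens, None)


lexer = ListLexer([Token('NAME', 'x'), Token('NUMBER', 1)])
seen = []
while True:
    tok = lexer.token()
    if tok is None:
        break
    seen.append((tok.type, tok.value))
```

Returning None at end of input is the same convention the wrapper below relies on when it calls the underlying lexer.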

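For now, let's assume that the underlying (PLY-generated) lexer produces NEWLINE tokens whose value is the length of the leading whitespace after the newline. If some lines don't participate in the INDENT/DEDENT algorithm, the NEWLINE should be suppressed for those lines; that case isn't considered here. A simple example lexer function (which handles only spaces, not tabs) might be: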

    # This function doesn't handle tabs. Beware!
    def t_NEWLINE(self, t):
        r'\n(?:\s*(?:[#].*)?\n)*\s*'
        t.value = len(t.value) - 1 - t.value.rfind('\n')
        return t
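
To see what that rule computes, the same pattern can be tried with the standard re module outside PLY. The helper indent_after and the sample source string are assumptions made for this illustration; only the regular expression and the arithmetic come from the rule above.

```python
import re

# Same pattern as the t_NEWLINE rule above.
NEWLINE_RE = re.compile(r'\n(?:\s*(?:[#].*)?\n)*\s*')


def indent_after(text, pos):
    """Match the pattern at `pos` and return the width of the trailing indentation."""
    matched = NEWLINE_RE.match(text, pos).group(0)
    # Everything after the last newline in the match is the new line's indentation.
    return len(matched) - 1 - matched.rfind('\n')


source = "if x:\n\n    # comment\n    y = 1\nz = 2\n"

# The newline after "if x:" swallows the blank line and the comment-only line.
first = indent_after(source, source.index('\n'))
# The newline after "y = 1" is followed directly by unindented code.
second = indent_after(source, source.index('\n', source.index('y = 1')))
```

Blank lines and comment-only lines are absorbed into a single match, so only the indentation of the next line that contains code ends up in the value.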

Now we wrap the PLY-generated lexer in a wrapper that handles indentation:

    # WARNING:
    # This code hasn't been tested much and it also may be inefficient
    # and/or inexact. It doesn't do python-style tab handling. Etc. etc.

    from collections import namedtuple, deque

    # These are the tokens. We only generate one of each here. If
    # we used lineno or didn't trust the parser to not mess with the
    # token, we could generate a new one each time.
    IndentToken = namedtuple('Token', 'type value')
    dedent = IndentToken('DEDENT', None)
    indent = IndentToken('INDENT', None)
    newline = IndentToken('NEWLINE', None)

    class IndentWrapper(object):

        def __init__(self, lexer):
            """Create a new wrapper given the lexer which is being wrapped"""
            self.lexer = lexer
            self.indent_stack = [0]
            # A queue is overkill for this case, but it's simple.
            self.token_queue = deque()
            # This is just in case the ply-generated lexer cannot be called again
            # after it returns None.
            self.eof_reached = False

        def token(self):
            """Return the next token, or None if end of input has been reached"""
            # Do we have any queued tokens?
            if self.token_queue:
                return self.token_queue.popleft()
            # Are we done?
            if self.eof_reached:
                return None
            # Get a token
            t = self.lexer.token()
            if t is None:
                # At end of input, we might need to send some dedents
                self.eof_reached = True
                if len(self.indent_stack) > 1:
                    # Return one DEDENT now and queue the rest, so that each
                    # open INDENT is closed by exactly one DEDENT.
                    t = dedent
                    for i in range(len(self.indent_stack) - 2):
                        self.token_queue.append(dedent)
                    self.indent_stack = [0]
            elif t.type == "NEWLINE":
                # The NEWLINE token includes the amount of leading whitespace.
                # Fabricate indent or dedents as/if necessary and queue them.
                if t.value > self.indent_stack[-1]:
                    self.indent_stack.append(t.value)
                    self.token_queue.append(indent)
                else:
                    while t.value < self.indent_stack[-1]:
                        self.indent_stack.pop()
                        self.token_queue.append(dedent)
                    if t.value != self.indent_stack[-1]:
                        raise IndentationError  # Or however you indicate errors
            return t
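
For experimenting with the same logic away from PLY, it can also be written as a generator over any iterable of tokens. This is an alternative formulation, not the wrapper above: it has no token() method, so it would need adapting before a parser could use it. The names Token and with_indentation and the sample stream are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical stand-in token with the two attributes the logic needs.
Token = namedtuple('Token', 'type value')


def with_indentation(tokens):
    """Yield tokens, adding INDENT/DEDENT after NEWLINE tokens and at end of input."""
    stack = [0]
    for tok in tokens:
        yield tok
        if tok.type != 'NEWLINE':
            continue
        if tok.value > stack[-1]:
            stack.append(tok.value)
            yield Token('INDENT', None)
        else:
            while tok.value < stack[-1]:
                stack.pop()
                yield Token('DEDENT', None)
            if tok.value != stack[-1]:
                raise IndentationError('dedent does not match any outer level')
    # End of input: one DEDENT per INDENT that is still open.
    for _ in stack[1:]:
        yield Token('DEDENT', None)


# Two nested levels that are never closed explicitly before the input ends.
stream = [
    Token('NAME', 'if'), Token('NEWLINE', 4),
    Token('NAME', 'a'), Token('NEWLINE', 8),
    Token('NAME', 'b'),
]
types = [tok.type for tok in with_indentation(stream)]
```

A quick sanity check for either formulation is that the output contains as many DEDENT tokens as INDENT tokens for any input that ends normally.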

This article on python - PLY - returning multiple tokens is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/28259366/
