
python - Umlauts in a JSON file cause errors in Python code generated by ANTLR4

Reposted. Author: 行者123. Updated: 2023-11-28 19:08:12

I generated the Python modules from the JSON grammar on github/antlr4 with:

antlr4 -Dlanguage=Python3 JSON.g4

Following this guide, I wrote a main program "JSON2.py": https://github.com/antlr/antlr4/blob/master/doc/python-target.md and downloaded example1.json from GitHub.

python3 ./JSON2.py example1.json # works perfectly, but 
python3 ./JSON2.py bookmarks-2017-05-24.json # the bookmarks contain German Umlauts like "ü"

...
File "/home/xyz/lib/python3.5/site-packages/antlr4/FileStream.py", line 27, in readDataFrom
return codecs.decode(bytes, encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 227: ordinal not in range(128)
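
The byte 0xc3 in this message is the first byte of a UTF-8 encoded umlaut. A minimal standard-library sketch (not ANTLR code; the sample string is made up for illustration) reproduces the same kind of failure by decoding UTF-8 bytes with the ascii codec, the codec named in the traceback:

```python
import codecs

# Hypothetical sample standing in for a bookmark title with an umlaut.
sample = '{"title": "Müller"}'
raw = sample.encode("utf-8")

# In UTF-8, "ü" occupies two bytes: 0xc3 0xbc.
umlaut_bytes = "ü".encode("utf-8")


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return codecs.decode(data, encoding, "strict"), None
    except UnicodeDecodeError as exc:
        return None, exc


ascii_text, ascii_error = try_decode(raw, "ascii")
utf8_text, utf8_error = try_decode(raw, "utf-8")

print(umlaut_bytes)   # the two bytes that encode the umlaut
print(ascii_error)    # the ascii codec rejects the first of them
print(utf8_text)      # the utf-8 codec restores the original text
```

The same bytes decode cleanly once the codec matches the file's encoding, which suggests the input is fine and the decoding step is the part to change.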

The offending line in JSON2.py is

input = FileStream(argv[1])

I searched Stack Overflow and tried this instead of the FileStream call above:

fp = codecs.open(argv[1], 'rb', 'utf-8')
try:
    input = fp.read()
finally:
    fp.close()
lexer = JSONLexer(input)
stream = CommonTokenStream(lexer)
parser = JSONParser(stream)
tree = parser.json() # This is line 39, mentioned in the error message

Running this program ends with an error message, even when the input file contains no umlauts:

python3 ./JSON2.py example1.json 
Traceback (most recent call last):
File "./JSON2.py", line 46, in <module>
main(sys.argv)
File "./JSON2.py", line 39, in main
tree = parser.json()
File "/home/x/Entwicklung/antlr/links/JSONParser.py", line 108, in json
self.enterRule(localctx, 0, self.RULE_json)
File "/home/xyz/lib/python3.5/site-packages/antlr4/Parser.py", line 358, in enterRule
self._ctx.start = self._input.LT(1)
File "/home/xyz/lib/python3.5/site-packages/antlr4/CommonTokenStream.py", line 61, in LT
self.lazyInit()
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 186, in lazyInit
self.setup()
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 189, in setup
self.sync(0)
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 111, in sync
fetched = self.fetch(n)
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 123, in fetch
t = self.tokenSource.nextToken()
File "/home/xyz/lib/python3.5/site-packages/antlr4/Lexer.py", line 111, in nextToken
tokenStartMarker = self._input.mark()
AttributeError: 'str' object has no attribute 'mark'

This parses correctly:


javac *.java
grun JSON json -gui bookmarks-2017-05-24.json
So the grammar itself is not the problem.

So the final question is: how should I handle the input file in Python so that the lexer and parser can digest it?

Thanks in advance.

Best answer

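Make sure your input file is actually encoded as UTF-8. Many problems with character recognition in the lexer are caused by using other encodings. I just took a testbed application, added ë to the list of characters allowed in an IDENTIFIER, and it ran again. UTF-8 is the key. Also make sure your grammar allows these characters wherever you want to accept them.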
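
The answer's first check, whether the file really is UTF-8, can be sketched with the standard library alone. The helper name read_utf8_or_report and the temporary files are hypothetical, for illustration only; nothing in this block comes from the ANTLR runtime.

```python
import os
import tempfile


def read_utf8_or_report(path):
    """Read path as strict UTF-8.

    Returns (text, None) if the file decodes cleanly, otherwise
    (None, message) naming the offending byte offset.
    """
    with open(path, "rb") as fp:
        raw = fp.read()
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, "invalid UTF-8 at byte offset %d" % exc.start


def _write_temp(data):
    """Write bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "wb") as fp:
        fp.write(data)
    return path


# One file saved as UTF-8, one saved as Latin-1 (a common alternative).
content = '{"name": "für"}'
good_path = _write_temp(content.encode("utf-8"))
bad_path = _write_temp(content.encode("latin-1"))

good_text, good_error = read_utf8_or_report(good_path)
bad_text, bad_error = read_utf8_or_report(bad_path)

os.remove(good_path)
os.remove(bad_path)

print(good_text, good_error)
print(bad_text, bad_error)
```

If the file passes this check, the remaining step in JSON2.py is presumably to name the encoding explicitly. In the Python 3 runtime, FileStream takes an encoding argument that defaults to 'ascii', which matches the codec named in the first traceback, so FileStream(argv[1], encoding='utf-8') is the likely fix. Wrapping already-decoded text in antlr4.InputStream before handing it to JSONLexer is an alternative; passing a plain str, as in the second attempt, fails because the lexer expects a stream object with a mark() method.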

A similar question about "python - Umlauts in a JSON file cause errors in Python code generated by ANTLR4" can be found on Stack Overflow: https://stackoverflow.com/questions/44194089/
