
Python AST from an ANTLR parse tree?

Reposted. Author: 太空狗. Updated: 2023-10-30 02:30:50

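I found an ANTLRv4 Python3 grammar, but it generates a parse tree that usually contains a lot of useless nodes.

I'm looking for a known package that produces a Python AST from that parse tree.

Does such a thing exist?

EDIT: A note on using the Python ast package: my project is written in Java, and I need to parse Python files.

EDIT 2: By "AST" I mean http://docs.python.org/2/library/ast.html#abstract-grammar , and by "parse tree" I mean http://docs.python.org/2/reference/grammar.html .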

Best answer

The following could be a start:

import java.util.ArrayList;
import java.util.List;

import org.antlr.v4.runtime.Token;
import org.antlr.v4.runtime.tree.ParseTree;

public class AST {

    // Either a Token (for leaf nodes) or the name of the parser rule.
    private final Object payload;

    // The child nodes of this AST node.
    private final List<AST> children;

    public AST(ParseTree tree) {
        this(null, tree);
    }

    private AST(AST ast, ParseTree tree) {
        this(ast, tree, new ArrayList<AST>());
    }

    private AST(AST parent, ParseTree tree, List<AST> children) {

        this.payload = getPayload(tree);
        this.children = children;

        if (parent == null) {
            // We're at the root of the AST: walk the parse tree to build it.
            walk(tree, this);
        }
        else {
            parent.children.add(this);
        }
    }

    public Object getPayload() {
        return payload;
    }

    public List<AST> getChildren() {
        // Return a copy so callers cannot modify the internal list.
        return new ArrayList<>(children);
    }

    // Determines the payload of this AST node: the payload of the parse tree
    // node for leaves (a Token for terminals), otherwise the rule name derived
    // from the context class name, e.g. File_inputContext -> file_input.
    private Object getPayload(ParseTree tree) {
        if (tree.getChildCount() == 0) {
            return tree.getPayload();
        }
        else {
            String ruleName = tree.getClass().getSimpleName().replace("Context", "");
            return Character.toLowerCase(ruleName.charAt(0)) + ruleName.substring(1);
        }
    }

    // Walks the parse tree and adds nodes to the given AST node.
    private static void walk(ParseTree tree, AST ast) {

        if (tree.getChildCount() == 0) {
            // A leaf: add it to the AST.
            new AST(ast, tree);
        }
        else if (tree.getChildCount() == 1) {
            // A single child: skip this node and descend into the child.
            // This is what collapses chains of single-child rules.
            walk(tree.getChild(0), ast);
        }
        else if (tree.getChildCount() > 1) {

            for (int i = 0; i < tree.getChildCount(); i++) {

                AST temp = new AST(ast, tree.getChild(i));

                if (!(temp.payload instanceof Token)) {
                    // Only descend into nodes that are not tokens.
                    walk(tree.getChild(i), temp);
                }
            }
        }
    }

    // Renders the AST as an indented tree, one node per line.
    @Override
    public String toString() {

        StringBuilder builder = new StringBuilder();

        AST ast = this;
        List<AST> firstStack = new ArrayList<>();
        firstStack.add(ast);

        List<List<AST>> childListStack = new ArrayList<>();
        childListStack.add(firstStack);

        while (!childListStack.isEmpty()) {

            List<AST> childStack = childListStack.get(childListStack.size() - 1);

            if (childStack.isEmpty()) {
                childListStack.remove(childListStack.size() - 1);
            }
            else {
                ast = childStack.remove(0);
                String caption;

                if (ast.payload instanceof Token) {
                    Token token = (Token) ast.payload;
                    caption = String.format("TOKEN[type: %s, text: %s]",
                            token.getType(), token.getText().replace("\n", "\\n"));
                }
                else {
                    caption = String.valueOf(ast.payload);
                }

                String indent = "";

                // Three characters per level, matching the printed output.
                for (int i = 0; i < childListStack.size() - 1; i++) {
                    indent += (childListStack.get(i).size() > 0) ? "|  " : "   ";
                }

                builder.append(indent)
                        .append(childStack.isEmpty() ? "'- " : "|- ")
                        .append(caption)
                        .append("\n");

                if (ast.children.size() > 0) {
                    List<AST> children = new ArrayList<>();
                    for (int i = 0; i < ast.children.size(); i++) {
                        children.add(ast.children.get(i));
                    }
                    childListStack.add(children);
                }
            }
        }

        return builder.toString();
    }
}

It can be used to create an AST for the input "f(arg1='1')\n" as follows:

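public static void main(String[] args) {

    // Python3Lexer and Python3Parser are the classes ANTLR generates from the Python3 grammar.
    // ANTLRInputStream is deprecated as of ANTLR 4.7; CharStreams.fromString(...) replaces it there.
    Python3Lexer lexer = new Python3Lexer(new ANTLRInputStream("f(arg1='1')\n"));
    Python3Parser parser = new Python3Parser(new CommonTokenStream(lexer));

    ParseTree tree = parser.file_input();
    AST ast = new AST(tree);

    System.out.println(ast);
}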

This will print:

'- file_input
   |- stmt
   |  |- small_stmt
   |  |  |- atom
   |  |  |  '- TOKEN[type: 35, text: f]
   |  |  '- trailer
   |  |     |- TOKEN[type: 47, text: (]
   |  |     |- arglist
   |  |     |  |- test
   |  |     |  |  '- TOKEN[type: 35, text: arg1]
   |  |     |  |- TOKEN[type: 53, text: =]
   |  |     |  '- test
   |  |     |     '- TOKEN[type: 36, text: '1']
   |  |     '- TOKEN[type: 48, text: )]
   |  '- TOKEN[type: 34, text: \n]
   '- TOKEN[type: -1, text: ]

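I realize this still contains nodes you might not want, but you could even add a set of token types that you want to exclude. Feel free to hack away!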
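As a rough illustration of that exclusion idea, here is a minimal sketch that uses only the JDK. The `PrunedTree` and `Node` names are hypothetical stand-ins for the AST class, not part of ANTLR or of the answer's code, and the token type numbers (47 and 48 for the parentheses) are simply the ones shown in the printed output, so they may differ for another version of the grammar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the AST class: a node is either a rule node
// (tokenType == null) or a token node (tokenType set). Not an ANTLR type.
class PrunedTree {

    static final class Node {
        final String label;
        final Integer tokenType; // null for rule nodes
        final List<Node> children = new ArrayList<>();

        Node(String label, Integer tokenType, Node... kids) {
            this.label = label;
            this.tokenType = tokenType;
            this.children.addAll(Arrays.asList(kids));
        }
    }

    // Returns a copy of the tree without token nodes whose type is excluded.
    // Returns null when the given node itself is an excluded token.
    static Node prune(Node node, Set<Integer> excludedTypes) {
        if (node.tokenType != null && excludedTypes.contains(node.tokenType)) {
            return null;
        }
        Node copy = new Node(node.label, node.tokenType);
        for (Node child : node.children) {
            Node kept = prune(child, excludedTypes);
            if (kept != null) {
                copy.children.add(kept);
            }
        }
        return copy;
    }

    // Flattens a tree into a bracketed string for quick inspection.
    static String show(Node node) {
        if (node.children.isEmpty()) {
            return node.label;
        }
        StringBuilder sb = new StringBuilder(node.label).append('(');
        for (int i = 0; i < node.children.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(show(node.children.get(i)));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // Mirrors the "trailer" subtree of the printed output.
        Node trailer = new Node("trailer", null,
                new Node("(", 47),
                new Node("arglist", null,
                        new Node("arg1", 35),
                        new Node("=", 53),
                        new Node("'1'", 36)),
                new Node(")", 48));

        // Assumed token types for "(" and ")", taken from the printed output.
        Set<Integer> excluded = new HashSet<>(Arrays.asList(47, 48));

        System.out.println(show(prune(trailer, excluded)));
        // prints: trailer(arglist(arg1 = '1'))
    }
}
```

In the AST class itself, the same check could sit in walk, before a leaf is added. Note that in this sketch a rule node whose children are all excluded survives as an empty leaf; whether to drop such nodes as well is a design choice.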

Here is a Gist containing a version of the code above, with the proper import statements and some JavaDocs and inline comments.

Regarding "Python AST from an ANTLR parse tree?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24766537/
