ANTLR: what is the simplest way to implement a Python-like, indentation-dependent grammar?

Reposted by: 行者123  Updated: 2023-12-03 12:11:56

I am trying to implement a Python-like, indentation-dependent grammar.

Source example:

ABC QWE
  CDE EFG
  EFG CDE
    ABC 
  QWE ZXC

As I see it, what I need is to implement two tokens, INDENT and DEDENT, so that I can write something like:

grammar mygrammar;
text: (ID | block)+;
block: INDENT (ID|block)+ DEDENT;
INDENT: ????;
DEDENT: ????;

Is there any simple way to achieve this with ANTLR?

(I would prefer to use the standard ANTLR lexer, if possible.)

Best answer

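I don't know what the easiest way to handle it is, but the following is a relatively easy way. Whenever you match a line break in your lexer, optionally match one or more spaces. If there are spaces after the line break, compare the length of these spaces with the current indent size. If it is greater than the current indent size, emit an Indent token; if it is less, emit a Dedent token; and if it is the same, do nothing.

You will also need to emit a number of Dedent tokens at the end of the file so that every Indent has a matching Dedent token.

For this to work properly, you must add a leading and a trailing line break to your input source file!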
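To make the idea concrete outside of ANTLR, here is a minimal plain-Java sketch of the same comparison logic. It is an illustration under assumptions, not the lexer shown in this answer: the class name IndentSketch and the method tokenize are hypothetical, tokens are represented as plain strings, only space characters are counted as indentation, and a stack of indentation widths is used so that several Dedent markers can be emitted at once. Unlike the grammar in this answer, it does not need leading or trailing line breaks and does not wrap the whole input in an outer block.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class IndentSketch {
    // Turns lines into a flat token list, inserting "INDENT"/"DEDENT" markers.
    static List<String> tokenize(List<String> lines) {
        List<String> out = new ArrayList<>();
        Deque<Integer> widths = new ArrayDeque<>(); // widths of the open blocks
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines do not affect indentation
            }
            int n = 0; // number of leading spaces on this line
            while (n < line.length() && line.charAt(n) == ' ') {
                n++;
            }
            int current = widths.isEmpty() ? 0 : widths.peek();
            if (n > current) {
                widths.push(n);
                out.add("INDENT");
            } else {
                // Close every block that is indented deeper than this line.
                while (!widths.isEmpty() && widths.peek() > n) {
                    widths.pop();
                    out.add("DEDENT");
                }
            }
            for (String word : line.trim().split("\\s+")) {
                out.add(word);
            }
        }
        // End of input: every INDENT still open gets its matching DEDENT.
        while (!widths.isEmpty()) {
            widths.pop();
            out.add("DEDENT");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
            "ABC QWE",
            "  CDE EFG",
            "  EFG CDE",
            "    ABC ",
            "  QWE ZXC");
        // Prints: ABC QWE INDENT CDE EFG EFG CDE INDENT ABC DEDENT QWE ZXC DEDENT
        System.out.println(String.join(" ", tokenize(source)));
    }
}
```

The input in main is the example from the question; the final DEDENT is produced by the end-of-input loop, which plays the role of the extra Dedent tokens mentioned above.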

ANTLR3

A quick demo:

grammar PyEsque;

options {
  output=AST;
}

tokens {
  BLOCK;
}

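@lexer::members {

  // Leading-whitespace width of the previous line; -1 before the first line.
  private int previousIndents = -1;
  // Current nesting depth, used to label tokens and to close blocks at EOF.
  private int indentLevel = 0;
  // Buffer of emitted tokens, so that one lexer rule can produce several tokens.
  java.util.Queue<Token> tokens = new java.util.LinkedList<Token>();

  @Override
  public void emit(Token t) {
    state.token = t;
    tokens.offer(t);
  }

  @Override
  public Token nextToken() {
    // Let the lexer match a rule (which fills the queue), then hand out queued tokens.
    super.nextToken();
    return tokens.isEmpty() ? Token.EOF_TOKEN : tokens.poll();
  }

  // Emits an Indent or Dedent token and adjusts the nesting depth accordingly.
  private void jump(int ttype) {
    indentLevel += (ttype == Dedent ? -1 : 1);
    emit(new CommonToken(ttype, "level=" + indentLevel));
  }
}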

parse
 : block EOF -> block
 ;

block
 : Indent block_atoms Dedent -> ^(BLOCK block_atoms)
 ;

block_atoms
 :  (Id | block)+
 ;

NewLine
 : NL SP?
   {
     int n = $SP.text == null ? 0 : $SP.text.length();
     if(n > previousIndents) {
       jump(Indent);
       previousIndents = n;
     }
     else if(n < previousIndents) {
       jump(Dedent);
       previousIndents = n;
     }
     else if(input.LA(1) == EOF) {
       while(indentLevel > 0) {
         jump(Dedent);
       }
     }
     else {
       skip();
     }
   }
 ;

Id
 : ('a'..'z' | 'A'..'Z')+
 ;

SpaceChars
 : SP {skip();}
 ;

fragment NL     : '\r'? '\n' | '\r';
fragment SP     : (' ' | '\t')+;
fragment Indent : ;
fragment Dedent : ;

You can test the parser with the following class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    PyEsqueLexer lexer = new PyEsqueLexer(new ANTLRFileStream("in.txt"));
    PyEsqueParser parser = new PyEsqueParser(new CommonTokenStream(lexer));
    CommonTree tree = (CommonTree)parser.parse().getTree();
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}

If you now put the following in a file called in.txt:

AAA AAAAA
  BBB BB B
  BB BBBBB BB
    CCCCCC C CC
  BB BBBBBB
    C CCC
      DDD DD D
      DDD D DDD

(Note the leading and trailing line breaks!)

then you'll see output that corresponds to the following AST:

[Image: AST of the parsed input]

Note that my demo wouldn't produce enough dedents in succession, like dedenting from ccc to aaa (2 dedent tokens are needed):

aaa
  bbb
    ccc
aaa

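You will need to adjust the code inside else if(n < previousIndents) { ... } so that it can emit more than one Dedent token, depending on the difference between n and previousIndents. Off the top of my head, that could look like this: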

 else if(n < previousIndents) {
   // Note: assuming indent-size is 2. Jumping from previousIndents=6 
   // to n=2 will result in emitting 2 `Dedent` tokens
   int numDedents = (previousIndents - n) / 2; 
   while(numDedents-- > 0) {
     jump(Dedent);
   }
   previousIndents = n;
 }
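The arithmetic in that snippet only holds if every nesting level is exactly one indent-size wide. As a hedged illustration, the following standalone Java helper (the names DedentCount and dedentsFor are hypothetical and not part of the grammar) computes the same quotient but rejects widths that do not fit the assumed indent size instead of silently rounding down.

```java
final class DedentCount {
    // Number of Dedent tokens needed when the indentation shrinks from
    // `previous` columns to `n` columns, assuming each nesting level is
    // exactly `indentSize` columns wide (an assumption, as in the snippet above).
    static int dedentsFor(int previous, int n, int indentSize) {
        int diff = previous - n;
        if (diff < 0 || diff % indentSize != 0) {
            throw new IllegalArgumentException(
                "indentation change of " + diff + " does not fit indent size " + indentSize);
        }
        return diff / indentSize;
    }

    public static void main(String[] args) {
        System.out.println(dedentsFor(6, 2, 2)); // prints 2
        System.out.println(dedentsFor(4, 0, 2)); // prints 2
    }
}
```

If the indent size cannot be fixed in advance, tracking the actual widths on a stack avoids the assumption altogether.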

ANTLR4

For ANTLR4, do something like this:

grammar Python3;

tokens { INDENT, DEDENT }

@lexer::members {
  // A queue where extra tokens are pushed on (see the NEWLINE lexer rule).
  private java.util.LinkedList<Token> tokens = new java.util.LinkedList<>();
  // The stack that keeps track of the indentation level.
  private java.util.Stack<Integer> indents = new java.util.Stack<>();
  // The number of opened braces, brackets and parentheses.
  private int opened = 0;
  // The most recently produced token.
  private Token lastToken = null;
  @Override
  public void emit(Token t) {
    super.setToken(t);
    tokens.offer(t);
  }

  @Override
  public Token nextToken() {
    // Check if the end-of-file is ahead and there are still some DEDENTS expected.
    if (_input.LA(1) == EOF && !this.indents.isEmpty()) {
      // Remove any trailing EOF tokens from our buffer.
      for (int i = tokens.size() - 1; i >= 0; i--) {
        if (tokens.get(i).getType() == EOF) {
          tokens.remove(i);
        }
      }

      // First emit an extra line break that serves as the end of the statement.
      this.emit(commonToken(Python3Parser.NEWLINE, "\n"));

      // Now emit as many DEDENT tokens as needed.
      while (!indents.isEmpty()) {
        this.emit(createDedent());
        indents.pop();
      }

      // Put the EOF back on the token stream.
      this.emit(commonToken(Python3Parser.EOF, "<EOF>"));
    }

    Token next = super.nextToken();

    if (next.getChannel() == Token.DEFAULT_CHANNEL) {
      // Keep track of the last token on the default channel.
      this.lastToken = next;
    }

    return tokens.isEmpty() ? next : tokens.poll();
  }

  private Token createDedent() {
    CommonToken dedent = commonToken(Python3Parser.DEDENT, "");
    dedent.setLine(this.lastToken.getLine());
    return dedent;
  }

  private CommonToken commonToken(int type, String text) {
    int stop = this.getCharIndex() - 1;
    int start = text.isEmpty() ? stop : stop - text.length() + 1;
    return new CommonToken(this._tokenFactorySourcePair, type, DEFAULT_TOKEN_CHANNEL, start, stop);
  }

  // Calculates the indentation of the provided spaces, taking the
  // following rules into account:
  //
  // "Tabs are replaced (from left to right) by one to eight spaces
  //  such that the total number of characters up to and including
  //  the replacement is a multiple of eight [...]"
  //
  //  -- https://docs.python.org/3.1/reference/lexical_analysis.html#indentation
  static int getIndentationCount(String spaces) {
    int count = 0;
    for (char ch : spaces.toCharArray()) {
      switch (ch) {
        case '\t':
          count += 8 - (count % 8);
          break;
        default:
          // A normal space char.
          count++;
      }
    }

    return count;
  }

  boolean atStartOfInput() {
    return super.getCharPositionInLine() == 0 && super.getLine() == 1;
  }
}

single_input
 : NEWLINE
 | simple_stmt
 | compound_stmt NEWLINE
 ;

// more parser rules

NEWLINE
 : ( {atStartOfInput()}?   SPACES
   | ( '\r'? '\n' | '\r' ) SPACES?
   )
   {
     String newLine = getText().replaceAll("[^\r\n]+", "");
     String spaces = getText().replaceAll("[\r\n]+", "");
     int next = _input.LA(1);
     if (opened > 0 || next == '\r' || next == '\n' || next == '#') {
       // If we're inside a list or on a blank line, ignore all indents, 
       // dedents and line breaks.
       skip();
     }
     else {
       emit(commonToken(NEWLINE, newLine));
       int indent = getIndentationCount(spaces);
       int previous = indents.isEmpty() ? 0 : indents.peek();
       if (indent == previous) {
         // skip indents of the same size as the present indent-size
         skip();
       }
       else if (indent > previous) {
         indents.push(indent);
         emit(commonToken(Python3Parser.INDENT, spaces));
       }
       else {
         // Possibly emit more than 1 DEDENT token.
         while(!indents.isEmpty() && indents.peek() > indent) {
           this.emit(createDedent());
           indents.pop();
         }
       }
     }
   }
 ;

// more lexer rules

Taken from: https://github.com/antlr/grammars-v4/blob/master/python3/Python3.g4
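The tab rule quoted in the comment on getIndentationCount is easier to see with a few concrete inputs. The following standalone Java sketch copies only the counting logic into a hypothetical class named IndentWidth, so it can be run without ANTLR; the sample strings are made up for illustration.

```java
final class IndentWidth {
    // Same counting rule as in the grammar: a tab advances the count to the
    // next multiple of eight, any other character counts as one column.
    static int getIndentationCount(String spaces) {
        int count = 0;
        for (char ch : spaces.toCharArray()) {
            if (ch == '\t') {
                count += 8 - (count % 8);
            } else {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(getIndentationCount("\t"));         // prints 8
        System.out.println(getIndentationCount("  \t"));       // prints 8
        System.out.println(getIndentationCount(" \t "));       // prints 9
        System.out.println(getIndentationCount("        \t")); // prints 16
    }
}
```

Two spaces followed by a tab and a lone tab both yield 8, so lines indented either way end up at the same level on the indents stack.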

Regarding "ANTLR: what is the simplest way to implement a Python-like, indentation-dependent grammar?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8642154/
