java - Accumulate/collect errors via ErrorListener to process after the parse

Reposted by: 行者123  Updated: 2023-12-03 18:08:33

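The ErrorListener mechanism in Antlr4 is great for logging syntax errors and making decisions about them as they occur during a parse, but it could be better for batch error handling after the parse is finished. There are a number of reasons you may want to handle errors after the parse finishes, including:

- we need a clean way to programmatically check for errors during a parse and handle them after the fact,

- sometimes one syntax error causes several others (when not recovered in line, for instance), so it can be helpful to group or nest these errors by parent context when displaying output to the user, and you can't know all the errors until the parse is finished,

- you may want to display errors differently to the user depending on how many there are and how severe they are. For example, a single error that exited a rule, or a few errors that were all recovered in line, might just ask the user to fix those local areas. Otherwise, you might have the user edit the entire input, and you need all the errors to make this determination.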
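
The grouping and reporting decisions described in the list above might look roughly like the following sketch. It deliberately uses no ANTLR types: `CollectedError`, `ErrorReportPlanner`, and the policy in `suggestLocalFixes` are hypothetical names and assumptions made for illustration, standing in for whatever the error listener actually collects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one collected syntax error; not an ANTLR type.
final class CollectedError {
    final String parentRule;       // name of the rule the error occurred in
    final boolean recoveredInline; // true if the parser recovered without leaving the rule
    final String message;

    CollectedError(String parentRule, boolean recoveredInline, String message) {
        this.parentRule = parentRule;
        this.recoveredInline = recoveredInline;
        this.message = message;
    }
}

final class ErrorReportPlanner {

    // Groups errors by the rule they occurred in, keeping first-seen order.
    static Map<String, List<CollectedError>> groupByParentRule(List<CollectedError> errors) {
        Map<String, List<CollectedError>> groups = new LinkedHashMap<>();
        for (CollectedError error : errors) {
            groups.computeIfAbsent(error.parentRule, rule -> new ArrayList<>()).add(error);
        }
        return groups;
    }

    // Assumed policy: ask for local fixes when there is a single error, or when
    // every error was recovered in line; otherwise ask the user to revise the whole input.
    static boolean suggestLocalFixes(List<CollectedError> errors) {
        boolean allInline = errors.stream().allMatch(error -> error.recoveredInline);
        return errors.size() == 1 || allInline;
    }
}
```

Both methods need the complete list of errors, which is why the decision can only be made once the parse has finished.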

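The bottom line is that we can be smarter about reporting syntax errors, and about asking users to fix them, if we know the full context in which the errors occurred (including other errors). To do this, I have the following three goals:

1. a full collection of all the errors from a given parse,

2. context information for each error, and

3. severity and recovery information for each error.

I have written code to do #1 and #2, and I'm looking for help with #3. I'm also going to suggest some small changes that would make #1 and #2 easier for everyone.

First, to accomplish #1 (a full collection of errors), I created CollectionErrorListener as follows: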

import java.util.ArrayList;
import java.util.List;

import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.antlr.v4.runtime.Token;

public class CollectionErrorListener extends BaseErrorListener {

    private final List<SyntaxError> errors = new ArrayList<>();

    public List<SyntaxError> getErrors() {
        return errors;
    }

    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine, String msg, RecognitionException e) {
        if (e == null) {
            // e is null when the parser was able to recover in line without exiting the surrounding rule.
            Parser parser = (Parser) recognizer;
            e = new InlineRecognitionException(msg, parser, parser.getInputStream(), parser.getContext(), (Token) offendingSymbol);
        }
        // Wrap every error so that the message passed to the listener is preserved.
        this.errors.add(new SyntaxError(msg, e));
    }
}

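Here is my InlineRecognitionException class:

import org.antlr.v4.runtime.IntStream;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.antlr.v4.runtime.Token;

public class InlineRecognitionException extends RecognitionException {

    public InlineRecognitionException(String message, Recognizer<?, ?> recognizer, IntStream input, ParserRuleContext ctx, Token offendingToken) {
        super(message, recognizer, input, ctx);
        this.setOffendingToken(offendingToken);
    }
}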

Here is my SyntaxError container class:

import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.RecognitionException;

public class SyntaxError extends RecognitionException {

    public SyntaxError(String message, RecognitionException e) {
        super(message, e.getRecognizer(), e.getInputStream(), (ParserRuleContext) e.getCtx());
        this.setOffendingToken(e.getOffendingToken());
        this.initCause(e);
    }
}

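This is very similar to the SyntaxErrorListener referred to in 280Z28's answer to "Antlr error/exception handling". I need both the InlineRecognitionException and the SyntaxError wrapper because of how the parameters of CollectionErrorListener.syntaxError are populated.

First of all, the RecognitionException parameter "e" is null if the parser recovered from the error in line (without leaving the rule). We can't just instantiate a new RecognitionException, because there is no public constructor or method that lets us set the offending token (setOffendingToken is protected, so it is only reachable from a subclass). In any case, being able to distinguish errors that were recovered in line (using an instanceof test) is useful information for goal #3, so we can use the InlineRecognitionException class to indicate in-line recovery.

Next, we need the SyntaxError wrapper class because, even when the RecognitionException "e" is not null (e.g., when recovery was not in line), the value of e.getMessage() is null (for some unknown reason). We therefore need to store the msg parameter passed to CollectionErrorListener.syntaxError. Because there is no setMessage() modifier method on RecognitionException, and we can't just instantiate a new RecognitionException (we would lose the offending token information, as discussed in the previous paragraph), we are left with subclassing in order to set the message, offending token, and cause appropriately.

And this mechanism works really well: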

    CollectionErrorListener collector = new CollectionErrorListener();
    parser.addErrorListener(collector);
    ParseTree tree = parser.prog();

    //  ...  Later ...
    for (SyntaxError e : collector.getErrors()) {
        // RecognitionExceptionUtil is my custom class discussed next.
        System.out.println(RecognitionExceptionUtil.formatVerbose(e));
    }

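Which brings me to my next point. Formatting output from a RecognitionException is kind of annoying. Chapter 9 of The Definitive ANTLR 4 Reference shows that displaying quality error messages means you need to split the input into lines, reverse the rule invocation stack, and piece together a lot of information from the offending token to explain where the error occurred. And the following call doesn't work if you are reporting errors after the parse is finished: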

// The following doesn't work if you are not reporting during the parse because the
// parser context is lost from the RecognitionException "e" recognizer.
List<String> stack = ((Parser)e.getRecognizer()).getRuleInvocationStack();

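The problem is that we have lost the RuleContext, which getRuleInvocationStack needs. Luckily, RecognitionException keeps a reference to the context, and getRuleInvocationStack has an overload that takes a context parameter, so here is how to get the rule invocation stack after the parse is finished: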

// Pass in the context from RecognitionException "e" to get the rule invocation stack
// after the parse is finished.
List<String> stack = ((Parser)e.getRecognizer()).getRuleInvocationStack(e.getCtx());
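
To illustrate why holding on to the context is enough, here is a minimal model of a rule context chain with no ANTLR dependency. `ContextNode` is a hypothetical stand-in, not an ANTLR class; the sketch assumes the stack is produced by following parent links from the failing context up to the start rule, which would explain why the utility class below reverses the list before printing it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a rule context: a rule name plus a link to the invoking context.
final class ContextNode {
    final String ruleName;
    final ContextNode parent; // null for the start rule

    ContextNode(String ruleName, ContextNode parent) {
        this.ruleName = ruleName;
        this.parent = parent;
    }

    // Walks from this context up to the root, so the innermost rule comes first.
    List<String> invocationStack() {
        List<String> stack = new ArrayList<>();
        for (ContextNode node = this; node != null; node = node.parent) {
            stack.add(node.ruleName);
        }
        return stack;
    }
}
```

Because each node only refers to its parent, the chain stays reachable from the saved context even after the parser itself has moved on.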

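In general, it would be especially nice if RecognitionException had some convenience methods to make error reporting friendlier. Here is my first attempt at a utility class of methods that could be part of RecognitionException: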

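import java.util.Collections;
import java.util.List;

import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;

public class RecognitionExceptionUtil {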

    public static String formatVerbose(RecognitionException e) {
        return String.format("ERROR on line %s:%s => %s%nrule stack: %s%noffending token %s => %s%n%s",
                getLineNumberString(e),
                getCharPositionInLineString(e),
                e.getMessage(),
                getRuleStackString(e),
                getOffendingTokenString(e),
                getOffendingTokenVerboseString(e),
                getErrorLineStringUnderlined(e).replaceAll("(?m)^|$", "|"));
    }

    public static String getRuleStackString(RecognitionException e) {
        if (e == null || e.getRecognizer() == null
                || e.getCtx() == null
                || e.getRecognizer().getRuleNames() == null) {
            return "";
        }
        List<String> stack = ((Parser)e.getRecognizer()).getRuleInvocationStack(e.getCtx());
        Collections.reverse(stack);
        return stack.toString();
    }

    public static String getLineNumberString(RecognitionException e) {
        if (e == null || e.getOffendingToken() == null) {
            return "";
        }
        return String.format("%d", e.getOffendingToken().getLine());
    }

    public static String getCharPositionInLineString(RecognitionException e) {
        if (e == null || e.getOffendingToken() == null) {
            return "";
        }
        return String.format("%d", e.getOffendingToken().getCharPositionInLine());
    }

    public static String getOffendingTokenString(RecognitionException e) {
        if (e == null || e.getOffendingToken() == null) {
            return "";
        }
        return e.getOffendingToken().toString();
    }

    public static String getOffendingTokenVerboseString(RecognitionException e) {
        if (e == null || e.getOffendingToken() == null) {
            return "";
        }
        return String.format("at tokenStream[%d], inputString[%d..%d] = '%s', tokenType<%d> = %s, on line %d, character %d",
                e.getOffendingToken().getTokenIndex(),
                e.getOffendingToken().getStartIndex(),
                e.getOffendingToken().getStopIndex(),
                e.getOffendingToken().getText(),
                e.getOffendingToken().getType(),
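                // getTokenNames() is deprecated since ANTLR 4.5 and its array cannot be indexed with EOF (type -1).
                e.getRecognizer().getVocabulary().getDisplayName(e.getOffendingToken().getType()),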
                e.getOffendingToken().getLine(),
                e.getOffendingToken().getCharPositionInLine());
    }

    public static String getErrorLineString(RecognitionException e) {
        if (e == null || e.getRecognizer() == null
                || e.getRecognizer().getInputStream() == null
                || e.getOffendingToken() == null) {
            return "";
        }
        CommonTokenStream tokens =
            (CommonTokenStream)e.getRecognizer().getInputStream();
        String input = tokens.getTokenSource().getInputStream().toString();
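        // A limit of -1 keeps trailing empty lines, so a token on the last (empty) line still resolves.
        String[] lines = input.split("\r?\n", -1);
        int index = e.getOffendingToken().getLine() - 1;
        return (index >= 0 && index < lines.length) ? lines[index] : "";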
    }

    public static String getErrorLineStringUnderlined(RecognitionException e) {
        String errorLine = getErrorLineString(e);
        if (errorLine.isEmpty()) {
            return errorLine;
        }
        // replace tabs with single space so that charPositionInLine gives us the
        // column to start underlining.
        errorLine = errorLine.replaceAll("\t", " ");
        StringBuilder underLine = new StringBuilder(String.format("%" + errorLine.length() + "s", ""));
        int start = e.getOffendingToken().getStartIndex();
        int stop = e.getOffendingToken().getStopIndex();
        if ( start>=0 && stop>=0 ) {
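            for (int i=0; i<=(stop-start); i++) {
                int column = e.getOffendingToken().getCharPositionInLine() + i;
                if (column >= underLine.length()) {
                    break; // the token continues past the end of this line
                }
                underLine.setCharAt(column, '^');
            }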
        }
        return String.format("%s%n%s", errorLine, underLine);
    }
}

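My RecognitionExceptionUtil leaves a lot to be desired (it always returns strings, doesn't check that the recognizer is of type Parser, doesn't handle multiple lines in getErrorLineString, etc.), but I hope you get the idea.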
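
As one way to address the multi-line limitation, the underlining step can be separated from the ANTLR types and clamped to the line being shown. The following is only a sketch: `ErrorLineUnderliner` and its parameters are hypothetical, with `column` playing the role of the token's getCharPositionInLine() and `length` the role of stopIndex - startIndex + 1.

```java
final class ErrorLineUnderliner {

    // Returns the line plus a marker line with '^' under columns [column, column + length),
    // clamped to the line. A length below 1 (for example an EOF token) still yields one caret.
    static String underline(String line, int column, int length) {
        String shown = line.replace('\t', ' '); // one column per character
        int start = Math.max(0, Math.min(column, shown.length()));
        int end = Math.min(column + Math.max(length, 1), shown.length());
        if (end <= start) {
            end = start + 1; // token starts at or past the end of the line
        }
        return shown + "\n" + " ".repeat(start) + "^".repeat(end - start);
    }
}
```

A token that runs past the end of the line is underlined only up to the last character, and a zero-length token at the end of the input gets a single caret just after the text.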

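Summary of my suggestions for a future version of ANTLR:

1. Always populate the "RecognitionException e" parameter of ANTLRErrorListener.syntaxError (including the OffendingToken) so that we can collect these exceptions for batch handling after the parse. While you're at it, make sure e.getMessage() returns the value currently passed in the msg parameter.

2. Add a constructor for RecognitionException that includes the OffendingToken.

3. Remove the other parameters from the method signature of ANTLRErrorListener.syntaxError, since they would be redundant and lead to confusion.

4. Add convenience methods to RecognitionException for common things like getCharPositionInLine, getLineNumber, getRuleStack, and the rest of the methods from my RecognitionExceptionUtil class defined above. Of course, some of these methods will have to check for null and check that the recognizer is of type Parser.

5. When calling ANTLRErrorListener.syntaxError, clone the recognizer so that we don't lose the context when the parse finishes (and can more easily call getRuleInvocationStack).

6. If you clone the recognizer, you won't need to store the context in RecognitionException. We can make two changes to e.getCtx(): first, rename it to e.getContext() to make it consistent with Parser.getContext(), and second, make it a convenience method over the recognizer already held by RecognitionException (checking that the recognizer is an instance of Parser).

7. Include information in RecognitionException about the severity of the error and how the parser recovered. This is my goal #3 from the beginning. It would be great to categorize syntax errors by how well the parser handled them. Did this error blow up the entire parse, or did it just show up as a blip in line? How many tokens were skipped or inserted, and which ones?

So, I'm looking for feedback on my three goals, and especially for suggestions on gathering more information for goal #3: severity and recovery information for each error.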

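Best answer

I posted these recommendations to the Antlr4 GitHub issues list and received the following reply. I believe the ANTLRErrorListener.syntaxError method contains redundant/confusing parameters and requires a lot of API knowledge to use properly, but I understand the decision. Here is the link to the issue and a copy of the text of the response:

From: https://github.com/antlr/antlr4/issues/396

Regarding your suggestions:

Populating the RecognitionException e argument to syntaxError: As mentioned in the documentation: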

The RecognitionException is non-null for all syntax errors except when we discover mismatched token errors that we can recover from in-line, without returning from the surrounding rule (via the single token insertion and deletion mechanism).

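Adding a constructor to RecognitionException with the offending token: This is not really relevant to this issue, and would be addressed separately (if at all).

Removing parameters from syntaxError: This would not only introduce breaking changes for users who implemented this method in previous releases of ANTLR 4, but it would also eliminate the ability to report the available information for errors that occur inline (i.e. errors where no RecognitionException is available).

Convenience methods in RecognitionException: This is not really relevant to this issue, and would be addressed separately (if at all). (Further note: it's hard enough documenting the API as it is. This just adds more ways to do things that are already readily accessible, so I'm opposed to this change.)

Cloning the recognizer when calling syntaxError: This is a performance-critical method, so new objects are only created when absolutely necessary.

"If you clone the recognizer": The recognizer will never be cloned before calling syntaxError.

This information can be stored in an associative map in your implementation of ANTLRErrorListener and/or ANTLRErrorStrategy if necessary for your application.

I'm closing this issue for now since I don't see any action items requiring changes to the runtime from this list.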
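
The reply's suggestion of an associative map could be approached along these lines. This is a sketch under assumptions, not part of the reply: `Severity` and `ErrorInfoTable` are hypothetical names, ANTLR defines no such severity levels, and the keys are typed as `Object` so the example runs without the ANTLR runtime (in practice the key would be the exception or token seen by the listener or error strategy).

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical severity levels for goal #3; ANTLR does not define these.
enum Severity { RECOVERED_INLINE, EXITED_RULE }

final class ErrorInfoTable {

    // Identity-based, so two distinct error objects with equal contents never collide.
    private final Map<Object, Severity> severityByError = new IdentityHashMap<>();

    // Called at the point where the error is reported or recovered from.
    void record(Object error, Severity severity) {
        severityByError.put(error, severity);
    }

    // Empty if nothing was recorded for this particular error object.
    Optional<Severity> severityOf(Object error) {
        return Optional.ofNullable(severityByError.get(error));
    }
}
```

Keeping the extra data in a side table like this leaves the runtime's exception classes untouched, which fits the reply's position that no runtime changes are needed.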

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/20828864/
