java - Why does my text parser enter an infinite loop despite the loop being explicitly broken?

Reposted by: 行者123 | Updated: 2023-12-01 21:11:15

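I have been developing a utility that parses text files in the format Paradox Interactive uses in its grand strategy games, for use with a visual modding tool I am also developing. I have written a mostly implemented, rough, early version of the parser, and it basically works as intended. This is my second attempt at writing a text parser (the first, which ended up working quite well, parsed a subset of XML).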

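I threw the parser together quickly on the 9th and spent the whole weekend trying to debug it, without success. I have traced the problem to the third line of nextChar(): it throws an ArrayIndexOutOfBoundsException with an absurdly small index (around -2 million). After I added a bounds check, the program just... keeps going. It reads all the information it needs; it just never exits the parse loop.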
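One plausible mechanism for a hugely negative index, shown as a small sketch: nextChar() increments pos before checking it, so if some caller keeps requesting characters after the end of the text, pos eventually passes Integer.MAX_VALUE and wraps around to Integer.MIN_VALUE (-2,147,483,648), because Java int arithmetic overflows silently. A negative index still satisfies pos < script.length, so that check alone lets script[pos] throw. Adding !(pos < 0) hides the exception but does not stop the scan; once pos climbs back past 0, the parser would start reading the text again from the beginning. The class and method names below are hypothetical.

```java
// Hypothetical helper names; they only mimic the index arithmetic in nextChar().
final class OverflowDemo {
    // Same increment as pos++ in nextChar(); Java int arithmetic wraps silently.
    static int advance(int pos) {
        return pos + 1;
    }

    // The upper-bound-only test: pos < script.length.
    static boolean passesUpperBoundCheck(int pos, char[] script) {
        return pos < script.length;
    }

    // Both ends checked, equivalent to pos < script.length && !(pos < 0).
    static boolean inBounds(int pos, char[] script) {
        return pos >= 0 && pos < script.length;
    }

    public static void main(String[] args) {
        char[] script = "car = {}".toCharArray();
        // Jump straight to the top of the int range instead of looping about 2^31 times.
        int pos = advance(Integer.MAX_VALUE);
        System.out.println(pos);                                // -2147483648
        System.out.println(passesUpperBoundCheck(pos, script)); // true
        System.out.println(inBounds(pos, script));              // false
        try {
            System.out.println(script[pos]);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The exact message text depends on the JDK version.
            System.out.println("ArrayIndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```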

The format basically looks like this:

car = {
    model_year = 1966
    model_name = "Chevy"
    components = {
        "engine", "frame", "muffler"
    }
}

I haven't yet added support for nested lists as I plan to, though, so my test string is:

car = {
    model_year = 1966
    model_name = "Chevy"
}
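A common way to structure a parser for a format like the one above is to split the text into tokens first and interpret key = value pairs afterwards. The following is a minimal sketch of such a tokenizer, not the author's design; the rules it assumes (bare words made of letters, digits, '_', '.' and '-'; double-quoted strings without escapes; ',' treated as a separator) are guesses, not Paradox Interactive's specification. Because the loop condition is pos < text.length(), it needs no end-of-text sentinel.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal tokenizer sketch; the lexical rules are assumptions, not a spec.
final class ScriptTokenizer {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        // The loop condition is the end-of-input check, so no sentinel char is needed.
        while (pos < text.length()) {
            char c = text.charAt(pos);
            if (Character.isWhitespace(c) || c == ',') {
                pos++; // separators produce no token
            } else if (c == '=' || c == '{' || c == '}') {
                tokens.add(String.valueOf(c));
                pos++;
            } else if (c == '"') {
                int end = text.indexOf('"', pos + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated string at index " + pos);
                }
                tokens.add(text.substring(pos + 1, end)); // contents without the quotes
                pos = end + 1;
            } else if (isWordChar(c)) {
                int start = pos;
                while (pos < text.length() && isWordChar(text.charAt(pos))) {
                    pos++;
                }
                tokens.add(text.substring(start, pos));
            } else {
                throw new IllegalArgumentException("Unexpected '" + c + "' at index " + pos);
            }
        }
        return tokens;
    }

    // Assumed word alphabet: letters, digits, '_', '.', '-'.
    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '.' || c == '-';
    }

    public static void main(String[] args) {
        String text = "car = {\n    model_year = 1966\n    model_name = \"Chevy\"\n}";
        System.out.println(tokenize(text));
        // prints [car, =, {, model_year, =, 1966, model_name, =, Chevy, }]
    }
}
```

Quoted strings come back without their quotes here, so a fuller version would probably attach a token type to tell the string "1966" from the number 1966.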

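I have tried to comment my code generously wherever I thought it might be needed, both for my own understanding and for anyone else who reads it, but I am happy to clarify anything.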

My code:

/**
 * Parses text files in the format used by Paradox Interactive in their computer games EUIV, CK2, and Stellaris.
 * 
 * @author DJMethaneMan 
 * @date 12/9/2016
 */
public class Parser
{
    private int pos, line, len, depth;
    public String text;
    private char[] script; //TODO: Initialize in the parse method

    public Parser()
    {
        pos = 0;
        line = 1;
        len = 0;
        depth = 0;
        text = "car = {\n" +
               "    model_year = 1966 \n" +
               "    model_name = \"Chevy\"\n" +
               "}\u0003";
        //text = "Hello World";
        //Car c = new Car();
        //parse(text, c);
    }

    public static void main(String[] args)
    {
        Car c = new Car();
        Parser p = new Parser();
        p.parse(p.text, c);
        System.out.println("The model name is " + c.model_name);
        System.out.println("The model year is " + c.model_year);
    }

    //TODO: Work
    public void parse(String text, Parseable parsed)
    {
        char[] script = text.toCharArray();
        this.script = script;
        boolean next_char = false;
        PARSE_LOOP:while(true)
        {
            char c;
            if(next_char)
            {
                c = nextChar();
            }
            else
            {
                c = script[0];
                next_char = true;
            }

            switch(c)
            {
                case 'A':
                case 'a':
                case 'B':
                case 'b':
                case 'C':
                case 'c':
                case 'D':
                case 'd':
                case 'E':
                case 'e':
                case 'F':
                case 'f':
                case 'G':
                case 'g':
                case 'H':
                case 'h':
                case 'I':
                case 'i':
                case 'J':
                case 'j':
                case 'K':
                case 'k':
                case 'L':
                case 'l':
                case 'M':
                case 'm':
                case 'N':
                case 'n':
                case 'O':
                case 'o':
                case 'P':
                case 'p':
                case 'Q':
                case 'q':
                case 'R':
                case 'r':
                case 'S':
                case 's':
                case 'T':
                case 't':
                case 'U':
                case 'u':
                case 'V':
                case 'v':
                case 'W':
                case 'w':
                case 'X':
                case 'x':
                case 'Y':
                case 'y':
                case 'Z':
                case 'z':
                case '_'://TODO: HERE
                    if(depth > 0) //
                    {
                        parsed.parseRead(buildWordToken(true), this);//Let the class decide how to handle this information. Best solution since I do not know how to implement automatic deserialization.
                    }
                    continueUntilChar('=', false); //A value must be assigned because it is basically a key value pair with {} or a string or number as the value
                    skipWhitespace();//Skip any trailing whitespace straight to the next token.
                    break;
                case '{':
                    depth++;
                    break;
                case '}':
                    depth--;
                    break;
                case '\n':
                    line++;
                    break;
                case ' ':
                case '\t':
                    skipWhitespace();
                    break;
                case '\u0003': //End of Text Character... Not sure if it will work in a file...
                    break PARSE_LOOP;
            }
        }
    }

    //Returns a string from the next valid token
    public String parseString()
    {
        String retval = "";
        continueUntilChar('=', false);
        continueUntilChar('"', false);
        retval = buildWordToken(false);
        continueUntilChar('"', false); //Don't rewind because we want to skip over the quotation and not append it.
        return retval;
    }

    //Returns a double from the next valid token
    public double parseNumber()
    {
        double retval = 0;
        continueUntilChar('=', false); //False because we don't want to include the = in any parsing...
        skipWhitespace(); //In case we encounter whitespace.
        try
        {
            retval = Double.parseDouble(buildNumberToken(false));
        }
        catch(Exception e)
        {
            System.out.println("A token at line " + line + " is not a valid number but is being passed as such.");
        }
        return retval;
    }


    /**********************************Utility Methods for Parsing****************************************/

    protected void continueUntilChar(char target, boolean rewind)
    {
        while(true)
        {
            char c = nextChar();
            if(c == target)
            {
                break;
            }
        }
        if(rewind)
        {
            pos--;
        }
    }

    protected void skipWhitespace()
    {
        while(true)
        {
            char c = nextChar();
            if(!Character.isWhitespace(c))
            {
                break;
            }
        }
        pos--;//Rewind because by default parse increments pos by 1 one when fetching nextChar each iteration.
    }

    protected String buildNumberToken(boolean rewind)
    {
        StringBuilder token = new StringBuilder();
        String retval = "INVALID_NUMBER";
        char token_start = script[pos];
        System.out.println(token_start + " is a valid char for a word token."); //Print it.
        token.append(token_start);
        while(true)
        {
            char c = nextChar();
            if(Character.isDigit(c) || (c == '.' && (Character.isDigit(peek(1)) || Character.isDigit(rewind(1))))) //Makes sure things like 1... and ...1234 don't get parsed as numbers.
            {
                token.append(c);
                System.out.println(c + " is a valid char for a word token."); //Print it for debugging
            }
            else
            {
                break;
            }
        }
        return retval;
    }

    protected String buildWordToken(boolean rewind)
    {
        StringBuilder token = new StringBuilder(); //Used to build the token
        char token_start = script[pos]; //The char the parser first found would make this a valid token
        token.append(token_start); //Add said char since it is part of the token
        System.out.println(token_start + " is a valid char for a word token."); //Print it.
        while(true)
        {
            char c = nextChar();
            if(Character.isAlphabetic(c) || Character.isDigit(c) || c == '_')//Make sure it is a valid token for a word
            {
                System.out.println(c + " is a valid char for a word token."); //Print it for debugging
                token.append(c); //Add it to the token since its valid
            }
            else
            {
                if(rewind)//If leaving the method will make this skip over a valid token set this to true.
                {
                    //Rewind by 1 because the main loop in parse() will still check pos++ and we want to check the pos of the next char after the end of the token.
                    pos--;
                    break; //Leave the loop and return the token.
                }
                else //Otherwise
                {
                    break; //Just leave the loop and return the token.
                }
            }
        }
        return token.toString(); //Get the string value of the token and return it.
    }

    //Returns the next char in the script by amount but does not increment pos.
    protected char peek(int amount)
    {
        int lookahead = pos + amount; //pos + 1;
        char retval = '\u0003'; //End of text character
        if(lookahead < script.length)//Make sure lookahead is in bounds.
        {
            retval = script[lookahead]; //Return the char at the lookahead.
        }
        return retval; //Return it.
    }

    //Returns the previous char in the script by amount but does not decrement pos.
    //Basically see peek only this is the exact opposite.
    protected char rewind(int amount)
    {
        int lookbehind = pos - amount; //pos + 1;
        char retval = '\u0003';
        if(lookbehind > 0)
        {
            retval = script[lookbehind];
        }
        return retval;
    }

    //Returns the next character in the script.
    protected char nextChar()
    {
        char retval = '\u0003';
        pos++;
        if(pos < script.length && !(pos < 0))
        {
            retval = script[pos]; //It says this is causing an ArrayIndexOutOfBoundsException with the following message. Shows a very large (small?) negative number.
        }
        return retval;
    }
}

//TODO: Extend
interface Parseable
{
    public void parseRead(String token, Parser p);
    public void parseWrite(ParseWriter writer);
}


//TODO: Work on
class ParseWriter
{

}

class Car implements Parseable
{
    public String model_name;
    public int model_year;

    @Override
    public void parseRead(String token, Parser p)
    {
        if(token.equals("model_year"))
        {
            model_year = (int)p.parseNumber();
        }
        else if(token.equals("model_name"))
        {
            model_name = p.parseString();
        }
    }

    @Override
    public void parseWrite(ParseWriter writer)
    {
        //TODO: Implement along with the ParseWriter
    }
}
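Tracing the test string through the code above by hand suggests where it spins. The continueUntilChar('=', false) that parse() runs right after parseRead("model_year") skips over model_name to the '=' that follows it; the main loop then reads Chevy as a key, and the continueUntilChar('=', false) after that finds no further '='. That method only stops on its target, while nextChar() keeps incrementing pos and returning '\u0003' once the end is passed, so the search steps over the final '\u0003' the main loop was waiting for and runs far past the end of the array. Below is a minimal sketch of a variant that treats '\u0003' as a hard stop, reports whether the target was found, and stops advancing pos at the end of the array; the class name and the boolean return value are assumptions.

```java
// Sketch: continueUntilChar() that cannot run past the end of the input.
// Assumptions: same U+0003 end-of-text convention as the question;
// the class name and the boolean return value are hypothetical.
final class BoundedScanner {
    static final char EOT = '\u0003';
    private final char[] script;
    private int pos;

    BoundedScanner(String text, int startPos) {
        this.script = text.toCharArray();
        this.pos = startPos;
    }

    // Like the original nextChar(), but pos stops growing at script.length,
    // so it can never overflow and wrap around to a negative value.
    char nextChar() {
        if (pos < script.length) {
            pos++;
        }
        return pos < script.length ? script[pos] : EOT;
    }

    // Returns true if target was found, false if the text ran out first.
    boolean continueUntilChar(char target) {
        while (true) {
            char c = nextChar();
            if (c == target) {
                return true;
            }
            if (c == EOT) {
                return false; // end of text: stop instead of scanning forever
            }
        }
    }

    int pos() {
        return pos;
    }

    public static void main(String[] args) {
        // Tail of the question's test string, including its U+0003 terminator.
        BoundedScanner s = new BoundedScanner("model_name = \"Chevy\"\n}\u0003", 0);
        System.out.println(s.continueUntilChar('=')); // true
        System.out.println(s.continueUntilChar('=')); // false
    }
}
```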

Best answer

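Using a labeled break statement such as break PARSE_LOOP; is generally considered bad practice. You are essentially writing a "goto": whenever the break PARSE_LOOP; condition is met, it jumps back to the start of the while loop (because that is where you wrote PARSE_LOOP:). That is probably the cause of your infinite loop. I also don't see why you would want to restart a while loop that is already infinite (while(true)).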
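To check how a labeled break behaves in isolation, here is a small sketch with the same shape as parse(): a switch inside while(true). In Java, break LABEL; terminates the statement that carries the label (here the while loop) and execution continues after it, while an unlabeled break; inside the switch leaves only the switch. The method and label names are made up for this example.

```java
// Sketch: what break LABEL; does when the label sits on a while(true) loop.
final class LabeledBreakDemo {
    // Counts characters visited before the labeled break fires.
    static int countUntilEot(String text) {
        int visited = 0;
        int pos = 0;
        LOOP:
        while (true) {
            char c = pos < text.length() ? text.charAt(pos) : '\u0003';
            pos++;
            visited++;
            switch (c) {
                case '\u0003':
                    break LOOP; // ends the while loop; control goes to the return below
                default:
                    break;      // unlabeled: leaves only the switch
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(countUntilEot("ab\u0003cd")); // 3
    }
}
```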

Change your code to:

        public void parse(String text, Parseable parsed)
        {
            char[] script = text.toCharArray();
            this.script = script;
            boolean next_char = false;
            boolean parsing = true;

            while(parsing)
            {
                char c;
                if(next_char)
                {
                    c = nextChar();
                }
                else
                {
                    c = script[0];
                    next_char = true;
                }

                switch(c)
                {
                    case 'A':
                    case 'a':
                    case 'B':
                    case 'b':
                    case 'C':
                    case 'c':
                    case 'D':
                    case 'd':
                    case 'E':
                    case 'e':
                    case 'F':
                    case 'f':
                    case 'G':
                    case 'g':
                    case 'H':
                    case 'h':
                    case 'I':
                    case 'i':
                    case 'J':
                    case 'j':
                    case 'K':
                    case 'k':
                    case 'L':
                    case 'l':
                    case 'M':
                    case 'm':
                    case 'N':
                    case 'n':
                    case 'O':
                    case 'o':
                    case 'P':
                    case 'p':
                    case 'Q':
                    case 'q':
                    case 'R':
                    case 'r':
                    case 'S':
                    case 's':
                    case 'T':
                    case 't':
                    case 'U':
                    case 'u':
                    case 'V':
                    case 'v':
                    case 'W':
                    case 'w':
                    case 'X':
                    case 'x':
                    case 'Y':
                    case 'y':
                    case 'Z':
                    case 'z':
                    case '_'://TODO: HERE
                        if(depth > 0) //
                        {
                            parsed.parseRead(buildWordToken(true), this);//Let the class decide how to handle this information. Best solution since I do not know how to implement automatic deserialization.
                        }
                        continueUntilChar('=', false); //A value must be assigned because it is basically a key value pair with {} or a string or number as the value
                        skipWhitespace();//Skip any trailing whitespace straight to the next token.
                        break;
                    case '{':
                        depth++;
                        break;
                    case '}':
                        depth--;
                        break;
                    case '\n':
                        line++;
                        break;
                    case ' ':
                    case '\t':
                        skipWhitespace();
                        break;
                    case '\u0003': //End of Text Character... Not sure if it will work in a file...
                        parsing = false;
                        break;
                }
            }
        }
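Whichever exit style is used, the labeled break or the parsing flag, the loop only ends once the '\u0003' case is actually reached. Below is a minimal sketch, assuming the input is held in a char[] as in the question, of a flag-controlled loop whose condition also checks pos < script.length, so it terminates even if no sentinel is ever seen; braceDepth is a hypothetical stand-in that only tracks { and }.

```java
// Sketch: flag-controlled loop that also stops at the end of the array.
// braceDepth is a hypothetical stand-in for parse(); it only tracks { and }.
final class BoundedMainLoop {
    static final char EOT = '\u0003';

    static int braceDepth(String text) {
        char[] script = text.toCharArray();
        int depth = 0;
        int pos = 0;
        boolean parsing = true;
        // The bounds test ends the loop even if no U+0003 is ever seen.
        while (parsing && pos < script.length) {
            char c = script[pos++];
            switch (c) {
                case '{':
                    depth++;
                    break;
                case '}':
                    depth--;
                    break;
                case EOT:
                    parsing = false; // explicit end-of-text marker
                    break;
                default:
                    break;           // other characters are ignored here
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println(braceDepth("car = {\n    model_year = 1966\n}\u0003")); // 0
        System.out.println(braceDepth("{ {")); // 2: no marker, loop ends at array end
    }
}
```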

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41107323/
