c# - Text parsing - my parser skips commands

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:40:40

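I am trying to parse a text format. I want to mark inline code with backticks (`), just like SO does. The rule should be that if you want to use a backtick inside an inline code element, you put double backticks around the inline code.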

Like this:

``Mark inline code with backticks (`)``
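The rule above is a special case of the code-span rule in the CommonMark specification: a span opened by a run of N backticks closes only at the next run of exactly N backticks, so a lone backtick inside a double-backtick span is just content. Adopting that generalization is an assumption, not something the question asks for. The sketch below shows only the matching step, and BacktickRuns, RunLength and FindClosingRun are hypothetical names.

```csharp
static class BacktickRuns
{
    // Counts consecutive backticks starting at index `start`.
    public static int RunLength(string s, int start)
    {
        int n = 0;
        while (start + n < s.Length && s[start + n] == '`') n++;
        return n;
    }

    // Index of the next run of exactly `length` backticks at or after `from`,
    // or -1 if the span is never closed. Shorter or longer runs are content.
    public static int FindClosingRun(string s, int from, int length)
    {
        int i = from;
        while (i < s.Length)
        {
            if (s[i] != '`') { i++; continue; }
            int run = RunLength(s, i);
            if (run == length) return i;
            i += run; // skip the whole run so it is never split
        }
        return -1;
    }
}
```

For the example line, RunLength(s, 0) is 2, and FindClosingRun(s, 2, 2) steps over the lone backtick before ')' and returns s.Length - 2, leaving "Mark inline code with backticks (`)" as the content.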

For some reason, my parser seems to skip the double backticks entirely. Here is the function that does the inline code parsing:

    private string ParseInlineCode(string input)
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '`' && input[i - 1] != '\\')
            {
                if (input[i + 1] == '`')
                {
                    string str = ReadToCharacter('`', i + 2, input);
                    while (input[i + str.Length + 2] != '`')
                    {
                        str += ReadToCharacter('`', i + str.Length + 3, input);
                    }
                    string tbr = "``" + str + "``";
                    str = str.Replace("&", "&amp;");
                    str = str.Replace("<", "&lt;");
                    str = str.Replace(">", "&gt;");
                    input = input.Replace(tbr, "<code>" + str + "</code>");
                    i += str.Length + 13;
                }
                else
                {
                    string str = ReadToCharacter('`', i + 1, input);
                    input = input.Replace("`" + str + "`", "<code>" + str + "</code>");
                    i += str.Length + 13;
                }
            }
        }
        return input;
    }

If I put single backticks around something, it is correctly wrapped in <code> tags.

Best answer

In the while loop

    while (input[i + str.Length + 2] != '`')
    {
        str += ReadToCharacter('`', i + str.Length + 3, input);
    }

you check the wrong index, i + str.Length + 2 instead of i + str.Length + 3, and in turn you have to add the skipped backtick back in the loop body. It should be

    while (input[i + str.Length + 3] != '`')
    {
        str += '`' + ReadToCharacter('`', i + str.Length + 3, input);
    }
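The question never shows ReadToCharacter, so this sketch assumes it returns the text from start up to, but not including, the next occurrence of the character. Under that assumption, isolating the double-backtick branch with the answer's corrected loop makes the index arithmetic easy to check (DoubleBacktickDemo and ReadDoubleSpan are hypothetical names).

```csharp
static class DoubleBacktickDemo
{
    // Hypothetical stand-in for the asker's helper (not shown in the question):
    // returns input from `start` up to, but not including, the next `c`.
    static string ReadToCharacter(char c, int start, string input)
    {
        int end = input.IndexOf(c, start);
        return end < 0 ? input.Substring(start) : input.Substring(start, end - start);
    }

    // Extracts the content of a ``...`` span whose opening delimiter starts at i.
    public static string ReadDoubleSpan(string input, int i)
    {
        // Content starts after the two opening backticks.
        string str = ReadToCharacter('`', i + 2, input);
        // input[i + str.Length + 2] is the backtick that stopped the read,
        // so a closing "``" shows up as a second backtick right after it.
        while (input[i + str.Length + 3] != '`')
        {
            // It was a lone backtick inside the span: keep it and read on.
            str += '`' + ReadToCharacter('`', i + str.Length + 3, input);
        }
        return str;
    }
}
```

Like the answer's fix, it assumes the span is closed: for an unterminated input such as "``abc" the loop indexes past the end of the string and throws IndexOutOfRangeException.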

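But there are some other bugs in your code as well. If the first character of the input is a backtick, the following line throws an IndexOutOfRangeException, because it reads input[i - 1] when i is 0:

    if (input[i] == '`' && input[i - 1] != '\\')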
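A minimal guard for that line, assuming a backtick at position 0 should simply count as unescaped (EscapeCheck and IsUnescapedBacktick are hypothetical names). The i == 0 test comes first, so && short-circuits before input[i - 1] is ever read.

```csharp
static class EscapeCheck
{
    // True if input[i] is a backtick not preceded by a backslash.
    // At i == 0 there is no previous character, so it counts as unescaped.
    public static bool IsUnescapedBacktick(string input, int i) =>
        input[i] == '`' && (i == 0 || input[i - 1] != '\\');
}
```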

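If the input contains an odd number of delimiting backticks and the last character of the input is a backtick, the following line throws an IndexOutOfRangeException, because it reads one character past the end of the string:

    if (input[i + 1] == '`')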
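The same idea applies to the lookahead: check the bounds before reading the next character. A sketch with hypothetical names:

```csharp
static class Lookahead
{
    // True if a second backtick follows position i; never reads past the end,
    // because the bounds test short-circuits the character read.
    public static bool IsDoubleBacktick(string input, int i) =>
        i + 1 < input.Length && input[i + 1] == '`';
}
```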

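You should probably refactor your code into smaller methods instead of handling many cases in one method; that is very error-prone. If you have not written unit tests for your code yet, I strongly recommend doing so. Parsers are not easy to test because of all the kinds of invalid input you have to be prepared for, so you might want to look at PEX, a tool that automatically generates test cases for your code by analyzing every branch point and trying to take every possible code path.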
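One way to follow that advice, sketched under several assumptions: only single and double backtick delimiters (the question's rule), a backslash directly before a backtick makes it literal, an unterminated delimiter is copied through unchanged, and backslash escapes inside a span are not handled. Each concern gets its own small method, and the output is built in a StringBuilder instead of with string.Replace, so the parser never rewrites the string it is indexing into. InlineCodeParser and its helpers are hypothetical names, not the asker's code.

```csharp
using System;
using System.Text;

static class InlineCodeParser
{
    // Converts `code` and ``code`` spans to <code> elements.
    public static string ParseInlineCode(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        var output = new StringBuilder();
        int i = 0;
        while (i < input.Length)
        {
            if (!IsUnescapedBacktick(input, i))
            {
                output.Append(input[i]);
                i++;
                continue;
            }

            int width = DelimiterWidth(input, i);
            int contentStart = i + width;
            int close = FindClosingDelimiter(input, contentStart, width);
            if (close < 0)
            {
                // Unterminated span: copy the delimiter through literally.
                output.Append(input, i, width);
                i = contentStart;
                continue;
            }

            string content = input.Substring(contentStart, close - contentStart);
            output.Append("<code>").Append(HtmlEncode(content)).Append("</code>");
            i = close + width;
        }
        return output.ToString();
    }

    // A backtick counts unless a backslash directly precedes it.
    static bool IsUnescapedBacktick(string input, int i) =>
        input[i] == '`' && (i == 0 || input[i - 1] != '\\');

    // 2 for a `` delimiter, 1 for a single backtick; never reads past the end.
    static int DelimiterWidth(string input, int i) =>
        (i + 1 < input.Length && input[i + 1] == '`') ? 2 : 1;

    // Index of the matching closing delimiter at or after `from`, or -1.
    static int FindClosingDelimiter(string input, int from, int width) =>
        input.IndexOf(width == 2 ? "``" : "`", from, StringComparison.Ordinal);

    // Escape the characters that matter in an HTML text node; & must go first.
    static string HtmlEncode(string s) =>
        s.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");
}
```

Because each helper is small, it can be unit-tested on its own, and short hostile inputs such as a lone "`" or "\0``" make good regression cases: this sketch copies them through unchanged instead of throwing.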

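I quickly fired up PEX and ran it against the code. It found the IndexOutOfRangeExceptions I had in mind, and more. Of course, PEX also found the obvious NullReferenceException when the input is a null reference. Here are the inputs PEX found that cause exceptions.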

case1 = "`"

case2 = "\0`"

case3 = "\0``"

case4 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0````"

case5 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0```\0````````````\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0`"

case6 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0```\0````````````\0\0\0\0\0\0\0\0\0\0``<\0\0`````````````````````````````````````````````````````````````````````````````````````\0\0\0\0\0\0\0\0\0\0``<\0\0```````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````\0\0\0\0\0\0\0\0\0`\0```````````````"
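The sketch below is not PEX; it is a brute-force stand-in that enumerates every string up to a small length over a few characters and records which ones make a parser throw. It runs the question's original method unchanged, together with the same assumed ReadToCharacter as before (text up to, but not including, the next occurrence of the character). Because that helper is a guess, the exact set of failing inputs may differ from what PEX reported against the real code. CrashFinder, AllStrings and FindCrashes are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class CrashFinder
{
    // Hypothetical stand-in for the asker's helper (not shown in the question).
    static string ReadToCharacter(char c, int start, string input)
    {
        int end = input.IndexOf(c, start);
        return end < 0 ? input.Substring(start) : input.Substring(start, end - start);
    }

    // The question's ParseInlineCode, re-indented but otherwise unchanged.
    static string ParseInlineCode(string input)
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '`' && input[i - 1] != '\\')
            {
                if (input[i + 1] == '`')
                {
                    string str = ReadToCharacter('`', i + 2, input);
                    while (input[i + str.Length + 2] != '`')
                    {
                        str += ReadToCharacter('`', i + str.Length + 3, input);
                    }
                    string tbr = "``" + str + "``";
                    str = str.Replace("&", "&amp;");
                    str = str.Replace("<", "&lt;");
                    str = str.Replace(">", "&gt;");
                    input = input.Replace(tbr, "<code>" + str + "</code>");
                    i += str.Length + 13;
                }
                else
                {
                    string str = ReadToCharacter('`', i + 1, input);
                    input = input.Replace("`" + str + "`", "<code>" + str + "</code>");
                    i += str.Length + 13;
                }
            }
        }
        return input;
    }

    // Yields every string of length 1..maxLength over `alphabet`, shortest first.
    static IEnumerable<string> AllStrings(string alphabet, int maxLength)
    {
        var level = new List<string> { "" };
        for (int len = 1; len <= maxLength; len++)
        {
            var next = new List<string>();
            foreach (string prefix in level)
                foreach (char c in alphabet)
                    next.Add(prefix + c);
            foreach (string s in next)
                yield return s;
            level = next;
        }
    }

    // Runs the parser on each candidate and records the ones that throw.
    public static List<(string Input, string Error)> FindCrashes(string alphabet, int maxLength)
    {
        var crashes = new List<(string Input, string Error)>();
        foreach (string s in AllStrings(alphabet, maxLength))
        {
            try { ParseInlineCode(s); }
            catch (Exception e) { crashes.Add((s, e.GetType().Name)); }
        }
        return crashes;
    }
}
```

With the alphabet "x`\\" and a maximum length of 3, the reported crashes include "`", "x`" and "x``", all IndexOutOfRangeException, which have the same shape as case1, case2 and case3 above with 'x' standing in for '\0'.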

My "fix" to your code changes which inputs cause exceptions (and probably introduces new bugs as well). PEX caught the following in the modified code.

case7 = "\0```"

case8 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0```\0`\0"

case9 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0```\0````````````\0\0\0\0\0\0\0\0\0\0``<\0\0`````````````````````````````````````````````````````````````````````````````````````\0\0\0\0\0\0\0\0\0\0``\0`\0`\0``"

None of these three inputs caused an exception in the original code, while cases 4 and 6 no longer cause an exception in the modified code.

Regarding c# - Text parsing - my parser skips commands, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/2907691/
