c# - Parsing a custom file with a list of commands

Reposted. Author: 太空宇宙. Updated: 2023-11-04 12:47:33

The file contains a list of commands, like this:

command1 argument1 argument2
command2 argument3 argument4

The result should be something like this:

//Dictionary<command name,list of arguments>
Dictionary<string,List<string>>
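
For the simple case, where every command sits on one line and arguments contain no braces, a minimal sketch could look like the following. `ParseSimple` is a hypothetical helper name, the code assumes command names are unique, and it is written with top-level statements (C# 9 or later). It does not handle the multi-line arguments this question is really about.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: parses only the simple, single-line form
// "command arg1 arg2 ...". Braced or multi-line arguments are NOT handled.
static Dictionary<string, List<string>> ParseSimple(string text)
{
    var result = new Dictionary<string, List<string>>();

    foreach (string line in text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries))
    {
        string[] parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        if (parts.Length == 0)
        {
            continue; // the line contained only spaces
        }

        // Assumption: command names are unique; a repeated name replaces the earlier entry
        result[parts[0]] = parts.Skip(1).ToList();
    }

    return result;
}

var commands = ParseSimple("command1 argument1 argument2\ncommand2 argument3 argument4");

foreach (var pair in commands)
{
    Console.WriteLine($"{pair.Key}: {string.Join(", ", pair.Value)}");
}
```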

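Of course, there can be any number of arguments, not just two. Parsing that is a piece of cake. The problem is that arguments can span multiple lines.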

command {some 
amount
of random text
} {and the second
argument} and_here_goes_argument_3

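This is where it gets tricky. I wrote a while loop full of if conditions to parse this file, but it took more than 200 lines of code and is completely unreadable. I bet there is a better way to do it. Of course, I'm not asking you to write the code for me; I just need a basic approach. As for the language, it can be C# or C++.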

Best answer

To show how painful it is to do this with a regular expression:

// Requires: using System; using System.Text.RegularExpressions;
string text = @"command1 argument1 argument2
command2 argument3 argument4
command {some
amount
of random text
} {and the second
argument} and_here_goes_argument_3";

var rx = new Regex(@"^(?<command>(?:(?!\r|$)[^ ])*) +(?:(?<argument>{[^}]*}|(?!\r?$|{)(?:(?!\r|$)[^ ])+)(?: +\r?$?|\r?$))*", RegexOptions.Multiline | RegexOptions.ExplicitCapture);

var matches = rx.Matches(text);

foreach (Match match in matches)
{
    Console.WriteLine($"Command: {match.Groups["command"].Value}");

    foreach (Capture capture in match.Groups["argument"].Captures)
    {
        Console.WriteLine($" - arg: [{capture.Value}]");
    }

    Console.WriteLine();
}

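The problem is that this regular expression is both unreadable and fragile. Try adding an x after argument}, for example argument}x. Handling malformed text is very hard.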

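The only interesting part is that I use RegexOptions.Multiline to handle the multi-line text, and that $ matches before \n but not before \r, so I handle the \r manually.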
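
The \r behaviour can be seen in isolation with a small sketch. `MatchLines` is a hypothetical helper that only makes the matched line endings visible, and the patterns are simplified stand-ins for the one above (top-level statements, C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every match with \r and \n made visible
static string[] MatchLines(string pattern, string input)
{
    return Regex.Matches(input, pattern, RegexOptions.Multiline)
        .Cast<Match>()
        .Select(m => m.Value.Replace("\r", "\\r").Replace("\n", "\\n"))
        .ToArray();
}

string windowsText = "first\r\nsecond";

// $ matches before \n but not before \r, so "first" (followed by \r) is skipped
Console.WriteLine(string.Join(" | ", MatchLines(@"^\w+$", windowsText)));     // second

// Consuming an optional \r before $ lets both lines match
Console.WriteLine(string.Join(" | ", MatchLines(@"^\w+\r?$", windowsText))); // first\r | second
```

This is why the larger pattern above repeats `\r?$` instead of a bare `$`.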

Paradoxically, a small grammar written with a parsing library would probably be the "simplest" solution...

Now for some "real" code:

// Requires: using System; using System.Collections.Generic; using System.Linq;
private static readonly string[] commandDelimiters = new[] { " ", "\r", "\n" };

// We don't want the { to be used inside arguments that aren't in the form {...}
// Note that at this time there is no way to "escape" the }
private static readonly string[] argumentDelimiters = new[] { " ", "\r", "\n", "{" };

public static IEnumerable<Tuple<string, string[]>> ParseCommands(string str)
{
    int ix = 0;
    int line = 0;
    int ixStartLine = 0;

    var args = new List<string>();

    while (ix < str.Length)
    {
        string command = ParseWord(str, ref ix, commandDelimiters);

        if (command.Length == 0)
        {
            throw new Exception($"No command, at line {line}, col {ix - ixStartLine}");
        }

        while (true)
        {
            SkipSpaces(str, ref ix);

            if (IsEOL(str, true, ref ix))
            {
                line++;
                ixStartLine = ix;
                break;
            }

            if (str[ix] == '{')
            {
                int ix2 = str.IndexOf('}', ix + 1);

                if (ix2 == -1)
                {
                    throw new Exception($"Unclosed {{ at line {line}, col {ix - ixStartLine}");
                }

                // Skipping the {
                ix++;

                // Skipping the }, because we don't do ix2 - ix + 1
                string arg = str.Substring(ix, ix2 - ix);

                // We count the new lines "inside" the { }
                for (int i = 0; i < arg.Length; )
                {
                    if (IsEOL(arg, true, ref i))
                    {
                        line++;
                        // arg[i] is str[ix + i], the first character of the new line
                        ixStartLine = ix + i;
                    }
                    else
                    {
                        i++;
                    }
                }

                // Skipping the }
                ix = ix2 + 1;

                // If there is no space or eol after the } then error
                if (ix < str.Length && str[ix] != ' ' && !IsEOL(str, false, ref ix))
                {
                    throw new Exception($"Unexpected character at line {line}, col {ix - ixStartLine}");
                }

                args.Add(arg);
            }
            else
            {
                // argumentDelimiters also stops at {, so the check below can fire
                string arg = ParseWord(str, ref ix, argumentDelimiters);

                // If the terminator is {, then error.
                if (ix < str.Length && str[ix] == '{')
                {
                    throw new Exception($"Unexpected character at line {line}, col {ix - ixStartLine}");
                }

                args.Add(arg);
            }
        }

        var args2 = args.ToArray();
        args.Clear();

        yield return Tuple.Create(command, args2);
    }
}

// Stops at any of terminators, doesn't "consume" it advancing ix
public static string ParseWord(string str, ref int ix, string[] terminators)
{
    int start = ix;
    int curr = ix;

    while (curr < str.Length && !terminators.Any(x => string.CompareOrdinal(str, curr, x, 0, x.Length) == 0))
    {
        curr++;
    }

    ix = curr;
    return str.Substring(start, curr - start);
}

public static bool SkipSpaces(string str, ref int ix)
{
    bool atLeastOne = false;

    while (ix < str.Length && str[ix] == ' ')
    {
        atLeastOne = true;
        ix++;
    }

    return atLeastOne;
}

// \r\n, \r, \n, end-of-string == true
public static bool IsEOL(string str, bool advance, ref int ix)
{
    if (ix == str.Length)
    {
        return true;
    }

    if (str[ix] == '\r')
    {
        if (advance)
        {
            if (ix + 1 < str.Length && str[ix + 1] == '\n')
            {
                // \r\n counts as a single line break
                ix += 2;
            }
            else
            {
                // Lone \r
                ix++;
            }
        }

        return true;
    }

    if (str[ix] == '\n')
    {
        if (advance)
        {
            ix++;
        }

        return true;
    }

    return false;
}

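It's long, but I think it's fairly clear to read. The errors should be quite precise (the line and column are given). Note that the } can't be escaped. Doing that in an elegant way is complex.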
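
To give an idea of what escaping would involve, here is a rough sketch of a character-by-character scan that could replace the `IndexOf('}')` lookup. `ReadBraced` and the backslash escape rule are assumptions made for illustration; they are not part of the code above, and line counting is left out (top-level statements, C# 9 or later).

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the answer's code. Expects str[ix] == '{'.
// Returns the text between the braces and leaves ix just past the closing }.
// Assumed escape rule: a backslash makes the next character literal (\} and \\).
static string ReadBraced(string str, ref int ix)
{
    var sb = new StringBuilder();
    int i = ix + 1; // skip the {

    while (i < str.Length)
    {
        char c = str[i];

        if (c == '\\' && i + 1 < str.Length)
        {
            sb.Append(str[i + 1]); // keep the escaped character, drop the backslash
            i += 2;
        }
        else if (c == '}')
        {
            ix = i + 1; // consume the }
            return sb.ToString();
        }
        else
        {
            sb.Append(c);
            i++;
        }
    }

    throw new FormatException($"Unclosed {{ starting at index {ix}");
}

int pos = 0;
string arg = ReadBraced(@"{a \} b} rest", ref pos);
Console.WriteLine($"[{arg}] next index: {pos}"); // [a } b] next index: 8
```

The cost is that the argument can no longer be taken with a single `Substring`, and the newline counting would have to move into the same loop.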

Use it like this:

var res = ParseCommands(text).ToArray();
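
If the `Dictionary<string, List<string>>` shape from the question is wanted, the tuples can be folded into one. The sketch below uses a hypothetical helper, `ToCommandTable`, and assumes that the arguments of a repeated command name should be appended. The input is written by hand as a stand-in for the output of `ParseCommands(text)` (top-level statements, C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: folds (command, arguments) pairs into the dictionary shape
// from the question. Assumption: arguments of a repeated command are appended.
static Dictionary<string, List<string>> ToCommandTable(IEnumerable<Tuple<string, string[]>> commands)
{
    var result = new Dictionary<string, List<string>>();

    foreach (var command in commands)
    {
        if (!result.TryGetValue(command.Item1, out var list))
        {
            list = new List<string>();
            result[command.Item1] = list;
        }

        list.AddRange(command.Item2);
    }

    return result;
}

// Stand-in for the output of ParseCommands(text), written by hand for this sketch
var parsed = new[]
{
    Tuple.Create("command1", new[] { "argument1", "argument2" }),
    Tuple.Create("command", new[] { "some\namount", "and_here_goes_argument_3" }),
};

foreach (var pair in ToCommandTable(parsed))
{
    Console.WriteLine($"{pair.Key}: {pair.Value.Count} argument(s)");
}
```

A plain dictionary loses the order of repeated commands relative to the others, which is one reason to keep the sequence of tuples when order matters.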

Regarding c# - Parsing a custom file with a list of commands, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/50675222/
