c# - .NET Core regular expressions: named groups, nested groups, backreferences and lazy quantifiers

Reposted. Author: 太空狗. Updated: 2023-10-29 23:36:06

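I am trying to parse key-value pairs out of a markup-like string using .NET Core 2.1.

Consider the sample Program.cs file below...

My questions are:

1. How can I write the pattern kvp so that it behaves as "key and value (if present)" rather than the current "key or value"?

For example, in the output of test case 2, instead of: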

=============================
input = <tag KEY1="vAl1">

--------------------
kvp[0] = KEY1
key = KEY1
value =
--------------------
kvp[1] = vAl1
key =
value = vAl1
=============================

I would like to see:

=============================
input = <tag KEY1="vAl1">

--------------------
kvp[0] = KEY1="vAl1"
key = KEY1
value = vAl1
=============================

without breaking test case 9:

=============================
input = <tag noValue1 noValue2>

--------------------
kvp[0] = noValue1
key = noValue1
value =
--------------------
kvp[1] = noValue2
key = noValue2
value =
=============================

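2. How can I write the pattern value so that it stops matching at the next character matched by the group named "quotes"? In other words, at the next balancing quote. I am obviously misunderstanding how backreferences work: my understanding is that \k<quotes> is replaced by the text that (?<quotes>[""'`]) matched at runtime, not by the pattern defined at design time.

For example, in the output of test case 5, instead of: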

--------------------
kvp[4] = key3='hello,
key =
value = key3='hello,
--------------------
kvp[5] = experts
key =
value = experts
=============================

I would like to see (notwithstanding the solution to question 1):

--------------------
kvp[4] = key3
key = key3
value =
--------------------
kvp[5] = hello, "experts"
key =
value = hello, "experts"
=============================

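3. How can I write the pattern value so that it stops matching before />? In test case 7, the value of key2 should be thing-1. I cannot remember everything I have tried, but I have not found a pattern that works without breaking test case 6, where / is part of the value.


Program.cs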

using System;
using System.Reflection;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            RegExTest();

            Console.ReadLine();
        }

        static void RegExTest()
        {
            // Test Cases
            var case1 = @"<tag>";
            var case2 = @"<tag KEY1=""vAl1"">";
            var case3 = @"<tag kEy2='val2'>";
            var case4 = @"<tag key3=`VAL3`>";
            var case5 = @"<tag key1='val1'

key2=""http://www.w3.org"" key3='hello, ""experts""'>";
            var case6 = @"<tag :key1 =some/thing>";
            var case7 = @"<tag key2=thing-1/>";
            var case8 = @"<tag key3 = thing-2>";
            var case9 = @"<tag noValue1 noValue2>";
            var case10 = @"<tag/>";
            var case11 = @"<tag />";

            // A key may begin with a letter, underscore or colon, followed by
            // zero or more of those, or numbers, periods, or dashes.
            string key = @"(?<key>(?<=\s+)[a-z_:][a-z0-9_:\.-]*?(?=[\s=>]+))";

            // A value may contain any character, and must be wrapped in balanced quotes (double, single,
            // or back) if the value contains any quote, whitespace, equal, or greater- or less- than
            // character.
            string value = @"(?<value>((?<=(?<quotes>[""'`])).*?(?=\k<quotes>)|(?<=[=][\s]*)[^""'`\s=<>]+))";

            // A key-value pair must contain a key,
            // a value is optional
            string kvp = $"(?<kvp>{key}|{value})"; // Without the | (pipe), it doesn't match any test case...

            // ...value needs to be optional (case9), tried:
            //kvp = $"(?<kvp>{key}{value}?)";
            //kvp = $"(?<kvp>{key}({value}?))";
            //kvp = $"(?<kvp>{key}({value})?)";
            // ...each only matches key, but also matches value in case8 as key

            Regex getKvps = new Regex(kvp, RegexOptions.IgnoreCase);

            FormatMatches(getKvps.Matches(case1)); // OK
            FormatMatches(getKvps.Matches(case2)); // OK
            FormatMatches(getKvps.Matches(case3)); // OK
            FormatMatches(getKvps.Matches(case4)); // OK
            FormatMatches(getKvps.Matches(case5)); // Backreference and/or lazy quantifier doesn't work.
            FormatMatches(getKvps.Matches(case6)); // OK
            FormatMatches(getKvps.Matches(case7)); // The / is not part of the value.
            FormatMatches(getKvps.Matches(case8)); // OK
            FormatMatches(getKvps.Matches(case9)); // OK
            FormatMatches(getKvps.Matches(case10)); // OK
            FormatMatches(getKvps.Matches(case11)); // OK
        }

        static void FormatMatches(MatchCollection matches)
        {
            Console.WriteLine(new string('=', 78));

            // Reads the private _input field of MatchCollection via reflection
            // so the original input string can be echoed.
            var _input = matches.GetType().GetField("_input",
                                                    BindingFlags.NonPublic |
                                                    BindingFlags.Instance)
                                .GetValue(matches);

            Console.WriteLine($"input = {_input}");
            Console.WriteLine();

            if (matches.Count < 1)
            {
                Console.WriteLine("[kvp not matched]");
                return;
            }

            for (int i = 0; i < matches.Count; i++)
            {
                Console.WriteLine(new string('-', 20));

                Console.WriteLine($"kvp[{i}] = {matches[i].Groups["kvp"]}");
                Console.WriteLine($"\t key\t=\t{matches[i].Groups["key"]}");
                Console.WriteLine($"\tvalue\t=\t{matches[i].Groups["value"]}");
            }
        }
    }
}

Best answer

You can use

\s(?<key>[a-z_:][a-z0-9_:.-]*)(?:\s*=\s*(?:(?<q>[`'"])(?<value>.*?)\k<q>|(?<value>(?:(?!/>)[^\s`'"<>])+)))?

See the regex demo with the groups highlighted and the .NET regex demo (proof).

C# usage:

var pattern = @"\s(?<key>[a-z_:][a-z0-9_:.-]*)(?:\s*=\s*(?:(?<q>[`'""])(?<value>.*?)\k<q>|(?<value>(?:(?!/>)[^\s`'""<>])+)))?";
// input is one of the test strings, e.g. case2 (case itself is a reserved word in C#)
var matches = Regex.Matches(input, pattern, RegexOptions.IgnoreCase);
foreach (Match m in matches)
{
    Console.WriteLine(m.Value); // The whole match
    Console.WriteLine(m.Groups["key"].Value); // Group "key" value
    Console.WriteLine(m.Groups["value"].Value); // Group "value" value
}
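To see how the pattern behaves on the inputs from the question, the usage above can be wrapped in a small console program. This is only a sketch: the class name AttributeSketch and the helper Parse are made up for illustration, and the inputs are a subset of the question's test cases written as regular (non-verbatim) string literals.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AttributeSketch
{
    // The pattern from the answer, unchanged.
    const string Pattern =
        @"\s(?<key>[a-z_:][a-z0-9_:.-]*)(?:\s*=\s*(?:(?<q>[`'""])(?<value>.*?)\k<q>|(?<value>(?:(?!/>)[^\s`'""<>])+)))?";

    // Hypothetical helper: returns every key with its value ("" when no value is present).
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in Regex.Matches(input, Pattern, RegexOptions.IgnoreCase))
        {
            // Groups["value"].Value is an empty string when the optional part did not match.
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups["key"].Value,
                m.Groups["value"].Value));
        }
        return pairs;
    }

    static void Main()
    {
        string[] inputs =
        {
            "<tag KEY1=\"vAl1\">",              // case 2
            "<tag key3='hello, \"experts\"'>",  // the quoted attribute from case 5
            "<tag :key1 =some/thing>",          // case 6
            "<tag key2=thing-1/>",              // case 7
            "<tag noValue1 noValue2>",          // case 9
        };

        foreach (var input in inputs)
        {
            Console.WriteLine(input);
            foreach (var pair in Parse(input))
                Console.WriteLine($"  key = [{pair.Key}], value = [{pair.Value}]");
        }
    }
}
```

Note that m.Value (the whole match) starts with the whitespace character consumed by the leading \s, so the key and value groups are the parts worth reading.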

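Details

  • \s - a whitespace character
  • (?<key>[a-z_:][a-z0-9_:.-]*) - Group "key": a letter, _ or :, followed by 0+ letters, digits, _, :, . or -
  • (?:\s*=\s*(?:(?<q>[`'"])(?<value>.*?)\k<q>|(?<value>(?:(?!/>)[^\s`'"<>])+)))? - one or zero occurrences (so the value is optional) of:
    • \s*=\s* - a = enclosed with 0+ whitespace characters
    • (?: - start of a non-capturing group:
      • (?<q>[`'"]) - Group "q": a delimiter, `, ' or "
      • (?<value>.*?) - Group "value": any 0+ characters other than a newline, as few as possible
      • \k<q> - a backreference to Group "q": the same delimiter character must match here
    • | - or
      • (?<value>(?:(?!/>)[^\s`'"<>])+) - Group "value": 1 or more characters other than whitespace, `, ', ", < and >, none of which starts a /> character sequence
    • ) - end of the non-capturing group.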
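The backreference itself does work the way the question assumes: \k<q> matches whatever text group "q" captured at runtime. What seems to differ between the two patterns is whether the quotes are consumed. In the question's value pattern the quotes only appear inside lookarounds, so a closing quote can be picked up again as the opening quote of the next match. The following sketch contrasts the two forms in isolation; the class QuoteSketch, its two helper methods and the sample input are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class QuoteSketch
{
    // Lookaround form (as in the question): the quotes are asserted but not consumed.
    static readonly Regex Lookaround = new Regex(@"(?<=(?<q>[`'""])).*?(?=\k<q>)");

    // Consuming form (as in the answer): both quotes are part of the overall match.
    static readonly Regex Consuming = new Regex(@"(?<q>[`'""])(?<value>.*?)\k<q>");

    // Hypothetical helper: text of every match of the lookaround form.
    public static string[] LookaroundValues(string input) =>
        Lookaround.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    // Hypothetical helper: the "value" group of every match of the consuming form.
    public static string[] ConsumingValues(string input) =>
        Consuming.Matches(input).Cast<Match>().Select(m => m.Groups["value"].Value).ToArray();

    static void Main()
    {
        string input = "a=\"x\" b=\"y\"";

        // The closing quote of "x" is reused as an opening quote,
        // so the text between the two attributes is reported as a value.
        Console.WriteLine(string.Join("|", LookaroundValues(input))); // x| b=|y

        // Each quote is consumed exactly once, so only the real values remain.
        Console.WriteLine(string.Join("|", ConsumingValues(input))); // x|y
    }
}
```

With the consuming form, a value such as 'hello, "experts"' also comes out whole, because the lazy .*? can only stop at the same delimiter character that opened the value.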

The original question, "c# - .NET Core regular expressions: named groups, nested groups, backreferences and lazy quantifiers", can be found on Stack Overflow: https://stackoverflow.com/questions/53365251/
