
regex - PowerShell backreference in a lookbehind

Reposted. Author: 行者123. Updated: 2023-12-03 01:27:32

I want to match lines that contain a certain string twice.
The full content is shown below; I saved it to a file named 1.txt.

    &nbsp;&nbsp;<b><font color="#5b4636">mit ~ und <u>Kegel</u></font></b> <span class="Icon">hum</span> <span class="Icon">fam</span> with the whole family;<br>
&nbsp;&nbsp;<b><font color="#5b4636">aus ~ern werden <u>Leute</u></font></b> <span class="Icon">prov</span> children grow up [all too] quickly;<br>
&nbsp;&nbsp;<b><font color="#5b4636">das ~ muss einen <u>Namen</u> haben</font></b> it must be called something;<br>
&nbsp;&nbsp;<b><font color="#5b4636">das ~ beim [rechten] <u>Namen</u> nennen</font></b> to call a spade a spade;<br>
&nbsp;&nbsp;<b><font color="#5b4636">~er und <u>Narren</u></font></b> [<i><font color="black">o</font></i> <b><font color="#5b4636"><u>Betrunkene</u></font></b>] <b><font color="#5b4636">sagen die Wahrheit</font></b> (<i><font color="black">sagen die Wahrheit</font></i>) children and fools speak the truth <span class="Icon">prov</span><br>
&nbsp;&nbsp;<b><font color="#5b4636">kleine ~er, kleine <u>Sorgen</u>, große ~er, große Sorgen</font></b> (<i><font color="black">große ~er, große Sorgen</font></i>) children when they are little make parents fools, when great, mad [<i><font color="black">or</font></i> they are great they make them mad] <span class="Icon">prov</span><br>
&nbsp;&nbsp;<b><font color="#5b4636">kein ~ von <u>Traurigkeit</u> sein</font></b> <span class="Icon">sein</span> to be sb who enjoys life;<br>
&nbsp;&nbsp;<b><font color="#5b4636">ich bin kein ~ von Traurigkeit</font></b> I [like [<i><font color="black">or</font></i> know how] to] enjoy life;<br>
&nbsp;&nbsp;<b><font color="#5b4636">ein ~ seiner <u>Zeit</u> sein</font></b> to be a child of one's time;<br>
&nbsp;&nbsp;<b><font color="#5b4636">[ein] <u>gebranntes</u> ~ scheut das Feuer</font></b> once bitten, twice shy <span class="Icon">prov</span><br>
&nbsp;&nbsp;<b><font color="#5b4636">was Glücksspiele angeht, bin ich ein gebranntes ~!</font></b> I've learned my lesson as far as games of chance are concerned;<br>
&nbsp;&nbsp;<b><font color="#5b4636">bei jdm <u>lieb</u> ~ sein</font></b> <span class="Icon">fam</span> to be sb's favourite [<i><font color="black">or</font></i> blue-eyed boy] [<i><font color="black">or</font></i> girl];<br>

My code for matching the string is:
$content = Get-Content "D:\1.txt" -Encoding UTF8
foreach ($line in $content) { $line -match "(?<=$($Matches[1]).*)\(<i><font color=`"black`">([^<]*)</font></i>\)"}

False
False
False
False
False
True
False
False
False
False
False
False

Only line 6 returns True. However, if I match without the lookbehind part, both lines 5 and 6 return True.
foreach ($line in $content) { $line -match "\(<i><font color=`"black`">([^<]*)</font></i>\)"}
False
False
False
False
True
True
False
False
False
False
False
False

So what is wrong with my first regex? I am using PowerShell 5.1.

Best answer

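As far as I can tell, although PowerShell gives you access to the .NET regex engine, which in principle does allow backreferences (e.g. \1) inside lookaround assertions, this does not seem to work in your case, which boils down to the following simplified example: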

# !! Does NOT match, even though 'foo foo' -match '(?<=foo )(foo)' does
PS> 'foo foo' -match '(?<=\1 )(foo)'
False

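Presumably, the backreference inside the lookbehind is evaluated before the capture group has captured anything, so it matches nothing (a backreference to a capture group that has not yet captured anything never matches). A working example, in which the capture group comes first: 'foo foo' -match '(foo) .*(?<=\1)$'
Therefore, your attempt (which also mistakenly uses $Matches[1] [1] instead of \1) cannot work.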
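Applying the "capture group first" idea to the markup in the question, a single regex can capture a candidate phrase and then require the same text inside the parenthesized italic element later on the line. This is only a sketch, not the approach taken in this answer: the pattern, the variable names, and the two sample strings are illustrative assumptions, and it assumes the phrase itself contains no < character.

```powershell
# Hypothetical single-regex variant: the capture group comes first,
# and the backreference \k<phrase> appears later in the pattern.
$pattern = '(?<phrase>[^<]+).*\(<i><font color="black">\k<phrase></font></i>\)'

# Made-up sample lines (not taken from 1.txt).
$doubled = '<b>xyz</b> (<i><font color="black">xyz</font></i>)'
$single  = '<b>xyq</b> (<i><font color="black">xyz</font></i>)'

$doubled -match $pattern   # True: the phrase also occurs earlier on the line
$single  -match $pattern   # False: the phrase occurs only inside the parentheses
```

Because the engine has to try many candidate captures before it finds one that recurs, this variant may be slower on long lines than matching in two steps.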

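You can work around this by performing two match operations per line: the first captures the phrase of interest, and the second looks for that phrase in the part of the string that precedes the initial match (note that this assumes the phrase-finding regex matches only once per line).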
# Array of input lines.
$lines = @'
&nbsp;&nbsp;<b><font color="#5b4636">mit ~ und <u>Kegel</u></font></b> <span class="Icon">hum</span> <span class="Icon">fam</span> with the whole family;<br>
&nbsp;&nbsp;<b><font color="#5b4636">aus ~ern werden <u>Leute</u></font></b> <span class="Icon">prov</span> children grow up [all too] quickly;<br>
&nbsp;&nbsp;<b><font color="#5b4636">das ~ muss einen <u>Namen</u> haben</font></b> it must be called something;<br>
&nbsp;&nbsp;<b><font color="#5b4636">das ~ beim [rechten] <u>Namen</u> nennen</font></b> to call a spade a spade;<br>
&nbsp;&nbsp;<b><font color="#5b4636">~er und <u>Narren</u></font></b> [<i><font color="black">o</font></i> <b><font color="#5b4636"><u>Betrunkene</u></font></b>] <b><font color="#5b4636">sagen die Wahrheit</font></b> (<i><font color="black">sagen die Wahrheit</font></i>) children and fools speak the truth <span class="Icon">prov</span><br>
&nbsp;&nbsp;<b><font color="#5b4636">kleine ~er, kleine <u>Sorgen</u>, große ~er, große Sorgen</font></b> (<i><font color="black">große ~er, große Sorgen</font></i>) children when they are little make parents fools, when great, mad [<i><font color="black">or</font></i> they are great they make them mad] <span class="Icon">prov</span><br>
&nbsp;&nbsp;<b><font color="#5b4636">kein ~ von <u>Traurigkeit</u> sein</font></b> <span class="Icon">sein</span> to be sb who enjoys life;<br>
&nbsp;&nbsp;<b><font color="#5b4636">ich bin kein ~ von Traurigkeit</font></b> I [like [<i><font color="black">or</font></i> know how] to] enjoy life;<br>
&nbsp;&nbsp;<b><font color="#5b4636">ein ~ seiner <u>Zeit</u> sein</font></b> to be a child of one's time;<br>
&nbsp;&nbsp;<b><font color="#5b4636">[ein] <u>gebranntes</u> ~ scheut das Feuer</font></b> once bitten, twice shy <span class="Icon">prov</span><br>
&nbsp;&nbsp;<b><font color="#5b4636">was Glücksspiele angeht, bin ich ein gebranntes ~!</font></b> I've learned my lesson as far as games of chance are concerned;<br>
&nbsp;&nbsp;<b><font color="#5b4636">bei jdm <u>lieb</u> ~ sein</font></b> <span class="Icon">fam</span> to be sb's favourite [<i><font color="black">or</font></i> blue-eyed boy] [<i><font color="black">or</font></i> girl];<br>
'@ -split '\r?\n' #'


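foreach ($line in $lines) {
  # Note: To better illustrate the result, the doubled phrase
  #       rather than a Boolean is printed.
  # -and ends the first line so that the condition continues onto the next one.
  if (
    $line -match '(?<before>.*)\(<i><font color="black">(?<phrase>[^<]+)</font></i>\)' -and
    $Matches.before -match [regex]::Escape($Matches.phrase)
  ) {
    # $Matches now reflects the second match, so [0] is the doubled phrase.
    $Matches[0]
  }
}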

Output of the above (the doubled phrases found on lines 5 and 6):

sagen die Wahrheit
große ~er, große Sorgen
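If a line can contain more than one parenthesized phrase, the one-match-per-line assumption no longer holds. One possible sketch, assuming the same markup, iterates over all matches with [regex]::Matches and checks the text that precedes each one. The function name Get-DoubledPhrase and the sample line are hypothetical.

```powershell
# Hypothetical helper: returns every parenthesized phrase that also
# occurs earlier on the same line.
function Get-DoubledPhrase {
  param([string] $Line)

  $pattern = '\(<i><font color="black">(?<phrase>[^<]+)</font></i>\)'
  # Note: [regex]::Matches is case-sensitive by default, unlike -match.
  foreach ($m in [regex]::Matches($Line, $pattern)) {
    $phrase = $m.Groups['phrase'].Value
    $before = $Line.Substring(0, $m.Index)   # text preceding this match
    if ($before -match [regex]::Escape($phrase)) {
      $phrase
    }
  }
}

# Made-up sample line with two parenthesized phrases.
$sample = '<b>xyz</b> (<i><font color="black">xyz</font></i>) (<i><font color="black">nope</font></i>)'
Get-DoubledPhrase $sample   # outputs: xyz
```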

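[1] The automatic $Matches variable in PowerShell is populated after a regex operation to reflect what was captured, and only if the match succeeded. It is purely a PowerShell feature; the .NET regex engine (which -match calls behind the scenes) knows nothing about it.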

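By embedding $($Matches[1]) in an expandable string ("...") that is used as the regex, you are therefore (a) expanding the value (replacing the variable reference with its value) before the regex engine ever sees the string, and (b) referring to whatever the most recent successful match operation captured in its first capture group.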
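Points (a) and (b) can be made visible with a small sketch: the expandable string becomes a plain pattern before -match runs, built from whatever an earlier successful match left in $Matches. The sample strings and the variable $pattern are made up for illustration.

```powershell
# Run a successful match first so that $Matches is populated.
$null = 'alpha beta' -match '(alpha)'

# The expandable string is expanded right here, before any regex is evaluated.
$pattern = "(?<=$($Matches[1]) )(beta)"
$pattern                       # (?<=alpha )(beta)

# The regex engine only ever sees the literal text 'alpha', not a backreference.
'alpha beta' -match $pattern   # True, but only because of the earlier match
```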

In short: the only way to use backreferences in PowerShell is to use the .NET regex engine's own syntax, e.g. \1 to refer to the first capture group.

Regarding regex - PowerShell backreference in a lookbehind, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61268985/
