
perl - How do I extract all quotes in a text?

Reposted. Author: 行者123. Updated: 2023-12-04 16:38:00

I'm looking for a SimpleGrepSedPerlOrPythonOneLiner that outputs all quotes in a text.

Example 1:

echo “HAL,” noted Frank, “said that everything was going extremely well.” | SimpleGrepSedPerlOrPythonOneLiner

stdout:
"HAL,"
"said that everything was going extremely well."

Example 2:
cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner

stdout:
"EULA"
"Software"
"Workstation Computer"
"Device"
"DRM"

and so on.

(link to the corresponding text)

Best answer

I like this one:

perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'

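It's a little verbose, but it handles escaped quotes and backtracking a lot better than the simplest implementation. What it's saying is: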
my $re = qr{
    "                  # Begin it with literal quote
    (
      (?>              # prevent backtracking once the alternation has been
                       # satisfied. It either agrees or it does not. This expression
                       # only needs one direction, or we fail out of the branch

          [^"\\]       # a character that is not a dquote or a backslash
      |   \\+          # OR if a backslash, then any number of backslashes followed by
          [^"]         # something that is not a quote
      |   \\           # OR again a backslash
          (?>\\\\)*    # followed by any number of *pairs* of backslashes (as units)
          "            # and a quote
      )*               # any number of *set* qualifying phrases
    )                  # all batched up together
    "                  # Ended by a literal quote
}x;
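
To see the commented pattern at work, here is a minimal sketch that wraps it in a small script. The helper name `extract_quotes` and the sample sentence are hypothetical and exist only for illustration. Note that the capture group sits inside the quote characters, so the surrounding quotes are not part of what gets printed.

```perl
use strict;
use warnings;

# The same pattern as above, compiled with /x so that whitespace is ignored.
my $re = qr{
    "
    (
      (?>
          [^"\\]
      |   \\+ [^"]
      |   \\ (?>\\\\)* "
      )*
    )
    "
}x;

# Hypothetical helper: return the contents of every double-quoted span in $text.
sub extract_quotes {
    my ($text) = @_;
    my @found = $text =~ /$re/g;   # list context + /g yields every capture
    return @found;
}

# Sample input (made up): the second quote contains backslash-escaped quotes.
my $line = 'He said "hello" and then "she said \"hi\" back".';
print "$_\n" for extract_quotes($line);
# prints:
#   hello
#   she said \"hi\" back
```

The second result stays in one piece because each `\"` is consumed by the third alternative instead of ending the match.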

If you don't need that much power (say, the text is likely to be just dialogue rather than structured quotes), then
/"([^"]*)"/ 

probably works about as well as anything else.
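
As a rough illustration of that trade-off, the sketch below applies the simple pattern to plain dialogue and then to a string with escaped quotes. The helper name `simple_quotes` and both sample strings are assumptions made for this example, and the dialogue uses straight quotes because the pattern only matches the `"` character.

```perl
use strict;
use warnings;

# Hypothetical helper: grab whatever sits between one straight double quote
# and the next one, with no awareness of backslash escapes.
sub simple_quotes {
    my ($text) = @_;
    my @found = $text =~ /"([^"]*)"/g;
    return @found;
}

# Plain dialogue: each quoted span comes back intact.
my $dialogue = '"HAL," noted Frank, "said that everything was going extremely well."';
print "$_\n" for simple_quotes($dialogue);
# prints:
#   HAL,
#   said that everything was going extremely well.

# Escaped quotes: the pattern stops at the first \" and splits the span.
my $escaped = '"she said \"hi\" back"';
print "[$_]\n" for simple_quotes($escaped);
# prints:
#   [she said \]
#   [ back]
```

For dialogue-style text the short pattern is enough; the split in the second case is the situation the longer pattern is meant to handle.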

Regarding "perl - How do I extract all quotes in a text?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/343528/
