
html - Easiest way to remove html/xml <tags> from one-line output

Reposted. Author: 数据小太阳. Updated: 2023-10-29 02:19:35

I'm trying to clean up grep output that looks like this:

<words>Http://www.path.com/words</words>

I tried using...

sed 's/<.*>//' 

...to remove the tags, but that just wipes out the whole line. I'm not sure why, since every "<" is closed by a ">" before the content is reached.

What is the easiest way to do this?

Thanks!

Best answer

Try this for your sed expression:

sed 's/<.*>\(.*\)<\/.*>/\1/'

A quick breakdown of the expression:

<.*>   - Match the first tag
\(.*\) - Match and save the text between the tags
<\/.*> - Match the end tag making sure to escape the / character
\1 - Output the result of the first saved match (the text that is matched between \( and \))
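
As for why the original attempt wipes the whole line: .* is greedy, so <.*> stretches from the first "<" all the way to the last ">" on the line. Here is a minimal sketch that puts the three behaviours side by side. The function names are made up for illustration, and the [^>]* variant is not part of the expression above; it is a common alternative, shown on the assumption that a line may hold more than one pair of tags.

```shell
#!/bin/sh
line='<words>Http://www.path.com/words</words>'

# Original attempt: greedy .* runs from the first '<' to the last '>',
# so the whole line is matched and deleted.
greedy_strip() { printf '%s\n' "$1" | sed 's/<.*>//'; }

# The expression from this answer: keep only what sits between the tags.
capture_strip() { printf '%s\n' "$1" | sed 's/<.*>\(.*\)<\/.*>/\1/'; }

# Alternative (an assumption, not from the answer): [^>]* cannot cross a '>',
# so each tag is matched on its own, and the g flag removes every tag.
each_tag_strip() { printf '%s\n' "$1" | sed 's/<[^>]*>//g'; }

greedy_strip "$line"     # prints an empty line
capture_strip "$line"    # prints Http://www.path.com/words
each_tag_strip "$line"   # prints Http://www.path.com/words
```

The capture version fits the one-tag-pair line in the question. The [^>]* version is probably the safer pick if several tags can show up on one line, though neither is a real XML parser.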

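More on back-references

A question came up in the comments that should probably be addressed for completeness.

\( and \) are sed's back-reference markers. They save part of the matched expression for later use.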
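
Before the longer example, a small sketch of the simplest case may help: two saved groups reused in the replacement in swapped order. The function name swap_pair and the key=value input are made up for illustration.

```shell
#!/bin/sh
# Save the text before '=' as \1 and the text after it as \2,
# then write them back in the opposite order.
swap_pair() { printf '%s\n' "$1" | sed 's/\(.*\)=\(.*\)/\2=\1/'; }

swap_pair 'key=value'   # prints value=key
```

Groups are numbered by the order of their opening \( from left to right.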

For example, if we have an input string:

This has (parens) in it. In addition we can use parenslike thisparens using back-references.

And we develop an expression:

sed 's/.*(\(.*\)).*\1\(.*\)\1.*/\1 \2/'

This gives us:

parens like this

So what is going on here? Let's break the expression down to find out.

Expression breakdown:

sed s/ - This opens the sed substitute command; the match expression follows.
.* - Match any run of characters at the start (including none at all).
( - Match a literal left parenthesis character.
\(.*\) - Match any characters and save them as a back-reference. In our sample, which has a single pair of parentheses, this captures the text between them.
) - Match a literal right parenthesis character.
.* - Same as above.
\1 - Match the first saved back-reference. In the case of our sample this is filled in with `parens`
\(.*\) - Same as above.
\1 - Same as above.
/ - End of the match expression. Signals transition to the output expression.
\1 \2 - Print our two back-references.
/ - End of output expression.

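As we can see, the back-reference captured between the parentheses (()) is substituted back into the match expression so that it can match the string parens.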
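
To try this out, here is a small sketch that runs the expression from this section against the sample string. It is written with single quotes so the shell passes the parentheses and backslashes through untouched. The function name extract and the second input string are made up for illustration.

```shell
#!/bin/sh
input='This has (parens) in it. In addition we can use parenslike thisparens using back-references.'

# \1 is captured between the literal parentheses, then reused inside the
# match itself as the delimiter on both sides of \2.
extract() { printf '%s\n' "$1" | sed 's/.*(\(.*\)).*\1\(.*\)\1.*/\1 \2/'; }

extract "$input"                                        # prints parens like this
extract 'Delimiter is (QQ) so QQpayloadQQ ends here'    # prints QQ payload
```

The second call shows that nothing in the expression is tied to the word parens: whatever sits inside the parentheses becomes the delimiter that \1 looks for later in the line.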

Regarding html - Easiest way to remove html/xml <tags> from one-line output, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/10988993/
