Java: Regex to remove wiki markup for lists

Reposted. Author: 行者123. Updated: 2023-12-01 04:18:42

I am reading a Wikipedia XML file and have to strip the wiki markup from any list item. For example, given the following string:

String text = ": definition list\n"
    + "** some list item\n"
    + "# another list item\n"
    + "[[Category:1918 births]]\n"
    + "[[Category:2005 deaths]]\n"
    + "[[Category:Scottish female singers]]\n"
    + "[[Category:Billy Cotton Band Show]]\n"
    + "[[Category:Deaths from Alzheimer's disease]]\n"
    + "[[Category:People from Glasgow]]";

Here I want to remove the leading *, #, and : markers, but leave the Category lines untouched. The output should look like this:

String outtext = "definition list\n"
    + "some list item\n"
    + "another list item\n"
    + "[[Category:1918 births]]\n"
    + "[[Category:2005 deaths]]\n"
    + "[[Category:Scottish female singers]]\n"
    + "[[Category:Billy Cotton Band Show]]\n"
    + "[[Category:Deaths from Alzheimer's disease]]\n"
    + "[[Category:People from Glasgow]]";

I am using the following code:

Pattern pattern = Pattern.compile("(^\\*+|#+|;|:)(.+)$");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    String outtext = matcher.group(0);
    outtext = outtext.replaceAll("(^\\*+|#+|;|:)\\s", "");
    return(outtext);
}

This does not work. Can you point out what I should do instead?

Best answer

This should work:

text = text.replaceAll("(?m)^[*:#]+\\s*", "");

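The key is (?m), which enables MULTILINE mode, so that the ^ and $ anchors match at the start and end of every line rather than only at the start and end of the whole input.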
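To make the effect of the flag concrete, here is a minimal sketch comparing the same pattern with and without MULTILINE. The class and method names (MultilineDemo, stripFirstOnly, stripEveryLine, stripWithFlag) are made up for illustration and do not come from the answer; Pattern.MULTILINE is the standard flag constant equivalent to the inline (?m).

```java
import java.util.regex.Pattern;

public class MultilineDemo {
    // Without (?m), ^ matches only at the very start of the input,
    // so only the first line loses its marker.
    static String stripFirstOnly(String text) {
        return text.replaceAll("^[*:#]+\\s*", "");
    }

    // With (?m), ^ matches at the start of every line.
    static String stripEveryLine(String text) {
        return text.replaceAll("(?m)^[*:#]+\\s*", "");
    }

    // Same behavior as stripEveryLine, using the flag constant
    // instead of the inline (?m) modifier.
    static String stripWithFlag(String text) {
        return Pattern.compile("^[*:#]+\\s*", Pattern.MULTILINE)
                .matcher(text)
                .replaceAll("");
    }

    public static void main(String[] args) {
        String text = ": definition list\n** some list item\n# another list item";
        System.out.println(stripFirstOnly(text));
        System.out.println("---");
        System.out.println(stripEveryLine(text));
    }
}
```

If the pattern is applied many times, compiling it once with Pattern.compile and reusing it is likely the better choice, since String.replaceAll recompiles the regex on every call.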

Output:

definition list
some list item
another list item
[[Category:1918 births]]
[[Category:2005 deaths]]
[[Category:Scottish female singers]]
[[Category:Billy Cotton Band Show]]
[[Category:Deaths from Alzheimer's disease]]
[[Category:People from Glasgow]]
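For anyone who wants to try this end to end, the following is a sketch of a small self-contained program, not the answer's exact code. It makes two assumptions beyond the answer: the character class also includes ; (which the question's pattern listed but the answer's does not), and the trailing whitespace match is limited to spaces and tabs so that a line containing only markers does not swallow the following newline. The names WikiListStripper and stripListMarkers are hypothetical.

```java
public class WikiListStripper {
    // Hypothetical helper: removes leading wiki list markers (*, #, ;, :)
    // and any spaces or tabs after them, on every line of the input.
    // [ \t]* is used instead of \s* so line breaks are never consumed.
    static String stripListMarkers(String text) {
        return text.replaceAll("(?m)^[*#;:]+[ \\t]*", "");
    }

    public static void main(String[] args) {
        String text = ": definition list\n"
                + "** some list item\n"
                + "# another list item\n"
                + "[[Category:1918 births]]\n"
                + "[[Category:People from Glasgow]]";
        // Category lines start with '[', so the anchored pattern leaves them alone.
        System.out.println(stripListMarkers(text));
    }
}
```

Because the pattern is anchored to the start of a line, the colon inside [[Category:...]] is never touched.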

This post is based on a similar question found on Stack Overflow, "Java: Regex to remove wiki markup for lists": https://stackoverflow.com/questions/19196892/
