
java - How to subtract a substring from a string in Web-Harvest

Reposted. Author: 行者123. Updated: 2023-12-01 14:06:22

I am new to Web-Harvest and am using it to fetch article data from a website, with the following statement:

let $text := data($doc//div[@id="articleBody"])

This is the data I get from the statement above:

The Refine Spa (Furman's Mill) was built as a stone grist mill along the on a tributary of Capoolong Creek by Moore Furman, quartermaster general of George Washington's army

Notable people

Notable current and former residents of Pittstown include:

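My question is whether it is possible to subtract one string from another. In the example above, I want to remove "Notable people" from the content.

Is this possible? If so, please tell me how. Thanks. Could I do something like this: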

if (*contains*($text, 'Notable people')) then $text := *minus*($text, 'Notable people') 

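Here contains is a placeholder name for a function that determines whether one string is a substring of another, and minus is a placeholder name for a function that removes a substring from another string.

Desired output: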

The Refine Spa (Furman's Mill) was built as a stone grist mill along the on a tributary of Capoolong Creek by Moore Furman, quartermaster general of George Washington's army

Notable current and former residents of Pittstown include:

Best answer

From http://web-harvest.sourceforge.net/manual.php:

regexp

Searches the body for the given regular expression and optionally replaces found occurrences with specified pattern. If body is a list of values then the regexp processor is applied to every item and final execution result is the list.

You just need to use the right regular expression, the right regexp-pattern, and the right regexp-result.
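To make the idea concrete, here is a minimal sketch in plain Java of the kind of regular-expression substitution the answer points to. It is not Web-Harvest configuration: the class name SubstringRemoval and the methods remove and removeFromAll are hypothetical, and the sketch assumes the heading should be removed together with the whitespace that follows it. The pattern is the part that would go in regexp-pattern, and the empty replacement plays the role of regexp-result.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only; not part of Web-Harvest.
class SubstringRemoval {

    // Removes every occurrence of a literal substring, along with any
    // whitespace (including blank lines) that directly follows it.
    static String remove(String text, String literal) {
        // Pattern.quote escapes regex metacharacters in the literal.
        String regex = Pattern.quote(literal) + "\\s*";
        return text.replaceAll(regex, "");
    }

    // Mirrors the manual's note about lists: apply the same
    // substitution to every item and return the resulting list.
    static List<String> removeFromAll(List<String> items, String literal) {
        return items.stream()
                .map(item -> remove(item, literal))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "... quartermaster general of George Washington's army\n\n"
                + "Notable people\n\n"
                + "Notable current and former residents of Pittstown include:";
        System.out.println(remove(text, "Notable people"));
    }
}
```

If the text does not contain the substring, replaceAll returns it unchanged, so no separate contains check is needed before removing.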

Regarding "java - How to subtract a substring from a string in Web-Harvest", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18866630/
