
java - How to strip part of the text obtained from Web-Harvest

Reposted. Author: 行者123. Updated: 2023-12-01 14:07:49

I am new to Web-Harvest and am using it to fetch article data from a website with the following statement:

let $text := data($doc//div[@id="articleBody"])

This is the data I get from the statement above:

The Refine Spa (Furman's Mill) was built as a stone grist mill along the on a tributary of Capoolong Creek by Moore Furman, quartermaster general of George Washington's army

Notable people

Notable current and former residents of Pittstown include:

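My question is whether this configuration can be used to remove all of the content after "Notable people". Is that possible? If so, please tell me how. Thanks.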

Edit: desired output:

The Refine Spa (Furman's Mill) was built as a stone grist mill along the on a tributary of Capoolong Creek by Moore Furman, quartermaster general of George Washington's army

Notable people

Best answer

You only need to change the let statement, for example:

let $text := substring-before(data($doc//div[@id="articleBody"]/text()), 'Notable people')

to get the output you want.
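
To see what `substring-before` returns without running a full Web-Harvest configuration, the same function can be tried from plain Java with the JDK's built-in XPath 1.0 engine. This is only a sketch under assumptions: the class name `SubstringBeforeDemo`, the helper `textBefore`, the `$marker` variable and the sample markup are all made up for illustration, and `string(...)` over the whole `div` is used here in place of the `data(.../text())` expression from the answer.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class SubstringBeforeDemo {

    // Hypothetical helper: returns the text of div#articleBody that precedes
    // the first occurrence of `marker`. As defined by XPath, substring-before
    // yields an empty string when the marker does not occur at all.
    public static String textBefore(String xml, String marker) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Pass the marker as an XPath variable so quotes in it need no escaping.
        xpath.setXPathVariableResolver(
            name -> "marker".equals(name.getLocalPart()) ? marker : null);
        try {
            return xpath.evaluate(
                "substring-before(string(//div[@id='articleBody']), $marker)",
                new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample markup standing in for the harvested page.
        String xml =
            "<html><body><div id=\"articleBody\">"
            + "<p>The mill was built by Moore Furman.</p>"
            + "<h2>Notable people</h2>"
            + "<p>Notable current and former residents include:</p>"
            + "</div></body></html>";

        String marker = "Notable people";
        String before = textBefore(xml, marker);

        // Text strictly before the heading.
        System.out.println(before);
        // The desired output in the question keeps the heading itself,
        // so it can be appended back after the cut.
        System.out.println(before + marker);
    }
}
```

Note that `substring-before` drops the marker itself, so if the "Notable people" heading should stay in the result (as in the desired output above), it has to be appended again, for instance with `concat(...)` in the XQuery expression.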

Regarding "java - How to strip part of the text obtained from Web-Harvest", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18757780/
