
regex - Extracting text content from HTML in Golang

Reposted. Author: IT王子. Updated: 2023-10-29 01:34:02

What is the best way to extract inner substrings from a string in Golang?

Input:

"Hello <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"

Output:

"this is paragraph \n
this is paragraph 2"

Is there a string package or library for Go that already does something like this?

package main

import (
	"fmt"
	"strings"
)

func main() {
	longString := "Hello world <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"

	newString := getInnerStrings("<p>", "</p>", longString)

	fmt.Println(newString)
	// output: this is paragraph \n
	//         this is paragraph 2
}

func getInnerStrings(start, end, str string) string {
	// Brain Freeze
	// Regex?
	// Bytes Loop?
}

Thanks

Best answer

Don't use regular expressions to try to interpret HTML. Use a fully capable HTML tokenizer and parser.

I recommend reading this article on CodingHorror.
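
To make the tokenizer approach concrete, here is a minimal sketch. It is not taken from the original answer. The usual choice for real HTML is the tokenizer in golang.org/x/net/html. To stay within the standard library, this sketch instead uses encoding/xml in its non-strict mode, with the decoder settings the package documents for handling typical HTML. The helper name innerText is hypothetical, and the sketch assumes the markup is reasonably well balanced.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// innerText returns the trimmed text found inside every <tag> element of src,
// one entry per outermost element. Hypothetical helper for illustration only.
func innerText(src, tag string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	// Decoder settings documented by encoding/xml for reading typical HTML.
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	var (
		out   []string
		buf   strings.Builder
		depth int // number of <tag> elements currently open
	)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if strings.EqualFold(t.Name.Local, tag) {
				depth++
			}
		case xml.EndElement:
			if strings.EqualFold(t.Name.Local, tag) && depth > 0 {
				depth--
				if depth == 0 {
					// The outermost element just closed: emit its text.
					out = append(out, strings.TrimSpace(buf.String()))
					buf.Reset()
				}
			}
		case xml.CharData:
			// Only collect text while inside the requested element.
			if depth > 0 {
				buf.Write(t)
			}
		}
	}
}

func main() {
	longString := "Hello world <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"

	paragraphs, err := innerText(longString, "p")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(strings.Join(paragraphs, "\n"))
}
```

With the input from the question, this should print the two paragraph texts on separate lines and skip the text outside the <p> elements. Unlike a regular expression, the tokenizer decodes entities and handles attributes on the tag. For messy real-world pages, a dedicated HTML parser is still the safer choice.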

Regarding "regex - Extracting text content from HTML in Golang", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21000277/
