
XML parsing returns strings with newline characters

Reposted. Author: IT王子. Updated: 2023-10-29 02:00:59

I'm trying to parse the XML of a sitemap in Go and then loop over the addresses to fetch the details of each post. But I'm getting this strange error:

: first path segment in URL cannot contain colon

Here is the code snippet:

package main

import (
	"encoding/xml"
	"fmt"
	"io/ioutil"
	"net/http"
)

type SitemapIndex struct {
	Locations []Location `xml:"sitemap"`
}

type Location struct {
	Loc string `xml:"loc"`
}

// String returns the raw <loc> text (no format string needed).
func (l Location) String() string {
	return l.Loc
}

func main() {
	// Errors are ignored here for brevity, as in the original snippet.
	resp, _ := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
	bytes, _ := ioutil.ReadAll(resp.Body)
	resp.Body.Close()
	var s SitemapIndex
	xml.Unmarshal(bytes, &s)
	for _, Location := range s.Locations {
		fmt.Printf("Location: %s", Location.Loc)
		resp, err := http.Get(Location.Loc)
		fmt.Println("resp", resp)
		fmt.Println("err", err)
	}
}

Output:

Location: 
https://www.washingtonpost.com/news-sitemaps/politics.xml
resp <nil>
err parse
https://www.washingtonpost.com/news-sitemaps/politics.xml
: first path segment in URL cannot contain colon
Location:
https://www.washingtonpost.com/news-sitemaps/opinions.xml
resp <nil>
err parse
https://www.washingtonpost.com/news-sitemaps/opinions.xml
: first path segment in URL cannot contain colon
...
...

My guess is that Location.Loc returns the actual address with a newline before and after it. For example: \nLocation: https://www.washingtonpost.com/news-sitemaps/politics.xml\n

I suspect this because a hard-coded URL works as expected:

for _, Location := range s.Locations {
	fmt.Printf("Location: %s", Location.Loc)
	test := "https://www.washingtonpost.com/news-sitemaps/politics.xml"
	resp, err := http.Get(test)
	fmt.Println("resp", resp)
	fmt.Println("err", err)
}

Output; as you can see, the error is nil:

Location: 
https://www.washingtonpost.com/news-sitemaps/politics.xml
resp &{200 OK 200 HTTP/2.0 2 0 map[Server:[nginx] Arc-Service:[api] Arc-Org-Name:[washpost] Expires:[Sat, 02 Feb 2019 05:32:38 GMT] Content-Security-Policy:[upgrade-insecure-requests] Arc-Deployment:[washpost] Arc-Organization:[washpost] Cache-Control:[private, max-age=60] Arc-Context:[index] Arc-Application:[Feeds] Vary:[Accept-Encoding] Content-Type:[text/xml; charset=utf-8] Arc-Servername:[api.washpost.arcpublishing.com] Arc-Environment:[index] Arc-Org-Env:[washpost] Arc-Route:[/feeds] Date:[Sat, 02 Feb 2019 05:31:38 GMT]] 0xc000112870 -1 [] false true map[] 0xc00017c200 0xc0000ca370}
err <nil>
Location:
...
...

But I'm new to Go, so I don't know what is going wrong. Can you tell me where my mistake is?

Best answer

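You're right, the problem is the newline characters. As you can see, you are using Printf without adding any \n, yet the output has one at the beginning and one at the end.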

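You can use strings.Trim to remove those newlines. Here is an example that uses the sitemap you are trying to parse. Once the string is trimmed, you can call http.Get on it without any errors: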

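func main() {
	// bytes holds the sitemap index body, read as in the question.
	var s SitemapIndex
	xml.Unmarshal(bytes, &s)

	for _, Location := range s.Locations {
		// Remove the newlines that surround the URL inside <loc>.
		loc := strings.Trim(Location.Loc, "\n")
		fmt.Printf("Location: %s\n", loc)
	}
}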

This code correctly prints the locations without any newlines, as expected:

Location: https://www.washingtonpost.com/news-sitemaps/politics.xml
Location: https://www.washingtonpost.com/news-sitemaps/opinions.xml
Location: https://www.washingtonpost.com/news-sitemaps/local.xml
Location: https://www.washingtonpost.com/news-sitemaps/sports.xml
Location: https://www.washingtonpost.com/news-sitemaps/national.xml
Location: https://www.washingtonpost.com/news-sitemaps/world.xml
Location: https://www.washingtonpost.com/news-sitemaps/business.xml
Location: https://www.washingtonpost.com/news-sitemaps/technology.xml
Location: https://www.washingtonpost.com/news-sitemaps/lifestyle.xml
Location: https://www.washingtonpost.com/news-sitemaps/entertainment.xml
Location: https://www.washingtonpost.com/news-sitemaps/goingoutguide.xml
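To exercise the whole loop from the question with the fix applied, without depending on the live Washington Post feed, here is a minimal sketch that serves a hypothetical sitemap index from a local net/http/httptest server, with each <loc> wrapped in newlines like the real feed. The `fetchAll` and `newTestServer` helpers and the served paths are illustrative assumptions. It uses strings.TrimSpace rather than strings.Trim(..., "\n") so that spaces, tabs or "\r\n" around the URL are removed as well, and io.ReadAll, which replaces ioutil.ReadAll from Go 1.16 on.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

type SitemapIndex struct {
	Locations []Location `xml:"sitemap"`
}

type Location struct {
	Loc string `xml:"loc"`
}

// fetchAll is a hypothetical helper: it downloads the sitemap index at indexURL,
// trims each <loc> value and returns the HTTP status code of every listed URL.
func fetchAll(indexURL string) ([]int, error) {
	resp, err := http.Get(indexURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}

	var s SitemapIndex
	if err := xml.Unmarshal(body, &s); err != nil {
		return nil, err
	}

	var codes []int
	for _, l := range s.Locations {
		// Strip the whitespace that surrounds the URL inside <loc>.
		loc := strings.TrimSpace(l.Loc)
		r, err := http.Get(loc)
		if err != nil {
			return nil, err
		}
		r.Body.Close()
		codes = append(codes, r.StatusCode)
	}
	return codes, nil
}

// newTestServer is a hypothetical local server: /index.xml lists two
// sitemaps whose <loc> text is wrapped in newlines; every other path returns 200.
func newTestServer() *httptest.Server {
	mux := http.NewServeMux()
	srv := httptest.NewServer(mux)
	mux.HandleFunc("/index.xml", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "<sitemapindex>\n<sitemap>\n<loc>\n%s/politics.xml\n</loc>\n</sitemap>\n<sitemap>\n<loc>\n%s/opinions.xml\n</loc>\n</sitemap>\n</sitemapindex>", srv.URL, srv.URL)
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("<urlset/>"))
	})
	return srv
}

func main() {
	srv := newTestServer()
	defer srv.Close()

	codes, err := fetchAll(srv.URL + "/index.xml")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("status codes:", codes)
}
```

Unlike the question's code, this sketch checks every error instead of discarding it, so a malformed URL surfaces immediately rather than as a nil response.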

The Location.Loc field contains these newlines because of the XML that this URL returns. Its entries take the following form:

<sitemap>
  <loc>
https://www.washingtonpost.com/news-sitemaps/goingoutguide.xml
  </loc>
</sitemap>

As you can see, the content of the loc element has a newline before and after it.

For a similar question about XML parsing returning strings with newline characters, see Stack Overflow: https://stackoverflow.com/questions/54490304/
