java - Using the Jsoup API: [parse(File in, String charsetName, String baseUri)]

Reposted. Author: 行者123. Updated: 2023-12-02 12:44:02
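I am using jsoup to parse some HTML, but I don't understand why I'm not getting the result I expect.

Q1: I expect the output URL to be http://example.com/input/img.jpg, but it prints http://example.com/img.jpg.

Q2: The generated HTML contains <img src="/img.jpg">, but I want it to be <img src="http://example.com/img.jpg">.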

Input HTML file

<!-- HTML file -->
<!DOCTYPE html>
<html>
<head>
<title>JsoupInputTest</title>
<meta charset="UTF-8">
</head>
<body>
<div id="mydiv">test parsing input file by jsoup</div>
<img src="/img.jpg">
<a href="/a.jpg">s1 test</a>
</body>
</html>

Code

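import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupInputTest {
    public static void inputTest() throws IOException {
        String fileName = "../inputTest.html";
        File in = new File(fileName);
        // The third argument is the base URI used to resolve relative URLs.
        Document doc = Jsoup.parse(in, "UTF-8", "http://example.com/input/");

        // absUrl() resolves the attribute value against the base URI.
        System.out.println(doc.select("img").first().absUrl("src"));
        System.out.println(doc.select("a[href]").first().absUrl("href"));

        System.out.println("====================================");

        // Print the parsed document; the attribute values themselves are unchanged.
        System.out.println(doc.html());
    }
}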

Output

http://example.com/img.jpg
http://example.com/a.jpg
====================================
<!-- HTML file --><!doctype html>
<html>
<head>
<title>JsoupInputTest</title>
<meta charset="UTF-8">
</head>
<body>
<div id="mydiv">
test parsing input file by jsoup
</div>
<img src="/img.jpg">
<a href="/a.jpg">s1 test</a>
</body>
</html>

Best answer

Answer from Jhy, the author of jsoup (#908):

Q1: Both of those are absolute URLs with implicit domains: each URL starts with a single slash. So only the scheme and hostname from the base URI (the third argument to Jsoup.parse) are used, and the /input/ path is discarded. The output is correct.

See https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL#Absolute_URLs_vs_relative_URLs for some more details on how URLs are made absolute.
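
The resolution rule behind Q1 can be sketched with the JDK's java.net.URI. It is used here only as a stand-in for illustration: it is an assumption that it resolves these particular references the same way jsoup's absUrl() does, although its result for /img.jpg matches the output shown in the question. The class and method names are hypothetical.

```java
import java.net.URI;

// Hypothetical helper class for illustration; not part of jsoup.
final class UrlResolveSketch {

    // Resolve a reference found in the HTML against a base URI.
    static String resolve(String baseUri, String reference) {
        return URI.create(baseUri).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/input/";

        // Leading slash: the path of the base URI is replaced entirely.
        System.out.println(resolve(base, "/img.jpg"));  // http://example.com/img.jpg

        // No leading slash: resolved relative to the base URI's directory.
        System.out.println(resolve(base, "img.jpg"));   // http://example.com/input/img.jpg
    }
}
```

Under this rule, getting http://example.com/input/img.jpg would require the src in the input file to be img.jpg rather than /img.jpg.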

Q2: calling absUrl() doesn't change the value in the DOM; it's calculating the absolute URL, not updating it.
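
If the goal in Q2 is for the serialized HTML to carry absolute URLs, one possible approach is to write the computed value back into each attribute. The following is a minimal sketch rather than a method given in the answer; the class name AbsolutizeSketch and the method absolutize are hypothetical, and it parses from a string instead of a file to stay self-contained.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class for illustration; not part of jsoup.
final class AbsolutizeSketch {

    // Parse the HTML and overwrite each src/href with its absolute form.
    static Document absolutize(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        rewrite(doc, "src");
        rewrite(doc, "href");
        return doc;
    }

    // Replace the named attribute on every element that has it.
    private static void rewrite(Document doc, String attrName) {
        for (Element el : doc.select("[" + attrName + "]")) {
            String abs = el.absUrl(attrName);
            // absUrl() returns an empty string when no absolute URL
            // can be built; keep the original value in that case.
            if (!abs.isEmpty()) {
                el.attr(attrName, abs);
            }
        }
    }

    public static void main(String[] args) {
        String html = "<img src=\"/img.jpg\"><a href=\"/a.jpg\">s1 test</a>";
        Document doc = absolutize(html, "http://example.com/input/");
        System.out.println(doc.body().html());
    }
}
```

After this pass, doc.html() serializes the updated attribute values, because the DOM itself has been modified rather than only queried.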

Regarding java - Using the Jsoup API: [parse(File in, String charsetName, String baseUri)], we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44847302/
