
java - Page duplication error in Jsoup

Reposted. Author: 行者123. Updated: 2023-12-01 14:20:04

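I built a piece of code that uses Jsoup to download an entire page as HTML. The download part works as expected. My problem is that when I open the downloaded file, the page is duplicated multiple times in the browser, and I can't tell what is going wrong. See the code below: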

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

import javax.swing.JOptionPane;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class httptest {

    static File file;
    String crawlingNode;
    static BufferedWriter writer = null;
    static httptest ht;

    public httptest() throws IOException{

        file = new File(//***SET HERE YOUR TEST PATH***);

    }

    private void GetLinks() throws IOException{

        Document doc = Jsoup.connect("http://google.com/search?q=mamamia")
                .userAgent("Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.1.6) Gecko/20070723 Iceweasel/2.0.0.6 (Debian-2.0.0.6-0etch1)")
                .cookie("auth", "token")
                .timeout(3000)
                .get();

        Elements links = doc.select("*");
        String crawlingNode = links.html();
        System.out.println(crawlingNode);
        httptest.WriteOnFile(writer, crawlingNode);

    }

    private static void OpenWriter(File file){
        try {
            writer = new BufferedWriter(new FileWriter(file));
        } catch (IOException e) {
            JOptionPane.showMessageDialog(null, "Failed to open URL Writer");
            e.printStackTrace();
        }
    }

    private static void WriteOnFile(BufferedWriter writer, String crawlingNode){
        try {
            writer.write(crawlingNode);
        } catch (IOException e) {
            JOptionPane.showMessageDialog(null, "Failed to write URL Node");
            e.printStackTrace();
        }
    }

    private static void CloseWriter(BufferedWriter writer){
        try {
            writer.close();
        } catch (IOException e) {
            JOptionPane.showMessageDialog(null, "Unable to close URL Writer");
            System.err.println(e);
        }
    }

    public static void main (String[] args) throws IOException{

        ht = new httptest();
        httptest.OpenWriter(file);
        ht.GetLinks();
        httptest.CloseWriter(writer);

    }

}

Some parts of the code may look strange, but keep in mind that this is an SSCCE version of the code. Any ideas that might help? Thanks in advance.

Best answer

Instead of:

Elements links = doc.select("*");
String crawlingNode = links.html();
System.out.println(crawlingNode);
httptest.WriteOnFile(writer, crawlingNode);

use:

Element links = doc.select("*").first();
String crawlingNode = links.html();
System.out.println(crawlingNode);
httptest.WriteOnFile(writer, crawlingNode);

I think the Elements type is more complex and verbose to work with. I found this code change while studying this source: http://jsoup.org/cookbook/extracting-data/attributes-text-html
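
A likely explanation for the duplication: doc.select("*") matches every element in the document, and Elements.html() returns the combined inner HTML of all matched elements, so nested content is emitted once for each element that encloses it. The following is a minimal sketch of that effect, assuming jsoup is on the classpath. The sample markup, the class name DuplicationDemo, and the helper methods are made up for illustration and are not part of the original question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class DuplicationDemo {

    // Hypothetical sample page; "hello-marker" is only there so it can be counted.
    static final String SAMPLE =
            "<html><head><title>t</title></head><body><p>hello-marker</p></body></html>";

    // What the question does: inner HTML of every matched element, joined together.
    static String combinedHtml(Document doc) {
        Elements all = doc.select("*");
        return all.html();
    }

    // What the answer does: inner HTML of the first matched element only.
    static String firstHtml(Document doc) {
        Element first = doc.select("*").first();
        return first.html();
    }

    // Counts non-overlapping occurrences of needle in haystack.
    static int count(String haystack, String needle) {
        int n = 0;
        int i = haystack.indexOf(needle);
        while (i >= 0) {
            n++;
            i = haystack.indexOf(needle, i + needle.length());
        }
        return n;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE);
        // The paragraph text shows up once per enclosing element in the combined output...
        System.out.println("combined: " + count(combinedHtml(doc), "hello-marker"));
        // ...but only once when a single element is serialized.
        System.out.println("first:    " + count(firstHtml(doc), "hello-marker"));
    }
}
```

If the goal is simply to save the whole page, doc.outerHtml() may be a more direct option; that is a suggestion here, not something the original answer proposes.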

Either way, this solution worked for me.
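
Separately from the duplication itself, the open/write/close helpers in the question could probably be condensed with try-with-resources, which closes the writer even when the write fails. This is only a sketch under assumptions: the class name PageSaver, the save method, and the temporary file location are hypothetical, and UTF-8 is an assumed encoding that the question does not specify.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PageSaver {

    // Writes the given HTML to target as UTF-8 (assumed encoding).
    // try-with-resources closes the writer whether or not write() throws.
    static void save(Path target, String html) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(html);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output location; replace with a real path as needed.
        Path target = Files.createTempFile("page", ".html");
        save(target, "<html><body><p>hello</p></body></html>");
        System.out.println("Saved to " + target);
    }
}
```

With this shape, the string passed to save would be whatever HTML was extracted from the Document, so the writer no longer needs to live in a static field.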

Regarding "java - Page duplication error in Jsoup", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/17687492/
