
java - Accessing an HTML table with HtmlUnit

Reposted. Author: 行者123. Updated: 2023-11-30 06:30:51

I want to access a table contained in an HTML page. Here is my code:

    import java.io.*;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.*;

    public class test {

        public static void main(String[] args) throws Exception {

            WebClient client = new WebClient();
            HtmlPage currentPage = client.getPage("http://www.mysite.com");
            client.waitForBackgroundJavaScript(10000);

            // This cast is what throws the ClassCastException below
            final HtmlDivision div = (HtmlDivision) currentPage.getByXPath("//div[@id='table-matches-time']");

            String textSource = div.toString();
            //String textSource = currentPage.asXml();

            FileWriter fstream = new FileWriter("index.txt");
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(textSource);

            out.close();

            client.closeAllWindows();
        }
    }

The table looks like this:

    <div id="table-matches-time" class="">
    <table class=" table-main">
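Note that the class attribute value is " table-main" with a leading space, so an exact-match XPath test such as @class=' table-main' only works as long as that space is there. The sketch below is a minimal check using the JDK's own javax.xml.xpath API on a hypothetical, well-formed fragment modelled on the markup above (ClassMatchDemo and count are illustrative names, not part of HtmlUnit). It compares the exact match with a whitespace-tolerant token match built from the standard XPath 1.0 functions concat, normalize-space and contains, which HtmlUnit's getByXPath should also accept.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ClassMatchDemo {

    // Hypothetical well-formed fragment modelled on the page's markup.
    static final String HTML =
        "<div id=\"table-matches-time\" class=\"\">"
        + "<table class=\" table-main\"><tr><td>a</td></tr></table>"
        + "</div>";

    // Returns how many nodes the XPath expression selects in HTML.
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(HTML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Exact match: depends on the leading space in the attribute value.
        System.out.println(count("//table[@class=' table-main']"));
        System.out.println(count("//table[@class='table-main']"));
        // Token match: finds "table-main" regardless of surrounding whitespace.
        System.out.println(count(
            "//table[contains(concat(' ', normalize-space(@class), ' '), ' table-main ')]"));
    }
}
```

The first and third expressions select the table; the second, which drops the leading space, selects nothing.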

But I get this error:

    Exception in thread "main" java.lang.ClassCastException: java.util.ArrayList cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlDivision
    at test.main(test.java:20)
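The exception comes from the cast itself: getByXPath returns a java.util.List of every node that matches the expression, so the list can never be an HtmlDivision. The usual fix is to take an element out of the list, and check its type, before casting. The following is a minimal, library-independent sketch of that pattern; the class XPathResults and the helper firstOfType are hypothetical names, and plain Java objects stand in for HtmlUnit nodes so it runs without HtmlUnit on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Library-independent sketch; plain objects stand in for HtmlUnit nodes.
class XPathResults {

    // Hypothetical helper: returns the first element of nodes that is an
    // instance of type, or Optional.empty() if none is.
    static <T> Optional<T> firstOfType(List<?> nodes, Class<T> type) {
        for (Object node : nodes) {
            if (type.isInstance(node)) {
                return Optional.of(type.cast(node));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for what getByXPath returns: a List of matches, not one node.
        Object results = Arrays.asList(42, "first match", "second match");

        // Casting the list itself compiles but fails at runtime, as in the question.
        try {
            Integer wrong = (Integer) results;
            System.out.println("unexpected: " + wrong);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: a List is not a single node");
        }

        // Taking an element out of the list and checking its type works.
        String first = firstOfType((List<?>) results, String.class).orElse("no match");
        System.out.println(first); // prints "first match"
    }
}
```

With HtmlUnit, the same helper could be called as firstOfType(currentPage.getByXPath("//div[@id='table-matches-time']"), HtmlDivision.class). HtmlUnit 2.x nodes also provide getFirstByXPath, which returns the first match directly, or null when nothing matches.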

How can I read this table?

Best answer

This works (and produces a CSV file ;)):

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.util.List;

    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import com.gargoylesoftware.htmlunit.html.HtmlTable;
    import com.gargoylesoftware.htmlunit.html.HtmlTableCell;
    import com.gargoylesoftware.htmlunit.html.HtmlTableRow;

    public class test {

        public static void main(String[] args) throws Exception {

            WebClient client = new WebClient();
            HtmlPage currentPage = client.getPage("http://www.mysite.com");
            // Wait up to 10 s for the page's background JavaScript to finish
            client.waitForBackgroundJavaScript(10000);

            // getByXPath returns a List of every matching node, not a single element
            List<?> tables = currentPage.getByXPath("//table[@class=' table-main']");

            BufferedWriter out = new BufferedWriter(new FileWriter("index.txt"));
            try {
                // Iterate over the matches instead of assuming a fixed number of tables
                for (Object node : tables) {
                    HtmlTable table = (HtmlTable) node;
                    // One output line per table row, cell texts separated by commas
                    for (HtmlTableRow row : table.getRows()) {
                        for (HtmlTableCell cell : row.getCells()) {
                            out.write(cell.asText() + ',');
                        }
                        out.write('\n');
                    }
                }
            } finally {
                out.close();
                client.closeAllWindows();
            }
        }
    }
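The output above is only loosely CSV: every row ends with a trailing comma, and a cell whose text itself contains a comma or a double quote would be split into extra fields. The sketch below shows RFC 4180 style quoting as a small standalone helper; the class CsvLine and its methods are hypothetical names, and the cell values in main are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CsvLine {

    // Quotes a field when it contains a comma, double quote, or line break,
    // doubling any embedded double quotes (RFC 4180 style).
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
            || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the escaped cells with commas, with no trailing comma.
    static String toCsvLine(List<String> cells) {
        return cells.stream().map(CsvLine::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Hypothetical cell texts for one table row.
        List<String> cells = Arrays.asList("12:30", "Home, FC", "2 \"a\"");
        System.out.println(toCsvLine(cells)); // prints 12:30,"Home, FC","2 ""a"""
    }
}
```

To use it in the answer's loop, one could collect cell.asText() for each row into a List<String>, then write toCsvLine(thatList) followed by '\n' instead of writing each cell with a trailing comma.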

A similar question about accessing an HTML table with HtmlUnit in Java can be found on Stack Overflow: https://stackoverflow.com/questions/10110452/
