
java - How to scrape data from only one table when other tables have the same class

Reposted. Author: 行者123. Updated: 2023-12-01 09:57:30

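I have to scrape data from a website that has many tables and save it to a .csv file. I only want the data from one table with the class marketData, but two other tables on the page have the same class. Currently my code collects the data from every table with the marketData class. How can I scrape just the one table and skip the others? My code is below.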

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.net.URL;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ComMarket_summary {

    public static Document doc = null;

    public static void createConnection() throws IOException {
        // Route HTTP requests through the proxy
        System.setProperty("http.proxyHost", "191.1.1.202");
        System.setProperty("http.proxyPort", "8080");
        String tempUrl = "http://www.psx.com.pk/phps/mktSummary.php";
        // Fetch and parse the page; the timeout is in milliseconds
        doc = Jsoup.parse(new URL(tempUrl), 1000);
        System.out.println("Successfully Connected");
    }

    public static void parsingHTML() throws IOException {
        File out = new File("C:\\market_smry.csv");
        // Open the file once (overwriting any old copy) instead of once per row
        try (FileWriter writer = new FileWriter(out)) {
            for (Element table : doc.getElementsByTag("table")) {
                // Every table with class "marketData" passes this check,
                // so rows from all three such tables end up in the CSV
                if (!table.hasClass("marketData")) {
                    continue;
                }
                for (Element trElement : table.getElementsByTag("tr")) {
                    StringBuilder line = new StringBuilder();
                    for (Element tdElement : trElement.getElementsByTag("td")) {
                        if (line.length() > 0) {
                            line.append(" , ");
                        }
                        line.append(formatData(tdElement.text()));
                    }
                    writer.append(line).append("\r\n");
                    System.out.println(line);
                }
            }
        }
    }

    private static final SimpleDateFormat FORMATTER_MMM_d_yyyy = new SimpleDateFormat("MMM d, yyyy", Locale.US);
    // "yyyy" is the calendar year; "YYYY" would be the week-based year
    private static final SimpleDateFormat FORMATTER_dd_MMM_yyyy = new SimpleDateFormat("dd-MMM-yyyy", Locale.US);

    // Converts dates such as "May 6, 2016" to "06-May-2016"; other text is returned unchanged
    public static String formatData(String text) {
        try {
            Date d = FORMATTER_MMM_d_yyyy.parse(text);
            return FORMATTER_dd_MMM_yyyy.format(d);
        } catch (ParseException pe) {
            return text;
        }
    }

    public static void main(String[] args) throws IOException {
        createConnection();
        parsingHTML();
    }
}

P.S.: I'm using JDK 1.8, JRE 1.8 and jsoup 1.8.

Best answer

You can simplify your code by using a more specific selector instead of checking each table's class by hand:

for (Element table : doc.select("table.marketData")) {
//Process table
}
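A minimal, self-contained sketch of the selector approach follows. The markup in HTML and the class name MarketDataSelectDemo are hypothetical stand-ins for the real PSX page. Note that select("table.marketData") on its own still visits every table carrying that class; it replaces the manual hasClass check but does not by itself narrow the result to a single table.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MarketDataSelectDemo {

    // Hypothetical markup: three tables, two of which share the class "marketData"
    static final String HTML =
            "<table class='marketData'><tr><td>A1</td></tr></table>"
          + "<table class='summary'><tr><td>B1</td></tr></table>"
          + "<table class='marketData'><tr><td>C1</td></tr></table>";

    // Collects the first cell of every <table class="marketData">, in document order
    static List<String> firstCellTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element table : doc.select("table.marketData")) {
            Element firstCell = table.select("td").first();
            result.add(firstCell == null ? "" : firstCell.text());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [A1, C1]: the "summary" table is never visited
        System.out.println(firstCellTexts(HTML));
    }
}
```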

If you only want to process one specific table on the page, you can access it by its index:

Elements tables = doc.select("table.marketData");
// Indices are zero-based: get(0) is the first matching table, get(1) the second
Element table = tables.get(1);
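Combining the index approach with the CSV output from the question gives something like the sketch below, which writes only one marketData table. The sample HTML, the class name SingleTableCsv and the index used are assumptions; check which position the wanted table actually has on the live page. The bounds check avoids an IndexOutOfBoundsException when fewer tables match than expected, and the escape helper quotes cells containing commas (such as "1,234"), which a plain comma separator would otherwise split into two columns.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SingleTableCsv {

    // Hypothetical markup; the real page's table order is an assumption
    static final String HTML =
            "<table class='marketData'><tr><td>Index</td><td>Value</td></tr>"
          + "<tr><td>KSE100</td><td>1,234</td></tr></table>"
          + "<table class='marketData'><tr><td>Symbol</td><td>Volume</td></tr>"
          + "<tr><td>ABC</td><td>500</td></tr></table>";

    // Quotes a cell if it contains a comma, quote or line break (RFC 4180 style)
    static String escape(String cell) {
        if (cell.contains(",") || cell.contains("\"") || cell.contains("\n")) {
            return "\"" + cell.replace("\"", "\"\"") + "\"";
        }
        return cell;
    }

    // Converts only the marketData table at the given zero-based index to CSV
    static String tableToCsv(String html, int index) {
        Document doc = Jsoup.parse(html);
        Elements tables = doc.select("table.marketData");
        if (index < 0 || index >= tables.size()) {
            throw new IllegalArgumentException(
                    "Only " + tables.size() + " marketData tables found");
        }
        Element table = tables.get(index);
        StringBuilder csv = new StringBuilder();
        for (Element row : table.select("tr")) {
            StringBuilder line = new StringBuilder();
            for (Element cell : row.select("td")) {
                if (line.length() > 0) {
                    line.append(',');
                }
                line.append(escape(cell.text()));
            }
            csv.append(line).append("\r\n");
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        // Writes only the second marketData table to the console
        System.out.print(tableToCsv(HTML, 1));
    }
}
```

Against the live page, the same loop can run on the Document returned by createConnection(), with the resulting string written once through a FileWriter.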

Regarding "java - How to scrape data from only one table when other tables have the same class", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37071382/
