
java - How to parse a page containing multiple tables

Reposted. Author: 行者123. Updated: 2023-12-01 15:34:06

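Any ideas on how to scrape a web page that contains multiple tables? I am already connecting to the page.

This is one of the tables, but there are several tables on the same page.

I also don't know how to read the table...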

XML:

    <p><a href="/fantasy_news/feature/?ID=49818"><strong>Top 300 Overall Fantasy Rankings</strong></a></p> 
<div class="storyStats">
<table>
<thead>
<tr>
<th>RANK</th>
<th>CENTRES</th>
<th>TEAM</th>
<th>POS</th>
<th>GP</th>
<th>G</th>
<th>A</th>
<th>PTS</th>
<th>+/-</th>
<th>PIM</th>
<th>PPP</th>
</tr>
</thead>
<tbody>
<tr class="bg1">
<td>1.</td>
<td><a href="/nhl/teams/players/?name=steven+stamkos">Steven&nbsp;Stamkos</a></td>

<td>Tampa Bay</td>
<td>C</td>
<td align="right">81</td>
<td align="right">50</td>
<td align="right">51</td>
<td align="right">101</td>
<td align="right">-2</td>
<td align="right">56</td>
<td align="right">38</td>
</tr>


Iterator<Element> trSIter = doc.select("table").iterator();
while (trSIter.hasNext()) {
    Element trEl = trSIter.next().child(0);
    Elements tdEls = trEl.children();
    Iterator<Element> tdIter = tdEls.select("tr").iterator();
    System.out.println("><1><><" + tdIter);
    boolean firstRow = true;
    while (tdIter.hasNext()) {

        Element tr = (Element) tdIter.next();

        while (tdIter.hasNext()) {
            int tdCount = 1;
            Element tdEl = tdIter.next();
            //name = tdEl.getElementsByClass("playertablePlayerName").get(0).text();

            Elements tdsEls = tdEl.select("td");
            System.out.println("><2><><" + tdsEls);
            Iterator<Element> columnIt = tdsEls.iterator();

            while (columnIt.hasNext()) {

                Element column = columnIt.next();
                switch (tdCount++) {
                    case 1:
                        name = column.select("a").first().text();
                        break;
                    case 2:
                        stat2 = Double.parseDouble(column.text());
                        break;
                    case 3:
                        stat3 = Double.parseDouble(column.text());
                        break;
                    case 4:
                        stat4 = Double.parseDouble(column.text());
                        break;
                    case 5:
                        stat5 = Double.parseDouble(column.text());
                        break;
                    case 6:
                        stat6 = Double.parseDouble(column.text());
                        break;
                    case 7:
                        stat7 = Double.parseDouble(column.text());
                        break;
                    case 8:
                        stat8 = Double.parseDouble(column.text());
                        break;

Best answer

With the code below, parsing the tables out of the HTML does not seem to be a problem.

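import java.io.IOException;

import android.app.Activity;
import android.os.Bundle;
import android.util.Log;
import android.widget.TextView;

import org.jsoup.Connection;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class JsoupActivity extends Activity {
    Document doc;
    myHttpGet _myGet;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        final TextView tv = (TextView) findViewById(R.id.tv1);
        _myGet = new myHttpGet();
        // Recent Android versions reject network calls on the main thread,
        // so fetch on a worker thread and post the result back to the UI.
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    doc = _myGet.doHttpGet();
                    Elements tdsEls = doc.getElementsByClass("storyStats");
                    //tv.setText(tdsEls.get(0).child(0).text());
                    final String text = String.valueOf(tdsEls.first().children().size());
                    runOnUiThread(new Runnable() {
                        @Override
                        public void run() {
                            tv.setText(text);
                        }
                    });
                } catch (Exception e) {
                    // Also reached when the fetch failed and doc is null.
                    e.printStackTrace();
                }
            }
        }).start();
    }

    private class myHttpGet {
        Document myDom;
        Connection myConnection;
        Response myResponse;

        // Returns the parsed page, or null if the request or the parse failed.
        public Document doHttpGet() {
            myConnection = Jsoup.connect("http://www.tsn.ca/fantasy_news/feature/?ID=49815");
            try {
                myResponse = myConnection.execute();
                try {
                    myDom = myResponse.parse();
                    return myDom;
                } catch (IOException e) {
                    Log.e("napster", "Parse Error");
                }
            } catch (IOException e) {
                Log.e("napster", "HTTP Error");
            }
            return myDom;
        }
    }

}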

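The code displays 5 in the TextView, which is the number of tables under the storyStats class in that HTML. If you need to go on and parse the contents of the tables, you can assign them to another Elements object and continue parsing from there.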

Elements es = tdsEls.first().children();
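
The "continue parsing" step could look roughly like the following. This is a minimal sketch, not the answerer's code: the class name `StoryStatsTables`, the method `readTables`, and the inline sample HTML are made up for illustration. It assumes each table sits inside a `div` with class `storyStats` and keeps its data rows in `tbody`, as in the markup from the question.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class StoryStatsTables {

    /** Returns one entry per table; each table is a list of rows, each row a list of cell texts. */
    public static List<List<List<String>>> readTables(Document doc) {
        List<List<List<String>>> tables = new ArrayList<>();
        // Every table nested inside an element <div class="storyStats">.
        for (Element table : doc.select("div.storyStats table")) {
            List<List<String>> rows = new ArrayList<>();
            // Body rows only; the header row lives in <thead> and is skipped.
            for (Element row : table.select("tbody tr")) {
                List<String> cells = new ArrayList<>();
                for (Element cell : row.select("td")) {
                    cells.add(cell.text());
                }
                rows.add(cells);
            }
            tables.add(rows);
        }
        return tables;
    }

    public static void main(String[] args) {
        // Hypothetical inline sample standing in for the downloaded page.
        String html =
              "<div class=\"storyStats\"><table>"
            + "<thead><tr><th>RANK</th><th>CENTRES</th><th>TEAM</th></tr></thead>"
            + "<tbody><tr class=\"bg1\"><td>1.</td>"
            + "<td><a href=\"#\">Steven Stamkos</a></td><td>Tampa Bay</td></tr></tbody>"
            + "</table></div>"
            + "<div class=\"storyStats\"><table>"
            + "<thead><tr><th>RANK</th><th>WINGERS</th><th>TEAM</th></tr></thead>"
            + "<tbody><tr class=\"bg1\"><td>1.</td>"
            + "<td><a href=\"#\">Sample Player</a></td><td>Sample Team</td></tr></tbody>"
            + "</table></div>";

        Document doc = Jsoup.parse(html);
        int index = 1;
        for (List<List<String>> table : readTables(doc)) {
            System.out.println("Table " + index++);
            for (List<String> row : table) {
                System.out.println("  " + row);
            }
        }
    }
}
```

Because the selector `div.storyStats table` matches every such table on the page, the same loop handles all of them rather than only the first one.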

Anderson's answer shows how to parse the data in it. Hope this helps.
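
Once the cell texts of one row are available as strings, they still have to be turned into typed values. The sketch below is one possible way to do that, using only the standard library. The class `PlayerRow` and its methods are hypothetical names, and it assumes the 11-column layout from the question's header (RANK, CENTRES, TEAM, POS, GP, G, A, PTS, +/-, PIM, PPP). It also assumes that a non-breaking space from `&nbsp;` may survive in the name, which can depend on the jsoup version in use.

```java
import java.util.Arrays;
import java.util.List;

public class PlayerRow {
    public final int rank;
    public final String name;
    public final String team;
    public final String position;
    /** GP, G, A, PTS, +/-, PIM, PPP, in header order. */
    public final int[] stats;

    public PlayerRow(int rank, String name, String team, String position, int[] stats) {
        this.rank = rank;
        this.name = name;
        this.team = team;
        this.position = position;
        this.stats = stats;
    }

    /** Builds a PlayerRow from the 11 cell texts of one body row. */
    public static PlayerRow fromCells(List<String> cells) {
        if (cells.size() != 11) {
            throw new IllegalArgumentException("expected 11 cells, got " + cells.size());
        }
        // The rank cell reads "1.", so drop the dot before parsing it as an int.
        int rank = Integer.parseInt(clean(cells.get(0)).replace(".", ""));
        String name = clean(cells.get(1));
        String team = clean(cells.get(2));
        String position = clean(cells.get(3));
        // The remaining seven cells are whole numbers, possibly negative (+/-).
        int[] stats = new int[7];
        for (int i = 0; i < stats.length; i++) {
            stats[i] = Integer.parseInt(clean(cells.get(4 + i)));
        }
        return new PlayerRow(rank, name, team, position, stats);
    }

    /** Replaces non-breaking spaces with plain spaces and trims the result. */
    public static String clean(String text) {
        return text.replace('\u00a0', ' ').trim();
    }

    public static void main(String[] args) {
        PlayerRow row = fromCells(Arrays.asList(
                "1.", "Steven\u00a0Stamkos", "Tampa Bay", "C",
                "81", "50", "51", "101", "-2", "56", "38"));
        // prints "Steven Stamkos (Tampa Bay): 101 PTS"
        System.out.println(row.name + " (" + row.team + "): " + row.stats[3] + " PTS");
    }
}
```

Mapping cells by position keeps the text columns (name, team, position) away from the number parser, which would throw a `NumberFormatException` on values such as "Tampa Bay".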

Regarding java - how to parse a page containing multiple tables, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/9190793/
