
java - How do I parse an HTML file with Jsoup when the elements have several different class names?

Reposted. Author: 行者123. Updated: 2023-11-30 04:00:52

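The Java code below works for an HTML file when I look for a single class, for example css-sched-table-title.

However, I need to find several class names in the HTML file, such as css-sched-waypoints and css-sched-times. How can I combine them into one search with jsoup's getElementsByClass method? I don't want to run the same code once per class, because I want to preserve the order of the elements. In other words, I would like something along these lines (pseudocode, not valid Java):

doc.getElementsByClass("css-sched-table-title") || doc.getElementsByClass("css-sched-waypoints");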

    // content holds the HTML source as a String
    Document doc = Jsoup.parse(content);

    // Works, but only for one class name at a time
    Elements ele = doc.getElementsByClass("css-sched-table-title");
    for (Element link : ele) {
        String linkText = link.text();
        System.out.println(linkText);
    }

<tr ALIGN="CENTER">
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">6:15</td>
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">6:20</td>
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">6:24</td>
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">6:34</td>
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">6:34</td>
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">6:40</td>
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">6:46</td>
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">6:54</td>
</tr>
<tr VALIGN="BOTTOM">
<TD>&nbsp;</TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Townline and Southern</TD>
<TD>&nbsp;</TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Clearbrook and Blueridge</TD>
<TD>&nbsp;</TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Clearbrook and South Fraser</TD>
<TD>&nbsp;</TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Ar. Bourquin Exchange</TD>
<TD>&nbsp;</TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Lv. Bourquin Exchange</TD>
<TD>&nbsp;</TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Downtown Abbotsford</TD>
<TD>&nbsp;</TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">McMillan and Old Yale</TD>
<TD>&nbsp;</TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Sandy Hill and Old Clayburn</TD>
</tr>

<tr ALIGN="CENTER">
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">8:12</td>
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">8:17</td>
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">8:21</td>
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">8:31</td>
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">8:34</td>
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">8:40</td>
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">8:46</td>
<td CLASS="css-sched-times">&nbsp;</td>
<td CLASS="css-sched-times">8:54</td>
</tr>

Best answer

Taking a cue from your earlier question, I got the result you expect by combining the three td selectors in a single valid Selector expression:

doc.select("td[class=css-sched-table-title], td[class=css-sched-waypoints], td[class=css-sched-times]")

Note that you can combine several conditions in the selector syntax by separating them with commas, as in the statement below. The comma effectively acts as your OR operator.

    Elements row = doc.select("td[class=css-sched-table-title], td[class=css-sched-waypoints], td[class=css-sched-times]");
    System.out.println("::Total Count::" + row.size());

    Iterator<Element> iterator = row.listIterator();
    while (iterator.hasNext()) {
        Element element = iterator.next();
        String id = element.attr("id");
        String classes = element.attr("class");
        String value = element.text();
        System.out.println("Id : " + id + ", classes : " + classes
                + ", value : " + value);
    }

This gives:

::Total Count::25
Id : , classes : css-sched-table-title, value : Saturday - Afternoon
Id : , classes : css-sched-waypoints, value : Townline and Southern
Id : , classes : css-sched-waypoints, value : Clearbrook and Blueridge
Id : , classes : css-sched-waypoints, value : Clearbrook and South Fraser
Id : , classes : css-sched-waypoints, value : Ar. Bourquin Exchange
Id : , classes : css-sched-waypoints, value : Lv. Bourquin Exchange
Id : , classes : css-sched-waypoints, value : Downtown Abbotsford
Id : , classes : css-sched-waypoints, value : McMillan and Old Yale
Id : , classes : css-sched-waypoints, value : Sandy Hill and Old Clayburn
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:15
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:20
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:24
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:34
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:34
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:40
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:46
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:54
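
The attribute selectors above compare the whole class attribute, so a cell declared with an additional class (for example class="css-sched-waypoints stop") would not match them. Below is a minimal sketch of the same OR query written with jsoup's class selectors (td.name) instead. The class name GroupedClassSelect, the helper collect, the extra class stop, and the inline HTML are made up for illustration, and skipping the &nbsp; filler cells is my own addition rather than part of the original answer. As far as I know, select returns matches in document order, which is what the question asks for, but it is worth confirming against the jsoup version you use.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class GroupedClassSelect {

    // Hypothetical helper: returns the text of every td that carries at
    // least one of the three schedule classes, skipping filler cells.
    static List<String> collect(String html) {
        Document doc = Jsoup.parse(html);

        // A comma-separated selector group behaves like OR.
        // td.name matches any td whose class list contains name.
        Elements cells = doc.select(
                "td.css-sched-table-title, td.css-sched-waypoints, td.css-sched-times");

        List<String> texts = new ArrayList<>();
        for (Element cell : cells) {
            // &nbsp; may come back as U+00A0, which trim() does not remove,
            // so it is mapped to a plain space first.
            String text = cell.text().replace('\u00a0', ' ').trim();
            if (!text.isEmpty()) {
                texts.add(text);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Illustrative HTML only; the table wrapper is needed so the parser
        // keeps the tr and td elements.
        String html = "<table>"
                + "<tr><td class=\"css-sched-times\">6:15</td>"
                + "<td class=\"css-sched-times\">&nbsp;</td></tr>"
                + "<tr><td class=\"css-sched-waypoints stop\">Townline and Southern</td>"
                + "<td>ignored</td></tr>"
                + "</table>";

        for (String text : collect(html)) {
            System.out.println(text);
        }
    }
}
```

Because all three classes go through a single select call, there is one result list to iterate over instead of three separate ones to merge.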

For details on the Selector syntax, see the jsoup Selector documentation.
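
If the set of class names is not fixed in advance, the comma-separated selector can be assembled from a list rather than typed out by hand. The sketch below uses only the Java standard library. SelectorBuilder and anyOfClasses are hypothetical names, not part of jsoup, and the class names are assumed to be plain identifiers that need no escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SelectorBuilder {

    // Hypothetical helper (not a jsoup API): builds "tag.a, tag.b, ..."
    // so that a single select call can match any of the given classes.
    static String anyOfClasses(String tag, List<String> classNames) {
        return classNames.stream()
                .map(name -> tag + "." + name)
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        String selector = anyOfClasses("td", Arrays.asList(
                "css-sched-table-title",
                "css-sched-waypoints",
                "css-sched-times"));

        // The resulting string can be passed to doc.select(selector).
        System.out.println(selector);
    }
}
```

This keeps the list of classes in one place, which is convenient when the same group is queried in several parts of a program.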

Regarding "java - How do I parse an HTML file with Jsoup when the elements have several different class names?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22011133/
