java - Extracting HTML with jsoup - two span tags next to each other

Reposted. Author: 行者123. Updated: 2023-12-01 08:54:32

My HTML looks like this:

<div class="cloud recommended">
<div id="bigcloud" class="eventwithfoto">
<h1>Artist Name</h1>
<div id="eventphoto">
<a href="http://linkToPhoto.jpg" target="_top" rel="lightbox"><img src="http://linkToPhoto.jpg" height="150"></a>
</div>

<div id="eventmain" style="margin-top: 12px;">
<p id="eventwhere"><span><b>Name of place<br></b></span><span>Address of place</span>
<br> tel.: +48 111 222 111 <br><a href="http://www.linktoplace.com" target="_blank">http://www.linktoplace.com</a> </p>
<p id="eventdate">2017-04-20 godz. 20:00</p>


<div id="eventadmission">
120 zł
</div>

</div>

<div class="clear"></div>
<div id="eventdesc">
Here is some descr<br/>Some other descr
<div class="clear"></div>

<br>
<a href="http://link.com" target="_blank">link to event</a>
</div>
</div>
</div>

Now I want to parse it with jsoup to get the necessary information.

I wrote the following code:

System.out.println(address);
Document doc = Jsoup.connect(address).timeout(10*1000).get();


String placePhoneNo = doc.select("p#eventwhere > br").text();
String placeAddress = doc.getElementsByTag("#eventwhere").text();

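But each string is empty. What am I doing wrong here? How can I parse this particular HTML to get these variables?

Best answer

I wrote a small example to demonstrate it: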

package sandbox.jsoup;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupMain {
    // The HTML from the question, inlined so the example runs without a network call.
    private static final String HTML = "<div class=\"cloud recommended\">\n" +
            " <div id=\"bigcloud\" class=\"eventwithfoto\">\n" +
            " <h1>Artist Name</h1>\n" +
            " <div id=\"eventphoto\">\n" +
            " <a href=\"http://linkToPhoto.jpg\" target=\"_top\" rel=\"lightbox\"><img src=\"http://linkToPhoto.jpg\" height=\"150\"></a>\n" +
            " </div>\n" +
            "\n" +
            " <div id=\"eventmain\" style=\"margin-top: 12px;\">\n" +
            " <p id=\"eventwhere\"><span><b>Name of place<br></b></span><span>Address of place</span>\n" +
            " <br> tel.: +48 111 222 111 <br><a href=\"http://www.linktoplace.com\" target=\"_blank\">http://www.linktoplace.com</a> </p>\n" +
            " <p id=\"eventdate\">2017-04-20 godz. 20:00</p>\n" +
            "\n" +
            "\n" +
            " <div id=\"eventadmission\">\n" +
            " 120 zł\n" +
            " </div>\n" +
            "\n" +
            " </div>\n" +
            "\n" +
            " <div class=\"clear\"></div>\n" +
            " <div id=\"eventdesc\">\n" +
            " Here is some descr<br/>Some other descr \n" +
            " <div class=\"clear\"></div>\n" +
            "\n" +
            " <br>\n" +
            " <a href=\"http://link.com\" target=\"_blank\">link to event</a>\n" +
            " </div>\n" +
            " </div>\n" +
            "</div>";

    public static void main(String[] args) {
        new JsoupMain().findTwoSpans();
    }

    private void findTwoSpans() {
        Document doc = Jsoup.parse(HTML);
        Element eventWhere = doc.getElementById("eventwhere");

        // The two adjacent <span> elements hold the place name and the address.
        Elements spans = eventWhere.select("span");
        System.out.println("span[0]=" + spans.get(0).text());
        System.out.println("span[1]=" + spans.get(1).text());

        // Get phone: the number is loose text inside <p id="eventwhere">, so search
        // the paragraph's text. Element.after(...) inserts a node after the element
        // (it does not navigate), so it is not used here.
        String textWhere = eventWhere.text();

        int beginPos = textWhere.indexOf("tel.: ");
        int endPos = textWhere.indexOf(" http://");
        // indexOf returns -1 when nothing is found; 0 is a valid position.
        if (beginPos >= 0 && endPos >= beginPos + 6) {
            String phone = textWhere.substring(beginPos + 6, endPos);
            System.out.println("Found phone: " + phone);
        } else {
            System.out.println("Phone not found: " + textWhere);
        }
    }
}

Output

span[0]=Name of place
span[1]=Address of place
Found phone: +48 111 222 111
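
The indexOf lookup above only works while the number is followed by " http://". As a hedged alternative, the sketch below pulls the number out with a regular expression instead. The class name PhoneExtractor, the method extractPhone, and the pattern itself are assumptions made for illustration (they are not part of the original answer), and the pattern only covers numbers written as digit groups separated by single spaces, like the one in the sample HTML.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneExtractor {
    // Hypothetical pattern: the literal "tel.:" label, optional whitespace,
    // then an optional '+' and digit groups separated by single spaces.
    private static final Pattern PHONE =
            Pattern.compile("tel\\.:\\s*(\\+?\\d+(?: \\d+)*)");

    /** Returns the number that follows "tel.:", or an empty Optional if none is found. */
    public static Optional<String> extractPhone(String text) {
        Matcher m = PHONE.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample input shaped like the text of <p id="eventwhere">.
        String text = "Name of place Address of place tel.: +48 111 222 111 "
                + "http://www.linktoplace.com";
        System.out.println(extractPhone(text).orElse("Phone not found"));
    }
}
```

Returning an Optional keeps the "not found" case explicit, and the match no longer depends on what follows the number.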

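As for why both strings in the question come back empty: "p#eventwhere > br" selects the <br> elements themselves, which contain no text, and getElementsByTag("#eventwhere") searches for a tag literally named "#eventwhere", whereas "#" is id syntax that only select() understands. A minimal sketch of selectors that should reach the data follows. The class name EventWhereSelectors and its helper methods are hypothetical, and the HTML is a trimmed copy of the fragment from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class EventWhereSelectors {
    // Trimmed copy of the <p id="eventwhere"> fragment from the question.
    public static final String HTML =
            "<p id=\"eventwhere\"><span><b>Name of place<br></b></span><span>Address of place</span>\n"
            + "<br> tel.: +48 111 222 111 <br>"
            + "<a href=\"http://www.linktoplace.com\" target=\"_blank\">http://www.linktoplace.com</a> </p>";

    /** Text of the n-th direct <span> child of p#eventwhere (0-based). */
    public static String spanText(Document doc, int n) {
        Elements spans = doc.select("p#eventwhere > span");
        return spans.get(n).text();
    }

    /** Phone number taken from the paragraph's own text, with the "tel.:" label removed. */
    public static String phone(Document doc) {
        Element where = doc.getElementById("eventwhere");
        // ownText() skips the text of child elements (the spans and the link).
        return where.ownText().replaceFirst("^tel\\.:\\s*", "");
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        // No element has the tag name "#eventwhere", so this collection is empty.
        System.out.println("byTag empty: " + doc.getElementsByTag("#eventwhere").isEmpty());
        System.out.println("name=" + spanText(doc, 0));
        System.out.println("address=" + spanText(doc, 1));
        System.out.println("phone=" + phone(doc));
    }
}
```

Because ownText() returns only the text that sits directly inside the paragraph, which is where the phone number lives, no search for a trailing "http://" is needed.
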
Regarding "java - Extracting HTML with jsoup - two span tags next to each other", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42137881/
