
java - Jsoup: parsing an HTML string

Reposted. Author: 行者123. Updated: 2023-11-30 06:55:22

I have this Element:

<td id="color" align="center">
Z 29.02-23.05 someText,
<br>
some.Text2 <a href="man.php?id=111">J. Smith</a> (l.)&nbsp;
</td>

How can I get the text that follows the <br> tag, so that the result looks like some.Text2 J. Smith? I tried to find an answer in the documentation, but...

Update

If I use

System.out.println(element.select("a").text());

I only get J. Smith. Unfortunately, I don't know how to handle tags like <br> when parsing.

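Best answer

Node.childNodes can save your life. Iterate over the child nodes of the cell and start collecting text only once the <br> element has been seen: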

package com.github.davidepastore.stackoverflow35436825;

import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

/**
 * Stackoverflow 35436825
 *
 */
public class App
{
    public static void main( String[] args )
    {
        String html = "<html><body><table><tr><td id=\"color\" align=\"center\">" +
            "Z 29.02-23.05 someText," +
            "<br>" +
            "some.Text2 <a href=\"man.php?id=111\">J. Smith</a> (l.)&nbsp;" +
            "</td></tr></table></body></html>";
        Document doc = Jsoup.parse( html );
        Element td = doc.getElementById( "color" );
        String text = getText( td );
        System.out.println("Text: " + text);
    }

    /**
     * Get the custom text from the given {@link Element}.
     * @param element The {@link Element} from which to get the custom text.
     * @return Returns the custom text.
     */
    private static String getText(Element element) {
        String working = "";
        List<Node> childNodes = element.childNodes();
        // Becomes true once the <br> child has been passed.
        boolean brFound = false;
        for (int i = 0; i < childNodes.size(); i++) {
            Node child = childNodes.get( i );
            if (child instanceof TextNode) {
                // Plain text directly inside the element.
                if(brFound){
                    working += ((TextNode) child).text();
                }
            }
            if (child instanceof Element) {
                // Nested element such as <a>: append its text content.
                Element childElement = (Element)child;
                if(brFound){
                    working += childElement.text();
                }
                if(childElement.tagName().equals( "br" )){
                    brFound = true;
                }
            }
        }
        return working;
    }
}

The output will be:

Text: some.Text2 J. Smith (l.) 
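
As a variation on the same idea, the loop can start at the <br> itself and follow Node.nextSibling() instead of scanning every child with a flag. This is only a sketch, not part of the original answer: the class name AfterBrSketch and the method textAfterBr are made up for illustration, the sample HTML is a shortened version of the one above without the trailing &nbsp;, and it assumes the first <br> found is a direct child of the element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical helper class, named for illustration only.
class AfterBrSketch {

    /**
     * Concatenates the text of every sibling node that follows the
     * first <br> found inside the given element.
     * Returns an empty string if there is no <br>.
     */
    static String textAfterBr(Element element) {
        Element br = element.select("br").first();
        if (br == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        // Walk the siblings that come after the <br>.
        for (Node n = br.nextSibling(); n != null; n = n.nextSibling()) {
            if (n instanceof TextNode) {
                sb.append(((TextNode) n).text());
            } else if (n instanceof Element) {
                sb.append(((Element) n).text());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Shortened sample input (assumption: no trailing &nbsp;).
        String html = "<table><tr><td id=\"color\">Z 29.02-23.05 someText,<br>"
            + "some.Text2 <a href=\"man.php?id=111\">J. Smith</a> (l.)</td></tr></table>";
        Document doc = Jsoup.parse(html);
        System.out.println("Text: " + textAfterBr(doc.getElementById("color")));
    }
}
```

Starting from the <br> avoids the brFound flag, and the StringBuilder avoids repeated string concatenation; the collected text should otherwise match what the childNodes loop produces for the same input.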

This post about java - Jsoup: parsing an HTML string is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/35436825/
