java - Jsoup: How to get all HTML between 2 header tags

Reposted. Author: 塔克拉玛干. Updated: 2023-11-01 21:42:21

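I am trying to get all the HTML between two h1 tags. The actual task is to break the HTML into frames (chapters) based on the h1 (Heading 1) tags.

Any help is appreciated.

Thanks, Sunil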

Best answer

If you want to get and process all the elements between two consecutive h1 tags, you can work with the siblings of the h1. Here is some sample code:

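import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// The class name is arbitrary; it only exists so the methods compile.
public class H1Sections {

    public static void h1s() {
        String html = "<html>" +
            "<head></head>" +
            "<body>" +
            "  <h1>title 1</h1>" +
            "  <p>hello 1</p>" +
            "  <table>" +
            "    <tr>" +
            "      <td>hello</td>" +
            "      <td>world</td>" +
            "      <td>1</td>" +
            "    </tr>" +
            "  </table>" +
            "  <h1>title 2</h1>" +
            "  <p>hello 2</p>" +
            "  <table>" +
            "    <tr>" +
            "      <td>hello</td>" +
            "      <td>world</td>" +
            "      <td>2</td>" +
            "    </tr>" +
            "  </table>" +
            "  <h1>title 3</h1>" +
            "  <p>hello 3</p>" +
            "  <table>" +
            "    <tr>" +
            "      <td>hello</td>" +
            "      <td>world</td>" +
            "      <td>3</td>" +
            "    </tr>" +
            "  </table>" +
            "</body>" +
            "</html>";
        Document doc = Jsoup.parse(html);
        Element firstH1 = doc.select("h1").first();
        if (firstH1 == null) {
            return; // no h1 in the document, nothing to split
        }
        List<Element> elementsBetween = new ArrayList<Element>();
        // Walk forward from the first h1. siblingElements() excludes the h1
        // itself and also returns any siblings that precede it, so indexing
        // into that list can skip or misplace elements.
        for (Element sibling = firstH1.nextElementSibling();
                sibling != null;
                sibling = sibling.nextElementSibling()) {
            if (!"h1".equals(sibling.tagName())) {
                elementsBetween.add(sibling);
            } else {
                // Reached the next h1: hand over the finished section.
                processElementsBetween(elementsBetween);
                elementsBetween.clear();
            }
        }
        // Elements after the last h1 form the final section.
        if (!elementsBetween.isEmpty()) {
            processElementsBetween(elementsBetween);
        }
    }

    private static void processElementsBetween(
            List<Element> elementsBetween) {
        System.out.println("---");
        for (Element element : elementsBetween) {
            System.out.println(element);
        }
    }
}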
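
Since the actual task is to split the page into chapters, it may be more convenient to return the sections instead of printing them. The following is only a minimal sketch of that grouping step, written against plain lists so it runs without jsoup. The class name `SectionSplitter` and the method `splitAtHeadings` are hypothetical names introduced for illustration, and keeping each heading as the first item of its section is an assumption about what a chapter should contain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class SectionSplitter {

    /**
     * Groups a flat sequence into sections. Every item matching isHeading
     * starts a new section and is stored as that section's first item.
     * Items that appear before the first heading are dropped.
     */
    public static <T> List<List<T>> splitAtHeadings(List<T> items, Predicate<T> isHeading) {
        List<List<T>> sections = new ArrayList<>();
        List<T> current = null; // stays null until the first heading is seen
        for (T item : items) {
            if (isHeading.test(item)) {
                current = new ArrayList<>();
                current.add(item);
                sections.add(current);
            } else if (current != null) {
                current.add(item);
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        // Tag names standing in for the sibling elements of the sample page.
        List<String> tags = Arrays.asList(
                "h1", "p", "table", "h1", "p", "table", "h1", "p", "table");
        List<List<String>> sections = splitAtHeadings(tags, "h1"::equals);
        System.out.println(sections.size());  // prints 3
        System.out.println(sections.get(0));  // prints [h1, p, table]
    }
}
```

With jsoup, one way to use such a helper would be to collect the first h1 and its following siblings (via `nextElementSibling()`) into a `List<Element>` and pass `e -> "h1".equals(e.tagName())` as the predicate. Note that walking element siblings skips bare text nodes that sit directly between the headings.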

Regarding java - Jsoup: How to get all HTML between 2 header tags, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/6534456/
