
java - How to split a single HTML file into multiple HTML files using Java

Reposted. Author: 行者123. Updated: 2023-12-01 10:06:22

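I have a problem: I want to use Java to split a single HTML file into multiple HTML files. The file contains several chapters of a textbook, and I want each chapter in its own HTML file. The start of each chapter can be identified by an h2 tag with an id. A sample HTML file that I want to split is included below.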

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.1//EN' 'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd'>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta name="generator" content="HTML Tidy for Linux (vers 7 December 2008), see www.w3.org"/>
<title>Sample HTML</title>




<link rel="stylesheet" href="0.css" type="text/css"/>
<link rel="stylesheet" href="1.css" type="text/css"/>
<link rel="stylesheet" href="sample.css" type="text/css"/>
<meta name="generator" content="sample content"/>
</head>
<body><div class="c2"><br/>
<br/>
<br/>
<br/></div>
<h2 id="pg00007">Chapter 7</h2>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p><a id="link2HCH0008"><!-- H2 anchor --></a></p>
<div class="c2"><br/>
<br/>
<br/>
<br/></div>
<h2 id="pg00008">Chapter 8</h2>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p><a id="link2HCH0009"><!-- H2 anchor --></a></p>
<div class="c2"><br/>
<br/>
<br/>
<br/></div>
<h2 id="pg00009">Chapter 9</h2>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p><a id="link2HCH0010"><!-- H2 anchor --></a></p>
<div class="c2"><br/>
<br/>
<br/>
<br/></div>
<h2 id="pg00010">Chapter 10</h2>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p>sample paragraph 1</p>
<p><a id="link2HCH0011"><!-- H2 anchor --></a></p>
</body></html>

Best answer

I'm not entirely sure this will work, but I think you could take a parser such as http://jsoup.org/ and use it like this:

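// Needs java.io.File plus org.jsoup.Jsoup, org.jsoup.nodes.Document and org.jsoup.select.Elements
File input = new File("/tmp/input.html");
// Parse the file as UTF-8; the base URI is only used to resolve relative links
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

// Each chapter begins at an h2 element
Elements chapters = doc.select("h2");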

You then have to extract each element's content and save it as a new HTML file (including the body and so on).
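
The answer does not spell out the extraction step. As a rough sketch only, and without jsoup, the following cuts the raw text at each `<h2 id="...">` opening tag and wraps every piece in the original head and closing tags. It assumes input as regular as the sample (one `<body>`, every chapter heading written as `<h2 id="...">`); the class name `ChapterSplitter` and the method `split` are hypothetical names chosen for illustration. Plain string cutting is fragile on messier HTML, where a real parser would be the safer choice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer and not based on jsoup.
final class ChapterSplitter {
    // Opening tag of a chapter heading; group 1 captures the id, e.g. pg00007.
    private static final Pattern CHAPTER_START =
            Pattern.compile("<h2\\s+id=\"([^\"]+)\"");

    /** Returns chapter id -> complete HTML document, in document order. */
    static Map<String, String> split(String html) {
        int bodyOpen = html.indexOf("<body");
        int bodyEnd = html.lastIndexOf("</body>");
        if (bodyOpen < 0 || bodyEnd < 0) {
            throw new IllegalArgumentException("no <body> element found");
        }
        int bodyStart = html.indexOf('>', bodyOpen) + 1;

        String prefix = html.substring(0, bodyStart); // declaration, doctype, head, <body>
        String suffix = html.substring(bodyEnd);      // </body></html>
        String body = html.substring(bodyStart, bodyEnd);

        // Record where each chapter heading starts inside the body.
        List<Integer> starts = new ArrayList<>();
        List<String> ids = new ArrayList<>();
        Matcher m = CHAPTER_START.matcher(body);
        while (m.find()) {
            starts.add(m.start());
            ids.add(m.group(1));
        }

        // A chapter runs from its heading up to the next heading (or the end of the body).
        Map<String, String> chapters = new LinkedHashMap<>();
        for (int i = 0; i < starts.size(); i++) {
            int end = (i + 1 < starts.size()) ? starts.get(i + 1) : body.length();
            String content = body.substring(starts.get(i), end);
            chapters.put(ids.get(i), prefix + "\n" + content + suffix);
        }
        return chapters;
    }
}
```

Each map entry could then be written to disk, for example with `Files.writeString(Path.of(id + ".html"), content)` on Java 11 or later. Note that anything before the first heading is dropped, and the spacer `div` that precedes a heading ends up at the tail of the previous chapter; whether that matters depends on the book's markup.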

Regarding "java - How to split a single HTML file into multiple HTML files using Java", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/36435040/
