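I have to parse the HTML body below into the output shown further down.
Some tags must be kept in the output: it may contain the {p, i, b, br} tags. All remaining tags must be removed, leaving only their text.
Here is my input: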
<!DOCTYPE HTML>
<html>
<head>
<title>Introduction</title>
</head>
<body>
<article id="mobi_content">
<h1 class="mobi-page-title">Introduction</h1>
<section id="dataSectionInstanceId-431331" class="body-text">This book is about creating a great career. <p>You might be saying to yourself, "I don't want to talk about a career, much less a great career. Right now I just need a job. I need to eat!" <p>Well, if you're looking, we're going to show you how to get that great job now. That's the first, short-term step. <p>But the day will come when you'll want to do more than just eat. And beyond that day will come another day when you look back at your life and take measure of your entire professional contribution to the world. <p>This book is about today and tomorrow. It's about getting a great job now and enjoying a great career for life. <p>When we say a person has had a great career, what do we mean? That he or she made a lot of money? moved spectacularly up the corporate ladder? became famous or renowned in his or her profession? What about the familiar comment from every movie star on every talk show: "I can't believe I get paid for doing this!" Are only a few people entitled to feel that way, but not the rest of us? <p>And what about you? Are you looking forward to a great career? Would you describe your current career as "great"? When you get to the end of your productive life, will you be looking back on a mediocre career? a good career? a great career? And how will you know? <p>Furthermore, just how do you create a great career for yourself? <p>As coauthors of this book, we are fascinated by these provocative questions. We have been associated in our work for many years as avid students of what it takes to build a great life and career. And we bring two different sets of experiences to the issue, so occasionally, we will speak to you directly in our own voices. We'll share with you our discoveries and provide tools and insights that will help you find answers for yourself. Whether you're looking for a job or want to make the job you have more meaningful, this book is for you.
</section>
</article>
</body>
</html>
The expected output is:
This book is about creating a great career.
<P>You might be saying to yourself, "I don't want to talk about a career, much less a great career. Right now I just need a job. I need to eat!"
<P>Well, if you're looking, we're going to show you how to get that great job now. That's the first, short-term step.
<P>But the day will come when you'll want to do more than just eat. And beyond that day will come another day when you look back at your life and take measure of your entire professional contribution to the world.
<P>This book is about today and tomorrow. It's about getting a great job now and enjoying a great career for life.
<P>When we say a person has had a great career, what do we mean? That he or she made a lot of money? moved spectacularly up the corporate ladder? became famous or renowned in his or her profession? What about the familiar comment from every movie star on every talk show: "I can't believe I get paid for doing this!" Are only a few people entitled to feel that way, but not the rest of us?
<P>And what about you? Are you looking forward to a great career? Would you describe your current career as "great"? When you get to the end of your productive life, will you be looking back on a mediocre career? a good career? a great career? And how will you know?
<P>Furthermore, just how do you create a great career for yourself?
<P>As coauthors of this book, we are fascinated by these provocative questions. We have been associated in our work for many years as avid students of what it takes to build a great life and career. And we bring two different sets of experiences to the issue, so occasionally, we will speak to you directly in our own voices. We'll share with you our discoveries and provide tools and insights that will help you find answers for yourself. Whether you're looking for a job or want to make the job you have more meaningful, this book is for you.
My code:
doc.body().traverse(new NodeVisitor() {
    @Override
    public void head(Node node, int depth) {
        String name = node.nodeName();
        String paraText = "";
        if (node instanceof TextNode) {
            TextNode tn = ((TextNode) node);
            if (node.nodeName().equals("p")) {
                //finalHtml+="<p>"+tn.text()+"</p>";
            } else {
                finalHtml += tn.text();
            }
        } else if (node instanceof Node) {
            if (node.nodeName() == "p") {
                System.out.println("fnbdnv"+node.toString());
            }
            if (node.nodeName() == "h1") {
                // finalHtml+="<p>"+node.toString()+"<p>";
            } else if (node.nodeName() == "div") {
                node.removeAttr("class");
                finalHtml += node.toString();
            } else if (node.nodeName() == "seection") {
                finalHtml += node.toString();
            } else if (node.nodeName() == "<b>") {
                finalHtml += node.toString();
            } else if (node.nodeName() == "<i>") {
                finalHtml += "<i>" + node.toString() + "</i>";
            }
        }
    }

    @Override
    public void tail(Node node, int depth) {
        // Do Nothing
    }
});
Best answer
Using a regular expression may work better in this case.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Main {
    public static void main(String[] args) {
        try {
            String html = "<!DOCTYPE HTML>" +
                    "<html>" +
                    "<head>" +
                    "<title>Introduction</title>" +
                    "</head>" +
                    "<body>" +
                    "<article id=\"mobi_content\">" +
                    "<h1 class=\"mobi-page-title\">Introduction</h1>" +
"<section id=\"dataSectionInstanceId-431331\" class=\"body-text\">This <i>book</i> is about creating a great career. <p>You might be saying to yourself, \"I don't want to talk about a career, much less a great career. Right now I just need a job. I need to eat!\" <p>Well, if you're looking, we're going to show you how to get that great job now. That's the first, short-term step. <p>But the day will come when you'll want to do more than just eat. And beyond that day will come another day when you look back at your life and take measure of your entire professional contribution to the world. <p>This book is about today and tomorrow. It's about getting a great job now and enjoying a great career for life. <p>When we say a person has had a great career, what do we mean? That he or she made a lot of money? moved spectacularly up the corporate ladder? became famous or renowned in his or her profession? What about the familiar comment from every movie star on every talk show: \"I can't believe I get paid for doing this!\" Are only a few people entitled to feel that way, but not the rest of us? <p>And what about you? Are you looking forward to a great career? Would you describe your current career as \"great\"? When you get to the end of your productive life, will you be looking back on a mediocre career? a good career? a great career? And how will you know? <p>Furthermore, just how do you create a great career for yourself? <p>As coauthors of this book, we are fascinated by these provocative questions. We have been associated in our work for many years as avid students of what it takes to build a great life and career. And we bring two different sets of experiences to the issue, so occasionally, we will speak to you directly in our own voices. We'll share with you our discoveries and provide tools and insights that will help you find answers for yourself. Whether you're looking for a job or want to make the job you have more meaningful, this book is for you." +
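                    "</section>" +
                    "</article>" +
                    "</body>" +
                    "</html>";
            Document doc = Jsoup.parse(html);
            // Serialize the parsed body, then strip every tag that is not p, i, b, or br.
            System.out.println(removeTags(doc.body().toString()));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Replaces each tag other than <p>, <i>, <b>, <br> (and their closing forms) with a space.
    public static String removeTags(String source) {
        return source.replaceAll("(?!(</?p>|</?i>|</?b>|<br/?>))(</?.*?>)", " ");
    }
}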
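To see what the pattern does without pulling in jsoup, here is a minimal sketch that applies the same expression to a short hand-written fragment. The class name `TagFilterDemo` and the sample string are made up for illustration; only the regular expression is taken from the answer.

```java
// Illustrative sketch only: TagFilterDemo and the sample input are hypothetical.
final class TagFilterDemo {
    // Same expression as in the answer: the negative lookahead refuses to match
    // at a position where an allowed tag starts, so only other tags are replaced.
    static String removeTags(String source) {
        return source.replaceAll("(?!(</?p>|</?i>|</?b>|<br/?>))(</?.*?>)", " ");
    }

    public static void main(String[] args) {
        String sample = "<section class=\"body-text\">This <i>book</i> is great. "
                + "<p>Next<br>line</section>";
        // Brackets make the spaces left behind by removed tags visible.
        System.out.println("[" + removeTags(sample) + "]");
    }
}
```

Every removed tag leaves one space behind, so the result here starts and ends with a space; calling `trim()` on it removes them. Also note that `.` does not match line breaks by default, so a tag whose attributes span several lines would not be removed by this expression.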
Update
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Main {
    public static void main(String[] args) {
        try {
            String html = "<!DOCTYPE HTML>" +
                    "<html>" +
                    "<head>" +
                    "<title>Introduction</title>" +
                    "</head>" +
"<body> <article id=\"mobi_content\"> <h1 class=\"mobi-page-title\">\"Build Your Village\" Tool</h1> <section id=\"dataSectionInstanceId-431408\" class=\"body-text\"><p class=\"nonindent\">Your great career depends not only on you,</p> <p class=\"nonindent\">Sample deposits in the Emotional Bank Account:</p> <ul class=\"bullet\"> <li><p class=\"nonindent\">Congratulate the person on a job well done.</p></li> <li><p class=\"nonindent\">Send birthday greetings.</p></li></section></article></body>" +
                    "</html>";
            Document doc = Jsoup.parse(html);
            System.out.println(removeTags(doc.body().toString()));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Same idea as before, but the opening <p ...> tag may now carry attributes such as class.
    public static String removeTags(String source) {
        return source.replaceAll("(?!(</p>|<p .*?>|</?i>|</?b>|<br/?>))(</?.*?>)", " ");
    }
}
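The updated expression accepts `<p ...>` with attributes but, as written, requires a space after `p`, so a bare `<p>` is treated like any other tag and removed. The sketch below compares it with a hypothetical variant that accepts both forms. `ParagraphAttrDemo`, `TOLERANT`, and the sample strings are assumptions for illustration and are not part of the answer.

```java
// Illustrative sketch only: the class, the TOLERANT pattern, and the inputs are hypothetical.
final class ParagraphAttrDemo {
    // Expression from the update: keeps <p ...> with attributes, not a bare <p>.
    static final String FROM_UPDATE =
            "(?!(</p>|<p .*?>|</?i>|</?b>|<br/?>))(</?.*?>)";

    // Assumed variant: the attribute part is optional, so <p> and <p class="x">
    // are both kept, while tags such as <pre> are still removed.
    static final String TOLERANT =
            "(?!(</p>|<p(\\s[^>]*)?>|</?i>|</?b>|<br/?>))(</?.*?>)";

    static String strip(String source, String regex) {
        return source.replaceAll(regex, " ").trim();
    }

    public static void main(String[] args) {
        String withClass = "<li><p class=\"nonindent\">Hi</p></li>";
        String bare = "<div><p>Hi</p></div>";
        System.out.println(strip(withClass, FROM_UPDATE));
        System.out.println(strip(bare, FROM_UPDATE));
        System.out.println(strip(bare, TOLERANT));
    }
}
```

If the input is known to contain only paragraphs with attributes, as in the sample from the update, the original expression is sufficient; the variant only matters when both forms can occur.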
Update 2
import java.util.ListIterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Attributes;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        try {
            // Captures the file name between the last '/' and the last '.' of a path.
            Pattern pattern = Pattern.compile("/(((?!/).)*)[.]");
            String html = "<!DOCTYPE HTML>" +
                    "<html>" +
                    "<head>" +
                    "<title>Introduction</title>" +
                    "</head>" +
"<body> <article id=\"mobi_content\"> <h1 class=\"mobi-page-title\">\"Build Your Village\" Tool</h1> <section id=\"dataSectionInstanceId-431408\" class=\"body-text\"><p class=\"nonindent\">Your great career depends not only on you,</p> <p class=\"center\"><img src=\"mpla/multimedia/Cove_9781936111107_epub_005_r1.png\" id=\"mobi_image_12776\" class=\"inline-img\" alt=\"PNG\"/></p><p class=\"nonindent\">Sample deposits in the Emotional Bank Account:</p> <ul class=\"bullet\"> <li><p class=\"nonindent\">Congratulate the person on a job well done.</p></li> <li><p class=\"nonindent\">Send birthday greetings.</p></li></section></article></body>" +
                    "</html>";
            Document doc = Jsoup.parse(html);
            Elements imgs = doc.select("img");
            System.out.println(imgs);
            ListIterator<Element> iter = imgs.listIterator();
            while (iter.hasNext()) {
                Element img = iter.next();
                String src = img.attr("src");
                Matcher matcher = pattern.matcher(src);
                if (matcher.find()) {
                    // Turn <img src=".../name.png"> into <graphic>name</graphic>.
                    img.tagName("graphic").text(matcher.group(1));
                    removeAttr(img);
                }
            }
            System.out.println(removeTags(doc.body().toString()));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Removes every attribute of the element. Iterating over the asList() snapshot
    // avoids modifying the attribute collection while it is being iterated.
    public static void removeAttr(Element e) {
        for (Attribute a : e.attributes().asList()) {
            e.removeAttr(a.getKey());
        }
    }

    public static String removeTags(String source) {
        return source.replaceAll("(?!(</p>|<p .*?>|</?graphic>|</?i>|</?b>|<br/?>))(</?.*?>)", " ").trim();
    }
}
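The `src` pattern used in Update 2 can be tried on its own. The following is a small sketch, assuming a hypothetical helper `baseName` in a made-up class `SrcNameDemo`, that shows what `matcher.group(1)` holds for a few paths.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: SrcNameDemo and baseName are hypothetical names.
final class SrcNameDemo {
    // Pattern from Update 2: a '/', then a run of non-'/' characters (group 1),
    // then a literal '.'. Because the run is greedy, it ends at the last '.'.
    private static final Pattern NAME = Pattern.compile("/(((?!/).)*)[.]");

    static Optional<String> baseName(String src) {
        Matcher m = NAME.matcher(src);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(baseName("mpla/multimedia/Cove_9781936111107_epub_005_r1.png"));
        System.out.println(baseName("a/b.tar.gz")); // several dots in the file name
        System.out.println(baseName("pic.png"));    // no '/' in the path
    }
}
```

Because the pattern requires a leading `/`, a `src` without any directory part does not match, and in the answer's loop such an `img` element would be left unchanged and then stripped by `removeTags`.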
Regarding "java - Is it possible to use jsoup to parse HTML? Some tags also need to be kept in the output after parsing", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25643266/