
java - Reading HTML: how do I use BufferedReader to skip the HEAD tag information of a web page and read the HTML line by line?

Reposted. Author: 太空宇宙. Last updated: 2023-11-04 07:11:54

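I have a quick question that I'm struggling to figure out. I want to read an HTML file line by line, but skip the HEAD tag. My idea was to start reading the text once the HEAD tag has been skipped.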

Here is what I have so far:

BufferedReader reader = new BufferedReader(new InputStreamReader(socket.getInputStream()));

StringBuilder string = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    if (line.startsWith("<html>"))
        string.append(line + "\n");
}

I want to keep the HTML code in memory, but without the HEAD information.

Example:

<HTML>
<HEAD>
<TITLE>Your Title Here</TITLE>
</HEAD>
<BODY BGCOLOR="FFFFFF">
<CENTER><IMG SRC="clouds.jpg" ALIGN="BOTTOM"> </CENTER>
<a href="http://somegreatsite.com">Link Name</a>is a link to another nifty site
<H1>This is a Header</H1>
<H2>This is a Medium Header</H2>
Send me mail at <a href="mailto:support@yourcompany.com">support@yourcompany.com</a>.
</BODY>

I want to save everything except the HEAD tag information.
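Since the goal is to keep the page in memory anyway, one possible approach is to read every line first and then remove the HEAD element from the collected text with a regular expression. This is a minimal sketch, assuming a page with a single `<head>...</head>` element; the names `HeadStripper`, `stripHead` and `readWithoutHead` are hypothetical. A regex is no substitute for a real HTML parser (for example, a `</head>` inside a comment or script would confuse it). In the question's setup, the `reader` passed in would be the `BufferedReader` wrapped around `socket.getInputStream()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;

public class HeadStripper {
    // (?i) makes the match case-insensitive, so <HEAD> and <head> both match.
    // (?s) lets .*? run across line breaks inside the HEAD element.
    // \b after "head" keeps the pattern from matching tags such as <header>.
    private static final Pattern HEAD = Pattern.compile("(?is)<head\\b.*?</head\\s*>");

    // Hypothetical helper: removes the first <head>...</head> element from the text.
    public static String stripHead(String html) {
        return HEAD.matcher(html).replaceFirst("");
    }

    // Hypothetical helper: reads all lines into memory, then strips the HEAD element.
    public static String readWithoutHead(BufferedReader reader) throws IOException {
        StringBuilder string = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            string.append(line).append('\n');
        }
        return stripHead(string.toString());
    }

    public static void main(String[] args) throws IOException {
        String page = "<HTML>\n<HEAD>\n<TITLE>Your Title Here</TITLE>\n</HEAD>\n"
                + "<BODY BGCOLOR=\"FFFFFF\">\n<H1>This is a Header</H1>\n</BODY>\n";
        // StringReader stands in for the socket stream so the example runs offline.
        System.out.print(readWithoutHead(new BufferedReader(new StringReader(page))));
    }
}
```

The reluctant `.*?` stops at the first closing `</head>`, so the body is left untouched; an empty line remains where the HEAD block used to be, which can be trimmed if it matters.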

Best answer

How about something like this:

boolean htmlFound = false;                  // Have we found the opening <html> tag yet?
StringBuilder string = new StringBuilder(); // From here on, same as your code...
String line;
while ((line = reader.readLine()) != null) {
    if (!htmlFound) {                                 // Not found yet?
        if (line.toLowerCase().startsWith("<html")) { // Does this line open the <html> tag?
            htmlFound = true;                         // Yes: keep lines from here on.
        } else {
            continue;                                 // No: skip this line.
        }
    }
    System.out.println("This is each line: " + line);
    string.append(line).append('\n');
}
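As written, the loop above starts copying at the `<html>` line, so the `<HEAD>` block from the example page still ends up in `string`. A minimal sketch that extends the same boolean-flag idea to actually drop the HEAD section follows. It assumes, as in the example page, that no body content shares a line with the `<head>` or `</head>` tag (such a line is dropped whole); `SkipHeadLines`, `readWithoutHead` and `stripHeadLines` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

public class SkipHeadLines {
    // Case-insensitive patterns for the opening and closing HEAD tags.
    // "<head" must be followed by whitespace or '>' so <header> is not mistaken for it.
    private static final Pattern HEAD_OPEN = Pattern.compile("<head[\\s>]", Pattern.CASE_INSENSITIVE);
    private static final Pattern HEAD_CLOSE = Pattern.compile("</head\\s*>", Pattern.CASE_INSENSITIVE);

    // Copies every line except those from the line with <head> up to and including
    // the line with </head>. Whole lines are skipped, so anything sharing a line
    // with either tag is lost (the example page keeps each tag on its own line).
    public static String readWithoutHead(BufferedReader reader) throws IOException {
        StringBuilder string = new StringBuilder();
        boolean inHead = false;
        String line;
        while ((line = reader.readLine()) != null) {
            if (!inHead && HEAD_OPEN.matcher(line).find()) {
                inHead = true;               // entering the HEAD section
            }
            if (inHead) {
                if (HEAD_CLOSE.matcher(line).find()) {
                    inHead = false;          // this line closes HEAD; resume after it
                }
                continue;                    // skip every line inside HEAD
            }
            string.append(line).append('\n');
        }
        return string.toString();
    }

    // Convenience wrapper for in-memory text; an open StringReader should not throw.
    public static String stripHeadLines(String page) {
        try {
            return readWithoutHead(new BufferedReader(new StringReader(page)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String page = "<HTML>\n<HEAD>\n<TITLE>Your Title Here</TITLE>\n</HEAD>\n"
                + "<BODY BGCOLOR=\"FFFFFF\">\n<H1>This is a Header</H1>\n</BODY>\n";
        System.out.print(stripHeadLines(page));
    }
}
```

With the sample page in `main`, this prints the `<HTML>` line followed directly by the `<BODY BGCOLOR="FFFFFF">` line and the rest of the body. When reading from the socket, `readWithoutHead` can be called with the question's `reader` directly.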

A similar question, "java - Reading HTML: how do I use BufferedReader to skip the HEAD tag information of a web page and read the HTML line by line?", can be found on Stack Overflow: https://stackoverflow.com/questions/20535287/
