java - How to read escaped characters as text in Java?

Reposted. Author: 太空狗. Updated: 2023-10-29 13:52:23

public List<String> readRSS(String feedUrl, String openTag, String closeTag)
        throws IOException, MalformedURLException {

    URL url = new URL(feedUrl);
    BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));

    String currentLine;
    List<String> tempList = new ArrayList<String>();
    while ((currentLine = reader.readLine()) != null) {
        Integer tagEndIndex = 0;
        Integer tagStartIndex = 0;
        while (tagStartIndex >= 0) {
            tagStartIndex = currentLine.indexOf(openTag, tagEndIndex);
            if (tagStartIndex >= 0) {
                tagEndIndex = currentLine.indexOf(closeTag, tagStartIndex);
                tempList.add(currentLine.substring(tagStartIndex + openTag.length(), tagEndIndex) + "\n");
            }
        }
    }
    if (tempList.size() > 0) {
        if (openTag.contains("title")) {
            tempList.remove(0);
            tempList.remove(0);
        } else if (openTag.contains("desc")) {
            tempList.remove(0);
        }
    }
    return tempList;
}

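I wrote this code to read an RSS feed. Everything works fine, but the parser breaks when it encounters a character such as &#xD;. This happens because it cannot find the closing tag, since the XML is escaped.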

I don't know how to fix this in my code. Can anyone help me solve it?

Best answer

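The problem is that the special character &#xD; is a carriage return (a line break), so your opening and closing tags end up on different lines. If you read the feed line by line, the code you have will not work.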
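
To see the failure concretely, the following sketch feeds a two-line title through readLine() and records where the closing tag is found on each line. The class name SplitTagDemo and the helper closeTagIndexPerLine are made up for illustration; they are not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the question's code.
class SplitTagDemo {

    /** Returns, for each line of text, the index of closeTag in that line (-1 if absent). */
    static List<Integer> closeTagIndexPerLine(String text, String closeTag) {
        List<Integer> indexes = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                indexes.add(line.indexOf(closeTag));
            }
        } catch (IOException e) {
            // A StringReader should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return indexes;
    }

    public static void main(String[] args) {
        // The element is split across two physical lines right after &#xD;
        String feed = "<title>another &#xD;\ntitle 0</title>\n";
        System.out.println(closeTagIndexPerLine(feed, "</title>")); // prints [-1, 7]
    }
}
```

The first line yields -1. In the question's code that value is passed to substring as the end index, so the call throws StringIndexOutOfBoundsException.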

You could try something like this:

StringBuilder fullLine = new StringBuilder();

while ((currentLine = reader.readLine()) != null) {
    int tagStartIndex = currentLine.indexOf(openTag, 0);
    int tagEndIndex = currentLine.indexOf(closeTag, tagStartIndex);

    if (tagStartIndex != -1 && tagEndIndex != -1) {
        // both tags on the same line: process the whole line
        tempList.add(currentLine);
        fullLine = new StringBuilder();
    } else if (tagStartIndex == -1 && tagEndIndex == -1 && fullLine.length() > 0) {
        /*
         * no tags on this line but the buffer has been started;
         * add the current line to the buffer, it is part
         * of a larger line
         */
        fullLine.append(currentLine);
    } else if (tagStartIndex != -1 && tagEndIndex == -1) {
        /*
         * start tag is on this line but there is no end tag;
         * add the line to a new buffer
         */
        fullLine = new StringBuilder(currentLine);
    } else if (tagEndIndex != -1 && tagStartIndex == -1) {
        /*
         * end tag is on this line but there is no start tag;
         * add the line to the current buffer and then process the buffer
         */
        fullLine.append(currentLine);
        tempList.add(fullLine.toString());
        fullLine = new StringBuilder();
    }
}
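
The fragment above relies on reader, currentLine, tempList, openTag and closeTag from the surrounding method. As a rough, self-contained sketch of the same buffering idea, it could be wrapped as shown here. The class name MultiLineTagReader, the method name readElements, and the choice to accept any Reader (so it can be tried on a string instead of a URL) are assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper around the buffering loop; names are illustrative only.
class MultiLineTagReader {

    /** Collects elements delimited by openTag/closeTag, joining elements that span several lines. */
    static List<String> readElements(Reader source, String openTag, String closeTag) {
        List<String> elements = new ArrayList<>();
        StringBuilder pending = new StringBuilder(); // lines of an element still waiting for its close tag
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int start = line.indexOf(openTag);
                int end = line.indexOf(closeTag, Math.max(start, 0));
                if (start != -1 && end != -1) {
                    // both tags on the same line
                    elements.add(line);
                    pending.setLength(0);
                } else if (start != -1) {
                    // element opens here and continues on a later line
                    pending.setLength(0);
                    pending.append(line);
                } else if (end != -1) {
                    // element closes here: flush the buffer
                    pending.append(line);
                    elements.add(pending.toString());
                    pending.setLength(0);
                } else if (pending.length() > 0) {
                    // middle line of an open element
                    pending.append(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return elements;
    }

    public static void main(String[] args) {
        String feed = "<title>another &#xD;\ntitle 0</title>\n"
                + "<desc>description 0</desc>\n"
                + "<title>another title 1</title>\n";
        readElements(new StringReader(feed), "<title>", "</title>").forEach(System.out::println);
    }
}
```

Like the fragment it is based on, this keeps the tags in each result and only handles one element per physical line. A feed that puts several elements on one line would need the inner indexOf loop from the question as well.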

Given this sample input:

<title>another &#xD;
title 0</title>
<title>another title 1</title>
<title>another title 2</title>
<title>another title 3</title>
<desc>description 0</desc>
<desc>another &#xD;
description 1</desc>
<title>another title 4</title>
<title>another &#xD;
another line in between &#xD;
title 5</title>

For title, the complete lines in tempList become:

<title>another &#xD;title 0</title>
<title>another title 1</title>
<title>another title 2</title>
<title>another title 3</title>
<title>another title 4</title>
<title>another &#xD;another line in between &#xD;title 5</title>

For desc:

<desc>description 0</desc>
<desc>another &#xD;description 1</desc>

You should test how this approach performs on a complete RSS feed. Also note that the special characters are not unescaped.
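
If the collected text should contain the actual characters instead of references such as &#xD;, one possible follow-up step is to decode numeric character references with a regular expression. This is only a sketch and not part of the answer: the class NumericEntityDecoder and its decode method are hypothetical names, and named entities such as &amp; are deliberately left untouched. A real XML parser would handle both cases and is likely the more robust choice for a complete feed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; decodes only numeric references like &#xD; or &#13;
class NumericEntityDecoder {

    // group 1: hexadecimal digits, group 2: decimal digits
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    static String decode(String text) {
        Matcher m = NUMERIC_REF.matcher(text);
        // StringBuffer keeps this compatible with Java 8's appendReplacement
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(); // default: leave the reference as it is
            try {
                int codePoint = m.group(1) != null
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(codePoint)) {
                    replacement = new String(Character.toChars(codePoint));
                }
            } catch (NumberFormatException e) {
                // value too large for an int: keep the original text
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("another &#xD;title 0");
        System.out.println(decoded.indexOf('\r')); // prints 8
    }
}
```

Applied to each entry of tempList after the tags have been stripped, this would turn &#xD; back into a carriage return character.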

Regarding "java - How to read escaped characters as text in Java?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44763781/
