
Android RSS feed parsing

Reposted. Author: 塔克拉玛干. Last updated: 2023-11-02 22:48:02

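I am new to Android. In my app I have to parse an RSS feed and display the data on screen, but I cannot parse the data in one particular tag because that tag also contains some special characters. My code is shown below.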

My parser function:

protected ArrayList<String> doInBackground(Context... params)
{
    // context = params[0];
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    test = new ArrayList<String>();
    try {
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(new java.net.URL("input URL_confidential").openConnection().getInputStream());
        //Document document = builder.parse(new URL("http://www.gamestar.de/rss/gamestar.rss").openConnection().getInputStream());
        Element root = document.getDocumentElement();
        NodeList docItems = root.getElementsByTagName("item");
        Node nodeItem;
        for (int i = 0; i < docItems.getLength(); i++)
        {
            nodeItem = docItems.item(i);
            if (nodeItem.getNodeType() == Node.ELEMENT_NODE)
            {
                NodeList element = nodeItem.getChildNodes();
                Element entry = (Element) docItems.item(i);
                name = (element.item(0).getFirstChild().getNodeValue());

                // System.out.println("description = "+element.item(2).getFirstChild().getNodeValue().replaceAll("&lt;div&gt;&lt;p&gt;"," "));
                System.out.println("Description" + Jsoup.clean(org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(element.item(2).getFirstChild().getNodeValue()), new Whitelist()));

                items.add(name);
            }
        }
    }
    catch (ParserConfigurationException e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    catch (MalformedURLException e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    catch (SAXException e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    catch (IOException e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    return items;
}

Input:

<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>my application</title>
<link>http:// some link</link>
<atom:link href="http:// XXXXXXXX" rel="self"></atom:link>
<language>en-us</language>
<lastBuildDate>Thu, 20 Dec 2012</lastBuildDate>
<item>
<title>lllegal settlements</title>
<link>http://XXXXXXXXXXXXXXXX</link>
<description> &lt;div&gt;&lt;p&gt;
India was joined by all members of the 15-nation UN Security Council except the US to condemn Israel’s announcement of new construction activity in Palestinian territories and demand immediate dismantling of the “illegal†settlements.
&lt;/p&gt;
&lt;p&gt;
UN Secretary General Ban Ki-moon also expressed his deep concern by the heightened settlement activity in West Bank, saying the move by Israel “gravely threatens efforts to establish a viable Palestinian state.â€
&lt;/p&gt;
&lt;p&gt;
</description>
</item>
</channel>

Output:

 lllegal settlements  ----> title tag text

India was joined by all members of the 15-nation UN Security Council except the US to condemn Israel announcement of new construction activity in Palestinian territories and demand immediate dismantling of the illegal settlements. -----> description tag text

UN Secretary General Ban Ki-moon also expressed his deep concern by the heightened settlement activity in West Bank, saying the move by Israel gravely threatens efforts to establish a viable Palestinian state. ----> description tag text.

Best answer

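Your text nodes contain both escaped HTML entities (&gt; is >, greater than) and garbage characters (“grossly†). You should first adjust the encoding to match your input source; after that you can unescape the HTML with Apache Commons Lang, using StringEscapeUtils.unescapeHtml4(String).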
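To illustrate where garbage characters of this kind usually come from, here is a minimal sketch using only the JDK. It assumes the feed bytes are UTF-8 and that something along the way decoded them as windows-1252; that is a guess about the cause, not something the question confirms. The class name EncodingDemo and the helper decode are made up for the illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Hypothetical helper: decodes raw bytes with an explicitly chosen charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "Israel's" written with a typographic apostrophe (U+2019).
        String original = "Israel\u2019s";

        // In UTF-8 the apostrophe is the three bytes E2 80 99.
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        // Decoding those bytes as windows-1252 turns each byte into its own
        // character (U+00E2, U+20AC, U+2122), which is the garbage seen in the feed.
        String wrong = decode(raw, Charset.forName("windows-1252"));

        // Decoding with the charset the bytes were written in restores the text.
        String right = decode(raw, StandardCharsets.UTF_8);

        System.out.println(wrong);
        System.out.println(right);
    }
}
```

If the feed itself already contains the wrong characters, no choice of charset on the client will repair it fully, so the fix then belongs on the side that produces the feed.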

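This method (hopefully) returns XML that you can query (for example with XPath) to extract the text nodes you want. Alternatively, you can hand the whole string to JSOUP or to the Android Html class: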

// JSOUP, "html" is the unescaped string. Returns a string
Jsoup.parse(html).text();

// Android, "html" is the same unescaped string. fromHtml returns a Spanned, so convert it
android.text.Html.fromHtml(html).toString();
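The XPath route mentioned above can be sketched with the JDK alone. This is an illustration under assumptions, not the answerer's code: the class name RssDescriptionDemo, the helpers descriptions and stripTags, and the sample feed are all invented here, and the regex in stripTags is only a crude stand-in for a real HTML library such as JSOUP. Note that the XML parser itself already turns &lt; and &gt; in a text node back into < and >.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RssDescriptionDemo {

    // Hypothetical helper: returns the plain text of every item description.
    static List<String> descriptions(String rssXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

            // Feed the parser bytes and state the encoding explicitly.
            InputSource source = new InputSource(
                    new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
            source.setEncoding("UTF-8");
            Document document = builder.parse(source);

            // Select the elements by path instead of by child index, so
            // whitespace text nodes between the tags cannot shift the result.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/rss/channel/item/description", document, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent() already contains "<div><p>...": the parser
                // resolved the &lt; and &gt; entities while reading the node.
                result.add(stripTags(nodes.item(i).getTextContent()));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the feed", e);
        }
    }

    // Crude tag stripper for the sketch only; it does not handle comments,
    // scripts, or a '>' inside an attribute value.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String feed = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<rss version=\"2.0\"><channel><item>"
                + "<title>lllegal settlements</title>"
                + "<description> &lt;div&gt;&lt;p&gt; India was joined by all members."
                + " &lt;/p&gt;&lt;/div&gt; </description>"
                + "</item></channel></rss>";
        for (String text : descriptions(feed)) {
            System.out.println(text);
        }
    }
}
```

Selecting by path rather than by element.item(2) is a deliberate choice in this sketch: child indexes depend on whether the feed has whitespace between its tags.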

Test program (requires JSOUP and Commons-Lang):

package stackoverflow;

import org.apache.commons.lang3.StringEscapeUtils;
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class EmbeddedHTML {

    public static void main(String[] args) {
        String src = "<description> &lt;div&gt;&lt;p&gt; An independent" +
                " inquiry into the September 11 attack on the US Consulate" +
                " in Benghazi that killed the US ambassador to Libya and" +
                " three other Americans has found that systematic failures" +
                " at the State Department led to “grossly†inadequate" +
                " security at the mission. &lt;/p&gt;</description>";
        String unescaped = StringEscapeUtils.unescapeHtml4(src);
        System.out.println(Jsoup.clean(unescaped, new Whitelist()));
    }

}

Regarding Android RSS feed parsing, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13949800/
