
java - How do I adjust my DOMParser to parse "media:content" from RSS in Java (Android)?

Reposted. Author: 行者123. Updated: 2023-12-01 09:11:20

First, I have this DOMParser class:

import android.util.Log;

import java.io.IOException;
import java.io.StringReader;
import java.net.MalformedURLException;
import java.net.URL;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.jsoup.Jsoup;
import org.jsoup.select.Elements;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DOMParser {

    private RSSFeed _feed = new RSSFeed();

    public RSSFeed parseXml(String xml) {
        // _feed.clearList();
        URL url = null;
        try {
            url = new URL(xml);
            Log.e("THE XML", xml);
            Log.e("THE URL", url.toString());
        } catch (MalformedURLException e1) {
            Log.e("MALFORMED EXCEPTION", "1");
            e1.printStackTrace();
        }

        try {
            // Create required instances
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(false);
            DocumentBuilder db = dbf.newDocumentBuilder();
            db.setEntityResolver(new EntityResolver() {
                @Override
                public InputSource resolveEntity(String arg0, String arg1)
                        throws SAXException, IOException {
                    if (arg0.contains("Hibernate")) {
                        return new InputSource(new StringReader(""));
                    } else {
                        // TODO Auto-generated method stub
                        return null;
                    }
                }
            });
            // Parse the xml
            Document doc = db.parse(new InputSource(url.openStream()));
            doc.getDocumentElement().normalize();

            // Get all <item> tags.
            NodeList nl = doc.getElementsByTagName("item");
            int length = nl.getLength();

            for (int i = 0; i < length; i++) {
                Node currentNode = nl.item(i);
                RSSItem _item = new RSSItem();

                NodeList nchild = currentNode.getChildNodes();
                int clength = nchild.getLength();

                // Get the required elements from each Item
                for (int j = 0; j < clength; j = j + 1) {
                    try {
                        Node thisNode = nchild.item(j);
                        String theString = null;
                        String nodeName = thisNode.getNodeName();
                        Log.e("NODE NAME", nodeName);
                        theString = nchild.item(j).getFirstChild().getNodeValue();
                        //Log.e("THE STRING", theString);
                        if (theString != null) {
                            if ("title".equals(nodeName)) {
                                // Node name is equals to 'title' so set the Node
                                // value to the Title in the RSSItem.
                                _item.setTitle(theString);
                            } else if ("description".equals(nodeName)) {
                                _item.setDescription(theString);

                                // Parse the html description to get the image url
                                String html = theString;
                                org.jsoup.nodes.Document docHtml = Jsoup
                                        .parse(html);
                                Elements imgEle = docHtml.select("img");
                                _item.setImage(imgEle.attr("src"));
                            } else if ("pubDate".equals(nodeName)) {

                                // We replace the plus and zero's in the date with
                                // empty string
                                String formatedDate = theString.replace(" +0000",
                                        "");
                                _item.setDate(formatedDate);
                            } else if ("link".equals(nodeName)) {

                                // Trying to get the URL as a string
                                _item.setURL(theString);
                            }
                            /*else if ("media:content".equals(nodeName)){
                                _item.setImage(theString);
                                Log.e("THE IMAGE LINK", theString);
                            }*/

                        }
                    } catch (Exception e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                    }
                }

                // add item to the list
                _feed.addItem(_item);
            }

        } catch (Exception e) {
            e.printStackTrace();
        }

        // Return the final feed once all the Items are added to the RSSFeed
        // Object(_feed).
        return _feed;
    }

}

I am trying to parse entries that look like this:

<item>
<title><![CDATA[Oceans Full of 'Aliens' Could Be Hidden Beneath Earth's Surface, Expert Says]]></title>
<description><![CDATA[Do "aliens" exist on Earth? In a way, experts think so, and they believe that these creatures can be found thriving in massive underground oceans hidden hundreds of miles beneath the Earth's surface.]]></description>
<guid>http://www.natureworldnews.com/articles/33160/20161130/oceans-full-aliens-hidden-beneath-earths-surface-expert.htm</guid>
<link>http://www.natureworldnews.com/articles/33160/20161130/oceans-full-aliens-hidden-beneath-earths-surface-expert.htm</link>
<media:content url="http://images.natureworldnews.com/data/images/full/37450/earth-ocean.jpg" />
<media:title type="html"><![CDATA[earth ocean]]></media:title>
<media:text type="html"><![CDATA[Do "aliens" exist on Earth? In a way, experts think so, and they believe that these creatures can be found thriving in massive underground oceans hidden hundreds of miles beneath the Earth's surface.]]></media:text>
<category>
<name><![CDATA[News]]></name>
</category>
<pubDate>Wed, 30 Nov 2016 11:02:00 EST</pubDate>
</item>
<item>
<title><![CDATA[Great Barrier Reef Sees Its Worst Damage on Record]]></title>
<description><![CDATA[The Great Barrier Reef is reportedly experiencing its worst damage via coral bleaching by far in history. The culprit is none other than the significant increase in water temperatures, which is record high as well. More than half of the coral population in the northern section has perished, while the central and southern centers have been reported to be in better health.]]></description>
<guid>http://www.natureworldnews.com/articles/33132/20161130/great-barrier-reef-sees-worst-damage-record.htm</guid>
<link>http://www.natureworldnews.com/articles/33132/20161130/great-barrier-reef-sees-worst-damage-record.htm</link>
<media:content url="http://images.natureworldnews.com/data/images/full/37433/great-barrier-reef-sees-its-worst-damage-on-record.jpg" />
<media:title type="html"><![CDATA[Great Barrier Reef Sees Its Worst Damage on Record]]></media:title>
<media:text type="html"><![CDATA[Corals in the Great Barrier reef are in danger.
]]></media:text>
<category>
<name><![CDATA[News]]></name>
</category>
<pubDate>Wed, 30 Nov 2016 09:54:00 EST</pubDate>
</item>

Note the <media:content> tag: that is where the image URL lives.

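My code throws the following for every single RSS entry! Can someone explain the #text value I see below? Can someone help me code how to extract the image URL and put it in the setImage method?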

12-01 01:58:36.278 27776-27823/com.example01 E/NODE NAME: media:content
12-01 01:58:36.278 27776-27823/com.example01 W/System.err: java.lang.NullPointerException: Attempt to invoke interface method 'java.lang.String org.w3c.dom.Node.getNodeValue()' on a null object reference
12-01 01:58:36.278 27776-27823/com.example01 W/System.err: at com.climatenews07.parser.DOMParser.parseXml(DOMParser.java:74)
12-01 01:58:36.278 27776-27823/com.example01 W/System.err: at com.climatenews07.SplashActivity$AsyncLoadXMLFeed.doInBackground(SplashActivity.java:103)
12-01 01:58:36.278 27776-27823/com.example01 W/System.err: at com.climatenews07.SplashActivity$AsyncLoadXMLFeed.doInBackground(SplashActivity.java:97)
12-01 01:58:36.278 27776-27823/com.example01 W/System.err: at android.os.AsyncTask$2.call(AsyncTask.java:304)
12-01 01:58:36.278 27776-27823/com.example01 W/System.err: at java.util.concurrent.FutureTask.run(FutureTask.java:237)
12-01 01:58:36.278 27776-27823/com.example01 W/System.err: at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:243)
12-01 01:58:36.278 27776-27823/com.example01 W/System.err: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
12-01 01:58:36.278 27776-27823/com.example01 W/System.err: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
12-01 01:58:36.278 27776-27823/com.example01 W/System.err: at java.lang.Thread.run(Thread.java:761)
12-01 01:58:36.278 27776-27823/com.example01 E/NODE NAME: #text
12-01 01:58:36.279 27776-27823/com.example01 W/System.err: java.lang.NullPointerException: Attempt to invoke interface method 'java.lang.String org.w3c.dom.Node.getNodeValue()' on a null object reference
12-01 01:58:36.279 27776-27823/com.example01 W/System.err: at com.climatenews07.parser.DOMParser.parseXml(DOMParser.java:74)
12-01 01:58:36.279 27776-27823/com.example01 W/System.err: at com.climatenews07.SplashActivity$AsyncLoadXMLFeed.doInBackground(SplashActivity.java:103)
12-01 01:58:36.279 27776-27823/com.example01 W/System.err: at com.climatenews07.SplashActivity$AsyncLoadXMLFeed.doInBackground(SplashActivity.java:97)
12-01 01:58:36.279 27776-27823/com.example01 W/System.err: at android.os.AsyncTask$2.call(AsyncTask.java:304)
12-01 01:58:36.279 27776-27823/com.example01 W/System.err: at java.util.concurrent.FutureTask.run(FutureTask.java:237)
12-01 01:58:36.279 27776-27823/com.example01 W/System.err: at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:243)
12-01 01:58:36.279 27776-27823/com.example01 W/System.err: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
12-01 01:58:36.279 27776-27823/com.example01 W/System.err: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
12-01 01:58:36.279 27776-27823/com.example01 W/System.err: at java.lang.Thread.run(Thread.java:761)
12-01 01:58:36.279 27776-27823/com.example01 E/NODE NAME: media:title

Because of this, I also get the following exception:

12-01 01:58:36.500 27776-27927/com.example01 E/Image URL: http:
12-01 01:58:36.500 27776-27927/com.example01 W/System.err: java.net.UnknownHostException: Invalid host: http:
12-01 01:58:36.500 27776-27927/com.example01 W/System.err: at com.android.okhttp.HttpUrl.getChecked(HttpUrl.java:670)
12-01 01:58:36.500 27776-27927/com.example01 W/System.err: at com.android.okhttp.OkHttpClient$1.getHttpUrlChecked(OkHttpClient.java:165)
12-01 01:58:36.500 27776-27927/com.example01 W/System.err: at com.android.okhttp.internal.huc.HttpURLConnectionImpl.newHttpEngine(HttpURLConnectionImpl.java:345)
12-01 01:58:36.500 27776-27927/com.example01 W/System.err: at com.android.okhttp.internal.huc.HttpURLConnectionImpl.initHttpEngine(HttpURLConnectionImpl.java:331)
12-01 01:58:36.500 27776-27927/com.example01 W/System.err: at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:398)
12-01 01:58:36.500 27776-27927/com.example01 W/System.err: at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getInputStream(HttpURLConnectionImpl.java:243)
12-01 01:58:36.500 27776-27927/com.example01 W/System.err: at com.climatenews07.image.ImageLoader.getBitmap(ImageLoader.java:74)
12-01 01:58:36.500 27776-27927/com.example01 W/System.err: at com.climatenews07.image.ImageLoader.access$000(ImageLoader.java:27)
12-01 01:58:36.500 27776-27927/com.example01 W/System.err: at com.climatenews07.image.ImageLoader$PhotosLoader.run(ImageLoader.java:148)
12-01 01:58:36.500 27776-27927/com.example01 W/System.err: at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:428)
12-01 01:58:36.500 27776-27927/com.example01 W/System.err: at java.util.concurrent.FutureTask.run(FutureTask.java:237)
12-01 01:58:36.500 27776-27927/com.example01 W/System.err: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
12-01 01:58:36.501 27776-27927/com.example01 W/System.err: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
12-01 01:58:36.501 27776-27927/com.example01 W/System.err: at java.lang.Thread.run(Thread.java:761)

Best answer

can someone help me code how to extract the image URL

To extract the value of the url attribute of the media:content element:

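import org.w3c.dom.Element;

// Place this check before the getFirstChild() call in the loop,
// because getFirstChild() returns null for this element.
if ("media:content".equals(nodeName)) {
    // Cast the Node to an Element so the attribute accessors are available.
    Element contentElement = (Element) thisNode;
    if (contentElement.hasAttribute("url")) {
        String u = contentElement.getAttribute("url");
        // Store the image URL on the item, as the question asks.
        _item.setImage(u);
    }
}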

That snippet casts the Node thisNode to an Element, so that the getAttribute(…) method can be used to get the value of the url attribute.
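To try the same idea outside the Android app, here is a minimal, self-contained sketch that applies the cast-and-getAttribute approach to an in-memory feed. The class name MediaContentDemo, its method names, and the example.com sample feed are assumptions made for illustration; none of them come from the question's code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class MediaContentDemo {

    // Hypothetical sample feed: one item with an image, one without.
    static final String SAMPLE =
            "<rss xmlns:media=\"http://search.yahoo.com/mrss/\"><channel>"
            + "<item>\n"
            + "  <title>First</title>\n"
            + "  <media:content url=\"http://example.com/a.jpg\" />\n"
            + "</item>"
            + "<item>\n  <title>No image</title>\n</item>"
            + "</channel></rss>";

    /** Returns one entry per item: the media:content url, or null if absent. */
    static List<String> imageUrls(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            List<String> urls = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                urls.add(imageUrl((Element) items.item(i)));
            }
            return urls;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse feed", e);
        }
    }

    /** Looks for a media:content child element and reads its url attribute. */
    static String imageUrl(Element item) {
        NodeList children = item.getChildNodes();
        for (int j = 0; j < children.getLength(); j++) {
            Node child = children.item(j);
            // Only element nodes can be cast to Element; text nodes are skipped.
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && "media:content".equals(child.getNodeName())) {
                Element content = (Element) child;
                if (content.hasAttribute("url")) {
                    return content.getAttribute("url");
                }
            }
        }
        return null;
    }
}
```

Calling MediaContentDemo.imageUrls(MediaContentDemo.SAMPLE) yields a list with the first item's URL followed by null for the item that has no media:content element. The sketch relies on the default, non-namespace-aware DocumentBuilderFactory, which is why the node name is compared against the prefixed string "media:content".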

My code is throwing the following for every single RSS entry!

The code in the question is doing this:

theString = nchild.item(j).getFirstChild().getNodeValue();

…when, for example, nchild.item(j) is this:

<media:content url="http://images.natureworldnews.com/data/images/full/37450/earth-ocean.jpg" />

So in this case, the code calls .getFirstChild() on a media:content element that has no child nodes, which returns null. The code then calls .getNodeValue() on that null, which causes the java.lang.NullPointerException: Attempt to invoke interface method 'java.lang.String org.w3c.dom.Node.getNodeValue()' on a null object reference error.

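The intent of the code seems to be to get the value of the url attribute. But attributes are not children, so .getFirstChild() cannot retrieve the url attribute. Use .getAttribute(…) instead.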
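The difference between children and attributes can be checked directly. The following is a small sketch rather than the asker's code: FirstChildDemo, parseRoot, and firstChildValue are hypothetical names, and the sample XML is made up.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class FirstChildDemo {

    /** Parses a small XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Null-safe variant of node.getFirstChild().getNodeValue(). */
    static String firstChildValue(Node node) {
        Node first = node.getFirstChild();
        // An empty element (or a text node) has no first child.
        return first == null ? null : first.getNodeValue();
    }

    public static void main(String[] args) {
        Element empty = parseRoot(
                "<media:content xmlns:media=\"http://search.yahoo.com/mrss/\""
                + " url=\"http://example.com/a.jpg\" />");
        System.out.println(empty.getFirstChild());      // no child nodes
        System.out.println(empty.getAttribute("url"));  // the attribute value

        Element title = parseRoot("<title>Hello</title>");
        System.out.println(firstChildValue(title));     // text of the first child
    }
}
```

Guarding against a null first child in this way would stop the exception, but it would still not produce the URL; only the attribute accessor does that.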

Can someone explain the #text value I see below

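Each item element contains not only child elements but also text nodes, because there is whitespace between the elements. .getChildNodes() returns the text nodes as well as the element nodes. A text node has no children either, so .getFirstChild() returns null for it too, which is why the same NullPointerException follows each #text log line.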

One way to skip the text nodes is to add something like this to the body of the for loop:

if ("#text".equals(nodeName)) {
    continue;
}
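An alternative sketch, assuming only element children matter: test the node type instead of the node name, which also skips comments and other non-element nodes. ChildNodesDemo and its methods are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ChildNodesDemo {

    private static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Names of every child node, including whitespace-only text nodes. */
    static List<String> allChildNames(String xml) {
        List<String> names = new ArrayList<>();
        NodeList children = root(xml).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    /** Names of element children only. */
    static List<String> elementChildNames(String xml) {
        List<String> names = new ArrayList<>();
        NodeList children = root(xml).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text, comment, and other non-element nodes
            }
            names.add(child.getNodeName());
        }
        return names;
    }
}
```

For an input such as "<item>\n  <title>A</title>\n  <link>B</link>\n</item>", allChildNames reports the #text nodes that sit between the elements, while elementChildNames returns only title and link.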

Regarding "java - How do I adjust my DOMParser to parse "media:content" from RSS in Java (Android)?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40904640/
