
java - Personal project "RSS FEED" XML parser

Reposted. Author: 行者123. Updated: 2023-12-02 06:43:02

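I'm fairly new to Java, and for many long days I've been trying to figure out how to reach the tags below for output. I'd really appreciate some insight into this problem. Nothing I've found or tried has worked. (Please excuse the cheesy news article.)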

<item>
<pubDate>Sat, 21 Sep 2013 02:30:23 EDT</pubDate>
<title>
<![CDATA[
Carmen Bryan Lashes Out at Beyonce Fans for Throwing Shade (@carmenbryan)
]]>
</title>
<link>
http://www.vladtv.com/blog/174937/carmen-bryan-lashes-out-at-beyonce-fans-for-throwing-shade/
</link>
<guid>
http://www.vladtv.com/blog/174937/carmen-bryan-lashes-out-at-beyonce-fans-for-throwing-shade/
</guid>
<description>
<![CDATA[
<img ... /><br />.
<p>In response to someone who reminded Bryan that Jay Z has Beyonce now, she tweeted.</p>
<p>Check out what else Bryan had to say above.</p>
<p>Source: </p>
]]>
</description>
</item>

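I have managed to parse the XML and print the contents of the title and description element tags, but the output for the description element also includes all of its child element tags. I hope to use this project to build up my Java portfolio, so please help!

My code so far: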

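import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class NewXmlReader
{

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        try {

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // parse(String) treats its argument as a URI, so the typed URL is fetched directly
            Document docXml = builder.parse(NewXMLReaderHandlers.inputHandler());
            docXml.getDocumentElement().normalize();

            NewXMLReaderHandlers.handleItemTags(docXml, "item");

        } catch (ParserConfigurationException | SAXException parserException) {
            System.out.println("You Are Not XML formatted !!");
            parserException.printStackTrace();
        } catch (IOException iOException) {
            System.out.println("URL NOT FOUND");
            // getCause() alone discarded the result; print the trace instead
            iOException.printStackTrace();
        }
    }

}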

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NewXMLReaderHandlers {

    private static int ARTICLELENGTH;

    public static String inputHandler() throws IOException {
        InputStreamReader inputStream = new InputStreamReader(System.in);
        BufferedReader bufferRead = new BufferedReader(inputStream);
        System.out.println("Please Enter A Proper URL: ");
        String urlPageString = bufferRead.readLine();
        return urlPageString;
    }

    public static void handleItemTags(Document document, String rssFeedParentTopicTag) {
        NodeList listOfArticles = document.getElementsByTagName(rssFeedParentTopicTag);
        NewXMLReaderHandlers.ARTICLELENGTH = listOfArticles.getLength();
        String rootElement = document.getDocumentElement().getNodeName();
        // Compare strings with equals(); == only checks reference identity
        if ("rss".equals(rootElement)) {
            System.out.println("We Have An RSS Feed To Parse");

            for (int i = 0; i < NewXMLReaderHandlers.ARTICLELENGTH; i++) {
                Node itemNode = listOfArticles.item(i);
                if (itemNode.getNodeType() == Node.ELEMENT_NODE) {
                    Element itemElement = (Element) itemNode;
                    tagContent(itemElement, "title");
                    tagContent(itemElement, "description");
                }
            }
        }

    }

    public static void tagContent(Element item, String tagName) {
        NodeList tagNodeList = item.getElementsByTagName(tagName);
        Element tagElement = (Element) tagNodeList.item(0);
        NodeList tagTElist = tagElement.getChildNodes();
        // First child only; this may be a whitespace text node rather than the CDATA section
        Node tagNode = tagTElist.item(0);

        // System.out.println( " - " + tagName + " : " + tagNode.getNodeValue() + "\n");
        if ("description".equals(tagName)) {
            System.out.println(" - " + tagName + " : " + tagNode.getNodeValue() + "\n\n");
            // getNextSibling() returns null when there is no further child node
            Node sibling = tagNode.getNextSibling();
            System.out.println(" Do We Have Any Siblings? " + (sibling == null ? "none" : sibling.getNodeValue()) + "\n");
        }
    }
}

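Best answer

For me, the simplest solution would be to use the XPath API.

Essentially, it is a query language for XML. See the XPath Tutorial as a primer.

This example uses an RSS feed from SO, which uses <entry...> instead of <item>, but I have used the same technique for other RSS (and XML) files and even for quite complex HTML documents...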
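As a rough primer on what "query language" means here, the sketch below evaluates XPath expressions against a small in-memory document. The class name XPathPrimer, the helper query, and the sample XML are made up for illustration; only the javax.xml.xpath calls are standard JDK API.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathPrimer {

    // Hypothetical sample document, not taken from a real feed.
    static final String SAMPLE =
            "<rss><channel>"
          + "<item><title>First</title></item>"
          + "<item><title>Second</title></item>"
          + "</channel></rss>";

    // Evaluates an XPath expression against an XML string and returns
    // the string value of the result (for a node set, that of its first node).
    static String query(String xml, String expression) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return xPath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Absolute path with a positional predicate (XPath positions start at 1).
        System.out.println(query(SAMPLE, "/rss/channel/item[1]/title"));
        // "//" searches at any depth below the document root.
        System.out.println(query(SAMPLE, "//item[2]/title"));
    }
}
```

The path syntax reads much like a file system path: each step names a child element, and a predicate in square brackets narrows the match.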

import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class TestRSSFeed {

    public static void main(String[] args) {
        try {
            // Read the feed...
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Document doc = factory.newDocumentBuilder().parse("http://stackoverflow.com/feeds/tag?tagnames=java&sort=newest");
            Element root = doc.getDocumentElement();

            // Create an XPath instance
            XPath xPath = XPathFactory.newInstance().newXPath();
            // Find all the nodes named <entry...> anywhere in
            // the document that live under the parent node...
            XPathExpression entryExpression = xPath.compile("//entry");
            NodeList nl = (NodeList) entryExpression.evaluate(root, XPathConstants.NODESET);

            // This is a sub node search.
            // The search is based on the parent node and looks for a single
            // node named "title" that belongs to the parent node...
            // I did this because I'm only expecting a single node...
            // (compiled once here rather than on every loop iteration)
            XPathExpression titleExpression = xPath.compile("title");

            System.out.println("Found " + nl.getLength() + " items...");
            for (int index = 0; index < nl.getLength(); index++) {
                Node node = nl.item(index);
                Node child = (Node) titleExpression.evaluate(node, XPathConstants.NODE);
                System.out.println(child.getTextContent());
            }

        } catch (IOException | ParserConfigurationException | SAXException | XPathExpressionException exp) {
            exp.printStackTrace();
        }
    }

}
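The same approach might be adapted to the <item> feed from the question roughly as follows. This is only a sketch: the class ItemFeedSketch, its methods, and the cut-down sample feed are hypothetical. The description in that feed is a CDATA section, so the parser hands back its HTML markup as plain text; the regex-based stripTags helper is a crude assumption of mine for removing that markup, not something the XPath API provides, and a real HTML parser would be more robust.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ItemFeedSketch {

    // Cut-down, hypothetical feed shaped like the <item> in the question.
    static final String SAMPLE =
            "<rss><channel><item>"
          + "<title><![CDATA[ Example headline ]]></title>"
          + "<description><![CDATA[<img src=\"x.png\" /><br />"
          + "<p>First paragraph.</p><p>Second paragraph.</p>]]></description>"
          + "</item></channel></rss>";

    // Crude tag stripper (an assumption for this sketch, not a real HTML parser):
    // replaces anything between < and > with a space, then collapses whitespace.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Returns one "title : description" line per <item>, with markup removed.
    static List<String> describeItems(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xPath.evaluate("//item", doc, XPathConstants.NODESET);

            List<String> lines = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Node item = items.item(i);
                // Relative expressions are evaluated against the current <item>.
                String title = xPath.evaluate("title", item).trim();
                String description = stripTags(xPath.evaluate("description", item));
                lines.add(title + " : " + description);
            }
            return lines;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException("Could not read feed", e);
        }
    }

    public static void main(String[] args) {
        describeItems(SAMPLE).forEach(System.out::println);
    }
}
```

Evaluating "description" as a string returns the whole text content of the element, so there is no need to walk child nodes and siblings by hand as in the question's tagContent method.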

Now, you can do some really complex queries, but I thought I'd start with a basic example ;)
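To give a flavour of slightly more involved queries, here is a small sketch using predicates and XPath functions. The class XPathPredicates, its helpers, and the sample items with example.com links are invented for illustration; contains(), last(), and count() are standard XPath 1.0 functions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathPredicates {

    // Hypothetical feed used only for this example.
    static final String SAMPLE =
            "<rss><channel>"
          + "<item><title>Java streams</title><link>http://example.com/1</link></item>"
          + "<item><title>Cooking pasta</title><link>http://example.com/2</link></item>"
          + "<item><title>Java records</title><link>http://example.com/3</link></item>"
          + "</channel></rss>";

    // Returns the text content of every node matched by the expression.
    static List<String> selectText(String xml, String expression) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xPath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    // Counts <item> elements by asking XPath for a number instead of a node set.
    static double countItems(String xml) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return (Double) xPath.evaluate(
                    "count(//item)", new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not count items", e);
        }
    }

    public static void main(String[] args) {
        // Links of items whose title contains "Java".
        System.out.println(selectText(SAMPLE, "//item[contains(title, 'Java')]/link"));
        // Title of the last item under its parent.
        System.out.println(selectText(SAMPLE, "//item[last()]/title"));
        System.out.println(countItems(SAMPLE));
    }
}
```

Filtering inside the expression keeps the Java side down to a single loop over the matches.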

Regarding java - Personal project "RSS FEED" XML parser, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18930920/
