java - SAX parser for a very large XML file

Reposted. Author: 行者123. Updated: 2023-12-04 06:28:55

I am processing a very large XML file (4 GB) and I keep getting an out-of-memory error: my Java heap has reached its maximum size. Here is the code:

Handler h1 = new Handler("post");
Handler h2 = new Handler("comment");
posts = new Hashtable<Integer, Posts>();
comments = new Hashtable<Integer, Comments>();
edges = new Hashtable<String, Edges>();
try {
    output = new BufferedWriter(new FileWriter("gephi.gdf"));
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    SAXParser parser1 = SAXParserFactory.newInstance().newSAXParser();

    parser.parse(new File("G:\\posts.xml"), h1);
    parser1.parse(new File("G:\\comments.xml"), h2);
} catch (Exception ex) {
    ex.printStackTrace();
}

@Override
public void startElement(String uri, String localName, String qName,
        Attributes atts) throws SAXException {
    if (qName.equalsIgnoreCase("row") && type.equals("post")) {
        post = new Posts();
        post.id = Integer.parseInt(atts.getValue("Id"));
        post.postTypeId = Integer.parseInt(atts.getValue("PostTypeId"));
        if (atts.getValue("AcceptedAnswerId") != null)
            post.acceptedAnswerId = Integer.parseInt(atts.getValue("AcceptedAnswerId"));
        else
            post.acceptedAnswerId = -1;
        post.score = Integer.parseInt(atts.getValue("Score"));
        if (atts.getValue("OwnerUserId") != null)
            post.ownerUserId = Integer.parseInt(atts.getValue("OwnerUserId"));
        else
            post.ownerUserId = -1;
        if (atts.getValue("ParentId") != null)
            post.parentId = Integer.parseInt(atts.getValue("ParentId"));
        else
            post.parentId = -1;
    }
    else if (qName.equalsIgnoreCase("row") && type.equals("comment")) {
        comment = new Comments();
        comment.id = Integer.parseInt(atts.getValue("Id"));
        comment.postId = Integer.parseInt(atts.getValue("PostId"));
        if (atts.getValue("Score") != null)
            comment.score = Integer.parseInt(atts.getValue("Score"));
        else
            comment.score = -1;
        if (atts.getValue("UserId") != null)
            comment.userId = Integer.parseInt(atts.getValue("UserId"));
        else
            comment.userId = -1;
    }
}



public void endElement(String uri, String localName, String qName)
        throws SAXException {
    if (qName.equalsIgnoreCase("row") && type.equals("post")) {
        posts.put(post.id, post);
        //System.out.println("Size of hash table is " + posts.size());
    } else if (qName.equalsIgnoreCase("row") && type.equals("comment"))
        comments.put(comment.id, comment);
}

Is there any way to optimize this code so that it doesn't run out of memory? Perhaps by using streams? If so, how would you do it?

Best answer

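The SAX parser is efficient to a fault: it streams the document instead of loading it into memory, so the parser itself is unlikely to be what exhausts the heap.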

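The posts, comments, and edges maps (declared as Hashtable in the code above) immediately jump out at me as potential problems, since every parsed row is stored in them and never released. I suspect you will need to periodically flush these maps out of memory to avoid an OutOfMemoryError (OOME).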
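
The periodic-flush idea could look roughly like the sketch below. It is not the answerer's code. The class name FlushingPostHandler, the batchSize parameter, the comma-separated "id,ownerUserId,parentId" output line, and the convert helper are all assumptions made for illustration. Unlike the question's code, a missing Id attribute is also mapped to -1 here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: holds at most batchSize rows in memory,
// then writes them to the Writer and clears the buffer.
class FlushingPostHandler extends DefaultHandler {
    private final Writer out;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();

    FlushingPostHandler(Writer out, int batchSize) {
        this.out = out;
        this.batchSize = batchSize;
    }

    // Same convention as the question: a missing attribute becomes -1.
    private static int intAttr(Attributes atts, String name) {
        String value = atts.getValue(name);
        return value == null ? -1 : Integer.parseInt(value);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        if (!qName.equalsIgnoreCase("row")) {
            return;
        }
        // Assumed output line format: id,ownerUserId,parentId
        buffer.add(intAttr(atts, "Id") + "," + intAttr(atts, "OwnerUserId")
                + "," + intAttr(atts, "ParentId"));
        if (buffer.size() >= batchSize) {
            flushBuffer();
        }
    }

    @Override
    public void endDocument() throws SAXException {
        flushBuffer(); // write the last, possibly partial, batch
    }

    private void flushBuffer() throws SAXException {
        try {
            for (String line : buffer) {
                out.write(line);
                out.write('\n');
            }
            out.flush();
        } catch (IOException e) {
            throw new SAXException(e);
        }
        buffer.clear(); // release the rows so they can be garbage collected
    }

    // Demo helper: parses XML held in a String and returns the written lines.
    static String convert(String xml, int batchSize) {
        StringWriter out = new StringWriter();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new FlushingPostHandler(out, batchSize));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<posts><row Id=\"1\" OwnerUserId=\"7\"/>"
                + "<row Id=\"2\" ParentId=\"1\"/></posts>";
        System.out.print(convert(xml, 1));
    }
}
```

For a real file, the same handler could be passed to parser.parse(new File(...), handler) together with a BufferedWriter instead of the StringWriter used here. Flushing only helps if later steps do not need to look up earlier rows. If edges between posts and comments must be resolved across both files, it may be necessary to keep just the few integer fields that the lookup requires rather than whole objects.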

Regarding "java - SAX parser for a very large XML file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/5684239/
