
java - How to get the values of specific elements from a large XML file

Repost. Author: 行者123. Updated: 2023-12-01 09:06:16

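I am a beginner with Java SAX. I have a large XML file from which I want to extract some information. Below are an excerpt of the XML file, the output I want, and my code: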

Excerpt from the XML file:

    ...
<Synset baseConcept="3" id="mizaAj_n2AR">
<SynsetRelations>
<SynsetRelation relType="hyponym" targets="TaboE_n2AR"/>
<SynsetRelation relType="hyponym" targets="TaboE_n2AR"/>
<SynsetRelation relType="hypernym" targets="ragobap_n4AR"/>
<SynsetRelation relType="hypernym" targets="ragobap_n4AR"/>
<SynsetRelation relType="hypernym" targets="Tiybap_Aln~afos_n1AR"/>
<SynsetRelation relType="hypernym" targets="Tiybap_Aln~afos_n1AR"/>
</SynsetRelations>
<MonolingualExternalRefs>
<MonolingualExternalRef externalReference="04623612-n" externalSystem="PWN30"/>
</MonolingualExternalRefs>
</Synset>
<Synset baseConcept="3" id="ragobap_n4AR">
<SynsetRelations>
<SynsetRelation relType="antonym" targets="mizaAj_n2AR"/>
<SynsetRelation relType="antonym" targets="mizaAj_n2AR"/>
</SynsetRelations>
<MonolingualExternalRefs>
<MonolingualExternalRef externalReference="04624826-n" externalSystem="PWN30"/>
</MonolingualExternalRefs>
</Synset>
<Synset baseConcept="3" id="tasal~uT_n1AR">
<SynsetRelations>
<SynsetRelation relType="has_instance" targets="simap_n1AR"/>
<SynsetRelation relType="is_instance" targets="simap_n1AR"/>
</SynsetRelations>
<MonolingualExternalRefs>
<MonolingualExternalRef externalReference="04625882-n" externalSystem="PWN30"/>
</MonolingualExternalRefs>
</Synset>
...

What I want:

hyponym: 2
hypernym: 4
antonym: 2
has_instance: 1
is_instance: 1

Code (main class and my handler):

import java.io.IOException;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class Main {

    public static void main(String[] args) throws SAXException, IOException {
        // Create a SAX reader and register the custom handler
        XMLReader p = XMLReaderFactory.createXMLReader();
        p.setContentHandler(new handler());
        p.parse("test1.xml");
    }
}
----------------------------------------
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class handler extends DefaultHandler {

    @Override
    public void startElement(String namespaceURI, String localName,
            String qName, Attributes attrs) {

        System.out.println("qname = " + qName);

        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                // Get the attribute name
                String aname = attrs.getLocalName(i);
                // And print its value
                System.out.println("Attribute " + aname + " value: " + attrs.getValue(i));
            }
        }
    }
}

Best answer

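import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Place this method inside a class (for example Main); it is static so main can call it directly
public static Map<String, Integer> countElements(File xmlFile) {

    Map<String, Integer> counts = new HashMap<>();

    // try-with-resources closes the file stream even if parsing fails
    try (FileInputStream fileInputStream = new FileInputStream(xmlFile)) {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        XMLStreamReader reader = inputFactory.createXMLStreamReader(fileInputStream);

        while (reader.hasNext()) {
            reader.next();
            if (reader.isStartElement() && reader.getLocalName().equals("SynsetRelation")) {
                // A null namespace matches the attribute by local name only
                String relTypeValue = reader.getAttributeValue(null, "relType");

                // First occurrence of this relType: start its counter at 0
                if (!counts.containsKey(relTypeValue)) {
                    counts.put(relTypeValue, 0);
                }

                counts.put(relTypeValue, counts.get(relTypeValue) + 1);
            }
        }

        reader.close();
    } catch (XMLStreamException | IOException e) {
        e.printStackTrace();
    }

    return counts;
}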

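This code uses a streaming reader (StAX), which walks through the document one event at a time instead of loading the whole file into memory. That keeps memory use low and makes it efficient even for large files.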

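A map keeps track of the counts. Each time a "SynsetRelation" element is encountered, I first check whether its relType value already has an entry in the map, and then increment its counter.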
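The check-then-increment step can also be written with Map.merge, which is available since Java 8. The following is only a sketch of that idiom on a plain list of strings; the class name RelTypeTally and the method tally are made up for illustration and are not part of the answer.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RelTypeTally {
    // Hypothetical helper: counts how often each value occurs.
    // LinkedHashMap keeps the keys in first-seen order.
    static Map<String, Integer> tally(List<String> relTypes) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String relType : relTypes) {
            // Stores 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(relType, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(tally(Arrays.asList("hyponym", "hyponym", "antonym")));
    }
}
```

In countElements, the same call would replace the containsKey check and the two put calls.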

The result is a map containing the count for each relType value found.
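To print such a map in the "name: count" layout asked for in the question, something like the following sketch could be used. CountPrinter and format are hypothetical names. Note that a HashMap does not guarantee any iteration order, so a LinkedHashMap is assumed here if the lines should appear in the order the values were first seen.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CountPrinter {
    // Hypothetical helper: renders each entry as "name: count", one per line,
    // in whatever order the given map iterates.
    static String format(Map<String, Integer> counts) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            out.append(entry.getKey()).append(": ").append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample values taken from the question's expected output
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("hyponym", 2);
        counts.put("hypernym", 4);
        System.out.print(format(counts));
    }
}
```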

You can use it from your main class like this:

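import java.io.File;
import java.util.Map;

public class Main {
    // Assumes countElements(File) from the answer is declared as a static method of this class
    public static void main(String[] args) {
        Map<String, Integer> results = countElements(new File("your file location here.xml"));
        System.out.println(results);
    }
}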
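Since the question started from SAX, the same counting can also be done in a SAX handler, which is likewise event-based and does not build the document in memory. This is a sketch rather than the answer's method: the class name RelTypeHandler, its count helper and the inline sample XML (including the Synsets wrapper element) are assumptions for illustration. It relies on the default SAXParserFactory not being namespace-aware, so element names are compared via qName.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class RelTypeHandler extends DefaultHandler {
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // The default factory is not namespace-aware, so the name arrives in qName
        if ("SynsetRelation".equals(qName)) {
            String relType = attrs.getValue("relType");
            if (relType != null) {
                counts.merge(relType, 1, Integer::sum);
            }
        }
    }

    Map<String, Integer> getCounts() {
        return counts;
    }

    // Hypothetical convenience method: parses the source and returns the counts,
    // wrapping checked parser exceptions in an unchecked one.
    static Map<String, Integer> count(InputSource source) {
        try {
            RelTypeHandler handler = new RelTypeHandler();
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
            return handler.getCounts();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample; for a real file, pass new InputSource(file.toURI().toString())
        String xml = "<Synsets><Synset id=\"a\"><SynsetRelations>"
                + "<SynsetRelation relType=\"hyponym\" targets=\"x\"/>"
                + "<SynsetRelation relType=\"antonym\" targets=\"y\"/>"
                + "</SynsetRelations></Synset></Synsets>";
        System.out.println(count(new InputSource(new StringReader(xml))));
    }
}
```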

Regarding "java - How to get the values of specific elements from a large XML file", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/41259939/
