
java - Fast replacement of XML node values

Reposted. Author: 行者123. Updated: 2023-11-30 04:14:58

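I have a bunch of XML documents containing personal information that I need to replace with fake data. The Person node contains the following elements: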

  • uuid - required, must not be touched.
  • firstName - optional
  • lastName - optional
  • address - optional
  • personID - required

A person may appear more than once, in which case the same fake data should be used: if two Person nodes have the same personID, both should receive the same fake ID.
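One way to meet this requirement without keeping a lookup table is to derive the fake ID deterministically from the real personID. This is an assumption, not something the question or answer uses: the sketch below keys an HMAC-SHA256 with a secret, so the same personID always maps to the same 12-digit fake ID (across documents and runs), while the mapping cannot be recomputed without the key. The class name `DeterministicFakeId` and the 12-digit format are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: derives a stable 12-digit fake personID from the real one.
class DeterministicFakeId {
    private final Mac mac;

    DeterministicFakeId(byte[] secretKey) {
        try {
            mac = Mac.getInstance("HmacSHA256"); // required on every Java SE platform
            mac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Same realId + same key -> same fake ID. Mac is not thread-safe, hence synchronized.
    synchronized String fakePersonId(String realId) {
        // doFinal() computes the HMAC and resets the Mac for the next call
        byte[] digest = mac.doFinal(realId.getBytes(StandardCharsets.UTF_8));
        long value = 0;
        for (int i = 0; i < 8; i++) {
            value = (value << 8) | (digest[i] & 0xFF); // first 8 digest bytes as a long
        }
        return String.format("%012d", Math.floorMod(value, 1_000_000_000_000L));
    }
}
```

Two different real IDs can still collide in a 12-digit space; if fake IDs must be unique, a cache that hands out fresh IDs (as the question's `getFakePerson()` comment describes) is the safer choice.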

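I have implemented some Java code that builds a DOM tree from the XML string and replaces the nodes before writing it back to a string. This works fine, but since I have so many documents, I wonder whether there is a faster approach. Perhaps via regular expressions or XSLT?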
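On the regex idea: a regex rewrite can work only when the documents are known to have a simple, fixed layout. The sketch below is hypothetical (`RegexPersonScrubber`, `scrub` and the `fakeFor` callback are illustration names) and assumes unprefixed `<Person>` elements without attributes, no nested Person elements, and children that hold plain text (no CDATA, comments or self-closing tags). On anything else it fails silently, which is the usual argument for a real XML parser. Whether it is faster than a streaming parser is not measured here.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based scrubber; only safe for the simple, fixed layout described above.
class RegexPersonScrubber {
    private static final Pattern PERSON = Pattern.compile("<Person>(.*?)</Person>", Pattern.DOTALL);
    private static final Pattern PERSON_ID = Pattern.compile("<personID>([^<]*)</personID>");

    // fakeFor maps a real personID to fake text per element name, e.g. "firstName" -> "Fake".
    static String scrub(String xml, Function<String, Map<String, String>> fakeFor) {
        Matcher person = PERSON.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (person.find()) {
            String body = person.group(1);
            Matcher id = PERSON_ID.matcher(body);
            if (id.find()) {
                for (Map.Entry<String, String> e : fakeFor.apply(id.group(1)).entrySet()) {
                    String tag = Pattern.quote(e.getKey());
                    String element = "<" + e.getKey() + ">" + escape(e.getValue()) + "</" + e.getKey() + ">";
                    body = body.replaceAll("<" + tag + ">[^<]*</" + tag + ">", Matcher.quoteReplacement(element));
                }
            }
            // quoteReplacement keeps '$' and '\' in the text from being read as group references
            person.appendReplacement(out, Matcher.quoteReplacement("<Person>" + body + "</Person>"));
        }
        person.appendTail(out);
        return out.toString();
    }

    // Escapes markup characters in text content.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```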

Here is a sample document:

<ADocument>
    <Stuff>
        ...
    </Stuff>
    <OtherStuff>
        ...
    </OtherStuff>
    <Person>
        <uuid>11111111-1111-1111-1111-111111111111</uuid>
        <firstName>Some</firstName>
        <lastName>Person</lastName>
        <personID>111111111111</personID>
    </Person>
    <Person>
        <uuid>22222222-2222-2222-2222-222222222222</uuid>
        <firstName>Another Person</firstName>
        <address>Main St. 2</address>
        <personID>222222222222</personID>
    </Person>
    <Person>
        <uuid>33333333-3333-3333-3333-333333333333</uuid>
        <firstName>Some</firstName>
        <lastName>Person</lastName>
        <personID>111111111111</personID>
    </Person>
    <MoreStuff>
        ...
    </MoreStuff>
</ADocument>

Here is my current implementation:

public String replaceWithFalseData(String xmlInstance) {
    Document dom = toDOM(xmlInstance);

    XPathExpression xPathExpression = XPathExpressionFactory.createXPathExpression("//Person");
    List<Node> nodeList = xPathExpression.evaluateAsNodeList(dom);

    for(Node personNode : nodeList) {
        Map<String, Node> childNodes = getChildNodes(personNode);
        String personID = childNodes.get("personID").getTextContent();
        // Retrieve a cached fake person using the ID, or create a new one if none exists.
        Person fakePerson = getFakePerson(personID);

        setIfExists(childNodes.get("firstName"), fakePerson.getFirstName());
        setIfExists(childNodes.get("lastName"), fakePerson.getLastName());
        setIfExists(childNodes.get("address"), fakePerson.getAddress());
        setIfExists(childNodes.get("personID"), fakePerson.getPersonID());
    }

    return toString(dom);
}

public Map<String, Node> getChildNodes(Node parent) {
    Map<String, Node> childNodes = new HashMap<String, Node>();
    for(Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
        if(child.getLocalName() != null) {
            childNodes.put(child.getLocalName(), child);
        }
    }
    return childNodes;
}

public void setIfExists(Node node, String value) {
    if(node != null) {
        node.setTextContent(value);
    }
}
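The question does not show `toDOM()` and `toString()`. A minimal sketch of what they might look like with the JDK's JAXP APIs follows; the class name `DomHelpers` is hypothetical. One detail matters for the code above: `getChildNodes()` keys on `getLocalName()`, which returns null for elements parsed by a non-namespace-aware `DocumentBuilder` (the JAXP default), so the factory is made namespace-aware. The factories are created once and reused rather than looked up per document; they are not guaranteed thread-safe, so single-threaded use is assumed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical versions of the question's toDOM()/toString() helpers (single-threaded use assumed).
class DomHelpers {
    private static final DocumentBuilderFactory DOCS = DocumentBuilderFactory.newInstance();
    private static final TransformerFactory TRANSFORMERS = TransformerFactory.newInstance();

    static {
        // Without this, elements are created DOM Level 1 style and getLocalName() returns null.
        DOCS.setNamespaceAware(true);
    }

    static Document toDOM(String xml) throws ParserConfigurationException, SAXException, IOException {
        return DOCS.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    static String toString(Document dom) throws TransformerException {
        Transformer identity = TRANSFORMERS.newTransformer(); // no stylesheet: copies the tree as-is
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }
}
```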

Best answer

You are using a DOM-based API. A faster replacement can be achieved with the Streaming API for XML (StAX), which outperforms DOM-based APIs in many cases: StAX versus DOM

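The DOM API uses more memory than StAX, because it builds the whole document tree in memory, and that can degrade performance. On the other hand, it is easier to use than the StAX API.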
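To make the streaming model concrete: with StAX, events are pulled one at a time and written straight to the output, so only the current event is held in memory rather than a whole tree. The minimal sketch below (the class `StaxPassThrough` and its `replaceText` method are hypothetical) works on Strings, like the question's `replaceWithFalseData()`, and overwrites the text of every element with a given local name. It assumes that element is a leaf holding only text. It does not by itself solve the personID problem: the fake values depend on personID, which comes last inside Person, so the events of one Person element have to be buffered first.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example: streams XML through unchanged except for the text of one leaf element.
class StaxPassThrough {
    private static final XMLInputFactory IN = XMLInputFactory.newInstance();
    private static final XMLOutputFactory OUT = XMLOutputFactory.newInstance();
    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    static {
        IN.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE); // one Characters event per text run
    }

    static String replaceText(String xml, String elementName, String replacement) throws XMLStreamException {
        XMLEventReader reader = IN.createXMLEventReader(new StringReader(xml));
        StringWriter sw = new StringWriter();
        XMLEventWriter writer = OUT.createXMLEventWriter(sw);
        boolean insideTarget = false;
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(elementName)) {
                writer.add(event);
                writer.add(EVENTS.createCharacters(replacement));
                insideTarget = true; // drop the original text until the end tag
                continue;
            }
            if (insideTarget && event.isEndElement()) {
                insideTarget = false;
            }
            if (!insideTarget) {
                writer.add(event);
            }
        }
        writer.close();
        reader.close();
        return sw.toString();
    }
}
```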

Here is a working solution for your example. Tested on a 150 MB XML file, it completed the replacement in under 10 seconds:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class ReplaceXmlWithFakeUser
{
    public static void main(String[] args) throws XMLStreamException, IOException
    {
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        // Deliver each run of text as a single Characters event (the parser may otherwise split it).
        inFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();

        // try-with-resources closes (and thereby flushes) both files, even on error.
        try (InputStream in = new BufferedInputStream(new FileInputStream("c:\\temp\\persons.xml"));
             OutputStream out = new BufferedOutputStream(new FileOutputStream("c:\\temp\\fakePersons.xml")))
        {
            XMLEventReader eventReader = inFactory.createXMLEventReader(in);
            XMLEventWriter writer = factory.createXMLEventWriter(out);
            while (eventReader.hasNext())
            {
                XMLEvent event = eventReader.nextEvent();

                if (isStartElement(event, "Person"))
                {
                    //write Person startElement:
                    writer.add(event);

                    /*
                    STEP 1:
                    personID is at the end of the Person element, so firstName, lastName and address cannot be
                    overwritten with fake data yet: getFakePerson() must be called first.
                    Read up to the Person END element and remember all events within it; they are overwritten in step 2.
                    */
                    Person fakePerson = null;
                    List<XMLEvent> eventsWithinPersonElement = new ArrayList<XMLEvent>();

                    event = eventReader.nextEvent();
                    while (!(event.isEndElement() && event.asEndElement().getName().getLocalPart().equals("Person")))
                    {
                        eventsWithinPersonElement.add(event);

                        if (isStartElement(event, "personID"))
                        {
                            XMLEvent personIDContentEvent = eventReader.nextEvent();
                            // getData() returns the text itself; Characters.toString() is not specified to.
                            String personId = personIDContentEvent.asCharacters().getData();
                            fakePerson = getFakePerson(personId);

                            eventsWithinPersonElement.add(personIDContentEvent);
                        }

                        event = eventReader.nextEvent();
                    }
                    XMLEvent personEndElement = event;

                    //STEP 2: write the remembered events, swapping in the fake text.
                    for (int i = 0; i < eventsWithinPersonElement.size(); i++)
                    {
                        XMLEvent eventWithinPersonElement = eventsWithinPersonElement.get(i);
                        writer.add(eventWithinPersonElement);

                        String fakeValue = fakeValueFor(eventWithinPersonElement, fakePerson);
                        if (fakeValue != null)
                        {
                            writer.add(eventFactory.createCharacters(fakeValue));

                            //skip the real text; an empty element such as <address/> has none to skip
                            if (i + 1 < eventsWithinPersonElement.size()
                                    && eventsWithinPersonElement.get(i + 1).isCharacters())
                            {
                                i++;
                            }
                        }
                    }

                    writer.add(personEndElement);
                }
                else
                {
                    writer.add(event);
                }
            }
            writer.flush();
            writer.close();
            eventReader.close();
        }
    }

    private static boolean isStartElement(XMLEvent event, String localName)
    {
        // Compare the local part: QName.toString() would include "{namespaceURI}" for namespaced elements.
        return event.isStartElement() && event.asStartElement().getName().getLocalPart().equals(localName);
    }

    // Returns the fake text for personID, firstName, lastName and address start elements, else null.
    private static String fakeValueFor(XMLEvent event, Person fakePerson)
    {
        if (fakePerson == null || !event.isStartElement())
        {
            return null;
        }
        switch (event.asStartElement().getName().getLocalPart())
        {
            case "personID":  return fakePerson.personId;
            case "firstName": return fakePerson.firstName;
            case "lastName":  return fakePerson.lastName;
            case "address":   return fakePerson.address;
            default:          return null;
        }
    }

    private static Person getFakePerson(String personId)
    {
        //create simple fake user...
        //NOTE: this stub neither caches nor replaces the ID, so repeated personIDs get different fake names
        //and keep their real ID; use a cached lookup (as in the question) for consistent fake data.

        Person fakePerson = new Person();
        fakePerson.personId = personId;
        fakePerson.firstName = "fake first name: " + Math.random();
        fakePerson.lastName = "fake last name: " + Math.random();
        fakePerson.address = "fake address: " + Math.random();

        return fakePerson;
    }

    static class Person
    {
        String personId;
        String firstName;
        String lastName;
        String address;
    }
}

With persons.xml as input:

<ADocument>
    <Stuff>
        <StuffA></StuffA>
    </Stuff>
    <OtherStuff>
        <OtherStuff>
            <ABC>yada yada</ABC>
        </OtherStuff>
    </OtherStuff>

    <Person>
        <uuid>11111111-1111-1111-1111-111111111111</uuid>
        <firstName>Some</firstName>
        <lastName>Person</lastName>
        <personID>111111111111</personID>
    </Person>
    <Person>
        <uuid>22222222-2222-2222-2222-222222222222</uuid>
        <firstName>Another Person</firstName>
        <address>Main St. 2</address>
        <personID>222222222222</personID>
    </Person>
    <Person>
        <uuid>33333333-3333-3333-3333-333333333333</uuid>
        <firstName>Some</firstName>
        <lastName>Person</lastName>
        <personID>111111111111</personID>
    </Person>

    <MoreStuff>
        <foo></foo>
        <foo>fooo</foo>
        <foo><bar></bar></foo>
        <foo>
            <bar></bar>
            <bar/>
            <bar>bb</bar>
        </foo>
        <bar/>
    </MoreStuff>

</ADocument>

the program produces this fakePersons.xml:

<?xml version="1.0" encoding="UTF-8"?><ADocument>
    <Stuff>
        <StuffA></StuffA>
    </Stuff>
    <OtherStuff>
        <OtherStuff>
            <ABC>yada yada</ABC>
        </OtherStuff>
    </OtherStuff>

    <Person>
        <uuid>11111111-1111-1111-1111-111111111111</uuid>
        <firstName>fake first name: 0.9518514637129984</firstName>
        <lastName>fake last name: 0.3495378044884426</lastName>
        <personID>111111111111</personID>
    </Person>
    <Person>
        <uuid>22222222-2222-2222-2222-222222222222</uuid>
        <firstName>fake first name: 0.8945739434355868</firstName>
        <address>fake address: 0.40784763231471777</address>
        <personID>222222222222</personID>
    </Person>
    <Person>
        <uuid>33333333-3333-3333-3333-333333333333</uuid>
        <firstName>fake first name: 0.7863207851479257</firstName>
        <lastName>fake last name: 0.09918620445731652</lastName>
        <personID>111111111111</personID>
    </Person>

    <MoreStuff>
        <foo></foo>
        <foo>fooo</foo>
        <foo><bar></bar></foo>
        <foo>
            <bar></bar>
            <bar></bar>
            <bar>bb</bar>
        </foo>
        <bar></bar>
    </MoreStuff>

</ADocument>
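The `getFakePerson()` stub in the answer generates new random values on every call and keeps the real personID, which is why the first and third Person elements above (both personID 111111111111) end up with different fake names. A sketch of a cached lookup that meets the question's requirement follows; the class name `FakePersonCache`, the sequential fake-ID scheme starting at 100000000000, and the random-number suffixes are assumptions for illustration. Its `Person` class mirrors the answer's, so the fields can be copied across or the classes merged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical cached replacement for the answer's getFakePerson() stub (not thread-safe).
class FakePersonCache {
    static class Person {
        String personId;
        String firstName;
        String lastName;
        String address;
    }

    private final Map<String, Person> byRealId = new HashMap<>();
    private long nextFakeId = 100_000_000_000L; // assumed 12-digit fake ID sequence

    // Returns the same fake Person every time the same real personID is seen.
    Person getFakePerson(String realId) {
        return byRealId.computeIfAbsent(realId, id -> {
            Person fake = new Person();
            fake.personId = Long.toString(nextFakeId++);
            fake.firstName = "fake first name: " + ThreadLocalRandom.current().nextInt(1_000_000);
            fake.lastName = "fake last name: " + ThreadLocalRandom.current().nextInt(1_000_000);
            fake.address = "fake address: " + ThreadLocalRandom.current().nextInt(1_000_000);
            return fake;
        });
    }
}
```

Keeping one instance alive across documents keeps the mapping consistent across them as well, at the cost of memory proportional to the number of distinct personIDs.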

A similar question about java - Fast replacement of XML node values can be found on Stack Overflow: https://stackoverflow.com/questions/18612712/
