
java - Replacing multiple words in an XML file using Java

Reposted. Author: 太空宇宙. Updated: 2023-11-04 11:42:17

I have an XML-based .tbx file that contains code like this:

<descripGrp>
<descrip type="subjectField">406001</descrip>
</descripGrp>
<langSet xml:lang="en">
<tig>
<term>competence of the Member States</term>
<termNote type="termType">fullForm</termNote>
<descrip type="reliabilityCode">3</descrip>
</tig>
</langSet>
<langSet xml:lang="pl">
<tig>
<term>kompetencje państw członkowskich</term>
<termNote type="termType">fullForm</termNote>
<descrip type="reliabilityCode">3</descrip>
</tig>
</langSet>
</termEntry>
<termEntry id="IATE-290">
<descripGrp>
<descrip type="subjectField">406001</descrip>
</descripGrp>

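I want to search the whole file (almost 50 MiB) and replace the code in every "subjectField" field with the corresponding text, e.g. 406001 stands for Political ideology and 406002 for Political institutions. I have a table of the codes and their names:

406001 Political ideology
406002 Political institutions
406003 Political philosophy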
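To make the goal concrete: each `<descrip type="subjectField">CODE</descrip>` should become `<descrip type="subjectField">NAME</descrip>`. As a minimal sketch, assuming every such element sits on a single line exactly as in the sample above and the table is held in a `Map`, a regular expression can swap each code for its name. The class and method names are hypothetical, and this regex technique is one option, not the approach from the answer below.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubjectFieldRegex {
    // Group 1 = start tag, group 2 = the code, group 3 = end tag.
    // Assumes the attribute is written exactly as type="subjectField".
    private static final Pattern SUBJECT_FIELD =
            Pattern.compile("(<descrip type=\"subjectField\">)([^<]*)(</descrip>)");

    static String replaceCodes(String text, Map<String, String> codeToName) {
        Matcher m = SUBJECT_FIELD.matcher(text);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            String code = m.group(2).trim();
            // Unknown codes are left untouched
            String name = codeToName.getOrDefault(code, code);
            // quoteReplacement stops '$' or '\' in a name being read as regex syntax
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + name + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> codeToName = new HashMap<>();
        codeToName.put("406001", "Political ideology");
        codeToName.put("406002", "Political institutions");
        codeToName.put("406003", "Political philosophy");
        String line = "<descrip type=\"subjectField\">406001</descrip>";
        System.out.println(replaceCodes(line, codeToName));
        // prints <descrip type="subjectField">Political ideology</descrip>
    }
}
```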

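There are five hundred such codes, so doing this by hand would take a very long time. I'm not a programmer (I'm learning), but I know a little Java and have written a few small applications, so I thought that would help me. The results were disappointing, though (fortunately I'm not discouraged :)).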

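This is what I wrote. It runs extremely slowly and does not replace the codes at all: it processed 1/5 of the file in 15 minutes (!). In addition, the output file has no line breaks, so all of the XML ends up on a single line.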

Any suggestions on which approach I should take?

        File log = new File("D:\\IATE\\export_EN_PL_2017-03-07_All_Langs.tbx"); // TBX file to be processed
        File newe = new File("D:\\IATE\\now.txt"); // output file
        String search = "D:\\IATE\\org.txt"; // file containing codes "40600" etc
        String replace = "D:\\IATE\\rplc.txt"; // file containing names

        try {
            FileReader fr = new FileReader(log);
            String s;
            String s1;
            String s2;
            String totalStr = "";
            String tot1 = "";
            String tot2 = "";
            FileReader fr1 = new FileReader(search);
            FileReader fr2 = new FileReader(replace);
            try (BufferedReader br = new BufferedReader(fr)) {
                try (BufferedReader br1 = new BufferedReader(fr1)) {
                    try (BufferedReader br2 = new BufferedReader(fr2)) {
                        while ((s = br.readLine()) != null) {
                            totalStr += s;
                            while ((s1 = br1.readLine()) != null) {
                                tot1 += s1;

                                while ((s2 = br2.readLine()) != null) {
                                    tot2 += s2;
                                }
                            }
                            totalStr = totalStr.replaceAll(tot1, tot2);

                            FileWriter fw = new FileWriter(newe);

                            fw.write(totalStr);
                            fw.write("\n");
                            fw.close();
                        }

                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

    }
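Both symptoms can be reproduced in a few lines. This sketch uses in-memory readers instead of the real files; the class `WhyNoMatch` and the sample strings are illustrative, not from the question. `readLine()` strips the line terminator, so `+=` glues every line together. The nested loops consume the code file and the name file completely on the first pass, leaving `tot1` as one long run of codes that `replaceAll` treats as a single regular expression, which never occurs in the data. On top of that, the loop re-scans and rewrites the whole accumulated text for every input line, so the work grows roughly with the square of the file size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class WhyNoMatch {
    // Mimics the question's loop: readLine() returns each line WITHOUT its
    // terminator, so concatenating with += produces one single line.
    static String concatLines(String text) throws IOException {
        String total = "";
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String s;
            while ((s = br.readLine()) != null) {
                total += s;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String codes = "406001\n406002\n406003\n";
        String names = "Political ideology\nPolitical institutions\nPolitical philosophy\n";
        String tot1 = concatLines(codes); // "406001406002406003"
        String tot2 = concatLines(names);
        String xml = "<descrip type=\"subjectField\">406001</descrip>\n";
        // The whole run of codes is one regex; it never appears in the XML,
        // so replaceAll returns the text unchanged.
        System.out.println(xml.replaceAll(tot1, tot2).equals(xml)); // true
    }
}
```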

Best answer

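Looping through two files to find the matching values involves a lot of redundant work. Before replacing the values in the .tbx file, load the codes and names once into a Properties object (a key-value map). Here is a function that does that: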

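    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Properties;

    public static Properties getProps(String pathToNames, String pathToNumbers) {

        Properties prop = new Properties();

        // Line i of the numbers file is the code whose name is on line i of the names file.
        // try-with-resources closes both readers even if an exception is thrown;
        // the files are assumed to be UTF-8 encoded.
        try (BufferedReader theNames = Files.newBufferedReader(Paths.get(pathToNames), StandardCharsets.UTF_8);
             BufferedReader theNumbers = Files.newBufferedReader(Paths.get(pathToNumbers), StandardCharsets.UTF_8)) {

            String name;
            String number;
            while ((name = theNames.readLine()) != null
                    && (number = theNumbers.readLine()) != null) {
                prop.setProperty(number.trim(), name.trim()); // code -> name, stray spaces ignored
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
        return prop;
    }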
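If the table is kept in a single file with the code and its name on each line (as in the table in the question, e.g. `406001 Political ideology`), the two parallel files are not needed. A hedged sketch, assuming the code is the first whitespace-separated token and the rest of the line is the name; `TableLoader.loadTable` is a hypothetical name:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

public class TableLoader {
    // Each non-blank line: "<code> <name>", e.g. "406001 Political ideology".
    static Properties loadTable(Reader source) throws IOException {
        Properties prop = new Properties();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Split on the FIRST run of whitespace only, so names may contain spaces
                String[] parts = line.split("\\s+", 2);
                if (parts.length == 2) {
                    prop.setProperty(parts[0], parts[1]);
                }
            }
        }
        return prop;
    }

    public static void main(String[] args) throws IOException {
        String table = "406001 Political ideology\n"
                + "406002 Political institutions\n"
                + "406003 Political philosophy\n";
        Properties p = loadTable(new StringReader(table));
        System.out.println(p.getProperty("406002")); // Political institutions
    }
}
```

For a real file, pass `Files.newBufferedReader(Paths.get("codes.txt"), StandardCharsets.UTF_8)` (the file name is a placeholder). `Properties.load(Reader)` also splits a key from its value at the first whitespace, but it additionally treats `#`/`!` lines as comments and backslashes as escapes, which is why the explicit split is used here.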

Assuming you are using Java 8, you can check that the function works by printing what it loaded:

    Properties thePropertiesFile = getProps("path/to/names.txt", "path/to/numbers.txt");
    thePropertiesFile.forEach((Object key, Object value) -> {
        System.out.println(key + " " + value);
    });
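Because `getProps` pairs the two files purely by line position and stops as soon as either file runs out, a missing or extra line silently drops or misaligns entries. A hedged sketch of a sanity check to run before loading; `PairCheck`/`checkPairs` are hypothetical names, and the check assumes every code is purely numeric like 406001 (which also catches the two paths being passed in the wrong order):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class PairCheck {
    // Returns a description of the first problem found, or null if the files line up.
    static String checkPairs(List<String> numbers, List<String> names) {
        if (numbers.size() != names.size()) {
            return "line counts differ: " + numbers.size() + " codes vs " + names.size() + " names";
        }
        for (int i = 0; i < numbers.size(); i++) {
            // Assumption: every code line consists of digits only
            if (!numbers.get(i).trim().matches("\\d+")) {
                return "line " + (i + 1) + " of the numbers file is not a code: " + numbers.get(i);
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PairCheck numbers.txt names.txt   (files assumed to be UTF-8)
        List<String> numbers = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> names = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        String problem = checkPairs(numbers, names);
        System.out.println(problem == null ? "OK" : problem);
    }
}
```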

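Now you can write the function that does the actual conversion. Use a PrintStream to write the output line by line, which also keeps the original line breaks.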

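    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.PrintStream;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.Properties;

    static String workingDir = System.getProperty("user.dir");

    public static void main(String[] args) {
        // Paths.get inserts the separator that workingDir + "path/..." leaves out
        Properties p = getProps(Paths.get(workingDir, "path/to/names.txt").toString(),
                                Paths.get(workingDir, "path/to/numbers.txt").toString());
        Path input = Paths.get(workingDir, "path/to/the.tbx");
        Path output = Paths.get(workingDir, "path/to/output.txt");

        // Stream the file line by line so the 50 MiB file is never held in memory.
        // UTF-8 is assumed for both files so that the Polish text survives unchanged.
        try (BufferedReader tbx = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             PrintStream ps = new PrintStream(output.toFile(), "UTF-8")) {
            String currentLine;
            while ((currentLine = tbx.readLine()) != null) {
                int c = currentLine.indexOf("subjectField");
                if (c >= 0) {
                    int start = currentLine.indexOf(">", c) + 1; // first character after the start tag
                    int end = currentLine.indexOf("<", start);   // beginning of the end tag
                    if (start > 0 && end >= 0) {                 // skip elements split across lines
                        String theNum = currentLine.substring(start, end).trim();
                        String theName = p.getProperty(theNum, ""); // unknown codes become ""
                        currentLine = currentLine.substring(0, start) + theName + currentLine.substring(end);
                    }
                }
                ps.println(currentLine); // println restores the line break readLine() removed
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }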
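The per-line logic inside that loop can be pulled out into a small method and checked against the sample lines from the question without touching the 50 MiB file. A sketch; `RewriteLine.rewriteLine` is a hypothetical name. It mirrors the `indexOf`/`substring` approach above and leaves a line unchanged when the element is not complete on that line:

```java
import java.util.Properties;

public class RewriteLine {
    // Replaces the text between the end of the subjectField start tag and the
    // next '<' with its name; lines without a complete element are returned as-is.
    static String rewriteLine(String line, Properties p) {
        int c = line.indexOf("subjectField");
        if (c < 0) {
            return line;
        }
        int start = line.indexOf('>', c) + 1; // 0 means no '>' was found
        int end = line.indexOf('<', start);
        if (start == 0 || end < 0) {
            return line; // element continues on another line: leave it alone
        }
        String code = line.substring(start, end).trim();
        return line.substring(0, start) + p.getProperty(code, "") + line.substring(end);
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("406001", "Political ideology");
        System.out.println(rewriteLine("<descrip type=\"subjectField\">406001</descrip>", p));
        System.out.println(rewriteLine("<term>competence of the Member States</term>", p));
    }
}
```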

Numbers that are not in the table are replaced with an empty string. You can change this to suit your particular use.
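If blanking unknown codes is not what you want, one option is to keep the original code in the output and collect the missing ones for review afterwards. A sketch with hypothetical names (`UnknownCodes`, `nameFor`); its result could stand in for the single-code `getProperty` lookup in `main`:

```java
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

public class UnknownCodes {
    // Looks up a code; if it is not in the table, records it in `missing`
    // and returns the code itself so the output still shows what was there.
    static String nameFor(String code, Properties p, Set<String> missing) {
        String name = p.getProperty(code);
        if (name == null) {
            missing.add(code);
            return code;
        }
        return name;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("406001", "Political ideology");
        Set<String> missing = new TreeSet<>(); // sorted, without duplicates
        System.out.println(nameFor("406001", p, missing));
        System.out.println(nameFor("999999", p, missing));
        System.out.println("Codes without a name: " + missing);
    }
}
```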

If theNum contains several comma-separated values, split it into an array:

    String theName = "";
    if (theNum.contains(",")) {
        String[] theNums = theNum.split(","); // split() returns String[], not int[]
        for (String num : theNums) {
            theName += p.getProperty(num.trim(), "");
            theName += ",";
        }
        theName = theName.replaceAll(",$", ""); // get rid of trailing comma
    } else {
        theName = p.getProperty(theNum, "");
    }
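Since Java 8, the split-and-join can also be written with streams, which avoids the trailing-comma cleanup and handles the single-code case without a separate branch (splitting a string with no comma yields a one-element array). A sketch; `MultiCodes.namesFor` is a hypothetical helper that, like the answer, assumes commas as separators and maps unknown codes to an empty string:

```java
import java.util.Arrays;
import java.util.Properties;
import java.util.stream.Collectors;

public class MultiCodes {
    // "406001,406003" -> "Political ideology,Political philosophy"
    static String namesFor(String theNum, Properties p) {
        return Arrays.stream(theNum.split(","))
                .map(String::trim)                      // tolerate "406001, 406003"
                .map(code -> p.getProperty(code, ""))   // unknown codes -> ""
                .collect(Collectors.joining(","));      // no trailing comma to remove
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("406001", "Political ideology");
        p.setProperty("406003", "Political philosophy");
        System.out.println(namesFor("406001,406003", p));
        System.out.println(namesFor("406001", p));
    }
}
```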

A similar question about java - Replacing multiple words in an XML file using Java can be found on Stack Overflow: https://stackoverflow.com/questions/42673835/
