java - How to read and write non-English characters (special characters such as Marathi, Tamil, Hindi, etc.) in Java?

Reposted. Author: 行者123. Updated: 2023-12-02 13:11:45

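I read non-English characters from an Excel file, say Marathi, and then write that text to an XML file. When I read the Marathi text from Excel and inspect it in my Java code, it is displayed correctly, but when I write it to XML from Java I get some symbols in place of the Marathi text. Please suggest how to handle this. The code is attached below.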

public void excelToXML(String path) {
    FileWriter fostream;
    PrintWriter out = null;
    String strOutputPath = "C:\\Temp\\";

    try {
        File file = new File(path);
        InputStream inputStream = new FileInputStream(file);
        Workbook wb = WorkbookFactory.create(inputStream);

        // Collect the names of all sheets in the workbook
        List<String> sheetNames = new ArrayList<String>();
        for (int i = 0; i < wb.getNumberOfSheets(); i++) {
            sheetNames.add(wb.getSheetName(i));
        }

        fostream = new FileWriter(strOutputPath + "\\" + "iTicker" + ".xml");
        out = new PrintWriter(new BufferedWriter(fostream));

        // out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.println("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>");
        out.println("<root xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">");

        for (String sheetName : sheetNames) {
            if (sheetName.equals("Sheet3")) {
                System.out.println(sheetName);
                break;
            }

            Sheet sheet = wb.getSheet(sheetName);
            boolean firstRow = true;

            // The first row holds the column headers, which become the tag names
            ArrayList<String> myStringArray = new ArrayList<String>();
            Iterator<Cell> cells = sheet.getRow(0).cellIterator();
            while (cells.hasNext()) {
                myStringArray.add(cells.next().toString());
            }

            for (Row row : sheet) {
                // Skip the header row
                if (firstRow == true) {
                    firstRow = false;
                    continue;
                }

                if (!sheetName.equals("Sheet1")) {
                    out.println("\t<element>");
                }

                for (int i = 0; i < myStringArray.size(); i++) {
                    if (row.getCell(i) != null && !(row.getCell(i)).toString().isEmpty()
                            && row.getCell(i).toString().length() > 0) {
                        if (!(myStringArray.get(i) != null
                                && myStringArray.get(i).toString().equals("Start_Epoch_Time")
                                || myStringArray.get(i).toString().equals("End_Epoch_Time"))) {
                            out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));
                        } else {
                            long ePochValue = EpochConverter.getepochValue(row.getCell(i).toString());
                            out.println(formatElement("\t\t", myStringArray.get(i), String.valueOf(ePochValue)));
                        }
                    } else {
                        blankValues.add(sheetName + ":" + "column header" + ":" + myStringArray.get(i) + ":" + "row no:" + row.getRowNum() + " " + "is blank.");
                    }
                }

                if (!sheetName.equals("Sheet1")) {
                    out.println("\t</element>");
                }
            }
        }
        out.write("</root>");
        out.flush();
        out.close();

        if (blankValues != null && blankValues.size() > 0) {
            FileUploadController.writeErrorLog(blankValues + "Please fill all the mandatory values.");
        }
    } catch (Exception e) {
        new DTHException(e.getMessage());
        e.printStackTrace();
    }
}

private static String formatCell(Cell cell) {
    if (cell == null) {
        return "";
    }

    switch (cell.getCellType()) {
        case Cell.CELL_TYPE_BLANK:
            return "";
        case Cell.CELL_TYPE_BOOLEAN:
            return Boolean.toString(cell.getBooleanCellValue());
        case Cell.CELL_TYPE_ERROR:
            return "*error*";
        case Cell.CELL_TYPE_NUMERIC:
            return df.format(cell.getNumericCellValue());
        case Cell.CELL_TYPE_STRING:
            return cell.getStringCellValue();
        default:
            return "<unknown value>";
    }
}

private static String formatElement(String prefix, String tag, String value) {
    StringBuilder sb = new StringBuilder(prefix);
    sb.append("<");
    sb.append(tag);

    if (value != null && value.length() > 0) {
        sb.append(">");
        sb.append(value);
        sb.append("</");
        sb.append(tag);
        sb.append(">");
    } else {
        sb.append("/>");
    }
    return sb.toString();
}

On the line below, I get the exact Marathi value when I inspect row.getCell(i), but I get different output after writing that value.

out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));

Best answer

Your code has two major problems.

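1) You are obviously on Windows (path C:\\Temp), but, as Axel Richter already noted in the comments, you are using the default encoding for the output file. Creating a FileWriter directly from a file name gives you the platform's default encoding, which on Windows is the Windows ANSI code page (on Java versions before 18; since Java 18 the default is UTF-8 unless configured otherwise). That is not what you want, because you then declare UTF-8 as the encoding in the XML header.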
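The symptom can be reproduced without Excel or files. The sketch below is an illustration rather than part of the original answer: the class name DefaultCharsetDemo and the helper roundTrip are hypothetical, and ISO-8859-1 is used only as a stand-in for a legacy single-byte Windows code page. It encodes a Marathi string and decodes it again, once with the single-byte charset and once with UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Encodes the text with the given charset and decodes the bytes again,
    // which is effectively what happens when a file is written and reopened.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String marathi = "मराठी";

        // The charset that new FileWriter(fileName) uses when none is given.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Assumption: ISO-8859-1 stands in for a single-byte Windows code page.
        // It has no Devanagari letters, so each character is replaced by '?'.
        System.out.println("ISO-8859-1: " + roundTrip(marathi, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent every Unicode character, so the text survives.
        System.out.println("UTF-8:      " + roundTrip(marathi, StandardCharsets.UTF_8));
    }
}
```

Note that the console may itself be unable to display Devanagari, so the printed text can look wrong even when the string is intact; comparing strings in code is more reliable than reading the console.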

You should never rely on the platform's default encoding. Always create the PrintWriter with an explicit encoding via OutputStreamWriter and FileOutputStream, like this:

PrintWriter writer = new PrintWriter(new BufferedWriter(
        new OutputStreamWriter(
                new FileOutputStream("iTicker.xml"), StandardCharsets.UTF_8)));
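As a variation on the same idea, here is a minimal sketch that uses Files.newBufferedWriter with an explicit charset and try-with-resources, then reads the file back to check the round trip. The class name Utf8FileDemo, the helpers writeUtf8 and readUtf8, and the temporary file are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileDemo {

    // Writes one line to the file as UTF-8; try-with-resources closes the writer.
    static void writeUtf8(Path file, String line) {
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
            out.println(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file back, decoding it explicitly as UTF-8.
    static String readUtf8(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("iTicker", ".xml"); // hypothetical scratch file
        String line = "<element>मराठी</element>";
        writeUtf8(file, line);
        // trim() drops the platform line separator that println appended.
        System.out.println("Round trip intact: " + readUtf8(file).trim().equals(line));
        Files.delete(file);
    }
}
```

Keep in mind that PrintWriter does not throw on write errors; call checkError() if you need to detect them.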

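2) Writing XML by hand as you do is bad practice. If you do it anyway, you must take care of special characters such as "<", ">" and "&". It is always advisable to use a library for this, because it does the escaping automatically. The Java standard library includes, for example, implementations of the XMLStreamWriter interface.

Here is a simple example of its use: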

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class WriteXml {

    public static void main(String[] args) {
        try {
            File outFile = new File("iTicker.xml");
            // Outputstream for the XML document. The XMLStreamWriter should take care of the right encoding.
            OutputStream out = new BufferedOutputStream(new FileOutputStream(outFile));

            XMLStreamWriter xmlWriter =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xmlWriter.writeStartDocument("UTF-8", "1.0");
            xmlWriter.writeCharacters("\n");
            xmlWriter.writeStartElement("root");
            xmlWriter.writeNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");

            xmlWriter.writeCharacters("\n ");
            xmlWriter.writeStartElement("element");
            // Some special characters and (I hope) some Marathi letters
            xmlWriter.writeCharacters("<>&\": मराठी वर्णमाला");
            xmlWriter.writeEndElement(); // element

            xmlWriter.writeCharacters("\n");
            xmlWriter.writeEndElement(); // root
            xmlWriter.writeEndDocument();
            xmlWriter.close(); // should be better in a finally block
            out.close(); // should be better handled automatically by try-with-resources
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

This produces the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <element>&lt;&gt;&amp;": मराठी वर्णमाला</element>
</root>
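To connect this back to the question, the hand-written formatElement approach could be replaced along the following lines. This is only a sketch under stated assumptions, not the asker's or answerer's code: the class RowsToXml and the method toXml are hypothetical, the rows are plain string lists instead of POI cells, the column headers are assumed to be valid XML element names, and the output goes to a StringWriter so the result is easy to inspect. For a real file you would pass an OutputStream as in the example above.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class RowsToXml {

    // Writes one <element> per row, with one child element per column header.
    static String toXml(List<String> headers, List<List<String>> rows) {
        StringWriter buffer = new StringWriter();
        try {
            XMLStreamWriter xml =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            xml.writeStartElement("root");
            for (List<String> row : rows) {
                xml.writeStartElement("element");
                for (int i = 0; i < headers.size(); i++) {
                    String value = row.get(i);
                    if (value == null || value.isEmpty()) {
                        // Blank cell: emit an empty tag, as formatElement did
                        xml.writeEmptyElement(headers.get(i));
                    } else {
                        xml.writeStartElement(headers.get(i));
                        xml.writeCharacters(value); // escapes <, > and &
                        xml.writeEndElement();
                    }
                }
                xml.writeEndElement(); // element
            }
            xml.writeEndElement(); // root
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        List<String> headers = Arrays.asList("Name", "Note");
        List<List<String>> rows =
                Arrays.asList(Arrays.asList("मराठी", "a < b & c"));
        System.out.println(toXml(headers, rows));
    }
}
```

The point of the design is that cell values go through writeCharacters, so a cell containing "<" or "&" can no longer break the document, and no character encoding decision is hidden inside string concatenation.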

About "java - How to read and write non-English characters (special characters such as Marathi, Tamil, Hindi, etc.) in Java?": a similar question was found on Stack Overflow: https://stackoverflow.com/questions/43933519/
