
Converting a file's encoding from ANSI to UTF-8 in Java

Reposted. Author: 搜寻专家. Updated: 2023-11-01 02:12:35

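I need to change a file's encoding from ANSI (windows-1252) to UTF-8, and I wrote the program below to do it in Java. The program converts the characters to UTF-8, but when I open the resulting file in Notepad++, the encoding is shown as "ANSI as UTF-8". When I then import the file into a Microsoft Access database, I get an error. I need the file to be encoded as UTF-8 only. A further requirement is that the conversion must happen without opening the file in any editor.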

import java.io.*;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class ConvertFromAnsiToUtf8 {

    private static final char BYTE_ORDER_MARK = '\uFEFF';
    private static final String ANSI_CODE = "windows-1252";
    private static final String UTF_CODE = "UTF8";
    private static final Charset ANSI_CHARSET = Charset.forName(ANSI_CODE);

    public static void main(String[] args) {

        List<File> fileList;
        File inputFolder = new File(args[0]);
        if (!inputFolder.isDirectory()) {
            return;
        }
        // Output goes to a sibling folder named "<input>_converted".
        File parentDir = new File(inputFolder.getParent() + "\\"
                + inputFolder.getName() + "_converted");

        if (parentDir.exists()) {
            return;
        }
        if (!parentDir.mkdir()) {
            return;
        }

        fileList = new ArrayList<File>();
        for (final File fileEntry : inputFolder.listFiles()) {
            fileList.add(fileEntry);
        }

        try {
            for (File file : fileList) {
                // try-with-resources flushes and closes both streams after
                // each file, not only after the last one.
                try (InputStream in = new FileInputStream(file.getAbsoluteFile());
                     Reader reader = new InputStreamReader(in, ANSI_CHARSET);
                     OutputStream out = new FileOutputStream(
                             parentDir.getAbsoluteFile() + "\\" + file.getName());
                     Writer writer = new OutputStreamWriter(out, UTF_CODE)) {
                    writer.write(BYTE_ORDER_MARK);
                    char[] buffer = new char[10];
                    int read;
                    while ((read = reader.read(buffer)) != -1) {
                        System.out.println(read); // debug: chars read in this pass
                        writer.write(buffer, 0, read);
                    }
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Any pointers would be helpful.

Thanks, Ashish

Best answer

The posted code correctly transcodes from windows-1252 to UTF-8.
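To make "transcoding" concrete, here is a minimal sketch of what happens to the bytes. The class and method names (`TranscodeDemo`, `transcode`, `hex`) are hypothetical and chosen only for this illustration. It decodes the word "café" from windows-1252, where é is the single byte E9, and re-encodes it as UTF-8, where é becomes the two bytes C3 A9.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeDemo {

    /** Decodes windows-1252 bytes and re-encodes the text as UTF-8 (no BOM). */
    static byte[] transcode(byte[] ansiBytes) {
        Charset windows1252 = Charset.forName("windows-1252");
        String text = new String(ansiBytes, windows1252);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Formats bytes as space-separated uppercase hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "café" in windows-1252: the é is the single byte 0xE9.
        byte[] ansi = {0x63, 0x61, 0x66, (byte) 0xE9};
        System.out.println(hex(transcode(ansi))); // prints "63 61 66 C3 A9"
    }
}
```

Plain ASCII characters keep the same byte values in both encodings, so a file that contains only ASCII is byte-for-byte identical before and after conversion unless a BOM is added.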

The Notepad++ message is confusing because "ANSI as UTF-8" has no obvious meaning; it appears to be an open defect in Notepad++. I believe Notepad++ means UTF-8 without BOM (see the Encoding menu).

Microsoft Access, being a Windows program, probably expects UTF-8 files to start with a byte order mark (BOM).
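Since the requirement is to avoid opening the file in an editor, one way to see whether a file already starts with a BOM is to inspect its first three bytes. In UTF-8 the BOM is encoded as EF BB BF. The following is a minimal sketch; the class and method names (`BomCheck`, `startsWithUtf8Bom`) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BomCheck {

    /** Returns true if the file's first three bytes are EF BB BF. */
    static boolean startsWithUtf8Bom(Path file) throws IOException {
        byte[] head = new byte[3];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop
            // until three bytes are read or the file ends.
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n == -1) {
                    break;
                }
                filled += n;
            }
        }
        return filled == 3
                && head[0] == (byte) 0xEF
                && head[1] == (byte) 0xBB
                && head[2] == (byte) 0xBF;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the file to inspect.
        System.out.println(startsWithUtf8Bom(Paths.get(args[0])));
    }
}
```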

You can inject a BOM into the document by writing the code point U+FEFF at the start of the file:

import java.io.*;
import java.nio.charset.*;

public class Ansi1252ToUtf8 {
    private static final char BYTE_ORDER_MARK = '\uFEFF';

    // Usage: java Ansi1252ToUtf8 <input file> <output file>
    public static void main(String[] args) throws IOException {
        Charset windows1252 = Charset.forName("windows-1252");
        // try-with-resources closes (and flushes) all four streams.
        try (InputStream in = new FileInputStream(args[0]);
             Reader reader = new InputStreamReader(in, windows1252);
             OutputStream out = new FileOutputStream(args[1]);
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            // U+FEFF is encoded by the UTF-8 writer as the bytes EF BB BF.
            writer.write(BYTE_ORDER_MARK);
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
    }
}
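The question converts every file in a folder, whereas the code above handles a single file. As a rough sketch of how the same approach could be applied per folder, the following uses `java.nio.file`. The class and method names (`FolderToUtf8`, `convertFile`, `convertFolder`) and the two-argument command line are assumptions made for this example, not part of the original answer. It skips subdirectories and overwrites existing output files.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FolderToUtf8 {
    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");
    private static final char BYTE_ORDER_MARK = '\uFEFF';

    /** Transcodes one windows-1252 file to UTF-8 with a leading BOM. */
    static void convertFile(Path source, Path target) throws IOException {
        try (Reader reader = new InputStreamReader(
                     Files.newInputStream(source), WINDOWS_1252);
             Writer writer = new OutputStreamWriter(
                     Files.newOutputStream(target), StandardCharsets.UTF_8)) {
            writer.write(BYTE_ORDER_MARK);
            char[] buffer = new char[8192];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
    }

    /** Converts every regular file directly inside inputDir into outputDir. */
    static void convertFolder(Path inputDir, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(inputDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    convertFile(entry, outputDir.resolve(entry.getFileName()));
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: input folder, args[1]: output folder (hypothetical CLI).
        convertFolder(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Because each file gets its own try-with-resources block, every writer is flushed and closed before the next file is opened, so no output is left partially written.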

This Q&A on converting a file's encoding from ANSI to UTF-8 in Java comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/15353671/
