
java - How do you get Matlab to write a BOM (byte order mark) for UTF-16 text files?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 04:26:10

I am using Matlab to create UTF-16 text files, which I will later read with Java. In Matlab, I open a file named fileName and write to it as follows:

fid = fopen(fileName, 'w', 'n', 'UTF-16LE');  % little-endian UTF-16, no BOM
fprintf(fid, 'Some stuff.');
fclose(fid);

In Java, I can read the text file with the following code:

import java.io.FileInputStream;
import java.util.Scanner;

FileInputStream fileInputStream = new FileInputStream(fileName);
Scanner scanner = new Scanner(fileInputStream, "UTF-16LE"); // little-endian, no BOM expected
String s = scanner.nextLine();

Here is the hex output:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13
00000000  73 00 6F 00 6D 00 65 00 20 00 73 00 74 00 75 00 66 00 66 00  s.o.m.e. .s.t.u.f.f.
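As a cross-check on the dump above, the following Java sketch (the class name Utf16LeDump is hypothetical) encodes the lowercase text "some stuff" as UTF-16LE and prints the bytes in the same layout. Each ASCII character becomes one 16-bit code unit written low byte first, and no BOM is emitted.

```java
import java.nio.charset.StandardCharsets;

public class Utf16LeDump {
    // Format bytes as space-separated, two-digit uppercase hex, like a hex editor row
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-16LE: low byte first, no byte order mark
        byte[] encoded = "some stuff".getBytes(StandardCharsets.UTF_16LE);
        // prints 73 00 6F 00 6D 00 65 00 20 00 73 00 74 00 75 00 66 00 66 00
        System.out.println(toHex(encoded));
    }
}
```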

The above approach works fine. However, I want to write the file as UTF-16 with a BOM, which would give me more flexibility because I wouldn't have to worry about big- versus little-endian byte order. In Matlab, I've coded:

fid = fopen(fileName, 'w', 'n', 'UTF16');
fprintf(fid, 'Some stuff.');
fclose(fid);

In Java, I changed the code to:

FileInputStream fileInputStream = new FileInputStream(fileName);
// "UTF-16" takes the byte order from a BOM; with no BOM it assumes big-endian
Scanner scanner = new Scanner(fileInputStream, "UTF-16");
String s = scanner.nextLine();

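In this case the string s is garbled, because Matlab did not write a BOM, and without one Java's "UTF-16" decoder assumes big-endian byte order. If I add the BOM manually, the Java code works. With the BOM added, the following file reads correctly: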

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15
00000000  FF FE 73 00 6F 00 6D 00 65 00 20 00 73 00 74 00 75 00 66 00 66 00  ÿþs.o.m.e. .s.t.u.f.f.
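The garbling can be reproduced in Java without Matlab. This sketch, with a hypothetical class name (BomDecodeDemo), builds the BOM-less bytes from the first dump and the FF FE variant from the second, then decodes both with StandardCharsets.UTF_16, which reads the byte order from a BOM and falls back to big-endian when there is none.

```java
import java.nio.charset.StandardCharsets;

public class BomDecodeDemo {
    // Little-endian UTF-16 bytes of "some stuff" with no BOM, as in the first dump
    static byte[] withoutBom() {
        return "some stuff".getBytes(StandardCharsets.UTF_16LE);
    }

    // The same bytes with the little-endian BOM FF FE prepended, as in the second dump
    static byte[] withBom() {
        byte[] body = withoutBom();
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFF;
        out[1] = (byte) 0xFE;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    public static void main(String[] args) {
        // No BOM: the decoder assumes big-endian, so 73 00 is read as U+7300
        String garbled = new String(withoutBom(), StandardCharsets.UTF_16);
        // FF FE BOM: the decoder switches to little-endian and drops the BOM
        String correct = new String(withBom(), StandardCharsets.UTF_16);
        System.out.println(Integer.toHexString(garbled.charAt(0))); // 7300
        System.out.println(correct);                                 // some stuff
    }
}
```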

How can I get Matlab to write out the BOM? I know I could write the BOM out separately, but I'd rather have Matlab do it automatically.
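If Matlab cannot be made to write the BOM, another option is to make the Java side tolerant of its absence. The sketch below is an alternative that does not come from the thread: the helper decodeUtf16WithFallback is hypothetical. It honours an FF FE or FE FF BOM when one is present and otherwise assumes UTF-16LE, since that is the byte order of the BOM-less file in the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf16Sniffer {
    // Hypothetical helper: use the BOM if present, otherwise assume little-endian
    static String decodeUtf16WithFallback(byte[] bytes) {
        if (bytes.length >= 2) {
            int b0 = bytes[0] & 0xFF;
            int b1 = bytes[1] & 0xFF;
            if (b0 == 0xFF && b1 == 0xFE) { // little-endian BOM, skip it
                return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16LE);
            }
            if (b0 == 0xFE && b1 == 0xFF) { // big-endian BOM, skip it
                return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
            }
        }
        // No BOM: assume UTF-16LE, the byte order of the first file in the question
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    // Read a whole file and decode it with the fallback rule above
    static String readFile(Path path) throws IOException {
        return decodeUtf16WithFallback(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Utf16Sniffer <fileName>
        System.out.println(readFile(Paths.get(args[0])));
    }
}
```

The fallback is spelled out explicitly because the built-in "UTF-16" charset would guess big-endian for a BOM-less file, which is exactly the failure described above.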

Addendum

I selected the answer below from Amro because it exactly solves the question I posed.

One key discovery for me was the difference between the Unicode Standard and a UTF (Unicode Transformation Format) (see http://unicode.org/faq/utf_bom.html). The Unicode Standard assigns unique identifiers (code points) to characters; a UTF maps every code point "to a unique byte sequence." Since all but a handful of the characters I use are among the first 128 code points, I'm going to switch to UTF-8, as Romeo suggests. UTF-8 is supported by both Matlab (so the warning shown below won't need to be suppressed) and Java, and for my application it will produce smaller text files.

I suppress the Matlab warning

Warning: The encoding 'UTF-16LE' is not supported.

with:

warning off MATLAB:iofun:UnsupportedEncoding;
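The size claim in the addendum follows from how the encodings treat the first 128 code points: UTF-8 stores each of them in one byte, while UTF-16 needs two. This sketch (class and method names are hypothetical) counts the bytes for the sample text under three of Java's standard charsets. Note that Java's own "UTF-16" encoder writes a big-endian BOM (FE FF), not the FF FE shown in the second dump.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Number of bytes needed to encode text in the given charset
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "some stuff"; // 10 characters, all below U+0080
        // UTF-8: one byte per code point below U+0080
        System.out.println("UTF-8:    " + byteCount(text, StandardCharsets.UTF_8));    // 10
        // UTF-16LE: two bytes per character here, no BOM
        System.out.println("UTF-16LE: " + byteCount(text, StandardCharsets.UTF_16LE)); // 20
        // UTF-16: Java prepends a big-endian BOM (FE FF), so 2 + 20 bytes
        System.out.println("UTF-16:   " + byteCount(text, StandardCharsets.UTF_16));   // 22
    }
}
```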

Best answer

On my system, MATLAB reports that UTF-16 is not supported. I think using UTF-8 would be safer. UTF-8 would also solve your little-endian/big-endian problem.
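With UTF-8 the Java reader no longer depends on byte order, because UTF-8 is defined as a sequence of single bytes. In this sketch, Java's Files.write stands in for the Matlab writer, which is an assumption (the Matlab side would presumably use fopen(fileName, 'w', 'n', 'UTF-8')); the first line is then read back with Scanner as in the question. The class name Utf8RoundTrip is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

public class Utf8RoundTrip {
    // Write text as UTF-8 (no BOM is added), then read the first line back
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("matlab-utf8", ".txt");
            try {
                // Stand-in for the Matlab writer; byte order plays no part in UTF-8
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                try (Scanner scanner = new Scanner(tmp, "UTF-8")) {
                    return scanner.nextLine();
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("some stuff")); // some stuff
    }
}
```

If a UTF-8 file ever does begin with a BOM (EF BB BF), Java's UTF-8 decoder keeps it as a leading U+FEFF character rather than discarding it, so such a character would need to be stripped by hand.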

Regarding "java - How do you get Matlab to write a BOM (byte order mark) for UTF-16 text files?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8052770/
