
c - Unable to read a UNICODE text file in C

Reposted. Author: 可可西里. Updated: 2023-11-01 11:47:45

(I have looked through previous posts and tried their suggestions, but to no avail.)

I'm trying to read a file that contains only Japanese characters. The file looks like this:

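When I try to read it, nothing is displayed in the console, and when I debug, the read buffer contains only garbage. Here is the function I use to read the file: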

wchar_t* ReadTextFileW(wchar_t* filePath, size_t numBytesToRead, size_t maxBufferSize, const wchar_t* mode, int seekOffset, int seekOrigin)
{
    size_t numItems = 0;
    size_t bufferSize = 0;
    wchar_t* buffer = NULL;
    FILE* file = NULL;

    //Ensure the filePath does NOT lead to a device.
    if (IsPathADevice(filePath) == false)
    {
        //0 indicates to read as much as possible (the max specified).
        if (numBytesToRead == 0)
        {
            numBytesToRead = maxBufferSize;
        }

        if (filePath != NULL && mode != NULL)
        {
            //Ensure there are no errors in opening the file.
            if (_wfopen_s(&file, filePath, mode) == 0)
            {
                //Set the cursor location (back to the beginning of the file by default).
                if (fseek(file, seekOffset, seekOrigin) != 0)
                {
                    //Error: Could not change file cursor position.
                    fclose(file);
                    return NULL;
                }

                //Calculate the size of the buffer in bytes.
                bufferSize = numBytesToRead * sizeof(wchar_t);

                //Create the buffer to store file data in.
                buffer = (wchar_t*)_aligned_malloc(bufferSize, BYTE_ALIGNMENT);

                //Ensure the buffer was allocated.
                if (buffer == NULL)
                {
                    //Error: Buffer could not be allocated.
                    fclose(file);
                    return NULL;
                }

                //Clear any garbage data in the buffer.
                memset(buffer, 0, bufferSize);

                //Read the data from the file.
                numItems = fread_s(buffer, bufferSize, sizeof(wchar_t), numBytesToRead, file);

                //Check for read errors.
                if (numItems <= 0)
                {
                    //Error: File could not be read.
                    fclose(file);
                    _aligned_free(buffer);
                    return NULL;
                }

                //Ensure the file is closed without errors.
                if (fclose(file) != 0)
                {
                    //Error: File did not close properly.
                    _aligned_free(buffer);
                    return NULL;
                }
            }
        }
    }

    return buffer;
}

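To call this function, I do the following. Maybe I'm not using setlocale() correctly, but from what I've read it seems that I am. To reiterate, the problem is that garbage appears to be read in and nothing is displayed in the console: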

setlocale(LC_ALL, "jp");
wchar_t* retVal = ReadTextFileW(L"C:\\jap.txt");
printf("%S\n", retVal);
_aligned_free(retVal);

I have also defined the following at the top of the .cpp file:

#define UNICODE
#define _UNICODE
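Why the buffer looks like garbage can be sketched as follows, assuming the file was saved as UTF-8 (as the asker later found out) and the code runs on Windows, where wchar_t is a 16-bit little-endian unit. With a plain "r" mode, fread_s copies the file's raw bytes into the wchar_t buffer without decoding them, so every pair of UTF-8 bytes is viewed as one 16-bit value. The helper name bytes_as_utf16le_units is hypothetical; it only simulates that reinterpretation portably with uint16_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: pack raw bytes into 16-bit units the way a
   little-endian machine (such as x86 Windows, where wchar_t is 16 bits)
   sees them after fread() copies undecoded bytes into a wchar_t buffer.
   Returns the number of whole 16-bit units produced. */
static size_t bytes_as_utf16le_units(const unsigned char *bytes, size_t nbytes,
                                     uint16_t *units, size_t max_units)
{
    size_t n = 0;
    /* Stop when the output is full or fewer than two bytes remain. */
    while (n < max_units && 2 * n + 1 < nbytes) {
        /* Low byte first, high byte second: little-endian order. */
        units[n] = (uint16_t)(bytes[2 * n] | (bytes[2 * n + 1] << 8));
        n++;
    }
    return n;
}
```

For "あい" (UTF-8 bytes E3 81 82 E3 81 84) this yields 0x81E3, 0xE382 and 0x8481 instead of U+3042 and U+3044, which is the kind of garbage seen in the debugger.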

SOLVED:

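As ryyker pointed out, to fix this you need to know which encoding was used to create the original file. Both Notepad and Notepad++ have a drop-down menu for the encoding. The most commonly used encoding is UTF-8.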
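One way to learn the encoding from the file itself, rather than from the editor, is to look for a byte order mark (BOM) at the start of the file. The sketch below assumes the common BOM values (EF BB BF for UTF-8, FF FE for UTF-16 LE, FE FF for UTF-16 BE); the enum and function names are hypothetical. A file without a BOM (UTF-8 without BOM, Shift-JIS, and so on) is reported as unknown and needs another test.

```c
#include <stddef.h>

/* Hypothetical result type: only encodings that announce themselves
   with a byte order mark can be identified this way. */
typedef enum {
    ENC_UNKNOWN,   /* no BOM: UTF-8 without BOM, Shift-JIS, ... */
    ENC_UTF8_BOM,  /* starts with EF BB BF */
    ENC_UTF16LE,   /* starts with FF FE (UTF-32 LE, FF FE 00 00, is not distinguished here) */
    ENC_UTF16BE    /* starts with FE FF */
} bom_encoding;

/* Inspect the first bytes read from a file (opened in binary mode)
   and report which BOM, if any, is present. */
static bom_encoding detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8_BOM;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16BE;
    return ENC_UNKNOWN;
}
```

A file saved with Notepad's "Unicode" (UTF-16 LE) option begins with FF FE, so it would be reported as ENC_UTF16LE.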

Once you know the encoding, you can change the mode passed to _wfopen_s() as follows:

wchar_t* retVal = ReadWide::ReadTextFileW(L"C:\\jap.txt", 0, 1024, L"r, ccs=UTF-8");
MessageBoxW(NULL, retVal, NULL, 0);
_aligned_free(retVal);
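The ccs=UTF-8 flag is specific to the Microsoft C runtime. As a portable illustration of the decoding it performs, here is a minimal UTF-8 decoder (a sketch; utf8_decode is a hypothetical name). It produces code points as uint32_t rather than wchar_t, because wchar_t is 16 bits on Windows, where characters outside the Basic Multilingual Plane would need surrogate pairs.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal UTF-8 decoder: converts bytes to Unicode code points.
   Returns the number of code points written, or (size_t)-1 on a
   malformed or truncated sequence. Sketch only: it does not reject
   overlong forms or surrogate code points. */
static size_t utf8_decode(const unsigned char *s, size_t len,
                          uint32_t *out, size_t max_out)
{
    size_t i = 0, n = 0;
    while (i < len && n < max_out) {
        unsigned char c = s[i];
        uint32_t cp;
        size_t extra;  /* number of continuation bytes that follow */

        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else return (size_t)-1;      /* continuation or invalid lead byte */

        if (i + extra >= len)
            return (size_t)-1;       /* sequence cut off at end of buffer */

        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return (size_t)-1;   /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        out[n++] = cp;
        i += extra + 1;
    }
    return n;
}
```

Decoding the bytes E3 81 82 gives 0x3042 (あ), which is what a correctly decoded wide-character buffer should contain.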

I had to use a message box to display the Japanese text; it still did not show up in the console.

Best answer

Here is an excerpt discussing encodings for Japanese text created with Notepad++ (which, as the OP mentioned in the comments, is the editor they use):

Double Byte encodings, also called, by usage, Double Byte Character Set (DBCS)

Some of them preexisted Unicode, and were designed to encode character sets with a large number of characters, mainly found in Far East languages with ideographic or syllabic scripts:

The 2 Bytes Universal Character Set: UCS-2 Big Endian and UCS-2 Little Endian
The Japanese Code Page: Shift-JIS (Windows-932)
The Chinese Code Pages: Simplified Chinese GB2312 (Windows-936), Traditional Chinese Big5 (Windows-950)
The Korean Code Pages: Windows 949, EUC-KR

It looks like Shift-JIS may be the encoding you are trying to read. From here:

Shift JIS (Shift Japanese Industrial Standards, also SJIS, MIME name Shift_JIS) is a character encoding for the Japanese language, originally developed by a Japanese company called ASCII Corporation in conjunction with Microsoft...
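What makes Shift-JIS a double-byte encoding is that the width of each character depends on its first byte. The sketch below counts characters in a Shift-JIS buffer using the Windows-932 lead-byte ranges 0x81-0x9F and 0xE0-0xFC; bytes below 0x80 (ASCII) and 0xA1-0xDF (half-width katakana) stand alone. The function names are hypothetical, and trail bytes are not validated.

```c
#include <stddef.h>

/* Nonzero if b starts a two-byte character in Shift-JIS as extended
   by Windows code page 932. */
static int sjis_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Count characters in a Shift-JIS buffer. ASCII and half-width katakana
   take one byte; a lead byte means the character takes two bytes.
   Returns (size_t)-1 if the buffer ends in the middle of a two-byte
   character. Trail bytes are not checked in this sketch. */
static size_t sjis_char_count(const unsigned char *s, size_t len)
{
    size_t i = 0, count = 0;
    while (i < len) {
        if (sjis_is_lead_byte(s[i])) {
            if (i + 1 >= len)
                return (size_t)-1;  /* lead byte with no trail byte */
            i += 2;
        } else {
            i += 1;
        }
        count++;
    }
    return count;
}
```

Counting is only an illustration. On Windows, converting Shift-JIS bytes to wchar_t would normally be done with MultiByteToWideChar and code page 932, because the ccs= flag of _wfopen_s only accepts UNICODE, UTF-8 and UTF-16LE.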

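In general, you need to determine the encoding that was used to create the multibyte characters in the file before functions in C, or in any other language, can read them back correctly. This link may help.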
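When a file has no BOM, a practical first test is whether its bytes are structurally valid UTF-8. Japanese text saved as Shift-JIS usually fails this test, because common Shift-JIS lead bytes such as 0x82 (hiragana) and 0x83 (katakana) are continuation bytes in UTF-8 and cannot start a sequence. A minimal sketch, with looks_like_utf8 as a hypothetical name; it does not catch every overlong encoding or surrogate code point, and a pure-ASCII file passes for either encoding.

```c
#include <stddef.h>

/* Returns 1 if the buffer is structurally valid UTF-8, 0 otherwise.
   Heuristic only: used to guess between UTF-8 and a legacy encoding
   such as Shift-JIS when the file carries no BOM. */
static int looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;

        if (c < 0x80)                   extra = 0;
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;
        else return 0;  /* 0x80-0xC1 or 0xF5-0xFF cannot start a sequence */

        if (i + extra >= len)
            return 0;   /* truncated sequence */

        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;  /* missing continuation byte */

        i += extra + 1;
    }
    return 1;
}
```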

Regarding "c - Unable to read a UNICODE text file in C", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40023039/
