C: How can I use getc without producing non-UTF-8 characters?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 23:42:45

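I am writing a C program that takes three arguments: two files (one input, one output) and an int (the maximum length of an output line, call it x). I want to read each line of the input file and write its first x characters to the output file, effectively "trimming" the file.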

Here is my code:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {

    const char endOfLine = '\n';

    if (argc < 4) {
        printf("Program takes 4 params\n");
        exit(1);
    } else {
        // Convert character argument [3] (line length) to an int
        int maxLen = atoi(argv[3]);

        char str[maxLen];
        char *inputName;
        char *outputName;

        inputName = argv[1];
        outputName = argv[2];

        // Open files to be read and written to
        FILE *inFile = fopen(inputName, "r");
        FILE *outFile = fopen(outputName, "w");

        int count = 0;
        char ch = getc(inFile);
        while (ch != EOF) {
            if (ch == '\n') {
                str[count] = (char)ch;
                printf("Adding %s to output\n", str);
                fputs(str, outFile);
                count = 0;
            } else if (count < maxLen) {
                str[count] = ch;
                printf("Adding %c to str\n", ch);
                count++;
            } else if (count == maxLen) {
                str[count] = '\n';
            }
            ch = getc(inFile);
        }

    }

    return 0;
}

The only problem is that when the last character kept on a line is a single quote, the program prints non-UTF-8 characters, as shown below:

For Whom t
John Donne
No man is
Entire of
Each is a
A part of
If a clod
Europe is
As well as
As well as
Or of thin
Each man��
For I am i
Therefore,
For whom t

Best answer

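Once the length limit is reached, you can check whether the next byte is a UTF-8 continuation byte (bit pattern 10xxxxxx). If it is, keep writing bytes until the character is complete.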

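// True when the byte matches the UTF-8 continuation pattern 10xxxxxx.
// ch must be an int as returned by getc/getchar, so EOF (-1) yields false.
int is_utf_continue_byte(int ch) {
    return (ch & 0x80) && (~ch & 0x40);
}

//...
// After the length limit is reached, finish the character in progress.
while (is_utf_continue_byte(ch)) {
    putchar(ch);
    ch = getchar();
}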
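
To show how the continuation-byte check might fit into the trimming loop from the question, here is a minimal sketch. The names `is_utf8_continuation` and `trim_lines`, and the `truncated` flag, are hypothetical and do not come from the question or the answer. The sketch assumes the input is valid UTF-8 and that the limit is counted in bytes, as in the original program. It reads into an `int` rather than a `char`, so that `EOF` cannot be confused with a data byte.

```c
#include <stdio.h>

/* Nonzero when ch matches the UTF-8 continuation pattern 10xxxxxx.
   EOF (-1) has both top bits set, so it is rejected. */
int is_utf8_continuation(int ch) {
    return (ch & 0xC0) == 0x80;
}

/* Hypothetical helper: copies each line of `in` to `out`, keeping about
   max_bytes bytes per line without splitting a multi-byte character. */
void trim_lines(FILE *in, FILE *out, int max_bytes) {
    int ch;             /* int, not char, so EOF stays distinguishable */
    int count = 0;      /* bytes counted against the limit on this line */
    int truncated = 0;  /* set once the rest of the line must be skipped */

    while ((ch = getc(in)) != EOF) {
        if (ch == '\n') {
            putc('\n', out);
            count = 0;
            truncated = 0;
        } else if (truncated) {
            /* past the limit: discard bytes until the newline */
        } else if (count < max_bytes) {
            putc(ch, out);
            count++;
        } else if (is_utf8_continuation(ch)) {
            putc(ch, out);  /* finish the character already started */
        } else {
            truncated = 1;  /* first byte of a new character: stop here */
        }
    }
}
```

With a limit of 10, the line containing "Each man’s" keeps all three bytes of the curly apostrophe (E2 80 99), so a line can exceed the limit by up to three bytes. If the limit should count characters rather than bytes, one option is to increment `count` only for bytes that are not continuation bytes.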

For more on this topic, see the similar question on Stack Overflow: https://stackoverflow.com/questions/41052753/
