ios - How to read an NSInputStream as UTF-8?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-02 21:45:11

I am trying to read a large file on iOS with NSInputStream and split it into lines at the newline characters (I don't want to use componentsSeparatedByCharactersInSet because it uses too much memory).

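But because not every line appears to be UTF-8 encoded (some can just as well be read as ASCII, since the bytes are the same), I frequently get this warning: "Incorrect NSStringEncoding value 0x0000 detected. Assuming NSASCIIStringEncoding. Will stop this compatibility mapping behavior in the near future."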

My question is: is there a way to suppress this warning, for example by setting a compiler flag?

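Furthermore: is it safe to append/concatenate two buffer reads? Reading from the byte stream, converting each buffer to a string, and then appending the strings could corrupt the result.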

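The example method below demonstrates that the byte-to-string conversion discards both the first half and the second half of a UTF-8 character, because each half is invalid on its own.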

- (void)NSInputStreamTest {
    uint8_t testString[] = {0xd0, 0x91}; // @"Б"

    // Test 1: Read max 1 byte at a time of UTF-8 string
    uint8_t buf1[1], buf2[1];
    NSString *s1, *s2, *s3;
    NSInteger c1, c2;
    NSInputStream *inStream = [[NSInputStream alloc] initWithData:[[NSData alloc] initWithBytes:testString length:2]];

    [inStream open];
    c1 = [inStream read:buf1 maxLength:1];
    s1 = [[NSString alloc] initWithBytes:buf1 length:1 encoding:NSUTF8StringEncoding];
    // NSInteger is 64-bit on current devices, so print it as a long
    NSLog(@"Test 1: Read %ld byte(s): %@", (long)c1, s1);
    c2 = [inStream read:buf2 maxLength:1];
    s2 = [[NSString alloc] initWithBytes:buf2 length:1 encoding:NSUTF8StringEncoding];
    NSLog(@"Test 1: Read %ld byte(s): %@", (long)c2, s2);
    s3 = [s1 stringByAppendingString:s2];
    NSLog(@"Test 1: Concatenated: %@", s3);
    [inStream close];

    // Test 2: Read max 2 bytes at a time of UTF-8 string
    uint8_t buf4[2];
    NSString *s4;
    NSInteger c4;
    NSInputStream *inStream2 = [[NSInputStream alloc] initWithData:[[NSData alloc] initWithBytes:testString length:2]];

    [inStream2 open];
    c4 = [inStream2 read:buf4 maxLength:2];
    s4 = [[NSString alloc] initWithBytes:buf4 length:2 encoding:NSUTF8StringEncoding];
    NSLog(@"Test 2: Read %ld byte(s): %@", (long)c4, s4);
    [inStream2 close];
}

Output:

2013-02-10 21:16:23.412 Test[11144:c07] Test 1: Read 1 byte(s): (null)
2013-02-10 21:16:23.413 Test[11144:c07] Test 1: Read 1 byte(s): (null)
2013-02-10 21:16:23.413 Test[11144:c07] Test 1: Concatenated: (null)
2013-02-10 21:16:23.413 Test[11144:c07] Test 2: Read 2 byte(s): Б

Best answer

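First, in the line s3 = [s1 stringByAppendingString:s2]; you are trying to concatenate onto a nil value. Because s1 is nil, the message send returns nil as well. So you may want to concatenate the bytes rather than the strings: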

uint8_t buf3[2];
buf3[0] = buf1[0];
buf3[1] = buf2[0];
s3 = [[NSString alloc] initWithBytes:buf3 length:2 encoding:NSUTF8StringEncoding];

Output:

2015-11-06 12:57:40.304 Test[10803:883182] Test 1: Read 1 byte(s): (null)
2015-11-06 12:57:40.305 Test[10803:883182] Test 1: Read 1 byte(s): (null)
2015-11-06 12:57:40.305 Test[10803:883182] Test 1: Concatenated: Б
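
The same "join bytes first, decode afterwards" idea carries over to the line splitting the question asks about. The byte 0x0A never occurs inside a multi-byte UTF-8 sequence, so a buffer can be cut at newlines before any decoding happens. Below is a minimal sketch in plain C, which also compiles inside an Objective-C file. The helper name next_line_len is hypothetical and not part of the original answer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: length of the first complete line in buf[0..len),
   counting the terminating '\n'. Returns 0 when no newline has arrived yet,
   which means the caller should keep the bytes and read more. */
static size_t next_line_len(const uint8_t *buf, size_t len) {
    const uint8_t *nl = (const uint8_t *)memchr(buf, '\n', len);
    return nl ? (size_t)(nl - buf) + 1 : 0;
}
```

One way to use it, assuming the unread bytes are kept in an NSMutableData: after each read, call the helper on the accumulated bytes, pass each complete line to initWithBytes:length:encoding:, and leave the remainder in the buffer for the next read.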

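Second, a UTF-8 character is 1 to 4 bytes long. The original UTF-8 design also defined 5- and 6-byte forms, but RFC 3629 restricts the encoding to 4 bytes, so those forms are invalid today.

(1 byte)  0xxx xxxx         // if the symbol lies in 0x00 .. 0x7F (ASCII)
(2 bytes) 110x xxxx 10xx xxxx
(3 bytes) 1110 xxxx 10xx xxxx 10xx xxxx
(4 bytes) 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx
(5 bytes) 1111 10xx 10xx xxxx 10xx xxxx 10xx xxxx 10xx xxxx             // obsolete, invalid since RFC 3629
(6 bytes) 1111 110x 10xx xxxx 10xx xxxx 10xx xxxx 10xx xxxx 10xx xxxx   // obsolete, invalid since RFC 3629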
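
The lead-byte patterns above map directly onto a length lookup. Below is a minimal sketch in plain C, which also compiles inside an Objective-C file. The helper name utf8_seq_len is hypothetical. It accepts only the 1- to 4-byte forms that RFC 3629 allows, so the lead bytes of the 5- and 6-byte forms return 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: expected length of a UTF-8 sequence, judged from its
   lead byte alone. Returns 0 for a continuation byte (10xx xxxx) and for
   bytes that can never start a valid sequence. */
static size_t utf8_seq_len(uint8_t lead) {
    if (lead < 0x80) return 1;                   /* 0xxx xxxx (ASCII) */
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  /* 110x xxxx; C0 and C1 would be overlong */
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  /* 1110 xxxx */
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  /* 1111 0xxx, capped at U+10FFFF */
    return 0;
}
```

For the test character from the question, utf8_seq_len(0xd0) is 2, so a reader knows one more byte is needed before decoding. The check covers the lead byte only. The continuation bytes still have to be validated by the decoder.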

So if you intend to read raw bytes from an NSInputStream and then convert them to a UTF-8 NSString, you may want to read from the NSInputStream byte by byte until you get a valid string:

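#define MAX_UTF8_BYTES 4 // RFC 3629 caps a UTF-8 sequence at 4 bytes

NSString *utf8String = nil;
NSMutableData *_data = [[NSMutableData alloc] init]; // accumulates the bytes read so far

int bytes_read = 0;
while (!utf8String) {
    if (bytes_read >= MAX_UTF8_BYTES) {
        NSLog(@"Can't decode input byte array into UTF8.");
        return;
    }
    uint8_t byte[1];
    // read:maxLength: returns 0 at end of stream and a negative value on error
    if ([_inputStream read:byte maxLength:1] <= 0) {
        NSLog(@"Stream ended or failed in the middle of a character.");
        return;
    }
    [_data appendBytes:byte length:1];
    // [_data bytes] is not NUL-terminated, so decode by length instead of using
    // stringWithUTF8String:. The result stays nil while the sequence is incomplete.
    utf8String = [[NSString alloc] initWithData:_data encoding:NSUTF8StringEncoding];
    bytes_read++;
}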
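
Reading one byte per call works but costs a method call for every byte. A possible alternative, which is not part of the original answer, is to read larger chunks and decode only the prefix that ends on a character boundary, carrying the leftover bytes into the next read. Below is a minimal sketch in plain C, which also compiles inside an Objective-C file. The helper name utf8_complete_prefix is hypothetical. It detects only a truncated final character and leaves all other validation to the decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of leading bytes of buf[0..len) that do not
   end in the middle of a UTF-8 sequence. The remaining bytes (at most 3)
   are the start of a character whose tail has not been read yet. */
static size_t utf8_complete_prefix(const uint8_t *buf, size_t len) {
    size_t i = len;
    /* Walk back over at most 3 trailing continuation bytes (10xx xxxx). */
    while (i > 0 && len - i < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
    }
    if (i == 0) return len;                /* only continuation bytes: let the decoder reject them */

    uint8_t lead = buf[i - 1];
    size_t need;
    if (lead < 0x80) need = 1;
    else if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    else return len;                       /* not a lead byte, so not a truncation */

    size_t have = len - (i - 1);           /* bytes present for the last sequence */
    return have < need ? i - 1 : len;
}
```

With the bytes {0x41, 0xd0} the function returns 1: the "A" can be decoded immediately, while 0xd0 waits for the 0x91 that completes "Б".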

Regarding "ios - How to read an NSInputStream as UTF-8?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/14798706/
