
html - Detecting HTML encoding when NSURLResponse returns nil for textEncodingName

Reposted. Author: 塔克拉玛干. Updated: 2023-11-02 20:12:43

I am loading a website's HTML with this call:

    NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
    [request setValue:@"utf-8" forHTTPHeaderField:@"Accept-Encoding"];
    [request setValue:@"text/html" forHTTPHeaderField:@"Accept"];
    [NSURLConnection sendAsynchronousRequest:request
                                       queue:[NSOperationQueue currentQueue]
                           completionHandler:^(NSURLResponse *response, NSData *data, NSError *error) { ... }];

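Then, to convert the NSData into an NSString, I need to know the encoding, so I call:

    NSString *textEncoding = [response textEncodingName];

from inside the completion block, but it returns nil for websites that do not specify the "Content-Encoding" header field.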

If I don't know the encoding, [[NSString alloc] initWithData:data encoding:responseEncoding] does not give me readable HTML.
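The step from the charset name to an NSStringEncoding is left implicit above. A minimal sketch of that conversion follows, assuming Apple's Foundation and Core Foundation are available. `EncodingFromIANAName` is a hypothetical helper name, not part of the original post. For HTTP responses, `textEncodingName` reflects the charset the server declares, typically in the `charset` parameter of the `Content-Type` header, so the helper reports failure when no name is present.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): maps an IANA charset
// name such as @"utf-8" to an NSStringEncoding.
// Returns NO when the name is nil, empty, or not recognized.
static BOOL EncodingFromIANAName(NSString *name, NSStringEncoding *outEncoding)
{
    if (name.length == 0 || outEncoding == NULL) {
        return NO;
    }

    CFStringEncoding cfEncoding =
        CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)name);
    if (cfEncoding == kCFStringEncodingInvalidId) {
        return NO;
    }

    *outEncoding = CFStringConvertEncodingToNSStringEncoding(cfEncoding);
    return YES;
}
```

Inside the completion block this could be called as `EncodingFromIANAName([response textEncodingName], &responseEncoding)`. A NO result is the case the question is about.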

How can I detect the correct encoding for websites that do not send the "Content-Encoding" header field?

Best answer

You can try different encodings and see which one produces readable text:

    // Candidate encodings, tried in order. The element type is NSStringEncoding
    // (an NSUInteger): values such as NSUTF32StringEncoding do not fit in an int.
    static const NSStringEncoding encodingPriority[] = {
        NSUTF8StringEncoding,
        NSASCIIStringEncoding,
        NSISOLatin1StringEncoding,
        NSISOLatin2StringEncoding,
        NSUnicodeStringEncoding,
        NSWindowsCP1251StringEncoding,
        NSWindowsCP1252StringEncoding,
        NSWindowsCP1253StringEncoding,
        NSWindowsCP1254StringEncoding,
        NSWindowsCP1250StringEncoding,
        NSNEXTSTEPStringEncoding,
        NSJapaneseEUCStringEncoding,
        NSNonLossyASCIIStringEncoding,
        NSShiftJISStringEncoding, /* kCFStringEncodingDOSJapanese */
        NSISO2022JPStringEncoding, /* ISO 2022 Japanese encoding for e-mail */
        NSMacOSRomanStringEncoding,
        NSUTF16BigEndianStringEncoding,
        NSUTF16LittleEndianStringEncoding,
        NSUTF32StringEncoding,
        NSUTF32BigEndianStringEncoding,
        NSUTF32LittleEndianStringEncoding
    };

    #define REQUIRED_HTML_STRING @"<html"

    - (NSString *)htmlStringForUnknownEncodingData:(NSData *)data
                                  detectedEncoding:(NSStringEncoding *)detectedEncoding
    {
        // Element count, not byte size: looping to sizeof(encodingPriority)
        // alone would read past the end of the array.
        const size_t count = sizeof(encodingPriority) / sizeof(encodingPriority[0]);

        for (size_t i = 0; i < count; i++) {
            NSStringEncoding encoding = encodingPriority[i];

            // Try this encoding; the result is nil if the bytes are invalid for it.
            NSString *html = [[NSString alloc] initWithData:data encoding:encoding];

            // A wrong encoding can still decode into unreadable text,
            // so require a known marker to be present.
            if (html && [html rangeOfString:REQUIRED_HTML_STRING options:NSCaseInsensitiveSearch].location != NSNotFound) {
                if (detectedEncoding) {
                    *detectedEncoding = encoding;
                }
                return html;
            }
        }
        return nil;
    }

Then, to detect which encoding the HTML in the NSData uses, call:

    NSStringEncoding encoding;
    NSString *html = [self htmlStringForUnknownEncodingData:data detectedEncoding:&encoding];

    if (html)
        NSLog(@"Encoding detected!");
    else
        NSLog(@"No encoding detected");
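As an alternative to trying encodings by hand, Foundation has a built-in detector on newer systems (iOS 8 / OS X 10.10 and later). The sketch below is not part of the original answer. It swaps the manual loop for `+[NSString stringEncodingForData:encodingOptions:convertedString:usedLossyConversion:]`. `HTMLStringByGuessingEncoding` is a hypothetical helper name, and the two suggested encodings are only an assumption about what is likely for web pages.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): asks Foundation to
// guess the encoding of `data` and returns the decoded string, or nil if
// no encoding could be determined.
static NSString *HTMLStringByGuessingEncoding(NSData *data, NSStringEncoding *outEncoding)
{
    if (data.length == 0) {
        return nil;
    }

    // Assumption: UTF-8 and Windows-1252 are likely candidates, so they are
    // passed as hints. Lossy conversion is disabled to avoid silently
    // dropping characters.
    NSDictionary *options = @{
        NSStringEncodingDetectionSuggestedEncodingsKey :
            @[ @(NSUTF8StringEncoding), @(NSWindowsCP1252StringEncoding) ],
        NSStringEncodingDetectionAllowLossyKey : @NO
    };

    NSString *converted = nil;
    NSStringEncoding guessed = [NSString stringEncodingForData:data
                                               encodingOptions:options
                                               convertedString:&converted
                                           usedLossyConversion:NULL];

    // A return value of 0 means the encoding could not be determined.
    if (guessed == 0 || converted == nil) {
        return nil;
    }
    if (outEncoding) {
        *outEncoding = guessed;
    }
    return converted;
}
```

The detection is heuristic, so checking the result for a marker such as "<html", as the method above does, is still a reasonable safeguard.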

Regarding "html - Detecting HTML encoding when NSURLResponse returns nil for textEncodingName", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/17702782/
