html - Remove HTML tags from NSString, but keep <Text in angle brackets>

Reposted. Author: 行者123. Updated: 2023-11-29 12:21:55

How can I remove HTML tags from an NSString, but keep any <Text in angle brackets>?

<p>123 <Hello> abc</p> -> 123 <Hello> abc

I have tried various regular-expression, scanner, and XML-parser solutions, but they all remove <Text in angle brackets> along with the tags.

The only solution that works for me is NSAttributedString with the following options:

NSAttributedString *str = [[NSAttributedString alloc] initWithData:utf8Data
options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)}
documentAttributes:nil
error:nil];

NSString *result = [str string];

However, this approach uses WebKit and consumes too much memory for my task.

So, how can I strip the tags from an NSString while keeping <Text in angle brackets>, without using any kind of WebKit/UIWebView and so on?

Best answer

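I asked a similar question a while ago, and some of the answers there may help you. If you don't really need a full HTML parser and only want to strip HTML tags, an NSString category may be useful (this is a modified version of mwaterfal's category):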

- (NSString *)stringByStrippingTags {

// Find the first "<" and short-cut if there is none
NSUInteger tagIndex = [self rangeOfString:@"<" options:NSLiteralSearch].location;
if (tagIndex == NSNotFound) {
return [NSString stringWithString:self]; // return copy of string as no tags found
}

// Scan and find all tags
NSScanner *scanner = [NSScanner scannerWithString:self];
[scanner setCharactersToBeSkipped:nil];
NSMutableSet *tags = [[NSMutableSet alloc] init];
NSString *tag;
do {
// Scan up to <
tag = nil;
[scanner scanUpToString:@"<" intoString:NULL];
[scanner scanUpToString:@">" intoString:&tag];

if (tag) {
NSString *t = [[NSString alloc] initWithFormat:@"%@>", tag];
[tags addObject:t];
}

} while (![scanner isAtEnd]);
NSMutableString *result = [[NSMutableString alloc] initWithString:self];

NSString *replacement;
for (NSString *t in tags) {
replacement = @" ";
if ([t isEqualToString:@"<a>"] ||
[t isEqualToString:@"</a>"] ||
[t isEqualToString:@"<span>"] ||
[t isEqualToString:@"</span>"] ||
[t isEqualToString:@"<strong>"] ||
[t isEqualToString:@"</strong>"] ||
[t isEqualToString:@"<em>"] ||
[t isEqualToString:@"</em>"]) {
replacement = @"";
}
[result replaceOccurrencesOfString:t
withString:replacement
options:NSLiteralSearch
range:NSMakeRange(0, result.length)];
}

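// Remove multi-spaces and line breaks
// (stringByRemovingNewLinesAndWhitespace is assumed to come from the same category; it is not shown here)
return [result stringByRemovingNewLinesAndWhitespace];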
}
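
Note that the category above treats every <...> run as a tag, so it would also remove <Hello>. One possible way to keep such text is to remove only tags whose name is on an explicit list of HTML element names. The following is a minimal sketch of that idea, not part of the original answer: the function name StripKnownHTMLTags and the list of element names are assumptions made for illustration, and a regular expression like this will not handle every HTML edge case (for example a ">" inside an attribute value).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): removes only tags whose
// name appears in an explicit list of HTML element names, so unknown runs
// such as <Hello> are left in place.
static NSString *StripKnownHTMLTags(NSString *input) {
    // Assumed list of element names; extend it to match the markup you expect.
    NSString *names = @"p|br|div|span|a|strong|em|b|i|u|ul|ol|li|h[1-6]";
    // Matches an opening, closing, or self-closing tag with optional attributes.
    NSString *pattern =
        [NSString stringWithFormat:@"</?(?:%@)(?:\\s[^<>]*)?/?>", names];

    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        return [input copy]; // pattern failed to compile; return the input unchanged
    }
    // Replace every matched tag with the empty string.
    return [regex stringByReplacingMatchesInString:input
                                           options:0
                                             range:NSMakeRange(0, input.length)
                                      withTemplate:@""];
}
```

With the example from the question, StripKnownHTMLTags(@"<p>123 <Hello> abc</p>") should return 123 <Hello> abc. Because matching is case-insensitive, angle-bracket text that happens to equal a listed element name (such as <I>) would still be removed, so the list has to be chosen with the actual input in mind. NSRegularExpression is part of Foundation and does not involve WebKit.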

Regarding "html - Remove HTML tags from NSString, but keep <Text in angle brackets>", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30379279/
