
objective-c - Efficiently finding the first of many keywords in a large NSString

Reposted. Author: 搜寻专家. Updated: 2023-10-30 19:42:13

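I need to find all keywords in a large NSString (for parsing source code), and my current implementation is too slow, but I'm not sure how to improve it.

I'm using NSRegularExpression, on the assumption that it is better optimized than anything I could write myself, but performance is slower than I expected. Does anyone know of a faster approach?

The target string will contain UTF-8 characters, but the keywords themselves will always be plain alphanumeric ASCII. I assume that could be used to optimize things considerably?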

@implementation MyClass

// I'm storing the regular expression in a static variable, since it never changes and I need to re-use it often
static NSRegularExpression *keywordsExpression;

+ (void)initialize
{
    [super initialize];

    NSArray *keywords = [NSArray arrayWithObjects:@"accumsan", @"adipiscing", @"aliquam", @"aliquet", @"amet", @"ante", @"arcu", @"at", @"commodo", @"congue", @"consectetur", @"consequat", @"convallis", @"cras", @"curabitur", @"cursus", @"dapibus", @"diam", @"dolor", @"dui", @"elit", @"enim", @"erat", @"eros", @"est", @"et", @"eu", @"felis", @"fermentum", @"gravida", @"iaculis", @"id", @"imperdiet", @"integer", @"ipsum", @"lacinia", @"lectus", @"leo", nil];

    NSString *pattern = [NSString stringWithFormat:@"\\b(%@)\\b", [keywords componentsJoinedByString:@"|"]]; // \b(accumsan|adipiscing|aliquam|…)\b
    keywordsExpression = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionCaseInsensitive error:NULL];
}

// This method will be called in quick succession; I need it to be able to run tens
// of thousands of times per second. The target string is big (50KB or so), but the
// search range is short, rarely more than 30 characters
- (NSRange)findNextKeyword:(NSString *)string inRange:(NSRange)range
{
    return [keywordsExpression rangeOfFirstMatchInString:string options:0 range:range];
}

@end

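Edit: Based on @CodeBrickie's answer, I updated my code to run the regular expression search once over the whole string and save the matches to a cached NSIndexSet. Each time the method is called, it then looks up keyword ranges in the NSIndexSet instead of searching the string. The result is an order of magnitude faster: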

@implementation MyClass

static NSRegularExpression *keywordsExpression;
static NSIndexSet *keywordIndexes = nil;

+ (void)initialize
{
    [super initialize];

    NSArray *keywords = [NSArray arrayWithObjects:@"accumsan", @"adipiscing", @"aliquam", @"aliquet", @"amet", @"ante", @"arcu", @"at", @"commodo", @"congue", @"consectetur", @"consequat", @"convallis", @"cras", @"curabitur", @"cursus", @"dapibus", @"diam", @"dolor", @"dui", @"elit", @"enim", @"erat", @"eros", @"est", @"et", @"eu", @"felis", @"fermentum", @"gravida", @"iaculis", @"id", @"imperdiet", @"integer", @"ipsum", @"lacinia", @"lectus", @"leo", nil];

    NSString *pattern = [NSString stringWithFormat:@"\\b(%@)\\b", [keywords componentsJoinedByString:@"|"]]; // \b(accumsan|adipiscing|aliquam|…)\b
    keywordsExpression = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionCaseInsensitive error:NULL];
}

// Run the regular expression once over the whole string and cache every
// character index that belongs to a keyword match
- (void)prepareToFindKeywordsInString:(NSString *)string
{
    NSMutableIndexSet *keywordIndexesMutable = [NSMutableIndexSet indexSet];
    [keywordsExpression enumerateMatchesInString:string options:0 range:NSMakeRange(0, string.length) usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop){
        [keywordIndexesMutable addIndexesInRange:match.range];
    }];

    keywordIndexes = [keywordIndexesMutable copy];
}

- (NSRange)findNextKeyword:(NSString *)string inRange:(NSRange)range
{
    // Scan only the requested range, clamped to the end of the string (assumed intent)
    NSUInteger foundKeywordMax = MIN(NSMaxRange(range), string.length);
    NSRange foundKeywordRange = NSMakeRange(NSNotFound, 0);
    for (NSUInteger index = range.location; index < foundKeywordMax; index++) {
        if ([keywordIndexes containsIndex:index]) {
            if (foundKeywordRange.location == NSNotFound) {
                // First keyword character: start a new range here
                foundKeywordRange.location = index;
                foundKeywordRange.length = 1;
            } else {
                foundKeywordRange.length++;
            }
        } else {
            if (foundKeywordRange.location != NSNotFound) {
                // The keyword has ended
                break;
            }
        }
    }

    return foundKeywordRange;
}

@end

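This seems to work well, and performance is in the range I wanted. I'd like to wait a little longer to see if there are more suggestions before accepting this answer, though.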
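As a possible refinement of the cached lookup, NSIndexSet can jump straight to the first cached index instead of testing every character position. The following is only a sketch: the function name FirstKeywordRangeInIndexSet is hypothetical and not part of the original code, and it assumes the index set was filled from the regular expression matches as described above.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): returns the first run of
// consecutive keyword indexes that starts inside `range`, clipped to `range`.
static NSRange FirstKeywordRangeInIndexSet(NSIndexSet *keywordIndexes, NSRange range)
{
    NSRange notFound = NSMakeRange(NSNotFound, 0);
    if (range.length == 0) {
        return notFound;
    }
    NSUInteger limit = NSMaxRange(range);

    // Jump directly to the first keyword character at or after the range start
    NSUInteger start = [keywordIndexes indexGreaterThanOrEqualToIndex:range.location];
    if (start == NSNotFound || start >= limit) {
        return notFound;
    }

    // Extend the run while the next index is also a keyword character
    NSUInteger end = start + 1;
    while (end < limit && [keywordIndexes containsIndex:end]) {
        end++;
    }
    return NSMakeRange(start, end - start);
}
```

Like the loop it replaces, this reports a partial keyword when the search range begins or ends in the middle of a match, so callers that need whole keywords would still have to check the boundaries.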

Best answer

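Since you need the keywords and their ranges, I would use enumerateMatchesInString:options:range:usingBlock: and implement a block that adds each keyword as the key and its range as the value to an NSMutableDictionary.

That way you make only one call over the whole string, and after that call the dictionary holds all the keywords and their ranges.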
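The approach in this answer could look roughly like the sketch below. The function name KeywordRangesInString is hypothetical. Storing an array of ranges per keyword is also an assumption, made because a keyword can occur more than once in the string. Keys are lowercased because the expression in the question is case-insensitive.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): runs the expression once
// over the whole string and groups the match ranges by lowercased keyword.
static NSDictionary *KeywordRangesInString(NSRegularExpression *expression, NSString *string)
{
    NSMutableDictionary *rangesByKeyword = [NSMutableDictionary dictionary];

    [expression enumerateMatchesInString:string
                                 options:0
                                   range:NSMakeRange(0, string.length)
                              usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        if (match == nil) {
            return;
        }
        // Key: the matched keyword; value: every range where it occurs
        NSString *keyword = [[string substringWithRange:match.range] lowercaseString];
        NSMutableArray *ranges = [rangesByKeyword objectForKey:keyword];
        if (ranges == nil) {
            ranges = [NSMutableArray array];
            [rangesByKeyword setObject:ranges forKey:keyword];
        }
        [ranges addObject:[NSValue valueWithRange:match.range]];
    }];

    return rangesByKeyword;
}
```

A caller would build the expression once, call KeywordRangesInString once per target string, and then read ranges back with [[ranges objectAtIndex:i] rangeValue]. Ranges for each keyword are stored in the order the matches are enumerated, which is left to right through the string.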

Regarding objective-c - Efficiently finding the first of many keywords in a large NSString, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/8180231/
