
ios - Optimizing string parsing

Reposted. Author: 技术小花猫. Updated: 2023-10-29 11:19:20

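I need to parse data files in the "txf" format. These files can contain more than 1000 entries. Since the format is as well defined as JSON, I want to write a generic, JSON-like parser that can both serialize and deserialize txf files.

Unlike JSON, the markup does not distinguish objects from arrays. If several entries share the same tag, they have to be treated as an array.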
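One way to model "same tag means array" without a separate array marker is to store every child object in a list keyed by its tag: a tag seen once is a one-element list, and a repeated tag simply makes the list longer. A minimal C++ sketch of that data shape; `Node`, `addChild` and `isArray` are hypothetical names, not part of the original code:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: scalar members plus child objects grouped by tag.
struct Node {
    std::map<std::string, std::string> members;         // "$key=value" lines
    std::map<std::string, std::vector<Node>> children;  // "#Tag ... /Tag" blocks
};

// Every child is appended to the list for its tag, so a tag that appears
// twice under the same parent automatically becomes a two-element "array".
void addChild(Node& parent, const std::string& tag, Node child) {
    parent.children[tag].push_back(std::move(child));
}

// True when the tag occurred more than once under this parent,
// i.e. when a consumer should treat it as an array.
bool isArray(const Node& parent, const std::string& tag) {
    auto it = parent.children.find(tag);
    return it != parent.children.end() && it->second.size() > 1;
}
```

With this layout, the two `#Employee` blocks in the sample below end up as two entries in `children["Employee"]` of the `Employees` node.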

  1. # marks the start of an object.
  2. $ marks a member of an object.
  3. / marks the end of an object.
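Because each marker is a single leading character, telling the three kinds of line apart only needs a look at the first character. A minimal C++ sketch; `LineKind` and `classify` are hypothetical names, and treating blank or unmarked lines as `Other` is an assumption:

```cpp
#include <string>

// Kind of txf line, decided by its first character.
enum class LineKind { ObjectStart, Member, ObjectEnd, Other };

// Classifies a line and stores the text after the marker in `rest`
// (e.g. "#Employee" -> ObjectStart, rest == "Employee").
// Blank lines give Other with an empty rest; unmarked lines give Other
// with rest set to the whole line.
LineKind classify(const std::string& line, std::string& rest) {
    if (line.empty()) {
        rest.clear();
        return LineKind::Other;
    }
    rest = line.substr(1);
    switch (line[0]) {
        case '#': return LineKind::ObjectStart;
        case '$': return LineKind::Member;
        case '/': return LineKind::ObjectEnd;
        default:
            rest = line;  // no marker: keep the full text
            return LineKind::Other;
    }
}
```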

Here is a sample "txf" file:

#Employees
$LastUpdated=2015-02-01 14:01:00
#Employee
$Id=1
$Name=Employee 01
#Departments
$LastUpdated=2015-02-01 14:01:00
#Department
$Id=1
$Name=Department Name
/Department
/Departments
/Employee
#Employee
/Employee
/Employees
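Since the goal is to serialize as well as deserialize, writing a tree back out is the mirror image of the three markers. A minimal C++ sketch, assuming a hypothetical `TxfNode` type whose children are grouped by tag (`writeTxf` and `toTxf` are also made-up names). Note that `std::map` sorts its keys, so members and children come out alphabetically rather than in the order they were read; preserving the original order would need an insertion-ordered container.

```cpp
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tree type with the same shape as a parsed txf object.
struct TxfNode {
    std::map<std::string, std::string> members;
    std::map<std::string, std::vector<TxfNode>> children;
};

// Writes `node` as a txf object named `tag`: "#tag", its "$key=value"
// members, every child object, then "/tag".
void writeTxf(const TxfNode& node, const std::string& tag, std::ostream& out) {
    out << '#' << tag << '\n';
    for (const auto& kv : node.members)
        out << '$' << kv.first << '=' << kv.second << '\n';
    for (const auto& group : node.children)
        for (const auto& child : group.second)
            writeTxf(child, group.first, out);  // repeated tags are written one after another
    out << '/' << tag << '\n';
}

// Convenience wrapper returning the txf text as a string.
std::string toTxf(const TxfNode& node, const std::string& tag) {
    std::ostringstream out;
    writeTxf(node, tag, out);
    return out.str();
}
```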

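I was able to write a generic TXF parser using NSScanner. But as the number of entries grows, its performance needs more tuning.

I wrote the resulting Foundation objects out as a plist and compared parsing that plist against my own parser. My parser is about 10 times slower than the plist parser.

Even though the plist file is 5 times the size of the txf file and contains more markup characters, I feel there is still a lot of room for optimization.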
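One generic source of per-line cost is allocation: the parsing code posted in the edit splits every `$` line with `componentsSeparatedByString:`, which builds an array and new strings each time. A common way to avoid that (a general technique, not measured against the sample file here) is to split at the first `=` using non-owning views. A C++17 sketch; `splitMember` is a hypothetical name:

```cpp
#include <optional>
#include <string_view>
#include <utility>

// Splits "key=value" at the FIRST '=' without allocating: both halves are
// views into the caller's buffer, so they stay valid only while it lives.
// Returns std::nullopt when there is no '='.
std::optional<std::pair<std::string_view, std::string_view>>
splitMember(std::string_view member) {
    const auto eq = member.find('=');
    if (eq == std::string_view::npos) return std::nullopt;
    return std::make_pair(member.substr(0, eq), member.substr(eq + 1));
}
```

Splitting at the first `=` also keeps values that themselves contain `=` intact.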

Any help with this is greatly appreciated.

EDIT: Including the parsing code

#import <Foundation/Foundation.h>

static NSString *const kArray    = @"TXFArray";
static NSString *const kBodyText = @"TXFText";

/*Public interface (not part of the posted snippet; declared so the class extension below compiles)*/
@interface TXFParser : NSObject
- (id)objectFromString:(NSString *)txfString;
@end

@interface TXFParser ()

/*Temporary variable to hold values of an object*/
@property (nonatomic, strong) NSMutableDictionary *dict;

/*An array to hold the hierarchical data of all nodes encountered while parsing*/
@property (nonatomic, strong) NSMutableArray *stack;

@end

@implementation TXFParser

#pragma mark - Getters

- (NSMutableArray *)stack {
    if (!_stack) {
        _stack = [NSMutableArray new];
    }
    return _stack;
}

#pragma mark -

- (id)objectFromString:(NSString *)txfString {
    [txfString enumerateLinesUsingBlock:^(NSString *string, BOOL *stop) {
        if ([string hasPrefix:@"#"]) {
            [self didStartParsingTag:[string substringFromIndex:1]];
        } else if ([string hasPrefix:@"$"]) {
            [self didFindKeyValuePair:[string substringFromIndex:1]];
        } else if ([string hasPrefix:@"/"]) {
            [self didEndParsingTag:[string substringFromIndex:1]];
        } else {
            //[self didFindBodyValue:string];
        }
    }];
    return self.dict;
}

#pragma mark -

- (void)didStartParsingTag:(NSString *)tag {
    [self parserFoundObjectStartForKey:tag];
}

- (void)didFindKeyValuePair:(NSString *)tag {
    /*Split at the first "=" only. componentsSeparatedByString: would cut values
      containing "=" and, for a line without "=", return the key as the value.*/
    NSRange separator = [tag rangeOfString:@"="];
    NSString *key = tag;
    NSString *value = @"";
    if (separator.location != NSNotFound) {
        key = [tag substringToIndex:separator.location];
        value = [tag substringFromIndex:NSMaxRange(separator)];
    }

    if (key.length) {
        self.dict[key] = value;
    }
}

- (void)didFindBodyValue:(NSString *)bodyString {
    if (!bodyString.length) return;
    bodyString = [bodyString stringByTrimmingCharactersInSet:[NSCharacterSet illegalCharacterSet]];
    if (!bodyString.length) return;

    self.dict[kBodyText] = bodyString;
}

- (void)didEndParsingTag:(NSString *)tag {
    [self parserFoundObjectEndForKey:tag];
}

#pragma mark -

- (void)parserFoundObjectStartForKey:(NSString *)key {
    self.dict = [NSMutableDictionary new];
    [self.stack addObject:self.dict];
}

- (void)parserFoundObjectEndForKey:(NSString *)key {
    NSDictionary *dict = self.dict;

    //Remove the last value of stack
    [self.stack removeLastObject];

    //Load the previous object as dict
    self.dict = [self.stack lastObject];

    //The stack has contents, then we need to append objects
    if ([self.stack count]) {
        [self addObject:dict forKey:key];
    } else {
        //This is the root object, wrap with key and assign output
        self.dict = (NSMutableDictionary *)[self wrapObject:dict withKey:key];
    }
}

#pragma mark - Add Objects after finding end tag

- (void)addObject:(id)dict forKey:(NSString *)key {
    //If there is no value, bail out
    if (!dict) return;

    //Check if the dict already has a value for key array.
    NSMutableArray *array = self.dict[kArray];

    //If array key is not found look for another object with same key
    if (array) {
        //Array found: add current object after wrapping with key
        NSDictionary *currentDict = [self wrapObject:dict withKey:key];
        [array addObject:currentDict];
    } else {
        id prevObj = self.dict[key];
        if (prevObj) {
            /*
             There is a prev value for the same key. That means we need to wrap that object in a collection.
             1. Remove the object from dictionary,
             2. Wrap it with its key
             3. Add the prev and current value to array
             4. Save the array back to dict
             */
            [self.dict removeObjectForKey:key];
            NSDictionary *prevDict = [self wrapObject:prevObj withKey:key];
            NSDictionary *currentDict = [self wrapObject:dict withKey:key];
            self.dict[kArray] = [@[prevDict, currentDict] mutableCopy];

        } else {
            //Simply add object to dict
            self.dict[key] = dict;
        }
    }
}

/*Wraps Object with a key for the serializer to generate txf tag*/
- (NSDictionary *)wrapObject:(id)obj withKey:(NSString *)key {
    if (!key || !obj) {
        return @{};
    }
    return @{key: obj};
}

@end
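For comparison with the recursive approach in the answer, the explicit-stack idea the Objective-C code uses (`self.stack`: push on `#`, pop on `/`) fits in a few lines of C++. This is a hypothetical sketch, not a translation of the code above: instead of a `TXFArray` key it keeps every child in a per-tag list, and `StackNode`/`parseWithStack` are made-up names.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree type for this sketch.
struct StackNode {
    std::map<std::string, std::string> members;
    std::map<std::string, std::vector<StackNode>> children;
};

// Iterative parse with an explicit stack of (tag, node) frames.
StackNode parseWithStack(std::istream& in) {
    std::vector<std::pair<std::string, StackNode>> stack;
    stack.emplace_back(std::string(), StackNode{});  // synthetic root frame
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        const std::string rest = line.substr(1);
        switch (line[0]) {
        case '#':
            stack.emplace_back(rest, StackNode{});  // open a new object
            break;
        case '$': {
            const auto eq = rest.find('=');
            if (eq == std::string::npos) throw std::runtime_error("no '=' in: " + line);
            stack.back().second.members[rest.substr(0, eq)] = rest.substr(eq + 1);
            break;
        }
        case '/': {
            if (stack.size() < 2 || stack.back().first != rest)
                throw std::runtime_error("unexpected end tag: " + line);
            StackNode done = std::move(stack.back().second);
            stack.pop_back();
            // Append to the parent's list for this tag: repeated tags become arrays.
            stack.back().second.children[rest].push_back(std::move(done));
            break;
        }
        default:
            throw std::runtime_error("unrecognised line: " + line);
        }
    }
    if (stack.size() != 1) throw std::runtime_error("missing end tag for " + stack.back().first);
    return std::move(stack.front().second);
}

// Convenience overload for in-memory text.
StackNode parseWithStack(const std::string& text) {
    std::istringstream in(text);
    return parseWithStack(in);
}
```

The returned root is the synthetic frame, so the document's top-level object is found under `children["Employees"][0]` for the sample.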

EDIT 2:

Sample TXF file with more than 1000 entries.

Best answer

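Have you considered pull-style reading combined with recursive processing? That way you don't need to read the whole file into memory, and you don't need to manage your own stack to keep track of how deep you are in the parse.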
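To make "pull-style reading with recursion" concrete: each recursive call pulls lines from the stream until it meets its own end tag, so the call stack replaces a hand-managed one and only the current line has to be held in memory. The C++ sketch below only validates nesting and counts objects; `countObjects` is a hypothetical name.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Consumes lines until the end tag for `tag`, recursing on every "#Child".
// Returns the number of objects in this subtree, including `tag` itself.
std::size_t countObjects(std::istream& in, const std::string& tag) {
    std::size_t count = 1;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        const std::string rest = line.substr(1);
        if (line[0] == '#') {
            count += countObjects(in, rest);  // descend into the child
        } else if (line[0] == '/') {
            if (rest != tag) throw std::runtime_error("expected /" + tag + ", got " + line);
            return count;                     // this object is complete
        }
        // '$' member lines (and anything else) are skipped by this checker.
    }
    throw std::runtime_error("missing /" + tag);
}

// Reads the opening "#Root" line, then checks the rest of the document.
std::size_t countObjects(const std::string& text) {
    std::istringstream in(text);
    std::string first;
    if (!std::getline(in, first) || first.empty() || first[0] != '#')
        throw std::runtime_error("document must start with #Tag");
    return countObjects(in, first.substr(1));
}
```

On the sample file from the question this counts 5 objects (Employees, two Employee, Departments, Department).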

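Below is an example in Swift. It works with your sample "txf", but not with the Dropbox version, because some of your "members" span multiple lines. If that is a requirement, it can easily be implemented in the switch/case "$" section, although I don't see your own code handling it either. Also, the example does not yet do proper Swift error handling (the parse method would need an extra NSError parameter).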
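The multi-line members in the Dropbox file are not reproduced here, so the rule below is an assumption: any line that does not start with `#`, `$` or `/` continues the previous member's value. Under that assumption, a hypothetical C++ helper for the "$" case (`MemberCollector` is a made-up name) could look like this:

```cpp
#include <map>
#include <string>

// Collects one object's members, allowing values to continue on later lines.
// ASSUMPTION: a continuation line is any line not starting with '#', '$' or '/';
// it is appended to the most recent member's value after a '\n'.
class MemberCollector {
public:
    // Returns true if the line was consumed as a member or a continuation;
    // false for '#' or '/' tag lines, '$' lines without '=', or a
    // continuation that has no earlier member to attach to.
    bool feed(const std::string& line) {
        if (!line.empty() && line[0] == '$') {
            const auto eq = line.find('=', 1);
            if (eq == std::string::npos) return false;  // malformed member
            lastKey_ = line.substr(1, eq - 1);
            members_[lastKey_] = line.substr(eq + 1);
            haveKey_ = true;
            return true;
        }
        if (!line.empty() && (line[0] == '#' || line[0] == '/')) return false;
        if (!haveKey_) return false;  // continuation with no member before it
        std::string& value = members_[lastKey_];
        value += '\n';
        value += line;
        return true;
    }

    const std::map<std::string, std::string>& members() const { return members_; }

private:
    std::map<std::string, std::string> members_;
    std::string lastKey_;
    bool haveKey_ = false;
};
```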

import Foundation

// Recursively parses lines pulled from the reader until the end tag that
// matches parentTagName. Child objects are collected into arrays keyed by tag.
func parse(_ aStreamReader: StreamReader, parentTagName: String) -> [String: Any] {
    var dict = [String: Any]()

    while let line = aStreamReader.nextLine() {
        // Skip blank lines instead of crashing on a missing first character.
        guard let firstChar = line.first else { continue }
        let theRest = String(line.dropFirst())

        switch firstChar {
        case "$":
            // Split at the first "=" only, so values may contain "=".
            if let idx = theRest.firstIndex(of: "=") {
                let key = String(theRest[..<idx])
                let value = String(theRest[theRest.index(after: idx)...])
                dict[key] = value
            } else {
                print("no = sign")
            }
        case "#":
            let subDict = parse(aStreamReader, parentTagName: theRest)
            // Arrays are value types: append to a copy, then store it back.
            var list = dict[theRest] as? [[String: Any]] ?? []
            list.append(subDict)
            dict[theRest] = list
        case "/":
            if theRest != parentTagName {
                print("mismatch... [\(theRest)] != [\(parentTagName)]")
            } else {
                return dict
            }
        default:
            print("mismatch... [\(line)]")
        }
    }

    print("shouldn't be here...")
    return dict
}


var data: [String: Any]?

if let aStreamReader = StreamReader(path: "/Users/taoufik/Desktop/QuickParser/QuickParser/file.txf") {
    // The first line is the root "#Tag"; parse() consumes the rest up to "/Tag".
    if let line = aStreamReader.nextLine() {
        let tagName = String(line.dropFirst())
        data = parse(aStreamReader, parentTagName: tagName)
    }

    aStreamReader.close()
}

print(data ?? [:])

StreamReader is borrowed from https://stackoverflow.com/a/24648951/95976

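EDIT

EDIT 2

I rewrote the code above in C++11, and with the updated file from Dropbox it runs in under 0.05 seconds on a 2012 MacBook Air i5 (Release mode). I suspect NSDictionary and NSArray must carry some penalty. The code below can be compiled in an Objective-C project (the file needs the .mm extension):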

#include <chrono>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using namespace std;


// Simple wall-clock timer used to measure the parse.
class benchmark {

private:
    typedef std::chrono::high_resolution_clock clock;
    typedef std::chrono::milliseconds milliseconds;

    clock::time_point start;

public:
    benchmark(bool startCounting = true) {
        if (startCounting)
            start = clock::now();
    }

    void reset() {
        start = clock::now();
    }

    // Elapsed time in seconds (millisecond resolution).
    double elapsed() {
        milliseconds ms = std::chrono::duration_cast<milliseconds>(clock::now() - start);
        double elapsed_secs = ms.count() / 1000.0;
        return elapsed_secs;
    }
};

// A parsed txf object: "$key=value" members plus child objects grouped by tag.
struct obj {
    map<string, string> properties;
    map<string, vector<obj>> subObjects;
};

// Recursively parses lines until the end tag matching parentTagName.
obj parse(istream& stream, const string& parentTagName) {
    obj result;
    string line;
    while (getline(stream, line))
    {
        if (line.empty())
            continue;  // tolerate blank lines

        auto firstChar = line[0];
        auto rest = line.substr(1);

        switch (firstChar) {
            case '$': {
                // Split at the first '=' so values may themselves contain '='.
                auto idx = rest.find('=');
                if (idx == string::npos) {
                    ostringstream o;
                    o << "no = sign: " << line;
                    throw o.str();
                }
                auto key = rest.substr(0, idx);
                auto value = rest.substr(idx + 1);
                result.properties[key] = value;
                break;
            }
            case '#': {
                auto subObj = parse(stream, rest);
                result.subObjects[rest].push_back(std::move(subObj));
                break;
            }
            case '/':
                if (rest != parentTagName) {
                    ostringstream o;
                    o << "mismatch end of object " << rest << " != " << parentTagName;
                    throw o.str();
                }
                return result;
            default: {
                ostringstream o;
                o << "mismatch line " << line;
                throw o.str();
            }
        }
    }

    // Thrown as std::string so that main()'s catch handles it like the other errors.
    throw string("I don't know why I'm here. Probably because the file is missing an end of object marker");
}


// Prints the parsed tree, one tab of indentation per nesting level.
void visualise(const obj& o, int indent = 0) {
    for (auto& property : o.properties) {
        cout << string(indent, '\t') << property.first << " = " << property.second << endl;
    }

    for (auto& subObjects : o.subObjects) {
        for (auto& subObject : subObjects.second) {
            cout << string(indent, '\t') << subObjects.first << ": " << endl;
            visualise(subObject, indent + 1);
        }
    }
}

int main(int argc, const char * argv[]) {
    try {
        obj result;

        benchmark b;
        ifstream stream("/Users/taoufik/Desktop/QuickParser/QuickParser/Members.txf");
        string line;
        // The first line is the root "#Tag"; parse() consumes the rest up to "/Tag".
        if (getline(stream, line))
        {
            string tagName = line.substr(1);
            result = parse(stream, tagName);
        }

        cout << "elapsed " << b.elapsed() << " s" << endl;

        visualise(result);

    } catch (const string& s) {
        cout << "error " << s << endl;
    }

    return 0;
}

EDIT 3

See the full C++ code at: https://github.com/tofi9/TxfParser

For ios - Optimizing string parsing, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28260743/
