
macos - Reading a file/URL line by line in Swift

Reposted. Author: IT王子. Updated: 2023-10-29 04:57:53

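I am trying to read a file given in an NSURL and load it into an array, with items separated by the newline character \n.

This is how I have done it so far: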

var possList: NSString? = NSString.stringWithContentsOfURL(filePath.URL) as? NSString
if var list = possList {
    list = list.componentsSeparatedByString("\n") as NSString[]
    return list
}
else {
    //return empty list
}

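I am not very happy with this for a couple of reasons. First, I am working with files that range from a few kilobytes to hundreds of MB in size. As you can imagine, working with strings this large is slow and unwieldy. Second, this freezes the UI while it executes, which is also not good.

I have looked into running this code in a separate thread, but I keep running into trouble with that, and besides, it still does not solve the problem of dealing with huge strings.

What I would like to do is something along the lines of the following pseudocode: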

var aStreamReader = new StreamReader(from_file_or_url)
while aStreamReader.hasNextLine == true {
    currentline = aStreamReader.nextLine()
    list.addItem(currentline)
}

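How would I accomplish this in Swift?

A few notes about the files I am reading from: all files consist of short strings (<255 characters) separated by either \n or \r\n. The files range in length from about 100 lines to over 50 million lines. They may contain European characters and/or characters with accents.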

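Best answer

(The code currently targets Swift 2.2/Xcode 7.3. Older versions can be found in the edit history if anyone needs them. An updated version for Swift 3 is provided at the end.)

The following Swift code is heavily inspired by the various answers to "How to read data from NSFileHandle line by line?". It reads from the file in chunks and converts complete lines to strings.

The default line delimiter (\n), string encoding (UTF-8), and chunk size (4096) can be set with optional parameters.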

class StreamReader  {

    let encoding : UInt
    let chunkSize : Int

    var fileHandle : NSFileHandle!
    let buffer : NSMutableData!
    let delimData : NSData!
    var atEof : Bool = false

    init?(path: String, delimiter: String = "\n", encoding : UInt = NSUTF8StringEncoding, chunkSize : Int = 4096) {
        self.chunkSize = chunkSize
        self.encoding = encoding

        if let fileHandle = NSFileHandle(forReadingAtPath: path),
            delimData = delimiter.dataUsingEncoding(encoding),
            buffer = NSMutableData(capacity: chunkSize)
        {
            self.fileHandle = fileHandle
            self.delimData = delimData
            self.buffer = buffer
        } else {
            self.fileHandle = nil
            self.delimData = nil
            self.buffer = nil
            return nil
        }
    }

    deinit {
        self.close()
    }

    /// Return next line, or nil on EOF.
    func nextLine() -> String? {
        precondition(fileHandle != nil, "Attempt to read from closed file")

        if atEof {
            return nil
        }

        // Read data chunks from file until a line delimiter is found:
        var range = buffer.rangeOfData(delimData, options: [], range: NSMakeRange(0, buffer.length))
        while range.location == NSNotFound {
            let tmpData = fileHandle.readDataOfLength(chunkSize)
            if tmpData.length == 0 {
                // EOF or read error.
                atEof = true
                if buffer.length > 0 {
                    // Buffer contains last line in file (not terminated by delimiter).
                    let line = NSString(data: buffer, encoding: encoding)

                    buffer.length = 0
                    return line as String?
                }
                // No more lines.
                return nil
            }
            buffer.appendData(tmpData)
            range = buffer.rangeOfData(delimData, options: [], range: NSMakeRange(0, buffer.length))
        }

        // Convert complete line (excluding the delimiter) to a string:
        let line = NSString(data: buffer.subdataWithRange(NSMakeRange(0, range.location)),
                            encoding: encoding)
        // Remove line (and the delimiter) from the buffer:
        buffer.replaceBytesInRange(NSMakeRange(0, range.location + range.length), withBytes: nil, length: 0)

        return line as String?
    }

    /// Start reading from the beginning of file.
    func rewind() -> Void {
        fileHandle.seekToFileOffset(0)
        buffer.length = 0
        atEof = false
    }

    /// Close the underlying file. No reading must be done after calling this method.
    func close() -> Void {
        fileHandle?.closeFile()
        fileHandle = nil
    }
}

Usage:

if let aStreamReader = StreamReader(path: "/path/to/file") {
    defer {
        aStreamReader.close()
    }
    while let line = aStreamReader.nextLine() {
        print(line)
    }
}
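The question mentions files whose lines end in either \n or \r\n. Passing "\r\n" as the delimiter would fail to split files that use plain \n, so one option is to keep the default "\n" delimiter and drop a trailing carriage return from each returned line. The helper below is only a sketch of that idea: strippingTrailingCR is a hypothetical name that does not appear in the answer, and it is written in current Swift syntax rather than Swift 2.2.

```swift
import Foundation

// Hypothetical helper (not part of the answer's StreamReader): removes one
// trailing carriage return, so that "\r\n"-terminated lines read with the
// default "\n" delimiter come out without the leftover "\r".
func strippingTrailingCR(_ line: String) -> String {
    // Work on Unicode scalars so that only a literal "\r" at the end matches.
    guard line.unicodeScalars.last == "\r" else { return line }
    var scalars = line.unicodeScalars
    scalars.removeLast()
    return String(scalars)
}

// Intended use inside the read loop (assumes a reader like the one above):
//     while let line = aStreamReader.nextLine() {
//         print(strippingTrailingCR(line))
//     }
```

Lines that do not end in a carriage return pass through unchanged, so the same loop should handle both kinds of files.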

You can even use the reader with a for-in loop

for line in aStreamReader {
    print(line)
}

by implementing the SequenceType protocol (compare http://robots.thoughtbot.com/swift-sequences ):

extension StreamReader : SequenceType {
    func generate() -> AnyGenerator<String> {
        return AnyGenerator {
            return self.nextLine()
        }
    }
}
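Because the generator simply forwards to nextLine(), every loop shares the reader's position. The sequence is therefore single-pass: a second for-in loop yields nothing until rewind() is called. The sketch below illustrates this with LineSource, a hypothetical in-memory stand-in (not the answer's class), written in current Swift syntax, where the protocol and wrapper are named Sequence and AnyIterator.

```swift
import Foundation

// Hypothetical stand-in for StreamReader: serves lines from an array and
// keeps a shared read position, like a file handle would.
final class LineSource: Sequence {
    private let lines: [String]
    private var position = 0

    init(_ lines: [String]) { self.lines = lines }

    /// Return next line, or nil when all lines have been consumed.
    func nextLine() -> String? {
        guard position < lines.count else { return nil }
        defer { position += 1 }
        return lines[position]
    }

    /// Start reading from the beginning again.
    func rewind() { position = 0 }

    // Same pattern as the extension above: the iterator forwards to nextLine().
    func makeIterator() -> AnyIterator<String> {
        return AnyIterator { self.nextLine() }
    }
}

let source = LineSource(["alpha", "beta", "gamma"])
let firstPass = Array(source)    // all three lines
let secondPass = Array(source)   // empty: the shared position is at the end
source.rewind()
let thirdPass = Array(source)    // all three lines again
```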

Update for Swift 3/Xcode 8 beta 6: also "modernized" to use guard and the new Data value type:

class StreamReader  {

    let encoding : String.Encoding
    let chunkSize : Int
    var fileHandle : FileHandle!
    let delimData : Data
    var buffer : Data
    var atEof : Bool

    init?(path: String, delimiter: String = "\n", encoding: String.Encoding = .utf8,
          chunkSize: Int = 4096) {

        guard let fileHandle = FileHandle(forReadingAtPath: path),
            let delimData = delimiter.data(using: encoding) else {
                return nil
        }
        self.encoding = encoding
        self.chunkSize = chunkSize
        self.fileHandle = fileHandle
        self.delimData = delimData
        self.buffer = Data(capacity: chunkSize)
        self.atEof = false
    }

    deinit {
        self.close()
    }

    /// Return next line, or nil on EOF.
    func nextLine() -> String? {
        precondition(fileHandle != nil, "Attempt to read from closed file")

        // Read data chunks from file until a line delimiter is found:
        while !atEof {
            if let range = buffer.range(of: delimData) {
                // Convert complete line (excluding the delimiter) to a string:
                let line = String(data: buffer.subdata(in: 0..<range.lowerBound), encoding: encoding)
                // Remove line (and the delimiter) from the buffer:
                buffer.removeSubrange(0..<range.upperBound)
                return line
            }
            let tmpData = fileHandle.readData(ofLength: chunkSize)
            if tmpData.count > 0 {
                buffer.append(tmpData)
            } else {
                // EOF or read error.
                atEof = true
                if buffer.count > 0 {
                    // Buffer contains last line in file (not terminated by delimiter).
                    let line = String(data: buffer, encoding: encoding)
                    buffer.count = 0
                    return line
                }
            }
        }
        return nil
    }

    /// Start reading from the beginning of file.
    func rewind() -> Void {
        fileHandle.seek(toFileOffset: 0)
        buffer.count = 0
        atEof = false
    }

    /// Close the underlying file. No reading must be done after calling this method.
    func close() -> Void {
        fileHandle?.closeFile()
        fileHandle = nil
    }
}

extension StreamReader : Sequence {
    func makeIterator() -> AnyIterator<String> {
        return AnyIterator {
            return self.nextLine()
        }
    }
}
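The question also raises the problem of the UI freezing while the file is read, which the reader itself does not address. One possible approach, sketched here under assumptions, is to drain the lines on a background dispatch queue and hand the result back on the main queue. The function collectLines and its parameters are hypothetical names introduced for illustration. The code is written for the Swift 5 language mode; stricter concurrency checking may require additional @Sendable annotations.

```swift
import Foundation

/// Hypothetical helper: calls `nextLine` repeatedly on a background queue
/// until it returns nil, then delivers all collected lines on `callbackQueue`.
func collectLines(from nextLine: @escaping () -> String?,
                  callbackQueue: DispatchQueue = .main,
                  completion: @escaping ([String]) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        var lines: [String] = []
        while let line = nextLine() {
            lines.append(line)
        }
        // Hop to the callback queue (the main queue by default) so that the
        // caller can update the UI safely.
        callbackQueue.async {
            completion(lines)
        }
    }
}

// Intended use with a reader like the one above (assumed to exist):
//     collectLines(from: { aStreamReader.nextLine() }) { lines in
//         // runs on the main queue once the whole file has been read
//     }
```

The reader should only be touched from the background closure while the read is in progress, since nextLine() mutates its internal buffer and is not synchronized.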

Regarding "macos - Reading a file/URL line by line in Swift", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24581517/
