
ios - How to write large files to disk efficiently on a background thread (Swift)


Update

I have since resolved and removed the distracting error. Please read the entire post, and feel free to comment if any questions remain.

Background

I am trying to write relatively large files (video) to disk on iOS using Swift 2.0, GCD, and a completion handler, and I would like to know whether there is a more efficient way to perform this task. The task needs to complete without blocking the main UI, use completion logic, and ensure the operation happens as quickly as possible. I have a custom object with an NSData property, so I am currently experimenting with an extension on NSData. As an example, an alternate solution might use NSFileHandle or NSStreams together with some form of thread-safe behavior that achieves higher throughput than the NSData writeToURL function my current solution is based on.

What's wrong with NSData?

Please note the following discussion from the NSData Class Reference (Saving Data). I do perform my writes to the temporary directory; however, the main reason I am having an issue is that I can see a noticeable lag in the UI when dealing with large files. That lag occurs precisely because NSData is not asynchronous (and the Apple docs note that atomic writes can cause performance issues on "large" files, roughly > 1 MB). So when dealing with large files, whatever internal mechanism is at work within the NSData methods comes into play.

I did some more digging and found this from Apple: "This method is ideal for converting data:// URLs to NSData objects, and can also be used for reading short files synchronously. If you need to read potentially large files, use inputStreamWithURL: to open a stream, then read the file a piece at a time." (NSData Class Reference, Objective-C, +dataWithContentsOfURL). This information seems to imply that, if moving writeToURL to a background thread (as @jtbandes suggested) is not enough, I could try using streams to write the file out on a background thread.
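To make that concrete, here is a minimal, hedged sketch of the chunked-read pattern those docs describe, written against Swift 2 era APIs; the function name and the 64 KB chunk size are my own assumptions:

    func readFileInChunks(fileURL: NSURL, chunkSize: Int = 64 * 1024) {
        guard let stream = NSInputStream(URL: fileURL) else { return }
        stream.open()
        defer { stream.close() }

        var buffer = [UInt8](count: chunkSize, repeatedValue: 0)
        while stream.hasBytesAvailable {
            let bytesRead = stream.read(&buffer, maxLength: buffer.count)
            if bytesRead <= 0 { break } // 0 = end of stream, < 0 = error
            // Process buffer[0..<bytesRead] here; the whole file is never
            // resident in memory at once.
        }
    }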

The NSData class and its subclasses provide methods to quickly and easily save their contents to disk. To minimize the risk of data loss, these methods provide the option of saving the data atomically. Atomic writes guarantee that the data is either saved in its entirety, or it fails completely. The atomic write begins by writing the data to a temporary file. If this write succeeds, then the method moves the temporary file to its final location.

While atomic write operations minimize the risk of data loss due to corrupt or partially-written files, they may not be appropriate when writing to a temporary directory, the user’s home directory or other publicly accessible directories. Any time you work with a publicly accessible file, you should treat that file as an untrusted and potentially dangerous resource. An attacker may compromise or corrupt these files. The attacker can also replace the files with hard or symbolic links, causing your write operations to overwrite or corrupt other system resources.

Avoid using the writeToURL:atomically: method (and the related methods) when working inside a publicly accessible directory. Instead initialize an NSFileHandle object with an existing file descriptor and use the NSFileHandle methods to securely write the file.
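For completeness, a simplified NSFileHandle sketch follows. Note it uses the path-based initializer for brevity rather than the existing-file-descriptor approach the note above recommends for publicly accessible directories; the helper name is my own:

    func appendData(data: NSData, toPath path: String) -> Bool {
        let fm = NSFileManager.defaultManager()
        // NSFileHandle will not create the file, so create it first if needed.
        if !fm.fileExistsAtPath(path) {
            fm.createFileAtPath(path, contents: nil, attributes: nil)
        }
        guard let handle = NSFileHandle(forWritingAtPath: path) else { return false }
        defer { handle.closeFile() }

        handle.seekToEndOfFile()
        handle.writeData(data) // raises an Objective-C exception on failure
        return true
    }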



Other Alternatives

An article on concurrent programming at objc.io offers interesting options under "Advanced: File I/O in the Background". Some of those options also involve an InputStream. Apple also has some older references for reading and writing files asynchronously. I am posting this question in the hope of Swift alternatives.

Example of a Suitable Answer

Here is an example of a suitable answer that might satisfy this kind of question. (Taken from the Stream Programming Guide, Writing To Output Streams; a simplified sketch follows the steps.)

Writing to an output stream using an NSOutputStream instance requires several steps:
  • Create and initialize an instance of NSOutputStream with a repository for the written data. Also set a delegate.
  • Schedule the stream object on a run loop and open the stream.
  • Handle the events that the stream object reports to its delegate.
  • If the stream object has written data to memory, obtain the data by requesting the NSStreamDataWrittenToMemoryStreamKey property.
  • When there is no more data to write, dispose of the stream object.
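A hedged, minimal sketch of that idea in Swift 2: rather than run-loop scheduling with a delegate, this version simply blocks on a background queue and polls write(_:maxLength:) until the buffer is drained. The function name and error handling are my own assumptions, not the guide's.

    func writeData(data: NSData, toFileURL fileURL: NSURL) -> Bool {
        // Open an output stream directly onto the destination file.
        guard let stream = NSOutputStream(URL: fileURL, append: false) else { return false }
        stream.open()
        defer { stream.close() }

        var bytes = UnsafePointer<UInt8>(data.bytes)
        var bytesRemaining = data.length

        while bytesRemaining > 0 {
            // write(_:maxLength:) returns the number of bytes actually
            // written, or a negative value on failure.
            let written = stream.write(bytes, maxLength: bytesRemaining)
            if written < 0 { return false } // inspect stream.streamError for details
            bytesRemaining -= written
            bytes = bytes.advancedBy(written)
        }
        return true
    }

Called from a dispatch_async block, this writes incrementally while staying off the main thread.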

  • I am looking for the most proficient algorithm for writing extremely large files on iOS; Swift, standard APIs, or possibly even C/ObjC would suffice. I can transpose the algorithm into appropriate Swift-compatible constructs.



    Nota Bene

    I understand the informational error below. It is included for completeness. This question is asking whether or not there is a better algorithm to use for writing large files to disk with a guaranteed dependency sequence (e.g. NSOperation dependencies). If there is, please provide enough information (description/sample) for me to reconstruct pertinent Swift 2.0 compatible code. Please advise if I am missing any information that would help answer the question.



    Extension Considerations

    I've added a completion handler to the base writeToURL to ensure that no unintended resource sharing occurs. My dependent tasks that use the file should never face a race condition.


    extension NSData {

        func writeToURL(named: String, completion: (result: Bool, url: NSURL?) -> Void) {

            let filePath = NSTemporaryDirectory() + named
            let tmpURL = NSURL(fileURLWithPath: filePath)

            dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)) {
                // Write to the URL atomically off the main thread. The closure
                // retains self (the NSData), so no weak reference is needed.
                let success = self.writeToURL(tmpURL, atomically: true)
                    && NSFileManager.defaultManager().fileExistsAtPath(filePath)

                // Call back on the main queue so the handler can present UI,
                // and report failure as well as success.
                dispatch_async(dispatch_get_main_queue()) {
                    completion(result: success, url: success ? tmpURL : nil)
                }
            }
        }
    }

    This method is used to process a custom object's data from a controller, as follows:
    var items = [AnyObject]()

    if let video = myCustomClass.data {

        //video is of type NSData
        video.writeToURL("shared.mp4", completion: { (result, url) -> Void in
            if result {
                items.append(url!)
                if items.count > 0 {

                    let sharedActivityView = UIActivityViewController(activityItems: items, applicationActivities: nil)

                    self.presentViewController(sharedActivityView, animated: true) { () -> Void in
                        //finished
                    }
                }
            }
        })
    }

    Conclusion

    The Apple documentation at Core Data Performance provides some good advice on dealing with memory pressure and managing BLOBs. It really is an article with a host of clues about behavior and how to moderate the issue of large files within an app. Now, although it is specific to Core Data rather than files, the warning about atomic writing does tell me that I should implement methods that write atomically with a great deal of care.

    For large files, the only safe way to manage the write appears to be adding a completion handler (to the write method) and showing an activity view on the main thread. Whether one does that with streams or by modifying an existing API to add completion logic is up to the reader. I have done both in the past and am testing for the best performance.

    Until then, I am changing the solution to remove all binary data properties from Core Data and replacing them with strings that save asset URLs on disk. I am also leveraging the built-in functionality of the Assets Library and PHAsset to grab and store all related asset URLs. When or if I need to copy any assets, I will use standard API methods (export methods on PHAsset / the Assets Library) with completion handlers to notify the user of the finished state on the main thread.
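    As an illustration of that direction, here is a hedged sketch of exporting a video by reference through Photos/AVFoundation rather than copying bytes through Core Data. The helper name, preset choice, and identifier handling are assumptions, not code from my project:

    import Photos
    import AVFoundation

    func exportVideo(localIdentifier: String, toURL outputURL: NSURL, completion: (Bool) -> Void) {
        // Resolve the stored identifier back to a PHAsset.
        let assets = PHAsset.fetchAssetsWithLocalIdentifiers([localIdentifier], options: nil)
        guard let asset = assets.firstObject as? PHAsset else {
            completion(false); return
        }

        PHImageManager.defaultManager().requestExportSessionForVideo(asset,
            options: nil,
            exportPreset: AVAssetExportPresetHighestQuality) { exportSession, _ in
                guard let session = exportSession else {
                    dispatch_async(dispatch_get_main_queue()) { completion(false) }
                    return
                }
                session.outputURL = outputURL
                session.outputFileType = AVFileTypeQuickTimeMovie
                session.exportAsynchronouslyWithCompletionHandler {
                    let ok = (session.status == .Completed)
                    // Notify the caller of the finished state on the main thread.
                    dispatch_async(dispatch_get_main_queue()) { completion(ok) }
                }
        }
    }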

    (A very useful excerpt from the Core Data Performance article)

    Reducing Memory Overhead

    It is sometimes the case that you want to use managed objects on a temporary basis, for example to calculate an average value for a particular attribute. This causes your object graph, and memory consumption, to grow. You can reduce the memory overhead by re-faulting individual managed objects that you no longer need, or you can reset a managed object context to clear an entire object graph. You can also use patterns that apply to Cocoa programming in general.

    You can re-fault an individual managed object using NSManagedObjectContext’s refreshObject:mergeChanges: method. This has the effect of clearing its in-memory property values thereby reducing its memory overhead. (Note that this is not the same as setting the property values to nil—the values will be retrieved on demand if the fault is fired—see Faulting and Uniquing.)

    When you create a fetch request you can set includesPropertyValues to NO to reduce memory overhead by avoiding creation of objects to represent the property values. You should typically only do so, however, if you are sure that either you will not need the actual property data or you already have the information in the row cache, otherwise you will incur multiple trips to the persistent store.

    You can use the reset method of NSManagedObjectContext to remove all managed objects associated with a context and "start over" as if you'd just created it. Note that any managed object associated with that context will be invalidated, and so you will need to discard any references to and re-fetch any objects associated with that context in which you are still interested. If you iterate over a lot of objects, you may need to use local autorelease pool blocks to ensure temporary objects are deallocated as soon as possible.

    If you do not intend to use Core Data’s undo functionality, you can reduce your application's resource requirements by setting the context’s undo manager to nil. This may be especially beneficial for background worker threads, as well as for large import or batch operations.

    Finally, Core Data does not by default keep strong references to managed objects (unless they have unsaved changes). If you have lots of objects in memory, you should determine the owning references. Managed objects maintain strong references to each other through relationships, which can easily create strong reference cycles. You can break cycles by re-faulting objects (again by using the refreshObject:mergeChanges: method of NSManagedObjectContext).

    Large Data Objects (BLOBs)

    If your application uses large BLOBs ("Binary Large OBjects" such as image and sound data), you need to take care to minimize overheads. The exact definition of “small”, “modest”, and “large” is fluid and depends on an application’s usage. A loose rule of thumb is that objects in the order of kilobytes in size are of a “modest” sized and those in the order of megabytes in size are “large” sized. Some developers have achieved good performance with 10MB BLOBs in a database. On the other hand, if an application has millions of rows in a table, even 128 bytes might be a "modest" sized CLOB (Character Large OBject) that needs to be normalized into a separate table.

    In general, if you need to store BLOBs in a persistent store, you should use an SQLite store. The XML and binary stores require that the whole object graph reside in memory, and store writes are atomic (see Persistent Store Features) which means that they do not efficiently deal with large data objects. SQLite can scale to handle extremely large databases. Properly used, SQLite provides good performance for databases up to 100GB, and a single row can hold up to 1GB (although of course reading 1GB of data into memory is an expensive operation no matter how efficient the repository).

    A BLOB often represents an attribute of an entity—for example, a photograph might be an attribute of an Employee entity. For small to modest sized BLOBs (and CLOBs), you should create a separate entity for the data and create a to-one relationship in place of the attribute. For example, you might create Employee and Photograph entities with a one-to-one relationship between them, where the relationship from Employee to Photograph replaces the Employee's photograph attribute. This pattern maximizes the benefits of object faulting (see Faulting and Uniquing). Any given photograph is only retrieved if it is actually needed (if the relationship is traversed).

    It is better, however, if you are able to store BLOBs as resources on the filesystem, and to maintain links (such as URLs or paths) to those resources. You can then load a BLOB as and when necessary.
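    A minimal sketch of the pattern recommended in that last paragraph: write the BLOB to disk once and persist only a reference. Here videoData, managedObject, and the videoFileName attribute are assumptions about the surrounding model:

    // Write the video to the Documents directory under a unique name.
    let fileName = NSUUID().UUIDString + ".mp4"
    let docsDir = NSFileManager.defaultManager()
        .URLsForDirectory(.DocumentDirectory, inDomains: .UserDomainMask)[0]
    let fileURL = docsDir.URLByAppendingPathComponent(fileName)

    if videoData.writeToURL(fileURL, atomically: true) {
        // Store the lightweight reference, not the bytes, in Core Data.
        managedObject.setValue(fileName, forKey: "videoFileName")
    }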



    Note:

    I've moved the logic below into the completion handler (see the code above) and I no longer see any error. As mentioned before this question is about whether or not there is a more performant way to process large files in iOS using Swift.



    When attempting to process the resulting items array to pass to a UIActivityViewController, the following logic was used:

    if items.count > 0 {
        let sharedActivityView = UIActivityViewController(activityItems: items, applicationActivities: nil)
        self.presentViewController(sharedActivityView, animated: true) { () -> Void in
            //finished
        }
    }

    I saw the following error: Communications error: { count = 1, contents = "XPCErrorDescription" => { length = 22, contents = "Connection interrupted" } }> (Note that I am looking for a better design, not an answer to this error message.)

    Best Answer

    Performance depends on whether the data fits in RAM. If it does, then you should use NSData writeToURL with the atomically option turned on, which is what you are doing.

    Apple's note about the danger of "writing to a public directory" is completely irrelevant on iOS, because there are no public directories. That section only applies to OS X. And frankly, it is not really important there either.

    So, as long as the video fits in RAM (about 100 MB is a safe limit), the code you wrote is as efficient as it gets.

    For files that do not fit in RAM, you need to use streams, or your app will crash while holding the video in memory. To download a large video from a server and write it to disk, you should use NSURLSessionDownloadTask.

    In general, streaming (including NSURLSessionDownloadTask) will be orders of magnitude slower than NSData.writeToURL(). So do not use streams unless you need to. All operations on NSData are extremely fast; it is perfectly capable of dealing with files that are multiple terabytes in size with excellent performance on OS X (iOS obviously cannot have files that large, but it is the same class with the same performance).
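    For reference, a minimal sketch of the NSURLSessionDownloadTask approach mentioned above; remoteURL and destinationURL are assumptions:

    let task = NSURLSession.sharedSession().downloadTaskWithURL(remoteURL) { location, response, error in
        // The response is streamed straight to a temporary file on disk,
        // which is deleted after this handler returns, so move it immediately.
        guard let location = location where error == nil else { return }
        _ = try? NSFileManager.defaultManager().moveItemAtURL(location, toURL: destinationURL)
    }
    task.resume()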

    There are a few issues in your code.

    This is wrong:

    let filePath = NSTemporaryDirectory() + named

    Instead, always do:

    let filePath = NSTemporaryDirectory().stringByAppendingPathComponent(named)

    But that is not ideal either; you should avoid using paths (they are buggy and slow). Instead use a URL like this:

    let tmpDir = NSURL(fileURLWithPath: NSTemporaryDirectory())!
    let fileURL = tmpDir.URLByAppendingPathComponent(named)

    Also, you are using a path to check whether the file exists... do not do this:

    if NSFileManager.defaultManager().fileExistsAtPath( filePath ) {

    Instead use NSURL to check whether it exists:

    if fileURL.checkResourceIsReachableAndReturnError(nil) {
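    Putting those corrections together, a hedged consolidated version of the question's extension might look like this (still Swift 2 style; the main-queue callback is my own addition):

    extension NSData {

        func writeToURL(named: String, completion: (result: Bool, url: NSURL?) -> Void) {
            // Build a file URL instead of concatenating path strings.
            let tmpDir = NSURL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
            let fileURL = tmpDir.URLByAppendingPathComponent(named)

            dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)) {
                // Check reachability on the URL rather than a path string.
                let success = self.writeToURL(fileURL, atomically: true)
                    && fileURL.checkResourceIsReachableAndReturnError(nil)
                dispatch_async(dispatch_get_main_queue()) {
                    completion(result: success, url: success ? fileURL : nil)
                }
            }
        }
    }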

    Regarding "ios - How to write large files to disk efficiently on a background thread (Swift)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31965566/
