c# - 为什么这个 System.IO.Pipelines 代码比基于 Stream 的代码慢得多？-6ren

c# - 为什么这个 System.IO.Pipelines 代码比基于 Stream 的代码慢得多？

转载作者：行者123 更新时间：2023-12-03 14:40:25

27

4

我写了一个小解析程序来比较旧的 System.IO.Stream和更新的 System.IO.Pipelines在 .NET Core 中。我期望管道代码具有相同的速度或更快。但是，它慢了大约 40%。
程序很简单:它在一个 100Mb 的文本文件中搜索关键字，并返回关键字的行号。这是流版本:

public static async Task<int> GetLineNumberUsingStreamAsync(
    string file,
    string searchWord)
{
    using var fileStream = File.OpenRead(file);
    using var lines = new StreamReader(fileStream, bufferSize: 4096);

    int lineNumber = 1;
    // ReadLineAsync returns null on stream end, exiting the loop
    while (await lines.ReadLineAsync() is string line)
    {
        if (line.Contains(searchWord))
            return lineNumber;

        lineNumber++;
    }
    return -1;
}

我希望上面的流代码比下面的管道代码慢，因为流代码将字节编码为 StreamReader 中的字符串。管道代码通过对字节进行操作来避免这种情况:

public static async Task<int> GetLineNumberUsingPipeAsync(string file, string searchWord)
{
    var searchBytes = Encoding.UTF8.GetBytes(searchWord);
    using var fileStream = File.OpenRead(file);
    var pipe = PipeReader.Create(fileStream, new StreamPipeReaderOptions(bufferSize: 4096));

    var lineNumber = 1;
    while (true)
    {
        var readResult = await pipe.ReadAsync().ConfigureAwait(false);
        var buffer = readResult.Buffer;

        if(TryFindBytesInBuffer(ref buffer, searchBytes, ref lineNumber))
        {
            return lineNumber;
        }

        pipe.AdvanceTo(buffer.End);

        if (readResult.IsCompleted) break;
    }

    await pipe.CompleteAsync();

    return -1;
}

以下是相关的辅助方法:

/// <summary>
/// Look for `searchBytes` in `buffer`, incrementing the `lineNumber` every
/// time we find a new line.
/// </summary>
/// <returns>true if we found the searchBytes, false otherwise</returns>
static bool TryFindBytesInBuffer(
    ref ReadOnlySequence<byte> buffer,
    in ReadOnlySpan<byte> searchBytes,
    ref int lineNumber)
{
    var bufferReader = new SequenceReader<byte>(buffer);
    while (TryReadLine(ref bufferReader, out var line))
    {
        if (ContainsBytes(ref line, searchBytes))
            return true;

        lineNumber++;
    }
    return false;
}

static bool TryReadLine(
    ref SequenceReader<byte> bufferReader,
    out ReadOnlySequence<byte> line)
{
    var foundNewLine = bufferReader.TryReadTo(out line, (byte)'\n', advancePastDelimiter: true);
    if (!foundNewLine)
    {
        line = default;
        return false;
    }

    return true;
}

static bool ContainsBytes(
    ref ReadOnlySequence<byte> line,
    in ReadOnlySpan<byte> searchBytes)
{
    return new SequenceReader<byte>(line).TryReadTo(out var _, searchBytes);
}

我正在使用 SequenceReader<byte>以上是因为我的理解是它比 ReadOnlySequence<byte> 更智能/更快;当它可以在单个 Span<byte> 上运行时，它有一个快速路径.
以下是基准测试结果 (.NET Core 3.1)。完整代码和 BenchmarkDotNet 结果可用 in this repo .

GetLineNumberWithStreamAsync - 435.6 毫秒 同时分配 366.19 MB

GetLineNumberUsingPipeAsync - 619.8 毫秒 同时分配 9.28 MB

我在管道代码中做错了什么吗？
更新 : Evk 已经回答了这个问题。应用他的修复后，这里是新的基准数字:

GetLineNumberWithStreamAsync - 452.2 毫秒 同时分配 366.19 MB

GetLineNumberWithPipeAsync - 203.8 毫秒 虽然分配了 9.28 MB

最佳答案

我相信原因是实现 SequenceReader.TryReadTo . Here is the source code这种方法的。它使用非常简单的算法(读取第一个字节的匹配，然后检查该匹配之后是否所有后续字节，如果不是 - 向前推进 1 个字节并重复)，并注意在此实现中有相当多的方法称为“慢” ( IsNextSlow ， TryReadToSlow 等等)，所以至少在某些情况下，在某些情况下，它会回退到一些缓慢的路径。它还必须处理可能包含多个段的事实序列，并保持位置。
在您的情况下，您可以避免使用 SequenceReader专门用于搜索匹配项(但将其保留为实际阅读行)，例如进行此微小更改(在这种情况下 TryReadTo 的重载也更有效):

private static bool TryReadLine(ref SequenceReader<byte> bufferReader, out ReadOnlySpan<byte> line) {
    // note that both `match` and `line` are now `ReadOnlySpan` and not `ReadOnlySequence`
    var foundNewLine = bufferReader.TryReadTo(out ReadOnlySpan<byte> match, (byte) '\n', advancePastDelimiter: true);

    if (!foundNewLine) {
        line = default;
        return false;
    }

    line = match;
    return true;
}

然后:

private static bool ContainsBytes(ref ReadOnlySpan<byte> line, in ReadOnlySpan<byte> searchBytes) {
    // line is now `ReadOnlySpan` so we can use efficient `IndexOf` method
    return line.IndexOf(searchBytes) >= 0;
}

这将使您的管道代码比流代码运行得更快。

关于c# - 为什么这个 System.IO.Pipelines 代码比基于 Stream 的代码慢得多？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/64283938/

27

4

0

文章推荐： Django 3.x - 哪个 ASGI 服务器(Uvicorn vs. Daphne)

文章推荐： terminology - SPMD 和 SIMD 有什么区别？

文章推荐： netbeans - 在 JavaFX 中创建多个阶段

io - 内存映射 IO - IO 设备如何知道值已更改？
IO 设备如何知道属于它的内存中的值在memory mapped IO 中发生了变化？？例如，假设内存地址 0 专用于保存 VGA 设备的背景颜色。当我们更改 memory[0] 中的值时，VGA
ios - Facebook iOS iOS SDK登录错误
我目前正在开发一个使用Facebook sdk登录(通过FBLoginView)的iOS应用。一切正常，除了那些拥有较旧版本的facebook的人。当他们按下“使用Facebook登录”按钮时，他
ios - ios ios nsrange char从结束
假设我有: this - is an - example - with some - dashesNSRange将使用`rangeOfString:@“-”拾取“-”的第一个实例，但是如果我只想要最后
ios - 如何从card.io SDK获取国家名称？ -iOS
Card.io SDK提供以下详细信息: 卡号，有效期，月份，年份，CVV和邮政编码。如何从此SDK获取国家名称。 - (void)userDidProvideCreditCardInfo:(Car
ios - iOS 应用程序如何从网络服务下载图片并在安装过程中将它们安装在用户的 iOS 设备上？
iOS 应用程序如何从网络服务下载图片并在安装过程中将它们安装到用户的 iOS 设备上？可能吗？最佳答案您无法控制应用在用户设备上的安装，因此无法在安装过程中下载其他数据。只需在安装后首次启动应
ios - iOS 企业应用程序和 iOS 零售应用程序之间的区别
我曾经开发过一款企业版 iOS 产品，我们公司曾将其出售给大型企业，供他们的员工使用。该应用程序通过 AppStore 提供，企业用户获得了公司特定的配置文件(包含应用程序配置文件)以启用他们有权使
ios - Card.io ios 与本地化集成
我正在尝试将 Card.io SDK 集成到我的 iOS 应用程序中。我想为 CardIO ui 做一个简单的本地化，如更改取消按钮标题或“在此保留信用卡”提示文本。我在 github 上找到了这个
ios - Card.Io iOS 扫描名称
我正在使用 CardIOView 和 CardIOViewDelegate 类，没有可以设置为 YES 的 BOOL 来扫描 collectCardholderName。我可以看到它在 CardIOP
ios - 如何为最近的原生 ios 应用程序设置名称字段？ - iOS
我有一个集成了通话工具包的 voip 应用程序。每次我从我的 voip 应用程序调用时，都会在 native 电话应用程序中创建一个新的最近通话记录。我在 voip 应用程序中也有自定义联系人(电话应
ios - iOS 应用程序如何在应用程序打开时知道键盘是否已经在屏幕上(iOS 多任务处理)
iOS 应用程序如何知道应用程序打开时屏幕上是否已经有键盘？应用程序运行后，它可以接收键盘显示/隐藏通知。但是，如果应用程序在分屏模式下作为辅助应用程序打开，而主应用程序已经显示键盘，则辅助应用程序不
ios - iOS 上的图像 IO 错误
我在模拟器中收到以下错误: ImageIO: CGImageReadSessionGetCachedImageBlockData *** CGImageReadSessionGetCachedIm
ios - iOS 设备与非 iOS 设备通信
如 Apple 文档所示，可以通过 EAAccessory Framework 与经过认证的配件(由 Apple 认证)进行通信。但是我有点困惑，因为一些帖子告诉我它也可以通过 CoreBluetoo
ios - (iOS) 直接在 iOS 设备上查看日志消息的方式？
尽管现在的调试器已经很不错了，但有时找出应用程序中正在发生的事情的最好方法仍然是古老的 NSLog。当您连接到计算机时，这样做很容易； Xcode 会帮助弹出日志查看器面板，然后就可以了。当您不在办公
ios - Kontakt.io iOS - 按名称识别信标
在我的 iOS 应用程序中，我定义了一些兴趣点。其中一些有一个 Kontakt.io 信标的名称，它绑定(bind)到一个特定的 PoI(我的意思是通常贴在信标标签上的名称)。现在我想在附近发现信标，
ios - Trigger.io iOS 插件从回调返回数据
我正在为警报提示创建一个 trigger.io 插件。尝试从警报提示返回数据。这是我的代码: // Prompt + (void)show_prompt:(ForgeTask*)task{
ios - iOS 4、iOS 5 和 iOS 6 的推送通知有何不同？
您好，我是 Apple iOS 的新手。我阅读并搜索了很多关于推送通知的文章，但我没有发现任何关于 APNS 从 io4 到 ios 6 的新更新的信息。任何人都可以向我提供 APNS 如何在 ios
ios - iOS 8、iOS 9、iOS 10 和 iOS 11 上 UITabBar 的高度是多少？
UITabBar 的高度似乎在 iOS 7 和 8/9/10/11 之间发生了变化。我发布这个问题是为了让其他人轻松找到答案。那么:在 iPhone 和 iPad 上的 iOS 8/9/10/11
ios - 最佳实践。通过支持 iOS 5、iOS 6 和 iOS 7 UI，使 iOS 应用程序变得通用
我想我可以针对不同的 iOS 版本使用不同的 Storyboard。由于 UI 的差异，我将创建下一个 Storyboard: Main_iPhone.storyboard Main_iPad.st
ios - 如何使用 iOS 中的视觉控件在 ios 中选择音轨的一部分？
我正在写一些东西，我将使用设备的 iTunes 库中的一部分音轨来覆盖 2 个视频的组合，例如: AVMutableComposition* mixComposition = [[AVMutableC
ios - iOS 模拟器中存在头文件，但 iOS 设备上不存在...？
我创建了一个简单的 iOS 程序，可以顺利编译并在 iPad 模拟器上运行良好。当我告诉 XCode 4 使用我连接的 iPad 设备时，无法编译相同的程序。问题似乎是当我尝试使用附加的 iPad 时

首页

博学

6Ren·AI

商城

c# - 为什么这个 System.IO.Pipelines 代码比基于 Stream 的代码慢得多？