c# - Asynchronously deserializing a list using System.Text.Json

Reposted by: 行者123 · Updated: 2023-12-02 00:13:29

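Suppose I request a large JSON file that contains a list of many objects. I don't want them all in memory at once; I would rather read and process them one by one. So I need to turn an asynchronous System.IO.Stream into an IAsyncEnumerable<T>. How can I do this with the new System.Text.Json API?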

private async IAsyncEnumerable<T> GetList<T>(Uri url, CancellationToken cancellationToken = default)
{
    using (var httpResponse = await httpClient.GetAsync(url, cancellationToken))
    {
        using (var stream = await httpResponse.Content.ReadAsStreamAsync())
        {
            // Probably do something with JsonSerializer.DeserializeAsync here without deserializing the entire thing in one go
        }
    }
}

Best answer

TL;DR: It's not trivial

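It looks like someone has already posted full code for a Utf8JsonStreamReader struct that reads buffers from the stream and feeds them to a Utf8JsonReader, allowing easy deserialization with JsonSerializer.Deserialize<T>(ref newJsonReader, options);. That code isn't trivial either. The related question is here and the answer is here.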

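But that's not enough: HttpClient.GetAsync returns only after the entire response has been received, essentially buffering everything in memory.

To avoid this, HttpClient.GetAsync(string, HttpCompletionOption) should be used with HttpCompletionOption.ResponseHeadersRead.

The deserialization loop should also check the cancellation token and either exit or throw if it is signalled. Otherwise the loop keeps going until the entire stream has been received and processed.

This code is based on the related answer's example; it uses HttpCompletionOption.ResponseHeadersRead and checks the cancellation token. It can parse JSON strings that contain a proper array of items, e.g.: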

[{"prop1":123},{"prop1":234}]

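The first call to jsonStreamReader.Read() moves to the start of the array, and the second call moves to the start of the first object. The loop itself terminates when the end of the array ( ] ) is detected.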

private async IAsyncEnumerable<T> GetList<T>(Uri url, CancellationToken cancellationToken = default)
{
    //Don't cache the entire response
    using var httpResponse = await httpClient.GetAsync(url,                               
                                                       HttpCompletionOption.ResponseHeadersRead,  
                                                       cancellationToken);
    using var stream = await httpResponse.Content.ReadAsStreamAsync();
    using var jsonStreamReader = new Utf8JsonStreamReader(stream, 32 * 1024);

    jsonStreamReader.Read(); // move to array start
    jsonStreamReader.Read(); // move to start of the object

    while (jsonStreamReader.TokenType != JsonTokenType.EndArray)
    {
        //Gracefully return if cancellation is requested.
        //Could be cancellationToken.ThrowIfCancellationRequested()
        if(cancellationToken.IsCancellationRequested)
        {
            return;
        }

        // deserialize object
        var obj = jsonStreamReader.Deserialize<T>();
        yield return obj;

        // JsonSerializer.Deserialize ends on last token of the object parsed,
        // move to the first token of next object
        jsonStreamReader.Read();
    }
}
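As a point of comparison with the hand-written reader above: on .NET 6 or later, System.Text.Json ships JsonSerializer.DeserializeAsyncEnumerable<T>, which streams the elements of a root-level JSON array. The following is a minimal sketch under that assumption; the JsonArrayStreaming class, the ReadArrayAsync helper and the Item type are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading;

public static class JsonArrayStreaming
{
    // Hypothetical helper: lazily yields the elements of a root-level JSON array.
    // Requires .NET 6+ (JsonSerializer.DeserializeAsyncEnumerable).
    public static IAsyncEnumerable<T?> ReadArrayAsync<T>(
        Stream utf8Json, CancellationToken cancellationToken = default)
    {
        // Elements become available as the stream is read,
        // so the whole array is never materialized at once.
        return JsonSerializer.DeserializeAsyncEnumerable<T>(
            utf8Json, cancellationToken: cancellationToken);
    }
}

// Illustrative item type matching the sample array [{"prop1":123},{"prop1":234}]
public class Item
{
    [JsonPropertyName("prop1")]
    public int Prop1 { get; set; }
}
```

The stream passed in would be the one obtained from httpResponse.Content.ReadAsStreamAsync() after a GetAsync call with HttpCompletionOption.ResponseHeadersRead, so that the response is not buffered before deserialization starts.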

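JSON fragments, AKA streaming JSON, aka ...*

In event streaming or logging scenarios it is quite common to append individual JSON objects to a file, one element per line, e.g.: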

{"eventId":1}
{"eventId":2}
...
{"eventId":1234567}

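This is not a valid JSON document, but the individual fragments are valid. This has several advantages for big data / highly concurrent scenarios. Adding a new event only requires appending a new line to the file, not parsing and rebuilding the entire file. Processing, especially parallel processing, is easier for two reasons:

Individual elements can be retrieved one at a time, simply by reading one line from the stream.

The input file can easily be partitioned and split across line boundaries, feeding each part to a separate worker process, e.g. in a Hadoop cluster, or simply to different threads in an application: calculate the split points, e.g. by dividing the length by the number of workers, then look for the first newline. Feed everything up to that point to a separate worker.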
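The split-point calculation described above can be sketched roughly as follows. This assumes a seekable stream with '\n' line endings; NdjsonPartitioner and Split are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class NdjsonPartitioner
{
    // Returns (Start, Length) byte ranges that each begin at the start of a line.
    // Assumption: the stream is seekable and lines end with '\n'.
    public static List<(long Start, long Length)> Split(Stream stream, int workers)
    {
        if (!stream.CanSeek) throw new ArgumentException("Stream must be seekable.", nameof(stream));
        if (workers < 1) throw new ArgumentOutOfRangeException(nameof(workers));

        var ranges = new List<(long Start, long Length)>();
        long length = stream.Length;
        // Ceiling division, so that at most 'workers' ranges are produced
        long chunk = Math.Max(1, (length + workers - 1) / workers);
        long start = 0;

        while (start < length)
        {
            // Tentative end of this range, then extend it to just past the next newline
            long end = Math.Min(start + chunk, length);
            stream.Position = end - 1;
            while (stream.ReadByte() != '\n' && end < length)
            {
                end++;
            }

            ranges.Add((start, end - start));
            start = end;
        }

        return ranges;
    }
}
```

Each range could then be handed to its own worker, which opens the file, seeks to Start and reads Length bytes line by line.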

Using StreamReader

The allocation-heavy way to do this is to use a TextReader, read one line at a time, and parse it with JsonSerializer.Deserialize:

using var reader=new StreamReader(stream);
string line;
//ReadLineAsync() doesn't accept a CancellationToken 
while((line=await reader.ReadLineAsync()) != null)
{
    var item=JsonSerializer.Deserialize<T>(line);
    yield return item;

    if(cancellationToken.IsCancellationRequested)
    {
        return;
    }
}

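This is a lot simpler than the code that deserializes a proper array. There are two issues:

ReadLineAsync doesn't accept a cancellation token (an overload that takes one was only added in .NET 7)

Each iteration allocates a new string, one of the things we wanted to avoid by using System.Text.Json

This may be good enough, though, since trying to produce the ReadOnlySpan<Byte> buffers that JsonSerializer.Deserialize needs is not trivial.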
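For reference, the line-by-line approach can be wrapped into a complete async iterator along these lines. This is only a sketch: JsonLines and ReadAsync are hypothetical names, blank lines are skipped as an added assumption, and cancellation is checked between lines because the parameterless ReadLineAsync() cannot be cancelled.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Text.Json;
using System.Threading;

public static class JsonLines
{
    // Hypothetical helper: yields one deserialized item per non-empty line.
    public static async IAsyncEnumerable<T?> ReadAsync<T>(
        Stream stream,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        using var reader = new StreamReader(stream);
        string? line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            // Checked once per line; a pending read is not interrupted
            cancellationToken.ThrowIfCancellationRequested();

            // Assumption: blank lines carry no item and are ignored
            if (string.IsNullOrWhiteSpace(line)) continue;

            yield return JsonSerializer.Deserialize<T>(line);
        }
    }
}
```

The [EnumeratorCancellation] attribute lets a token supplied through WithCancellation() on the consumer side reach the iterator as well.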

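Pipelines and SequenceReader

To avoid allocations, we need to get a ReadOnlySpan<byte> from the stream. Doing this requires System.IO.Pipelines pipes and the SequenceReader struct. Steve Gordon's An Introduction to SequenceReader explains how this type can be used to read data from a stream using delimiters.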

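Unfortunately, SequenceReader is a ref struct, which means it can't be used in async methods or captured by local functions. That's why Steve Gordon, in his article, creates a

private static SequencePosition ReadItems(in ReadOnlySequence<byte> sequence, bool isCompleted)

method that reads items from the ReadOnlySequence and returns the ending position, so the PipeReader can resume from it. Unfortunately, we want to return an IEnumerable or IAsyncEnumerable, and iterator methods don't like in or out parameters either.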

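We could collect the deserialized items in a List or Queue and return them as a single result, but that would still allocate lists, buffers or nodes, and would have to wait until all items in a buffer have been deserialized before returning: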

private static (SequencePosition,List<T>) ReadItems(in ReadOnlySequence<byte> sequence, bool isCompleted)

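We need something that acts like an enumerable without requiring an iterator method, works with async, and doesn't buffer everything.

Adding channels to produce an IAsyncEnumerable

ChannelReader.ReadAllAsync returns an IAsyncEnumerable. We can return a ChannelReader from methods that can't work as iterators and still produce a stream of elements without caching.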
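The pattern in isolation might look like the sketch below: an ordinary, non-iterator method starts a background producer and hands back the ChannelReader. ChannelDemo and Produce are hypothetical names, and the bounded capacity of 16 is an arbitrary choice made here to show backpressure.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelDemo
{
    // Not an iterator: there is no 'yield', so the producing code is free
    // to call synchronous helpers that use ref structs or 'in' parameters.
    public static ChannelReader<int> Produce(int count, CancellationToken token = default)
    {
        var channel = Channel.CreateBounded<int>(capacity: 16);

        _ = Task.Run(async () =>
        {
            try
            {
                for (var i = 0; i < count; i++)
                {
                    // Waits when the channel is full, so the producer cannot run far ahead
                    await channel.Writer.WriteAsync(i, token);
                }
                channel.Writer.Complete();
            }
            catch (Exception ex)
            {
                // Surfaces failures (including cancellation) to the consumer
                channel.Writer.Complete(ex);
            }
        });

        return channel.Reader;
    }
}
```

A consumer would then write await foreach (var n in ChannelDemo.Produce(3).ReadAllAsync()) { ... } and receive the items as they are produced.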

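Adapting Steve Gordon's code to use channels, we get the ReadItems(ChannelWriter...) and ReadLastItem methods. The first reads one item at a time, up to a newline, into a ReadOnlySpan<byte> itemBytes that JsonSerializer.Deserialize can consume directly. If ReadItems can't find the delimiter, it returns its position so the PipeReader can pull the next chunk from the stream.

When we reach the last chunk and there is no further delimiter, ReadLastItem reads the remaining bytes and deserializes them.

The code is almost identical to Steve Gordon's. Instead of writing to the console, we write to the ChannelWriter.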

private const byte NL=(byte)'\n';
private const int MaxStackLength = 128;

private static SequencePosition ReadItems<T>(ChannelWriter<T> writer, in ReadOnlySequence<byte> sequence, 
                          bool isCompleted, CancellationToken token)
{
    var reader = new SequenceReader<byte>(sequence);

    while (!reader.End && !token.IsCancellationRequested) // loop until we've read the entire sequence
    {
        if (reader.TryReadTo(out ReadOnlySpan<byte> itemBytes, NL, advancePastDelimiter: true)) // we have an item to handle
        {
            var item=JsonSerializer.Deserialize<T>(itemBytes);
            writer.TryWrite(item);            
        }
        else if (isCompleted) // read last item which has no final delimiter
        {
            var item = ReadLastItem<T>(sequence.Slice(reader.Position));
            writer.TryWrite(item);
            reader.Advance(reader.Remaining); // advance reader to the end; sequence.Length would overshoot once earlier items were consumed
        }
        else // no more items in this sequence
        {
            break;
        }
    }

    return reader.Position;
}

private static T ReadLastItem<T>(in ReadOnlySequence<byte> sequence)
{
    var length = (int)sequence.Length;

    if (length < MaxStackLength) // if the item is small enough we'll stack allocate the buffer
    {
        Span<byte> byteBuffer = stackalloc byte[length];
        sequence.CopyTo(byteBuffer);
        var item=JsonSerializer.Deserialize<T>(byteBuffer);
        return item;        
    }
    else // otherwise we'll rent an array to use as the buffer
    {
        var byteBuffer = ArrayPool<byte>.Shared.Rent(length);

        try
        {
            sequence.CopyTo(byteBuffer);
            var item=JsonSerializer.Deserialize<T>(byteBuffer.AsSpan(0, length)); // a rented array may be longer than requested
            return item;
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(byteBuffer);
        }

    }    
}

The DeserializeToChannel<T> method creates a pipe reader on top of the stream, creates a channel, and starts a worker task that parses chunks and pushes them to the channel:

ChannelReader<T> DeserializeToChannel<T>(Stream stream, CancellationToken token)
{
    var pipeReader = PipeReader.Create(stream);    
    var channel=Channel.CreateUnbounded<T>();
    var writer=channel.Writer;
    _ = Task.Run(async ()=>{
        while (!token.IsCancellationRequested)
        {
            var result = await pipeReader.ReadAsync(token); // read from the pipe

            var buffer = result.Buffer;

            var position = ReadItems(writer,buffer, result.IsCompleted,token); // read complete items from the current buffer

            if (result.IsCompleted) 
                break; // exit if we've read everything from the pipe

            pipeReader.AdvanceTo(position, buffer.End); //advance our position in the pipe
        }

        pipeReader.Complete(); 
    },token)
    .ContinueWith(t=>{
        pipeReader.Complete();
        writer.TryComplete(t.Exception);
    });

    return channel.Reader;
}

ChannelReader.ReadAllAsync() can be used to consume all items through an IAsyncEnumerable<T>:

var reader=DeserializeToChannel<MyEvent>(stream,cts.Token);
await foreach(var item in reader.ReadAllAsync(cts.Token))
{
    //Do something with it 
}

Regarding c# - Asynchronously deserializing a list using System.Text.Json, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58572524/
