C#: garbage collection with many relatively large objects

Reposted by: 行者123 · Updated: 2023-12-03 23:42:38

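I have a few processes that poll different data sources for a specific kind of information. They poll quite often and do it in the background, so when I need this information it is readily available and does not require a round trip that would waste time.
The sample code looks like this: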

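using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using Microsoft.Extensions.Configuration; // assumption: IConfiguration comes from this namespace

public class JournalBackgroundPoller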
{
    private readonly int _clusterSize;

    private readonly IConfiguration _configuration;

    Dictionary<int, string> _journalAddresses;
    private readonly Random _localRandom;
    private readonly Task _runHolder;

    internal readonly ConcurrentDictionary<int, List<JournalEntryResponseItem>> ResultsBuffer = new ConcurrentDictionary<int, List<JournalEntryResponseItem>>();

    public JournalBackgroundPoller(IConfiguration configuration)
    {
        _localRandom = new Random();

        _configuration = configuration;
        _clusterSize = 20;//for the sake of demo

        _journalAddresses = new Dictionary<int, string> { { 1, "SOME ADDR1" }, { 2, "SOME ADDR 2" } };

        _runHolder = BuildAndRun();
    }

    private Task BuildAndRun()
    {
        var pollingTasks = new List<Task>();
        var buffer = new BroadcastBlock<JournalResponsesWrapper>(item => item);

        PopulateShardsRegistry();

        foreach (var js in _journalAddresses)
        {
            var dataProcessor = new TransformBlock<JournalResponsesWrapper, JournalResponsesWrapper>(NormalizeValues,
                new ExecutionDataflowBlockOptions
                { MaxDegreeOfParallelism = 1, EnsureOrdered = true, BoundedCapacity = 1 });

            var dataStorer = new ActionBlock<JournalResponsesWrapper>(StoreValuesInBuffer,
                new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1, EnsureOrdered = true, BoundedCapacity = 2 });

            buffer.LinkTo(dataProcessor, wrapper => wrapper.JournalDataSource.Key == js.Key);

            dataProcessor.LinkTo(dataStorer);
            dataProcessor.LinkTo(DataflowBlock.NullTarget<JournalResponsesWrapper>());

            pollingTasks.Add(PollInfinitely(js, buffer));
        }

        var r = Task.WhenAll(pollingTasks);
        return r;
    }

    private void PopulateShardsRegistry()
    {
        try
        {
            for (int i = 0; i < _clusterSize; i++)
            {
                var _ = ResultsBuffer.GetOrAdd(i, ix => new List<JournalEntryResponseItem>());
            }
        }
        catch (Exception e)
        {
            Console.WriteLine("Couldn't initialize shards registry");
        }
    }

    private async Task PollInfinitely(KeyValuePair<int, string> dataSourceInfo, BroadcastBlock<JournalResponsesWrapper> buffer)
    {
        while (true)
        {
            try
            {
                //here we create a client and get a big list of journal entries, ~200k from one source. below is dummy code
                var journalEntries = new List<JournalEntryResponseItem>(200000);

                buffer.Post(
                    new JournalResponsesWrapper { JournalDataSource = dataSourceInfo, JournalEntryResponseItems = journalEntries });
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Polling {dataSourceInfo.Value} threw an exception, overwriting with empty data");
                buffer.Post(
                    new JournalResponsesWrapper { JournalDataSource = dataSourceInfo, JournalEntryResponseItems = new List<JournalEntryResponseItem>() });
            }

            await Task.Delay(_localRandom.Next(400, 601));
        }
    }

    private JournalResponsesWrapper NormalizeValues(JournalResponsesWrapper input)
    {
        try
        {
            if (input.JournalEntryResponseItems == null || !input.JournalEntryResponseItems.Any())
            {
                return input;
            }

            foreach (var journalEntry in input.JournalEntryResponseItems)
            {
                //do some transformations here
            }

            return input;
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Normalization failed for cluster {input.JournalDataSource.Value}, please review!");
            return null;
        }
    }

    private void StoreValuesInBuffer(JournalResponsesWrapper input)
    {
        try
        {
            ResultsBuffer[input.JournalDataSource.Key] = input.JournalEntryResponseItems;
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Could not write content to dictionary");
        }
    }
}

For simplicity, the journal-related entities look like this:

class JournalEntryResponseItem
{
    public string SomeProperty1 { get; set; }

    public string SomeProperty2 { get; set; }
}

class JournalResponsesWrapper
{
    public KeyValuePair<int, string> JournalDataSource { get; set; }

    public List<JournalEntryResponseItem> JournalEntryResponseItems { get; set; }
}

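The overall problem with this code is obviously that I am creating a relatively large number of objects that may end up in the LOH within a short period of time. The data sources always provide up-to-date entries, so I do not need to keep older ones (nor can I, since the entries are not distinguishable). My question is whether memory usage, object creation, and replacement round trips can be optimized so that garbage collection runs less often. Right now a garbage collection happens roughly every 5-10 seconds.
UPD 1: I access the data via ResultsBuffer and may read the same set multiple times before it is refreshed. There is no guarantee that a particular data set will be read only once (or read at all). My big objects are the List<JournalEntryResponseItem> instances, which initially come from the data source and are then saved to ResultsBuffer.
UPD 2: The data source has only one endpoint, which returns all entities in this "shard" at once; I cannot apply filtering during the request. The response entities do not have unique keys/identifiers.
UPD 3: Some answers suggest measuring/profiling the app first. While that is perfectly valid advice, in this particular case the issue is clearly memory/GC related, given the following observations:

The visible throttling happens exactly at the moment when the app's RAM consumption drops sharply after growing steadily for a while.

If I add X more journal sources, the app's memory grows until it takes all free memory on the server, then the app freezes for even longer (1-3 seconds), after which memory drops sharply and the app keeps working until it hits the memory limit again.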

Best answer

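A List<T> is always backed by a contiguous T[] array, and sizing it to 200,000 items puts it straight into the LOH: 200,000 references take about 1.6 MB on a 64-bit runtime, far above the 85,000-byte LOH threshold. To avoid that, I suggest simple logical partitioning instead of one physical dimensioning, and posting the list in batches. This way the huge list still goes to the LOH during each poll, but it is collected in the next generation 2 GC (make sure no references to it remain). The LOH stays almost empty; however, there will be more generation 2 collections than before because of the additional copy operations happening in the managed heap. It is a small change, and here is the new JournalBackgroundPoller class: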

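using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using Microsoft.Extensions.Configuration; // assumption: IConfiguration comes from this namespace

public class JournalBackgroundPoller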
{
    private readonly int _clusterSize;

    private readonly IConfiguration _configuration;

    Dictionary<int, string> _journalAddresses;
    private readonly Random _localRandom;
    private readonly Task _runHolder;

    internal readonly ConcurrentDictionary<int, List<JournalEntryResponseItem>> ResultsBuffer = new ConcurrentDictionary<int, List<JournalEntryResponseItem>>();

    public JournalBackgroundPoller(IConfiguration configuration)
    {
        _localRandom = new Random();

        _configuration = configuration;
        _clusterSize = 20;//for the sake of demo

        // _journalAddresses = //{{1, "SOME ADDR1"}, {2, "SOME ADDR 2"}};
        _journalAddresses = new Dictionary<int, string>
        {
            { 1, "SOME ADDR1" },
            { 2, "SOME ADDR 2" }
        };

        _runHolder = BuildAndRun();
    }

    private Task BuildAndRun()
    {
        var pollingTasks = new List<Task>();
        var buffer = new BroadcastBlock<JournalResponsesWrapper>(item => item);

        PopulateShardsRegistry();

        foreach (var js in _journalAddresses)
        {
            var dataProcessor = new TransformBlock<JournalResponsesWrapper, JournalResponsesWrapper>(NormalizeValues,
                new ExecutionDataflowBlockOptions
                { MaxDegreeOfParallelism = 1, EnsureOrdered = true, BoundedCapacity = 1 });

            var dataStorer = new ActionBlock<JournalResponsesWrapper>(StoreValuesInBuffer,
                new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1, EnsureOrdered = true, BoundedCapacity = 2 });

            buffer.LinkTo(dataProcessor, wrapper => wrapper.JournalDataSource.Key == js.Key);

            dataProcessor.LinkTo(dataStorer);
            dataProcessor.LinkTo(DataflowBlock.NullTarget<JournalResponsesWrapper>());

            pollingTasks.Add(PollInfinitely(js, buffer));
        }

        var r = Task.WhenAll(pollingTasks);
        return r;
    }

    private void PopulateShardsRegistry()
    {
        try
        {
            for (int i = 0; i < _clusterSize; i++)
            {
                var _ = ResultsBuffer.GetOrAdd(i, ix => new List<JournalEntryResponseItem>());
            }
        }
        catch (Exception e)
        {
            Console.WriteLine("Couldn't initialize shards registry");
        }
    }

    private async Task PollInfinitely(KeyValuePair<int, string> dataSourceInfo, BroadcastBlock<JournalResponsesWrapper> buffer)
    {
        while (true)
        {
            try
            {
                //here we create a client and get a big list of journal entries, ~200k from one source. below is dummy code
                var journalEntries = new List<JournalEntryResponseItem>(200000);

                // NOTE:
                // We need to avoid references to the huge list so GC collects it ASAP in the next
                // generation 2 collection: after that, nothing else goes to the LOH.
                const int PartitionSize = 1000;
                for (var index = 0; index < journalEntries.Count; index += PartitionSize)
                {
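                    // The last partition may be shorter than PartitionSize;
                    // GetRange throws if the requested range runs past Count.
                    var count = Math.Min(PartitionSize, journalEntries.Count - index);
                    var journalEntryResponseItems = journalEntries.GetRange(index, count);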
                    buffer.Post(
                        new JournalResponsesWrapper
                        {
                            JournalDataSource = dataSourceInfo,
                            JournalEntryResponseItems = journalEntryResponseItems
                        });
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Polling {dataSourceInfo.Value} threw an exception, overwriting with empty data");
                buffer.Post(
                    new JournalResponsesWrapper { JournalDataSource = dataSourceInfo, JournalEntryResponseItems = new List<JournalEntryResponseItem>() });
            }

            await Task.Delay(_localRandom.Next(400, 601));
        }
    }

    private JournalResponsesWrapper NormalizeValues(JournalResponsesWrapper input)
    {
        try
        {
            if (input.JournalEntryResponseItems == null || !input.JournalEntryResponseItems.Any())
            {
                return input;
            }

            foreach (var journalEntry in input.JournalEntryResponseItems)
            {
                //do some transformations here
            }

            return input;
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Normalization failed for cluster {input.JournalDataSource.Value}, please review!");
            return null;
        }
    }

    private void StoreValuesInBuffer(JournalResponsesWrapper input)
    {
        try
        {
            // Note: each posted partition replaces the previous value stored for this key.
            ResultsBuffer[input.JournalDataSource.Key] = input.JournalEntryResponseItems;
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Could not write content to dictionary");
        }
    }
}
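
One caveat with posting partitions separately: StoreValuesInBuffer assigns ResultsBuffer[key] on every message, so each partition replaces the previous one for that source. Below is a minimal sketch of one possible alternative, assuming the wrapper and ResultsBuffer were changed to hold a list of chunks. The ChunkedList class and its Split method are hypothetical names, not part of the original code. The idea is to split the response into small lists and publish them together, so readers see a complete snapshot while no single backing array reaches the LOH threshold.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answer.
public static class ChunkedList
{
    // Splits source into lists of at most chunkSize items each.
    // The last chunk holds the remainder and may be shorter.
    public static List<List<T>> Split<T>(IReadOnlyList<T> source, int chunkSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        // Pre-size the outer list: ceil(Count / chunkSize) entries.
        var chunks = new List<List<T>>((source.Count + chunkSize - 1) / chunkSize);
        for (var index = 0; index < source.Count; index += chunkSize)
        {
            var count = Math.Min(chunkSize, source.Count - index);
            var chunk = new List<T>(count);
            for (var i = 0; i < count; i++)
            {
                chunk.Add(source[index + i]);
            }
            chunks.Add(chunk);
        }
        return chunks;
    }
}

public static class Program
{
    public static void Main()
    {
        var items = new List<int>();
        for (var i = 0; i < 2500; i++) items.Add(i);

        var chunks = ChunkedList.Split(items, 1000);
        Console.WriteLine(chunks.Count);    // prints 3
        Console.WriteLine(chunks[2].Count); // prints 500
    }
}
```

With 200,000 entries and a chunk size of 1,000, the outer list holds only 200 references and each inner array holds 1,000, so all of them stay on the small object heap. Storing the whole List<List<T>> under one key in a single assignment would keep reads consistent, at the cost of changing the type that consumers of ResultsBuffer see.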

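Snapshot of the original memory usage after 30 seconds (screenshot not reproduced here).

Snapshot of the optimized memory usage after 30 seconds (screenshot not reproduced here).

Note the differences:

Sparse arrays: JournalEntryResponseItem[] went from 1,600,000 bytes wasted with a length of 200,000 to none.

LOH usage: from 3.05 MB to none.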
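
Without a profiler, a rough way to see where an array lands is GC.GetGeneration, which reports objects on the large object heap as generation 2. The sketch below is an illustration only (the LohCheck class name is made up here) and assumes a 64-bit runtime for the sizes given in the comments.

```csharp
using System;

// Hypothetical demo class, not part of the original answer.
public static class LohCheck
{
    public static void Main()
    {
        // 200,000 references * 8 bytes is about 1.6 MB on a 64-bit runtime,
        // well above the 85,000-byte LOH threshold.
        var large = new object[200_000];

        // 1,000 references * 8 bytes is about 8 KB, so this array is
        // allocated on the small object heap.
        var small = new object[1_000];

        Console.WriteLine($"large array generation: {GC.GetGeneration(large)}");
        Console.WriteLine($"small array generation: {GC.GetGeneration(small)}");

        // Number of generation 2 collections since the process started.
        Console.WriteLine($"gen 2 collections so far: {GC.CollectionCount(2)}");
    }
}
```

The large array should report GC.MaxGeneration immediately, while the freshly allocated small array normally starts in generation 0. Logging GC.CollectionCount(2) periodically in the poller would give a cheap way to compare how often full collections run before and after partitioning.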

Regarding garbage collection with many relatively large objects in C#, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64878714/
