
apache-spark - Garbage collection tuning in Spark: how to estimate size of Eden?


I am reading about garbage collection tuning in Spark: The Definitive Guide by Bill Chambers and Matei Zaharia. The chapter is largely based on Spark's documentation. That said, the authors extend the documentation with an example of how to deal with too many minor collections but not too many major collections.

Both the official documentation and the book state:

If there are too many minor collections but not many major GCs, allocating more memory for Eden would help. You can set the size of the Eden to be an over-estimate of how much memory each task will need. If the size of Eden is determined to be E, then you can set the size of the Young generation using the option -Xmn=4/3*E. (The scaling up by 4/3 is to account for space used by survivor regions as well.) (See here)

The book gives an example (Spark: The Definitive Guide, first edition, p. 324):

If your task is reading data from HDFS, the amount of memory used by the task can be estimated by using the size of the data block read from HDFS. Note that the size of a decompressed block is often two or three times the size of the block. So if you want to have three or four tasks' worth of working space, and the HDFS block size is 128 MB, we can estimate size of Eden to be 43,128 MB.

Even assuming each decompressed block occupies 512 MB, we have 4 tasks, and we scale up by 4/3, I really don't see how you arrive at an estimate of 43,128 MB of memory for Eden.

Given the book's assumptions, I would rather answer that ~3 GB should be enough for Eden.
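For what it is worth, here is a minimal sketch of the arithmetic behind that ~3 GB figure, using the assumptions stated above (512 MB per decompressed block and 4 tasks are my numbers, not the book's):

```scala
// Sketch of the arithmetic behind the ~3 GB estimate, under the
// question's own assumptions (not the book's numbers).
object AskerEdenEstimate {
  def main(args: Array[String]): Unit = {
    val decompressedBlockMB = 512                                  // assumed decompressed block size
    val tasks               = 4                                    // assumed tasks' worth of working space
    val edenMB              = tasks * decompressedBlockMB          // 2048 MB of working space
    val youngGenMB          = math.ceil(edenMB * 4.0 / 3.0).toInt  // ~2731 MB, i.e. roughly 3 GB
    println(s"Eden ~ $edenMB MB, young generation ~ $youngGenMB MB")
  }
}
```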

Can anyone explain how this estimate is supposed to be calculated?

Best Answer

Well, I think the new Spark docs make it clear:

As an example, if your task is reading data from HDFS, the amount of memory used by the task can be estimated using the size of the data block read from HDFS. Note that the size of a decompressed block is often 2 or 3 times the size of the block. So if we wish to have 3 or 4 tasks’ worth of working space, and the HDFS block size is 128 MB, we can estimate size of Eden to be 4*3*128MB.

So it is 4*3*128 MB rather than what the book says (i.e. 43,128 MB).
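Spelling out the docs' example as a small sketch (the 4 tasks' worth of working space, the 3x decompression factor, and the 128 MB HDFS block size are the numbers from the quote above):

```scala
// Sketch of the corrected estimate from the Spark docs' example:
// 3-4 tasks' worth of working space, 128 MB HDFS blocks, and a
// decompressed block that is roughly 3x the on-disk size.
object DocsEdenEstimate {
  def main(args: Array[String]): Unit = {
    val tasks         = 4                                    // tasks' worth of working space
    val decompression = 3                                    // decompressed block ~ 3x the HDFS block
    val hdfsBlockMB   = 128                                  // default HDFS block size
    val edenMB        = tasks * decompression * hdfsBlockMB  // 4 * 3 * 128 = 1536 MB
    val youngGenMB    = edenMB * 4 / 3                       // 2048 MB; the 4/3 accounts for survivor spaces
    println(s"Eden ~ $edenMB MB, so -Xmn ~ ${youngGenMB}m")
  }
}
```

Applying that would mean passing something like -Xmn2048m to the executor JVMs (for example through spark.executor.extraJavaOptions), but whether your deployment allows tuning the young generation this way is an assumption worth checking against the Spark documentation for your version.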

Regarding apache-spark - Garbage collection tuning in Spark: how to estimate size of Eden?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49954518/
