ruby - Garbage collector in Ruby 2.2 provokes unexpected CoW

Reposted by: 塔克拉玛干 · Updated: 2023-11-03 01:29:22

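How do I prevent the GC from provoking copy-on-write when I fork my process? I have recently been analyzing the garbage collector's behavior in Ruby because of memory issues in my program (I run out of memory on my 60-core, 0.5 TB machine even for fairly small tasks). For me this really limits the usefulness of Ruby for running programs on multicore servers. I would like to present my experiments and results here.

The issue arises when the garbage collector runs during forking. I have investigated three cases that illustrate the problem.

Case 1: We allocate a lot of objects (strings no longer than 20 bytes) in memory using an array. The strings are created using a random number and string formatting. When the process forks and we force the GC to run in the child, all the shared memory goes private, causing a duplication of the initial memory.

Case 2: We allocate a lot of objects (strings) in memory using an array, but the strings are created using the rand.to_s function, so we remove the formatting of the data compared to the previous case. We end up using a smaller amount of memory, presumably because there is less garbage. When the process forks and we force the GC to run in the child, only part of the memory goes private. The initial memory is still duplicated, but to a smaller extent.

Case 3: We allocate fewer objects than before, but the objects are bigger, so that the amount of memory allocated stays the same as in the previous cases. When the process forks and we force the GC to run in the child, all the memory stays shared, i.e. there is no memory duplication.

Here I paste the Ruby code that was used for these experiments. To switch between cases you only need to change the "option" value in the memory_object function. The code was tested with Ruby 2.2.2, 2.2.1, 2.1.3, 2.1.5 and 1.9.3 on an Ubuntu 14.04 machine.

Sample output for case 1: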

ruby version 2.2.2 
 proces   pid log                   priv_dirty   shared_dirty 
 Parent  3897 post alloc                   38            0 
 Parent  3897 4 fork                        0           37 
 Child   3937 4 initial                     0           37 
 Child   3937 8 empty GC                   35            5

Exactly the same code has been written in Python, and in all cases the CoW works perfectly fine.

Sample output for case 1:

python version 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] 
 proces   pid log                   priv_dirty shared_dirty 
 Parent  4308 post alloc                35             0 
 Parent  4308 4 fork                     0            35 
 Child   4309 4 initial                  0            35 
 Child   4309 10 empty GC                1            34

Ruby code

$start_time=Time.new

# Monitor shared and private dirty memory as reported by /proc/<pid>/smaps (Linux only).
class Memory

    shared_dirty = '.+?Shared_Dirty:\s+(\d+)'
    priv_dirty = '.+?Private_Dirty:\s+(\d+)'
    MEM_REGEXP = /#{shared_dirty}#{priv_dirty}/m

    # get memory usage
    def self.get_memory_map( pids)
        memory_map = {}
        memory_map[ :pids_found] = {}
        memory_map[ :shared_dirty] = 0
        memory_map[ :priv_dirty] = 0

        pids.each do |pid|
            begin
                lines = nil
                lines = File.read( "/proc/#{pid}/smaps")
            rescue
                lines = nil
            end
            if lines
                lines.scan(MEM_REGEXP) do |shared_dirty, priv_dirty|
                    memory_map[ :pids_found][pid] = true
                    memory_map[ :shared_dirty] += shared_dirty.to_i
                    memory_map[ :priv_dirty] += priv_dirty.to_i
                end
            end
        end
        memory_map[ :pids_found] = memory_map[ :pids_found].keys
        return memory_map
    end

    # get the processes and get the value of the memory usage
    def self.memory_usage( )
        pids   = [ $$]
        result = self.get_memory_map( pids)

        result[ :pids]   = pids
        return result
    end

    # print the private and shared dirty memory (smaps reports kB, so /1000 gives roughly MB)
    def self.log( process_name='', log_tag="")
        if process_name == "header"
            puts " %-6s %5s %-12s %10s %10s\n" % ["proces", "pid", "log", "priv_dirty", "shared_dirty"]
        else
            mem = Memory.memory_usage( )
            puts " %-6s %5d %-12s %10d %10d\n" % [process_name, $$, log_tag, mem[:priv_dirty]/1000, mem[:shared_dirty]/1000]
        end
    end
end

# function to delay the processes a bit
def time_step( n)
    while Time.new - $start_time < n
        sleep( 0.01)
    end
end

# create an object of specified size. Change the option argument (0 to 2) to see how the GC behaves in each case
#
# option 0 : a huge array of small objects built by formatting a string (case 1 in the text)
# option 1 (default) : a huge array of small objects built without formatting (rand.to_s; case 2 in the text)
# option 2 : a smaller array of big objects (case 3 in the text)
def memory_object( size, option=1)
    result = []
    count = size/20

    if option > 2 or option < 1
        count.times do
            result << "%20.18f" % rand
        end
    elsif option == 1
        count.times do
            result << rand.to_s
        end
    elsif option == 2
        count = count/10
        count.times do
            result << ("%20.18f" % rand)*30
        end
    end

    return result
end

##### main #####

puts "ruby version #{RUBY_VERSION}"

GC.disable

# print the column headers and first line
Memory.log( "header")

# Allocation of memory
big_memory = memory_object( 1000 * 1000 * 10)

Memory.log( "Parent", "post alloc")

lab_time = Time.new - $start_time
if lab_time < 3.9
    lab_time = 0
end

# start the forking
pid = fork do
    time = 4
    time_step( time + lab_time)
    Memory.log( "Child", "#{time} initial")

    # force GC when nothing happened
    GC.enable; GC.start; GC.disable

    time = 8
    time_step( time + lab_time)
    Memory.log( "Child", "#{time} empty GC")

    sleep( 1)
    STDOUT.flush
    exit!
end

time = 4
time_step( time + lab_time)
Memory.log( "Parent", "#{time} fork")

# wait for the child to finish
Process.wait( pid)

Python code

import re
import time
import os
import random
import sys
import gc

start_time=time.time()

# Monitor shared and private dirty memory as reported by /proc/<pid>/smaps (Linux only).
class Memory:   

    def __init__(self):
        self.shared_dirty = '.+?Shared_Dirty:\s+(\d+)'
        self.priv_dirty = '.+?Private_Dirty:\s+(\d+)'
        self.MEM_REGEXP = re.compile("{shared_dirty}{priv_dirty}".format(shared_dirty=self.shared_dirty, priv_dirty=self.priv_dirty), re.DOTALL)

    # get memory usage
    def get_memory_map(self, pids):
        memory_map = {}
        memory_map[ "pids_found" ] = {}
        memory_map[ "shared_dirty" ] = 0
        memory_map[ "priv_dirty" ] = 0

        for pid in pids:
            try:
                lines = None

                with open( "/proc/{pid}/smaps".format(pid=pid), "r" ) as infile:
                    lines = infile.read()
            except:
                lines = None

            if lines:
                for shared_dirty, priv_dirty in re.findall( self.MEM_REGEXP, lines ):
                    memory_map[ "pids_found" ][pid] = True
                    memory_map[ "shared_dirty" ] += int( shared_dirty )
                    memory_map[ "priv_dirty" ] += int( priv_dirty )     

        memory_map[ "pids_found" ] = memory_map[ "pids_found" ].keys()
        return memory_map

    # get the processes and get the value of the memory usage   
    def memory_usage( self):
        pids   = [ os.getpid() ]
        result = self.get_memory_map( pids)

        result[ "pids" ]   = pids

        return result

    # print the private and shared dirty memory (smaps reports kB, so /1000 gives roughly MB)
    def log( self, process_name='', log_tag=""):
        if process_name == "header":
            print " %-6s %5s %-12s %10s %10s" % ("proces", "pid", "log", "priv_dirty", "shared_dirty")
        else:
            mem = self.memory_usage( )
            print " %-6s %5d %-12s %10d %10d" % (process_name, os.getpid(), log_tag, mem["priv_dirty"]/1000, mem["shared_dirty"]/1000)

# function to delay the processes a bit
def time_step( n):
    global start_time
    while (time.time() - start_time) < n:
        time.sleep( 0.01)

# create an object of specified size. Change the option argument (0 to 2) to see how the GC behaves in each case
#
# option 0 : a huge array of small objects built by formatting a string (case 1 in the text)
# option 1 : a huge array of small objects built without formatting (str(); case 2 in the text)
# option 2 (default) : a smaller array of big objects (case 3 in the text)
def memory_object( size, option=2):
    count = size/20

    if option > 2 or option < 1:
        result = [ "%20.18f"% random.random() for i in xrange(count) ]

    elif option == 1:
        result = [ str( random.random() ) for i in xrange(count) ]

    elif option == 2:
        count = count/10
        result = [ ("%20.18f"% random.random())*30 for i in xrange(count) ]

    return result

##### main #####

print "python version {version}".format(version=sys.version)

memory = Memory()

gc.disable()

# print the column headers and first line
memory.log( "header")   # Print the headers of the columns

# Allocation of memory
big_memory = memory_object( 1000 * 1000 * 10)   # Allocate memory

memory.log( "Parent", "post alloc")

lab_time = time.time() - start_time
if lab_time < 3.9:
    lab_time = 0

# start the forking
pid = os.fork()     # fork the process
if pid == 0:
    Time = 4
    time_step( Time + lab_time)
    memory.log( "Child", "{time} initial".format(time=Time))

    # force GC when nothing happened
    gc.enable(); gc.collect(); gc.disable();

    Time = 10
    time_step( Time + lab_time)
    memory.log( "Child", "{time} empty GC".format(time=Time))

    time.sleep( 1)

    sys.exit(0)

Time = 4
time_step( Time + lab_time)
memory.log( "Parent", "{time} fork".format(time=Time))

# Wait for child process to finish
os.waitpid( pid, 0)

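Edit

Indeed, calling the GC several times before forking the process solves the issue, which surprised me. I have also run the code with Ruby 2.0.0 and the issue does not even appear, so it must be related to the generational GC you mentioned. However, if I call the memory_object function without assigning its output to any variable (so I am only creating garbage), then the memory is duplicated. The amount of memory copied depends on the amount of garbage I create: the more garbage, the more memory becomes private.

Is there any way to prevent this?

Here are some results.

Running the GC in 2.0.0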

ruby version 2.0.0
 proces   pid log          priv_dirty shared_dirty
 Parent  3664 post alloc           67          0
 Parent  3664 4 fork                1         69
 Child   3700 4 initial             1         69
 Child   3700 8 empty GC            6         65

Calling memory_object( 1000*1000) in the child

ruby version 2.0.0
 proces   pid log          priv_dirty shared_dirty
 Parent  3703 post alloc           67          0
 Parent  3703 4 fork                1         70
 Child   3739 4 initial             1         70
 Child   3739 8 empty GC           15         56

Calling memory_object( 1000*1000*10)

ruby version 2.0.0
 proces   pid log          priv_dirty shared_dirty
 Parent  3743 post alloc           67          0
 Parent  3743 4 fork                1         69
 Child   3779 4 initial             1         69
 Child   3779 8 empty GC           89          5

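Best answer

UPD2

I suddenly figured out why all the memory goes private when you format the string: formatting generates garbage while the GC is disabled, then the GC is enabled, and you are left with holes of released objects in your generated data. Then you fork, and new garbage starts to occupy these holes. The more garbage, the more private pages.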
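
The holes described above can be made visible with GC.stat. The following is a minimal sketch, not code from the original post: the helper name free_slots_after_garbage is hypothetical, and it assumes a Ruby version (2.2 or later) in which GC.stat exposes the :heap_free_slots key. It builds the kept strings with formatting while the GC is disabled, then compares the number of free heap slots before and after a forced collection.

```ruby
# Hypothetical helper (an illustration, not part of the original script):
# reports how many heap slots are free before and after a forced GC that
# follows a burst of allocations made with the GC disabled.
def free_slots_after_garbage(count)
  kept = []
  GC.disable
  # Each iteration keeps the formatted string; any temporaries created
  # along the way stay on the heap because the GC is disabled.
  count.times { kept << ("%20.18f" % rand) }
  free_before = GC.stat(:heap_free_slots)

  GC.enable
  GC.start # sweeps dead temporaries; their slots become free "holes"
  free_after = GC.stat(:heap_free_slots)

  { kept: kept.size, free_before: free_before, free_after: free_after }
end

if __FILE__ == $PROGRAM_NAME
  p free_slots_after_garbage(100_000)
end
```

A large jump in free slots after the forced GC would suggest that freed slots are interleaved with the live strings, which is the layout that later allocations in a forked child would write into.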

So I added a cleanup function that runs the GC every 2000 cycles (just enabling lazy GC did not help):

count.times do |i|
  cleanup(i)
  result << "%20.18f" % rand
end

#......snip........#

def cleanup(i)
  if (i % 2000).zero?
    GC.enable; GC.start; GC.disable
  end
end

##### main #####

Which results in (with memory_object( 1000 * 1000 * 10) generated after the fork):

RUBY_GC_HEAP_INIT_SLOTS=600000 ruby gc-test.rb 0
ruby version 2.2.0
 proces   pid log          priv_dirty shared_dirty
 Parent  2501 post alloc           35          0
 Parent  2501 4 fork                0         35
 Child   2503 4 initial             0         35
 Child   2503 8 empty GC           28         22

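Yes, it affects performance, but only before forking, i.e. it increases the load time in your case.

UPD1

I just found the criterion by which Ruby 2.2 sets the old-object bit: it is 3 GCs. So if you add the following before forking: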

GC.enable; 3.times {GC.start}; GC.disable
# start the forking

you will get (the option on the command line is 1):

$ RUBY_GC_HEAP_INIT_SLOTS=600000 ruby gc-test.rb 1
ruby version 2.2.0
 proces   pid log          priv_dirty shared_dirty
 Parent  2368 post alloc           31          0
 Parent  2368 4 fork                1         34
 Child   2370 4 initial             1         34
 Child   2370 8 empty GC            2         32

But the behavior of such objects under future GCs needs further testing. At least after 100 GCs, :old_objects remains constant, so I suppose it should be OK.
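
A quick way to watch the promotion described in this update is to record GC.stat(:old_objects) after each forced GC. This is only a sketch under stated assumptions: the helper name old_objects_per_gc is hypothetical, and the :old_objects key is assumed to be available (it is named that way in Ruby 2.2 and later). The exact numbers depend on the Ruby version and on what else is alive in the process.

```ruby
# Hypothetical helper (an illustration, not part of the original script):
# returns the :old_objects count before any GC and after each forced GC.
def old_objects_per_gc(runs)
  counts = [GC.stat(:old_objects)]
  runs.times do
    GC.start
    counts << GC.stat(:old_objects)
  end
  counts
end

if __FILE__ == $PROGRAM_NAME
  # Keep a reference so the strings survive every collection.
  data = Array.new(100_000) { rand.to_s }
  p old_objects_per_gc(3)
  puts "kept #{data.size} strings alive"
end
```

If the count stops growing after the third run, the retained data has been promoted, and a later GC in the child has less reason to touch those objects.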

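The log with GC.stat is here

By the way, there is also the option RGENGC_OLD_NEWOBJ_CHECK to create old objects from the beginning. I doubt it is a good idea, but it may be useful in some particular case.

First answer

My proposition in the comment above was wrong; bitmap tables are actually the savior.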

(option = 1)

ruby version 2.0.0
 proces   pid log          priv_dirty shared_dirty
 Parent 14807 post alloc           27          0
 Parent 14807 4 fork                0         27
 Child  14809 4 initial             0         27
 Child  14809 8 empty GC            6         25 # << almost everything stays shared <<

I also tested Ruby Enterprise Edition by hand; it is only half better than the worst case.

ruby version 1.8.7
 proces   pid log          priv_dirty shared_dirty
 Parent 15064 post alloc           86          0
 Parent 15064 4 fork                2         84
 Child  15065 4 initial             2         84
 Child  15065 8 empty GC           40         46

(I made the script run strictly 1 GC by increasing RUBY_GC_HEAP_INIT_SLOTS to 600k)
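
Whether a given heap setting really limits a script to a single GC can be checked by comparing GC.count before and after the code of interest. The sketch below is an illustration rather than the answerer's method, and the helper name gc_runs_during is hypothetical.

```ruby
# Hypothetical helper (an illustration, not part of the original script):
# returns how many GC runs happened while the given block executed.
def gc_runs_during
  before = GC.count
  yield
  GC.count - before
end

if __FILE__ == $PROGRAM_NAME
  runs = gc_runs_during do
    data = Array.new(100_000) { rand.to_s }
    GC.start
    data.size
  end
  puts "GC runs: #{runs}"
end
```

Running this with and without RUBY_GC_HEAP_INIT_SLOTS=600000 in the environment gives a rough idea of how many collections the larger initial heap avoids.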

Regarding "ruby - Garbage collector in Ruby 2.2 provokes unexpected CoW", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/29900458/
