python multiprocessing.Array : huge temporary memory overhead

Reposted by: 行者123 · Updated: 2023-11-28 22:34:38

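If I use Python's multiprocessing.Array to create a 1 GB shared array, the Python process uses around 30 GB of memory during the call to multiprocessing.Array, and the usage drops afterwards. I would appreciate any help figuring out why this happens and how to work around it.

Here is code to reproduce it on Linux, with memory monitored by smem: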

import multiprocessing
import ctypes
import numpy
import time
import subprocess
import sys

def get_smem(secs,by):
    for t in range(secs):
        print subprocess.check_output("smem")
        sys.stdout.flush()
        time.sleep(by)



def allocate_shared_array(n):
    data=multiprocessing.Array(ctypes.c_ubyte,range(n))
    print "finished allocating"
    sys.stdout.flush()


n=10**9
secs=30
by=5
p1=multiprocessing.Process(target=get_smem,args=(secs,by))
p2=multiprocessing.Process(target=allocate_shared_array,args=(n,))
p1.start()
p2.start()
print "pid of allocation process is",p2.pid
p1.join()
p2.join()
p1.terminate()
p2.terminate()

Here is the output:

pid of allocation process is 2285
  PID User     Command                         Swap      USS      PSS      RSS
 2116 ubuntu   top                                0      700      773     1044
 1442 ubuntu   -bash                              0     2020     2020     2024
 1751 ubuntu   -bash                              0     2492     2528     2700
 2284 ubuntu   python test.py                     0     1080     4566    11924
 2286 ubuntu   /usr/bin/python /usr/bin/sm        0     4688     5573     7152
 2276 ubuntu   python test.py                     0     4000     8163    16304
 2285 ubuntu   python test.py                     0   137948   141431   148700

  PID User     Command                         Swap      USS      PSS      RSS
 2116 ubuntu   top                                0      700      773     1044
 1442 ubuntu   -bash                              0     2020     2020     2024
 1751 ubuntu   -bash                              0     2492     2528     2700
 2284 ubuntu   python test.py                     0     1188     4682    12052
 2287 ubuntu   /usr/bin/python /usr/bin/sm        0     4696     5560     7160
 2276 ubuntu   python test.py                     0     4016     8174    16304
 2285 ubuntu   python test.py                     0 13260064 13263536 13270752

  PID User     Command                         Swap      USS      PSS      RSS
 2116 ubuntu   top                                0      700      773     1044
 1442 ubuntu   -bash                              0     2020     2020     2024
 1751 ubuntu   -bash                              0     2492     2528     2700
 2284 ubuntu   python test.py                     0     1188     4682    12052
 2288 ubuntu   /usr/bin/python /usr/bin/sm        0     4692     5556     7156
 2276 ubuntu   python test.py                     0     4016     8174    16304
 2285 ubuntu   python test.py                     0 21692488 21695960 21703176

  PID User     Command                         Swap      USS      PSS      RSS
 2116 ubuntu   top                                0      700      773     1044
 1442 ubuntu   -bash                              0     2020     2020     2024
 1751 ubuntu   -bash                              0     2492     2528     2700
 2284 ubuntu   python test.py                     0     1188     4682    12052
 2289 ubuntu   /usr/bin/python /usr/bin/sm        0     4696     5560     7160
 2276 ubuntu   python test.py                     0     4016     8174    16304
 2285 ubuntu   python test.py                     0 30115144 30118616 30125832

  PID User     Command                         Swap      USS      PSS      RSS
 2116 ubuntu   top                                0      700      771     1044
 1442 ubuntu   -bash                              0     2020     2020     2024
 1751 ubuntu   -bash                              0     2492     2527     2700
 2284 ubuntu   python test.py                     0     1192     4808    12052
 2290 ubuntu   /usr/bin/python /usr/bin/sm        0     4700     5481     7164
 2276 ubuntu   python test.py                     0     4092     8267    16304
 2285 ubuntu   python test.py                     0 31823696 31827043 31834136

  PID User     Command                         Swap      USS      PSS      RSS
 2116 ubuntu   top                                0      700      771     1044
 1442 ubuntu   -bash                              0     2020     2020     2024
 1751 ubuntu   -bash                              0     2492     2527     2700
 2284 ubuntu   python test.py                     0     1192     4808    12052
 2291 ubuntu   /usr/bin/python /usr/bin/sm        0     4700     5481     7164
 2276 ubuntu   python test.py                     0     4092     8267    16304
 2285 ubuntu   python test.py                     0 31823696 31827043 31834136

Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "test.py", line 17, in allocate_shared_array
    data=multiprocessing.Array(ctypes.c_ubyte,range(n))
  File "/usr/lib/python2.7/multiprocessing/__init__.py", line 260, in Array
    return Array(typecode_or_type, size_or_initializer, **kwds)
  File "/usr/lib/python2.7/multiprocessing/sharedctypes.py", line 115, in Array
    obj = RawArray(typecode_or_type, size_or_initializer)
  File "/usr/lib/python2.7/multiprocessing/sharedctypes.py", line 88, in RawArray
    result = _new_value(type_)
  File "/usr/lib/python2.7/multiprocessing/sharedctypes.py", line 63, in _new_value
    wrapper = heap.BufferWrapper(size)
  File "/usr/lib/python2.7/multiprocessing/heap.py", line 243, in __init__
    block = BufferWrapper._heap.malloc(size)
  File "/usr/lib/python2.7/multiprocessing/heap.py", line 223, in malloc
    (arena, start, stop) = self._malloc(size)
  File "/usr/lib/python2.7/multiprocessing/heap.py", line 120, in _malloc
    arena = Arena(length)
  File "/usr/lib/python2.7/multiprocessing/heap.py", line 82, in __init__
    self.buffer = mmap.mmap(-1, size)
error: [Errno 12] Cannot allocate memory

Best answer

Judging by the format of the print statements, you are using Python 2.

Replace range(n) with xrange(n) to save some memory.

data=multiprocessing.Array(ctypes.c_ubyte,xrange(n))

(or use Python 3)

A range of 1 billion elements takes about 8 GB (well, I just tried that on my Windows PC and it froze: just don't do that!)

Try it with 10**7 to be sure:

>>> import sys
>>> z=range(int(10**7))
>>> sys.getsizeof(z)
80000064  # => 80 MB! You do the math for 10**9
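To "do the math", here is a rough back-of-the-envelope sketch. It assumes a 64-bit CPython 2 build, 8 bytes per list slot (consistent with the 80000064 bytes measured above) and 24 bytes per int object; the 24-byte figure is an assumption you can check with sys.getsizeof on your own build. The helper name `estimate_int_list_bytes` is made up for this illustration.

```python
# Assumed sizes for 64-bit CPython 2; verify with sys.getsizeof on your build.
POINTER_BYTES = 8      # one list slot, consistent with 80000064 bytes for 10**7 items
INT_OBJECT_BYTES = 24  # assumption: size of one Python 2 int object


def estimate_int_list_bytes(n, pointer_bytes=POINTER_BYTES, int_bytes=INT_OBJECT_BYTES):
    """Rough size of a list of n distinct ints: list slots plus the int objects."""
    return n * (pointer_bytes + int_bytes)


slots_only = 10**9 * POINTER_BYTES
total = estimate_int_list_bytes(10**9)

print("list slots only: %.1f GB" % (slots_only / 1e9))
print("slots + ints:    %.1f GB" % (total / 1e9))
```

Note that sys.getsizeof reports only the list's own slots, not the int objects the slots point to. Under the assumptions above the combined estimate is about 32 GB, which is in the same range as the peak shown in the smem output in the question.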

A lazy object like xrange takes up almost no memory, because it produces its values one at a time as you iterate.

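In Python 3, they must have been fed up with these problems, figured out that most people use range because they want lazy iteration, and so killed xrange and made range lazy. Now, if you really want to allocate all the numbers, you have to write list(range(n)). At least you won't allocate 1 TB by mistake!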
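A minimal Python 3 sketch of that difference, using a smaller n so it runs anywhere. The variable names are only for illustration, and the exact byte counts depend on the interpreter build.

```python
import sys

n = 10**6

lazy = range(n)                 # Python 3: lazy sequence, no elements stored
materialized = list(range(n))   # what Python 2's range(n) builds eagerly

lazy_bytes = sys.getsizeof(lazy)
list_bytes = sys.getsizeof(materialized)

print("range(n):       %d bytes" % lazy_bytes)
print("list(range(n)): %d bytes" % list_bytes)
```

The size of the range object does not grow with n, while the list needs one slot per element (plus the int objects themselves, which sys.getsizeof does not count).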

Edit:

The OP's comment indicates that my explanation does not solve the problem. I ran a few simple tests on my Windows box:

import multiprocessing,sys,ctypes
n=10**7

a=multiprocessing.RawArray(ctypes.c_ubyte,range(n))  # or xrange
z=input("hello")  # pause here so the process memory can be inspected

Memory goes up to 500 MB, then stays at 250 MB with Python 2.

Memory goes up to 500 MB, then stays at 7 MB with Python 3 (which is strange, since it should be at least 10 MB...).
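The same kind of comparison can be sketched without watching a process monitor, assuming Python 3 and its tracemalloc module. tracemalloc only sees allocations made through Python's allocators, so the shared block itself is not counted; what it does show is the temporary Python objects created while the array is built. The helper name `peak_python_heap_bytes` is made up for this illustration.

```python
import ctypes
import multiprocessing
import tracemalloc


def peak_python_heap_bytes(size_or_initializer):
    """Peak traced Python allocation while building a RawArray of c_ubyte."""
    tracemalloc.start()
    try:
        multiprocessing.RawArray(ctypes.c_ubyte, size_or_initializer)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


n = 200000  # small enough to run anywhere; the question used 10**9

peak_python_heap_bytes(n)  # warm-up so one-time import costs are not counted
peak_with_size = peak_python_heap_bytes(n)
peak_with_initializer = peak_python_heap_bytes(range(n))

print("integer size: %d bytes at peak" % peak_with_size)
print("initializer:  %d bytes at peak" % peak_with_initializer)
```

If the initializer variant shows a much higher peak, the overhead comes from materializing the initial values as Python objects rather than from the shared memory block.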

Conclusion: OK, it still peaks at 500 MB, so I'm not sure this helps, but could you try your program on Python 3 and see whether the overall memory peak is lower?
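One possible workaround, which is not part of the answer above, is to avoid passing an initializer at all. Per the multiprocessing documentation, an integer second argument gives a zero-filled array of that length, so the values can then be written in bounded chunks. The sketch below assumes Python 3; the helper name `make_shared_ubyte_array`, the chunk size, and the `i % 256` fill pattern are all placeholders for illustration.

```python
import ctypes
import multiprocessing


def make_shared_ubyte_array(n, chunk=1 << 20):
    """Allocate n shared bytes by size, then fill them chunk by chunk.

    Hypothetical helper: the fill pattern (i % 256) stands in for real data.
    """
    shared = multiprocessing.Array(ctypes.c_ubyte, n)  # integer size: zero-filled
    raw = shared.get_obj()  # underlying ctypes array, bypasses per-item locking
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # Only one chunk-sized list of values exists at any time.
        raw[start:stop] = [i % 256 for i in range(start, stop)]
    return shared


data = make_shared_ubyte_array(10**6)
print(len(data), data[0], data[255], data[256])
```

With this approach the temporary memory is bounded by the chunk size instead of growing with n. Writing through get_obj() skips the lock, which is only safe while no other process is using the array yet.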

Regarding python multiprocessing.Array : huge temporary memory overhead, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38798330/
