
python - How to fix a memory leak when uploading files to Google Cloud Storage?


The memory leak was detected via memory_profiler. Since files this large will be uploaded from a 128 MB GCF or an f1-micro GCE instance, how can I prevent this memory leak?

✗ python -m memory_profiler tests/test_gcp_storage.py
67108864

Filename: tests/test_gcp_storage.py

Line #    Mem usage    Increment   Line Contents
================================================
    48   35.586 MiB   35.586 MiB   @profile
    49                             def test_upload_big_file():
    50   35.586 MiB    0.000 MiB     from google.cloud import storage
    51   35.609 MiB    0.023 MiB     client = storage.Client()
    52
    53   35.609 MiB    0.000 MiB     m_bytes = 64
    54   35.609 MiB    0.000 MiB     filename = int(datetime.utcnow().timestamp())
    55   35.609 MiB    0.000 MiB     blob_name = f'test/{filename}'
    56   35.609 MiB    0.000 MiB     bucket_name = 'my_bucket'
    57   38.613 MiB    3.004 MiB     bucket = client.get_bucket(bucket_name)
    58
    59   38.613 MiB    0.000 MiB     with open(f'/tmp/{filename}', 'wb+') as file_obj:
    60   38.613 MiB    0.000 MiB       file_obj.seek(m_bytes * 1024 * 1024 - 1)
    61   38.613 MiB    0.000 MiB       file_obj.write(b'\0')
    62   38.613 MiB    0.000 MiB       file_obj.seek(0)
    63
    64   38.613 MiB    0.000 MiB       blob = bucket.blob(blob_name)
    65  102.707 MiB   64.094 MiB       blob.upload_from_file(file_obj)
    66
    67  102.715 MiB    0.008 MiB     blob = bucket.get_blob(blob_name)
    68  102.719 MiB    0.004 MiB     print(blob.size)

Moreover, if the file is not opened in binary mode, the leak is twice the file size (presumably because the str data written in text mode has to be encoded to bytes, leaving two copies of the payload in memory):

67108864

Filename: tests/test_gcp_storage.py

Line #    Mem usage    Increment   Line Contents
================================================
    48   35.410 MiB   35.410 MiB   @profile
    49                             def test_upload_big_file():
    50   35.410 MiB    0.000 MiB     from google.cloud import storage
    51   35.441 MiB    0.031 MiB     client = storage.Client()
    52
    53   35.441 MiB    0.000 MiB     m_bytes = 64
    54   35.441 MiB    0.000 MiB     filename = int(datetime.utcnow().timestamp())
    55   35.441 MiB    0.000 MiB     blob_name = f'test/{filename}'
    56   35.441 MiB    0.000 MiB     bucket_name = 'my_bucket'
    57   38.512 MiB    3.070 MiB     bucket = client.get_bucket(bucket_name)
    58
    59   38.512 MiB    0.000 MiB     with open(f'/tmp/{filename}', 'w+') as file_obj:
    60   38.512 MiB    0.000 MiB       file_obj.seek(m_bytes * 1024 * 1024 - 1)
    61   38.512 MiB    0.000 MiB       file_obj.write('\0')
    62   38.512 MiB    0.000 MiB       file_obj.seek(0)
    63
    64   38.512 MiB    0.000 MiB       blob = bucket.blob(blob_name)
    65  152.250 MiB  113.738 MiB       blob.upload_from_file(file_obj)
    66
    67  152.699 MiB    0.449 MiB     blob = bucket.get_blob(blob_name)
    68  152.703 MiB    0.004 MiB     print(blob.size)

Gist: https://gist.github.com/northtree/8b560a6b552a975640ec406c9f701731

Best Answer

To limit the amount of memory used during the upload, you need to explicitly configure the blob's chunk size before calling upload_from_file():

blob = bucket.blob(blob_name, chunk_size=10*1024*1024)
blob.upload_from_file(file_obj)
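
For context, here is a self-contained sketch of that workaround applied to the question's setup (the bucket name, blob name, local path, and function name are placeholders, and the 10 MiB chunk size is just an example value):

from google.cloud import storage

def upload_with_bounded_memory(bucket_name, blob_name, path):
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    # chunk_size must be a multiple of 256 KiB; once set, the client streams
    # the file in fixed-size chunks instead of buffering the whole payload.
    blob = bucket.blob(blob_name, chunk_size=10 * 1024 * 1024)
    # Binary mode also avoids the extra str-to-bytes copy noted above.
    with open(path, 'rb') as file_obj:
        blob.upload_from_file(file_obj)

With this in place, peak memory during the upload should stay near the configured chunk size rather than growing to the full file size.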

I agree that this is a poor default behavior in the Google client SDK, and the workaround is poorly documented as well.

Regarding "python - How to fix a memory leak when uploading files to Google Cloud Storage?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56784548/
