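I am using the blobstore to back up and restore entities in csv format. The process works well for all of my smaller models. However, once I start working on models with more than 2K entities, I exceed the soft memory limit. I only fetch 50 entities at a time and then write the results out to the blobstore, so I don't understand why my memory usage keeps building up. I can reliably make the method fail just by increasing the "limit" value passed in below, which makes the method run a little longer in order to export a few more entities.
Also, the resulting file is only <500KB in size. Why would the process use 140 MB of memory?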
file_name = files.blobstore.create(mime_type='application/octet-stream')
with files.open(file_name, 'a') as f:
    writer = csv.DictWriter(f, fieldnames=properties)
    for entity in models.Player.all():
        row = backup.get_dict_for_entity(entity)
        writer.writerow(row)
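Produces the error: Exceeded soft private memory limit with 150.957 MB after servicing 7 requests total
Simplified example 2:
The problem seems to be related to using files and the with statement in Python 2.5. Leaving out the csv parts, I can reproduce almost the same error by simply trying to write a 4000-line text file to the blobstore.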
from __future__ import with_statement
import StringIO
from google.appengine.api import files
from google.appengine.ext.blobstore import blobstore

file_name = files.blobstore.create(mime_type='application/octet-stream')
myBuffer = StringIO.StringIO()
# Put 4000 lines of text in myBuffer
with files.open(file_name, 'a') as f:
    for line in myBuffer.getvalue().splitlines():
        f.write(line)
# Finalize the file before asking for its blob key
files.finalize(file_name)
blob_key = files.blobstore.get_blob_key(file_name)
Produces the error: Exceeded soft private memory limit with 154.977 MB after servicing 24 requests total
def backup_model_to_blobstore(model, limit=None, batch_size=None):
    file_name = files.blobstore.create(mime_type='application/octet-stream')
    # Open the file and write to it
    with files.open(file_name, 'a') as f:
        # Get the fieldnames for the csv file.
        query = model.all().fetch(1)
        entity = query[0]
        properties = entity.__class__.properties()
        # Add ID as a property
        properties['ID'] = entity.key().id()
        # For debugging rather than try and catch
        if True:
            writer = csv.DictWriter(f, fieldnames=properties)
            # Write out a header row
            headers = dict((n, n) for n in properties)
            writer.writerow(headers)
            numBatches = int(limit/batch_size)
            if numBatches == 0:
                numBatches = 1
            for x in range(numBatches):
                logging.info("************** querying with offset %s and limit %s", x*batch_size, batch_size)
                query = model.all().fetch(limit=batch_size, offset=x*batch_size)
                for entity in query:
                    # This just returns a small dictionary with the key-value pairs
                    row = get_dict_for_entity(entity)
                    # Write out a row for each entity.
                    writer.writerow(row)
    # Finalize the file. Do this before attempting to read it.
    files.finalize(file_name)
    blob_key = files.blobstore.get_blob_key(file_name)
    return blob_key
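To make the batching arithmetic in the function above easier to follow, here is a small standalone sketch of how the offsets are derived from `limit` and `batch_size`. The helper name `batch_offsets` is made up for illustration and is not part of the original code; it only mirrors the `numBatches` calculation and uses `//` so the integer division behaves the same on any Python version.

```python
def batch_offsets(limit, batch_size):
    """Return the query offsets the batching loop would use (hypothetical helper)."""
    # Integer division: a partial final batch is dropped, as in the original loop.
    num_batches = limit // batch_size
    if num_batches == 0:
        # The original code always runs at least one batch.
        num_batches = 1
    return [x * batch_size for x in range(num_batches)]


offsets_even = batch_offsets(200, 50)    # limit is a multiple of batch_size
offsets_small = batch_offsets(10, 50)    # limit smaller than one batch
offsets_ragged = batch_offsets(120, 50)  # remainder of 20 is never queried
```

One thing this makes visible: when `limit` is not a multiple of `batch_size`, the remainder is silently skipped, so `limit=120` with `batch_size=50` only queries 100 entities.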
2012-02-02 21:59:19.063
************** querying with offset 2050 and limit 50
I 2012-02-02 21:59:20.076
************** querying with offset 2100 and limit 50
I 2012-02-02 21:59:20.781
************** querying with offset 2150 and limit 50
I 2012-02-02 21:59:21.508
Exception for: Chris (
Traceback (most recent call last):
blob_key = backup_model_to_blobstore(model, limit=limit, batch_size=batch_size)
File "/base/data/home/apps/singpath/163.356548765202135434/singpath/backup.py", line 125, in backup_model_to_blobstore
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 281, in __exit__
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 275, in close
self._make_rpc_call_with_retry('Close', request, response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
_make_call(method, request, response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 179, in _raise_app_error
raise FileNotOpenedError()
C 2012-02-02 21:59:23.009
Exceeded soft private memory limit with 149.426 MB after servicing 14 requests total
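You're better off not doing the batching yourself and just iterating over the query. The iterator will pick a batch size (probably 20) that should be adequate: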
q = model.all()
for entity in q:
    row = get_dict_for_entity(entity)
    writer.writerow(row)
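The same pattern can be tried outside App Engine with only the standard library. This is a rough sketch, not the datastore API: `fake_query` is a hypothetical generator standing in for iterating over `model.all()`, `export_rows` is a made-up helper name, and an in-memory `io.StringIO` stands in for the blobstore file. It is written for Python 3 so it can be run locally. The point is that the loop only ever holds one row at a time.

```python
import csv
import io


def fake_query(count):
    """Hypothetical stand-in for a datastore query: yields one row dict at a time."""
    for i in range(count):
        yield {"ID": i, "name": "player%d" % i}


def export_rows(rows, out, fieldnames):
    """Write a header plus one csv line per row, consuming rows lazily."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    # Header row, built the same way as in the question.
    writer.writerow(dict((n, n) for n in fieldnames))
    written = 0
    for row in rows:
        # Only the current row is referenced here; earlier rows can be collected.
        writer.writerow(row)
        written += 1
    return written


out = io.StringIO()  # assumption: stands in for the file opened via files.open
written = export_rows(fake_query(4000), out, ["ID", "name"])
lines = out.getvalue().splitlines()
```

Whether this alone keeps an App Engine instance under its memory limit depends on how large each entity is once loaded.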
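An often-overlooked fact about memory usage is that the in-memory representation of an entity can use 30-50 times the RAM of the entity's serialized form; e.g. an entity that is 3KB on disk might use 100KB in RAM. (The exact blow-up factor depends on many things; it's worse if you have lots of properties with long names and small values, and even worse for repeated properties with long names.)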
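For a rough feel of this effect, the sketch below compares the approximate in-memory size of a plain Python dict with the length of the csv line that would be written for it. This is only an analogy: it uses ordinary Python objects rather than datastore entities, the property names are invented, `deep_size` is a made-up helper built on `sys.getsizeof`, and the resulting ratio will differ between Python versions and from the 30-50x figure above.

```python
import sys


def deep_size(obj):
    """Approximate bytes used by a nested dict/list/str/int structure."""
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k) + deep_size(v) for k, v in obj.items())
    elif isinstance(obj, (list, tuple)):
        size += sum(deep_size(item) for item in obj)
    return size


# Hypothetical "entity": many properties with long names and small values.
entity = dict(("a_rather_long_property_name_%02d" % i, i) for i in range(20))

# The csv row holds only the values; the names appear once, in the header.
serialized = ",".join(str(v) for v in entity.values())

in_memory = deep_size(entity)
on_disk = len(serialized)
ratio = in_memory / float(on_disk)
```

Printing `in_memory`, `on_disk` and `ratio` shows how a very small output line can correspond to a much larger object while it is held in RAM.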
