
Python 3.1 - Memory error while sampling a large list


The input list can exceed 1 million numbers. When I run the following code with a smaller 'repeats' value, there is no problem:

import random

def sample(x):
    length = 1000000
    new_array = random.sample((list(x)), length)
    return new_array

def repeat_sample(x):
    repeats = 100
    list_of_samples = []
    for i in range(repeats):
        list_of_samples.append(sample(x))
    return list_of_samples

repeat_sample(large_array)

However, a high number of repeats (e.g. 100, as above) causes a MemoryError. The traceback is as follows:

Traceback (most recent call last):
  File "C:\Python31\rnd.py", line 221, in <module>
    STORED_REPEAT_SAMPLE = repeat_sample(STORED_ARRAY)
  File "C:\Python31\rnd.py", line 129, in repeat_sample
    list_of_samples.append(sample(x))
  File "C:\Python31\rnd.py", line 121, in sample
    new_array = random.sample((list(x)),length)
  File "C:\Python31\lib\random.py", line 309, in sample
    result = [None] * k
MemoryError

I assume I'm running out of memory. I don't know how to work around this problem.
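
As a rough check on that assumption (just a sketch; I'm assuming a 32-bit CPython build, which the C:\Python31 path on Windows makes likely), measuring one sample list with sys.getsizeof shows the scale involved:

import sys
import random

# One sample of 1,000,000 numbers, the same size used in sample() above.
one_sample = random.sample(range(2000000), 1000000)

# getsizeof counts only the list's internal pointer array, not the int
# objects it references.
print(sys.getsizeof(one_sample))        # roughly 4 MB on 32-bit, 8 MB on 64-bit
print(100 * sys.getsizeof(one_sample))  # 100 such lists: hundreds of MB of list
                                        # overhead alone, before the numbers themselves

Keeping 100 of these lists alive at the same time, plus the int objects they refer to, can exhaust the roughly 2 GB a 32-bit process can address.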

Thanks for your time!

Best Answer

Expanding on my comment:

Let's say the processing you do to each sample is to calculate its mean.

def calc_means(samplelists):
    means = []
    n = float(len(samplelists[0]))
    for sample in samplelists:
        mean = sum(sample) / n
        means.append(mean)
    return means

calc_means(repeat_sample(large_array))

That has you keeping all of those sample lists in memory at the same time. You can make it lighter on memory like this:

def mean(sample, n):
    n = float(n)
    mean = sum(sample) / n
    return mean

def sample(x):
    length = 1000000
    new_array = random.sample(x, length)
    return new_array

def repeat_means(x):
    length = 1000000
    repeats = 100
    list_of_means = []
    for i in range(repeats):
        list_of_means.append(mean(sample(x), length))
    return list_of_means

repeat_means(large_array)

But that's still not good enough... you can do it all while building only the list of results:

import random

def sampling_mean(population, k, times):
    # Part of this is lifted straight from random.py
    _int = int
    _random = random.random

    n = len(population)
    kf = float(k)
    result = []

    if not 0 <= k <= n:
        raise ValueError("sample larger than population")

    for t in range(times):
        selected = set()
        sum_ = 0
        selected_add = selected.add

        for i in range(k):
            j = _int(_random() * n)
            while j in selected:
                j = _int(_random() * n)
            selected_add(j)
            sum_ += population[j]

        mean = sum_ / kf
        result.append(mean)
    return result

sampling_mean(large_array, 1000000, 100)

Now, can your algorithm be slimmed down like this?
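
If you don't mind one sample list of size k existing at a time, a shorter sketch of the same idea might look like this (untested, and sampling_mean_simple is just an illustrative name, not part of the code above):

import random

def sampling_mean_simple(population, k, times):
    # Each iteration builds one sample of size k, reduces it to its mean,
    # and discards it, so only the list of means accumulates.
    return [sum(random.sample(population, k)) / k for _ in range(times)]

# Usage mirroring the call above:
# means = sampling_mean_simple(large_array, 1000000, 100)

This trades the index-by-index loop above for random.sample's own implementation; per iteration it holds one k-element list instead of one k-element set of selected indices.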

Regarding "Python 3.1 - Memory error while sampling a large list", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/4706151/
