
python - MemoryError when concatenating DataFrames in Python


I have a large 680 MB CSV file that I need to read into a DataFrame.

I split the file into chunks and append those chunks to a list.

Then I try to build a single merged DataFrame with pd.concat().

I use the following code to do this:

import pandas as pd

temp_list = []
chunksize = 10 ** 5

for chunk in pd.read_csv('./data/properties_2016.csv', chunksize=chunksize, low_memory=False):
    temp_list.append(chunk)

properties_df = temp_list[0]

# Pairwise concat in a loop: every iteration copies the growing DataFrame
for df in temp_list[1:]:
    properties_df = pd.concat([properties_df, df], ignore_index=True)

I am running this inside a Docker image.

I get the following memory error:

Traceback (most recent call last):
  File "dataIngestion.py", line 53, in <module>
    properties_df = pd.concat([properties_df, df], ignore_index=True)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 206, in concat
    copy=copy)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 266, in __init__
    obj._consolidate(inplace=True)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 3156, in _consolidate
    self._consolidate_inplace()
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 3138, in _consolidate_inplace
    self._protect_consolidate(f)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 3127, in _protect_consolidate
    result = f()
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 3136, in f
    self._data = self._data.consolidate()
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals.py", line 3573, in consolidate
    bm._consolidate_inplace()
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals.py", line 3578, in _consolidate_inplace
    self.blocks = tuple(_consolidate(self.blocks))
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals.py", line 4525, in _consolidate
    _can_consolidate=_can_consolidate)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals.py", line 4548, in _merge_blocks
    new_values = new_values[argsort]
MemoryError

Please help!

Best Answer

Concatenating DataFrames pairwise in a loop does not work at this scale: each pd.concat call copies the entire accumulated DataFrame, so memory use grows with every iteration. I think this link will help.

This is the correct way to do it:

import pandas as pd

temp_list = []
chunksize = 10 ** 5

for chunk in pd.read_csv('./data/properties_2016.csv', chunksize=chunksize, low_memory=False):
    temp_list.append(chunk)

# A single concat over the whole list of chunks, instead of one call per chunk
properties_df = pd.concat(temp_list, ignore_index=True)

I tried this on a small file and it worked. Let me know if you still hit the same error.
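The difference can be demonstrated end to end on synthetic data. This is a minimal sketch of the chunked-read plus single-concat pattern; io.StringIO stands in for the real properties_2016.csv file, and the column names are made up for illustration.

```python
import io
import pandas as pd

# Small fake CSV standing in for the 680 MB file (columns are hypothetical)
csv_data = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

# Read in chunks, collecting each piece in a list
chunks = []
for chunk in pd.read_csv(io.StringIO(csv_data), chunksize=3):
    chunks.append(chunk)

# One concat over the whole list: every row is copied once,
# rather than repeatedly as in the pairwise loop from the question
df = pd.concat(chunks, ignore_index=True)
print(len(df))  # 10
```

The result is identical to reading the file in one go; the list-then-concat form just keeps peak memory closer to one copy of the data.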

Regarding this MemoryError when concatenating DataFrames in Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45002099/
