gpt4 book ai didi

python - 将 SparseDataFrame 保存到 CSV 会引发 IndexError

转载 作者:太空宇宙 更新时间:2023-11-03 15:24:27 25 4
gpt4 key购买 nike

用示例更新了问题

我尝试重现我的问题。事实证明,它甚至与我的数据集的大小无关。这是重现我的问题的最小示例:

>>> import pandas as pd
>>> data = pd.SparseDataFrame({ 'user': ['a', 'b', 'c', 'd'], 'week': [4, 3, 2, 1] }, default_fill_value=0)
>>> data.info()
<class 'pandas.sparse.frame.SparseDataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
user 4 non-null object
week 4 non-null int64
dtypes: int64(1), object(1)
memory usage: 144.0+ bytes
>>> data.to_csv('error.csv', index=False)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 1383, in to_csv
formatter.save()
File "/usr/local/lib/python3.6/site-packages/pandas/formats/format.py", line 1475, in save
self._save()
File "/usr/local/lib/python3.6/site-packages/pandas/formats/format.py", line 1576, in _save
self._save_chunk(start_i, end_i)
File "/usr/local/lib/python3.6/site-packages/pandas/formats/format.py", line 1590, in _save_chunk
quoting=self.quoting)
File "/usr/local/lib/python3.6/site-packages/pandas/core/internals.py", line 596, in to_native_types
values = values[:, slicer]
File "/usr/local/lib/python3.6/site-packages/pandas/sparse/array.py", line 401, in __getitem__
data_slice = self.values[key]
IndexError: too many indices for array

这是一个错误还是我做错了什么?

<小时/>

原始问题

我有一个巨大的稀疏数据框。

>>> data.shape
(3827022, 4893)

>>> type(data)
pandas.sparse.frame.SparseDataFrame

当我尝试将其保存到 CSV 文件时,它会引发 IndexError。难道是因为数据太大了?指定 chunksize 并不能解决问题。

>>> data.to_csv('../data/hashtags_binarized.csv', index=False)

---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-58-550cc98888dc> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', "data.to_csv('../data/hashtags_binarized.csv', index=False)")

/usr/local/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2113 magic_arg_s = self.var_expand(line, stack_depth)
2114 with self.builtin_trap:
-> 2115 result = fn(magic_arg_s, cell)
2116 return result
2117

<decorator-gen-59> in time(self, line, cell, local_ns)

/usr/local/lib/python3.6/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
186 # but it's overkill for just that one bit of state.
187 def magic_deco(arg):
--> 188 call = lambda f, *a, **k: f(*a, **k)
189
190 if callable(arg):

/usr/local/lib/python3.6/site-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)
1179 if mode=='eval':
1180 st = clock2()
-> 1181 out = eval(code, glob, local_ns)
1182 end = clock2()
1183 else:

<timed eval> in <module>()

/usr/local/lib/python3.6/site-packages/pandas/core/frame.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
1381 doublequote=doublequote,
1382 escapechar=escapechar, decimal=decimal)
-> 1383 formatter.save()
1384
1385 if path_or_buf is None:

/usr/local/lib/python3.6/site-packages/pandas/formats/format.py in save(self)
1473 self.writer = csv.writer(f, **writer_kwargs)
1474
-> 1475 self._save()
1476
1477 finally:

/usr/local/lib/python3.6/site-packages/pandas/formats/format.py in _save(self)
1574 break
1575
-> 1576 self._save_chunk(start_i, end_i)
1577
1578 def _save_chunk(self, start_i, end_i):

/usr/local/lib/python3.6/site-packages/pandas/formats/format.py in _save_chunk(self, start_i, end_i)
1588 decimal=self.decimal,
1589 date_format=self.date_format,
-> 1590 quoting=self.quoting)
1591
1592 for col_loc, col in zip(b.mgr_locs, d):

/usr/local/lib/python3.6/site-packages/pandas/core/internals.py in to_native_types(self, slicer, na_rep, quoting, **kwargs)
594 values = self.values
595 if slicer is not None:
--> 596 values = values[:, slicer]
597 mask = isnull(values)
598

/usr/local/lib/python3.6/site-packages/pandas/sparse/array.py in __getitem__(self, key)
399 return self._get_val_at(key)
400 elif isinstance(key, tuple):
--> 401 data_slice = self.values[key]
402 else:
403 if isinstance(key, SparseArray):

IndexError: too many indices for array

最佳答案

使用另一个应用“toCSV("name.csv") 创建 CSV 的选项,您将收到错误“SparseDataFrame”对象没有属性“toCSV”。所以使用'.to_dense().to_csv('name.csv')

df.to_dense().to_csv("name.csv", index = False, sep=',', encoding='utf-8')

关于python - 将 SparseDataFrame 保存到 CSV 会引发 IndexError,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/43263074/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com