python - IndexError: index 201850347 is out of bounds for axis 0 with size 201837124 while using unstack operation for large dataset

Reposted by: 太空宇宙 | Updated: 2023-11-04 06:41:31

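I am trying to implement a user-user collaborative filtering algorithm for my project. The code below works on a very small dataset, but when I run it on the yelp dataset (1 million users, 200k products) it raises an IndexError on the line that uses unstack. It reads and prints the huge dataset correctly, but fails to unstack it. The input is a dataset containing users, products, and the associated ratings. The output must be the computed predictions.

I found many other questions on stackoverflow about this error, but none of them relate to the unstack operation in Python. I tried alternatives, such as avoiding unstack and doing everything with groupby alone, but that was not feasible. I don't know how to solve this.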

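import pandas as pd
from math import *

# The CSV has no header row, so the column names are supplied explicitly.
df = pd.read_csv('preprocessed.csv', names=['users', 'Products', 'stars'], low_memory=False)
# Sum the ratings for each (user, product) pair.
s = df.groupby(['users', 'Products']).sum()
# Pivot products into columns; missing pairs become 0.0. This line raises the IndexError.
m = s.unstack(fill_value=0.0)
print(m)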

Output:

stars
Products product1 product2 product3
users
user1         1.0      0.0      4.0
user2         1.0      3.0      0.0
user3         1.0      0.0      0.0
user4         0.0      0.0      3.0


Predicted ratings
     stars
Products product1  product2  product3
users
user1         1.0  0.115504  4.000000
user2         1.0  3.000000  1.489822
user3         1.0  0.478533  0.521467
user4         1.5  0.500000  3.000000

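Best answer

I ran into the same problem as you when unstacking a large groupby Series. The problem comes from not having enough memory, so the best solution is to run your computation on a machine with more RAM.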
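
To put the memory point in numbers: a dense table of 1 million users by 200k products has 2 x 10^11 cells, which is roughly 1.6 TB at 8 bytes per float64 value. Where adding RAM is not an option, one possible alternative (not the approach taken in this answer) is to skip unstack and build a sparse matrix that stores only the ratings that exist. The following is a minimal sketch, assuming scipy is installed; the helper name to_sparse_ratings is hypothetical, and the sample rows are simply the small example from the question.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def to_sparse_ratings(df):
    """Hypothetical helper: build a users x products CSR matrix from long-format ratings."""
    # Sum duplicate (user, product) pairs, mirroring groupby(...).sum() in the question.
    s = df.groupby(["users", "Products"])["stars"].sum()
    # factorize maps each label to an integer code; sort=True keeps the labels ordered.
    user_codes, user_labels = pd.factorize(s.index.get_level_values("users"), sort=True)
    prod_codes, prod_labels = pd.factorize(s.index.get_level_values("Products"), sort=True)
    matrix = csr_matrix(
        (s.to_numpy(dtype=np.float64), (user_codes, prod_codes)),
        shape=(len(user_labels), len(prod_labels)),
    )
    return matrix, user_labels, prod_labels


# The small example from the question, in long format.
df = pd.DataFrame({
    "users": ["user1", "user1", "user2", "user2", "user3", "user4"],
    "Products": ["product1", "product3", "product1", "product2", "product1", "product3"],
    "stars": [1.0, 4.0, 1.0, 3.0, 1.0, 3.0],
})

matrix, users, products = to_sparse_ratings(df)
dense_cells = matrix.shape[0] * matrix.shape[1]
print(matrix.shape, matrix.nnz, dense_cells)  # (4, 3) 6 12
```

Only the stored ratings take up memory, so the size grows with the number of ratings rather than with users times products. The rest of the collaborative filtering code would then need to work on the sparse matrix instead of a DataFrame, which this sketch does not cover.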

For python - IndexError: index 201850347 is out of bounds for axis 0 with size 201837124 while using unstack operation for large dataset, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42816744/
