
python - Trying to create a 3D matrix in Python by selecting specific data

Reposted. Author: 行者123. Updated: 2023-12-01 06:30:34

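Hi, I want to build a 3D matrix here. This is the MovieLens data ( https://grouplens.org/datasets/movielens/100k/ ), from which I take the u1.base and u1.test pair as the training set and the test set, respectively. Below is an image of the data format of the training_set variable that you will find in the code.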

[Image: data format of training_set]

The 3D matrix I am trying to create has the form (User, Movie, Timestamp), where each cell holds the rating that User 1 gave to Movie 1 at Time 1.

In case it helps, below is the code that creates the 2D matrix, with users in the rows and all movies in the columns.

import numpy as np
import pandas as pd

training_set = pd.read_csv('ml-100k/u1.base', delimiter = '\t')
training_set = np.array(training_set, dtype='int')
test_set = pd.read_csv('ml-100k/u1.test', delimiter = '\t')
test_set = np.array(test_set, dtype = 'int64')

nb_users = int(max(max(training_set[:, 0]), max(test_set[:, 0])))
nb_movies = int(max(max(training_set[:, 1]), max(test_set[:, 1])))

def convert(data):
    new_data = [] #final list that we will return
    for id_users in range(1, nb_users+1):
        id_movies = data[:, 1][data[:, 0] == id_users] #contains the IDs of the movies rated by the id_user
        id_ratings = data[:, 2][data[:, 0] == id_users] #all movie ratings given by specific user
        ratings = np.zeros(nb_movies)
        ratings[id_movies-1] = id_ratings #these two lines are just so that the movies that are not rated by user have null (0) values
        new_data.append(list(ratings))
    return (new_data)
training_set = convert(training_set)
test_set = convert(test_set)
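
As a hedged aside, the same user-by-movie matrix can probably be filled without the Python loop by using NumPy integer-array indexing. The sketch below uses a small made-up array in place of training_set; the names toy, n_users, n_movies and matrix are illustrative assumptions and are not part of the original code. It also assumes each (user, movie) pair appears at most once.

```python
import numpy as np

# Made-up rows standing in for training_set: user, movie, rating, timestamp
toy = np.array([
    [1, 1, 5, 100],
    [1, 3, 4, 200],
    [2, 2, 3, 300],
])
n_users = int(toy[:, 0].max())
n_movies = int(toy[:, 1].max())

# Start from zeros so that unrated movies stay 0, as in convert()
matrix = np.zeros((n_users, n_movies))
# IDs are 1-based, so subtract 1 to get row and column positions
matrix[toy[:, 0] - 1, toy[:, 1] - 1] = toy[:, 2]

print(matrix)
```

The result is a NumPy array rather than a list of lists; matrix.tolist() would give the same structure that convert() returns.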

Below is the code I tried. It throws so many errors that I cannot scroll back to the first one it raised.

import numpy as np
import pandas as pd
training_set = pd.read_csv('ml-100k/u1.base', delimiter = '\t')
training_set = np.array(training_set, dtype='int')
test_set = pd.read_csv('ml-100k/u1.test', delimiter = '\t')
test_set = np.array(test_set, dtype = 'int64')

nb_users = int(max(max(training_set[:, 0]), max(test_set[:, 0])))
nb_movies = int(max(max(training_set[:, 1]), max(test_set[:, 1])))

#The changes I made start here --

nb_timestamps = int(max(len(training_set[:, 3]), len(test_set[:, 3])))

ts_min = int(min(min(training_set[:, 3]), min(test_set[:, 3])))
ts_max = int(max(max(training_set[:, 3]), max(test_set[:, 3])))


def convert(data):
    new_data = [] #final list that we will return
    for timestamp in range(ts_min, ts_max+1):
        for id_users in range(1, nb_users+1):
            id_movies = data[:, 1][data[:, 0] == id_users][data[:, 3] == timestamp]
            #contains the IDs of the movies rated by the id_user
            id_ratings = data[:, 2][data[:, 0] == id_users][data[:, 3] == timestamp]
            ratings = np.zeros(nb_movies)
            ratings[id_movies-1] = id_ratings
            new_data.append(list(ratings))
    return (new_data)
training_set = convert(training_set)
test_set = convert(test_set)
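
One likely source of the errors in the attempt above is the chained boolean indexing. The first mask shortens the column, but the second mask still has one entry per row of the full array, so the lengths no longer match. The sketch below reproduces this on a small made-up array (data, user_mask and time_mask are illustrative names) and shows one possible fix, which is to combine both masks with & before indexing.

```python
import numpy as np

# Made-up rows standing in for training_set: user, movie, rating, timestamp
data = np.array([
    [1, 1, 5, 100],
    [1, 2, 3, 100],
    [1, 3, 4, 200],
    [2, 1, 2, 100],
])

user_mask = data[:, 0] == 1    # one entry per row of data
time_mask = data[:, 3] == 100  # one entry per row of data

# Chained indexing: after user_mask only 3 values remain,
# but time_mask still has 4 entries, so NumPy raises IndexError
try:
    data[:, 1][user_mask][time_mask]
    chained_failed = False
except IndexError:
    chained_failed = True

# Combining the masks first keeps the lengths consistent
both = user_mask & time_mask
movies = data[:, 1][both]
ratings = data[:, 2][both]

print(chained_failed, movies, ratings)
```

Separately, range(ts_min, ts_max+1) visits every integer between the smallest and largest timestamp. If the timestamps are Unix seconds, that outer loop would be very long even once the indexing works.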

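Best answer

Note: please do not treat this as an answer.

A few things in your code could be improved:

  • When you read the csv, the first row is taken as the header, which means you are not considering all of the data.
  • If, as should be the case here, a user can rate a given movie only once, you can use pd.pivot_table to get the 2D matrix.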
import pandas as pd
import numpy as np
training_set = pd.read_csv('ml-100k/u1.base',
                           delimiter='\t',
                           header=None,  # First row is not header
                           names=["user", "movie",
                                  "rating", "timestamp"])  # rename headers

# with pd.pivot_table you get a df where user are in rows
# and movies in columns. The value is the rating for movie (i,j)
ratings = pd.pivot_table(training_set,
                         index=["user"],
                         columns=["movie"],
                         values="rating")
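
To illustrate the header point, here is a minimal sketch that reads the same tab-separated text twice from an in-memory string. The three sample rows are made up for illustration and are not taken from u1.base.

```python
import io
import pandas as pd

# Made-up tab-separated rows: user, movie, rating, timestamp
sample = "1\t1\t5\t100\n1\t2\t3\t200\n2\t1\t4\t300\n"

# Default behaviour: the first data row is consumed as column names
with_default = pd.read_csv(io.StringIO(sample), delimiter='\t')

# header=None keeps every row as data; names supplies the column labels
with_names = pd.read_csv(io.StringIO(sample),
                         delimiter='\t',
                         header=None,
                         names=["user", "movie", "rating", "timestamp"])

print(len(with_default), len(with_names))
```

With the default settings one row goes missing, which is the data loss described in the first bullet.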

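If you want 0s instead of NaN, you can use ratings.fillna(0). I would not do this, though. Be careful, because it will distort the final statistics you want to extract.

If you need the 2D matrix, you can use ratings.values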
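
The caution about fillna(0) can be seen on a tiny made-up table. In this sketch (toy, mean_with_nan and mean_with_zeros are illustrative names), movie 2 has a single rating of 3, and replacing the missing entry with 0 halves its average.

```python
import pandas as pd

# Made-up ratings: user 2 has not rated movie 2
toy = pd.DataFrame({
    "user": [1, 1, 2],
    "movie": [1, 2, 1],
    "rating": [5, 3, 4],
})
ratings = pd.pivot_table(toy, index=["user"], columns=["movie"], values="rating")

# NaN entries are skipped, so only real ratings count
mean_with_nan = ratings.mean()
# After fillna(0) the missing entry counts as a rating of 0
mean_with_zeros = ratings.fillna(0).mean()

print(mean_with_nan.loc[2], mean_with_zeros.loc[2])
```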

Update

To get a 3D matrix, we can do the same pivot with the timestamp

timestamps = pd.pivot_table(training_set,
                            index=["user"],
                            columns=["movie"],
                            values="timestamp")

# get matrix
mat_ratings = ratings.values
mat_timestamps = timestamps.values

# stack matrix
mat3d = np.dstack((mat_ratings, mat_timestamps))

You can now check that, from 2 matrices of shape (943, 1650), we get a single matrix of shape (943, 1650, 2). Note that to get the shape of a matrix mat, you just run mat.shape
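
As a small sketch of how the stacked result might be read, the example below repeats the two pivots on a made-up table and looks up one cell. Note that the last axis has length 2 and holds (rating, timestamp) for each user and movie; it is not an axis with one slot per timestamp. The names toy, row and col are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Made-up rows: user 2 has not rated movie 2
toy = pd.DataFrame({
    "user": [1, 1, 2],
    "movie": [1, 2, 1],
    "rating": [5, 3, 4],
    "timestamp": [100, 200, 300],
})
ratings = pd.pivot_table(toy, index=["user"], columns=["movie"], values="rating")
timestamps = pd.pivot_table(toy, index=["user"], columns=["movie"], values="timestamp")
mat3d = np.dstack((ratings.values, timestamps.values))

# Positions follow the pivot table's index and columns, not the raw IDs,
# so translate user 1 and movie 2 into row and column positions first
row = ratings.index.get_loc(1)
col = ratings.columns.get_loc(2)
rating, timestamp = mat3d[row, col]

print(mat3d.shape)
print(rating, timestamp)
```

Cells for pairs that were never rated hold NaN in both slots.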

Regarding "python - Trying to create a 3D matrix in Python by selecting specific data", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59932341/
