r - Complex data.table operations in R

Reposted. Author: 行者123. Updated: 2023-12-04 11:33:39

Suppose I have a data.table of people and the movies they have watched, for example:

library(data.table)
DT = fread("
User, Movie
Alice , Fight Club
Alice, The Godfather
Bob, Titanic
Charlotte, The Godfather")

For each pair of movies, I want to count how many people watched both and how many watched at least one of them, i.e.
Movie1         Movie2         WatchedOne  WatchedBoth
Fight Club     The Godfather  2           1
The Godfather  Titanic        3           0
Fight Club     Titanic        2           0

I have millions of rows, so I need an extremely fast data.table solution :-)

Thanks for your help!

Best answer

Another approach:

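# Collapse to one row per movie, holding the list of users who watched it
DT = DT[, .(Users = list(User)), keyby='Movie']

# Build every unordered pair of movies
Y = data.table(t(combn(DT$Movie, 2)))
setnames(Y, c('Movie1','Movie2'))

# Attach each movie's user list to the pair table (update joins)
Y[DT, on=.(Movie1==Movie), Movie1.Users:= Users]
Y[DT, on=.(Movie2==Movie), Movie2.Users:= Users]

# WatchedOne could also be computed directly with union:
#Y[, WatchedOne:= lengths(Map(union, Movie1.Users, Movie2.Users))]
Y[, WatchedBoth:= lengths(Map(intersect, Movie1.Users, Movie2.Users))]
# better: inclusion-exclusion, |A or B| = |A| + |B| - |A and B|
# (assumes no duplicate User/Movie rows in the input)
Y[, WatchedOne:= lengths(Movie1.Users) + lengths(Movie2.Users) - WatchedBoth]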

> Y[, -(3:4)]
#           Movie1        Movie2 WatchedBoth WatchedOne
# 1:    Fight Club The Godfather           1          2
# 2:    Fight Club       Titanic           0          2
# 3: The Godfather       Titanic           0          3
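
With millions of rows, enumerating every movie pair with combn may become the bottleneck, since most pairs will probably share no viewers. The following is only a sketch of a different technique, not part of the answer above: a self-join on User that produces only the pairs with at least one common viewer, combined with the same inclusion-exclusion step. The names n_by_movie, pairs and both are illustrative, and DT is rebuilt here because the answer overwrote it.

```r
library(data.table)

# Sample data from the question, rebuilt (the answer above overwrote DT)
DT <- data.table(
  User  = c("Alice", "Alice", "Bob", "Charlotte"),
  Movie = c("Fight Club", "The Godfather", "Titanic", "The Godfather")
)

# Number of distinct viewers per movie
n_by_movie <- DT[, .(N = uniqueN(User)), by = Movie]

# Self-join on User: one row per (user, movie, movie) combination
pairs <- merge(DT, DT, by = "User", allow.cartesian = TRUE)

# Keep each unordered pair once and drop a movie paired with itself
pairs <- pairs[Movie.x < Movie.y]

# Count the users shared by each pair
both <- pairs[, .(WatchedBoth = uniqueN(User)),
              by = .(Movie1 = Movie.x, Movie2 = Movie.y)]

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
both[n_by_movie, on = .(Movie1 = Movie), N1 := i.N]
both[n_by_movie, on = .(Movie2 = Movie), N2 := i.N]
both[, WatchedOne := N1 + N2 - WatchedBoth]

print(both[, .(Movie1, Movie2, WatchedOne, WatchedBoth)])
```

Pairs with no common viewer do not appear in both. For those, WatchedBoth is 0 and WatchedOne is simply the sum of the two per-movie counts, so they can be filled in from n_by_movie if the full pair table is really needed. Whether this is faster than the combn approach depends on how many movies each user has watched, so it is worth benchmarking on real data.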

Regarding "r - Complex data.table operations in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46057401/
