
r - Restructuring team-level data into individual-level data in R (while keeping the team-level information)

Reposted. Author: 行者123. Last updated: 2023-12-04 11:58:00

My current data looks like this:

Person  Team
10      100
11      100
12      100
10      200
11      200
14      200
15      200
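The answer further down refers to this data as `d` without defining it. A minimal sketch for rebuilding the sample, assuming a plain data.frame with numeric columns (the actual column types are not stated in the question):

```r
# Rebuild the sample data; the name `d` matches the one used in the answer's code.
d <- data.frame(
  Person = c(10, 11, 12, 10, 11, 14, 15),
  Team   = c(100, 100, 100, 200, 200, 200, 200)
)

# Team sizes: team 100 has 3 members, team 200 has 4.
table(d$Team)
```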

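I want to infer who knows whom based on the teams they have been on together. I also want to count how many times each pair (dyad) has been on a team together, and I want to keep track of the team IDs that link each pair. In other words, I want to create a dataset that looks like this: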
Person1 Person2 Count   Team1   Team2   Team3
10      11      2       100     200     NA
10      12      1       100     NA      NA
11      12      1       100     NA      NA
10      14      1       200     NA      NA
10      15      1       200     NA      NA
11      14      1       200     NA      NA
11      15      1       200     NA      NA

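The resulting dataset captures the relationships that can be inferred from the teams outlined in the original dataset. The "Count" variable reflects the number of times a pair of people were on a team together. The "Team1", "Team2", and "Team3" variables list the IDs of the teams that link each pair. It makes no difference which person or team ID is listed first and which second. Team sizes range from 2 to 8 members.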
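To make the "Count" definition concrete, here is a rough base R sketch that lists every pair within each team and then counts how many teams each pair shares. The names `pairs_by_team` and `pairs` are made up for illustration, and `sort()` is an added assumption so that a pair is always stored with the lower ID first. Note that the enumeration also produces the pair 14 and 15 on team 200, which the hand-written example above does not list.

```r
d <- data.frame(
  Person = c(10, 11, 12, 10, 11, 14, 15),
  Team   = c(100, 100, 100, 200, 200, 200, 200)
)

# For each team, enumerate every pair of members (one pair per row).
pairs_by_team <- lapply(split(d$Person, d$Team),
                        function(p) t(combn(sort(p), 2)))

# Stack the per-team pair matrices and attach the team ID to each row.
pairs <- data.frame(
  Team = rep(as.numeric(names(pairs_by_team)),
             times = vapply(pairs_by_team, nrow, integer(1))),
  do.call(rbind, pairs_by_team)
)
names(pairs) <- c("Team", "Person1", "Person2")

# Count = number of teams on which the same pair appears.
pairs$Count <- ave(pairs$Team, pairs$Person1, pairs$Person2, FUN = length)
pairs
```

This is still in long form (one row per pair per team); spreading the team IDs into Team1, Team2, ... columns is the reshaping step the answer handles.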

Best answer

Here is a "data.table" solution that seems to get you where you want to go (though it is quite a mouthful of code):

library(data.table)
## `d` is the Person/Team data shown in the question
dcast.data.table(
  dcast.data.table(
    as.data.table(d)[, combn(Person, 2), by = Team][
      , ind := paste0("Person", c(1, 2))][
      , time := sequence(.N), by = list(Team, ind)],
    time + Team ~ ind, value.var = "V1")[
    , c("count", "time") := list(.N, sequence(.N)), by = list(Person1, Person2)],
  Person1 + Person2 + count ~ time, value.var = "Team")
#    Person1 Person2 count   1   2
# 1:      10      11     2 100 200
# 2:      10      12     1 100  NA
# 3:      10      14     1 200  NA
# 4:      10      15     1 200  NA
# 5:      11      12     1 100  NA
# 6:      11      14     1 200  NA
# 7:      11      15     1 200  NA
# 8:      14      15     1 200  NA
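The result above labels the team columns "1" and "2", whereas the question asked for "Team1", "Team2", and so on. The following is a hedged variation, not the answerer's code: it uses the `dcast()` generic from data.table, builds the column labels with `paste0()`, and sorts each team's members before pairing so the same two people always land in the same Person1/Person2 order. The helper column name `slot` is made up for illustration.

```r
library(data.table)

d <- data.table(
  Person = c(10, 11, 12, 10, 11, 14, 15),
  Team   = c(100, 100, 100, 200, 200, 200, 200)
)

# One row per (team, pair); sort() so a pair is always stored low ID first.
pairs <- d[, {
  p <- combn(sort(Person), 2)
  list(Person1 = p[1, ], Person2 = p[2, ])
}, by = Team]

# Per pair: how many teams they share, and a column label for each shared team.
pairs[, c("Count", "slot") := list(.N, paste0("Team", seq_len(.N))),
      by = list(Person1, Person2)]

# Spread the team IDs into Team1, Team2, ... columns.
wide <- dcast(pairs, Person1 + Person2 + Count ~ slot, value.var = "Team")
wide
```

Only as many Team columns are created as the most-connected pair needs, so this sample yields Team1 and Team2 but no Team3.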

Update: a step-by-step version of the above

To understand what is going on above, here is the same approach broken down step by step:
## The following would be a long data.table with 4 columns:
## Team, V1, ind, and time
step1 <- as.data.table(d)[
  , combn(Person, 2), by = Team][
  , ind := paste0("Person", c(1, 2))][
  , time := sequence(.N), by = list(Team, ind)]
head(step1)
#    Team V1     ind time
# 1:  100 10 Person1    1
# 2:  100 11 Person2    1
# 3:  100 10 Person1    2
# 4:  100 12 Person2    2
# 5:  100 11 Person1    3
# 6:  100 12 Person2    3

## Here, we make the data "wide"
step2 <- dcast.data.table(step1, time + Team ~ ind, value.var = "V1")
step2
#    time Team Person1 Person2
# 1:    1  100      10      11
# 2:    1  200      10      11
# 3:    2  100      10      12
# 4:    2  200      10      14
# 5:    3  100      11      12
# 6:    3  200      10      15
# 7:    4  200      11      14
# 8:    5  200      11      15
# 9:    6  200      14      15

## Create a "count" column and a "time" column,
## grouped by "Person1" and "Person2".
## Count is for the count column.
## Time is for going to a wide format
step3 <- step2[, c("count", "time") := list(.N, sequence(.N)),
               by = list(Person1, Person2)]
step3
#    time Team Person1 Person2 count
# 1:    1  100      10      11     2
# 2:    2  200      10      11     2
# 3:    1  100      10      12     1
# 4:    1  200      10      14     1
# 5:    1  100      11      12     1
# 6:    1  200      10      15     1
# 7:    1  200      11      14     1
# 8:    1  200      11      15     1
# 9:    1  200      14      15     1

## The final step of going wide
out <- dcast.data.table(step3, Person1 + Person2 + count ~ time,
                        value.var = "Team")
out
#    Person1 Person2 count   1   2
# 1:      10      11     2 100 200
# 2:      10      12     1 100  NA
# 3:      10      14     1 200  NA
# 4:      10      15     1 200  NA
# 5:      11      12     1 100  NA
# 6:      11      14     1 200  NA
# 7:      11      15     1 200  NA
# 8:      14      15     1 200  NA
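One detail of the third step is worth knowing when running these steps interactively: `:=` modifies a data.table by reference, so after that step `step2` itself also carries the new "count" column and the overwritten "time" column. If the intermediate table should stay untouched, one option is to call `copy()` first. A small sketch, using a hypothetical three-row stand-in for `step2` rather than the real table:

```r
library(data.table)

# Hypothetical three-row stand-in for `step2`.
step2 <- data.table(time = c(1L, 1L, 2L), Team = c(100, 200, 100),
                    Person1 = c(10, 10, 10), Person2 = c(11, 11, 12))

# copy() first, so `:=` adds "count" and rewrites "time" on the copy only.
step3 <- copy(step2)[, c("count", "time") := list(.N, sequence(.N)),
                     by = list(Person1, Person2)]

names(step2)  # no "count" column here
names(step3)  # includes "count"
```

For the one-shot pipeline this makes no difference to the final result; it only matters if `step2` is inspected or reused afterwards.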

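Regarding "r - Restructuring team-level data into individual-level data in R (while keeping the team-level information)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27809180/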
