
r - Fastest way to check for unique values, and return the value when there is only one, in an R data.table

Reposted. Author: 行者123. Updated: 2023-12-03 16:15:31

Suppose I have a large data.table that looks like dt below.

dt <- data.table(
player_1 = c("a", "b", "b", "c"),
player_1_age = c(10, 20, 20, 30),
player_2 = c("b", "a", "c", "a"),
player_2_age = c(20, 10, 30, 10)
)
# dt
# player_1 player_1_age player_2 player_2_age
# 1: a 10 b 20
# 2: b 20 a 10
# 3: b 20 c 30
# 4: c 30 a 10

From dt above, I want to create a data.table of the unique players and their ages, like player_dt below:
# player_dt
# player age
# a 10
# b 20
# c 30

To do this, I tried the code below, but it takes too long on my larger dataset, probably because I am creating a data.table on every iteration of sapply.

How would you get player_dt above, while also checking that each player has only one unique age?
# get unique players
player <- sort(unique(c(dt$player_1, dt$player_2)))

# for each player, get their age, if there is only one age value
age <- sapply(player, function(x) {
  unique_values <- unique(c(
    dt[player_1 == x][["player_1_age"]],
    dt[player_2 == x][["player_2_age"]]))
  if (length(unique_values) > 1) stop() else return(unique_values)
})

# combine to create the player_dt
player_dt <- data.table(player, age)

Best answer

I use the data from @DavidT as input. Note that player a appears with two different ages (10 and 11).

dt
# player_1 player_1_age player_2 player_2_age
#1: a 10 b 20
#2: b 20 a 10
#3: b 20 c 30
#4: c 30 a 11 # <--

TL;DR

You can do
nm <- names(dt)
idx <- endsWith(nm, "age")
colsAge <- nm[idx]
colsOther <- nm[!idx]

out <- unique(
  melt(
    dt,
    measure.vars = list(colsAge, colsOther),
    value.name = c("age", "player")
  )[, .(age, player)]
)[, if (.N == 1) .SD, by = player] # credit: https://stackoverflow.com/a/34427944/8583393
out
# player age
#1: b 20
#2: c 30

Step by step

What you can do is melt multiple columns simultaneously: those that end with "age" and those that don't.
nm <- names(dt)
idx <- endsWith(nm, "age")
colsAge <- nm[idx]
colsOther <- nm[!idx]
dt1 <- melt(dt, measure.vars = list(colsAge, colsOther), value.name = c("age", "player"))

The result is
dt1
# variable age player
#1: 1 10 a
#2: 1 20 b
#3: 1 20 b
#4: 1 30 c
#5: 2 20 b
#6: 2 10 a
#7: 2 30 c
#8: 2 11 a
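
As a side note, the same parallel melt can probably be written with data.table's patterns() helper instead of building the column vectors by hand. This is a minimal sketch, not part of the original answer; the two regular expressions and the name dt1_alt are assumptions chosen to fit the column names in this example.

```r
library(data.table)

dt <- data.table(
  player_1     = c("a", "b", "b", "c"),
  player_1_age = c(10, 20, 20, 30),
  player_2     = c("b", "a", "c", "a"),
  player_2_age = c(20, 10, 30, 11)
)

# Each pattern selects one group of columns; the groups are melted in parallel.
# First pattern: the age columns. Second pattern: the player name columns.
dt1_alt <- melt(
  dt,
  measure.vars = patterns("_age$", "^player_[0-9]+$"),
  value.name   = c("age", "player")
)
dt1_alt
```

With these column names, dt1_alt should hold the same age and player columns as dt1.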

Now we call unique ...
out <- unique(dt1[, .(age, player)])
out
# age player
#1: 10 a
#2: 20 b
#3: 30 c
#4: 11 a

... and keep only the "player" groups whose length equals 1
out <- out[, if(.N == 1) .SD, by=player]
out
# player age
#1: b 20
#2: c 30
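
The filter drops player a silently. If you would rather see which players have conflicting ages, one option is to count the rows per player instead. This is a small sketch under the assumption that the input is the table of unique (age, player) pairs from the previous step, rebuilt here by hand; the names pairs, conflicts and n_ages are hypothetical.

```r
library(data.table)

# Unique (age, player) pairs, as produced by the unique() step
pairs <- data.table(
  age    = c(10, 20, 30, 11),
  player = c("a", "b", "c", "a")
)

# Count the distinct ages per player and keep players with more than one
conflicts <- pairs[, .(n_ages = .N), by = player][n_ages > 1]
conflicts
```

Here conflicts contains a single row for player a, who is listed with two ages.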

Given OP's input data, the last step is not needed, because every player there has exactly one age.
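
For the question as asked (build player_dt, but fail if any player has more than one age), the melt and unique steps could be combined with an explicit check. This is only a sketch on OP's original data; the error message and the intermediate name long are assumptions, not something from the original answer.

```r
library(data.table)

# OP's original data: every player has a single age
dt <- data.table(
  player_1     = c("a", "b", "b", "c"),
  player_1_age = c(10, 20, 20, 30),
  player_2     = c("b", "a", "c", "a"),
  player_2_age = c(20, 10, 30, 10)
)

# Melt age and player columns in parallel, then keep unique (player, age) pairs
long <- unique(
  melt(
    dt,
    measure.vars = list(c("player_1_age", "player_2_age"),
                        c("player_1", "player_2")),
    value.name = c("age", "player")
  )[, .(player, age)]
)

# After unique(), a repeated player means that player has more than one age
if (anyDuplicated(long$player) > 0L) {
  stop("at least one player has more than one unique age")
}

player_dt <- long[order(player)]
player_dt
```

With the @DavidT data (player_2_age ending in 11) the same code would stop with the error instead of returning a table.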

Data
library(data.table)
dt <- data.table(
player_1 = c("a", "b", "b", "c"),
player_1_age = c(10, 20, 20, 30),
player_2 = c("b", "a", "c", "a"),
player_2_age = c(20, 10, 30, 11)
)

Reference: https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reshape.html

Original question on Stack Overflow: https://stackoverflow.com/questions/61563812/
