
r - Check that every name in df1 appears for each ID in df2

Reposted. Author: 行者123. Updated: 2023-12-04 10:28:12

I have two data frames like this:

df1

name <- c("Ted","Bill","James","Randy","Mark","Jimmy","Eric","Allen")
team <- c("Hawks","Tigers","Bears","Tigers","Lions","Bears","Hawks","Lions")
df1 <- data.frame(name,team)

df2

name <- c("Ted","Bill","Mark","Jimmy","Eric","James","Allen","Randy","Bill","James","Mark")
team <- c("Hawks","Tigers","Lions","Bears","Hawks","Bears","Lions","Tigers","Tigers","Bears","Lions")
game_id <- c("21","23","28","21","21","21","29","22","22","32","42")
df2 <- data.frame(name,team,game_id)

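I want to set a game_id in df2 to NA if that game_id does not include every name that df1 lists for the respective team. For example, in the sample data above, game_id 32 in the row containing "James" and "Bears" should become NA, because "Jimmy" does not appear for game_id 32 in df2. We know Jimmy must appear, because df1 has a row for him with "Bears" as his team.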
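The rule can be checked by hand for a single game. This minimal base-R sketch assumes plain character columns (stringsAsFactors = FALSE) and lists the Bears players that df1 expects but that are absent from game 32 in df2:

```r
# Sample data from the question, kept as character columns
df1 <- data.frame(
  name = c("Ted", "Bill", "James", "Randy", "Mark", "Jimmy", "Eric", "Allen"),
  team = c("Hawks", "Tigers", "Bears", "Tigers", "Lions", "Bears", "Hawks", "Lions"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("Ted", "Bill", "Mark", "Jimmy", "Eric", "James", "Allen", "Randy", "Bill", "James", "Mark"),
  team = c("Hawks", "Tigers", "Lions", "Bears", "Hawks", "Bears", "Lions", "Tigers", "Tigers", "Bears", "Lions"),
  game_id = c("21", "23", "28", "21", "21", "21", "29", "22", "22", "32", "42"),
  stringsAsFactors = FALSE
)

# Everyone df1 assigns to the Bears
expected <- df1$name[df1$team == "Bears"]
# Bears players actually recorded for game 32
present <- df2$name[df2$game_id == "32" & df2$team == "Bears"]

setdiff(expected, present)  # "Jimmy" is missing, so game 32 (Bears) should become NA
```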

The desired output for the sample data looks like this:

df3

name <- c("Ted","Bill","Mark","Jimmy","Eric","James","Allen","Randy","Bill","James","Mark")
team <- c("Hawks","Tigers","Lions","Bears","Hawks","Bears","Lions","Tigers","Tigers","Bears","Lions")
game_id <- c("21",NA,NA,"21","21","21",NA,"22","22",NA,NA)
df3 <- data.frame(name,team,game_id)

I think the first step is to spread df1 (after adding a unique ID column), like this:

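library(tidyr)  # spread() comes from tidyr
df1_wide <- df1  # work on a copy so df1 stays in long format
df1_wide$row_index <- seq.int(nrow(df1_wide))  # unique id so spread() keeps one row per player
df1_wide <- spread(df1_wide, team, name)       # one column per team, player names as values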
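If the aim of the reshape is only to know each team's full roster, a long-to-wide step may not be needed. A hedged base-R alternative (not the approach used in the answer below; roster and complete are illustrative names) keeps the roster as a named list and tests every df2 row against it:

```r
df1 <- data.frame(
  name = c("Ted", "Bill", "James", "Randy", "Mark", "Jimmy", "Eric", "Allen"),
  team = c("Hawks", "Tigers", "Bears", "Tigers", "Lions", "Bears", "Hawks", "Lions"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("Ted", "Bill", "Mark", "Jimmy", "Eric", "James", "Allen", "Randy", "Bill", "James", "Mark"),
  team = c("Hawks", "Tigers", "Lions", "Bears", "Hawks", "Bears", "Lions", "Tigers", "Tigers", "Bears", "Lions"),
  game_id = c("21", "23", "28", "21", "21", "21", "29", "22", "22", "32", "42"),
  stringsAsFactors = FALSE
)

# Named list: team -> every player df1 assigns to that team
roster <- split(df1$name, df1$team)

# For each df2 row: does that game contain the full roster of the row's team?
# (a team missing from df1 gives roster[[t]] = NULL, and all() of an empty
#  vector is TRUE, so such teams are treated as complete)
complete <- mapply(
  function(g, t) all(roster[[t]] %in% df2$name[df2$game_id == g & df2$team == t]),
  df2$game_id, df2$team,
  USE.NAMES = FALSE
)

df3 <- df2
df3$game_id[!complete] <- NA
df3
```

This rescans df2 once per row, which is fine for small data; the join-based answers scale better on large tables.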

But after that I'm stuck. What is the best way to do this?

Best answer

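You should be able to do this with an anti-join against all the correct team/name combinations: build every (name, team, game_id) row that should exist, anti-join it against df2 to find the missing players, then set game_id to NA for every (game_id, team) pair that has a missing player: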

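library(dplyr)

# Every (name, team, game_id) that should exist: each df1 player crossed with
# every game their team played (dplyr >= 1.1.0 may warn about a many-to-many join here)
badgames <- df1 %>%
  full_join(distinct(select(df2, game_id, team)), by = "team") %>%
  # keep only the expected rows that are missing from df2
  anti_join(df2, by = c("team", "game_id", "name")) %>%
  # one row per incomplete (game_id, team) pair, so the left_join cannot duplicate df2 rows
  distinct(game_id, team) %>%
  mutate(hit = 1)

df2 %>%
  left_join(badgames, by = c("game_id", "team")) %>%
  mutate(game_id = replace(game_id, hit == 1, NA), hit = NULL)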
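For readers without dplyr, the same anti-join logic can be sketched in base R. This is an illustrative translation, not part of the original answer; it assumes character columns and that a tab character never occurs inside a name, team, or game_id, because the hypothetical helper key() pastes columns together to compare rows:

```r
df1 <- data.frame(
  name = c("Ted", "Bill", "James", "Randy", "Mark", "Jimmy", "Eric", "Allen"),
  team = c("Hawks", "Tigers", "Bears", "Tigers", "Lions", "Bears", "Hawks", "Lions"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("Ted", "Bill", "Mark", "Jimmy", "Eric", "James", "Allen", "Randy", "Bill", "James", "Mark"),
  team = c("Hawks", "Tigers", "Lions", "Bears", "Hawks", "Bears", "Lions", "Tigers", "Tigers", "Bears", "Lions"),
  game_id = c("21", "23", "28", "21", "21", "21", "29", "22", "22", "32", "42"),
  stringsAsFactors = FALSE
)

# Every (team, name, game_id) that should exist: each roster crossed with its team's games
expected <- merge(df1, unique(df2[c("game_id", "team")]), by = "team")

# Hypothetical helper: one string key per row built from the given columns
key <- function(d, cols) do.call(paste, c(d[cols], sep = "\t"))

# Anti-join: expected rows with no exact match in df2
full_key <- c("game_id", "team", "name")
missing <- expected[!key(expected, full_key) %in% key(df2, full_key), ]

# Set game_id to NA wherever its (game_id, team) pair has a missing player
bad <- key(df2, c("game_id", "team")) %in% key(missing, c("game_id", "team"))
df3 <- df2
df3$game_id[bad] <- NA
df3
```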

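The same logic works with data.table joins, where you specify an anti-join by putting ! in front of the joining table. You can also use := to do the whole update in one step instead of creating an intermediate dataset: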

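library(data.table)
setDT(df1)
setDT(df2)
# Inner part: expected (name, team, game_id) rows that are missing from df2 (! = anti-join);
# outer part: update join that sets game_id to NA for each affected (game_id, team) pair
df2[
  df1[unique(df2[, .(game_id, team)]), on = .(team), allow.cartesian = TRUE][
    !df2, on = .(game_id, team, name)],
  on = .(game_id, team),
  game_id := NA_character_
]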
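Unrolled into named steps (expected and missing are illustrative names, not from the answer), the nested data.table expression reads as follows. This is a sketch assuming the data.table package is installed; allow.cartesian = TRUE is passed defensively because each team matches several players:

```r
library(data.table)

df1 <- data.table(
  name = c("Ted", "Bill", "James", "Randy", "Mark", "Jimmy", "Eric", "Allen"),
  team = c("Hawks", "Tigers", "Bears", "Tigers", "Lions", "Bears", "Hawks", "Lions")
)
df2 <- data.table(
  name = c("Ted", "Bill", "Mark", "Jimmy", "Eric", "James", "Allen", "Randy", "Bill", "James", "Mark"),
  team = c("Hawks", "Tigers", "Lions", "Bears", "Hawks", "Bears", "Lions", "Tigers", "Tigers", "Bears", "Lions"),
  game_id = c("21", "23", "28", "21", "21", "21", "29", "22", "22", "32", "42")
)

# Step 1: every (name, team, game_id) combination that should appear
expected <- df1[unique(df2[, .(game_id, team)]), on = .(team), allow.cartesian = TRUE]

# Step 2: anti-join (!) keeps the expected rows that have no match in df2
missing <- expected[!df2, on = .(game_id, team, name)]

# Step 3: update join sets game_id to NA (by reference) for each incomplete (game_id, team)
df2[missing, on = .(game_id, team), game_id := NA_character_]
df2
```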

Both approaches give:

#     name   team game_id
# 1    Ted  Hawks      21
# 2   Bill Tigers    <NA>
# 3   Mark  Lions    <NA>
# 4  Jimmy  Bears      21
# 5   Eric  Hawks      21
# 6  James  Bears      21
# 7  Allen  Lions    <NA>
# 8  Randy Tigers      22
# 9   Bill Tigers      22
# 10 James  Bears    <NA>
# 11  Mark  Lions    <NA>

Regarding "r - Check that every name in df1 appears for each ID in df2", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49764056/
