r - Compare character rows of a df, treating NA as matching anything, and create a new column or df based on the comparison

Reposted. Author: 行者123. Updated: 2023-12-04 13:06:14

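I have a very large data frame containing character values. I want to compare the rows with each other and create IDs based on the comparison. The problem is that the df contains NAs, and I want them to be evaluated as matching any value. A further problem is that the IDs also need to be created in the same step (or perhaps I am thinking about this in an overly complicated way).

Here is a toy df I created: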

library(tidyverse)
library(purrr)

# make toy df
Set1 <- c("A", "B", "C","A")
Set2 <- c("A", "D", "B", "A")
Set3 <- c(NA, "B", "C", "A")
Set4 <- c("A", "G", "B", "A")
Set5 <- c("F", "G", NA, "F")
Set6 <- c("A", "B", "C", "C")
sets <- rbind(Set1, Set2, Set3, Set4, Set5, Set6)
colnames(sets) <- c("Var1", "Var2", "Var3", "Var4")
sets

     Var1 Var2 Var3 Var4
Set1 "A"  "B"  "C"  "A"
Set2 "A"  "D"  "B"  "A"
Set3 NA   "B"  "C"  "A"
Set4 "A"  "D"  "B"  "A"
Set5 "F"  "G"  NA   "F"
Set6 "A"  "B"  "C"  "C"

Here is the desired output, either as a separate df or as a new column; both are equally fine:

# as new column
     Var1 Var2 Var3 Var4 COMP
Set1 "A"  "B"  "C"  "A"  "Group1"
Set2 "A"  "D"  "B"  "A"  "Group2"
Set3 NA   "B"  "C"  "A"  "Group1"
Set4 "A"  "D"  "B"  "A"  "Group3"
Set5 "F"  "G"  NA   "F"  "Group4"
Set6 "A"  "B"  "C"  "C"  "Group5"

# as new df
     COMP
Set1 "Group1"
Set2 "Group2"
Set3 "Group1"
Set4 "Group3"
Set5 "Group4"
Set6 "Group5"

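I think this could be done with rowwise() and map, but even after reading similar questions I cannot figure out how to achieve it, especially how to name the new groups consistently and consecutively. Any ideas would be much appreciated.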

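Best answer

You could replace NA with "." (the regex wildcard for any single character), paste each row into a string, and use grepl() for pattern matching. The outputs shown in this answer correspond to the matrix as printed in the question, where Set4 is "A" "D" "B" "A" and therefore identical to Set2; with Set4 as defined in the question's code (second element "G"), rows 2 and 4 would not match each other.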

library(magrittr)

sets <- as.data.frame(sets)

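sets %>%
  replace(is.na(sets), ".") %>%                        # NA becomes the regex wildcard
  do.call(paste0, .) %>%                               # one string per row
  outer(., ., function(x, y) mapply(grepl, x, y)) %>%  # [i, j]: does row i, as a pattern, match row j?
  t() %>%                                              # now row i holds the patterns that match row i
  max.col(ties.method = "last") %>%                    # index of the last matching pattern per row
  match(unique(.))                                     # renumber consecutively: 1, 2, ...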

[1] 1 2 1 2 3 4
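
To get the labelled output the question asks for, the integer IDs can be pasted onto a prefix and stored as a new column. The following is a minimal base-R sketch of that step, not part of the original answer. It assumes Set4 as defined in the question's code (second element "G"), which differs from the printed matrix, and the names `keys`, `hits`, `last_hit`, and `ids` are purely illustrative.

```r
# Toy data, using Set4 as defined in the question's code (assumption: "G", not "D").
sets <- as.data.frame(rbind(
  Set1 = c("A", "B", "C", "A"),
  Set2 = c("A", "D", "B", "A"),
  Set3 = c(NA,  "B", "C", "A"),
  Set4 = c("A", "G", "B", "A"),
  Set5 = c("F", "G", NA,  "F"),
  Set6 = c("A", "B", "C", "C")
), stringsAsFactors = FALSE)
colnames(sets) <- paste0("Var", 1:4)

# One string per row, with NA turned into the regex wildcard ".".
keys <- do.call(paste0, replace(sets, is.na(sets), "."))

# hits[i, j] is TRUE when row j, used as a pattern, matches row i.
hits <- t(outer(keys, keys, function(x, y) mapply(grepl, x, y)))

# Same ID logic as in the answer: take the last matching pattern per row,
# then renumber the distinct values consecutively in order of appearance.
last_hit <- max.col(hits, ties.method = "last")
ids <- match(last_hit, unique(last_hit))

# Label the IDs and attach them as a new column.
sets$COMP <- paste0("Group", ids)
sets$COMP
# "Group1" "Group2" "Group1" "Group3" "Group4" "Group5"
```

With this input the labels agree with the desired output in the question. If only the labels are needed as a separate df, `sets["COMP"]` keeps the row names.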

However, treating NA as a wildcard means that a row may match several other rows, so it might be safer to do this:

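# Change Row 6 so both Row 6 and Row 1 match Row 3
Set6 <- c("B", "B", "C", "A")
# Rebuild the data frame so that the change takes effect
sets <- as.data.frame(rbind(Set1, Set2, Set3, Set4, Set5, Set6))

sets %>%
  replace(is.na(sets), ".") %>%
  do.call(paste0, .) %>%
  outer(., ., function(x, y) mapply(grepl, x, y)) %>%
  apply(2, which)   # for each row, the indices of the rows whose pattern matches it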

[[1]]
[1] 1 3

[[2]]
[1] 2 4

[[3]]
[1] 3

[[4]]
[1] 2 4

[[5]]
[1] 5

[[6]]
[1] 3 6

This shows, for each row, which rows match it (including the row itself).
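
The string-and-regex approach relies on every value being a single character with no regex metacharacters, since pasting multi-character values can blur column boundaries. As a hedged alternative (a different technique from the answer above, not the answerer's method), rows can be compared element by element with NA treated as compatible with anything. The helper `rows_compatible` and the other names below are hypothetical, and the data use the modified Row 6 together with the matrix as printed in the question.

```r
# Hypothetical helper: TRUE when two rows agree wherever both are non-NA.
rows_compatible <- function(a, b) all(is.na(a) | is.na(b) | a == b)

# Data as printed in the question, with Row 6 changed as in the answer.
m <- rbind(
  Set1 = c("A", "B", "C", "A"),
  Set2 = c("A", "D", "B", "A"),
  Set3 = c(NA,  "B", "C", "A"),
  Set4 = c("A", "D", "B", "A"),
  Set5 = c("F", "G", NA,  "F"),
  Set6 = c("B", "B", "C", "A")
)

# Pairwise compatibility matrix (symmetric, unlike the regex version).
n <- nrow(m)
compatible <- matrix(FALSE, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    compatible[i, j] <- rows_compatible(m[i, ], m[j, ])
  }
}

# For each row, the indices of all compatible rows (including itself).
matches <- lapply(seq_len(n), function(i) which(compatible[i, ]))

# A row is ambiguous when the rows it matches do not all match each other.
ambiguous <- vapply(matches, function(idx) !all(compatible[idx, idx]), logical(1))
which(ambiguous)
# 3
```

Here row 3 is compatible with rows 1 and 6, which are not compatible with each other, so it is flagged as ambiguous. Because the relation is symmetric, row 3 lists rows 1, 3, and 6, whereas the regex output above lists only row 3 itself for that row. The double loop is quadratic in the number of rows, so for a very large data frame this is only a starting point.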

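Regarding "r - Compare character rows of a df, treating NA as matching anything, and create a new column or df based on the comparison", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69325281/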
