
R - Reduce with merge and more than 2 suffixes (or: how to merge multiple dataframes and keep track of columns)

Reposted. Author: 行者123. Updated: 2023-12-04 16:05:49

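I am trying to merge 4 dataframes on 2 columns while keeping track of which dataframe each column came from. I am running into trouble with tracking the columns.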

(See the dput() output for the dfs at the end of the post.)

#df example (df1)
Name Color Freq
banana yellow 3
apple red 1
apple green 4
plum purple 8


#create list of dataframes
list.df <- list(df1, df2, df3, df4)

#merge dfs on column "Name" and "Color"
combo.df <- Reduce(function(x,y) merge(x,y, by = c("Name", "Color"), all = TRUE, accumulate=FALSE, suffixes = c(".df1", ".df2", ".df3", ".df4")), list.df)

This gives the following warning:

Warning message: In merge.data.frame(x, y, by = c("Name", "Color"), all = TRUE, : column names ‘Freq.df1’, ‘Freq.df2’ are duplicated in the result



and outputs this dataframe:
#combo df example
Name Color Freq.df1 Freq.df2 Freq.df1 Freq.df2
banana yellow 3 3 7 NA
apple red 1 2 9 1
apple green 4 NA 8 2
plum purple 8 1 NA 6
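
The df1 and df2 suffixes are simply repeated. The values in the third and fourth Freq columns of combo.df actually come from df3 and df4, respectively.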

What I really want is:
Name    Color    Freq.df1   Freq.df2  Freq.df3  Freq.df4
banana yellow 3 3 7 NA
apple red 1 2 9 1
apple green 4 NA 8 2
plum purple 8 1 NA 6

How can I achieve this? I know that the suffixes argument of merge() only accepts a character vector of length 2, but I don't know what the workaround should be. Thanks!

df1 <- 
structure(list(Name = structure(c(2L, 1L, 1L, 3L), .Label = c("apple",
"banana", "plum"), class = "factor"), Color = structure(c(4L,
3L, 1L, 2L), .Label = c("green", "purple", "red", "yellow"), class = "factor"),
Freq = c(3, 1, 4, 8)), .Names = c("Name", "Color", "Freq"
), row.names = c(NA, -4L), class = "data.frame")

df2 <-
structure(list(Name = structure(c(2L, 1L, 3L), .Label = c("apple",
"banana", "plum"), class = "factor"), Color = structure(c(3L,
2L, 1L), .Label = c("purple", "red", "yellow"), class = "factor"),
Freq = c(3, 2, 1)), .Names = c("Name", "Color", "Freq"), row.names = c(NA,
-3L), class = "data.frame")

df3 <-
structure(list(Name = structure(c(2L, 1L, 1L), .Label = c("apple",
"banana"), class = "factor"), Color = structure(c(3L, 2L, 1L), .Label = c("green",
"red", "yellow"), class = "factor"), Freq = c(7, 9, 8)), .Names = c("Name",
"Color", "Freq"), row.names = c(NA, -3L), class = "data.frame")

df4 <-
structure(list(Name = structure(c(1L, 1L, 2L), .Label = c("apple",
"plum"), class = "factor"), Color = structure(c(3L, 1L, 2L), .Label = c("green",
"purple", "red"), class = "factor"), Freq = c(1, 2, 6)), .Names = c("Name",
"Color", "Freq"), row.names = c(NA, -3L), class = "data.frame")

Best answer

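This seems easier with a for loop. Reduce, like reduce from purrr, hands only two datasets at a time to the function, and merge uses only two suffixes per call, so a single fixed suffixes vector cannot label more than two inputs.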
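The duplicated names in the question follow from how merge applies suffixes: they are only added when a non-key column name appears in both inputs, and only the first two elements of suffixes are used. A minimal sketch of that behaviour, using toy data frames (a, b and c3 are made up for illustration and are not the question's data):

```r
# Toy data frames (hypothetical, not the ones from the question)
a  <- data.frame(Name = c("apple", "plum"), Freq = c(1, 2))
b  <- data.frame(Name = c("apple", "plum"), Freq = c(3, 4))
c3 <- data.frame(Name = c("apple", "plum"), Freq = c(5, 6))

# Both inputs have a non-key column called "Freq", so the suffixes are applied
ab <- merge(a, b, by = "Name", suffixes = c(".a", ".b"))
names(ab)
# "Name" "Freq.a" "Freq.b"

# Now "Freq" exists only in c3, so there is no clash and no suffix is added
abc <- merge(ab, c3, by = "Name", suffixes = c(".a", ".b"))
names(abc)
# "Name" "Freq.a" "Freq.b" "Freq"
```

In the question's Reduce call, the third merge therefore leaves a plain Freq column, and the fourth merge sees Freq on both sides and reuses ".df1" and ".df2", which is what triggers the warning.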

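Here, we create a vector of suffixes ('sfx') and initialize the output dataset ('res') with the first element of the list. We then loop over the sequence of 'list.df' and merge 'res' with the next element of 'list.df', updating 'res' at each step.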

sfx <- c(".df1", ".df2", ".df3", ".df4")
res <- list.df[[1]]
# merge 'res' with the next data frame, passing the matching pair of suffixes
for (i in head(seq_along(list.df), -1)) {
  res <- merge(res, list.df[[i + 1]], all = TRUE,
               suffixes = sfx[i:(i + 1)], by = c("Name", "Color"))
}

res
# Name Color Freq.df1 Freq.df2 Freq.df3 Freq.df4
#1 apple green 4 NA 8 2
#2 apple red 1 2 9 1
#3 banana yellow 3 3 7 NA
#4 plum purple 8 1 NA 6
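One thing to keep in mind is that the loop depends on name clashes to trigger the suffixes, so with an odd number of data frames the last value column would stay as plain Freq. An alternative, which is not part of the original answer, is to rename the value columns before merging so that Reduce can be used directly. The sketch below assumes small stand-in data frames with made-up values, and the helper names keys and tagged are hypothetical:

```r
# Small stand-ins for the question's data (hypothetical values)
list.df <- list(
  data.frame(Name = c("banana", "apple"), Color = c("yellow", "red"), Freq = c(3, 1)),
  data.frame(Name = c("banana", "apple"), Color = c("yellow", "red"), Freq = c(3, 2)),
  data.frame(Name = "apple", Color = "red", Freq = 9)
)
keys <- c("Name", "Color")

# Tag every non-key column with the position of its data frame in the list
tagged <- lapply(seq_along(list.df), function(i) {
  d <- list.df[[i]]
  is_value <- !(names(d) %in% keys)
  names(d)[is_value] <- paste0(names(d)[is_value], ".df", i)
  d
})

# Names are now unique, so merge() never needs its suffixes argument
res <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE), tagged)
names(res)
# "Name" "Color" "Freq.df1" "Freq.df2" "Freq.df3"
```

Because the column names are unique before any merge happens, the result does not depend on how many data frames are in the list or on the order in which clashes occur.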

Regarding "R - Reduce with merge and more than 2 suffixes (or: how to merge multiple dataframes and keep track of columns)", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48799959/
