
r - Loop through 2 data frames to identify common columns

Reposted. Author: 行者123. Updated: 2023-12-04 09:47:57

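I have 2 reproducible data frames here. I am trying to identify which columns of one contain values that also appear in a column of the other. I want my code to take each column of df1 and loop over every column in df2. My code below works, but it needs fine-tuning to allow multiple matches against the same column.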

df1 <- data.frame(fruit = c("Apple", "Orange", "Pear"),
                  location = c("Japan", "China", "Nigeria"),
                  price = c(32, 53, 12))
df2 <- data.frame(grocery = c("Durian", "Apple", "Watermelon"),
                  place = c("Korea", "Japan", "Malaysia"),
                  name = c("Mark", "John", "Tammy"),
                  favourite.food = c("Apple", "Wings", "Cakes"),
                  invoice = c("XD1", "XD2", "XD3"))

df <- sapply(names(df1), function(x) {
  temp <- sapply(names(df2), function(y)
    if (any(match(df1[[x]], df2[[y]], nomatch = FALSE))) y else NA)
  ifelse(all(is.na(temp)), NA, temp[which.max(!is.na(temp))])
})

t1 <- data.frame(lapply(df, type.convert), stringsAsFactors=FALSE)
t1 <- data.frame(t(t1))
t1 <- cbind(newColName = rownames(t1), t1)
rownames(t1) <- 1:nrow(t1)
colnames(t1) <- c("Columns from df1", "Columns from df2")

df1
   fruit location price
1  Apple    Japan    32
2 Orange    China    53
3   Pear  Nigeria    12

df2
     grocery    place  name favourite.food invoice
1     Durian    Korea  Mark          Apple     XD1
2      Apple    Japan  John          Wings     XD2
3 Watermelon Malaysia Tammy          Cakes     XD3

t1 #(OUTPUT FROM CODE ABOVE)

  Columns from df1 Columns from df2
1            fruit          grocery
2         location            place
3            price             <NA>

This is the output I would like to get:

  Columns from df1        Columns from df2
1            fruit grocery, favourite.food
2         location                   place
3            price                    <NA>

Notice that the columns "grocery" and "favourite.food" both match the column "fruit", whereas my code returns only one of them.

Best answer

We can change the code to return all matches and use toString to wrap them in a single string:

vec <- sapply(names(df1), function(x) {
  temp <- sapply(names(df2), function(y)
    if (any(match(df1[[x]], df2[[y]], nomatch = FALSE))) y else NA)
  ifelse(all(is.na(temp)), NA, toString(temp[!is.na(temp)]))
})

vec

#                    fruit                  location                     price
#"grocery, favourite.food"                   "place"                        NA
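
The difference from the code in the question comes down to the last line of the outer function: `temp[which.max(!is.na(temp))]` keeps only the first non-NA entry, while `temp[!is.na(temp)]` keeps all of them. A small sketch of that behavior, where `temp` is written out by hand (an assumption for illustration) to mirror what the inner `sapply` returns for the `fruit` column:

```r
# Hand-written stand-in for the inner sapply() result for "fruit"
temp <- c(grocery = "grocery", place = NA, name = NA,
          favourite.food = "favourite.food", invoice = NA)

# !is.na(temp) is TRUE at positions 1 and 4;
# which.max() reports only the first position holding the maximum
first_hit <- which.max(!is.na(temp))
temp[first_hit]

# Logical indexing keeps every non-NA entry instead
all_hits <- temp[!is.na(temp)]

# toString() collapses the matches into one comma-separated string
toString(all_hits)
```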

To convert it to a data frame, we can do:

data.frame(columns_from_df1 = names(vec), columns_from_df2 = vec, row.names = NULL)

#  columns_from_df1        columns_from_df2
#1            fruit grocery, favourite.food
#2         location                   place
#3            price                    <NA>
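
For comparison, the same idea can probably be written with `%in%` and `vapply`, which avoids the `match(..., nomatch = FALSE)` coercion and the scalar `ifelse`. This is only a sketch, not the answer's method; `matching_cols`, `vec2` and `res` are hypothetical names introduced here:

```r
df1 <- data.frame(fruit = c("Apple", "Orange", "Pear"),
                  location = c("Japan", "China", "Nigeria"),
                  price = c(32, 53, 12))
df2 <- data.frame(grocery = c("Durian", "Apple", "Watermelon"),
                  place = c("Korea", "Japan", "Malaysia"),
                  name = c("Mark", "John", "Tammy"),
                  favourite.food = c("Apple", "Wings", "Cakes"),
                  invoice = c("XD1", "XD2", "XD3"))

# Hypothetical helper: names of the columns in `other` that share
# at least one value with the vector `x`, collapsed into one string
matching_cols <- function(x, other) {
  hits <- vapply(other, function(col) any(x %in% col), logical(1))
  if (any(hits)) toString(names(other)[hits]) else NA_character_
}

# One result per column of df1
vec2 <- vapply(df1, matching_cols, character(1), other = df2)

res <- data.frame(columns_from_df1 = names(vec2),
                  columns_from_df2 = vec2,
                  row.names = NULL)
res
```

`vapply` fixes the return type up front (one string per column of df1), so a column with no match comes back as `NA_character_` rather than a logical `NA`.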

Regarding "r - Loop through 2 data frames to identify common columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54191788/
