gpt4 book ai didi

r - Data.table - 多个表的左外连接

转载 作者:行者123 更新时间:2023-12-04 21:51:37 25 4
gpt4 key购买 nike

假设你有这样的数据

fruits <- data.table(FruitID=c(1,2,3), Fruit=c("Apple", "Banana", "Strawberry"))
colors <- data.table(ColorID=c(1,2,3,4,5), FruitID=c(1,1,1,2,3), Color=c("Red","Yellow","Green","Yellow","Red"))
tastes <- data.table(TasteID=c(1,2,3), FruitID=c(1,1,3), Taste=c("Sweeet", "Sour", "Sweet"))

setkey(fruits, "FruitID")
setkey(colors, "ColorID")
setkey(tastes, "TasteID")

fruits
FruitID Fruit
1: 1 Apple
2: 2 Banana
3: 3 Strawberry

colors
ColorID FruitID Color
1: 1 1 Red
2: 2 1 Yellow
3: 3 1 Green
4: 4 2 Yellow
5: 5 3 Red

tastes
TasteID FruitID Taste
1: 1 1 Sweeet
2: 2 1 Sour
3: 3 3 Sweet

我通常需要对这样的数据执行左外连接。例如,“给我所有的水果和它们的颜色”需要我写(也许有更好的方法?)
setkey(colors, "FruitID")
result <- colors[fruits, allow.cartesian=TRUE]
setkey(colors, "ColorID")

这么简单又频繁的任务,三行代码显得多余,于是写了一个方法 myLeftJoin
myLeftJoin <- function(tbl1, tbl2){
# Performs a left join using the key in tbl1 (i.e. keeps all rows from tbl1 and only matching rows from tbl2)

oldkey <- key(tbl2)
setkeyv(tbl2, key(tbl1))
result <- tbl2[tbl1, allow.cartesian=TRUE]
setkeyv(tbl2, oldkey)
return(result)
}

我可以使用
myLeftJoin(fruits, colors)
ColorID FruitID Color Fruit
1: 1 1 Red Apple
2: 2 1 Yellow Apple
3: 3 1 Green Apple
4: 4 2 Yellow Banana
5: 5 3 Red Strawberry

我如何扩展这个方法,以便我可以将任意数量的表传递给它并获得所有这些表的链式左外连接?类似 myLeftJoin(tbl1, ...)
例如,我想要 myleftJoin(fruits, colors, tastes) 的结果相当于
setkey(colors, "FruitID")
setkey(tastes, "FruitID")
result <- tastes[colors[fruits, allow.cartesian=TRUE], allow.cartesian=TRUE]
setkey(tastes, "TasteID")
setkey(colors, "ColorID")

result
TasteID FruitID Taste ColorID Color Fruit
1: 1 1 Sweeet 1 Red Apple
2: 2 1 Sour 1 Red Apple
3: 1 1 Sweeet 2 Yellow Apple
4: 2 1 Sour 2 Yellow Apple
5: 1 1 Sweeet 3 Green Apple
6: 2 1 Sour 3 Green Apple
7: NA 2 NA 4 Yellow Banana
8: 3 3 Sweet 5 Red Strawberry

也许我错过了使用 data.table 包中的方法的优雅解决方案?谢谢

(编辑:修正了我的数据中的一个错误)

最佳答案

我刚刚在 data.table, v1.9.5 中提交了一个新功能,我们可以在不设置键的情况下加入(即直接指定要加入的列,而不必先使用 setkey()):

有了这个,这很简单:

require(data.table) # v1.9.5+
fruits[tastes, on="FruitID"][colors, on="FruitID"] # no setkey required
# FruitID Fruit TasteID Taste ColorID Color
# 1: 1 Apple 1 Sweeet 1 Red
# 2: 1 Apple 2 Sour 1 Red
# 3: 1 Apple 1 Sweeet 2 Yellow
# 4: 1 Apple 2 Sour 2 Yellow
# 5: 1 Apple 1 Sweeet 3 Green
# 6: 1 Apple 2 Sour 3 Green
# 7: 2 NA NA NA 4 Yellow
# 8: 3 Strawberry 3 Sweet 5 Red

关于r - Data.table - 多个表的左外连接,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31480615/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com