
R data.table (<= 1.9.4) join behavior

Reposted. Author: 行者123. Updated: 2023-12-02 01:36:15

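After some time away I have started using R and data.table again, but I still have doubts about joins. I previously asked this question and got a satisfactory explanation, but I still don't really understand the logic. Let's consider a few examples: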

library("data.table")
X <- data.table(chiave=c("a", "a", "a", "b", "b"),valore1=1:5)
Y <- data.table(chiave=c("a", "b", "c", "d"),valore2=1:4)
X
chiave valore1
1: a 1
2: a 2
3: a 3
4: b 4
5: b 5
Y
chiave valore2
1: a 1
2: b 2
3: c 3
4: d 4

When I join, I get an error:

 setkey(X,chiave)
X[Y]
# Error in vecseq(f__, len__, if (allow.cartesian || notjoin) NULL else as.integer(max(nrow(x), :
Join results in 7 rows; more than 5 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.

So:

 X[Y,allow.cartesian=T]
chiave valore1 valore2
1: a 1 1
2: a 2 1
3: a 3 1
4: b 4 2
5: b 5 2
6: c NA 3
7: d NA 4

Note that X has duplicate keys while i does not. If I change Y to:

 Y <- data.table(chiave=c("b", "c", "d"),valore2=1:3)
Y
chiave valore2
1: b 1
2: c 2
3: d 3

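the join completes without an error message and without needing allow.cartesian, yet logically the situation is the same: X has duplicate keys and i does not.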

 X[Y]
chiave valore1 valore2
1: b 4 1
2: b 5 1
3: c NA 2
4: d NA 3

On the other hand:

 X <- data.table(chiave=c("a", "a", "a", "a", "a", "a", "b", "b"),valore1=1:8)
Y <- data.table(chiave=c("b", "b", "d"),valore2=1:3)
X
chiave valore1
1: a 1
2: a 2
3: a 3
4: a 4
5: a 5
6: a 6
7: b 7
8: b 8
Y
chiave valore2
1: b 1
2: b 2
3: d 3

Here I have duplicate keys in both X and i, yet the join (a Cartesian product) completes without an error message and without needing allow.cartesian:

 setkey(X,chiave)
X[Y]
chiave valore1 valore2
1: b 7 1
2: b 8 1
3: b 7 2
4: b 8 2
5: d NA 3

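From my point of view, I have a Cartesian product if and only if I have duplicate keys in both X and i (not merely when the result has more rows than max(nrow(x), nrow(i))), and only in that case would I see a need for allow.cartesian (so not in my first two examples).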
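To make the mismatch concrete, here is a minimal sketch in base R that predicts the number of rows of X[Y] from the key columns alone, then compares the row-count test quoted in the error message with the "duplicates in both X and i" criterion described above. The helper names join_rows and check_case are hypothetical (they are not part of data.table), and the sketch assumes the default join settings used in the examples (mult="all", unmatched rows of i kept as NA rows).

```r
# Hypothetical helper: predicted nrow(X[Y]) for a keyed join with
# mult = "all", where unmatched rows of i still produce one NA row.
join_rows <- function(x_keys, i_keys) {
  # How many times each key value occurs in x.
  x_counts <- table(x_keys)
  # Number of matches in x for every row of i (NA when there is no match).
  per_row <- as.integer(x_counts[match(i_keys, names(x_counts))])
  # An unmatched row of i contributes exactly one row to the result.
  per_row[is.na(per_row)] <- 1L
  sum(per_row)
}

# Hypothetical helper: compare the two criteria discussed in the question.
check_case <- function(x_keys, i_keys) {
  rows <- join_rows(x_keys, i_keys)
  list(
    rows        = rows,
    # The row-count test quoted in the error message.
    exceeds_max = rows > max(length(x_keys), length(i_keys)),
    # The criterion proposed in the question: duplicates on both sides.
    dup_in_both = anyDuplicated(x_keys) > 0 && anyDuplicated(i_keys) > 0
  )
}

case1 <- check_case(c("a", "a", "a", "b", "b"), c("a", "b", "c", "d"))
case2 <- check_case(c("a", "a", "a", "b", "b"), c("b", "c", "d"))
case3 <- check_case(c(rep("a", 6), "b", "b"), c("b", "b", "d"))

str(case1)  # rows 7, exceeds_max TRUE,  dup_in_both FALSE
str(case2)  # rows 4, exceeds_max FALSE, dup_in_both FALSE
str(case3)  # rows 5, exceeds_max FALSE, dup_in_both TRUE
```

Under these assumptions the first example trips the row-count test even though i has no duplicates, while the third example has duplicates on both sides but stays below max(nrow(x), nrow(i)), which matches the behavior shown above.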

Best answer

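Just to answer this question: this behavior of allow.cartesian has been fixed in the current development version, v1.9.5, which will soon be on CRAN as v1.9.6. Odd-numbered versions are development releases and even-numbered versions are stable. From NEWS: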

  1. allow.cartesian is ignored during joins when:

    • i has no duplicates and mult="all". Closes #742. Thanks to @nigmastar for the report.
    • assigning by reference, i.e., j has :=. Closes #800. Thanks to @matthieugomez for the report.

    In both these cases (and during a not-join which was already fixed in 1.9.4), allow.cartesian can be safely ignored.
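As a rough illustration of the two NEWS items, the first example from the question can be rerun as below. This is a sketch that assumes data.table 1.9.6 or later is installed; on 1.9.4 and earlier the plain X[Y] call is the one that raised the error shown in the question.

```r
library(data.table)

X <- data.table(chiave = c("a", "a", "a", "b", "b"), valore1 = 1:5)
Y <- data.table(chiave = c("a", "b", "c", "d"), valore2 = 1:4)
setkey(X, chiave)

# i (here Y) has no duplicate keys and mult is the default "all", so
# allow.cartesian should not be required (assuming data.table >= 1.9.6).
res <- X[Y]
nrow(res)  # 7 rows: 3 for "a", 2 for "b", one NA row each for "c" and "d"

# Assigning by reference: j uses :=, so X is updated in place and
# allow.cartesian is likewise not needed.
X[Y, valore2 := i.valore2]
X$valore2  # 1 1 1 2 2
```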

This Q&A on R data.table (<= 1.9.4) join behavior comes from a question on Stack Overflow: https://stackoverflow.com/questions/31052933/
