
r - Strange behavior when joining with multiple conditions

Reposted. Author: 行者123. Updated: 2023-12-04 14:26:15

While answering this question about rolling joins with the data.table package, I ran into some strange behavior when using multiple conditions.

Consider the following datasets:

library(data.table)

dt <- data.table(t_id = c(1,4,2,3,5), place = c("a","a","d","a","d"), num = c(5.1, 5.1, 6.2, 5.1, 6.2), key = c("place"))
dt_lu <- data.table(f_id = c(rep(1,4),rep(2,3)), place = c("a","b","c","d","a","d","a"), num = c(6,7,8,9,6,7,8), key = c("place"))

When I join dt and dt_lu, keeping only those rows of dt_lu that have the same place and where dt_lu$num is higher than dt$num, as follows:
dt_lu[dt, list(tid = i.t_id,
               tnum = i.num,
               fnum = num[i.num < num],
               fid = f_id),
      by = .EACHI]

I get the desired result:
    place tid tnum fnum fid
 1:     a   1  5.1    6   1
 2:     a   1  5.1    6   2
 3:     a   1  5.1    8   2
 4:     a   4  5.1    6   1
 5:     a   4  5.1    6   2
 6:     a   4  5.1    8   2
 7:     a   3  5.1    6   1
 8:     a   3  5.1    6   2
 9:     a   3  5.1    8   2
10:     d   2  6.2    9   1
11:     d   2  6.2    7   2
12:     d   5  6.2    9   1
13:     d   5  6.2    7   2

When I want to add an extra condition, I can easily get the desired result by chaining it after the join:
dt_lu[dt, list(tid = i.t_id,
               tnum = i.num,
               fnum = num[i.num < num],
               fid = f_id),
      by = .EACHI][fnum - tnum < 2]

This gives me:
   place tid tnum fnum fid
1:     a   1  5.1    6   1
2:     a   1  5.1    6   2
3:     a   4  5.1    6   1
4:     a   4  5.1    6   2
5:     a   3  5.1    6   1
6:     a   3  5.1    6   2
7:     d   2  6.2    7   2
8:     d   5  6.2    7   2
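For comparison, the same join-then-filter logic can be sketched in base R with `merge()` followed by row subsetting. This is a swapped-in technique, not the data.table code above; the `.t`/`.f` suffixes and the names `m` and `res` are illustrative assumptions.

```r
# Same data as above, as plain data.frames (no data.table needed)
dt <- data.frame(t_id = c(1, 4, 2, 3, 5),
                 place = c("a", "a", "d", "a", "d"),
                 num = c(5.1, 5.1, 6.2, 5.1, 6.2),
                 stringsAsFactors = FALSE)
dt_lu <- data.frame(f_id = c(rep(1, 4), rep(2, 3)),
                    place = c("a", "b", "c", "d", "a", "d", "a"),
                    num = c(6, 7, 8, 9, 6, 7, 8),
                    stringsAsFactors = FALSE)

# Equi-join on place: each dt row is paired with every dt_lu row of the same place
m <- merge(dt, dt_lu, by = "place", suffixes = c(".t", ".f"))

# Filter whole rows, so num.f and f_id are always kept or dropped together
res <- m[m$num.f > m$num.t & m$num.f - m$num.t < 2,
         c("place", "t_id", "num.t", "num.f", "f_id")]
print(res)
```

Because the conditions remove entire rows after the join, `num.f` and `f_id` cannot drift out of alignment, which is the same reason the chained `[fnum - tnum < 2]` version works.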

However, when I add the extra condition (i.e., the difference must be less than 2) inside the join, as follows:
dt_lu[dt, list(tid = i.t_id,
               tnum = i.num,
               fnum = num[i.num < num & num - i.num < 2],
               fid = f_id),
      by = .EACHI]

I do not get the expected result:
    place tid tnum fnum fid
 1:     a   1  5.1    6   1
 2:     a   1  5.1    6   2
 3:     a   1  5.1    6   2
 4:     a   4  5.1    6   1
 5:     a   4  5.1    6   2
 6:     a   4  5.1    6   2
 7:     a   3  5.1    6   1
 8:     a   3  5.1    6   2
 9:     a   3  5.1    6   2
10:     d   2  6.2    7   1
11:     d   2  6.2    7   2
12:     d   5  6.2    7   1
13:     d   5  6.2    7   2

Moreover, I get the following warning message:

Warning message:
In `[.data.table`(dt_lu, dt, list(tid = i.t_id, tnum = i.num, fnum = num[i.num < :
  Column 3 of result for group 1 is length 2 but the longest column in this result is 3. Recycled leaving remainder of 1 items. This warning is once only for the first group with this issue.

The expected result is:
    place tid tnum fnum fid
 1:     a   1  5.1    6   1
 2:     a   1  5.1    6   2
 4:     a   4  5.1    6   1
 5:     a   4  5.1    6   2
 7:     a   3  5.1    6   1
 8:     a   3  5.1    6   2
11:     d   2  6.2    7   2
13:     d   5  6.2    7   2

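I deliberately kept the row numbers from the first example to show which rows must remain in the final result (the same rows as in the working, chained solution).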

This answer shows that it should be possible to use multiple conditions in a join operation.

I tried the following alternatives, but neither of them works:
dt_lu[dt, list(tid = i.t_id,
               tnum = i.num,
               fnum = num[(i.num < num) & (num - i.num < 2)],
               fid = f_id),
      by = .EACHI]

dt_lu[dt, {
        val = num[(i.num < num) & (num - i.num < 2)];
        list(tid = i.t_id,
             tnum = i.num,
             fnum = val,
             fid = f_id)},
      by = .EACHI]

Can someone explain why I don't get the desired result when using multiple conditions in the join operation?

Best answer

The warning message gives the problem away. Using print() is also helpful here:

dt_lu[dt, print(i.num < num & num - i.num < 2), by=.EACHI]
# [1] TRUE TRUE FALSE
# [1] TRUE TRUE FALSE
# [1] TRUE TRUE FALSE
# [1] FALSE TRUE
# [1] FALSE TRUE
# Empty data.table (0 rows) of 3 cols: place,place,num
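To see why the groups have different lengths, `by = .EACHI` can be roughly imitated in base R: for each row of `dt`, take the matching `dt_lu` rows and evaluate the condition against them. This is a sketch, not data.table's implementation; it assumes `dt` is processed in key (place) order, and the name `conds` is hypothetical.

```r
dt <- data.frame(t_id = c(1, 4, 2, 3, 5),
                 place = c("a", "a", "d", "a", "d"),
                 num = c(5.1, 5.1, 6.2, 5.1, 6.2),
                 stringsAsFactors = FALSE)
dt_lu <- data.frame(f_id = c(rep(1, 4), rep(2, 3)),
                    place = c("a", "b", "c", "d", "a", "d", "a"),
                    num = c(6, 7, 8, 9, 6, 7, 8),
                    stringsAsFactors = FALSE)

# Mimic the keyed order of dt (order() keeps ties in their original order)
dt <- dt[order(dt$place), ]

# One logical vector per row of dt, computed against that row's dt_lu matches
conds <- lapply(seq_len(nrow(dt)), function(k) {
  lu <- dt_lu[dt_lu$place == dt$place[k], ]
  dt$num[k] < lu$num & lu$num - dt$num[k] < 2
})
str(conds)
```

The group lengths here (3, 3, 3, 2, 2) are the lengths `num` and `f_id` have inside `j` for each group, while the condition keeps only some of those elements.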

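Consider the first case, where the condition evaluates to TRUE, TRUE, FALSE. That group has 3 observations, and your j-expression contains:
.(tid = i.t_id,
  tnum = i.num,
  fnum = num[i.num < num & num - i.num < 2],
  fid = f_id)
i.t_id and i.num have length 1 (because they come from dt). But num[..condn..] returns a vector of length 2, whereas f_id has length 3. Both the length-1 and the length-2 items are recycled to the length of the longest item/vector, which is 3. This produces wrong results, and because 3 is not a multiple of 2, you also get the warning.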
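The recycling can be reproduced without data.table. The sketch below copies the first group's values (`num` and `f_id` for place "a", in key order) by hand and uses base R's `rep_len()` to mimic how the shorter column is padded; this approximates data.table's behavior (which also emits the warning) rather than reproducing its internal code. `i_num` is a hypothetical stand-in for `i.num`.

```r
# Group place == "a" of dt_lu (in key order) and the value joined from dt
num   <- c(6, 6, 8)
f_id  <- c(1, 2, 2)
i_num <- 5.1  # hypothetical stand-in for i.num

idx <- i_num < num & num - i_num < 2  # TRUE TRUE FALSE

# Original j: fnum is subset (length 2) but fid is not (length 3),
# so fnum is recycled to length 3; mimicked here with rep_len()
bad <- data.frame(fnum = rep_len(num[idx], length(f_id)), fid = f_id)

# Intended j: subset both columns with the same index so they stay aligned
good <- data.frame(fnum = num[idx], fid = f_id[idx])

print(bad)
print(good)
```

`bad` reproduces the first three rows of the unexpected output (the pair fnum 6, fid 2 appears twice), while `good` keeps only the two intended rows.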

What you intended to do is:
.(tid = i.t_id,
  tnum = i.num,
  fnum = num[i.num < num & num - i.num < 2],
  fid = f_id[i.num < num & num - i.num < 2])

Or, equivalently:
{
  idx = i.num < num & num - i.num < 2
  .(tid = i.t_id, tnum = i.num, fnum = num[idx], fid = f_id[idx])
}

Putting it all together:
dt_lu[dt,
      {
        idx = i.num < num & num - i.num < 2
        .(tid = i.t_id, tnum = i.num, fnum = num[idx], fid = f_id[idx])
      },
      by = .EACHI]
#    place tid tnum fnum fid
# 1:     a   1  5.1    6   1
# 2:     a   1  5.1    6   2
# 3:     a   4  5.1    6   1
# 4:     a   4  5.1    6   2
# 5:     a   3  5.1    6   1
# 6:     a   3  5.1    6   2
# 7:     d   2  6.2    7   2
# 8:     d   5  6.2    7   2

A similar question, "r - Strange behavior when joining with multiple conditions", can be found on Stack Overflow: https://stackoverflow.com/questions/32034222/
