
r - data.table join (error in vecseq): is a key needed on both X and i?

Reposted. Author: 行者123. Updated: 2023-12-04 08:34:55

I am new to R and data.table, which I find useful and fast. I am trying to join two data.tables:

> TotFreq
Legacy_Store_Number WeekDay Date Item_Key Distr NoSellingDays meanUnits ItemType
1: 113802 1 2013-03-24 000000000120 2.428985e-04 0 8.00 FM
2: 113802 1 2013-03-24 000000000126 1.104030e-03 0 47.50 FM
3: 113802 1 2013-03-24 000000000170 1.126004e-03 0 48.75 FM
4: 113802 1 2013-03-24 000000000180 5.143034e-04 0 19.00 FM
5: 113802 1 2013-03-24 000000000260 3.854306e-04 0 12.25 FM
---
160167: 113802 7 2013-03-23 978125002327 5.902655e-07 27 1.00 SM
160168: 113802 7 2013-03-23 978141970584 1.770796e-06 25 1.00 SM
160169: 113802 7 2013-03-23 978145300697 1.180531e-06 26 1.00 SM
160170: 113802 7 2013-03-23 978145552558 5.902655e-07 27 1.00 SM
160171: 113802 7 2013-03-23 978160139536 5.902655e-07 27 1.00 SM

> Count_SM_FM
Legacy_Store_Number WeekDay ItemType ObjItems
1: 113802 1 SM 12305
2: 113802 1 FM 1942
3: 113802 2 SM 11014
4: 113802 2 FM 1398
5: 113802 3 SM 10154
6: 113802 3 FM 1117
7: 113802 4 SM 10414
8: 113802 4 FM 1167
9: 113802 5 SM 10258
10: 113802 5 FM 1200
11: 113802 6 SM 11116
12: 113802 6 FM 1575
13: 113802 7 SM 13098
14: 113802 7 FM 2326
> setkey(TotFreq,Legacy_Store_Number,WeekDay,ItemType)
>
> ResultJoin <- TotFreq[Count_SM_FM]
Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x), :
Join results in 320342 rows; more than 160171 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.

But I have no duplicate keys in i!

Using:
> ResultJoin <- TotFreq[Count_SM_FM,allow.cartesian=T]
>
> ResultJoin
Legacy_Store_Number WeekDay Date Item_Key Distr NoSellingDays meanUnits ItemType ItemType.1 ObjItems
1: 113802 1 2013-03-24 000000000120 2.428985e-04 0 8.00 FM SM 12305
2: 113802 1 2013-03-24 000000000126 1.104030e-03 0 47.50 FM SM 12305
3: 113802 1 2013-03-24 000000000170 1.126004e-03 0 48.75 FM SM 12305
4: 113802 1 2013-03-24 000000000180 5.143034e-04 0 19.00 FM SM 12305
5: 113802 1 2013-03-24 000000000260 3.854306e-04 0 12.25 FM SM 12305
---
320338: 113802 7 2013-03-23 978125002327 5.902655e-07 27 1.00 SM FM 2326
320339: 113802 7 2013-03-23 978141970584 1.770796e-06 25 1.00 SM FM 2326
320340: 113802 7 2013-03-23 978145300697 1.180531e-06 26 1.00 SM FM 2326
320341: 113802 7 2013-03-23 978145552558 5.902655e-07 27 1.00 SM FM 2326
320342: 113802 7 2013-03-23 978160139536 5.902655e-07 27 1.00 SM FM 2326

I get exactly twice as many records as in my original TotFreq table. If I also set a key on Count_SM_FM, the join works:
> setkey(TotFreq,Legacy_Store_Number,WeekDay,ItemType)
> setkey(Count_SM_FM,Legacy_Store_Number,WeekDay,ItemType)
> ResultJoin <- TotFreq[Count_SM_FM]
>
> ResultJoin
Legacy_Store_Number WeekDay ItemType Date Item_Key Distr NoSellingDays meanUnits ObjItems
1: 113802 1 FM 2013-03-24 000000000120 2.428985e-04 0 8.00 1942
2: 113802 1 FM 2013-03-24 000000000126 1.104030e-03 0 47.50 1942
3: 113802 1 FM 2013-03-24 000000000170 1.126004e-03 0 48.75 1942
4: 113802 1 FM 2013-03-24 000000000180 5.143034e-04 0 19.00 1942
5: 113802 1 FM 2013-03-24 000000000260 3.854306e-04 0 12.25 1942
---
160167: 113802 7 SM 2013-03-23 978125002327 5.902655e-07 27 1.00 13098
160168: 113802 7 SM 2013-03-23 978141970584 1.770796e-06 25 1.00 13098
160169: 113802 7 SM 2013-03-23 978145300697 1.180531e-06 26 1.00 13098
160170: 113802 7 SM 2013-03-23 978145552558 5.902655e-07 27 1.00 13098
160171: 113802 7 SM 2013-03-23 978160139536 5.902655e-07 27 1.00 13098

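I tried to verify this with an example, thinking the problem might be that the key variables are not the first columns of TotFreq, or that Count_SM_FM is not sorted, but I could not reproduce the error: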
> daysType <- data.table(
+ key1=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1),
+ key2=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7),
+ key3=c("b","a","a","b","a","b","a","b","a","b","a","b","a","b"),
+ var1=c(2,4,6,8,4,5,7,3,7,9,6,3,5,6)
+ )
>
>
> detailData <- data.table(
+ key1=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
+ key2=c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7,7,7,7),
+ var2=c(10,11,12,13,15,16,17,10,11,12,13,14,15,16,10,11,12,15,16,17,10,11,12,13,14,15,16,17,10,11,13,14,15,16,17,10,11,12,13,14,15,10,11,12,13,14,15,16,17),
+ var3=c(1,2,4,6,6,7,3,6,8,9,3,5,7,8,6,7,8,6,7,2,4,6,7,8,2,3,5,7,4,7,8,3,6,4,2,5,7,3,6,7,3,4,2,4,6,4,7,2,9),
+ key3=c("a","a","a","a","b","b","b","a","a","a","a","b","b","b","a","a","a","b","b","b","a","a","a","a","b","b","b","b","a","a","a","b","b","b","b","a","a","a","a","b","b","a","a","a","a","b","b","b","b")
+ )
>
> setkey(detailData,key1,key2,key3)
> JoinResult <- detailData[daysType]

This question is different from the question "Join of two data.tables fails", because there allow.cartesian solved the problem.

What is the problem here? Why does adding a key to Count_SM_FM solve it?

Thanks!

Best answer

Update, October 2014: Arun fixed this in v1.9.5:

allow.cartesian is now ignored when i has no duplicates, #742 and #508. Thanks to @nigmastar, @user3645882 and others for the reports.
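
The note above hinges on whether i has duplicates on the columns that are actually used for the join. A minimal sketch of how one might check this is below. The table `i` and its column names are hypothetical stand-ins for Count_SM_FM, not the asker's data, and `anyDuplicated()` with `by` is used here as a convenience check rather than as the check data.table performs internally.

```r
library(data.table)

# Hypothetical stand-in for the lookup table: unique on all three columns,
# but NOT unique on the first two alone.
i <- data.table(
  store   = c(1, 1, 1),
  weekday = c(1, 1, 2),
  type    = c("SM", "FM", "SM")
)

# 0 means no duplicated combination of the listed columns.
dup_all <- anyDuplicated(i, by = c("store", "weekday", "type"))

# A non-zero value is the index of the first duplicated row.
dup_two <- anyDuplicated(i, by = c("store", "weekday"))

print(dup_all)
print(dup_two)
```

If a join ends up matching on fewer columns than intended (for example only the store and weekday columns), i can contain duplicates with respect to that join even though it is unique on the full set of intended key columns.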





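Previous answer:

First, let's address the allow.cartesian part. The error message should probably be changed to point out that you can get a large result even when you have no duplicates in i, as long as you have duplicates in the left data.table. Here is a simple example: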
dt1 = data.table(a = c(1,1), b = 1:2, key = 'a')
dt2 = data.table(a = c(1,2), c = 3:4)

dt1[dt2] # before the v1.9.5 fix this gives an error, because the join results in 3 rows (more than max(nrow(x), nrow(i)) = 2), as seen below

dt1[dt2, allow.cartesian = TRUE]
# a b c
#1: 1 1 3
#2: 1 2 3
#3: 2 NA 4
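
The row count in this example can be predicted before running the join. The following is a rough sketch under the assumption that unmatched rows of i contribute one row each (the default `nomatch = NA` behaviour); the names `counts`, `matched` and `expected_rows` are introduced here for illustration only.

```r
library(data.table)

dt1 <- data.table(a = c(1, 1), b = 1:2, key = "a")
dt2 <- data.table(a = c(1, 2), c = 3:4)

# Number of rows in dt1 for each value of the join column.
counts <- dt1[, .N, by = a]

# Look up that count for every row of dt2; values absent from dt1 get N = NA.
matched <- counts[dt2, on = "a"]

# Each row of dt2 yields N rows when it matches, or a single NA row otherwise.
expected_rows <- sum(ifelse(is.na(matched$N), 1L, matched$N))
print(expected_rows)
```

Here the value 1 occurs twice in dt1 and the value 2 does not occur at all, so the join yields 2 + 1 rows, which exceeds max(nrow(dt1), nrow(dt2)).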

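Now, as far as setting the key goes: no, you do not need to set a key for i; it will simply assume that the first few columns are the keys. Looking at your first join result, you can see that it is *not* joining on ItemType, and that you are using an older data.table version (I am using 1.9.3). So my guess is that either you did not actually set the key correctly and did not include ItemType, or there was a bug in an old version that has since been fixed.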
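
As an illustration of the positional matching described above, here is a small sketch with made-up tables `x` and `i` (hypothetical names and data, not the asker's). It also shows the `on=` argument, available in data.table releases later than the ones discussed here, which names the join columns explicitly and so avoids relying on column order or keys.

```r
library(data.table)

# x is keyed on two columns; i has no key.
x <- data.table(k1 = c(1, 1, 2), k2 = c("a", "b", "a"), v = 1:3,
                key = c("k1", "k2"))
i <- data.table(k1 = c(1, 2), k2 = c("a", "a"), w = c(10, 20))

# Without a key on i, its leading columns are matched, in order,
# against the key columns of x.
res <- x[i]

# Naming the join columns explicitly gives the same rows here and does not
# depend on the column order of i.
res_on <- x[i, on = c("k1", "k2")]

print(res)
print(res_on)
```

If the leading columns of i were in a different order than the key of x, the keyless form would match the wrong columns, which is one reason to prefer spelling the columns out.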

Regarding "r - data.table join (error in vecseq): is a key needed on both X and i?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23809517/
