I am new to R and data.table, which I find useful and fast. I am trying to join two data tables:
> TotFreq
Legacy_Store_Number WeekDay Date Item_Key Distr NoSellingDays meanUnits ItemType
1: 113802 1 2013-03-24 000000000120 2.428985e-04 0 8.00 FM
2: 113802 1 2013-03-24 000000000126 1.104030e-03 0 47.50 FM
3: 113802 1 2013-03-24 000000000170 1.126004e-03 0 48.75 FM
4: 113802 1 2013-03-24 000000000180 5.143034e-04 0 19.00 FM
5: 113802 1 2013-03-24 000000000260 3.854306e-04 0 12.25 FM
160167: 113802 7 2013-03-23 978125002327 5.902655e-07 27 1.00 SM
160168: 113802 7 2013-03-23 978141970584 1.770796e-06 25 1.00 SM
160169: 113802 7 2013-03-23 978145300697 1.180531e-06 26 1.00 SM
160170: 113802 7 2013-03-23 978145552558 5.902655e-07 27 1.00 SM
160171: 113802 7 2013-03-23 978160139536 5.902655e-07 27 1.00 SM
> Count_SM_FM
Legacy_Store_Number WeekDay ItemType ObjItems
1: 113802 1 SM 12305
2: 113802 1 FM 1942
3: 113802 2 SM 11014
4: 113802 2 FM 1398
5: 113802 3 SM 10154
6: 113802 3 FM 1117
7: 113802 4 SM 10414
8: 113802 4 FM 1167
9: 113802 5 SM 10258
10: 113802 5 FM 1200
11: 113802 6 SM 11116
12: 113802 6 FM 1575
13: 113802 7 SM 13098
14: 113802 7 FM 2326
> setkey(TotFreq,Legacy_Store_Number,WeekDay,ItemType)
>
> ResultJoin <- TotFreq[Count_SM_FM]
Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x), :
Join results in 320342 rows; more than 160171 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
There are no duplicate keys in i!
> ResultJoin <- TotFreq[Count_SM_FM,allow.cartesian=T]
>
> ResultJoin
Legacy_Store_Number WeekDay Date Item_Key Distr NoSellingDays meanUnits ItemType ItemType.1 ObjItems
1: 113802 1 2013-03-24 000000000120 2.428985e-04 0 8.00 FM SM 12305
2: 113802 1 2013-03-24 000000000126 1.104030e-03 0 47.50 FM SM 12305
3: 113802 1 2013-03-24 000000000170 1.126004e-03 0 48.75 FM SM 12305
4: 113802 1 2013-03-24 000000000180 5.143034e-04 0 19.00 FM SM 12305
5: 113802 1 2013-03-24 000000000260 3.854306e-04 0 12.25 FM SM 12305
---
320338: 113802 7 2013-03-23 978125002327 5.902655e-07 27 1.00 SM FM 2326
320339: 113802 7 2013-03-23 978141970584 1.770796e-06 25 1.00 SM FM 2326
320340: 113802 7 2013-03-23 978145300697 1.180531e-06 26 1.00 SM FM 2326
320341: 113802 7 2013-03-23 978145552558 5.902655e-07 27 1.00 SM FM 2326
320342: 113802 7 2013-03-23 978160139536 5.902655e-07 27 1.00 SM FM 2326
The result has 320342 rows, twice as many as the TotFreq table. If I also set a key on Count_SM_FM, the join works:
> setkey(TotFreq,Legacy_Store_Number,WeekDay,ItemType)
> setkey(Count_SM_FM,Legacy_Store_Number,WeekDay,ItemType)
> ResultJoin <- TotFreq[Count_SM_FM]
>
> ResultJoin
Legacy_Store_Number WeekDay ItemType Date Item_Key Distr NoSellingDays meanUnits ObjItems
1: 113802 1 FM 2013-03-24 000000000120 2.428985e-04 0 8.00 1942
2: 113802 1 FM 2013-03-24 000000000126 1.104030e-03 0 47.50 1942
3: 113802 1 FM 2013-03-24 000000000170 1.126004e-03 0 48.75 1942
4: 113802 1 FM 2013-03-24 000000000180 5.143034e-04 0 19.00 1942
5: 113802 1 FM 2013-03-24 000000000260 3.854306e-04 0 12.25 1942
---
160167: 113802 7 SM 2013-03-23 978125002327 5.902655e-07 27 1.00 13098
160168: 113802 7 SM 2013-03-23 978141970584 1.770796e-06 25 1.00 13098
160169: 113802 7 SM 2013-03-23 978145300697 1.180531e-06 26 1.00 13098
160170: 113802 7 SM 2013-03-23 978145552558 5.902655e-07 27 1.00 13098
160171: 113802 7 SM 2013-03-23 978160139536 5.902655e-07 27 1.00 13098
... the first columns of TotFreq. Or not? I also tried a small example with Count_SM_FM unsorted, but I cannot reproduce the error:
> daysType <- data.table(
+ key1=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1),
+ key2=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7),
+ key3=c("b","a","a","b","a","b","a","b","a","b","a","b","a","b"),
+ var1=c(2,4,6,8,4,5,7,3,7,9,6,3,5,6)
+ )
>
>
> detailData <- data.table(
+ key1=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
+ key2=c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7,7,7,7),
+ var2=c(10,11,12,13,15,16,17,10,11,12,13,14,15,16,10,11,12,15,16,17,10,11,12,13,14,15,16,17,10,11,13,14,15,16,17,10,11,12,13,14,15,10,11,12,13,14,15,16,17),
+ var3=c(1,2,4,6,6,7,3,6,8,9,3,5,7,8,6,7,8,6,7,2,4,6,7,8,2,3,5,7,4,7,8,3,6,4,2,5,7,3,6,7,3,4,2,4,6,4,7,2,9),
+ key3=c("a","a","a","a","b","b","b","a","a","a","a","b","b","b","a","a","a","b","b","b","a","a","a","a","b","b","b","b","a","a","a","b","b","b","b","a","a","a","a","b","b","a","a","a","a","b","b","b","b")
+ )
>
> setkey(detailData,key1,key2,key3)
> JoinResult <- detailData[daysType]
In the question "Join of two data.tables fails", allow.cartesian solved the problem. Here, is it the key on Count_SM_FM that solves it?
Best answer
Update, October 2014: Arun fixed this in v1.9.5:
allow.cartesian is now ignored when i has no duplicates, #742 and #508. Thanks to @nigmastar, @user3645882 and others for the reports.
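On the allow.cartesian part: the error message should probably be changed to point out that you can get a large result even when there are no duplicates in i, as long as there are duplicates in the left data.table. Here is a simple example: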
dt1 = data.table(a = c(1,1), b = 1:2, key = 'a')
dt2 = data.table(a = c(1,2), c = 3:4)
dt1[dt2] # before the v1.9.5 fix this gives an error, because the join results in 3 rows, as seen below
dt1[dt2, allow.cartesian = TRUE]
# a b c
#1: 1 1 3
#2: 1 2 3
#3: 2 NA 4
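The row count in this example can be predicted before running the join. The following is a minimal sketch, not part of the original answer: it reuses dt1 and dt2 from above, and it assumes a data.table version that supports the on= argument (1.9.6 or later). The names x_counts, per_i, expected_rows and actual_rows are made up for illustration.

```r
library(data.table)

dt1 <- data.table(a = c(1, 1), b = 1:2, key = "a")  # x: key value 1 appears twice
dt2 <- data.table(a = c(1, 2), c = 3:4)             # i: no duplicate values in a

# Duplicate check on the join column, for i and for x separately.
i_has_dups <- anyDuplicated(dt2, by = "a") > 0        # FALSE
x_has_dups <- anyDuplicated(dt1, by = key(dt1)) > 0   # TRUE

# Number of x rows per key value; each row of i is repeated that many times.
x_counts <- dt1[, .N, by = a]

# Look up the count for every row of i; an unmatched row of i still
# contributes one row (filled with NA) to the join result.
per_i <- x_counts[dt2, on = "a"]$N
per_i[is.na(per_i)] <- 1L
expected_rows <- sum(per_i)

actual_rows <- nrow(dt1[dt2, allow.cartesian = TRUE])
```

Here expected_rows and actual_rows agree, which shows that the extra row comes from the duplicated key in dt1 and not from anything in dt2.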
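As for the rest of the question: if you do not set a key on i, the join simply assumes that its first few columns are the key. Looking at your first join result, you can see that it is not joining on ItemType, and that you are using an older data.table version (I am on 1.9.3). So my guess is that either you did not actually set the key correctly and left out ItemType, or some bug in the older version has been fixed since.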
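One way to avoid depending on keys or column order is to name the join columns explicitly. The sketch below is an illustration and not the code from the question: the tables detail and counts are hypothetical miniatures of TotFreq and Count_SM_FM with shortened column names, and the on= argument assumes data.table 1.9.6 or later.

```r
library(data.table)

# Hypothetical stand-in for TotFreq: one row per item.
detail <- data.table(
  store   = 1L,
  weekday = c(1L, 1L, 2L),
  item    = c("i1", "i2", "i3"),
  type    = c("FM", "SM", "FM")
)

# Hypothetical stand-in for Count_SM_FM: one row per store/weekday/type.
counts <- data.table(
  store   = 1L,
  weekday = c(1L, 1L, 2L, 2L),
  type    = c("SM", "FM", "SM", "FM"),
  n       = c(10L, 20L, 30L, 40L)
)

# Join on the three named columns; no setkey() call is needed on either table.
# nomatch = 0L drops rows of counts that have no matching row in detail.
res <- detail[counts, on = c("store", "weekday", "type"), nomatch = 0L]
```

Because the join columns are spelled out, the result has a single type column and each item picks up the count for its own type, regardless of where type sits in either table.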
Source: the Stack Overflow question "data.table join (Error in vecseq) is key required on both X and i?": https://stackoverflow.com/questions/23809517/