
r - data.table: look up a value and translate it

Reposted. Author: 行者123. Updated: 2023-12-04 06:41:32

Like many others, I am new to R. I have a large dataset (500M+ rows) which I have written into a data.table, logStats, with data such as the following:

 head(logStats,15)

time pid mean
1: 2014-03-10 00:00:00 998 3.570000
2: 2014-03-10 00:00:00 11 4.090000
3: 2014-03-10 00:00:00 345 3.380000
4: 2014-03-10 00:05:00 998 4.866667
5: 2014-03-10 00:05:00 11 3.677778
6: 2014-03-10 00:05:00 345 4.487500
7: 2014-03-10 00:10:00 345 4.833333
8: 2014-03-10 00:10:00 998 4.333333
9: 2014-03-10 00:10:00 11 6.977778
10: 2014-03-10 00:15:00 345 3.900000
11: 2014-03-10 00:15:00 998 3.200000
12: 2014-03-10 00:15:00 11 6.030000
13: 2014-03-10 00:20:00 998 4.550000
14: 2014-03-10 00:20:00 11 4.030000
15: 2014-03-10 00:20:00 345 6.060000

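There is also a second, very small data.table (360 rows) with two columns that decode the 'pid' value into something easier to read. The 'pid' values can be numeric or character.

For example: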

pidLookupTable<-data.table(pid=c(998,11,345),pidName=c("Apple","Bannana","Cinnamon"))

which produces:

   pid  pidName
1: 998 Apple
2: 11 Bannana
3: 345 Cinnamon

I would like a single expression that adds a column to the data.table logStats holding the pidName for that row's pid.

I should get something like:

                   time pid     mean pidNames
1: 2014-03-10 00:00:00 998 3.570000 Apple
2: 2014-03-10 00:00:00 11 4.090000 Banana
3: 2014-03-10 00:00:00 345 3.380000 Cinnamon
4: 2014-03-10 00:05:00 998 4.866667 Apple
5: 2014-03-10 00:05:00 11 3.677778 Banana
6: 2014-03-10 00:05:00 345 4.487500 Cinnamon
7: 2014-03-10 00:10:00 345 4.833333 Cinnamon
8: 2014-03-10 00:10:00 998 4.333333 Apple
9: 2014-03-10 00:10:00 11 6.977778 Banana
10: 2014-03-10 00:15:00 345 3.900000 Cinnamon
11: 2014-03-10 00:15:00 998 3.200000 Apple
12: 2014-03-10 00:15:00 11 6.030000 Banana
13: 2014-03-10 00:20:00 998 4.550000 Apple
14: 2014-03-10 00:20:00 11 4.030000 Banana
15: 2014-03-10 00:20:00 345 6.060000 Cinnamon

I wrote a function:

pidNameLookup <- function(x) {
  # Column is pidName in the lookup table shown above
  return(pidLookupTable[pidLookupTable$pid == x, pidName])
}

Then ran:

logStats[,pidName:=pidNameLookup(pid)]

But this only translates the first 3 rows and puts NA in the rest:

   logStats[1:1000]
date time pid value timestamp mean pidName
1: 10-03-2014 00:00:12 998 5.5 2014-03-10 00:00:12 3.57 Apple
2: 10-03-2014 00:00:17 11 2.1 2014-03-10 00:00:17 4.09 Bannana
3: 10-03-2014 00:00:22 345 5.7 2014-03-10 00:00:22 3.38 Cinnamon
4: 10-03-2014 00:00:47 998 1.0 2014-03-10 00:00:47 3.57 NA
5: 10-03-2014 00:00:55 11 0.3 2014-03-10 00:00:55 4.09 NA
---
996: 10-03-2014 02:49:37 345 0.7 2014-03-10 02:49:37 5.30 NA
997: 10-03-2014 02:50:01 998 9.9 2014-03-10 02:50:01 5.30 NA
998: 10-03-2014 02:50:08 11 7.0 2014-03-10 02:50:08 7.00 NA
999: 10-03-2014 02:50:18 345 2.4 2014-03-10 02:50:18 2.40 NA
1000: 10-03-2014 02:50:48 998 0.7 2014-03-10 02:50:48 5.30 NA

And it gives me this warning message:

Warning message:
In pidLookupTable$pid == x
longer object length is not a multiple of shorter object length

The warning message and the incorrect results mean I am doing something wrong.

Help!! This is driving me mental.
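
The warning can be reproduced in isolation. The sketch below uses made-up vectors (`lookupPid`, `lookupName` and `x` are hypothetical stand-ins, not objects from the post) and assumes the cause is that `==` compares element by element, recycling the shorter vector, rather than looking up each value. `match()` is shown only as one base R way to get a position per element.

```r
# Hypothetical stand-ins for pidLookupTable and the pid column of logStats
lookupPid  <- c(998, 11, 345)
lookupName <- c("Apple", "Bannana", "Cinnamon")
x <- c(998, 11, 345, 345, 998)   # 5 values, not a multiple of 3

# `==` pairs elements by position and recycles lookupPid to length 5
# (998, 11, 345, 998, 11), so R emits the "longer object length" warning.
cmp <- lookupPid == x

# match() returns, for every element of x, its position in lookupPid,
# which can then index the names vector.
translated <- lookupName[match(x, lookupPid)]

print(cmp)
print(translated)
```

Only the first three comparisons line up by coincidence of ordering, which would be consistent with the first three rows being translated and the rest not.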

Best answer

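I suggest you have a look at the data.table introduction vignette (vignette("datatable-intro")), as this is exactly what data.table was built for.

This will give you what you want, and should be much quicker: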

setkey(logStats, "pid")
setkey(pidLookupTable, "pid")
logStats[pidLookupTable]
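
A self-contained sketch of that keyed join on a few rows from the question follows. It assumes the data.table package is installed and that `pid` has the same type in both tables. The copies (`keyedStats`, `keyedLookup`) are hypothetical names used only to keep the two approaches separate. The `on =` update join at the end is an alternative not mentioned in the answer and needs a data.table version that supports `on =`.

```r
library(data.table)

# Small sample mirroring the question's data
logStats <- data.table(
  time = as.POSIXct(c("2014-03-10 00:00:00", "2014-03-10 00:00:00",
                      "2014-03-10 00:00:00", "2014-03-10 00:05:00"), tz = "UTC"),
  pid  = c(998, 11, 345, 998),
  mean = c(3.57, 4.09, 3.38, 4.866667)
)
pidLookupTable <- data.table(pid = c(998, 11, 345),
                             pidName = c("Apple", "Bannana", "Cinnamon"))

# Approach from the answer: key both tables on pid, then join.
# setkey() sorts each table by pid, so the result is ordered by pid, not time.
keyedStats  <- copy(logStats)
keyedLookup <- copy(pidLookupTable)
setkey(keyedStats, pid)
setkey(keyedLookup, pid)
joined <- keyedStats[keyedLookup]

# Alternative (an assumption, not part of the answer): an update join that
# adds pidName to logStats by reference and leaves the row order untouched.
logStats[pidLookupTable, on = "pid", pidName := i.pidName]

print(joined)
print(logStats)
```

The update join may suit a 500M+ row table better, since it modifies logStats in place instead of building a second table, but that is worth measuring on the real data.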

For "r - data.table: look up a value and translate it", the corresponding question on Stack Overflow is: https://stackoverflow.com/questions/22396652/
