
scala - Spark: group multiple RDD items by key

Reposted. Author: 行者123. Last updated: 2023-12-04 17:08:13

I have RDD items such as:

(3922774869,10,1)
(3922774869,11,1)
(3922774869,12,2)
(3922774869,13,2)
(1779744180,10,1)
(1779744180,11,1)
(3922774869,14,3)
(3922774869,15,2)
(1779744180,16,1)
(3922774869,12,1)
(3922774869,13,1)
(1779744180,14,1)
(1779744180,15,1)
(1779744180,16,1)
(3922774869,14,2)
(3922774869,15,1)
(1779744180,16,1)
(1779744180,17,1)
(3922774869,16,4)
...

Each row is (id, age, count). I want to group these rows into a dataset in which each row gives the age distribution for one id, like this (each (id, age) pair is unique):
(1779744180, (10,1), (11,1), (12,2), (13,2) ...)
(3922774869, (10,1), (11,1), (12,3), (13,4) ...)

That is, (id, (age, count), (age, count) ...).
Could someone give me a hint?

Best answer

You can first reduce by both fields (id and age), then use groupByKey:

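rdd
  .map { case (id, age, count) => ((id, age), count) }   // key each row by (id, age)
  .reduceByKey(_ + _)                                     // sum the counts per (id, age)
  .map { case ((id, age), count) => (id, (age, count)) }  // re-key by id alone
  .groupByKey()                                           // collect the (age, count) pairs per id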

For the input above, this returns an RDD[(Long, Iterable[(Int, Int)])] containing the following two records:
(1779744180,CompactBuffer((16,3), (15,1), (14,1), (11,1), (10,1), (17,1)))
(3922774869,CompactBuffer((11,1), (12,3), (16,4), (13,3), (15,3), (10,1), (14,5)))
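
The same two steps can be checked without a cluster. The sketch below is not part of the original answer: it mirrors the pipeline on plain Scala collections, assuming the listed rows are the whole input. The names AgeDistribution, rows, and distribution are made up for illustration. It also sorts each id's pairs by age, since groupByKey does not guarantee any order inside the CompactBuffer.

```scala
object AgeDistribution {
  // Sample rows from the question, as (id, age, count).
  val rows: Seq[(Long, Int, Int)] = Seq(
    (3922774869L, 10, 1), (3922774869L, 11, 1), (3922774869L, 12, 2),
    (3922774869L, 13, 2), (1779744180L, 10, 1), (1779744180L, 11, 1),
    (3922774869L, 14, 3), (3922774869L, 15, 2), (1779744180L, 16, 1),
    (3922774869L, 12, 1), (3922774869L, 13, 1), (1779744180L, 14, 1),
    (1779744180L, 15, 1), (1779744180L, 16, 1), (3922774869L, 14, 2),
    (3922774869L, 15, 1), (1779744180L, 16, 1), (1779744180L, 17, 1),
    (3922774869L, 16, 4)
  )

  def distribution(rows: Seq[(Long, Int, Int)]): Map[Long, Seq[(Int, Int)]] = {
    // Step 1 (plays the role of reduceByKey): sum the counts per (id, age).
    val summed: Map[(Long, Int), Int] =
      rows
        .groupBy { case (id, age, _) => (id, age) }
        .map { case (key, group) => key -> group.map(_._3).sum }

    // Step 2 (plays the role of groupByKey): collect (age, count) pairs per id,
    // then sort them by age so the result is deterministic.
    summed.toSeq
      .map { case ((id, age), count) => (id, (age, count)) }
      .groupBy(_._1)
      .map { case (id, pairs) => id -> pairs.map(_._2).sortBy(_._1) }
  }

  def main(args: Array[String]): Unit =
    distribution(rows).toSeq.sortBy(_._1).foreach { case (id, dist) =>
      println(s"($id, ${dist.mkString(", ")})")
    }
}
```

In Spark itself, a similar ordering could be obtained by appending .mapValues(_.toSeq.sortBy(_._1)) after groupByKey().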

Regarding scala - Spark: group multiple RDD items by key, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36447057/
