
python - PySpark reduceByKey? to add Key/Tuple


I have the following data, and what I am trying to do

[(13, 'D'), (14, 'T'), (32, '6'), (45, 'T'), (47, '2'), (48, '0'), (49, '2'), (50, '0'), (51, 'T'), (53, '2'), (54, '0'), (13, 'A'), (14, 'T'), (32, '6'), (45, 'A'), (47, '2'), (48, '0'), (49, '2'), (50, '0'), (51, 'X')]

is to count the instances of each value (a single string character) for every key. So I first did a map:

.map(lambda x: (x[0], [x[1], 1]))

Now I have it as a key/tuple:

[(13, ['D', 1]), (14, ['T', 1]), (32, ['6', 1]), (45, ['T', 1]), (47, ['2', 1]), (48, ['0', 1]), (49, ['2', 1]), (50, ['0', 1]), (51, ['T', 1]), (53, ['2', 1]), (54, ['0', 1]), (13, ['A', 1]), (14, ['T', 1]), (32, ['6', 1]), (45, ['A', 1]), (47, ['2', 1]), (48, ['0', 1]), (49, ['2', 1]), (50, ['0', 1]), (51, ['X', 1])]

I just can't figure out this last part: how to count the instances of that letter for each key. For example, key 13 will have 1 D and 1 A, while 14 will have 2 Ts, and so on.

Best Answer

I'm more familiar with Spark in Scala, so there may well be better ways than Counter to count the characters in the iterable produced by groupByKey, but here is one option:

from collections import Counter

rdd = sc.parallelize([(13, 'D'), (14, 'T'), (32, '6'), (45, 'T'), (47, '2'), (48, '0'), (49, '2'), (50, '0'), (51, 'T'), (53, '2'), (54, '0'), (13, 'A'), (14, 'T'), (32, '6'), (45, 'A'), (47, '2'), (48, '0'), (49, '2'), (50, '0'), (51, 'X')])
# group all characters for each key, then let Counter tally the occurrences
rdd.groupByKey().mapValues(Counter).collect()

[(48, Counter({'0': 2})),
(32, Counter({'6': 2})),
(49, Counter({'2': 2})),
(50, Counter({'0': 2})),
(51, Counter({'X': 1, 'T': 1})),
(53, Counter({'2': 1})),
(13, Counter({'A': 1, 'D': 1})),
(45, Counter({'A': 1, 'T': 1})),
(14, Counter({'T': 2})),
(54, Counter({'0': 1})),
(47, Counter({'2': 2}))]
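
Since the question title asks about reduceByKey, here is a minimal sketch of an alternative that makes the character part of the key and counts with reduceByKey; the intermediate names pairs and per_key are illustrative, not from the original answer:

from operator import add

# count each (key, char) pair, e.g. ((14, 'T'), 2)
pairs = rdd.map(lambda kv: ((kv[0], kv[1]), 1)).reduceByKey(add)

# fold the per-character counts back into one dict per original key
per_key = (pairs.map(lambda kc: (kc[0][0], {kc[0][1]: kc[1]}))
                .reduceByKey(lambda a, b: {**a, **b}))

per_key.collect()
# e.g. [(13, {'D': 1, 'A': 1}), (14, {'T': 2}), ...]

Because reduceByKey combines counts map-side before the shuffle, this can move less data than groupByKey when the same (key, char) pair occurs many times.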

Regarding python - PySpark reduceByKey? to add Key/Tuple, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29833576/
