
python - PySpark reduceByKey with dictionaries

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:36:52

Why does Spark force the RDD to be built from a list of tuples when a reduceByKey transformation is applied?

reduce_rdd = sc.parallelize([{'k1': 1}, {'k2': 2}, {'k1': -2}, {'k3': 4}, {'k2': -5}, {'k1': 4}])
print(reduce_rdd.reduceByKey(lambda x, y: x + y).take(100))

Error:

for k, v in iterator:
ValueError: need more than 1 value to unpack

at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:342)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

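If reduceByKey() is meant to operate on a collection of key-value pairs, then it seems obvious to me that each pair should live in the Python object type meant for key-value pairs: a dictionary, not a tuple.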

Best answer

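reduceByKey works on pair RDDs. A pair RDD is effectively a distributed version of a list of tuples. Because such a structure is easy to partition, it is a natural choice for distributed computation over key-value data.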
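
To see why the dictionaries in the question trigger the error, here is a minimal plain-Python sketch of the `for k, v in iterator` unpacking shown in the traceback. It does not use Spark, and `unpack_pairs` is a hypothetical helper written only for illustration, not part of PySpark. It was written for Python 3, where the error message is worded slightly differently from the Python 2 message in the question.

```python
def unpack_pairs(records):
    """Mimic the `for k, v in iterator` loop from the traceback (hypothetical helper)."""
    pairs = []
    for k, v in records:  # each record must unpack into exactly two values
        pairs.append((k, v))
    return pairs


# Tuples unpack cleanly into a key and a value.
tuple_pairs = unpack_pairs([('k1', 1), ('k2', 2)])

# Iterating a one-entry dict yields only its key, so there is nothing to bind to `v`.
try:
    unpack_pairs([{'k1': 1}])
    dict_error = None
except ValueError as exc:
    dict_error = exc

print(tuple_pairs)
print(type(dict_error).__name__)
```

Iterating a dict yields its keys, never (key, value) pairs, so a one-entry dict cannot stand in for a two-element tuple here.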

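Some projects implement IndexedRDD, but at the time of writing they have not been integrated into the spark-core code. If you are interested, you can install a PySpark version of IndexedRDD from this GitHub repository.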

Back to your question: it can be solved easily without IndexedRDD:

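reduce_rdd = sc.parallelize([{'k1': 1}, {'k2': 2}, {'k1': -2},
                             {'k3': 4}, {'k2': -5}, {'k1': 4}])
# Turn each one-entry dict into a (key, value) tuple, then sum the values per key.
# list(...) is needed on Python 3, where dict.items() returns a view that cannot be indexed.
reduce_rdd.map(lambda x: list(x.items())[0]).reduceByKey(lambda x, y: x + y).collectAsMap()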

This returns the following output:

{'k1': 3, 'k2': -3, 'k3': 4}
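
The result can be checked locally without a cluster. The following is a rough, sequential sketch of what the map, reduceByKey and collectAsMap chain computes. It is not how Spark implements it: Spark merges values per partition and then across partitions, which is why the reduce function should be associative and commutative. `local_reduce_by_key` is a hypothetical helper name, not a PySpark API.

```python
def local_reduce_by_key(pairs, func):
    """Fold the values of equal keys together, one pair at a time (illustrative only)."""
    result = {}
    for key, value in pairs:
        if key in result:
            result[key] = func(result[key], value)
        else:
            result[key] = value
    return result


records = [{'k1': 1}, {'k2': 2}, {'k1': -2},
           {'k3': 4}, {'k2': -5}, {'k1': 4}]

# Flatten every dict into its (key, value) tuples; this also covers dicts with several entries.
pairs = [pair for record in records for pair in record.items()]

totals = local_reduce_by_key(pairs, lambda x, y: x + y)
print(totals)
```

Unlike the `[0]` indexing used in the answer, flattening all items keeps every entry if a dict happens to hold more than one key.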

Regarding python - PySpark reduceByKey with dictionaries, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48320449/
