
python-2.7 - How to convert (key, value) pairs to just values using map() in PySpark


I have this code in PySpark:

wordsList = ['cat', 'elephant', 'rat', 'rat', 'cat']
wordsRDD = sc.parallelize(wordsList, 4)

# pair each word with a 1, then sum the 1s per word
wordPairs = wordsRDD.map(lambda w: (w, 1))
wordCounts = wordPairs.reduceByKey(lambda x, y: x + y)
print wordCounts.collect()

#PRINTS--> [('rat', 2), ('elephant', 1), ('cat', 2)]

from operator import add
totalCount = (wordCounts
              .map(<< FILL IN >>)
              .reduce(<< FILL IN >>))

#SHOULD PRINT 5

# wordCounts.values().sum() does the trick, but I want to do this with map() and reduce()


I need to use a reduce() action to sum the counts in wordCounts and then divide by the number of unique words.

* But first I need to map() the RDD wordCounts, which consists of (key, value) pairs, into an RDD of just the values.

This is where I'm stuck. I tried things like the following, but none of them worked:

.map(lambda x:x.values())
.reduce(lambda x:sum(x)))

and:

.map(lambda d:d[k] for k in d)
.reduce(lambda x:sum(x)))

Any help with this would be greatly appreciated!

Best Answer

I finally got the answer. It's this:

totalCount = (wordCounts
              .map(lambda x: x[1])          # keep just the value from each (key, value) tuple
              .reduce(lambda x, y: x + y))  # sum the values

print totalCount  # 5
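For context, a minimal sketch of the remaining step the question mentions (summing with the already-imported operator.add, then dividing by the number of unique words). The names numUniqueWords and average are illustrative, not from the original post:

from operator import add

totalCount = (wordCounts
              .map(lambda x: x[1])  # (key, value) -> value
              .reduce(add))         # same as lambda x, y: x + y

numUniqueWords = wordCounts.count()           # 3 unique words
average = totalCount / float(numUniqueWords)  # float() avoids Python 2 integer division
print average  # 1.666...

The original attempts fail because each element of wordCounts is a plain Python tuple such as ('cat', 2), not a dict, so x.values() raises an AttributeError, and reduce() needs a function of two arguments, not one.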

Regarding python-2.7 - How to convert (key, value) pairs to just values using map() in PySpark, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31178740/
