
java - Apache Spark Map-Reduce explanation

Reposted. Author: 行者123. Updated: 2023-12-01 22:17:45

I'd like to understand how this small snippet works.

Suppose I have this text:

Ut quis pretium tellus. Fusce quis suscipit ipsum. Morbi viverra elit ut malesuada pellentesque. Fusce eu ex quis urna lobortis finibus. Integer aliquam faucibus neque id cursus. Nulla non massa odio. Fusce pretium felis felis, at malesuada felis blandit nec. Praesent ligula enim, gravida sit amet scelerisque eget, porta non mi. Aenean vitae maximus tortor, ac facilisis orci.

This snippet counts the occurrences of each word in the text above:

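import java.util.Arrays;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

// sc (a JavaSparkContext) and inputFile (a path String) are assumed to be defined elsewhere.
// Load input data.
JavaRDD<String> input = sc.textFile(inputFile);
// Split up into words.
JavaRDD<String> words = input.flatMap(new FlatMapFunction<String, String>() {
    // Spark 1.x signature; from Spark 2.0 on, call() returns an Iterator<String>,
    // e.g. Arrays.asList(x.split(" ")).iterator()
    public Iterable<String> call(String x) {
        return Arrays.asList(x.split(" "));
    }
});
// Transform into (word, 1) pairs, then add up the 1s for each word.
JavaPairRDD<String, Integer> counts = words.mapToPair(new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String x) {
        return new Tuple2<String, Integer>(x, 1);
    }
}).reduceByKey(new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer x, Integer y) {
        return x + y;
    }
});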
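To see the same data flow without a cluster, here is a minimal plain-Java sketch (no Spark involved) that mimics the snippet with `java.util.stream`: `flatMap` splits lines into words, and `Collectors.toMap` with `Integer::sum` as its merge function plays the combined role of `mapToPair` plus `reduceByKey`. The names `LocalWordCount` and `countWords` are hypothetical, and a single in-memory stream does not partition or shuffle data the way Spark does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java (no Spark) analogue of flatMap -> mapToPair -> reduceByKey.
// Class and method names are hypothetical, for illustration only.
class LocalWordCount {

    static Map<String, Integer> countWords(List<String> lines) {
        return lines.stream()
                // flatMap: each line becomes many words, split on a single space
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toMap(
                        Function.identity(), // key: the word itself
                        word -> 1,           // value: the 1 from mapToPair
                        Integer::sum,        // merge for equal keys: the x + y from reduceByKey
                        TreeMap::new));      // sorted map, only for readable output
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Ut quis pretium tellus. Fusce quis suscipit ipsum.",
                "Fusce eu ex quis urna lobortis finibus.");
        System.out.println(countWords(lines));
    }
}
```

Run on the full sample paragraph, `quis` should come out as 3, the same total the answer's diagram arrives at.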

This line is easy to understand:

JavaRDD<String> words = input.flatMap(new FlatMapFunction<String, String>() {
    public Iterable<String> call(String x) {
        return Arrays.asList(x.split(" "));
    }
});

It creates a dataset containing every word of the input, split on spaces.
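A quick way to check what that `flatMap` function produces is to run its body on one sentence of the sample text. The plain-Java sketch below does that; it also exposes a side effect of `split(" ")`: punctuation stays attached, so `felis,` and `felis` end up as different keys. The `splitLettersOnly` variant is a hypothetical alternative, not part of the original snippet.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java check of the flatMap body from the question (no Spark needed).
class SplitSketch {

    // Same logic as FlatMapFunction.call(): split one line on single spaces.
    static List<String> splitLine(String line) {
        return Arrays.asList(line.split(" "));
    }

    // Hypothetical alternative: split on runs of non-letters and drop empty tokens,
    // so "felis," and "felis" become the same word.
    static List<String> splitLettersOnly(String line) {
        return Arrays.stream(line.split("[^\\p{L}]+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String line = "Fusce pretium felis felis, at malesuada felis blandit nec.";
        System.out.println(splitLine(line));        // "felis," is its own token
        System.out.println(splitLettersOnly(line)); // punctuation removed
    }
}
```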

This line turns each word into a tuple with the value 1, for example:

JavaPairRDD<String, Integer> counts = words.mapToPair(new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String x) {
        return new Tuple2<String, Integer>(x, 1);
    }
})  // .reduceByKey(...) follows in the full snippet

(Ut, 1)
(quis, 1)
... and so on
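At this stage nothing has been counted yet: every occurrence of a word simply becomes its own `(word, 1)` pair, duplicates included. A plain-Java sketch of that step, using `AbstractMap.SimpleEntry` as a stand-in for Spark's `scala.Tuple2` (an assumption made so the example runs without Spark); `MapToPairSketch` and `toPairs` are hypothetical names:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of mapToPair: each word becomes a (word, 1) pair.
// AbstractMap.SimpleEntry stands in for scala.Tuple2 so this runs without Spark.
class MapToPairSketch {

    static List<Map.Entry<String, Integer>> toPairs(List<String> words) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : words) {
            pairs.add(new AbstractMap.SimpleEntry<>(w, 1)); // like new Tuple2<>(w, 1)
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Duplicates are kept: "quis" appears twice, each with its own 1.
        System.out.println(toPairs(Arrays.asList("Ut", "quis", "pretium", "quis")));
        // prints [Ut=1, quis=1, pretium=1, quis=1]
    }
}
```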

What confuses me is how reduceByKey works and how it ends up counting the occurrences of each word.

Thanks in advance.

Best answer

reduceByKey groups the tuples by key (the first element of each tuple) and then reduces each group with the function you pass in, here x + y.

Like this:

(Ut, 1), (quis, 1), ..., (quis, 1), ..., (quis, 1), ...    mapToPair
              \             /                |
                \         /                  |             reduceByKey
                     +                       |
                (quis, 1+1)                  |
                       \                   /
                          \             /
                                 +
                            (quis, 2+1)
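The diagram can be mimicked in plain Java. The sketch below is a simplified single-JVM model, not Spark's actual implementation: it assumes the pairs are spread over two hypothetical partitions, reduces each partition locally (the `(quis, 1+1)` step), then merges the partial results per key (the `(quis, 2+1)` step). Spark's `reduceByKey` does merge values locally on each mapper before the shuffle, which is why the function passed to it should be associative and commutative; `x + y` is both. `ReduceByKeySketch`, `pair`, and `reduceLocally` are hypothetical names.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Simplified single-JVM model of reduceByKey; names and partitioning are hypothetical.
// Values are first combined inside each partition, then partial results are merged per key.
class ReduceByKeySketch {

    // Helper standing in for new Tuple2<>(key, value).
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Step 1: combine values with f inside one partition, e.g. (quis, 1+1).
    static Map<String, Integer> reduceLocally(List<Map.Entry<String, Integer>> partition,
                                              BinaryOperator<Integer> f) {
        Map<String, Integer> partial = new HashMap<>();
        for (Map.Entry<String, Integer> p : partition) {
            // merge() stores the value if the key is new, otherwise stores f(old, value)
            partial.merge(p.getKey(), p.getValue(), f);
        }
        return partial;
    }

    // Step 2: merge the partial results of all partitions per key, e.g. (quis, 2+1).
    static Map<String, Integer> reduceByKey(List<List<Map.Entry<String, Integer>>> partitions,
                                            BinaryOperator<Integer> f) {
        Map<String, Integer> result = new HashMap<>();
        for (List<Map.Entry<String, Integer>> partition : partitions) {
            reduceLocally(partition, f).forEach((k, v) -> result.merge(k, v, f));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> partition1 = Arrays.asList(pair("Ut", 1), pair("quis", 1), pair("quis", 1));
        List<Map.Entry<String, Integer>> partition2 = Arrays.asList(pair("Fusce", 1), pair("quis", 1));
        Map<String, Integer> counts = reduceByKey(Arrays.asList(partition1, partition2), Integer::sum);
        System.out.println(counts.get("quis")); // prints 3
    }
}
```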

Regarding java - Apache Spark Map-Reduce explanation, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30644361/
