java - Spark : Combine two Java object RDDs into one

I have two JavaRDDs of the same object type, and I want to merge their data into a single RDD. These are:

Domain

public class User {
String name;
String email;
String profession;
Integer age;

// constructor

// setters and getters
}
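
The constructor and accessors are elided above. Here is a minimal sketch of how the class could be completed so that the two-, three- and four-argument calls used in this question compile. The constructor overloads, the Serializable marker and the getter bodies are assumptions, not part of the original post.

```java
import java.io.Serializable;

// Hypothetical completion of User; only the four field names come from the question.
// Serializable is an assumption: Spark's default Java serializer needs it
// when records are shuffled between executors.
class User implements Serializable {
    private final String name;
    private final String email;
    private final String profession;
    private final Integer age;

    // Shape used for RDD 1: name and email only.
    User(String name, String email) {
        this(name, email, null, null);
    }

    // Shape used for RDD 2: email, profession and age only.
    User(String email, String profession, Integer age) {
        this(null, email, profession, age);
    }

    // Shape used for the desired combined record.
    User(String name, String email, String profession, Integer age) {
        this.name = name;
        this.email = email;
        this.profession = profession;
        this.age = age;
    }

    String getName() { return name; }
    String getEmail() { return email; }
    String getProfession() { return profession; }
    Integer getAge() { return age; }
}
```

With these overloads, the fields that a record does not carry are simply left as null, which is what makes a later field-by-field merge possible.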

RDD 1

User user1 = new User ("Name", "email@email.com");
User user2 = new User ("Name2", "email2@email.com");

List<User> userList = new ArrayList<>();
userList.add(user1);
userList.add(user2);

JavaRDD<User> leftUserJavaRDD = sc.parallelize(userList);

RDD 2

User user3 = new User ("email@email.com", "Software Engineer", 26);
User user4 = new User ("email2@email.com", "Lawyer", 35);

List<User> userList2 = new ArrayList<>();
userList2.add(user3);
userList2.add(user4);

JavaRDD<User> rightUserJavaRDD = sc.parallelize(userList2);

I want to combine the two RDDs on their shared email address. The combined RDD I want is:

User user1and3 = new User (
"Name",
"email@email.com",
"Software Engineer",
26);

User user2and4 = new User (
"Name2",
"email2@email.com",
"Lawyer",
35);

How can I do this in Spark using Java? I tried union and cartesian, but without success.

Best answer

I got help from a colleague, and this is the solution we came up with.

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function2;
import scala.Tuple2;

import java.util.List;

public JavaRDD<User> getCombinedUsers(JavaRDD<User> leftUserJavaRDD, JavaRDD<User> rightUserJavaRDD) {

JavaPairRDD<String, User> leftUserJavaPairRDD =
leftUserJavaRDD.mapToPair(user -> new Tuple2<>(user.getEmail(), user));

JavaPairRDD<String, User> rightUserJavaPairRDD =
rightUserJavaRDD.mapToPair(user -> new Tuple2<>(user.getEmail(), user));

return leftUserJavaPairRDD
.union(rightUserJavaPairRDD)
.reduceByKey(merge).values();
}

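/**
* Reduce function that merges two partial User records sharing the same email.
* reduceByKey does not guarantee the order of its arguments, so each field is
* taken from whichever side has it set.
*/
private static Function2<User, User, User> merge =
(User left, User right) -> new User(
left.getName() != null ? left.getName() : right.getName(),
left.getEmail(),
left.getProfession() != null ? left.getProfession() : right.getProfession(),
left.getAge() != null ? left.getAge() : right.getAge());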
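
To see the key-then-reduce idea without a Spark cluster, here is a plain-Java sketch that mimics the pipeline locally. It uses Map.merge as a stand-in for reduceByKey. The class name CombineSketch, the nested User stand-in and the helper firstNonNull are hypothetical names introduced only for this illustration. The merge picks each field from whichever side has it, so the result should not depend on which record arrives first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

class CombineSketch {

    // Hypothetical stand-in for the question's User class.
    static class User {
        final String name;
        final String email;
        final String profession;
        final Integer age;

        User(String name, String email, String profession, Integer age) {
            this.name = name;
            this.email = email;
            this.profession = profession;
            this.age = age;
        }
    }

    // Returns a if it is set, otherwise b.
    static <T> T firstNonNull(T a, T b) {
        return a != null ? a : b;
    }

    // Field-by-field merge; gives the same result in either argument order
    // as long as each field is set on at most one of the two records.
    static final BinaryOperator<User> MERGE = (a, b) -> new User(
            firstNonNull(a.name, b.name),
            a.email,
            firstNonNull(a.profession, b.profession),
            firstNonNull(a.age, b.age));

    // Local analogue of: key by email, union both sides, reduce by key, take values.
    static List<User> combine(List<User> first, List<User> second) {
        Map<String, User> byEmail = new LinkedHashMap<>();
        for (User u : first) {
            byEmail.merge(u.email, u, MERGE);
        }
        for (User u : second) {
            byEmail.merge(u.email, u, MERGE);
        }
        return new ArrayList<>(byEmail.values());
    }

    public static void main(String[] args) {
        List<User> left = List.of(
                new User("Name", "email@email.com", null, null),
                new User("Name2", "email2@email.com", null, null));
        List<User> right = List.of(
                new User(null, "email@email.com", "Software Engineer", 26),
                new User(null, "email2@email.com", "Lawyer", 35));

        // The two sides are deliberately passed in swapped order.
        for (User u : combine(right, left)) {
            System.out.println(u.name + ", " + u.email + ", " + u.profession + ", " + u.age);
        }
        // prints:
        // Name, email@email.com, Software Engineer, 26
        // Name2, email2@email.com, Lawyer, 35
    }
}
```

If both records could carry a value for the same field, this merge would need an explicit rule for which side wins; the sketch assumes the fields never overlap, as in the question's data.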

For more on "java - Spark : Combine two Java object RDDs into one", see the corresponding question on Stack Overflow: https://stackoverflow.com/questions/47205560/
