
java - How to read a DataFrame inside an RDD operation

Reposted. Author: 太空宇宙. Updated: 2023-11-04 12:07:14

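Scenario: I have two lists of strings containing text file paths, list a and list b. I want the Cartesian product of a and b so that I can compare the corresponding DataFrames pairwise.

The approach I tried is to build the Cartesian product first, turn it into a pair RDD, and then apply an operation to each pair.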

List<String> a = Arrays.asList("/data/1.text", "/data/2.text", "/data/3.text");
List<String> b = Arrays.asList("/data/4.text", "/data/5.text", "/data/6.text");

JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
List<Tuple2<String, String>> cartesian = cartesian(a, b);
jsc.parallelizePairs(cartesian).filter(new Function<Tuple2<String, String>, Boolean>() {
    @Override
    public Boolean call(Tuple2<String, String> tup) throws Exception {
        Dataset<Row> text1 = spark.read().text(tup._1()); // this throws NullPointerException
        Dataset<Row> text2 = spark.read().text(tup._2());
        return text1.first().equals(text2.first()); // indicative comparison only
    }
});

I can also let Spark compute the Cartesian product, but the same problem occurs:

JavaRDD<String> sourceRdd = jsc.parallelize(a);
JavaRDD<String> allRdd = jsc.parallelize(b);

sourceRdd.cache().cartesian(allRdd).filter(new Function<Tuple2<String, String>, Boolean>() {
    @Override
    public Boolean call(Tuple2<String, String> tup) throws Exception {
        Dataset<Row> text1 = spark.read().text(tup._1()); // same issue
        Dataset<Row> text2 = spark.read().text(tup._2());
        return text1.first().equals(text2.first());
    }
});

Please suggest a good way to handle this.

Best answer

I'm not sure I fully understand your question. Here is a Cartesian product example using Spark and Java.

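import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CartesianDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("CartesianDemo").setMaster("local");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        // Input lists
        List<String> listOne = Arrays.asList("one", "two", "three", "four", "five");
        List<String> listTwo = Arrays.asList("ww", "xx", "yy", "zz");
        // Turn each list into an RDD
        JavaRDD<String> rddOne = jsc.parallelize(listOne);
        JavaRDD<String> rddTwo = jsc.parallelize(listTwo);
        // Cartesian product: every element of rddOne paired with every element of rddTwo
        JavaPairRDD<String, String> cartesianRDD = rddOne.cartesian(rddTwo);
        // Print each pair
        cartesianRDD.foreach(data -> {
            System.out.println("X=" + data._1() + " Y=" + data._2());
        });
        // Shut down the context
        jsc.stop();
        jsc.close();
    }
}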
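
A note on the NullPointerException in the question: a SparkSession is meant to be used on the driver only, so calling spark.read() inside a function passed to filter, which runs on the executors, is the likely cause. With only a handful of paths, one option is to build the pairs on the driver and run the comparison there in an ordinary loop. The following is a minimal plain-Java sketch of that shape, not a definitive implementation. The class name DriverSideCartesian, the helpers cartesian and matching, and the use of Map.Entry in place of scala.Tuple2 are all assumptions made to keep the example self-contained; in a Spark job the predicate would load both paths with spark.read().text(...) on the driver.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class DriverSideCartesian {

    // Hypothetical helper: builds every (left, right) pair in driver memory.
    static List<Map.Entry<String, String>> cartesian(List<String> left, List<String> right) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>(left.size() * right.size());
        for (String l : left) {
            for (String r : right) {
                pairs.add(new SimpleImmutableEntry<>(l, r));
            }
        }
        return pairs;
    }

    // Hypothetical helper: keeps the pairs accepted by the comparison.
    // In a Spark job the predicate would read both paths on the driver,
    // where the SparkSession is available.
    static List<Map.Entry<String, String>> matching(
            List<String> left, List<String> right, BiPredicate<String, String> sameContent) {
        List<Map.Entry<String, String>> result = new ArrayList<>();
        for (Map.Entry<String, String> pair : cartesian(left, right)) {
            if (sameContent.test(pair.getKey(), pair.getValue())) {
                result.add(pair);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("/data/1.text", "/data/2.text", "/data/3.text");
        List<String> b = Arrays.asList("/data/4.text", "/data/5.text", "/data/6.text");

        // Placeholder comparison: it only compares the path strings.
        // A real job would compare the loaded contents instead.
        BiPredicate<String, String> placeholder = (l, r) -> l.equals(r);

        System.out.println(cartesian(a, b).size() + " pairs");
        System.out.println(matching(a, b, placeholder).size() + " matches with the placeholder");
    }
}
```

Each DataFrame read and comparison is still executed by the cluster; only the loop over pairs runs on the driver, which is reasonable when the number of pairs is small.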

Regarding "java - How to read a DataFrame inside an RDD operation", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40230647/
