
apache-spark - How do I convert Redis data into a Spark Dataset or DataFrame?


I am trying to use Redis as a source for Spark SQL, but I am stuck on how to convert the RDD. Here is my code:

    RDD<Tuple2<String, String>> rdd1 = rc.fromRedisKV("user:*", 3, redisConfig);

    // Map each Redis key/value pair to a Row by splitting the comma-separated value
    JavaRDD<Row> userRDD = rdd1.toJavaRDD().map(new Function<Tuple2<String, String>, Row>() {
        public Row call(Tuple2<String, String> tuple2) throws Exception {
            System.out.println(tuple2._2());
            return RowFactory.create(tuple2._2().split(","));
        }
    });

    // Schema for the user records
    List<StructField> structFields = new ArrayList<StructField>();
    structFields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
    structFields.add(DataTypes.createStructField("sex", DataTypes.StringType, false));
    structFields.add(DataTypes.createStructField("age", DataTypes.IntegerType, false));
    StructType structType = DataTypes.createStructType(structFields);

    // Create the DataFrame and register it as a temporary view for SQL
    Dataset<Row> ds = spark.createDataFrame(userRDD, structType);
    ds.createOrReplaceTempView("user");
    ds.printSchema();

    String sql = "select name, sex, age from user";

    List<Row> list2 = spark.sql(sql).collectAsList();

I get the following exception:

Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD

I don't know what to do next. Please help!

Best Answer

I finally found the cause: there was nothing wrong with my code, but I needed to upload my application's jar to the Spark server. The ClassCastException above is a common symptom of the executors not having the application's classes on their classpath, so the serialized map closure cannot be deserialized correctly on the workers.
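
For completeness, here is a minimal sketch, not from the original answer, of one way to ship the application jar to the executors by setting the spark.jars configuration when building the SparkSession. The class name, master URL, and jar path below are hypothetical placeholders; submitting the application with spark-submit distributes the main jar in the same way.

    import org.apache.spark.sql.SparkSession;

    public class RedisToDataFrameApp {
        public static void main(String[] args) {
            // "spark.jars" ships the listed jars to the driver and executors,
            // so the anonymous Function used in the map() closure can be
            // deserialized on the workers. The master URL and jar path are
            // placeholders; adjust them to your cluster.
            SparkSession spark = SparkSession.builder()
                    .appName("redis-to-dataframe")
                    .master("spark://your-master:7077")
                    .config("spark.jars", "/path/to/your-app.jar")
                    .getOrCreate();

            // ... build the Redis-backed RDD and DataFrame as in the question ...

            spark.stop();
        }
    }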

Regarding apache-spark - How do I convert Redis data into a Spark Dataset or DataFrame?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41648209/
