
java - Task not serializable in Spark


I have a transformation like this:

JavaRDD<Tuple2<String, Long>> mappedRdd = myRDD.values().map(
    new Function<Pageview, Tuple2<String, Long>>() {
      @Override
      public Tuple2<String, Long> call(Pageview pageview) throws Exception {
        String key = pageview.getUrl().toString();
        Long value = getDay(pageview.getTimestamp());
        return new Tuple2<>(key, value);
      }
    });

Pageview is of the following type: Pageview.java

Then I register that class with Spark:

Class[] c = new Class[1];
c[0] = Pageview.class;
sparkConf.registerKryoClasses(c);
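Note that registerKryoClasses only affects how RDD *data* is serialized; Spark always serializes task closures (the Function passed to map) with the built-in Java serializer, regardless of this setting. A sketch of the fuller Kryo configuration, assuming a SparkConf built at this point in the driver (the serializer class name is Spark's standard one):

```java
// Kryo applies to shuffled/cached data, not to task closures.
// Closures still go through the Java serializer.
SparkConf sparkConf = new SparkConf()
    .setAppName("ExampleSpark")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .registerKryoClasses(new Class[]{ Pageview.class });
```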

When I run the job, I get the following exception:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
	at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
	at org.apache.spark.rdd.RDD.map(RDD.scala:286)
	at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:89)
	at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:46)
	at org.apache.gora.tutorial.log.ExampleSpark.run(ExampleSpark.java:100)
	at org.apache.gora.tutorial.log.ExampleSpark.main(ExampleSpark.java:53)
Caused by: java.io.NotSerializableException: org.apache.gora.tutorial.log.ExampleSpark
Serialization stack:
	- object not serializable (class: org.apache.gora.tutorial.log.ExampleSpark, value: org.apache.gora.tutorial.log.ExampleSpark@1a2b4497)
	- field (class: org.apache.gora.tutorial.log.ExampleSpark$1, name: this$0, type: class org.apache.gora.tutorial.log.ExampleSpark)
	- object (class org.apache.gora.tutorial.log.ExampleSpark$1, org.apache.gora.tutorial.log.ExampleSpark$1@4ab2775d)
	- field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
	- object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, )
	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:38)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80)
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)
	... 7 more

When I debug the code, I can see that JavaSerializer.scala is invoked, even though a class named KryoSerializer exists.
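The decisive line in the trace is "field (class: ...ExampleSpark$1, name: this$0, ...)": the anonymous Function is a non-static inner class, so it carries a hidden this$0 reference to the enclosing ExampleSpark instance, and Java serialization of the closure then tries, and fails, to serialize that whole outer object. A minimal plain-Java sketch of the same mechanism, with no Spark involved (all class names here are made up for illustration):

```java
import java.io.*;

public class CaptureDemo {
    // Serializable functional interface, standing in for Spark's Function.
    interface Task extends Serializable { String run(); }

    // Deliberately NOT Serializable, mirroring ExampleSpark in the trace.
    static class Outer {
        private final String tag = "outer";

        Task anonymousTask() {
            // Anonymous class in instance context: the compiler adds a
            // hidden this$0 field pointing at this non-serializable Outer.
            return new Task() {
                @Override public String run() { return tag; }
            };
        }

        static Task staticTask() {
            // Static context: no enclosing instance is captured.
            return new Task() {
                @Override public String run() { return "static"; }
            };
        }
    }

    // Try to Java-serialize an object, the same way Spark's ClosureCleaner does.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;  // same failure the Spark trace reports
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializes(new Outer().anonymousTask())); // false
        System.out.println(serializes(Outer.staticTask()));          // true
    }
}
```

This is also why the Kryo registration above cannot help: the failure happens while serializing the closure, before any Pageview data is touched.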

PS 1: I do not want to use the Java serializer, but implementing Serializable on Pageview does not solve the problem.

PS 2: This does not solve the problem either:

...
//String key = pageview.getUrl().toString();
//Long value = getDay(pageview.getTimestamp());
String key = "Dummy";
Long value = 1L;
return new Tuple2<>(key, value);
...

Best answer

I have run into this problem many times in Java code. Even when using Java serialization, I would either make the class containing the code Serializable or, if you do not want to do that, make the Function a static member of the class. A static member carries no hidden reference to an enclosing instance, so only the function object itself needs to be serializable.

Here is a code snippet of such a solution:

public class Test {
  // Static member: no this$0 reference to an enclosing Test instance
  // is captured, so the closure serializes cleanly.
  private static Function<Pageview, Tuple2<String, Long>> s =
      new Function<Pageview, Tuple2<String, Long>>() {
        @Override
        public Tuple2<String, Long> call(Pageview pageview) throws Exception {
          String key = pageview.getUrl().toString();
          Long value = getDay(pageview.getTimestamp());  // getDay must be static too
          return new Tuple2<>(key, value);
        }
      };
}
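With the Function held as a static member (and getDay made static as well), the transformation from the question can reference it without capturing the enclosing class. A sketch assuming the names above, with the field given non-private visibility or exposed through a getter:

```java
// Only the static function object is serialized; no this$0 is involved.
JavaRDD<Tuple2<String, Long>> mappedRdd = myRDD.values().map(Test.s);
```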

A similar question about java - Task not serializable in Spark can be found on Stack Overflow: https://stackoverflow.com/questions/31105400/
