
apache-spark - Spark flatMap gives an Iterator error


If I apply flatMap over a JSONArray to emit JSONObjects, I get an error. Running locally (on my laptop) from Eclipse works fine, but running on the cluster (YARN) gives a strange error. Spark version: 2.0.0.

Code:

JavaRDD<JSONObject> rdd7 = rdd6.flatMap(new FlatMapFunction<JSONArray, JSONObject>() {
    @Override
    public Iterable<JSONObject> call(JSONArray array) throws Exception {
        List<JSONObject> list = new ArrayList<JSONObject>();
        for (int i = 0; i < array.length(); i++) {
            list.add(array.getJSONObject(i));
        }
        return list;
    }
});

Error log:

java.lang.AbstractMethodError: com.pwc.spark.tifcretrolookup.TIFCRetroJob$2.call(Ljava/lang/Object;)Ljava/util/Iterator;
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
at com.pwc.spark.ElasticsearchClientLib.CommonESClient.index(CommonESClient.java:33)
at com.pwc.spark.ElasticsearchClientLib.ESClient.call(ESClient.java:34)
at com.pwc.spark.ElasticsearchClientLib.ESClient.call(ESClient.java:15)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:218)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:218)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:883)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:883)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Best Answer

As of Spark 2.0.0, the function passed to flatMap must return an Iterator rather than an Iterable, as described in the release notes:

Java RDD’s flatMap and mapPartitions functions used to require functions returning Java Iterable. They have been updated to require functions returning Java iterator so the functions do not need to materialize all the data.

Here is the relevant Jira issue.
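Applied to the snippet from the question, the fix is to change the declared return type of call to Iterator<JSONObject> and return list.iterator(). A minimal sketch, reusing the question's rdd6/rdd7 names and assuming org.json's JSONArray/JSONObject:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.spark.api.java.function.FlatMapFunction;

JavaRDD<JSONObject> rdd7 = rdd6.flatMap(new FlatMapFunction<JSONArray, JSONObject>() {
    @Override
    public Iterator<JSONObject> call(JSONArray array) throws Exception { // Iterator, not Iterable
        List<JSONObject> list = new ArrayList<JSONObject>();
        for (int i = 0; i < array.length(); i++) {
            list.add(array.getJSONObject(i));
        }
        return list.iterator(); // Spark 2.x's FlatMapFunction expects an Iterator
    }
});

This also explains the symptom in the question: an AbstractMethodError at runtime typically means the job was compiled against one Spark version (here 1.x, whose FlatMapFunction.call returned Iterable) but executed against another (2.0.0 on the cluster). Recompiling against Spark 2.0.0 surfaces the required signature change at compile time instead.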

Regarding "apache-spark - Spark flatMap gives an Iterator error", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39275669/
