
cassandra - Zeppelin Spark RDD commands fail but work in spark-shell


I have set up a standalone single-node "cluster" running the following:

  • Cassandra 2.2.2
  • Spark 1.5.1
  • a fat jar built for Spark-Cassandra-Connector 1.5.0-M2
  • a Zeppelin 0.6 snapshot, built with:
    mvn -Pspark-1.5 -Dspark.version=1.5.1 -Dhadoop.version=2.6.0 -Phadoop-2.4 -DskipTests clean package

  • I can retrieve data from Cassandra just fine using the spark-shell.

    I have changed zeppelin-env.sh as follows:
    export MASTER=spark://localhost:7077
    export SPARK_HOME=/root/spark-1.5.1-bin-hadoop2.6/
    export ZEPPELIN_PORT=8880
    export ZEPPELIN_JAVA_OPTS="-Dspark.jars=/opt/sparkconnector/spark-cassandra-connector-assembly-1.5.0-M2-SNAPSHOT.jar -Dspark.cassandra.connection.host=localhost"
    export ZEPPELIN_NOTEBOOK_DIR="/root/gowalla-spark-demo/notebooks/zeppelin"
    export SPARK_SUBMIT_OPTIONS="--jars /opt/sparkconnector/spark-cassandra-connector-assembly-1.5.0-M2-SNAPSHOT.jar --deploy-mode cluster"
    export ZEPPELIN_INTP_JAVA_OPTS=$ZEPPELIN_JAVA_OPTS
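
    As an aside, Zeppelin builds from that era can also load the connector jar from a notebook paragraph instead of zeppelin-env.sh, via the %dep interpreter. A minimal sketch, assuming it runs in the first paragraph before the Spark context starts (the jar path is the one from the config above):

    %dep
    z.reset()
    z.load("/opt/sparkconnector/spark-cassandra-connector-assembly-1.5.0-M2-SNAPSHOT.jar")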

    I then started adding paragraphs to a notebook, first importing the following:
    import com.datastax.spark.connector._
    import com.datastax.spark.connector.cql._
    import com.datastax.spark.connector.rdd.CassandraRDD
    import org.apache.spark.rdd.RDD
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkConf

    I'm not sure whether all of these are necessary. This paragraph runs fine.

    Then I execute the following:
    val checkins = sc.cassandraTable("lbsn", "checkins")

    This runs fine and returns:
    checkins: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:15

    The next paragraph then runs the following two statements; the first succeeds and the second fails:
    checkins.count
    checkins.first

    Result:
    res13: Long = 138449
    com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
    at [Source: {"id":"4","name":"first"}; line: 1, column: 1]
    at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
    at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
    at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
    at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
    at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
    at org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:82)
    at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1582)
    at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1582)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.RDD.<init>(RDD.scala:1582)
    at com.datastax.spark.connector.rdd.CassandraRDD.<init>(CassandraRDD.scala:15)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.<init>(CassandraTableScanRDD.scala:59)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.copy(CassandraTableScanRDD.scala:92)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.copy(CassandraTableScanRDD.scala:59)
    at com.datastax.spark.connector.rdd.CassandraRDD.limit(CassandraRDD.scala:103)
    at com.datastax.spark.connector.rdd.CassandraRDD.take(CassandraRDD.scala:122)
    at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1312)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1311)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
    at $iwC$$iwC$$iwC.<init>(<console>:57)
    at $iwC$$iwC.<init>(<console>:59)
    at $iwC.<init>(<console>:61)
    at <init>(<console>:63)
    at .<init>(<console>:67)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:655)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:620)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:613)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:276)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

    Why does the call to first fail? Calls such as sc.textFile fail as well.

    The following also works:
    checkins.where("year = 2010 and month=2 and day>12 and day<15").count()

    But this does not:
    checkins.where("year = 2010 and month=2 and day>12 and day<15").first()
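
    Reading the stack trace, the failures share one pattern: any call that builds a new RDD on the driver inside RDD.withScope makes the new RDD's constructor deserialize its parent RDDOperationScope from JSON via Jackson (RDDOperationScope.fromJson in the trace above), while count reuses the existing RDD and never takes that path. A sketch of the contrast, annotated against the trace (the file path is hypothetical):

    checkins.count  // builds no new RDD on the driver: succeeds
    checkins.first  // take -> limit -> copy constructs a new CassandraTableScanRDD,
                    // whose constructor calls RDDOperationScope.fromJson: fails
    sc.textFile("/tmp/example.txt")  // also constructs a new RDD under withScope: fails the same way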

    Please help, this is driving me crazy, especially since the spark-shell works and Zeppelin does not, or at least seems partially broken.

    Thanks

    Best Answer

    com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
    at [Source: {"id":"4","name":"first"}; line: 1, column: 1]

    This exception occurs when two or more versions of the Jackson library are on the classpath.

    Make sure your Spark interpreter process has only one version of the Jackson library on its classpath.
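
    One way to check is to ask the interpreter which jar each relevant class is loaded from, using plain JVM reflection. A minimal sketch to run in a Zeppelin paragraph (and again in spark-shell for comparison); whereIs is a hypothetical helper written for this check, not a Spark or Zeppelin API:

    // Report which jar (or classpath entry) supplies a given class.
    def whereIs(className: String): String = {
      val source = Class.forName(className).getProtectionDomain.getCodeSource
      if (source == null) "bootstrap classpath" else source.getLocation.toString
    }

    println(whereIs("com.fasterxml.jackson.databind.ObjectMapper"))
    println(whereIs("org.apache.spark.rdd.RDDOperationScope"))

    If the Jackson classes resolve to a different jar in Zeppelin than in spark-shell (for example, to the connector assembly instead of Spark's own Jackson), two versions are being mixed; rebuilding the assembly with Jackson excluded or shaded, or removing the duplicate jar from SPARK_SUBMIT_OPTIONS/ZEPPELIN_JAVA_OPTS, usually resolves it.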

    Regarding "cassandra - Zeppelin Spark RDD commands fail but work in spark-shell", this comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/33141550/
