apache-spark - Error when using OFF_HEAP storage with Spark 1.4.0 and Tachyon 0.6.4


I am trying to persist my RDD with off-heap storage on Spark 1.4.0 and Tachyon 0.6.4, as follows:

// Read a Parquet file into a DataFrame
val a = sqlContext.parquetFile("a1.parquet")
// Request Tachyon-backed off-heap storage for the persisted blocks
a.persist(org.apache.spark.storage.StorageLevel.OFF_HEAP)
// Force materialization so the blocks are actually written
a.count()

After that, I get the following exception.

Any ideas on this?

15/06/16 10:14:53 INFO : Tachyon client (version 0.6.4) is trying to connect master @ localhost/127.0.0.1:19998
15/06/16 10:14:53 INFO : User registered at the master localhost/127.0.0.1:19998 got UserId 3
15/06/16 10:14:53 INFO TachyonBlockManager: Created tachyon directory at /tmp_spark_tachyon/spark-6b2512ab-7bb8-47ca-b6e2-8023d3d7f7dc/driver/spark-tachyon-20150616101453-ded3
15/06/16 10:14:53 INFO BlockManagerInfo: Added rdd_10_3 on ExternalBlockStore on localhost:33548 (size: 0.0 B)
15/06/16 10:14:53 INFO BlockManagerInfo: Added rdd_10_1 on ExternalBlockStore on localhost:33548 (size: 0.0 B)
15/06/16 10:14:53 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 5710423667942934352
org.apache.spark.storage.BlockNotFoundException: Block rdd_10_3 not found
at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:306)
at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:57)
at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:114)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)

I also tried the same thing with a text file and was able to persist it in Tachyon. The problem only occurs when persisting a DataFrame that was originally read from Parquet.
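
For reference, the text-file variant that did persist successfully looked roughly like this (a minimal sketch; the path is illustrative, not from the original post):

// Illustrative path; any plain text file reproduces the working case
val t = sc.textFile("data.txt")
t.persist(org.apache.spark.storage.StorageLevel.OFF_HEAP)
t.count()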

Best Answer

There appears to be a related bug report: https://issues.apache.org/jira/browse/SPARK-10314

Since there already seems to be a pull request for it, there is a chance the issue will be fixed soon.

From this thread, https://groups.google.com/forum/#!topic/tachyon-users/xb8zwqIjIa4 , it looks like Spark writes to Tachyon using the TRY_CACHE mode, so the data is lost when it is evicted from the cache.
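
Until that fix lands, one hedged workaround (assuming you mainly want to avoid recomputation rather than strictly needing off-heap storage) is to fall back to a storage level managed by Spark's own block manager:

import org.apache.spark.storage.StorageLevel

val a = sqlContext.parquetFile("a1.parquet")
// Serialized blocks stay under Spark's block manager and spill to local disk,
// so nothing is silently dropped by Tachyon's TRY_CACHE eviction.
a.persist(StorageLevel.MEMORY_AND_DISK_SER)
a.count()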

A similar question on Stack Overflow: https://stackoverflow.com/questions/30087056/
