
apache-spark - Randomly getting LeaseExpiredException in Spark Streaming


I have a Spark Streaming job (2.1.1 with Cloudera 5.12). The input is Kafka and the output is HDFS (Parquet format).
The problem is that I randomly get a LeaseExpiredException (not in every mini-batch):

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/qoe_fixe/data_tv/tmp/cleanData/_temporary/0/_temporary/attempt_20180629132202_0215_m_000000_0/year=2018/month=6/day=29/hour=11/source=LYO2/part-00000-c6f21a40-4088-4d97-ae0c-24fa463550ab.snappy.parquet (inode 135532024): File does not exist. Holder DFSClient_attempt_20180629132202_0215_m_000000_0_-1048963677_900 does not have any open files.



I am writing to HDFS using the Dataset API:
      if (!InputWithDatePartition.rdd.isEmpty())
        InputWithDatePartition
          .repartition(1)
          .write
          .partitionBy("year", "month", "day", "hour", "source")
          .mode("append")
          .parquet(cleanPath)

After a few hours of running, my job fails with this error.

Best Answer

Two jobs writing to the same directory share the same _temporary folder.
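
To make that concrete, here is a minimal sketch of the default FileOutputCommitter directory layout (the concrete path is taken from the exception above); every writer targeting the same output path stages its files under the same pending directory:

    import org.apache.hadoop.fs.Path

    // Default FileOutputCommitter layout: <outputPath>/_temporary/<appAttemptId>/...
    // Every job writing to cleanPath uses the same outputPath and (normally) app
    // attempt id 0, so they all stage their task files under the same directory tree.
    val outputPath = new Path("/user/qoe_fixe/data_tv/tmp/cleanData") // cleanPath from the question
    val pendingJobAttemptsPath = new Path(outputPath, "_temporary")   // shared by every writer
    val jobAttemptPath = new Path(pendingJobAttemptsPath, "0")        // .../cleanData/_temporary/0

    // When one job commits, cleanupJob() deletes pendingJobAttemptsPath recursively,
    // which also removes the attempt_* files another job still holds open on HDFS,
    // hence the "File does not exist" LeaseExpiredException.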

So when the first job finishes, this code is executed (FileOutputCommitter class):

  public void cleanupJob(JobContext context) throws IOException {
    if (hasOutputPath()) {
      Path pendingJobAttemptsPath = getPendingJobAttemptsPath();
      FileSystem fs = pendingJobAttemptsPath
          .getFileSystem(context.getConfiguration());
      // if job allow repeatable commit and pendingJobAttemptsPath could be
      // deleted by previous AM, we should tolerate FileNotFoundException in
      // this case.
      try {
        fs.delete(pendingJobAttemptsPath, true);
      } catch (FileNotFoundException e) {
        if (!isCommitJobRepeatable(context)) {
          throw e;
        }
      }
    } else {
      LOG.warn("Output Path is null in cleanupJob()");
    }
  }


It deletes pendingJobAttemptsPath (_temporary) while the second job is still running.
This may help:

Multiple spark jobs appending parquet data to same base path with partitioning
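
As a sketch of one possible mitigation (my own illustration under the assumption that the two writers cannot be serialized, not code from the linked answer): give each application its own base path so their _temporary directories never collide. The per-application subdirectory below is hypothetical.

    // Hypothetical per-application output path, so cleanupJob() of one writer can
    // never delete the _temporary directory another writer is still using.
    val cleanBase = "/user/qoe_fixe/data_tv/tmp/cleanData"
    val appCleanPath = s"$cleanBase/${spark.sparkContext.applicationId}"

    if (!InputWithDatePartition.rdd.isEmpty())
      InputWithDatePartition
        .repartition(1)
        .write
        .partitionBy("year", "month", "day", "hour", "source")
        .mode("append")
        .parquet(appCleanPath)

    // Readers can still load the per-application directories together, e.g.
    // spark.read.parquet(s"$cleanBase/application_1", s"$cleanBase/application_2")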

Regarding apache-spark - Randomly getting LeaseExpiredException in Spark Streaming, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51138051/
