According to the documentation, you can tell Spark to track "out of scope" checkpoints (checkpoints that are no longer needed) and clean them from disk.
SparkSession.builder
...
.config("spark.cleaner.referenceTracking.cleanCheckpoints", "true")
.getOrCreate()
If not, is there any way to get the name of the temporary folder created for a particular application (e.g. 0c514fb8-498c-4455-b147-aff242bd7381) from SparkContext, the same way you can get the applicationId?
Best answer
I know it's an old question, but I was recently exploring checkpoints and ran into a similar issue. Sharing my findings here.
Question: Is there any configuration I am missing to perform all cleanup?
Setting spark.cleaner.referenceTracking.cleanCheckpoints=true works sometimes, but it is hard to rely on. The official documentation says that with this property Spark will "clean checkpoint files if the reference is out of scope".
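Since the property cannot be relied on, one pragmatic workaround (my own sketch, not from the documentation) is to register an explicit cleanup of the checkpoint directory to run at driver exit, using Python's atexit. The `delete_fn` parameter is an assumed callable; in a real job it could wrap the org.apache.hadoop.fs.FileSystem.delete call shown further below.

```python
import atexit

def register_checkpoint_cleanup(delete_fn, checkpoint_dir):
    """Schedule checkpoint_dir for deletion when the driver process exits.

    delete_fn is any callable taking a path string (an assumption for this
    sketch); in a real job it could wrap FileSystem.delete.
    """
    def _cleanup():
        delete_fn(checkpoint_dir)
    atexit.register(_cleanup)
    return _cleanup  # also returned so it can be invoked manually

# Usage with a stand-in delete function that just records the path:
deleted = []
cleanup = register_checkpoint_cleanup(deleted.append, "/tmp/checkpoint/abc")
cleanup()        # in a real job, atexit would run this at shutdown
print(deleted)   # ['/tmp/checkpoint/abc']
```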
If there isn't: Is there any way to get the name of the temporary folder created for a particular application so I can programmatically delete it? I.e. get 0c514fb8-498c-4455-b147-aff242bd7381 from SparkContext the same way you can get the applicationId.
You can get the checkpointed directory as shown below:
// Set directory
scala> spark.sparkContext.setCheckpointDir("hdfs:///tmp/checkpoint/")
scala> spark.sparkContext.getCheckpointDir.get
res3: String = hdfs://<name-node:port>/tmp/checkpoint/625034b3-c6f1-4ab2-9524-e48dfde589c3
// It returns a String, so we can use org.apache.hadoop.fs to delete the path
# Set directory
>>> spark.sparkContext.setCheckpointDir('hdfs:///tmp/checkpoint')
>>> t = sc._jsc.sc().getCheckpointDir().get()
>>> t
u'hdfs://<name-node:port>/tmp/checkpoint/dc99b595-f8fa-4a08-a109-23643e2325ca'
# Note the leading 'u': the call returns a unicode object
# Steps to get the Hadoop FileSystem object and delete the path
>>> fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(sc._jsc.hadoopConfiguration())
>>> fs.exists(sc._jvm.org.apache.hadoop.fs.Path(str(t)))
True
>>> fs.delete(sc._jvm.org.apache.hadoop.fs.Path(str(t)))
True
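As for the second question: getCheckpointDir returns the full URI, so the per-application folder name (the trailing UUID) can be recovered with plain string handling, without any extra Spark API. A minimal sketch, using an illustrative URI of the same shape as the output above:

```python
import posixpath
from urllib.parse import urlparse

def checkpoint_folder_name(checkpoint_dir_uri):
    """Return the per-application UUID folder from a checkpoint dir URI."""
    path = urlparse(checkpoint_dir_uri).path.rstrip("/")
    return posixpath.basename(path)

# Illustrative URI; the host/port and UUID are placeholders
uri = "hdfs://namenode:8020/tmp/checkpoint/dc99b595-f8fa-4a08-a109-23643e2325ca"
print(checkpoint_folder_name(uri))  # dc99b595-f8fa-4a08-a109-23643e2325ca
```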
Regarding "apache-spark - PySpark: full cleanup of checkpoints", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52630858/