
python - Good output for RDD lineage / Spark operator graph

Repost. Author: 行者123. Updated: 2023-12-01 02:00:13

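I'm interested in a clear representation of the Spark RDD lineage or operator graph for educational purposes. I tried .toDebugString(), but I can't get it to print nicely (with line breaks and so on). What is going wrong here?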

Using Python version 3.6.4 (default, Mar  1 2018 18:36:42)

SparkSession available as 'spark'.
>>> sc
<SparkContext master=local[*] appName=PySparkShell>
>>> rdd = sc.parallelize(range(10000))
>>> rdd.toDebugString()
b'(4) PythonRDD[1] at RDD at PythonRDD.scala:48 []\n | ParallelCollectionRDD[0] at parallelize at PythonRDD.scala:175 []'
>>> print(rdd.toDebugString())
b'(4) PythonRDD[1] at RDD at PythonRDD.scala:48 []\n | ParallelCollectionRDD[0] at parallelize at PythonRDD.scala:175 []'
>>>

Besides the debug string, are there even better ways of visualizing the graph?

Best answer

but I'm having trouble getting it pretty-printed

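Because toDebugString() returns a bytes object here, print() shows its repr, with the newline left as a literal \n. Just decode the result: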

>>> print(rdd.toDebugString().decode("utf-8"))
(4) PythonRDD[1] at RDD at PythonRDD.scala:48 []
| ParallelCollectionRDD[0] at parallelize at PythonRDD.scala:489 []
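
If the same snippet has to work whether toDebugString() hands back bytes or str, a small wrapper can guard the decode step. The sketch below is only an illustration: format_lineage is a hypothetical helper name (not part of PySpark), and the sample value is copied from the question's output so the example runs without a SparkContext.

```python
def format_lineage(debug):
    """Return an RDD debug string as text, whether it arrives as bytes or str."""
    # Assumption: the value may be bytes (as in the question) or already str.
    if isinstance(debug, bytes):
        return debug.decode("utf-8")
    return debug


# Sample value copied from the question's output; with a live SparkContext
# you would pass rdd.toDebugString() instead.
sample = (
    b"(4) PythonRDD[1] at RDD at PythonRDD.scala:48 []\n"
    b" | ParallelCollectionRDD[0] at parallelize at PythonRDD.scala:175 []"
)

lineage = format_lineage(sample)
print(lineage)  # the newline is now rendered as a real line break
```

In the shell this would be called as print(format_lineage(rdd.toDebugString())).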

are there even better ways of visualizing the graph?

The DAG visualization in the Spark UI is usually sufficient.

For python - Good output for RDD lineage / Spark operator graph, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49754318/
