
hadoop - Show the status of a Hive query in PySpark


I'm running a Hive query from a SparkSession (spark):

spark.sql('SELECT * FROM SOME_TABLE').show()

Is there a parameter to the sql function, or a configuration, that prints a status similar to what the Hive CLI shows?

Hadoop job information for Stage-1: number of mappers: 1193; number of reducers: 1099
2017-05-16 14:54:38,165 Stage-1 map = 0%, reduce = 0%
2017-05-16 14:54:49,625 Stage-1 map = 1%, reduce = 0%, Cumulative CPU 213.84 sec
2017-05-16 14:54:50,678 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 4495.91 sec
2017-05-16 14:54:51,729 Stage-1 map = 15%, reduce = 0%, Cumulative CPU 5081.18 sec
2017-05-16 14:54:52,778 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 5244.48 sec
2017-05-16 14:54:53,818 Stage-1 map = 34%, reduce = 0%, Cumulative CPU 7186.78 sec
2017-05-16 14:54:54,851 Stage-1 map = 46%, reduce = 0%, Cumulative CPU 7702.71 sec
2017-05-16 14:54:55,887 Stage-1 map = 51%, reduce = 0%, Cumulative CPU 7968.09 sec
2017-05-16 14:54:56,919 Stage-1 map = 54%, reduce = 0%, Cumulative CPU 8325.11 sec

Best Answer

Yes, you can view the status in several ways.

1) To see the job's [fairly verbose] status while it runs, change the log level to "INFO": spark.sparkContext.setLogLevel("INFO")
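A minimal sketch of this option, assuming the same SparkSession named spark and the placeholder table from the question:

# Raise the driver's log level so stage and task progress is printed to the console.
spark.sparkContext.setLogLevel("INFO")

# Scheduler messages (stage submission, task completion, etc.)
# now appear on stderr while the query runs.
spark.sql('SELECT * FROM SOME_TABLE').show()

# Restore a quieter level afterwards if the INFO output is too noisy.
spark.sparkContext.setLogLevel("WARN")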

2) Use the Spark or YARN UI (typically Spark on port 18088, or port 4040 for the live application, and YARN on port 8088).

The event log in the UI will show you what you need to know, or the progress bar gives a simpler visual.
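As a hedged sketch of option 2: the live application's UI address can be read straight off the SparkContext, and the in-terminal progress bar is controlled by the standard spark.ui.showConsoleProgress setting, which must be configured before the session is created:

from pyspark.sql import SparkSession

# Enable the in-terminal stage progress bar, e.g. [Stage 1:=====> (3 + 2) / 10].
spark = (SparkSession.builder
         .config("spark.ui.showConsoleProgress", "true")
         .getOrCreate())

# URL of the live application UI, usually http://<driver-host>:4040.
print(spark.sparkContext.uiWebUrl)

Note that the console progress bar only renders when the log level is above INFO (WARN or higher), so it does not combine with option 1's INFO logging.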

Related documentation: https://spark.apache.org/docs/latest/monitoring.html

Regarding hadoop - showing the status of a Hive query in PySpark, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44009492/
