
apache-spark - API for Spark Streaming statistics

Reposted. Author: 行者123. Updated: 2023-12-05 05:20:21

I am looking for an API that gives access to the Spark Streaming statistics shown in the "Streaming" tab of the history server.

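I am mainly interested in the batch processing time values, but, at least according to the documentation, they are not directly available through the REST API: https://spark.apache.org/docs/latest/monitoring.html#rest-api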


Any ideas on how to get this kind of information, such as the contents of the "Streaming" tab or the jobs running in the history server?

Best answer

A metrics endpoint is available on the driver node, on the same port as the Spark UI: http://<host>:<sparkUI-port>/metrics/json/
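A minimal sketch of reading that endpoint from Python, using only the standard library. The function names `metrics_url` and `fetch_metrics` are hypothetical helpers introduced for illustration, and the host and port passed in are whatever your driver actually uses.

```python
import json
from urllib.request import urlopen


def metrics_url(host: str, ui_port: int) -> str:
    """Build the metrics URL served on the same port as the driver's Spark UI."""
    return f"http://{host}:{ui_port}/metrics/json/"


def fetch_metrics(host: str, ui_port: int, timeout: float = 5.0) -> dict:
    """Download the metrics endpoint and decode its JSON body into a dict."""
    with urlopen(metrics_url(host, ui_port), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (assumes a driver is reachable at this host and port):
# metrics = fetch_metrics("localhost", 4040)
```

The call is left commented out because it only works while a streaming application is running and its driver UI is reachable.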

The streaming-related metrics have .StreamingMetrics in their names.

Sample from a local test job:

local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingDelay: {
value: 30
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingEndTime: {
value: 1498124090031
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingStartTime: {
value: 1498124090001
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_schedulingDelay: {
value: 1
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_submissionTime: {
value: 1498124090000
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_totalDelay: {
value: 31
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastReceivedBatch_processingEndTime: {
value: 1498124090031
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastReceivedBatch_processingStartTime: {
value: 1498124090001
}
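One way to pick the streaming metrics out of the full response is to filter on the `.StreamingMetrics.streaming.` part of the name. The sketch below assumes the decoded response keeps these entries under a top-level `"gauges"` key, each with a `"value"` field as in the sample; verify this against your own endpoint output. `streaming_gauges` is a hypothetical helper, and the non-streaming entry in `sample` is made up purely to show the filter.

```python
MARKER = ".StreamingMetrics.streaming."


def streaming_gauges(metrics: dict) -> dict:
    """Return {short_name: value} for every gauge whose name contains MARKER."""
    result = {}
    for name, gauge in metrics.get("gauges", {}).items():
        if MARKER in name:
            # Drop the application-specific prefix, keep the metric's own name.
            short_name = name.split(MARKER, 1)[1]
            result[short_name] = gauge["value"]
    return result


prefix = "local-1498040220092.driver.printWriter.snb"
sample = {
    "gauges": {
        prefix + MARKER + "lastCompletedBatch_processingDelay": {"value": 30},
        prefix + MARKER + "lastCompletedBatch_totalDelay": {"value": 31},
        # Hypothetical non-streaming gauge, included only to show it is skipped.
        "local-1498040220092.driver.someOtherSource.someGauge": {"value": 0},
    }
}

print(streaming_gauges(sample))
```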

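To get the processing time, take the difference between two of these metrics (each name carries the same application-specific prefix as in the sample above):
StreamingMetrics.streaming.lastCompletedBatch_processingEndTime -
StreamingMetrics.streaming.lastCompletedBatch_processingStartTime
For the sample above this gives 1498124090031 - 1498124090001 = 30, which matches the reported lastCompletedBatch_processingDelay value.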
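The subtraction can be sketched as follows. `processing_time_ms` is a hypothetical helper that matches metrics by name suffix, so it does not depend on the application-specific prefix. The input is assumed to be a mapping from full metric name to an object with a `value` field, and the timestamps are assumed to be epoch milliseconds, which is what the sample values look like.

```python
END_SUFFIX = "StreamingMetrics.streaming.lastCompletedBatch_processingEndTime"
START_SUFFIX = "StreamingMetrics.streaming.lastCompletedBatch_processingStartTime"


def value_of(gauges: dict, suffix: str) -> int:
    """Return the value of the first gauge whose full name ends with suffix."""
    for name, gauge in gauges.items():
        if name.endswith(suffix):
            return gauge["value"]
    raise KeyError(suffix)


def processing_time_ms(gauges: dict) -> int:
    """Processing time of the last completed batch: end time minus start time."""
    return value_of(gauges, END_SUFFIX) - value_of(gauges, START_SUFFIX)


# Values copied from the sample output of the local test job.
prefix = "local-1498040220092.driver.printWriter.snb."
gauges = {
    prefix + END_SUFFIX: {"value": 1498124090031},
    prefix + START_SUFFIX: {"value": 1498124090001},
}

print(processing_time_ms(gauges))  # 30
```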

A similar question about "apache-spark - API for Spark Streaming statistics" can be found on Stack Overflow: https://stackoverflow.com/questions/44694424/
