
apache-spark - Counting the records written by a writeStream operation: SparkListener onTaskEnd always returns 0 in Structured Streaming


I want to get the number of records written by a writeStream operation. For that, I have this code:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// running totals, updated on the driver as tasks finish
var inputRecords = 0L
var recordsWritten = 0L
var bytesWritten = 0L
var numTasks = 0L

spark.sparkContext.addSparkListener(new SparkListener() {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    // inputMetrics / outputMetrics are plain metric objects (not Options) in Spark 2.x,
    // so guard against null rather than comparing with None
    if (metrics.inputMetrics != null) {
      inputRecords += metrics.inputMetrics.recordsRead
    }
    if (metrics.outputMetrics != null) {
      println("output metrics present")
      recordsWritten += metrics.outputMetrics.recordsWritten
      bytesWritten += metrics.outputMetrics.bytesWritten
    }
    numTasks += 1
    println("recordsWritten = " + recordsWritten)
    println("bytesWritten = " + bytesWritten)
    println("numTasks = " + numTasks)
  }
})

The code does enter the block, but the values recordsWritten, bytesWritten, and inputRecords are always 0.

Edit: I upgraded to 2.3.1 because it contains a fix for this, but the values are still 0.

Streaming query made progress: {
  "id" : "9c345af0-042c-4eeb-80db-828c5f69e442",
  "runId" : "d309f7cf-624a-42e5-bb54-dfb4fa939228",
  "name" : "WriteToSource",
  "timestamp" : "2018-07-30T14:20:33.486Z",
  "batchId" : 3,
  "numInputRows" : 3511,
  "inputRowsPerSecond" : 2113.786875376279,
  "processedRowsPerSecond" : 3013.733905579399,
  "durationMs" : {
    "addBatch" : 1044,
    "getBatch" : 29,
    "getOffset" : 23,
    "queryPlanning" : 25,
    "triggerExecution" : 1165,
    "walCommit" : 44
  },
  "stateOperators" : [ ],
  "sources" : [ {
    "description" : "KafkaSource[Subscribe[proto2-events-identification-carrier]]",
    "startOffset" : {
      "proto2-events-identification-carrier" : {
        "2" : 22400403,
        "1" : 22313194,
        "0" : 22381260
      }
    },
    "endOffset" : {
      "proto2-events-identification-carrier" : {
        "2" : 22403914,
        "1" : 22313194,
        "0" : 22381260
      }
    },
    "numInputRows" : 3511,
    "inputRowsPerSecond" : 2113.786875376279,
    "processedRowsPerSecond" : 3013.733905579399
  } ],
  "sink" : {
    "description" : "org.apache.spark.sql.execution.streaming.ConsoleSinkProvider@1350f304"
  }
}

The numbers are shown there, but I cannot get at them from my code.
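For reference, that same progress object can be consumed programmatically with a StreamingQueryListener instead of reading the log output; a minimal sketch, assuming Spark 2.1+ and that spark is the active SparkSession (the println output is only illustrative):

import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}

spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = {}
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = {}
  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    // event.progress is the same object that is logged as "Streaming query made progress"
    val p = event.progress
    println(s"batch ${p.batchId}: numInputRows = ${p.numInputRows}")
  }
})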

Best Answer

There was a bug in the FileStreamSink of Spark Structured Streaming that was fixed in version 2.3.1.

As a workaround, you can use accumulators just before writing the data to the sink, as sketched below.
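A minimal sketch of that workaround, assuming events is the streaming Dataset[String] about to be written and spark is the active SparkSession (the accumulator name and the console sink are illustrative, not part of the original question):

import org.apache.spark.util.LongAccumulator

val writtenRows: LongAccumulator = spark.sparkContext.longAccumulator("writtenRows")

import spark.implicits._
val counted = events.map { e =>
  writtenRows.add(1)   // incremented on the executors just before the rows reach the sink
  e
}

counted.writeStream
  .format("console")
  .outputMode("append")
  .start()

writtenRows.value can then be read on the driver (for example from a StreamingQueryListener callback) and approximates the number of rows handed to the sink.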

Regarding apache-spark - Counting the records written by a writeStream operation: SparkListener onTaskEnd always returns 0 in Structured Streaming, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51513411/
