java - Kafka Streams aggregation: How to ignore intermediate aggregation results for a Window


Reposted by: 行者123 Updated: 2023-12-01 17:50:04


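We are using kafka-streams with time-windowed aggregation to compute the final sum of events. We have met our requirements, but we have a problem with intermediate aggregation results. According to the Kafka memory management documentation (https://kafka.apache.org/11/documentation/streams/developer-guide/memory-mgmt.html), there appears to be no way to discard these intermediate results, which affect the final outcome. Please consider the following explanation, taken from that documentation.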

Use the following example to understand the behaviors with and without record caching. In this example, the input is a KStream<String,Integer> with the records <K,V>: <A, 1>, <D, 5>, <A, 20>, <A, 300>. The focus in this example is on the records with key == A.

An aggregation computes the sum of record values, grouped by key, for the input and returns a KTable<String, Integer>.

Without caching: a sequence of output records is emitted for key A that represent changes in the resulting aggregation table. The parentheses (()) denote changes, the left number is the new aggregate value and the right number is the old aggregate value: <A, (1, null)>, <A, (21, 1)>, <A, (321, 21)>.

With caching: a single output record is emitted for key A that would likely be compacted in the cache, leading to a single output record of <A, (321, null)>. This record is written to the aggregation’s internal state store and forwarded to any downstream operations.
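The two behaviors quoted above can be imitated with a small plain-Java simulation. This is only a sketch for illustration: it does not use Kafka Streams, and the class and method names (`CachingDemo`, `withoutCaching`, `withCaching`) are hypothetical. It assumes the cache is flushed exactly once, after all four input records have arrived.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CachingDemo {
    // Without caching: every input record emits one "(new, old)" change record.
    static List<String> withoutCaching(String[] keys, int[] values) {
        Map<String, Integer> table = new LinkedHashMap<>();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            Integer old = table.get(keys[i]);            // null on first sight of the key
            int updated = (old == null ? 0 : old) + values[i];
            table.put(keys[i], updated);
            out.add("<" + keys[i] + ", (" + updated + ", " + old + ")>");
        }
        return out;
    }

    // With caching (assumed: a single flush at the end): updates to the same key
    // are compacted, so only the latest aggregate per key is emitted.
    static List<String> withCaching(String[] keys, int[] values) {
        Map<String, Integer> cache = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            cache.merge(keys[i], values[i], Integer::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : cache.entrySet()) {
            out.add("<" + e.getKey() + ", (" + e.getValue() + ", null)>");
        }
        return out;
    }

    public static void main(String[] args) {
        String[] keys = {"A", "D", "A", "A"};
        int[] values = {1, 5, 20, 300};
        // [<A, (1, null)>, <D, (5, null)>, <A, (21, 1)>, <A, (321, 21)>]
        System.out.println(withoutCaching(keys, values));
        // [<A, (321, null)>, <D, (5, null)>]
        System.out.println(withCaching(keys, values));
    }
}
```

Note that every record emitted without caching already carries the running sum for its key; the intermediate records are partial aggregates, not raw input values.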

The cache size is specified through the cache.max.bytes.buffering parameter, which is a global setting per processing topology:
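The code sample that followed this sentence in the documentation is not reproduced in the question. A minimal sketch of such a setting might look like the following; the 10 MB value and the names `CacheConfigSketch` and `cacheConfig` are assumptions for illustration, and the property is written as its plain string key so the snippet needs only the JDK.

```java
import java.util.Properties;

class CacheConfigSketch {
    // Builds Kafka Streams properties with the record cache size set.
    // The key string is the parameter named in the quoted documentation.
    static Properties cacheConfig(long cacheBytes) {
        Properties props = new Properties();
        props.put("cache.max.bytes.buffering", cacheBytes);
        return props;
    }

    public static void main(String[] args) {
        // Assumed value: a 10 MB cache shared across the whole topology.
        Properties props = cacheConfig(10 * 1024 * 1024L);
        System.out.println(props.get("cache.max.bytes.buffering"));
    }
}
```

In a real application these properties would be merged with the usual application id and bootstrap server settings before being passed to the `KafkaStreams` constructor.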

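According to the documentation, an aggregation that runs without caching its output records produces incremental results. (We have noticed that this sometimes happens even with caching enabled.) Our problem is that other applications consume these output aggregates and perform further calculations on them, so when the output contains intermediate aggregates, those calculations come out wrong. For example, when we receive <A, (21, 1)>, we may already start computing on other events, whereas the correct calculation should be done on <A, (321, null)> for that time window.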

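Our requirement is to perform the other calculations only on the final aggregate for the window. We have the following questions about Kafka Streams aggregation: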

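When Kafka outputs intermediate results, aren't these outputs already aggregated data? For example, consider the output <A, (1, null)>, <A, (21, 1)>, <A, (321, 21)>. Here the second output event <A, (21, 1)> and the third output <A, (321, 21)> already carry the accumulated value. Is that correct?
Is there a way to identify the intermediate results of a window?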

Best answer

Another thing to keep in mind is that the commit interval and the cache size together control when results are forwarded downstream.

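For example, if your commit interval is 10 seconds, the results in the cache are forwarded every 10 seconds (and written to the changelog topic if logging is enabled), regardless of whether the cache is full.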

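So if you can set the cache memory high enough to support setting the commit interval to the desired window time, you may be able to approximate a single final result. Of course, this is a coarse-grained approach that affects the entire topology, so you will need to think it through and probably prototype a sample application to see whether it works for you.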
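The approach described in this answer could be sketched as a configuration helper like the one below. The 5-minute window, the 64 MB cache size, and the names `FinalResultConfigSketch` and `approximateFinalResult` are assumptions for illustration; only the two property keys are real Kafka Streams settings, written here as plain strings so the snippet needs only the JDK.

```java
import java.time.Duration;
import java.util.Properties;

class FinalResultConfigSketch {
    // Aligns the commit interval with the window size and sizes the cache so that
    // it is unlikely to fill up (and flush early) before the commit fires.
    static Properties approximateFinalResult(Duration windowSize, long cacheBytes) {
        Properties props = new Properties();
        props.put("commit.interval.ms", windowSize.toMillis());
        props.put("cache.max.bytes.buffering", cacheBytes);
        return props;
    }

    public static void main(String[] args) {
        // Assumed values: a 5-minute window and a 64 MB cache.
        Properties props = approximateFinalResult(Duration.ofMinutes(5), 64 * 1024 * 1024L);
        System.out.println(props.get("commit.interval.ms"));        // 300000
        System.out.println(props.get("cache.max.bytes.buffering")); // 67108864
    }
}
```

Both settings apply to the whole topology, and the cache can still flush early if it fills up, so intermediate results remain possible; this only makes them less likely.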

Regarding java - Kafka Streams aggregation: How to ignore intermediate aggregation results for a Window, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51298162/
