
apache-kafka - How do I create a KSQL table from a topic using a composite key?

Reposted. Author: 行者123. Updated: 2023-12-04 04:49:11

Suppose I have a topic of temperature forecast data, like this:

2018-10-25,Melbourne,21
2018-10-26,Melbourne,17
2018-10-27,Melbourne,21
2018-10-25,Sydney,22
2018-10-26,Sydney,20
2018-10-27,Sydney,23
2018-10-26,Melbourne,18
2018-10-27,Melbourne,22
2018-10-26,Sydney,21
2018-10-27,Sydney,24

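Each entry contains a date, a city, and a forecast temperature, and represents an update to the forecast for that city on that date. I can describe it as a KSQL stream like this: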
CREATE STREAM forecasts_csv ( \
date VARCHAR, \
city VARCHAR, \
temperature INTEGER \
) WITH (kafka_topic='forecasts-csv', value_format='DELIMITED');

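Now I want a table that holds, for each city and date, the current (i.e. latest) forecast temperature together with the minimum and maximum values that forecast has taken over time. An example of the desired output is: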
{ date='2018-10-27', city='Melbourne', latest=22, min=21, max=22 }

How can I achieve this?

I managed to get the aggregates (min/max) as follows:
CREATE STREAM forecasts_keyed \
WITH (partitions=4, value_format='JSON') \
AS SELECT date + '/' + city AS forecast_key, * \
FROM forecasts_csv \
PARTITION BY forecast_key;

CREATE TABLE forecasts_minmax \
WITH (partitions=4, value_format='JSON') \
AS SELECT forecast_key, date, city, \
min(temperature) as min, max(temperature) as max \
FROM forecasts_keyed \
GROUP by forecast_key, date, city;

This gives me output messages such as:
{"FORECAST_KEY":"2018-10-27/Melbourne","DATE":"2018-10-27","CITY":"Melbourne","MIN":21,"MAX":22}

But I can't figure out how to combine this with the "latest" reading.

Best answer

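You need to implement a UDAF, let's call it LATEST, that keeps the latest value of a given column for each key. This is fairly straightforward, and the KSQL documentation explains how to add a custom UDAF: https://docs.confluent.io/current/ksql/docs/developer-guide/udf.html#udafs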
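To make the idea concrete, here is a minimal Java sketch of the logic such a LATEST aggregate could carry. It is only an illustration: the SimpleUdaf interface below is a hypothetical stand-in that mirrors the usual initialize / aggregate / merge shape, not the real KSQL interface, and the names LatestInt and latestOf are made up for this example. Consult the linked documentation for the actual interface and annotations.

```java
import java.util.List;

public class LatestUdafSketch {

    // Hypothetical stand-in for the shape of a UDAF. This is NOT the real
    // KSQL interface; it only mirrors the initialize/aggregate/merge idea.
    interface SimpleUdaf<V, A> {
        A initialize();                      // starting aggregate for a new key
        A aggregate(V current, A aggregate); // fold one incoming value into the aggregate
        A merge(A older, A newer);           // combine two partial aggregates
    }

    // LATEST for integers: the aggregate is the most recently seen non-null value.
    static final class LatestInt implements SimpleUdaf<Integer, Integer> {
        @Override
        public Integer initialize() {
            return null; // nothing seen yet
        }

        @Override
        public Integer aggregate(Integer current, Integer aggregate) {
            // A new non-null value replaces the previous one; nulls are skipped.
            return current != null ? current : aggregate;
        }

        @Override
        public Integer merge(Integer older, Integer newer) {
            // Assumption: the second argument is the more recent partial result.
            return newer != null ? newer : older;
        }
    }

    // Folds values in arrival order, the way an aggregation would see them for one key.
    static Integer latestOf(List<Integer> values) {
        LatestInt udaf = new LatestInt();
        Integer agg = udaf.initialize();
        for (Integer v : values) {
            agg = udaf.aggregate(v, agg);
        }
        return agg;
    }

    public static void main(String[] args) {
        // The two updates for 2018-10-27/Melbourne from the question's sample data.
        System.out.println(latestOf(List.of(21, 22)));
    }
}
```

Note that "latest" in this sketch means last in arrival order, not latest by any timestamp carried in the record. Newer ksqlDB releases also ship a built-in LATEST_BY_OFFSET aggregate, which may make a custom UDAF unnecessary there.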

Assuming you have the LATEST UDAF available, you can write the following query:

CREATE TABLE foo AS
SELECT
date,
city,
MIN(temperature) AS minValue,
MAX(temperature) AS maxValue,
LATEST(temperature) AS latestValue
FROM forecasts_csv
GROUP BY date, city;
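As a sanity check on what this query is meant to compute, the following Java sketch replays the sample records from the question in order and keeps min, max and latest per (date, city) key. It is a plain in-memory simulation under the assumption that records arrive in the order listed; it does not use Kafka or KSQL, and the names ForecastTableSketch, Summary and summarize are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ForecastTableSketch {

    // One row of the simulated table for a single date/city key.
    static final class Summary {
        public final int min;
        public final int max;
        public final int latest;

        Summary(int min, int max, int latest) {
            this.min = min;
            this.max = max;
            this.latest = latest;
        }

        @Override
        public String toString() {
            return "latest=" + latest + ", min=" + min + ", max=" + max;
        }
    }

    // The sample topic contents from the question, in arrival order.
    static final List<String> SAMPLE = List.of(
        "2018-10-25,Melbourne,21", "2018-10-26,Melbourne,17", "2018-10-27,Melbourne,21",
        "2018-10-25,Sydney,22", "2018-10-26,Sydney,20", "2018-10-27,Sydney,23",
        "2018-10-26,Melbourne,18", "2018-10-27,Melbourne,22",
        "2018-10-26,Sydney,21", "2018-10-27,Sydney,24");

    // Emulates GROUP BY date, city with MIN, MAX and a "latest value" aggregate.
    static Map<String, Summary> summarize(List<String> csvLines) {
        Map<String, Summary> table = new LinkedHashMap<>();
        for (String line : csvLines) {
            String[] fields = line.split(",");
            String key = fields[0] + "/" + fields[1]; // composite key: date/city
            int t = Integer.parseInt(fields[2].trim());
            // First value for a key seeds all three columns; later values update them.
            table.merge(key, new Summary(t, t, t),
                (old, ignored) -> new Summary(Math.min(old.min, t), Math.max(old.max, t), t));
        }
        return table;
    }

    public static void main(String[] args) {
        summarize(SAMPLE).forEach((key, row) -> System.out.println(key + " -> " + row));
    }
}
```

For the key 2018-10-27/Melbourne the two updates (21, then 22) yield latest=22, min=21, max=22, which matches the desired row shown in the question.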

Regarding "apache-kafka - How do I create a KSQL table from a topic using a composite key?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52979600/
