
apache-kafka - Confluent 4.1.0 -> KSQL : STREAM-TABLE join -> table data null

Reposted. Author: 行者123. Last updated: 2023-12-04 04:37:10

STEP 1: Run a producer to create the sample data.

./bin/kafka-avro-console-producer \
--broker-list localhost:9092 --topic stream-test-topic \
--property schema.registry.url=http://localhost:8081 \
--property value.schema='{"type":"record","name":"dealRecord","fields":[{"name":"DEAL_ID","type":"string"},{"name":"DEAL_EXPENSE_CODE","type":"string"},{"name":"DEAL_BRANCH","type":"string"}]}'

Sample data:
{"DEAL_ID":"deal002", "DEAL_EXPENSE_CODE":"EXP002", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal003", "DEAL_EXPENSE_CODE":"EXP003", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal004", "DEAL_EXPENSE_CODE":"EXP004", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal005", "DEAL_EXPENSE_CODE":"EXP005", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal006", "DEAL_EXPENSE_CODE":"EXP006", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal007", "DEAL_EXPENSE_CODE":"EXP001", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal008", "DEAL_EXPENSE_CODE":"EXP002", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal009", "DEAL_EXPENSE_CODE":"EXP003", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal010", "DEAL_EXPENSE_CODE":"EXP004", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal011", "DEAL_EXPENSE_CODE":"EXP005", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal012", "DEAL_EXPENSE_CODE":"EXP006", "DEAL_BRANCH":"AMSTERDAM"}

STEP 2: Open another terminal and run a consumer to check the data.
./bin/kafka-avro-console-consumer --topic stream-test-topic \
--bootstrap-server localhost:9092 \
--property schema.registry.url=http://localhost:8081 \
--from-beginning

STEP 3: Open another terminal and run a producer.
./bin/kafka-avro-console-producer \
--broker-list localhost:9092 --topic expense-test-topic \
--property "parse.key=true" \
--property "key.separator=:" \
--property schema.registry.url=http://localhost:8081 \
--property key.schema='"string"' \
--property value.schema='{"type":"record","name":"dealRecord","fields":[{"name":"EXPENSE_CODE","type":"string"},{"name":"EXPENSE_DESC","type":"string"}]}'

Data:
"pk1":{"EXPENSE_CODE":"EXP001", "EXPENSE_DESC":"Regulatory Deposit"}
"pk2":{"EXPENSE_CODE":"EXP002", "EXPENSE_DESC":"ABC - Sofia"}
"pk3":{"EXPENSE_CODE":"EXP003", "EXPENSE_DESC":"Apple Corporation"}
"pk4":{"EXPENSE_CODE":"EXP004", "EXPENSE_DESC":"Confluent Europe"}
"pk5":{"EXPENSE_CODE":"EXP005", "EXPENSE_DESC":"Air India"}
"pk6":{"EXPENSE_CODE":"EXP006", "EXPENSE_DESC":"KLM International"}

STEP 4: Open another terminal and run a consumer.
./bin/kafka-avro-console-consumer --topic expense-test-topic \
--bootstrap-server localhost:9092 \
--property "parse.key=true" \
--property "key.separator=:" \
--property schema.registry.url=http://localhost:8081 \
--from-beginning

STEP 5: Log in to the KSQL client.
./bin/ksql http://localhost:8088

Create the following stream and table, then run the join query.

KSQL:

Stream:
CREATE STREAM SAMPLE_STREAM
(DEAL_ID VARCHAR, DEAL_EXPENSE_CODE varchar, DEAL_BRANCH VARCHAR)
WITH (kafka_topic='stream-test-topic',value_format='AVRO', key = 'DEAL_ID');

Table:
CREATE TABLE SAMPLE_TABLE 
(EXPENSE_CODE varchar, EXPENSE_DESC VARCHAR)
WITH (kafka_topic='expense-test-topic',value_format='AVRO', key = 'EXPENSE_CODE');

The output is as follows:
ksql> SELECT STREAM1.DEAL_EXPENSE_CODE, TABLE1.EXPENSE_DESC 
from SAMPLE_STREAM STREAM1 LEFT JOIN SAMPLE_TABLE TABLE1
ON STREAM1.DEAL_EXPENSE_CODE = TABLE1.EXPENSE_CODE
WINDOW TUMBLING (SIZE 3 MINUTE)
GROUP BY STREAM1.DEAL_EXPENSE_CODE, TABLE1.EXPENSE_DESC;

EXP001 | null
EXP001 | null
EXP002 | null
EXP003 | null
EXP004 | null
EXP005 | null
EXP006 | null
EXP002 | null
EXP002 | null

Best answer

tl;dr: Your table data needs to be keyed on the column you are joining on.
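
To see why the key matters, here is a minimal Python sketch (not KSQL itself). It models the table as a dictionary indexed by the Kafka message key, on the assumption, consistent with the diagnosis in this answer, that the join finds table rows through that key. The helper name `left_join` and the dictionary layout are hypothetical; the rows are taken from the question.

```python
# Illustrative model only: the table is treated as a dict indexed by the
# Kafka message key, and the join looks rows up by that key.

# Table topic as produced in the question: message key -> value.
expense_table = {
    "pk1": {"EXPENSE_CODE": "EXP001", "EXPENSE_DESC": "Regulatory Deposit"},
    "pk2": {"EXPENSE_CODE": "EXP002", "EXPENSE_DESC": "ABC - Sofia"},
}

# Two of the stream rows from the question.
deals = [
    {"DEAL_ID": "deal002", "DEAL_EXPENSE_CODE": "EXP002", "DEAL_BRANCH": "AMSTERDAM"},
    {"DEAL_ID": "deal007", "DEAL_EXPENSE_CODE": "EXP001", "DEAL_BRANCH": "AMSTERDAM"},
]


def left_join(stream_rows, table_by_key):
    """Hypothetical helper: look up each row's join column among the table's message keys."""
    result = []
    for row in stream_rows:
        match = table_by_key.get(row["DEAL_EXPENSE_CODE"])
        desc = match["EXPENSE_DESC"] if match else None
        result.append((row["DEAL_EXPENSE_CODE"], desc))
    return result


joined = left_join(deals, expense_table)
print(joined)  # [('EXP002', None), ('EXP001', None)]
```

Because the message keys are `pk1`, `pk2`, and so on, no lookup by expense code can hit, which mirrors the `null` values in the question's output.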

Using the sample data above, here is how to investigate and fix the problem.

  • Use KSQL to inspect the data in the topics (there is no need for kafka-avro-console-consumer). The output format is timestamp, key, value.
  • Stream:
    ksql> print 'stream-test-topic' from beginning;
    Format:AVRO
    30/04/18 15:59:13 BST, null, {"DEAL_ID": "deal002", "DEAL_EXPENSE_CODE": "EXP002", "DEAL_BRANCH": "AMSTERDAM"}
    30/04/18 15:59:13 BST, null, {"DEAL_ID": "deal003", "DEAL_EXPENSE_CODE": "EXP003", "DEAL_BRANCH": "AMSTERDAM"}
    30/04/18 15:59:13 BST, null, {"DEAL_ID": "deal004", "DEAL_EXPENSE_CODE": "EXP004", "DEAL_BRANCH": "AMSTERDAM"}
  • Table:
    ksql> print 'expense-test-topic' from beginning;
    Format:AVRO
    30/04/18 16:10:52 BST, pk1, {"EXPENSE_CODE": "EXP001", "EXPENSE_DESC": "Regulatory Deposit"}
    30/04/18 16:10:52 BST, pk2, {"EXPENSE_CODE": "EXP002", "EXPENSE_DESC": "ABC - Sofia"}
    30/04/18 16:10:52 BST, pk3, {"EXPENSE_CODE": "EXP003", "EXPENSE_DESC": "Apple Corporation"}
    30/04/18 16:10:52 BST, pk4, {"EXPENSE_CODE": "EXP004", "EXPENSE_DESC": "Confluent Europe"}
    30/04/18 16:10:52 BST, pk5, {"EXPENSE_CODE": "EXP005", "EXPENSE_DESC": "Air India"}
    30/04/18 16:10:52 BST, pk6, {"EXPENSE_CODE": "EXP006", "EXPENSE_DESC": "KLM International"}

  • At this point, note that the key ( pk<x> ) does not match the column we are going to join on.
  • Register the two topics:
    ksql> CREATE STREAM deals WITH (KAFKA_TOPIC='stream-test-topic', VALUE_FORMAT='AVRO');

    Message
    ----------------
    Stream created
    ----------------

    ksql> CREATE TABLE expense_codes_table WITH (KAFKA_TOPIC='expense-test-topic', VALUE_FORMAT='AVRO', KEY='EXPENSE_CODE');

    Message
    ---------------
    Table created
    ---------------
  • Tell KSQL to query events from the beginning of each topic:
    ksql> SET 'auto.offset.reset' = 'earliest';
    Successfully changed local property 'auto.offset.reset' from 'null' to 'earliest'
  • Verify that the key declared in the table's DDL ( KEY='EXPENSE_CODE' ) matches the actual key of the underlying Kafka messages (available through the ROWKEY system column):
    ksql> SELECT ROWKEY, EXPENSE_CODE FROM expense_codes_table;
    pk1 | EXP001
    pk2 | EXP002
    pk3 | EXP003
    pk4 | EXP004
    pk5 | EXP005
    pk6 | EXP006

    The keys do not match. Our join is doomed!
  • Magic workaround: let's re-key the topic using KSQL!
  • Register the table's source topic as a KSQL STREAM :
    ksql> CREATE STREAM expense_codes_stream WITH (KAFKA_TOPIC='expense-test-topic', VALUE_FORMAT='AVRO');

    Message
    ----------------
    Stream created
    ----------------
  • Create a derived stream, keyed on the correct column. It is backed by a re-keyed Kafka topic.
    ksql> CREATE STREAM EXPENSE_CODES_REKEY AS SELECT * FROM expense_codes_stream PARTITION BY EXPENSE_CODE;

    Message
    ----------------------------
    Stream created and running
    ----------------------------
  • Re-register the KSQL _TABLE_ on top of the re-keyed topic:
    ksql> DROP TABLE expense_codes_table;

    Message
    ----------------------------------------
    Source EXPENSE_CODES_TABLE was dropped
    ----------------------------------------
    ksql> CREATE TABLE expense_codes_table WITH (KAFKA_TOPIC='EXPENSE_CODES_REKEY', VALUE_FORMAT='AVRO', KEY='EXPENSE_CODE');

    Message
    ---------------
    Table created
    ---------------
  • Check that the keys (declared vs. message) now match on the new table:
    ksql> SELECT ROWKEY, EXPENSE_CODE FROM expense_codes_table;
    EXP005 | EXP005
    EXP001 | EXP001
    EXP002 | EXP002
    EXP003 | EXP003
    EXP006 | EXP006
    EXP004 | EXP004
  • Successful join:
    ksql> SELECT D.DEAL_EXPENSE_CODE, E.EXPENSE_DESC \
    FROM deals D \
    LEFT JOIN expense_codes_table E \
    ON D.DEAL_EXPENSE_CODE = E.EXPENSE_CODE \
    WINDOW TUMBLING (SIZE 3 MINUTE) \
    GROUP BY D.DEAL_EXPENSE_CODE, E.EXPENSE_DESC;

    EXP006 | KLM International
    EXP003 | Apple Corporation
    EXP002 | ABC - Sofia
    EXP004 | Confluent Europe
    EXP001 | Regulatory Deposit
    EXP005 | Air India
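
The re-keying step can be pictured with a short Python sketch. It is an illustration under assumptions, not what KSQL runs internally: `rekey` is a hypothetical helper standing in for `PARTITION BY EXPENSE_CODE`, and the table is modeled as a dictionary indexed by message key.

```python
def rekey(messages, column):
    """Hypothetical stand-in for PARTITION BY: re-index each value by one of its own columns."""
    return {value[column]: value for _, value in messages}


# (message key, value) pairs as they sit in expense-test-topic.
source = [
    ("pk1", {"EXPENSE_CODE": "EXP001", "EXPENSE_DESC": "Regulatory Deposit"}),
    ("pk2", {"EXPENSE_CODE": "EXP002", "EXPENSE_DESC": "ABC - Sofia"}),
]

rekeyed = rekey(source, "EXPENSE_CODE")

# Every key now equals the join column, so a lookup by expense code succeeds.
keys_match = all(key == value["EXPENSE_CODE"] for key, value in rekeyed.items())
desc = rekeyed["EXP002"]["EXPENSE_DESC"]
print(keys_match, desc)  # True ABC - Sofia
```

Once the key and the join column agree, the same lookup that returned nothing before finds the expense description.
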
  • Regarding apache-kafka - Confluent 4.1.0 -> KSQL : STREAM-TABLE join -> table data null, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50102662/
