apache-kafka - What is the relationship between the partition.duration.ms and flush.size properties in Kafka Connect?


Can someone explain the significance of partition.duration.ms and flush.size in the following configuration?
What is the idea behind setting these properties?

"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"s3.region": "eu-central-1",
"partition.duration.ms": "1000",
"topics.dir": "root_bucket",
"flush.size": "10",
"topics": "TEST_SRV",
"tasks.max": "1",
"s3.part.size": "5242880",
"timezone": "UTC",
"locale": "US",
"key.converter.schemas.enable": "true",
"format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
"schema.generator.class": "io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator",
"value.converter.schemas.enable": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"s3.bucket.name": "events-dev-s3",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"path.format": "'year'-YYYY/'month'-MM/'day'-dd/'hour'-HH",
"timestamp.extractor": "RecordField",
"timestamp.field": "event_data.created_at"

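For reference, a connector configuration like the one above is normally submitted to the Kafka Connect REST API. The following is a minimal Python sketch of that step; the worker address localhost:8083 (the default REST port) and the connector name "s3-sink-test" are illustrative assumptions, not part of the original question.

import requests  # assumes the 'requests' package is available

# The connector config exactly as listed above (abridged here)
config = {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "partition.duration.ms": "1000",
    "flush.size": "10",
    "topics": "TEST_SRV",
    # ... remaining properties from the listing above ...
}

# Register the connector; the name and host are assumptions for illustration.
resp = requests.post(
    "http://localhost:8083/connectors",
    json={"name": "s3-sink-test", "config": config},
)
resp.raise_for_status()
print(resp.json())
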
Best Answer

"a 1 second partition duration doesn't make sense because you've set the partitioner to only make hourly partitions."

The partitioner has not been set to make only hourly partitions.
"path.format": "'year'-YYYY/'month'-MM/'day'-dd/'hour'-HH"
This sets the directory structure granularity to the hour.
"partition.duration.ms": "1000"
This configures the connector to output one file for every "second" of data (per input partition).

Those files will be written into the hourly directory that covers the "second" in which they were produced.

I.e. each hourly directory will contain all of the data for that hour (in this case, all of the per-second files).
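
As a concrete sketch of the resulting layout (the date, hour, Kafka partition number, and offsets below are illustrative, not taken from the question), the objects written to the events-dev-s3 bucket would look something like:

root_bucket/TEST_SRV/year-2023/month-12/day-04/hour-13/TEST_SRV+0+0000000000.json
root_bucket/TEST_SRV/year-2023/month-12/day-04/hour-13/TEST_SRV+0+0000000010.json
root_bucket/TEST_SRV/year-2023/month-12/day-04/hour-13/TEST_SRV+0+0000000020.json

That is one object per "second" of record timestamps, all collected under the single hour-13 directory; flush.size=10 additionally commits a file once 10 records have been written to it, so a one-second bucket holding more than 10 records would be split across several objects.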

Regarding "apache-kafka - What is the relationship between the partition.duration.ms and flush.size properties in Kafka Connect?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52760883/
