
hadoop - Confluent: HDFS sink writes Avro, but reading the Avro files in Hive gives times 5:30 hours ahead of "timezone": "Asia/Kolkata"


My HDFS sink connector configuration:

{
  "name": "hdfs-sink1",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "3",
    "topics": "mysql-prod-registrations-",
    "hadoop.conf.dir": "/usr/hdp/current/hadoop-client/conf",
    "hadoop.home": "/usr/hdp/current/hadoop-client",
    "hdfs.url": "hdfs://HACluster:8020",
    "topics.dir": "/topics",
    "logs.dir": "/logs",
    "flush.size": "100",
    "rotate.interval.ms": "60000",
    "format.class": "io.confluent.connect.hdfs.avro.AvroFormat",
    "value.converter.schemas.enable": "true",
    "partitioner.class": "io.confluent.connect.hdfs.partitioner.TimeBasedPartitioner",
    "partition.duration.ms": "1800000",
    "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH/",
    "locale": "kor",
    "timezone": "Asia/Kolkata"
  }
}
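
For context, the TimeBasedPartitioner builds each HDFS directory by rendering the record's timestamp in the configured "timezone". Below is a minimal java.time sketch of that path formatting, not the connector's actual code: the connector uses a Joda-style formatter, so YYYY is written as yyyy here, and the epoch value and class name are illustrative only.

import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class PathFormatSketch {
    public static void main(String[] args) {
        // Example record timestamp in epoch millis: 2018-12-14T06:40:00Z.
        long recordTimestampMs = 1544769600000L;

        // java.time equivalent of the configured path.format
        // ('year'=YYYY/'month'=MM/'day'=dd/'hour'=HH/), with yyyy for YYYY.
        DateTimeFormatter pathFormat =
                DateTimeFormatter.ofPattern("'year'=yyyy/'month'=MM/'day'=dd/'hour'=HH/");

        // The record time is rendered in the configured timezone to build the path.
        String path = Instant.ofEpochMilli(recordTimestampMs)
                .atZone(ZoneId.of("Asia/Kolkata"))
                .format(pathFormat);

        System.out.println(path); // year=2018/month=12/day=14/hour=12/
    }
}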

But when reading in Hive, the times are 5:30 hours ahead of the "timezone": "Asia/Kolkata". How can I get the timestamp values in the Indian timezone?

The connector ran fine for two days, but now I am getting the error below:
ERROR WorkerSinkTask{id=hdfs-sink1-2} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:177)
java.lang.NullPointerException
    at io.confluent.connect.hdfs.HdfsSinkTask.open(HdfsSinkTask.java:133)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.openPartitions(WorkerSinkTask.java:612)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.access$1100(WorkerSinkTask.java:69)
    at org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsAssigned(WorkerSinkTask.java:672)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:283)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:422)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:352)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:337)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:343)
    at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1218)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1181)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:444)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:317)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[2018-12-14 12:22:39,670] ERROR WorkerSinkTask{id=hdfs-sink1-2} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:178)

Best Answer

Asia/Kolkata is +05:30 ahead of UTC, so that makes sense... the timezone configuration applies only to the path.format value, not to the internal values of your Kafka records.
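
To make that concrete: the Avro file stores the timestamp as bare epoch milliseconds with no zone attached, and any +05:30 shift appears only when a reader renders that instant in a zone. A small sketch with an illustrative epoch value:

import java.time.Instant;
import java.time.ZoneId;

public class EpochRenderSketch {
    public static void main(String[] args) {
        // What the Avro file actually stores: epoch millis, no timezone.
        Instant stored = Instant.ofEpochMilli(1544769600000L); // example value

        // Rendering the same instant in two zones shifts only the display:
        System.out.println(stored.atZone(ZoneId.of("UTC")));
        // 2018-12-14T06:40Z[UTC]
        System.out.println(stored.atZone(ZoneId.of("Asia/Kolkata")));
        // 2018-12-14T12:10+05:30[Asia/Kolkata]
    }
}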

I'm not sure which tool you're querying with, but the problem may be there: some tools assume the data was written in UTC and will then "shift" the timestamp to display it formatted in local time... So I would suggest having the HDFS sink connector actually write in UTC, and letting your SQL tool and OS handle the actual TZ conversion themselves.
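
A sketch of what "actually write in UTC" means on the producing side, under the assumption (not stated in the question) that the upstream values, e.g. DATETIME columns behind the mysql-prod-registrations- topic, are IST wall-clock readings; treating such a reading as if it were already UTC is exactly what yields values 5:30 ahead:

import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class WriteUtcSketch {
    public static void main(String[] args) {
        // A hypothetical IST wall-clock reading from the upstream source.
        LocalDateTime wallClock = LocalDateTime.of(2018, 12, 14, 12, 10);

        // The bug: treating the wall clock as if it were already UTC.
        long wrongMs = wallClock.toInstant(ZoneOffset.UTC).toEpochMilli();

        // The fix: converting through the real zone before storing.
        long rightMs = wallClock.atZone(ZoneId.of("Asia/Kolkata"))
                .toInstant().toEpochMilli();

        // The stored values differ by exactly the observed offset:
        System.out.println((wrongMs - rightMs) / 60000 + " minutes"); // 330 = 5:30
    }
}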

Regarding hadoop - Confluent: HDFS sink writes Avro, but reading the Avro files in Hive gives times 5:30 hours ahead of "timezone": "Asia/Kolkata", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53744619/
