
hadoop - Saving Flume output to a Hive table using the Hive Sink

Reposted. Author: 可可西里. Updated: 2023-11-01 14:16:52

I am trying to configure Flume to save its output to a Hive table using the Hive sink type. I have a single-node cluster and am using the MapR Hadoop distribution.

Here is my flume.conf:

agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1

agent1.sources.source1.type = exec
agent1.sources.source1.command = cat /home/andrey/flume_test.data

agent1.sinks.sink1.type = hive
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink1.hive.metastore = thrift://127.0.0.1:9083
agent1.sinks.sink1.hive.database = default
agent1.sinks.sink1.hive.table = flume_test
agent1.sinks.sink1.useLocalTimeStamp = false
agent1.sinks.sink1.round = true
agent1.sinks.sink1.roundValue = 10
agent1.sinks.sink1.roundUnit = minute
agent1.sinks.sink1.serializer = DELIMITED
agent1.sinks.sink1.serializer.delimiter = ","
agent1.sinks.sink1.serializer.serdeSeparator = ','
agent1.sinks.sink1.serializer.fieldnames = id,message

agent1.channels.channel1.type = FILE
agent1.channels.channel1.transactionCapacity = 1000000
agent1.channels.channel1.checkpointInterval = 30000
agent1.channels.channel1.maxFileSize = 2146435071
agent1.channels.channel1.capacity = 10000000
agent1.sources.source1.channels = channel1
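For reference, a configuration like this is normally started with the `flume-ng` launcher; the `--name` argument must match the prefix used in the file (here `agent1`). The directory paths below are assumptions about a typical install layout, not taken from the question:

```shell
# Start the agent defined above; --name must match the "agent1" prefix
# in flume.conf. The /opt/flume paths are illustrative assumptions.
flume-ng agent \
  --conf /opt/flume/conf \
  --conf-file /opt/flume/conf/flume.conf \
  --name agent1 \
  -Dflume.root.logger=INFO,console
```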

My data, flume_test.data:

1,AAAAAAAA
2,BBBBBBB
3,CCCCCCCC
4,DDDDDD
5,EEEEEEE
6,FFFFFFFFFFF
7,GGGGGG
8,HHHHHHH
9,IIIIII
10,JJJJJJ
11,KKKKKK
12,LLLLLLLL
13,MMMMMMMMM
14,NNNNNNNNN
15,OOOOOOOO
16,PPPPPPPPPP
17,QQQQQQQ
18,RRRRRRR
19,SSSSSSSS

This is how I created the table in Hive:

create table flume_test(id string, message string)
clustered by (message) into 1 buckets
STORED AS ORC tblproperties ("orc.compress"="NONE");
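As a side note, the Hive streaming API that the Hive sink relies on generally requires a bucketed, ORC-backed table with ACID enabled, and on later Hive versions the table must also be explicitly marked transactional. A hedged variant of the DDL above (the `transactional` property is the only addition; whether it is required depends on the Hive version):

```sql
create table flume_test (id string, message string)
clustered by (message) into 1 buckets
STORED AS ORC
tblproperties ("orc.compress" = "NONE", "transactional" = "true");
```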

When I use just 1 bucket, `select * from flume_test` in the hive shell returns only the OK status and no data. If I use more than 1 bucket, it returns error messages.

For example, the errors with 5 buckets after a select on the Hive table:

hive> select * from flume_test;
OK
2015-06-18 10:04:57,6909 ERROR Client fs/client/fileclient/cc/client.cc:1385 Thread: 10141 Open failed for file /user/hive/warehouse/flume_test/delta_0004401_0004500/bucket_00, LookupFid error No such file or directory(2)
2015-06-18 10:04:57,6941 ERROR Client fs/client/fileclient/cc/client.cc:1385 Thread: 10141 Open failed for file /user/hive/warehouse/flume_test/delta_0004401_0004500/bucket_00, LookupFid error No such file or directory(2)
2015-06-18 10:04:57,6976 ERROR Client fs/client/fileclient/cc/client.cc:1385 Thread: 10141 Open failed for file /user/hive/warehouse/flume_test/delta_0004401_0004500/bucket_00, LookupFid error No such file or directory(2)
2015-06-18 10:04:57,7044 ERROR Client fs/client/fileclient/cc/client.cc:1385 Thread: 10141 Open failed for file /user/hive/warehouse/flume_test/delta_0004401_0004500/bucket_00, LookupFid error No such file or directory(2)
Time taken: 0.914 seconds

The Hive table data is stored under /user/hive/warehouse/flume_test, and the directory is not empty:

-rwxr-xr-x   3 andrey andrey          4 2015-06-17 16:28 /user/hive/warehouse/flume_test/_orc_acid_version
drwxr-xr-x   - andrey andrey          2 2015-06-17 16:28 /user/hive/warehouse/flume_test/delta_0004301_0004400

The delta directory contains:

-rw-r--r--   3 andrey andrey        991 2015-06-17 16:28 /user/hive/warehouse/flume_test/delta_0004301_0004400/bucket_00000
-rwxr-xr-x   3 andrey andrey          8 2015-06-17 16:28 /user/hive/warehouse/flume_test/delta_0004301_0004400/bucket_00000_flush_length

I cannot read the /user/hive/warehouse/flume_test/delta_0004301_0004400/bucket_00000 ORC file, even with Pig.

I also tried setting these variables after creating the table in Hive, but it made no difference:

set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on = true;
set hive.compactor.worker.threads = 2;
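One likely reason the `set` commands have no effect: they only apply to the current client session, while the compactor settings are read by the metastore service. A hedged hive-site.xml fragment with the properties commonly required for Hive ACID/streaming (property names are from the Hive transactions documentation; the values are illustrative, and the metastore needs a restart after changing them):

```xml
<!-- hive-site.xml (metastore side) -->
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>2</value>
</property>
```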

I found several examples online, but none of them were complete, and since I am new to Flume I could not work it out.

Best Answer

Adding these two lines to my configuration solved the problem, although I still see errors when reading the table from Hive. I can read the table and it returns the correct results, but with errors:

agent1.sinks.sink1.hive.txnsPerBatchAsk = 2
agent1.sinks.sink1.batchSize = 10

Regarding "hadoop - Saving Flume output to a Hive table using the Hive Sink", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30908641/
