
regex - Loading CSV data into multiple columns of an HBase table using Flume

Reposted. Author: 行者123. Last updated: 2023-12-04 15:57:04

Format of the CSV file in the spooling directory (sample.csv):

8600000US00601,00601,006015-DigitZCTA,0063-DigitZCTA,11102
8600000US00602,00602,006025-DigitZCTA,0063-DigitZCTA,12869
8600000US00603,00603,006035-DigitZCTA,0063-DigitZCTA,12423
8600000US00604,00604,006045-DigitZCTA,0063-DigitZCTA,33548
8600000US00606,00606,006065-DigitZCTA,0063-DigitZCTA,10603

My flume.conf:

agent.sources  = spool
agent.channels = fileChannel2
agent.sinks = sink2

agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /home/cloudera/cloudera
agent.sources.spool.fileSuffix = .completed
agent.sources.spool.channels = fileChannel2
#agent.sources.spool.deletePolicy = immediate

agent.sinks.sink2.type = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.sink2.channel = fileChannel2
agent.sinks.sink2.table = sample
agent.sinks.sink2.columnFamily = s1
agent.sinks.sink2.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.sink1.serializer.regex = ^([^,]+),([^,]+),([^,]+),([^,]+)$
#agent.sinks.sink2.serializer.regexIgnoreCase = true
agent.sinks.sink1.serializer.colNames =col1,col2,col3,col4
agent.sinks.sink2.batchSize = 100
agent.channels.fileChannel2.type=memory

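I can load the data into a single column with Flume, but I cannot load it into multiple columns using the regex. Any help getting it loaded into multiple columns in HBase would be appreciated. Thanks.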

Best answer

Something like this worked for me:

agent.sinks.s1.type = hbase
agent.sinks.s1.table = test
agent.sinks.s1.columnFamily = r
agent.sinks.s1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.s1.serializer.rowKeyIndex = 0
agent.sinks.s1.serializer.regex = ^(\\S+),(\\d+),(\\d+),(\\d)$
agent.sinks.s1.serializer.colNames = ROW_KEY,r:colA,r:colB,r:colC

If you want to specify the row key yourself instead of getting a random one, you can use:

agent.sinks.s1.serializer.rowKeyIndex = 0
agent.sinks.s1.serializer.colNames = ROW_KEY,r:colA,r:colB,r:colC
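To make the relationship between regex, colNames and rowKeyIndex concrete, here is a minimal Python sketch of the idea. It is not Flume's code: `split_event` is a hypothetical helper, and the sample lines are made up. It assumes each capture group is paired with the column name at the same position, and that the group at `rowKeyIndex` becomes the row key rather than a column. Note that `\\S` in a properties file is read as a single backslash, so the pattern is written here as a Python raw string with one backslash.

```python
import re

def split_event(line, regex, col_names, row_key_index):
    """Hypothetical helper: return (row_key, {column: value}), or None if the line is rejected."""
    m = re.fullmatch(regex, line)
    # Reject lines that do not match, or whose group count differs from the column count.
    if m is None or len(m.groups()) != len(col_names):
        return None
    row_key = None
    columns = {}
    for i, (name, value) in enumerate(zip(col_names, m.groups())):
        if i == row_key_index:
            row_key = value        # this group is used as the row key
        else:
            columns[name] = value  # every other group becomes a column value
    return row_key, columns

# Same pattern and column names as in the answer's config.
regex = r"^(\S+),(\d+),(\d+),(\d)$"
col_names = ["ROW_KEY", "r:colA", "r:colB", "r:colC"]

accepted = split_event("key1,10,20,3", regex, col_names, 0)
rejected = split_event("key1,10,20,30", regex, col_names, 0)
print(accepted)
print(rejected)
```

The second call returns None because the last group, `(\d)`, accepts exactly one digit. If the last field can be longer, `(\d+)` would be the pattern to try.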

If you want more flexibility, see this link: http://www.rittmanmead.com/2014/05/trickle-feeding-log-data-into-hbase-using-flume/

In short, I think the problem is that the regex is incorrect.
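One way to check this outside Flume is to run the question's pattern against a line from sample.csv. The sketch below uses Python's `re` module, which behaves the same as Java's regex engine for this simple pattern. The five-group pattern is a hypothetical replacement, not something given in the question or the answer.

```python
import re

# One line from the sample.csv in the question: five comma-separated fields.
line = "8600000US00601,00601,006015-DigitZCTA,0063-DigitZCTA,11102"

# Pattern from the question: only four capture groups.
four_groups = re.compile(r"^([^,]+),([^,]+),([^,]+),([^,]+)$")

# Hypothetical replacement: one capture group per field.
five_groups = re.compile(r"^([^,]+),([^,]+),([^,]+),([^,]+),([^,]+)$")

no_match = four_groups.match(line)   # the fifth field is left over, so this fails
fields = five_groups.match(line).groups()

print(no_match)
print(fields)
```

With five groups, colNames would also need five entries. Separately, the regex and colNames lines in the question are set on `sink1` while the sink is named `sink2`, so they probably never reach the configured sink.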

Regarding "regex - Loading CSV data into multiple columns of an HBase table using Flume", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25976407/
