
csv - How to use Flume to copy a set of csv files from my local directory to HDFS

Reposted. Author: 可可西里. Updated: 2023-11-01 15:15:07

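How can I use Flume to copy a set of csv files from my local directory to HDFS? I tried using a spooling directory as my source, but the copy failed. I then used the following Flume configuration to get my result: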

agent1.sources = tail 
agent1.channels = MemoryChannel-2
agent1.sinks = HDFS
agent1.sources.tail.type = exec
agent1.sources.tail.command = tail -F /home/cloudera/runs/*
agent1.sources.tail.channels = MemoryChannel-2
agent1.sinks.HDFS.channel = MemoryChannel-2
agent1.sinks.HDFS.type = hdfs
agent1.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/cloudera/runs
agent1.sinks.HDFS.hdfs.file.Type = DataStream
agent1.channels.MemoryChannel-2.type = memory

My files were copied to HDFS, but they contain special characters and are of no use to me. My local directory is /home/cloudera/runs and my HDFS target directory is /user/cloudera/runs.

Best answer

I used the following Flume configuration to get the job done.

# Flume configuration starts
# Define a file channel called fileChannel1_1 on agent_slave_1
agent_slave_1.channels.fileChannel1_1.type = file
# on linux FS
agent_slave_1.channels.fileChannel1_1.capacity = 200000
agent_slave_1.channels.fileChannel1_1.transactionCapacity = 1000
# Define a source for agent_slave_1
agent_slave_1.sources.source1_1.type = spooldir

# on linux FS
#Spooldir in my case is /home/cloudera/runs
agent_slave_1.sources.source1_1.spoolDir = /home/cloudera/runs/
agent_slave_1.sources.source1_1.fileHeader = false
agent_slave_1.sources.source1_1.fileSuffix = .COMPLETED
agent_slave_1.sinks.hdfs-sink1_1.type = hdfs

#Sink is /user/cloudera/runs_scored under hdfs
agent_slave_1.sinks.hdfs-sink1_1.hdfs.path = hdfs://localhost.localdomain:8020/user/cloudera/runs_scored/
agent_slave_1.sinks.hdfs-sink1_1.hdfs.batchSize = 1000
agent_slave_1.sinks.hdfs-sink1_1.hdfs.rollSize = 268435456
agent_slave_1.sinks.hdfs-sink1_1.hdfs.rollInterval = 0
agent_slave_1.sinks.hdfs-sink1_1.hdfs.rollCount = 50000000
agent_slave_1.sinks.hdfs-sink1_1.hdfs.writeFormat=Text

agent_slave_1.sinks.hdfs-sink1_1.hdfs.fileType = DataStream
agent_slave_1.sources.source1_1.channels = fileChannel1_1
agent_slave_1.sinks.hdfs-sink1_1.channel = fileChannel1_1

agent_slave_1.sinks = hdfs-sink1_1
agent_slave_1.sources = source1_1
agent_slave_1.channels = fileChannel1_1
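
A likely explanation for the special characters in the question, offered as a reading of the two configurations rather than something the original answer states: the question sets `hdfs.file.Type`, which is not the property name the HDFS sink reads, so the sink would fall back to its default `SequenceFile` output. Assuming the rest of the question's `agent1` configuration is kept, the corrected sink lines might look like this:

```properties
# Sketch only: agent1 and HDFS are the agent and sink names from the question's configuration.
# hdfs.fileType (not hdfs.file.Type) selects plain output instead of the default SequenceFile.
agent1.sinks.HDFS.hdfs.fileType = DataStream
# Write event bodies as text, as the answer's configuration does.
agent1.sinks.HDFS.hdfs.writeFormat = Text
```

When the agent is started with `flume-ng agent`, the value passed to `--name` has to match the property prefix (`agent_slave_1` for the answer's configuration, `agent1` for the question's).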

Regarding csv - how to use Flume to copy a set of csv files from my local directory to HDFS, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24847441/
