
hadoop - Flume agent: add host to message, then publish to Kafka topic

Reposted. Author: 可可西里. Updated: 2023-11-01 14:22:26

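We are starting to consolidate our applications' event log data by publishing messages to a Kafka topic. Although we could write to Kafka directly from the application, we chose to treat it as a general problem and use a Flume agent. This gives us some flexibility: if we want to capture something else from a server, we can tail a different source and publish to a different Kafka topic.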

We created a Flume agent configuration file that tails the log and publishes to a Kafka topic:

tier1.sources  = source1
tier1.channels = channel1
tier1.sinks = sink1

tier1.sources.source1.type = exec
tier1.sources.source1.command = tail -F /var/log/some_log.log
tier1.sources.source1.channels = channel1

tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 1000

tier1.sinks.sink1.type = org.apache.flume.sink.kafka.KafkaSink
tier1.sinks.sink1.topic = some_log
tier1.sinks.sink1.brokerList = hadoop01:9092,hadoop02.com:9092,hadoop03.com:9092
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.batchSize = 20

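Unfortunately, the messages themselves do not say which host produced them. If an application runs on several hosts and an error occurs, we cannot tell which host generated the message.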

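I noticed that if Flume wrote directly to HDFS, we could use a Flume interceptor to write to a host-specific HDFS location. We could do something similar with Kafka, i.e. create a new topic for each server, but that would get unwieldy: we would end up with thousands of topics.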

Can Flume append or include the hostname of the originating host when publishing to a Kafka topic?

Best answer

You can create a custom TCP source that reads the client's address and adds it to the event header.

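// Fields port, buffer, serverSocket and logger are declared in the enclosing source class.
@Override
public void configure(Context context) {
    port = context.getInteger("port");
    // Number of events collected before a batch is handed to the channel (default 1)
    buffer = context.getInteger("buffer", 1);

    try {
        serverSocket = new ServerSocket(port);
        logger.info("FlumeTCP source initialized");
    } catch (IOException e) {
        logger.error("FlumeTCP source failed to initialize", e);
    }
}

@Override
public void start() {
    // accept() and readLine() block, so run them on their own thread;
    // otherwise start() never returns and super.start() is never reached
    Thread readerThread = new Thread(this::readFromClient, "flume-tcp-reader");
    readerThread.setDaemon(true);
    readerThread.start();
    super.start();
}

private void readFromClient() {
    try (Socket clientSocket = serverSocket.accept();
         BufferedReader receiveBuffer = new BufferedReader(
                 new InputStreamReader(clientSocket.getInputStream()))) {
        logger.info("Connection established with client : " + clientSocket.getRemoteSocketAddress());
        final ChannelProcessor channel = getChannelProcessor();
        final Map<String, String> headers = new HashMap<String, String>();
        // Remote IP and port of the sender, e.g. "/127.0.0.1:50999"
        headers.put("hostname", clientSocket.getRemoteSocketAddress().toString());
        String line;
        List<Event> events = new ArrayList<Event>();

        while ((line = receiveBuffer.readLine()) != null) {
            Event event = EventBuilder.withBody(line, Charset.defaultCharset(), headers);
            logger.info("Event created");
            events.add(event);
            if (events.size() >= buffer) {
                channel.processEventBatch(events);
                // Start a new batch; without this the list keeps growing and is never sent again
                events = new ArrayList<Event>();
            }
        }
        // Send any events left over when the client disconnects
        if (!events.isEmpty()) {
            channel.processEventBatch(events);
        }
    } catch (Exception e) {
        // Log the failure instead of silently swallowing it
        logger.error("FlumeTCP source failed while reading from client", e);
    }
}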
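A batch loop of the kind this source needs should start a fresh list after every full batch and flush any partial batch when the stream ends. That logic can be tried without Flume or a socket. The following is a stand-alone, stdlib-only sketch: `HeaderBatcher`, `SimpleEvent` and `readInBatches` are hypothetical names, `SimpleEvent` only stands in for a Flume `Event`, and the `Consumer` plays the role of `ChannelProcessor.processEventBatch`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class HeaderBatcher {
    // Hypothetical stand-in for a Flume Event: a header map plus a text body
    static final class SimpleEvent {
        final Map<String, String> headers;
        final String body;

        SimpleEvent(Map<String, String> headers, String body) {
            this.headers = headers;
            this.body = body;
        }
    }

    // Reads lines, attaches a copy of the headers to each, and emits batches of batchSize events
    static void readInBatches(BufferedReader reader, Map<String, String> headers,
                              int batchSize, Consumer<List<SimpleEvent>> sink) throws IOException {
        List<SimpleEvent> batch = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            batch.add(new SimpleEvent(new HashMap<>(headers), line));
            if (batch.size() >= batchSize) {
                sink.accept(batch);
                batch = new ArrayList<>(); // fresh list, so later batches are still sent
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // partial batch left over when the stream ends
        }
    }

    // Sizes of the batches produced for an in-memory text, tagged with a sample header
    static List<Integer> batchSizes(String text, int batchSize) {
        List<Integer> sizes = new ArrayList<>();
        try {
            readInBatches(new BufferedReader(new StringReader(text)),
                    Collections.singletonMap("hostname", "/127.0.0.1:50999"),
                    batchSize, batch -> sizes.add(batch.size()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes("a\nb\nc\nd\ne\n", 2)); // [2, 2, 1]
    }
}
```

Creating a new list rather than calling `clear()` avoids mutating a list the receiver might still hold on to.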

flume-conf.properties can be configured as follows:


# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'

agent.sources = CustomTcpSource
agent.channels = memoryChannel
agent.sinks = loggerSink

# For each one of the sources, the type is defined
agent.sources.CustomTcpSource.type = com.vishnu.flume.source.CustomFlumeTCPSource
agent.sources.CustomTcpSource.port = 4443
agent.sources.CustomTcpSource.buffer = 1


# The channel can be defined as follows.
agent.sources.CustomTcpSource.channels = memoryChannel

# Each sink's type must be defined
agent.sinks.loggerSink.type = logger

#Specify the channel the sink should use
agent.sinks.loggerSink.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 100

I sent a test message to try it out, and the logger sink printed:

Event: { headers:{hostname=/127.0.0.1:50999} body: 74 65 73 74 20 6D 65 73 73 61 67 65             test message }
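The answer does not show how the test message was sent; any client that writes newline-terminated text to port 4443 would do. A minimal Java sketch, where `TcpLineClient` and `sendLines` are hypothetical names and the agent is assumed to be running locally with the configuration above:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

public class TcpLineClient {
    // Writes each string as one newline-terminated line; the custom source turns each line into one event
    static void sendLines(String host, int port, List<String> lines) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new BufferedWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.write(line);
                out.write('\n');
            }
            out.flush();
        } // closing the socket ends the stream, so the source's readLine() loop sees end of input
    }

    public static void main(String[] args) throws IOException {
        // Assumes CustomTcpSource is listening on localhost:4443, as configured above
        sendLines("localhost", 4443, Collections.singletonList("test message"));
    }
}
```

Note that the header value `/127.0.0.1:50999` is `InetSocketAddress.toString()`: the client's IP plus its ephemeral port, not a host name. Inside the source, `clientSocket.getInetAddress().getHostAddress()` gives just the IP, and `getHostName()` attempts a reverse-DNS lookup. This client encodes UTF-8 while the source decodes with `Charset.defaultCharset()`, so for non-ASCII text both sides should agree on the charset.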

I have uploaded the project to my GitHub.
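The answer demonstrates the header with a logger sink. With the question's Kafka sink, keep in mind that Flume's KafkaSink by default publishes only the event body, so a `hostname` header does not reach Kafka consumers unless the sink is told to keep headers (newer Flume releases offer `useFlumeEventFormat = true`, which writes the event in Flume's Avro format). A body-only alternative is to stamp the host into each line before Flume picks it up. The sketch below assumes a hypothetical tab-separated `host<TAB>line` format; `HostTagger`, `tag` and `untag` are illustrative names, not Flume APIs.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostTagger {
    // Hypothetical wire format: "<host>\t<original log line>"
    static String tag(String host, String line) {
        return host + "\t" + line;
    }

    // Consumer side: split on the first tab only, so tabs inside the log line survive
    static String[] untag(String tagged) {
        return tagged.split("\t", 2);
    }

    // Best-effort name of the machine running this code
    static String localHostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-host";
        }
    }

    public static void main(String[] args) {
        String tagged = tag(localHostName(), "ERROR payment service timed out");
        String[] parts = untag(tagged);
        System.out.println("host=" + parts[0] + " message=" + parts[1]);
    }
}
```

Keeping everything in one topic and carrying the host inside the message avoids the topic-per-server explosion the question describes.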

Regarding "hadoop - Flume agent: add host to message, then publish to Kafka topic", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33746103/
