gpt4 book ai didi

hadoop - Flume Twitter 流媒体问题

转载 作者:行者123 更新时间:2023-12-02 21:05:22 25 4
gpt4 key购买 nike

我正在使用 水槽 1.6.0-cdh5.9.1 使用 Twitter 源流式传输推文。

配置文件如下:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken = xxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxx
TwitterAgent.sources.Twitter.keywords = hadoop, cloudera

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/cloudera/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 1000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

对于 Cloudera .jar 依赖项,我构建了 flume-sources-1.0-SNAPSHOT.jar使用 Maven 使用以下依赖项:
<dependencies>
<!-- For the Twitter API -->
<dependency>
<groupId>org.twitter4j</groupId>
<artifactId>twitter4j-stream</artifactId>
<version>4.0.6</version>
</dependency>

<!-- Hadoop Dependencies -->
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-core</artifactId>
<version>1.6.0-cdh5.9.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-sdk</artifactId>
<version>1.6.0-cdh5.9.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.6.0-cdh5.9.1</version>
<scope>provided</scope>
</dependency>
</dependencies>

现在,当我运行 Flume 代理时,它成功启动,连接到 Twitter,但在最后一行之后停止( 接收状态流 ):
2017-02-08 21:55:12,556 (Twitter Stream consumer-1[initializing]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Establishing connection.
2017-02-08 21:55:46,474 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Connection established.
2017-02-08 21:55:46,474 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Receiving status stream.

在最后一行之后没有任何 react 。它不会终止,不会流式传输任何内容。我查看了 HDFS 位置,那里没有创建任何内容。

有人可以在这里帮助我吗?

最佳答案

问题在于配置TwitterAgent.sources.Twitter.keywords
Twitter Source 可以正常工作,只要在 Firehose 中找到数据,它就会不断拉取推文。我尝试了其他一些最近流行的关键字,效果非常好。

关于hadoop - Flume Twitter 流媒体问题,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/42120670/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com