
java - Spark Streaming gets the warning "replicated to only 0 peer(s) instead of 1 peers"

Reposted. Author: 搜寻专家. Updated: 2023-10-30 21:45:50

I am using Spark Streaming to receive tweets from Twitter, and I get a lot of warnings saying:

replicated to only 0 peer(s) instead of 1 peers

What does this warning mean?

My code is:

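    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.twitter.TwitterUtils;
    import twitter4j.auth.AuthorizationFactory;
    import twitter4j.conf.ConfigurationBuilder;

    // Inside main(String[] args); GetHashtags, UpdateReduce and saveText are the asker's own classes (not shown).
    SparkConf conf = new SparkConf().setAppName("Test");
    // 5-second batch interval
    JavaStreamingContext sc = new JavaStreamingContext(conf, Durations.seconds(5));
    // A checkpoint directory is required by the stateful updateStateByKey below
    sc.checkpoint("/home/arman/Desktop/checkpoint");

    ConfigurationBuilder cb = new ConfigurationBuilder();
    cb.setOAuthConsumerKey("****************")
      .setOAuthConsumerSecret("**************")
      .setOAuthAccessToken("*********************")
      .setOAuthAccessTokenSecret("***************");

    // Receiver-based input stream of tweets
    JavaReceiverInputDStream<twitter4j.Status> statuses = TwitterUtils.createStream(sc,
            AuthorizationFactory.getInstance(cb.build()));

    JavaPairDStream<String, Long> hashtags = statuses.flatMapToPair(new GetHashtags());
    // Running count per hashtag across batches
    JavaPairDStream<String, Long> hashtagsCount = hashtags.updateStateByKey(new UpdateReduce());
    hashtagsCount.foreachRDD(new saveText(args[0], true));

    sc.start();
    // args[1]: timeout in milliseconds
    sc.awaitTerminationOrTimeout(Long.parseLong(args[1]));
    sc.stop();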

Best answer

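When Spark Streaming reads data, each incoming data block is, for fault tolerance, replicated to at least one other node/worker. Otherwise, if the runtime reads data from the stream and then fails, that particular piece of data could be lost: it has already been read from (and removed from) the stream, and because of the failure it is also lost on the worker side.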
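To make the fault-tolerance argument concrete, here is a toy model in plain Java. It is not Spark's BlockManager code: the class `ReplicationToyModel`, its methods and the node names are hypothetical. It assumes a replication factor of 2 (one local copy plus one peer), which matches the "instead of 1 peers" in the warning.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, NOT Spark's BlockManager: a receiver keeps each incoming block
// on its own node and copies it to (replication - 1) other nodes ("peers").
final class ReplicationToyModel {

    /** Nodes holding a copy: the local node plus at most (replication - 1) peers. */
    static List<String> placeBlock(String localNode, List<String> otherNodes, int replication) {
        List<String> holders = new ArrayList<>();
        holders.add(localNode);
        int wantedPeers = replication - 1;
        for (String peer : otherNodes) {
            if (holders.size() - 1 >= wantedPeers) {
                break; // enough peers already hold a copy
            }
            holders.add(peer);
        }
        return holders; // fewer than wantedPeers peers if there are not enough other nodes
    }

    /** The block survives the failure of one node only if another node still holds a copy. */
    static boolean survivesFailureOf(String failedNode, List<String> holders) {
        for (String node : holders) {
            if (!node.equals(failedNode)) {
                return true;
            }
        }
        return false;
    }
}
```

With no other nodes, `placeBlock("worker-1", List.of(), 2)` returns only `worker-1`, so `survivesFailureOf("worker-1", ...)` is false: that is the data loss described above. With a second node, the copy on `worker-2` keeps the block alive.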

Quoting the Spark documentation:

While a Spark Streaming driver program is running, the system receives data from various sources and divides it into batches. Each batch of data is treated as an RDD, that is, an immutable parallel collection of data. These input RDDs are saved in memory and replicated to two nodes for fault-tolerance.
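As a rough illustration of "divides it into batches", the plain-Java sketch below groups records by arrival time into fixed 5-second batches, the interval the question passes to `Durations.seconds(5)`. It is not Spark's internal code; `BatchingToyModel`, `Received` and `toBatches` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration, NOT Spark internals: slice records by arrival time into
// fixed-length batches, as a 5-second batch interval does.
final class BatchingToyModel {

    /** A received record and its arrival time in ms (non-negative) since the stream started. */
    static final class Received {
        final long arrivalMillis;
        final String payload;

        Received(long arrivalMillis, String payload) {
            this.arrivalMillis = arrivalMillis;
            this.payload = payload;
        }
    }

    /** Maps each batch start time (ms) to the payloads that arrived in [start, start + interval). */
    static Map<Long, List<String>> toBatches(List<Received> records, long intervalMillis) {
        Map<Long, List<String>> batches = new TreeMap<>();
        for (Received r : records) {
            long batchStart = (r.arrivalMillis / intervalMillis) * intervalMillis;
            batches.computeIfAbsent(batchStart, k -> new ArrayList<>()).add(r.payload);
        }
        return batches;
    }
}
```

In Spark each such batch is treated as an RDD, and its input blocks are the ones the receiver tries to replicate.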

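In your case the warning means that the incoming data from the stream is not being replicated at all. A likely reason is that you are using only one Spark worker instance, or running the application in local mode, so there is no other node to copy the blocks to. Try starting more Spark workers and check whether the warning goes away.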
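The sketch below models why a single-node setup produces exactly this message. It is a simplification, not Spark code: `PeerCheck` and its methods are hypothetical, `"spark://host:7077"` is a placeholder master URL, and it assumes that a block is copied only to the block managers of *other* executors, so local mode (one process) has no peers.

```java
// Toy model, NOT Spark code: how many peers a receiver could replicate a block to,
// and whether the requested replication can be met.
final class PeerCheck {

    /** Assumption: peers are the other executors; in local mode there is only one process. */
    static int availablePeers(String master, int executors) {
        if (master.startsWith("local")) {
            return 0;
        }
        return Math.max(0, executors - 1);
    }

    /** Returns a message shaped like the question's warning, or null if replication is met. */
    static String replicationWarning(int availablePeers, int replication) {
        int wanted = replication - 1;                       // peers needed besides the local copy
        int reached = Math.min(availablePeers, wanted);
        return reached < wanted
                ? "replicated to only " + reached + " peer(s) instead of " + wanted + " peers"
                : null;
    }
}
```

With `availablePeers("local[2]", 1)` and a replication factor of 2, `replicationWarning` returns "replicated to only 0 peer(s) instead of 1 peers"; with two executors on a cluster it returns null.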

Regarding java - Spark Streaming gets the warning "replicated to only 0 peer(s) instead of 1 peers", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32583273/
