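I want to understand how receivers work in Spark Streaming. As I understand it, a receiver task runs in an executor, collects data, and saves it as RDDs. The receiver starts reading when start() is called. I need clarification on the following points.
Best answer
I will answer based on my experience with the Kafka receiver, which seems more or less similar to what happens with Kinesis.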
How many receivers does the Spark Streaming job start? One or multiple?
Is the receiver implemented as push-based or pull-based?
Pull-based: data is pulled from Kafka at each batch interval (specified when creating the StreamingContext).
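As a minimal sketch of where the batch interval is specified, the snippet below creates a StreamingContext with a 5-second interval. The socket source, host, port, and the 5-second value are assumptions chosen for illustration and are not from the original answer; a Kafka source would be wired in the same place.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BatchIntervalSketch {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the receiver, one for processing the batches.
    val conf = new SparkConf()
      .setAppName("BatchIntervalSketch")
      .setMaster("local[2]")

    // The second argument is the batch interval: received data is grouped
    // into one RDD per 5-second window (assumed value).
    val ssc = new StreamingContext(conf, Seconds(5))

    // Hypothetical source: a text socket on localhost:9999.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Print the number of records received in each batch.
    lines.count().print()

    ssc.start()            // receivers begin reading once start() is called
    ssc.awaitTermination() // block until the job is stopped
  }
}
```

With a single-threaded master such as local[1], the receiver would take the only thread and no batches would be processed, which is why the sketch uses local[2].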
Can the receiver become a bottleneck in any case?
To achieve parallelism, the data should be partitioned across the worker nodes. So how is the streaming data distributed across the nodes?
If new RDDs are formed on a new node based on the batch interval, how does SparkContext serialize the transform functions to that node after the job is submitted?
Can the number of receivers launched be controlled by a parameter?
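// This may come from a configuration parameter
val numStreams = 5
// Each createStream call sets up its own receiver
val kafkaStreams = (1 to numStreams).map { i => KafkaUtils.createStream(...) }
// Merge the per-receiver streams into a single DStream
val unifiedStream = streamingContext.union(kafkaStreams)
unifiedStream.print()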
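The snippet above can be expanded into a fuller sketch in which the number of receivers is read from the job configuration. This assumes the receiver-based KafkaUtils.createStream from the spark-streaming-kafka-0-8 module; the configuration key spark.myapp.numStreams, the ZooKeeper address, the consumer group, and the topic name are all hypothetical placeholders, not values from the original answer.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object MultiReceiverSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MultiReceiverSketch")

    // Hypothetical custom key, e.g. passed as
    //   spark-submit --conf spark.myapp.numStreams=3 ...
    // Falls back to 5 when the key is not set.
    val numStreams = conf.getInt("spark.myapp.numStreams", 5)

    val ssc = new StreamingContext(conf, Seconds(5))

    // Placeholder connection details (assumptions for illustration).
    val zkQuorum = "localhost:2181"
    val groupId = "example-group"
    val topics = Map("example-topic" -> 1) // topic -> consumer threads per receiver

    // One receiver-based input stream per iteration.
    val kafkaStreams = (1 to numStreams).map { _ =>
      KafkaUtils.createStream(ssc, zkQuorum, groupId, topics)
    }

    // Merge all receiver streams so downstream code sees a single DStream.
    val unifiedStream = ssc.union(kafkaStreams)

    // Records are (key, message) pairs; print the message values.
    unifiedStream.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Each receiver occupies one core for the lifetime of the job, so the application needs more cores than numStreams, otherwise no cores remain to process the received data.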
Regarding "apache-spark - Spark: Is receiver in spark streaming a bottleneck?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35980596/