
hadoop - Spark: increasing the number of tasks/partitions


The number of tasks in Spark is decided by the total number of RDD partitions at the beginning of stages. For example, when a Spark application is reading data from HDFS, the partition method for Hadoop RDD is inherited from FileInputFormat in MapReduce, which is affected by the size of HDFS blocks, the value of mapred.min.split.size and the compression method, etc.

[screenshot of the Spark UI showing the stage's task durations omitted]
The tasks in the screenshot took 7, 7, and 4 seconds, and I would like to make them more balanced. Also, the stage is split into only 3 tasks; is there any way to specify the number of partitions/tasks in Spark?
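Picking up on the quoted point about FileInputFormat and mapred.min.split.size: the sketch below is one way to influence the number of input partitions when reading from HDFS with the RDD API. It is a minimal sketch; the split-size value, the HDFS path, and the minPartitions hint of 12 are illustrative placeholders, not values taken from the question.

```scala
import org.apache.spark.sql.SparkSession

object InputPartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("input-partition-sketch")
      // Illustrative value: raising the minimum split size makes Hadoop's
      // FileInputFormat produce fewer, larger splits (this is the newer
      // property name; the legacy one is mapred.min.split.size).
      .config("spark.hadoop.mapreduce.input.fileinputformat.split.minsize",
              (128L * 1024 * 1024).toString)
      .getOrCreate()
    val sc = spark.sparkContext

    // The second argument is only a *minimum* number of partitions;
    // Spark may still create more if the file spans more HDFS blocks.
    val lines = sc.textFile("hdfs:///path/to/input", 12) // path and 12 are placeholders
    println(s"Input partitions: ${lines.getNumPartitions}")

    spark.stop()
  }
}
```

Each resulting partition becomes one task at the start of the stage, so more partitions generally means more, smaller tasks.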

Best answer

Tasks are determined by partitions. You can set a partitioner on the RDD, and the partitioner lets you specify the number of partitions.
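As a minimal sketch of that answer (assuming the core RDD API), the snippet below applies an explicit HashPartitioner with a chosen partition count to a key-value RDD, and uses repartition for a plain RDD; the count of 12 and the sample data are arbitrary placeholders.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object RepartitionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("repartition-sketch").setMaster("local[*]"))

    val raw = sc.parallelize(1 to 1000) // sample data, not from the question
    println(s"Before: ${raw.getNumPartitions} partitions")

    // Plain RDD: repartition shuffles the data into the requested number of
    // partitions; coalesce(n) shrinks the count without a full shuffle.
    val balanced = raw.repartition(12) // 12 is an illustrative count
    println(s"After repartition: ${balanced.getNumPartitions} partitions")

    // Key-value RDD: an explicit partitioner fixes the partition count, and
    // each partition runs as one task in the next stage.
    val byKey = raw.map(x => (x % 10, x)).partitionBy(new HashPartitioner(12))
    println(s"With HashPartitioner: ${byKey.getNumPartitions} partitions")

    sc.stop()
  }
}
```

For DataFrames, df.repartition(n) plays the same role, and spark.sql.shuffle.partitions controls the partition count after shuffles.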

Regarding hadoop - Spark: increasing the number of tasks/partitions, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45792042/
