
Hadoop YARN: How to limit dynamic self-allocation of resources with Spark?

Reposted. Author: 可可西里. Updated: 2023-11-01 14:17:06

In our Hadoop cluster, which runs under YARN, we have the problem that some "smarter" people are able to grab significantly larger chunks of resources by configuring Spark jobs in pySpark Jupyter notebooks, for example:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("name")
        .setMaster("yarn-client")
        .set("spark.executor.instances", "1000")  # far more executors than a fair share
        .set("spark.executor.memory", "64g"))     # and far more memory per executor

sc = SparkContext(conf=conf)

This leads to a situation where these people literally crowd out their less "smart" colleagues.

Is there a way to prevent users from allocating resources themselves and to leave resource allocation entirely to YARN?

Best Answer

YARN has good support for queue capacity planning in multi-tenant clusters; the YARN ResourceManager uses the CapacityScheduler by default.

Here we pass the queue name alpha in the spark-submit command for demonstration purposes.

$ ./bin/spark-submit --class path/to/class/file \
--master yarn-cluster \
--queue alpha \
jar/location \
args

Setting up queues:

The CapacityScheduler has a predefined queue called root. All queues in the system are children of the root queue. In capacity-scheduler.xml, the parameter yarn.scheduler.capacity.root.queues is used to define the child queues.

For example, to create 3 queues, specify the queue names in a comma-separated list:

<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>alpha,beta,default</value>
<description>The queues at this level (root is the root queue).</description>
</property>
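Once queues exist, the CapacityScheduler can also assign applications to queues automatically, so notebook users cannot simply pick a queue (or avoid one) themselves. A sketch using queue mappings; the user and group names here are assumptions for illustration:

```xml
<!-- capacity-scheduler.xml: route applications to queues by user/group.
     u:alice:alpha maps user alice to queue alpha;
     g:analysts:beta maps the analysts group to queue beta.
     The names alice and analysts are illustrative. -->
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <value>u:alice:alpha,g:analysts:beta</value>
</property>
<property>
  <!-- When true, the mapping overrides any queue the user requested. -->
  <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
  <value>true</value>
</property>
```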

These are a few important properties to consider for capacity planning:

<property>
<name>yarn.scheduler.capacity.root.alpha.capacity</name>
<value>50</value>
<description>Queue capacity in percentage (%) as a float (e.g. 12.5). The sum of capacities for all queues, at each level, must be equal to 100. Applications in the queue may consume more resources than the queue’s capacity if there are free resources, providing elasticity.</description>
</property>

<property>
<name>yarn.scheduler.capacity.root.alpha.maximum-capacity</name>
<value>80</value>
<description>Maximum queue capacity in percentage (%) as a float. This limits the elasticity for applications in the queue. Defaults to -1 which disables it.</description>
</property>
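As a quick worked example of how capacity and maximum-capacity interact (plain Python; the 1000 GB cluster size is an assumption for illustration): with capacity = 50 and maximum-capacity = 80, queue alpha is guaranteed half the cluster and can elastically grow to at most 80% of it when other queues are idle.

```python
def queue_limits(cluster_mem_gb, capacity_pct, max_capacity_pct):
    """Return (guaranteed, elastic_cap) memory for a CapacityScheduler queue.

    guaranteed  -- the queue's configured share (capacity, in %)
    elastic_cap -- the hard ceiling it may elastically grow to (maximum-capacity)
    """
    guaranteed = cluster_mem_gb * capacity_pct / 100.0
    elastic_cap = cluster_mem_gb * max_capacity_pct / 100.0
    return guaranteed, elastic_cap

# Hypothetical 1000 GB cluster with the alpha queue settings above.
guaranteed, cap = queue_limits(1000, 50, 80)
print(guaranteed, cap)  # 500.0 800.0
```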

<property>
<name>yarn.scheduler.capacity.root.alpha.minimum-user-limit-percent</name>
<value>10</value>
<description>Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is demand for resources. The user limit can vary between a minimum and maximum value. The former (the minimum value) is set to this property value and the latter (the maximum value) depends on the number of users who have submitted applications. For e.g., suppose the value of this property is 25. If two users have submitted applications to a queue, no single user can use more than 50% of the queue resources. If a third user submits an application, no single user can use more than 33% of the queue resources. With 4 or more users, no user can use more than 25% of the queue's resources. A value of 100 implies no user limits are imposed. The default is 100. Value is specified as an integer.</description>
</property>
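The user-limit behaviour described above can be sketched in plain Python: each active user's cap is the larger of an even split of the queue and the configured minimum, so the per-user share shrinks as more users submit applications, but never below the minimum.

```python
def user_limit_pct(min_user_limit, num_active_users):
    """Per-user cap, in % of queue resources, under the user-limit rule above."""
    if num_active_users <= 0:
        raise ValueError("need at least one active user")
    even_split = 100.0 / num_active_users
    return max(even_split, float(min_user_limit))

# With a minimum user limit of 25, as in the description above:
for users in (1, 2, 3, 4, 5):
    print(users, round(user_limit_pct(25, users), 1))
# 1 user -> 100.0, 2 -> 50.0, 3 -> 33.3, 4 -> 25.0, 5 -> 25.0
```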

Link: YARN CapacityScheduler Queue Properties
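Queue capacities limit aggregate usage per queue, but they do not by themselves stop a single request for an oversized container. A complementary admin-side guard (a sketch; these are ResourceManager properties in yarn-site.xml, and the values below are assumptions, not recommendations) is to cap what any single container may be allocated, so a notebook asking for 64g executors is rejected regardless of its Spark settings:

```xml
<!-- yarn-site.xml: hard per-container allocation ceilings.
     The 16 GB / 8 vcore values are illustrative only. -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>16384</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>8</value>
</property>
```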

Regarding "Hadoop YARN: How to limit dynamic self-allocation of resources with Spark?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39999015/
