
scala - Spark : Why execution is carried by a master node but not worker nodes?

Reposted. Author: 行者123. Updated: 2023-12-05 04:47:52

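I have a Spark cluster consisting of one master node and two worker nodes.

When I run the following code to extract data from a database, the actual execution is carried out by the master, not by one of the workers.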

    sparkSession.read
      .format("jdbc")
      .option("url", jdbcURL)
      .option("user", user)
      .option("query", query)
      .option("driver", driverClass)
      .option("fetchsize", fetchsize)
      .option("numPartitions", numPartitions)
      .option("queryTimeout", queryTimeout)
      .options(options)
      .load()

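Is this the expected behavior?

Is there any way to prevent it?

Best answer

A Spark application has two kinds of runners, the driver and the executors, and two kinds of operations, transformations and actions. According to this doc: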

RDDs support two types of operations: transformations, which create a new dataset from an existing one, and actions, which return a value to the driver program after running a computation on the dataset. For example, map is a transformation that passes each dataset element through a function and returns a new RDD representing the results. On the other hand, reduce is an action that aggregates all the elements of the RDD using some function and returns the final result to the driver program (although there is also a parallel reduceByKey that returns a distributed dataset).

...

All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset (e.g. a file). The transformations are only computed when an action requires a result to be returned to the driver program. This design enables Spark to run more efficiently. For example, we can realize that a dataset created through map will be used in a reduce and return only the result of the reduce to the driver, rather than the larger mapped dataset.
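To make the quoted passage concrete, here is a minimal sketch of a lazy transformation followed by an action. It is not taken from the original answer: the object name LazyDemo, the accumulator name mapCalls, and the local[2] master are assumptions chosen for illustration, and the accumulator is only a convenient way to observe when the map function actually runs.

```scala
import org.apache.spark.sql.SparkSession

object LazyDemo {
  // Returns (map calls before the action, reduce result, map calls after the action).
  def run(): (Long, Int, Long) = {
    // Assumption: local mode, so the driver and the executor threads share one JVM.
    val spark = SparkSession.builder()
      .appName("lazy-demo")
      .master("local[2]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Counts how many times the function passed to map is invoked.
    val calls = sc.longAccumulator("mapCalls")

    // Transformation: only recorded in the lineage, no task is launched yet.
    val doubled = sc.parallelize(1 to 10, numSlices = 2).map { x =>
      calls.add(1)
      x * 2
    }
    val before: Long = calls.value

    // Action: tasks run on the executors and only the reduced value
    // is sent back to the driver.
    val total = doubled.reduce(_ + _)
    val after: Long = calls.value

    spark.stop()
    (before, total, after)
  }

  def main(args: Array[String]): Unit = {
    val (before, total, after) = run()
    println(s"map calls before action: $before")
    println(s"total: $total, map calls after action: $after")
  }
}
```

The same reasoning applies to the JDBC read in the question: load() only defines the DataFrame, and the rows are fetched when an action runs on it.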

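So in a Spark application, some operations run in the executors and others run in the driver. On Dataproc, the executors always run in YARN containers on the worker nodes. The driver, however, can run on either the master node or a worker node. The default is "client mode", in which the driver runs on the master node, outside of YARN. You can enable "cluster mode" with gcloud dataproc jobs submit spark ... --properties spark.submit.deployMode=cluster, which runs the driver in a YARN container on a worker node. See this doc for more details.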
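One way to check which mode a job is in, and which hosts actually run its tasks, is to compare the driver's host name with the host names reported from inside a small job. The following is a rough sketch rather than part of the original answer: the object name WhereDoesItRun and the helper inspect are hypothetical, and it assumes the job is launched through spark-submit or a Dataproc job submission so that the master is supplied externally.

```scala
import java.net.InetAddress
import org.apache.spark.sql.SparkSession

object WhereDoesItRun {
  // Returns (deploy mode, driver host name, host names that ran the tasks).
  def inspect(spark: SparkSession): (String, String, Seq[String]) = {
    val sc = spark.sparkContext

    // Evaluated in the driver JVM.
    val driverHost = InetAddress.getLocalHost.getHostName

    // The function passed to map is evaluated inside the executors,
    // so it reports the hosts where the tasks ran.
    val taskHosts = sc.parallelize(1 to 8, numSlices = 8)
      .map(_ => InetAddress.getLocalHost.getHostName)
      .distinct()
      .collect()
      .toSeq

    (sc.deployMode, driverHost, taskHosts)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: no master is set here because the submitter provides it.
    val spark = SparkSession.builder().appName("where-does-it-run").getOrCreate()
    val (mode, driverHost, taskHosts) = inspect(spark)
    println(s"deploy mode: $mode")
    println(s"driver host: $driverHost")
    println(s"task hosts: ${taskHosts.mkString(", ")}")
    spark.stop()
  }
}
```

In client mode on Dataproc the driver host should be the master node, while the task hosts should be worker nodes. In cluster mode both should be worker nodes.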

Regarding scala - Spark : Why execution is carried by a master node but not worker nodes?, the original question is on Stack Overflow: https://stackoverflow.com/questions/68345018/
