
hadoop - Running a JAR in Hadoop on Google Cloud using the Yarn client

Reposted — Author: 可可西里, Updated: 2023-11-01 16:31:15

I want to run a JAR in Hadoop on Google Cloud using the Yarn client.

I use this command on the Hadoop master node:

spark-submit --class find --master yarn-client find.jar

But it returns this error:

15/06/17 10:11:06 INFO client.RMProxy: Connecting to ResourceManager at hadoop-m-on8g/10.240.180.15:8032
15/06/17 10:11:07 INFO ipc.Client: Retrying connect to server: hadoop-m-on8g/10.240.180.15:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

What is the problem? In case it helps, here is my yarn-site.xml:

<?xml version="1.0" ?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/yarn-logs/</value>
    <description>
      The remote path, on the default FS, to store logs.
    </description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-m-on8g</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>5999</value>
  </property>
</configuration>
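As a quick sanity check on a config like this, you can confirm which value the client will actually read for a given property. A minimal sketch, assuming python3 is on the PATH; the sample file below is hypothetical and just mirrors the hostname property above:

```shell
# Print the value of a named property from a Hadoop-style XML config,
# using python3's stdlib XML parser.
get_prop() {
  python3 -c '
import sys
import xml.etree.ElementTree as ET
root = ET.parse(sys.argv[1]).getroot()
for p in root.findall("property"):
    if p.findtext("name") == sys.argv[2]:
        print(p.findtext("value"))
' "$1" "$2"
}

# Hypothetical sample file mirroring the config above; on a real cluster
# you would point get_prop at your actual yarn-site.xml instead.
cat > /tmp/yarn-site-sample.xml <<'EOF'
<?xml version="1.0" ?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-m-on8g</value>
  </property>
</configuration>
EOF

get_prop /tmp/yarn-site-sample.xml yarn.resourcemanager.hostname
```

The hostname printed here is what the client resolves before trying port 8032, so it should match the master it is actually running against.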

Best Answer

In your case, it looks like the YARN ResourceManager may be unhealthy for some unknown reason; you can try to fix YARN with the following:

sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/stop-yarn.sh
sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/start-yarn.sh
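If the `Retrying connect to server` error persists after the restart, it is worth confirming whether anything is listening on the ResourceManager RPC port at all. A minimal sketch, assuming bash with `/dev/tcp` support and the default RM port 8032 (the hostname is the one from the question; yours may differ):

```shell
# Probe host:port using bash's built-in /dev/tcp redirection and report
# whether the ResourceManager RPC port accepts connections.
check_rm() {
  local host="$1" port="$2"
  if timeout 3 bash -c ">/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port}"
  fi
}

check_rm hadoop-m-on8g 8032
```

If the port is unreachable, the RM process is down or bound to a different interface, which matches the connection retries in the log above.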

However, it looks like you are using the Click-to-Deploy solution; due to a few bugs and missing memory configuration, Click-to-Deploy's Spark + Hadoop 2 deployment doesn't actually support Spark on YARN at the moment. If you just try to run it out of the box with --master yarn-client, you'll usually run into something like this:

15/06/17 17:21:08 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: 
appMasterRpcPort: -1
appStartTime: 1434561664937
yarnAppState: ACCEPTED

15/06/17 17:21:09 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1434561664937
yarnAppState: ACCEPTED

15/06/17 17:21:10 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: 0
appStartTime: 1434561664937
yarnAppState: RUNNING

15/06/17 17:21:15 ERROR cluster.YarnClientSchedulerBackend: Yarn application already ended: FAILED
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}

The well-supported way to deploy is to use bdutil to set up a cluster on Google Compute Engine with Hadoop 2 and Spark configured to run on YARN. You would run something like:

./bdutil -P <instance prefix> -p <project id> -b <bucket> -z <zone> -d  \
-e extensions/spark/spark_on_yarn_env.sh generate_config my_custom_env.sh
./bdutil -e my_custom_env.sh deploy

# Shorthand for logging in to the master
./bdutil -e my_custom_env.sh shell

# Handy way to run a socks proxy to make it easy to access the web UIs
./bdutil -e my_custom_env.sh socksproxy

# When done, delete your cluster
./bdutil -e my_custom_env.sh delete

With spark_on_yarn_env.sh, Spark should default to yarn-client, but you can always re-specify --master yarn-client if you want. You can run ./bdutil --help to see a more detailed description of the flags available in bdutil. Here are the help entries for the flags I included above:

-b, --bucket
Google Cloud Storage bucket used in deployment and by the cluster.

-d, --use_attached_pds
If true, uses additional non-boot volumes, optionally creating them on
deploy if they don't exist already and deleting them on cluster delete.

-e, --env_var_files
Comma-separated list of bash files that are sourced to configure the cluster
and installed software. Files are sourced in order with later files being
sourced last. bdutil_env.sh is always sourced first. Flag arguments are
set after all sourced files, but before the evaluate_late_variable_bindings
method of bdutil_env.sh. see bdutil_env.sh for more information.

-P, --prefix
Common prefix for cluster nodes.

-p, --project
The Google Cloud Platform project to use to create the cluster.

-z, --zone
Specify the Google Compute Engine zone to use.

About hadoop - Running a JAR in Hadoop on Google Cloud using the Yarn client: we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30888753/
