
r - Spark + rsparkling : Error while connecting to a cluster

Reposted · Author: 可可西里 · Updated: 2023-11-01 15:27:21

For some time I have been connecting to my company's Hadoop cluster with the sparklyr package, using this code:

library(sparklyr)

Sys.setenv(SPARK_HOME="/opt/spark/")
Sys.setenv(HADOOP_CONF_DIR="/etc/hadoop/conf.cloudera.yarn")
Sys.setenv(JAVA_HOME="/usr/lib/jvm/jre")

system('kinit -k -t user.keytab user@xyz')

sc <- spark_connect(master = "yarn",
  config = list(
    default = list(
      spark.submit.deployMode = "client",
      spark.yarn.keytab = "user.keytab",
      spark.yarn.principal = "user@xyz",
      spark.executor.instances = 20,
      spark.executor.memory = "4G",
      spark.executor.cores = 4,
      spark.driver.memory = "8G")))

Everything works fine, but when I try to add the rsparkling package with similar code:

library(h2o)
library(rsparkling)
library(sparklyr)

options(rsparkling.sparklingwater.version = '2.0')

Sys.setenv(SPARK_HOME="/opt/spark/")
Sys.setenv(HADOOP_CONF_DIR="/etc/hadoop/conf.cloudera.yarn")
Sys.setenv(JAVA_HOME="/usr/lib/jvm/jre")

system('kinit -k -t user.keytab user@xyz')

sc <- spark_connect(master = "yarn",
  config = list(
    default = list(
      spark.submit.deployMode = "client",
      spark.yarn.keytab = "user.keytab",
      spark.yarn.principal = "user@xyz",
      spark.executor.instances = 20,
      spark.executor.memory = "4G",
      spark.executor.cores = 4,
      spark.driver.memory = "8G")))

I get the error:

Error in force(code) :
Failed while connecting to sparklyr to port (8880) for sessionid (9819): Sparklyr gateway did not respond while retrieving ports information after 60 seconds Path: /opt/spark-2.0.0-bin-hadoop2.6/bin/spark-submit Parameters: --class, sparklyr.Backend, --packages, 'ai.h2o:sparkling-water-core_2.11:2.0','ai.h2o:sparkling-water-ml_2.11:2.0','ai.h2o:sparkling-water-repl_2.11:2.0', '/usr/lib64/R/library/sparklyr/java/sparklyr-2.0-2.11.jar', 8880, 9819

---- Output Log ----
Ivy Default Cache set to: /opt/users/user/.ivy2/cache The jars for the packages stored in: /opt/users/user/.ivy2/jars :: loading settings :: url = jar:file:/opt/spark-2.0.0-bin-hadoop2.6/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml ai.h2o#sparkling-water-core_2.11 added as a dependency ai.h2o#sparkling-water-ml_2.11 added as a dependency ai.h2o#sparkling-water-repl_2.11 added as a dependency :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0 confs: [default]

---- Error Log ----
In addition: Warning messages: 1: In if (nchar(config[[e]]) == 0) found <- FALSE : the condition has length > 1 and only the first element will be used 2: In if (nchar(config[[e]]) == 0) found <- FALSE : the condition has length > 1 and only the first element will be used

I am new to Spark clusters and not sure what to do now. Any help would be appreciated. My first thought was that the Sparkling Water jar files are missing on the cluster side, is that right?

Best Answer

You need to use an exact Sparkling Water version number:


options(rsparkling.sparklingwater.version = '2.0.5')

Or you can download a binary build of Sparkling Water directly from http://h2o.ai/download, unzip it, and replace the statement above with:


options(rsparkling.sparklingwater.location = "/tmp/sparkling-water-assembly_2.11-2.0.99999-SNAPSHOT-all.jar")
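Putting the fix together, a minimal sketch of the corrected session setup might look like the following. This is illustrative, not definitive: '2.0.5' stands in for whichever full three-part release matches your Spark 2.0.x build, and the paths and principal are the ones from the question.

```r
library(h2o)
library(rsparkling)
library(sparklyr)

# A bare '2.0' cannot be resolved as a Maven artifact version for
# ai.h2o:sparkling-water-*, which is why spark-submit stalled while
# fetching packages. Use a complete release number instead.
options(rsparkling.sparklingwater.version = '2.0.5')

Sys.setenv(SPARK_HOME = "/opt/spark/")
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf.cloudera.yarn")
Sys.setenv(JAVA_HOME = "/usr/lib/jvm/jre")

system('kinit -k -t user.keytab user@xyz')

sc <- spark_connect(master = "yarn",
  config = list(
    default = list(
      spark.submit.deployMode = "client",
      spark.yarn.keytab = "user.keytab",
      spark.yarn.principal = "user@xyz",
      spark.executor.instances = 20,
      spark.executor.memory = "4G",
      spark.executor.cores = 4,
      spark.driver.memory = "8G")))
```

Note that setting the option before `spark_connect()` matters: rsparkling reads it when sparklyr builds the `--packages` list for spark-submit.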

Regarding "r - Spark + rsparkling: Error while connecting to a cluster", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42227531/
