
java - Remote jobs on Apache Spark (Java)


I have set up a new standalone Apache Spark server on a freshly installed Ubuntu server. I am trying to send my first job to it, but it is not going too well.

Here is what I do locally:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Run locally, using all available cores
SparkConf conf = new SparkConf().setAppName("myFirstJob").setMaster("local[*]");
JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
javaSparkContext.setLogLevel("WARN");
SQLContext sqlContext = new SQLContext(javaSparkContext);

System.out.println("Hello, Remote Spark v." + javaSparkContext.version());

// Load the JSON dataset and count the schools per district
DataFrame df;
df = sqlContext.read().option("dateFormat", "yyyy-mm-dd")
        .json("./src/main/resources/north-carolina-school-performance-data.json");
df = df.withColumn("district", df.col("fields.district"));
df = df.groupBy("district").count().orderBy(df.col("district"));
df.show(150);

It works: it shows the names of the North Carolina school districts, along with the number of schools in each district:

Hello, Remote Spark v.1.6.1
+--------------------+-----+
| district|count|
+--------------------+-----+
|Alamance-Burlingt...| 34|
|Alexander County ...| 10|
|Alleghany County ...| 4|
|Anson County Schools| 10|
| Ashe County Schools| 5|
|Asheboro City Sch...| 8|
...

Now, if I change the first line to:

SparkConf conf = new SparkConf().setAppName("myFirstJob").setMaster("spark://10.0.100.120:7077");

it does not go so well:

Hello, Remote Spark v.1.6.1
16/07/12 10:58:34 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/07/12 10:58:49 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

The first strange thing (to me) is that the server has Spark 1.6.2. I was kind of expecting to see 1.6.2 as the version number.

Then, when I go to the web UI, I see:

[Screenshot: Spark UI]

If I click on app-20160712105816-0011, I get:

[Screenshot: Detail of app]

Clicking on any other link brings me to my local Apache Spark instance.

After clicking around, I can see something like this:

[Screenshot: App on the client]

If I look at the logs on the server, I see:

16/07/12 10:37:00 INFO Master: Registered app myFirstJob with ID app-20160712103700-0009
16/07/12 10:37:03 INFO Master: Received unregister request from application app-20160712103700-0009
16/07/12 10:37:03 INFO Master: Removing app app-20160712103700-0009
16/07/12 10:37:03 INFO Master: 10.0.100.100:54396 got disassociated, removing it.
16/07/12 10:37:03 INFO Master: 10.0.100.100:54392 got disassociated, removing it.
16/07/12 10:50:44 INFO Master: Registering app myFirstJob
16/07/12 10:50:44 INFO Master: Registered app myFirstJob with ID app-20160712105044-0010
16/07/12 10:51:20 INFO Master: Received unregister request from application app-20160712105044-0010
16/07/12 10:51:20 INFO Master: Removing app app-20160712105044-0010
16/07/12 10:51:20 INFO Master: 10.0.100.100:54682 got disassociated, removing it.
16/07/12 10:51:20 INFO Master: 10.0.100.100:54680 got disassociated, removing it.
16/07/12 10:58:16 INFO Master: Registering app myFirstJob
16/07/12 10:58:16 INFO Master: Registered app myFirstJob with ID app-20160712105816-0011

All of this seems fine to me...

I have an earlier (unsolved) question, Apache Spark Server installation requires Hadoop? Not automatically installed?, with the same environment, but this is a completely different, and much smaller, application.

Any clue?

Best Answer

Based on your Web UI screenshot, your server has no workers (slaves) running. Spark ships with several scripts for starting a cluster:

  • sbin/start-all.sh: starts a master together with the workers listed in conf/slaves
  • sbin/start-slaves.sh: starts only the workers listed in conf/slaves
  • sbin/start-master.sh: starts a master on the current machine
  • sbin/start-slave.sh: starts a worker on the current machine

If your cluster is configured correctly, you only need to call start-all.sh on the master machine to start everything (see the sketch below).
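As a minimal sketch, run from the Spark installation directory on the master, the sequence could look like this (the worker hostname below is only a placeholder; the master URL is the one from the question):

# conf/slaves: one worker hostname per line (placeholder hostname)
echo "worker-node-1" >> conf/slaves

# start the master plus every worker listed in conf/slaves
sbin/start-all.sh

# or register a single worker by hand against a running master
sbin/start-slave.sh spark://10.0.100.120:7077

Once workers are up, they should appear under "Workers" in the master web UI (port 8080 by default), and the job should no longer sit on the "Initial job has not accepted any resources" warning.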

Regarding java - Remote jobs on Apache Spark (Java), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38332869/
