
apache-spark - Connecting a Spark Master to a Spark Slave via Docker Compose


I am using the gettyimages image as my Spark master container, and alongside it I have a Spark image that starts a slave node. Here is the corresponding Dockerfile:

FROM debian:jessie

RUN apt-get update \
&& apt-get install -y locales \
&& dpkg-reconfigure -f noninteractive locales \
&& locale-gen C.UTF-8 \
&& /usr/sbin/update-locale LANG=C.UTF-8 \
&& echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

# Users with other locales should set this in their derivative image
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

RUN apt-get update \
&& apt-get install -y curl unzip \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*


# JAVA
ARG JAVA_MAJOR_VERSION=8
ARG JAVA_UPDATE_VERSION=92
ARG JAVA_BUILD_NUMBER=14
ENV JAVA_HOME /usr/jdk1.${JAVA_MAJOR_VERSION}.0_${JAVA_UPDATE_VERSION}

ENV PATH $PATH:$JAVA_HOME/bin
RUN curl -sL --retry 3 --insecure \
--header "Cookie: oraclelicense=accept-securebackup-cookie;" \
"http://download.oracle.com/otn-pub/java/jdk/${JAVA_MAJOR_VERSION}u${JAVA_UPDATE_VERSION}-b${JAVA_BUILD_NUMBER}/server-jre-${JAVA_MAJOR_VERSION}u${JAVA_UPDATE_VERSION}-linux-x64.tar.gz" \
| gunzip \
| tar x -C /usr/ \
&& ln -s $JAVA_HOME /usr/java \
&& rm -rf $JAVA_HOME/man

# HADOOP
ENV HADOOP_VERSION 2.7.2
ENV HADOOP_HOME /usr/hadoop-$HADOOP_VERSION
ENV HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
ENV PATH $PATH:$HADOOP_HOME/bin
RUN curl -sL --retry 3 \
"http://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz" \
| gunzip \
| tar -x -C /usr/ \
&& rm -rf $HADOOP_HOME/share/doc \
&& chown -R root:root $HADOOP_HOME

# SPARK
ENV SPARK_VERSION 2.0.1
ENV SPARK_PACKAGE spark-${SPARK_VERSION}-bin-without-hadoop
ENV SPARK_HOME /usr/spark-${SPARK_VERSION}
ENV SPARK_DIST_CLASSPATH="$HADOOP_HOME/etc/hadoop/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/hdfs/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/yarn/lib/*:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/tools/lib/*"
ENV PATH $PATH:${SPARK_HOME}/bin
RUN curl -sL --retry 3 \
"http://d3kbcqa49mib13.cloudfront.net/${SPARK_PACKAGE}.tgz" \
| gunzip \
| tar x -C /usr/ \
&& mv /usr/$SPARK_PACKAGE $SPARK_HOME \
&& chown -R root:root $SPARK_HOME

WORKDIR $SPARK_HOME
CMD ["bin/spark-class","org.apache.spark.deploy.worker.Worker", //TODO: Figure out what this should be]

What I would like to know is: if I set this up via docker-compose, how do I make the slave able to reach the master's host and port?

Best Answer

Assuming you have a docker-compose file something like this:

version: '2'
services:
  spark-master:
    image: spark-master
    ports:
      - "7077:7077"
      - "8080:8080"
  spark-slave1:
    image: spark-slave
    ports:
      - "8081:8081"
    depends_on:
      - spark-master

In your Spark slave's Dockerfile you need to tell the worker where the master is, e.g.: ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://IP:PORT
Hardcoding an IP address is not a good idea, though. Instead you can use the service hostname from docker-compose (spark-master is the hostname, and Compose makes it resolvable from the other containers on the same network):
CMD ["bin/spark-class","org.apache.spark.deploy.worker.Worker", "spark://spark-master:7077"]

Now you can open DOCKER_IP:8080 to see 1 worker listed under "Workers", and DOCKER_IP:8081 to see that worker's details.
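
For reference, a typical way to try this out, assuming the compose file above is saved as docker-compose.yml in the current directory:

# start the master and the worker in the background
docker-compose up -d
# master web UI (lists registered workers): http://DOCKER_IP:8080
# worker web UI:                             http://DOCKER_IP:8081
# tear everything down again
docker-compose down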

If you want more workers, add additional services to the docker-compose file. The following docker-compose creates 2 workers: one on port 8081 and a second on port 8082.
version: '2'
services:
  spark-master:
    image: spark-master
    ports:
      - "7077:7077"
      - "8080:8080"
  spark-slave1:
    image: spark-slave
    ports:
      - "8081:8081"
    depends_on:
      - spark-master
  spark-slave2:
    image: spark-slave
    ports:
      - "8082:8081"
    depends_on:
      - spark-master
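
Since the question uses the gettyimages image for the master, the same layout could also look roughly like the sketch below. This is an assumption-laden example rather than the accepted setup: the gettyimages/spark image name/tag and the explicit master command are guesses to be matched to your Spark version, and build: . assumes the slave Dockerfile from the question sits next to docker-compose.yml.

version: '2'
services:
  spark-master:
    image: gettyimages/spark        # assumed image/tag; pick one matching your Spark version
    hostname: spark-master
    command: bin/spark-class org.apache.spark.deploy.master.Master -h spark-master
    ports:
      - "7077:7077"
      - "8080:8080"
  spark-slave1:
    build: .                        # builds the slave Dockerfile shown in the question
    ports:
      - "8081:8081"
    depends_on:
      - spark-master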

Regarding apache-spark - Connecting a Spark Master to a Spark Slave via Docker Compose, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40688398/
