docker - Spark Docker Java gateway process exited before sending its port number


I am fairly new to Docker and am trying to get a docker-compose file running with both Airflow and PySpark. Here is what I have so far:

version: '3.7'
services:
  master:
    image: gettyimages/spark
    command: bin/spark-class org.apache.spark.deploy.master.Master -h master
    hostname: master
    environment:
      MASTER: spark://master:7077
      SPARK_CONF_DIR: /conf
      SPARK_PUBLIC_DNS: localhost
    expose:
      - 7001
      - 7002
      - 7003
      - 7004
      - 7005
      - 7077
      - 6066
    ports:
      - 4040:4040
      - 6066:6066
      - 7077:7077
      - 8080:8080
    volumes:
      - ./conf/master:/conf
      - ./data:/tmp/data

  worker:
    image: gettyimages/spark
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    hostname: worker
    environment:
      SPARK_CONF_DIR: /conf
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 1g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8081
      SPARK_PUBLIC_DNS: localhost
    links:
      - master
    expose:
      - 7012
      - 7013
      - 7014
      - 7015
      - 8881
    ports:
      - 8081:8081
    volumes:
      - ./conf/worker:/conf
      - ./data:/tmp/data

  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    logging:
      options:
        max-size: 10m
        max-file: "3"

  webserver:
    image: puckel/docker-airflow:1.10.9
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=y
      - EXECUTOR=Local
    logging:
      options:
        max-size: 10m
        max-file: "3"
    volumes:
      - ./dags:/usr/local/airflow/dags
      # Add this to have third party packages
      - ./requirements.txt:/requirements.txt
      # - ./plugins:/usr/local/airflow/plugins
    ports:
      - "8082:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
I tried to run the following simple DAG just to confirm that PySpark is working:
import pyspark
from airflow.models import DAG
from airflow.utils.dates import days_ago, timedelta
from airflow.operators.python_operator import PythonOperator
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

import random

args = {
    "owner": "ian",
    "start_date": days_ago(1)
}

dag = DAG(dag_id="pysparkTest", default_args=args, schedule_interval=None)


def run_this_func(**context):
    sc = pyspark.SparkContext()
    print(sc)


with dag:
    run_this_task = PythonOperator(
        task_id='run_this',
        python_callable=run_this_func,
        provide_context=True,
        retries=10,
        retry_delay=timedelta(seconds=1)
    )

When I run this, it fails with the error Java gateway process exited before sending its port number. I found a few posts saying to run the command export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell", which I tried to run as part of the command:
version: '3.7'
services:
  master:
    image: gettyimages/spark
    command: >
      sh -c "bin/spark-class org.apache.spark.deploy.master.Master -h master
      && export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell""
    hostname: master
    ...
But I still get the same error. Any ideas what I am doing wrong?

Best Answer

I don't think you need to modify the master's command; keep it as it is done here.
Also, how do you expect Python code running in a different container to connect to the master container? I think you should pass the master URL to the Spark context, for example:

def run_this_func(**context):
    sc = pyspark.SparkContext("spark://master:7077")
    print(sc)
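
For a slightly fuller variant, here is a minimal sketch of the same idea using SparkConf, assuming the Airflow container can resolve the compose service name master and has pyspark available (e.g. installed through the mounted requirements.txt):

import pyspark

def run_this_func(**context):
    # Point the driver at the standalone master defined in the compose file.
    # "spark://master:7077" assumes the Airflow and Spark services share a
    # Docker network on which the service name "master" resolves.
    conf = (
        pyspark.SparkConf()
        .setMaster("spark://master:7077")
        .setAppName("pysparkTest")
    )
    sc = pyspark.SparkContext(conf=conf)
    print(sc)
    sc.stop()

Functionally this is equivalent to passing the master URL positionally; it just keeps the app name and any other Spark settings in one place.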

Regarding docker - Spark Docker Java gateway process exited before sending its port number, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64502364/
