amazon-ec2 - Ray does not start workers on EC2

Reposted. Author: 行者123. Updated: 2023-12-05 06:27:02

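I am using the Ray module to launch an Ubuntu (16.04) cluster on AWS EC2. In the configuration I set min_workers, max_workers, and initial_workers to 2, because I don't need any autoscaling. I also want a t2.micro head node and c4.8xlarge workers. The cluster launches, but only the head node comes up (the terminal output below starts at the Ray installation, with some details elided):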

2019-04-18 14:52:48,462 INFO updater.py:268 -- NodeUpdater: Running pip3 install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.7.0.dev2-cp35-cp35m-manylinux1_x86_64.whl on 54.226.178.23...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
Collecting ray==0.7.0.dev2 from https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.7.0.dev2-cp35-cp35m-manylinux1_x86_64.whl
Downloading https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.7.0.dev2-cp35-cp35m-manylinux1_x86_64.whl (56.2MB)
.....
.....
Successfully built pyyaml
Installing collected packages: click, colorama, six, redis, typing, filelock, flatbuffers, numpy, pyyaml, more-itertools, setuptools, attrs, atomicwrites, pluggy, py, pathlib2, pytest, funcsigs, ray
Successfully installed atomicwrites attrs click colorama filelock flatbuffers funcsigs more-itertools numpy pathlib2 pluggy py pytest pyyaml-3.11 ray redis setuptools-20.7.0 six-1.10.0 typing
You are using pip version 8.1.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
2019-04-18 14:53:32,656 INFO updater.py:268 -- NodeUpdater: Running pip3 install boto3==1.4.8 on 54.226.178.23...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
Collecting boto3==1.4.8
Downloading https://files.pythonhosted.org/packages/7d/09/66fef826fb13a2cee74a1df56c269d2794a90ece49c3b77113b733e4b91d/boto3-1.4.8-
....
....
Installing collected packages: docutils, jmespath, six, python-dateutil, botocore, s3transfer, boto3
Successfully installed boto3-1.4.8 botocore-1.8.50 docutils-0.14 jmespath-0.9.4 python-dateutil-2.8.0 s3transfer-0.1.13 six-1.12.0
You are using pip version 8.1.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
2019-04-18 14:53:37,805 INFO updater.py:268 -- NodeUpdater: Running ray stop on 54.226.178.23...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
WARNING: Not monitoring node memory since `psutil` is not installed. Install this with `pip install psutil` (or ray[debug]) to enable debugging of memory-related crashes.
2019-04-18 14:53:39,775 INFO updater.py:268 -- NodeUpdater: Running ulimit -n 65536; ray start --head --redis-port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml on 54.226.178.23...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2019-04-18 18:53:40,167 INFO scripts.py:288 -- Using IP address 172.31.7.117 for this node.
2019-04-18 18:53:40,167 INFO node.py:469 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-04-18_18-53-40_7981/logs.
2019-04-18 18:53:40,271 INFO services.py:407 -- Waiting for redis server at 127.0.0.1:6379 to respond...
2019-04-18 18:53:40,389 INFO services.py:407 -- Waiting for redis server at 127.0.0.1:60491 to respond...
2019-04-18 18:53:40,390 INFO services.py:804 -- Starting Redis shard with 0.21 GB max memory.
2019-04-18 18:53:40,400 INFO node.py:483 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-04-18_18-53-40_7981/logs.
2019-04-18 18:53:40,410 INFO services.py:1439 -- Starting the Plasma object store with 0.31 GB memory using /dev/shm.
2019-04-18 18:53:40,421 WARNING services.py:907 -- Failed to start the reporter. The reporter requires 'pip install psutil'.
WARNING: Not monitoring node memory since `psutil` is not installed. Install this with `pip install psutil` (or ray[debug]) to enable debugging of memory-related crashes.
2019-04-18 18:53:40,425 INFO scripts.py:319 --
Started Ray on this node. You can add additional nodes to the cluster by calling

ray start --redis-address 172.31.7.117:6379

from the node you wish to add. You can connect a driver to the cluster from Python by running

import ray
ray.init(redis_address="172.31.7.117:6379")

If you have trouble connecting from a different machine, check that your firewall is configured properly. If you wish to terminate the processes that have been started, run

ray stop
2019-04-18 14:53:40,593 INFO log_timer.py:21 -- NodeUpdater: i-064f62badf69f8cee: Setup commands completed [LogTimer=115941ms]
2019-04-18 14:53:40,593 INFO log_timer.py:21 -- NodeUpdater: i-064f62badf69f8cee: Applied config 248f16e493ac5bcd753a673eb7202fa2b49e0f9f [LogTimer=173814ms]
2019-04-18 14:53:40,973 INFO log_timer.py:21 -- AWSNodeProvider: Set tag ray-node-status=up-to-date on ['i-064f62badf69f8cee'] [LogTimer=374ms]
2019-04-18 14:53:41,069 INFO commands.py:264 -- get_or_create_head_node: Head node up-to-date, IP address is: 54.226.178.23
To monitor auto-scaling activity, you can run:

ray exec ray_config.yaml 'tail -n 100 -f /tmp/ray/session_*/logs/monitor*'

To open a console on the cluster:

ray attach ray_config.yaml

To ssh manually to the cluster, run:

ssh -i /home/haines/.ssh/ray-autoscaler_us-east-1.pem ubuntu@54.226.178.23

2019-04-18 14:53:41,181 INFO log_timer.py:21 -- AWSNodeProvider: Set tag ray-runtime-config=248f16e493ac5bcd753a673eb7202fa2b49e0f9f on ['i-064f62badf69f8cee']
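The output above tags only one instance (the head node) as up-to-date. As a rough way to check which instances report a status, here is a minimal Python sketch that scans captured log text for `Set tag ray-node-status=... on [...]` lines. The function name `nodes_by_status` is hypothetical, and it is an assumption that other autoscaler logs (for example the monitor log on the head node) use the same line format as the output shown here.

```python
import re

# Matches lines such as:
#   AWSNodeProvider: Set tag ray-node-status=up-to-date on ['i-064f62badf69f8cee']
STATUS_RE = re.compile(
    r"Set tag ray-node-status=(?P<status>[\w-]+) on \[(?P<ids>[^\]]*)\]"
)
INSTANCE_RE = re.compile(r"i-[0-9a-f]+")


def nodes_by_status(log_text):
    """Return {status: set of EC2 instance IDs} found in the given log text."""
    result = {}
    for match in STATUS_RE.finditer(log_text):
        ids = INSTANCE_RE.findall(match.group("ids"))
        result.setdefault(match.group("status"), set()).update(ids)
    return result
```

If only one instance ID ever appears, the workers were probably never launched or never finished setup, and the monitor log mentioned in the output is the next place to look.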

I used the standard configuration (example-full.yaml) with the following changes:

min_workers: 2

initial_workers: 2

type: aws
region: us-east-1
availability_zone: us-east1a,us-east-1b


head_node:
    InstanceType: t2.micro
    ImageId: ami-0565af6e282977273 # ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20190212

worker_nodes:
    InstanceType: c4.8xlarge
    ImageId: ami-0f9cf087c1f27d9b1 # ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20181114
    #MarketType: spot

setup_commands:
    - echo 'export PATH="$HOME/anaconda3/envs/tensorflow_p36/bin:$PATH"' >> ~/.bashrc
    - sudo apt-get update
    - sudo apt-get install python3-pip
    - pip3 install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.7.0.dev2-cp35-cp35m-manylinux1_x86_64.whl
    - pip3 install boto3==1.4.8 # 1.4.8 adds InstanceMarketOptions
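Since the goal is a fixed-size cluster, it can help to confirm that all three worker counts agree before launching. The following is a minimal sketch, assuming the YAML has already been loaded into a plain Python dict. The function name `check_fixed_size` is hypothetical and not part of Ray. Note that the fragment above does not show `max_workers`, so the check reports it if the key is missing from the dict.

```python
def check_fixed_size(config, expected_workers):
    """Return a list of problems that would prevent a fixed-size cluster."""
    problems = []
    for key in ("min_workers", "initial_workers", "max_workers"):
        value = config.get(key)
        if value is None:
            problems.append(f"{key} is not set")
        elif value != expected_workers:
            problems.append(f"{key} is {value}, expected {expected_workers}")
    return problems


if __name__ == "__main__":
    # Only the keys shown in the fragment above; max_workers is absent here.
    fragment = {"min_workers": 2, "initial_workers": 2}
    for problem in check_fixed_size(fragment, expected_workers=2):
        print(problem)
```

An empty list means the three settings agree. It does not by itself show that the workers will actually start.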

Most recent failed setup:

setup_commands:
    - sudo apt-get update
    - wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh || true 1>/dev/null
    - bash Anaconda3-5.0.1-Linux-x86_64.sh -b -p $HOME/anaconda3 || true 1>/dev/null
    - echo 'export PATH="$HOME/anaconda3/bin:$PATH"' >> ~/.bashrc
    - sudo pkill -9 apt-get || true
    - sudo pkill -9 dpkg || true
    - sudo dpkg --configure -a
    - sudo apt-get install python3-pip || true
    - pip3 install --upgrade pip
    - pip3 install --user psutil
    - pip3 install --user proctitle
    - pip3 install --user ray
    - pip3 install --user boto3==1.4.8
    - pip3 install --user https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.7.0.dev2-cp35-cp35m-manylinux1_x86_64.whl

Best answer

I ran a slightly modified version of the configuration you posted, and it worked for me:

cluster_name: test

min_workers: 2

initial_workers: 2

provider:
    type: aws
    region: us-east-1
    availability_zone: us-east1a,us-east-1b

head_node:
    InstanceType: t2.micro
    ImageId: ami-0565af6e282977273 # ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20190212

worker_nodes:
    InstanceType: c4.8xlarge
    ImageId: ami-0f9cf087c1f27d9b1 # ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20181114
    #MarketType: spot

setup_commands:
    - sudo apt-get update
    # Install Anaconda.
    - wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh || true
    - bash Anaconda3-5.0.1-Linux-x86_64.sh -b -p $HOME/anaconda3 || true
    - echo 'export PATH="$HOME/anaconda3/bin:$PATH"' >> ~/.bashrc
    # Install Ray.
    - pip install ray
    - pip install boto3==1.4.8 # 1.4.8 adds InstanceMarketOptions

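I think the only real difference is that this version installs Anaconda Python and puts it on the PATH, so that pip is found correctly. I suspect the problem is related to not finding the right version of Python.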
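To test that suspicion on a node, one option is to check which executables the PATH actually resolves to and whether the current interpreter can import Ray. The following is a minimal sketch using only the Python standard library. The function name `describe_python_environment` and the list of command names are illustrative choices, not something from the original post.

```python
import importlib.util
import shutil
import sys


def describe_python_environment(commands=("python", "python3", "pip", "pip3", "ray")):
    """Report which executables the current PATH resolves to."""
    # shutil.which returns None when the command is not found on PATH.
    report = {name: shutil.which(name) for name in commands}
    report["sys.executable"] = sys.executable
    # find_spec returns None when this interpreter cannot locate the module.
    report["ray importable"] = importlib.util.find_spec("ray") is not None
    return report


if __name__ == "__main__":
    for key, value in describe_python_environment().items():
        print(f"{key}: {value}")
```

If `pip` resolves to a different installation than the `python` that runs `ray start`, then packages installed in `setup_commands` may land somewhere the Ray process cannot see them. That would be consistent with the explanation above, though it does not prove it.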

Regarding "amazon-ec2 - Ray does not start workers on EC2", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55660635/
