
python - How does distributed TensorFlow work? (issue with tf.train.Server)

Reposted · Author: 行者123 · Updated: 2023-12-03 16:28:30

I'm having some trouble with TensorFlow's new feature that lets us run TensorFlow in a distributed fashion.

I just want to run two tf.constant ops on two tasks, but my code never terminates. It looks like this:

import tensorflow as tf

cluster = tf.train.ClusterSpec({"local": ["localhost:2222", "localhost:2223"]})
server = tf.train.Server(cluster,
                         job_name="local",
                         task_index=0)

with tf.Session(server.target) as sess:
    with tf.device("/job:local/replica:0/task:0"):
        const1 = tf.constant("Hello I am the first constant")
    with tf.device("/job:local/replica:0/task:1"):
        const2 = tf.constant("Hello I am the second constant")
    print(sess.run([const1, const2]))
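(For reference, the hang is expected here: the script only starts the server for task 0, so nothing is ever listening on localhost:2223, and the session blocks trying to place const2 on task 1. A minimal single-process sketch that starts both servers does terminate; it is written against `tf.compat.v1`, since `tf.train.Server` lives there in TensorFlow 2.x.)

```python
# Sketch: start BOTH task servers in one process (assumption: tf.compat.v1
# on TensorFlow 2.x; on TF 1.x, plain tf.train.Server works the same way).
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

cluster = tf.train.ClusterSpec({"local": ["localhost:2222", "localhost:2223"]})
# One Server object per task; each starts a gRPC service in this process.
server0 = tf.train.Server(cluster, job_name="local", task_index=0)
server1 = tf.train.Server(cluster, job_name="local", task_index=1)

with tf.Session(server0.target) as sess:
    with tf.device("/job:local/task:0"):
        const1 = tf.constant("Hello I am the first constant")
    with tf.device("/job:local/task:1"):
        const2 = tf.constant("Hello I am the second constant")
    out = sess.run([const1, const2])  # no longer blocks: both tasks are up
    print(out)
```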

And I have the following code that does work (but only runs on a single localhost:2222):
import tensorflow as tf

cluster = tf.train.ClusterSpec({"local": ["localhost:2222"]})
server = tf.train.Server(cluster,
                         job_name="local",
                         task_index=0)

with tf.Session(server.target) as sess:
    with tf.device("/job:local/replica:0/task:0"):
        const1 = tf.constant("Hello I am the first constant")
        const2 = tf.constant("Hello I am the second constant")
    print(sess.run([const1, const2]))

out : ['Hello I am the first constant', 'Hello I am the second constant']

Maybe I'm misunderstanding how these features work... so if you have an idea, please let me know.

Thanks ;)

EDIT

OK, I found out that I can't run it the way I did from an IPython notebook. I need a standalone Python program executed from the terminal.
But now I hit a new problem when running the code: the server tries to connect to both of the given ports, even though I told it to run on only one of them.
My new code looks like this:
import tensorflow as tf

tf.app.flags.DEFINE_string('job_name', '', 'One of local worker')
tf.app.flags.DEFINE_string('local', '', """Comma-separated list of hostname:port for the """)

tf.app.flags.DEFINE_integer('task_id', 0, 'Task ID of local/replica running the training')
tf.app.flags.DEFINE_integer('constant_id', 0, 'the constant we want to run')

FLAGS = tf.app.flags.FLAGS

local_host = FLAGS.local.split(',')

cluster = tf.train.ClusterSpec({"local": local_host})
server = tf.train.Server(cluster, job_name=FLAGS.job_name, task_index=FLAGS.task_id)

with tf.Session(server.target) as sess:
    if FLAGS.constant_id == 0:
        with tf.device('/job:local/task:' + str(FLAGS.task_id)):
            const1 = tf.constant("Hello I am the first constant")
        print(sess.run(const1))
    if FLAGS.constant_id == 1:
        with tf.device('/job:local/task:' + str(FLAGS.task_id)):
            const2 = tf.constant("Hello I am the second constant")
        print(sess.run(const2))

I run the following command line:
python test_distributed_tensorflow.py --local=localhost:3000,localhost:3001 --job_name=local --task_id=0 --constant_id=0

and I get the following log:
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970M, pci bus id: 0000:01:00.0)
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:206] Initialize HostPortsGrpcChannelCache for job local -> {localhost:3000, localhost:3001}
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:202] Started server with target: grpc://localhost:3000
E0518 15:27:11.794873779 10884 tcp_client_posix.c:173] failed to connect to 'ipv4:127.0.0.1:3001': socket error: connection refused
E0518 15:27:12.795184395 10884 tcp_client_posix.c:173] failed to connect to 'ipv4:127.0.0.1:3001': socket error: connection refused
...

编辑2

I found the solution: I just have to start every task that I declared to the server in the cluster spec. So I have to run this:
python test_distributed_tensorflow.py --local=localhost:2345,localhost:2346 --job_name=local --task_id=0 --constant_id=0 \
& \
python test_distributed_tensorflow.py --local=localhost:2345,localhost:2346 --job_name=local --task_id=1 --constant_id=1

I hope this helps someone ;)

Best Answer

Recent versions of TensorFlow provide distribution strategies (tf.distribute.Strategy) for working across multiple systems.
Distribution strategies are explained with examples; take a look at this link.
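As a rough sketch of what that newer API looks like (a minimal single-machine example; `MirroredStrategy` is the local multi-device strategy, while `tf.distribute.MultiWorkerMirroredStrategy` is the one that spans multiple systems):

```python
# Sketch: replicated computation with tf.distribute (TensorFlow 2.x).
# MirroredStrategy replicates across local devices; swap in
# tf.distribute.MultiWorkerMirroredStrategy for multiple machines.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print("replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    v = tf.Variable(1.0)  # mirrored onto every replica

@tf.function
def step():
    def replica_fn():
        return v + 1.0  # runs once on each replica
    return strategy.run(replica_fn)

result = step()
# On one device this is a plain tensor; with several replicas it is a PerReplica value.
print(strategy.experimental_local_results(result))
```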

Regarding python - How does distributed TensorFlow work? (issue with tf.train.Server), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37294201/
