java - Why doesn't speculative execution make sense for Giraph?

Reposted by: 可可西里 | Updated: 2023-11-01 14:53:43

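Recently I've been running some benchmarks to understand Giraph's failover mechanism.

I'm actually curious about one thing: when one worker in a job slows down, the other workers wait for it. Later I found this in GiraphJob.java: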

// Speculative execution doesn't make sense for Giraph
giraphConfiguration.setBoolean("mapred.map.tasks.speculative.execution", false);

Does anyone know why speculative execution is not enabled in Giraph?

Thanks

Best Answer

First, let's recall what speculative execution is. Quoting Yahoo's Hadoop tutorial:

Speculative execution: One problem with the Hadoop system is that by dividing the tasks across many nodes, it is possible for a few slow nodes to rate-limit the rest of the program. For example if one node has a slow disk controller, then it may be reading its input at only 10% the speed of all the other nodes. So when 99 map tasks are already complete, the system is still waiting for the final map task to check in, which takes much longer than all the other nodes. By forcing tasks to run in isolation from one another, individual tasks do not know where their inputs come from. Tasks trust the Hadoop platform to just deliver the appropriate input. Therefore, the same input can be processed multiple times in parallel, to exploit differences in machine capabilities. As most of the tasks in a job are coming to a close, the Hadoop platform will schedule redundant copies of the remaining tasks across several nodes which do not have other work to perform. This process is known as speculative execution. When tasks complete, they announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the TaskTrackers to abandon the tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully, first. Speculative execution is enabled by default. You can disable speculative execution for the mappers and reducers by setting the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution JobConf options to false, respectively
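The mechanism in the quote (launch a redundant copy of a slow task, keep whichever copy finishes first, discard the rest) can be sketched with plain java.util.concurrent. This is a toy, not Hadoop's scheduler: the class and method names are hypothetical, the sleep durations stand in for a slow and an idle node, and ExecutorService.invokeAny plays the role of the JobTracker by returning a successful result and cancelling the copies that have not finished.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Toy model of Hadoop-style speculative execution (not Hadoop code).
 * Two copies of the same independent task run on different "nodes";
 * the first copy to finish wins and the other copy is cancelled.
 * SpeculativeDemo and runCopy are hypothetical names for illustration.
 */
class SpeculativeDemo {

    // One attempt of a map task over the same input split. The sleep stands in
    // for the node's speed (e.g. a slow disk controller means a long delay).
    static Callable<String> runCopy(String node, long delayMillis) {
        return () -> {
            Thread.sleep(delayMillis); // interrupted if this copy is cancelled
            return "output from " + node;
        };
    }

    static String runSpeculatively() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<String>> copies = List.of(
                    runCopy("slow-node", 2_000), // original attempt on a straggler
                    runCopy("idle-node", 50));   // speculative copy on an idle node
            // invokeAny returns the result of a task that completed successfully
            // and cancels the tasks that have not completed, much like keeping
            // the first finished copy and discarding the others.
            return pool.invokeAny(copies);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runSpeculatively()); // prints "output from idle-node"
    }
}
```

Throwing away the losing copy is harmless here only because each copy is independent and has no side effects visible to other tasks; that is exactly the property Giraph's message-exchanging workers lack.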

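If I understand Giraph correctly, it doesn't use speculative execution because it runs its own iterative computation paradigm, which speculative execution doesn't fit. That paradigm is inspired by Google's Pregel and offers a more vertex-centric view of graph data. A Giraph worker is therefore not an independent, repeatable unit of work: it holds its partition of the graph in memory, exchanges messages with the other workers, and synchronizes with all of them at the end of every superstep (which is why, as the question observes, the other workers wait for a slow one).

Fault tolerance, in turn, is provided through checkpointing. Computation proceeds in iterations (also called supersteps): in each superstep, every vertex processes all of its incoming messages, and the resulting messages are then distributed among the vertices. Checkpoints (when enabled) are written at superstep boundaries, so a failed job can be restarted from the most recent one rather than from the beginning.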
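The superstep model described above can be sketched in a single Java process. This is a toy of our own, not Giraph's API (the class name, the maxValue computation and the CHECKPOINT_EVERY setting are hypothetical): it runs the classic max-value propagation example used to introduce Pregel, delivers messages only at the barrier between supersteps, and keeps an in-memory snapshot every second superstep as a stand-in for a checkpoint.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Single-process toy of the Pregel-style superstep model (not Giraph's API).
 * Each superstep, every vertex reads all messages sent to it in the previous
 * superstep, updates its value, and sends messages along its out-edges; the
 * new messages only become visible after the superstep ends (the barrier).
 * SuperstepDemo, maxValue and CHECKPOINT_EVERY are hypothetical names.
 */
class SuperstepDemo {
    static final int CHECKPOINT_EVERY = 2; // snapshot after every 2nd superstep

    // Propagate the maximum vertex value through the graph.
    static Map<Integer, Integer> maxValue(Map<Integer, List<Integer>> edges,
                                          Map<Integer, Integer> initial,
                                          List<Map<Integer, Integer>> checkpoints) {
        Map<Integer, Integer> values = new HashMap<>(initial);
        Map<Integer, List<Integer>> inbox = new HashMap<>();
        for (int superstep = 0; ; superstep++) {
            Map<Integer, List<Integer>> outbox = new HashMap<>();
            for (Map.Entry<Integer, Integer> vertex : values.entrySet()) {
                int old = vertex.getValue();
                int updated = old;
                for (int msg : inbox.getOrDefault(vertex.getKey(), List.of())) {
                    updated = Math.max(updated, msg);
                }
                // Superstep 0: every vertex sends its value; afterwards only
                // vertices whose value changed send (the others stay halted).
                if (superstep == 0 || updated != old) {
                    vertex.setValue(updated);
                    for (int target : edges.getOrDefault(vertex.getKey(), List.of())) {
                        outbox.computeIfAbsent(target, k -> new ArrayList<>()).add(updated);
                    }
                }
            }
            inbox = outbox; // barrier: messages are delivered for the next superstep
            if ((superstep + 1) % CHECKPOINT_EVERY == 0) {
                // Snapshot vertex values (a real checkpoint would also persist
                // the pending messages so the job could resume from here).
                checkpoints.add(new HashMap<>(values));
            }
            if (inbox.isEmpty()) {
                return values; // no messages in flight: every vertex has halted
            }
        }
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> edges = Map.of(
                1, List.of(2), 2, List.of(1, 3), 3, List.of(2, 4), 4, List.of(3));
        Map<Integer, Integer> initial = Map.of(1, 3, 2, 6, 3, 2, 4, 1);
        List<Map<Integer, Integer>> checkpoints = new ArrayList<>();
        System.out.println(maxValue(edges, initial, checkpoints)); // {1=6, 2=6, 3=6, 4=6}
        System.out.println(checkpoints.size());                    // 2
    }
}
```

A superstep cannot end until every vertex has been processed, so in a distributed run the slowest worker sets the pace for all the others.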

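In short, Giraph does not use MapReduce the way it was originally intended (it essentially uses Hadoop map tasks as long-running hosts for its workers), so speculative execution makes no sense for it.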
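To make "it doesn't fit" concrete, here is a toy contrast of our own (not Hadoop or Giraph code; all names are hypothetical, and attempts run one after another rather than concurrently so the output is deterministic). It assumes, purely for illustration, that a duplicate worker attempt would send the same messages as the original: a map attempt's output stays private until committed, so a losing copy can be thrown away, whereas a superstep worker delivers messages to other workers while it runs, so a second copy double-counts them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy contrast (hypothetical, not Hadoop or Giraph code) between a
 * MapReduce-style task attempt and a superstep worker attempt when the
 * framework runs a duplicate copy of each.
 */
class DuplicateAttemptDemo {

    // MapReduce style: an attempt writes to its own private buffer. Nothing is
    // visible to anyone else until the framework commits one attempt's buffer.
    static List<Integer> mapAttempt(List<Integer> split) {
        List<Integer> privateOutput = new ArrayList<>();
        for (int x : split) {
            privateOutput.add(x * 2);
        }
        return privateOutput;
    }

    // Superstep style: while running, the worker pushes one message per out-edge
    // straight into a shared inbox (here: an in-degree counter per target vertex).
    static void workerAttempt(Map<Integer, List<Integer>> outEdges,
                              Map<Integer, Integer> sharedInbox) {
        for (List<Integer> targets : outEdges.values()) {
            for (int target : targets) {
                sharedInbox.merge(target, 1, Integer::sum);
            }
        }
    }

    // Runs `attempts` copies of the map task and commits only the first one.
    static List<Integer> runMapTask(List<Integer> split, int attempts) {
        List<Integer> committed = null;
        for (int a = 0; a < attempts; a++) {
            List<Integer> output = mapAttempt(split);
            if (committed == null) {
                committed = output; // later copies are simply discarded
            }
        }
        return committed;
    }

    // Runs `attempts` copies of the worker; each copy's messages are already
    // delivered by the time it finishes, so there is nothing left to discard.
    static Map<Integer, Integer> runWorker(Map<Integer, List<Integer>> outEdges, int attempts) {
        Map<Integer, Integer> sharedInbox = new TreeMap<>();
        for (int a = 0; a < attempts; a++) {
            workerAttempt(outEdges, sharedInbox);
        }
        return sharedInbox;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> edges = Map.of(1, List.of(2, 3), 2, List.of(3));
        System.out.println(runMapTask(List.of(1, 2, 3), 2)); // [2, 4, 6]
        System.out.println(runWorker(edges, 1));             // {2=1, 3=2}
        System.out.println(runWorker(edges, 2));             // {2=2, 3=4}
    }
}
```

With one attempt the in-degrees are correct; with a speculative duplicate they double, while the duplicated map task still yields a single clean output.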

Regarding "java - Why doesn't speculative execution make sense for Giraph?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26583340/
