hadoop - MRv2/YARN features

Reposted. Author: 可可西里. Updated: 2023-11-01 16:22:36

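I'm trying to get my head around how the new API is actually used, and while reading around the internet I found different answers to the same questions I'm working on.

The questions I would like answered are:

1) Which of the MRv2/YARN daemons is the one responsible for launching application containers and monitoring application resource usage.

2) Which two issues MRv2/YARN is designed to address?

I'll try to make this thread educational and constructive for other readers by citing my sources and the actual material from my search, so I hope it doesn't look like I'm giving too much information when I could just ask the questions and keep the post short.

For the first question, reading the documentation, I found 3 main sources to rely on:

From the Hadoop documentation: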

ApplicationMaster<-->NodeManager: Launch containers. Communicate with NodeManagers by using NMClientAsync objects, handling container events by NMClientAsync.CallbackHandler

The ApplicationMaster communicates with YARN cluster, and handles application execution. It performs operations in an asynchronous fashion. During application launch time, the main tasks of the ApplicationMaster are:

a) communicating with the ResourceManager to negotiate and allocate resources for future containers, and

b) after container allocation, communicating YARN NodeManagers (NMs) to launch application containers on them.
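To make the two-step flow in the quote concrete, here is a minimal toy simulation in plain Python. It does not use the Hadoop client API: `ToyResourceManager`, `ToyNodeManager`, `ToyApplicationMaster`, and the first-fit allocation rule are all hypothetical names and simplifications chosen for illustration. It only shows who talks to whom: the ApplicationMaster (a) asks the ResourceManager for containers, then (b) asks the NodeManager on each container's node to launch it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """A granted slice of one node's memory (hypothetical, simplified)."""
    container_id: int
    node: str
    memory_mb: int


class ToyResourceManager:
    """Stands in for the cluster-wide ResourceManager: it only hands out capacity."""

    def __init__(self, free_mb):
        self.free_mb = dict(free_mb)  # node name -> free memory in MB
        self._next_id = 1

    def allocate(self, memory_mb, how_many):
        """Grant up to `how_many` containers, first-fit over nodes in name order."""
        granted = []
        for node in sorted(self.free_mb):
            while self.free_mb[node] >= memory_mb and len(granted) < how_many:
                self.free_mb[node] -= memory_mb
                granted.append(Container(self._next_id, node, memory_mb))
                self._next_id += 1
        return granted


class ToyNodeManager:
    """Stands in for a per-node NodeManager: it starts containers on its own node."""

    def __init__(self, name):
        self.name = name
        self.running = {}  # container id -> command

    def start_container(self, container, command):
        if container.node != self.name:
            raise ValueError("container was allocated on a different node")
        self.running[container.container_id] = command


class ToyApplicationMaster:
    """Stands in for a per-application ApplicationMaster."""

    def __init__(self, rm, node_managers):
        self.rm = rm
        self.node_managers = node_managers  # node name -> ToyNodeManager

    def run(self, tasks, memory_mb):
        # (a) negotiate resources with the ResourceManager
        containers = self.rm.allocate(memory_mb, len(tasks))
        # (b) ask the NodeManager on each container's node to launch it
        for container, task in zip(containers, tasks):
            self.node_managers[container.node].start_container(container, task)
        return containers


nms = {name: ToyNodeManager(name) for name in ("node1", "node2")}
rm = ToyResourceManager({"node1": 2048, "node2": 1024})
am = ToyApplicationMaster(rm, nms)
launched = am.run(["map-0", "map-1", "reduce-0"], memory_mb=1024)

print(nms["node1"].running)  # {1: 'map-0', 2: 'map-1'}
print(nms["node2"].running)  # {3: 'reduce-0'}
```

In this sketch the ApplicationMaster never starts a process itself; it only decides what to run and where, which is the division of labour the quoted passage describes.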

From the Hortonworks documentation:

The ApplicationMaster is, in effect, an instance of a framework-specific library and is responsible for negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the containers and their resource consumption. It has the responsibility of negotiating appropriate resource containers from the ResourceManager, tracking their status and monitoring progress.

From the Cloudera documentation:

MRv2 daemons -

ResourceManager – one per cluster – Starts ApplicationMasters, allocates resources on slave nodes

ApplicationMaster – one per job – Requests resources, manages individual Map and Reduce tasks

NodeManager – one per slave node – Manages resources on individual slave nodes

JobHistory – one per cluster – Archives jobs’ metrics and metadata

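Coming back to the question (which daemon is responsible for launching application containers and monitoring application resource usage), I ask myself:

Is it the NodeManager? Is it the ApplicationMaster?

As I understand it, the ApplicationMaster is the one that makes the NodeManager actually do the work, so this is like asking who is responsible for lifting a box off the ground: the hands that did the actual lifting, or the mind that controls the body and makes it do the lifting...

I guess it's a tricky question, but there must be only one answer.

For the second question, reading online, I found different answers in many sources, hence the confusion, but my main sources are:

From the Cloudera documentation: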

MapReduce v2 (“MRv2”) – Built on top of YARN (Yet Another Resource Negotiator)

– Uses ResourceManager/NodeManager architecture

– Increases scalability of cluster

– Node resources can be used for any type of task

– Improves cluster utilization

– Support for non-MR jobs

Coming back to the question (which two issues MRv2/YARN is designed to address?), I know that MRv2 made several changes, such as relieving the resource pressure on the JobTracker (in MRv1 the maximum number of nodes in a cluster could be around 4,000, and in MRv2 it is more than twice that number), and I also know it provides the ability to run frameworks other than MapReduce, such as MPI.

From the documentation:

The Application Master provides much of the functionality of the traditional ResourceManager so that the entire system can scale more dramatically. In tests, we’ve already successfully simulated 10,000 node clusters composed of modern hardware without significant issue.

And:

Moving all application framework specific code into the ApplicationMaster generalizes the system so that we can now support multiple frameworks such as MapReduce, MPI and Graph Processing.
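The generalization described in that quote can be sketched as a toy in plain Python. This is not Hadoop code: `ToyApplicationMaster`, `ToyMapReduceMaster`, `ToyMpiMaster`, and `schedule` are hypothetical names, and the submission-order slot policy is an assumption made only to keep the example short. The point is that the scheduling function never looks at which framework it is serving; everything framework-specific lives behind `container_commands()`.

```python
from abc import ABC, abstractmethod


class ToyApplicationMaster(ABC):
    """Framework-specific part: decides what should run in each container."""

    @abstractmethod
    def container_commands(self):
        """Return one command string per container the application needs."""


class ToyMapReduceMaster(ToyApplicationMaster):
    def __init__(self, num_maps, num_reduces):
        self.num_maps = num_maps
        self.num_reduces = num_reduces

    def container_commands(self):
        maps = [f"map-{i}" for i in range(self.num_maps)]
        reduces = [f"reduce-{i}" for i in range(self.num_reduces)]
        return maps + reduces


class ToyMpiMaster(ToyApplicationMaster):
    def __init__(self, num_ranks):
        self.num_ranks = num_ranks

    def container_commands(self):
        return [f"mpi-rank-{r}" for r in range(self.num_ranks)]


def schedule(masters, total_slots):
    """Framework-agnostic part: hand out slots in submission order.

    Returns (slot index, command) pairs; commands that do not fit are left out.
    """
    assignments = []
    for master in masters:
        for command in master.container_commands():
            if len(assignments) == total_slots:
                return assignments
            assignments.append((len(assignments), command))
    return assignments


plan = schedule([ToyMapReduceMaster(2, 1), ToyMpiMaster(2)], total_slots=4)
print(plan)
# [(0, 'map-0'), (1, 'map-1'), (2, 'reduce-0'), (3, 'mpi-rank-0')]
```

Supporting another framework in this sketch means adding another subclass; `schedule` stays unchanged.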

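But I also think it addresses the fact that the NameNode was a single point of failure, and that in the new version there is a standby NameNode through high-availability mode (I may be confusing old API vs new API features with MRv1 vs MRv2 features, and that may be the cause of my question):

Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine.

So if you would have to choose 2 of the 3, which ones would be the 2 that serve as the two issues MRv2/YARN is designed to address?

-Resource pressure on the JobTracker

-Ability to run frameworks other than MapReduce, such as MPI.

-Single point of failure in the NameNode.

Thanks in advance!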

Best answer

Which of the MRv2/YARN daemons is the one responsible for launching application containers and monitoring application resource usage.

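The ResourceManager (RM) is responsible for launching the ApplicationMaster (AM) once for a given job. Once the AM is up, the AM is responsible for negotiating, allocating, and monitoring the job's resources (containers).

I suggest you read "Anatomy of a MapReduce Job Run" in Chapter 6 of Hadoop: The Definitive Guide, which explains in depth how job resources are allocated in MR1 and MR2.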

Which two issues MRv2/YARN is designed to address?

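YARN tries to split the functionality of the MR1 JobTracker (which was the scaling bottleneck) into separate abstractions:

  • Cluster resource management - the ResourceManager
  • Application lifecycle management - an ApplicationMaster specific to each application/job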
So if you would have to choose 2 of the 3, which ones would be the 2 that serve as the two issues MRv2/YARN is designed to address?

-Resource pressure on the JobTracker

-Ability to run frameworks other than MapReduce, such as MPI.

-Single point of failure in the NameNode.

Of the options you list, I would choose 1 and 2.

Regarding hadoop - MRv2/YARN features, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/27913632/
