
Hadoop Datanode, namenode, secondary-namenode, job-tracker and task-tracker

Reposted · Author: 可可西里 · Updated: 2023-11-01 14:11:15

I am new to Hadoop, so I have some questions. What happens to a Hadoop cluster if the master node fails? Can we recover that node without any loss? Is it possible to keep a secondary master node that automatically switches to become the master when the current master fails?

We have a backup of the namenode (the secondary namenode), so we can restore the namenode from it when the namenode fails. Similarly, how do we recover the data in a datanode when a datanode fails? The secondary namenode is only a backup of the namenode, not of the datanodes, right? If a node fails before a job completes, so that there are pending jobs in the job tracker, does the job continue or start over from the beginning on a free node?

How do we recover the data of the whole cluster if anything goes wrong?

One last question: can we use a C program in MapReduce (for example, bubble sort in MapReduce)?

Thanks in advance.

Best Answer

Although it is too late to answer your question now, it may help others.

First, let me introduce you to the Secondary NameNode:

It contains the namespace image and a backup of the edit log files for the past hour (configurable). Its job is to merge the latest NameNode namespace image with the edit log files and upload the result back to the NameNode as a replacement for the old one. Having a Secondary NameNode in a cluster is not mandatory.
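The "past one hour (configurable)" interval above corresponds to the checkpoint period. As a minimal sketch for Hadoop 1.x (assuming the default property name `fs.checkpoint.period` in core-site.xml; values are illustrative):

```xml
<!-- core-site.xml: how often, in seconds, the Secondary NameNode
     merges the edit log with the namespace image (default 3600). -->
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
</property>
```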

Now, to your questions:

  • What happens to a Hadoop cluster if the master node fails?

Supporting Frail's answer: yes, Hadoop has a single point of failure, so all of your currently running work, such as a MapReduce job or anything else that is using the failed master node, will stop. The whole cluster, including clients, will stop working.

  • Can we recover that node without any loss?

That is hypothetical. Recovery without loss is unlikely, because all the data (block reports) sent by the datanodes to the namenode since the last backup taken by the secondary namenode will be lost. I say "unlikely" because if the namenode fails just after a successful backup run by the secondary namenode, then it is in a safe state.

  • Is it possible to keep a standby master node that automatically switches to become the master when the current master fails?

It is straightforward for an administrator (a user) to do this manually. To switch automatically, you would have to write native code outside the cluster: code that monitors the cluster, configures the secondary namenode appropriately, and restarts the cluster with the new namenode address.

  • We have a backup of the namenode (the secondary namenode), so we can restore the namenode from it when the namenode fails. Similarly, how do we recover the data in a datanode when a datanode fails?

This is about the replication factor. We keep 3 replicas of each file block (the default, as a best practice; configurable), all on different datanodes. So in case of a failure, for the time being we still have 2 backup datanodes. Later, the namenode will create one more replica of the data that the failed datanode contained.
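The replication factor mentioned above is a per-cluster (and per-file) setting. A minimal sketch for hdfs-site.xml, assuming the standard `dfs.replication` property:

```xml
<!-- hdfs-site.xml: number of replicas kept for each HDFS block.
     3 is the default; raising it trades disk space for durability. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

For files that already exist, the replication factor can also be changed per path with `hadoop fs -setrep`.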

  • The secondary namenode is only a backup of the namenode, not of the datanodes, right?

Right. It contains only the metadata about the datanodes, such as each datanode's address and properties, including its block report.

  • If a node fails before a job completes, so that there are pending jobs in the job tracker, does the job continue or start over from the beginning on a free node?

HDFS will forcibly try to continue the job. But again, this depends on the replication factor, rack awareness, and other configuration made by the admin. If Hadoop's HDFS best practices are followed, the job will not fail: the JobTracker will get the address of a node holding a replica and continue there.
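Part of the "other configuration" referred to above is how many times a failed task is retried before the whole job is declared failed. As a sketch for Hadoop 1.x mapred-site.xml (assuming the classic `mapred.map.max.attempts` / `mapred.reduce.max.attempts` properties; 4 is the usual default):

```xml
<!-- mapred-site.xml: retries per task on other nodes before the
     JobTracker marks the whole job as failed. -->
<property>
  <name>mapred.map.max.attempts</name>
  <value>4</value>
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>4</value>
</property>
```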

  • How do we recover the data of the whole cluster if anything goes wrong?

By restarting it.

  • One last question: can we use a C program in MapReduce (for example, bubble sort in MapReduce)?

Yes. You can use any programming language that supports standard input/output read and write operations.

I just gave it a try. I hope it helps you and others.

*Suggestions/improvements are welcome.*

Regarding Hadoop Datanode, namenode, secondary-namenode, job-tracker and task-tracker, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7817391/
