
Hadoop/Yarn (v0.23.3) pseudo-distributed mode setup :: no job nodes

Reposted; original author: 可可西里. Updated: 2023-11-01 16:19:04

I have just set up Hadoop/Yarn 2.x (specifically v0.23.3) in pseudo-distributed mode.

I followed the instructions from several blogs and websites, which more or less prescribe the same setup. I also consulted the 3rd edition of O'Reilly's Hadoop book (which, ironically, was the least helpful).

The problem:

After running "start-dfs.sh" and then "start-yarn.sh", while all of the daemons
do start (as indicated by jps(1)), the Resource Manager web portal
(Here: http://localhost:8088/cluster/nodes) indicates 0 (zero) job-nodes in the
cluster. So while the example/test Hadoop job I submit does indeed get
scheduled, it pends forever because, I assume, the configuration doesn't
see a node to run it on.

Below are the steps I performed, including resultant configuration files.
Hopefully the community can help me out... (And thank you in advance.)

Configuration:

The following environment variables are set in both my own and the hadoop UNIX account's profile (~/.profile):

export HADOOP_HOME=/home/myself/APPS.d/APACHE_HADOOP.d/latest
# Note: /home/myself/APPS.d/APACHE_HADOOP.d/latest -> hadoop-0.23.3

export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_INSTALL=${HADOOP_HOME}
export HADOOP_CLASSPATH=${HADOOP_HOME}/lib
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop/conf
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop/conf
export JAVA_HOME=/usr/lib/jvm/jre

hadoop$ java -version

java version "1.7.0_06-icedtea"
OpenJDK Runtime Environment (fedora-2.3.1.fc17.2-x86_64)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)

# Although the above shows OpenJDK, the same problem happens with Sun's JRE/JDK.

NAMENODE and DATANODE directories, as also specified in etc/hadoop/conf/hdfs-site.xml:

/home/myself/APPS.d/APACHE_HADOOP.d/latest/YARN_DATA.d/HDFS.d/DATANODE.d/
/home/myself/APPS.d/APACHE_HADOOP.d/latest/YARN_DATA.d/HDFS.d/NAMENODE.d/

Next, the various XML configuration files (again, this is YARN/MRv2/v0.23.3):

hadoop$ pwd; ls -l
/home/myself/APPS.d/APACHE_HADOOP.d/latest/etc/hadoop/conf
lrwxrwxrwx 1 hadoop hadoop 16 Sep 20 13:14 core-site.xml -> ../core-site.xml
lrwxrwxrwx 1 hadoop hadoop 16 Sep 20 13:14 hdfs-site.xml -> ../hdfs-site.xml
lrwxrwxrwx 1 hadoop hadoop 18 Sep 20 13:14 httpfs-site.xml -> ../httpfs-site.xml
lrwxrwxrwx 1 hadoop hadoop 18 Sep 20 13:14 mapred-site.xml -> ../mapred-site.xml
-rw-rw-r-- 1 hadoop hadoop 10 Sep 20 15:36 slaves
lrwxrwxrwx 1 hadoop hadoop 16 Sep 20 13:14 yarn-site.xml -> ../yarn-site.xml

core-site.xml

<?xml version="1.0"?>
<!-- core-site.xml -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost/</value>
</property>
</configuration>
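One note on this stanza: later 2.x releases deprecate `fs.default.name` in favor of `fs.defaultFS` (the old name still works as an alias). A sketch of the equivalent form under the newer property name, not from the original post:

```xml
<!-- Equivalent modern form; same value as the fs.default.name stanza above. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost/</value>
</property>
```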

mapred-site.xml

<?xml version="1.0"?>
<!-- mapred-site.xml -->
<configuration>

<!-- Same problem whether this (legacy) stanza is included or not. -->
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

hdfs-site.xml

<!-- hdfs-site.xml -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/myself/APPS.d/APACHE_HADOOP.d/YARN_DATA.d/HDFS.d/NAMENODE.d</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/myself/APPS.d/APACHE_HADOOP.d/YARN_DATA.d/HDFS.d/DATANODE.d</value>
</property>
</configuration>

yarn-site.xml

<?xml version="1.0"?>
<!-- yarn-site.xml -->
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/myself/APPS.d/APACHE_HADOOP.d/YARN_DATA.d/TEMP.d</value>
</property>
</configuration>
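One detail worth knowing about these addresses: NodeManagers register with the ResourceManager over the resource-tracker address (port 8031 by default), not the 8032 client address configured above, and if that registration fails the cluster page reports zero nodes. A sketch of making it explicit (this uses the default value; it is not part of the original post):

```xml
<property>
  <!-- Address NodeManagers use to register with the RM; 0.0.0.0:8031 by default. -->
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>localhost:8031</value>
</property>
```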

etc/hadoop/conf/slaves

localhost
# Community/friends, is this entry correct/needed for my pseudo-dist mode?

Miscellaneous wrap-up notes:

(1) As you may have gleaned from the above, all files/directories are owned
by the 'hadoop' UNIX user. There is a hadoop:hadoop UNIX user and group,
respectively.

(2) The following command was run after the NAMENODE & DATANODE directories
(listed above) were created (and whose paths were entered into
hdfs-site.xml):

hadoop$ hadoop namenode -format

(3) Next, I ran "start-dfs.sh", then "start-yarn.sh".
Here is jps(1) output:

hadoop@e6510$ jps
21979 DataNode
22253 ResourceManager
22384 NodeManager
22156 SecondaryNameNode
21829 NameNode
22742 Jps
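With every daemon up yet zero nodes on the cluster page, the thing to verify directly is the NodeManager's registration with the ResourceManager. A diagnostic sketch (requires the running daemons; the log path is an assumption based on the tarball layout above):

```shell
# Ask the ResourceManager which NodeManagers have registered (expect one).
yarn node -list

# If none are listed, check the NodeManager log for registration errors
# (log location assumed from the install layout used above):
tail -n 50 ${HADOOP_HOME}/logs/yarn-*-nodemanager-*.log
```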

Thank you!

Best Answer

After much effort on this problem without success (and believe me, I tried everything), I got hadoop working using a different approach. Whereas above I had downloaded a gzip/tar ball of the hadoop distribution (again, v0.23.3) from one of the download mirrors, this time I used the Cloudera CDH distribution of RPM packages, which I installed via their yum repo. In the hope that this helps someone, here are the detailed steps.

Step 1:

For Hadoop 0.20.x (MapReduce version 1):

  # rpm -Uvh http://archive.cloudera.com/redhat/6/x86_64/cdh/cdh3-repository-1.0-1.noarch.rpm
  # rpm --import http://archive.cloudera.com/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
  # yum install hadoop-0.20-conf-pseudo

-or-

For Hadoop 0.23.x (MapReduce version 2):

  # rpm -Uvh http://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.noarch.rpm
  # rpm --import http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
  # yum install hadoop-conf-pseudo

In either case above, installing that "pseudo" package (short for "pseudo-distributed Hadoop" mode) will, by itself, conveniently trigger the installation of all the other necessary packages you need (via dependency resolution).

Step 2:

Install Sun/Oracle's Java JRE (if you haven't already done so). You can install it via their RPM or the gzip/tar ball portable version; it doesn't matter which, as long as you set and export "JAVA_HOME" appropriately and make sure ${JAVA_HOME}/bin/java is in your path.

  # echo $JAVA_HOME; which java
  /home/myself/APPS.d/JAVA-JRE.d/jdk1.7.0_07
  /home/myself/APPS.d/JAVA-JRE.d/jdk1.7.0_07/bin/java

Note: I actually create a symbolic link named "latest" and point/re-point it at the JAVA version-specific directory whenever I update Java. I was explicit above for the reader's understanding.
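The "latest" symlink idea can be sketched as follows (illustrative paths under /tmp and a hypothetical second JDK version, purely to show the re-pointing):

```shell
# Two version-specific JDK directories (hypothetical names, for illustration only).
mkdir -p /tmp/JAVA-JRE.d/jdk1.7.0_06 /tmp/JAVA-JRE.d/jdk1.7.0_07

# Point "latest" at the current version; -n makes ln re-point an existing link.
ln -sfn /tmp/JAVA-JRE.d/jdk1.7.0_06 /tmp/JAVA-JRE.d/latest
readlink /tmp/JAVA-JRE.d/latest    # -> /tmp/JAVA-JRE.d/jdk1.7.0_06

# After an upgrade, re-point the link; JAVA_HOME=/tmp/JAVA-JRE.d/latest is unchanged.
ln -sfn /tmp/JAVA-JRE.d/jdk1.7.0_07 /tmp/JAVA-JRE.d/latest
readlink /tmp/JAVA-JRE.d/latest    # -> /tmp/JAVA-JRE.d/jdk1.7.0_07
```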

Step 3: Format hdfs as the "hdfs" UNIX user (created during the "yum install" above).

  # sudo su hdfs -c "hadoop namenode -format"

Step 4:

Manually start the hadoop daemons:

  for file in /etc/init.d/hadoop*
  do
      ${file} start
  done

Step 5:

Check that things are working. The following is for MapReduce v1 (and, at this superficial level, MapReduce v2 isn't very different):

root# jps
23104 DataNode
23469 TaskTracker
23361 SecondaryNameNode
23187 JobTracker
23267 NameNode
24754 Jps

# Do the next commands as yourself (not as "root").
myself$ hadoop fs -mkdir /foo
myself$ hadoop fs -rmr /foo
myself$ hadoop jar /usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u5-examples.jar pi 2 100000

Hope this helps!

Regarding "Hadoop/Yarn (v0.23.3) pseudo-distributed mode setup :: no job nodes", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12522412/
