
performance - Hadoop YARN single-node performance tuning


I installed Hadoop 2.5.2 in single-node mode on an Ubuntu VM: 4 cores at 3 GHz each and 4 GB of RAM. The VM is not used for production, only for demos and learning.

Then I wrote a very simple map-reduce application in Python and used it to process 49 XML files. All of them are small files, each only a few hundred lines, so I expected the job to be fast. To my big surprise, however, it took more than 20 minutes to finish (the output was correct). Here are the output metrics:

14/12/15 19:37:55 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/12/15 19:37:57 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/12/15 19:38:03 INFO mapred.FileInputFormat: Total input paths to process : 49
14/12/15 19:38:06 INFO mapreduce.JobSubmitter: number of splits:49
14/12/15 19:38:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1418368500264_0005
14/12/15 19:38:10 INFO impl.YarnClientImpl: Submitted application application_1418368500264_0005
14/12/15 19:38:10 INFO mapreduce.Job: Running job: job_1418368500264_0005
14/12/15 19:38:59 INFO mapreduce.Job: Job job_1418368500264_0005 running in uber mode : false
14/12/15 19:38:59 INFO mapreduce.Job: map 0% reduce 0%
14/12/15 19:39:42 INFO mapreduce.Job: map 2% reduce 0%
14/12/15 19:40:05 INFO mapreduce.Job: map 4% reduce 0%
14/12/15 19:40:28 INFO mapreduce.Job: map 6% reduce 0%
14/12/15 19:40:49 INFO mapreduce.Job: map 8% reduce 0%
14/12/15 19:41:10 INFO mapreduce.Job: map 10% reduce 0%
14/12/15 19:41:29 INFO mapreduce.Job: map 12% reduce 0%
14/12/15 19:41:50 INFO mapreduce.Job: map 14% reduce 0%
14/12/15 19:42:08 INFO mapreduce.Job: map 16% reduce 0%
14/12/15 19:42:28 INFO mapreduce.Job: map 18% reduce 0%
14/12/15 19:42:49 INFO mapreduce.Job: map 20% reduce 0%
14/12/15 19:43:08 INFO mapreduce.Job: map 22% reduce 0%
14/12/15 19:43:28 INFO mapreduce.Job: map 24% reduce 0%
14/12/15 19:43:48 INFO mapreduce.Job: map 27% reduce 0%
14/12/15 19:44:09 INFO mapreduce.Job: map 29% reduce 0%
14/12/15 19:44:29 INFO mapreduce.Job: map 31% reduce 0%
14/12/15 19:44:49 INFO mapreduce.Job: map 33% reduce 0%
14/12/15 19:45:09 INFO mapreduce.Job: map 35% reduce 0%
14/12/15 19:45:28 INFO mapreduce.Job: map 37% reduce 0%
14/12/15 19:45:49 INFO mapreduce.Job: map 39% reduce 0%
14/12/15 19:46:09 INFO mapreduce.Job: map 41% reduce 0%
14/12/15 19:46:29 INFO mapreduce.Job: map 43% reduce 0%
14/12/15 19:46:49 INFO mapreduce.Job: map 45% reduce 0%
14/12/15 19:47:09 INFO mapreduce.Job: map 47% reduce 0%
14/12/15 19:47:29 INFO mapreduce.Job: map 49% reduce 0%
14/12/15 19:47:49 INFO mapreduce.Job: map 51% reduce 0%
14/12/15 19:48:08 INFO mapreduce.Job: map 53% reduce 0%
14/12/15 19:48:28 INFO mapreduce.Job: map 55% reduce 0%
14/12/15 19:48:48 INFO mapreduce.Job: map 57% reduce 0%
14/12/15 19:49:09 INFO mapreduce.Job: map 59% reduce 0%
14/12/15 19:49:29 INFO mapreduce.Job: map 61% reduce 0%
14/12/15 19:49:55 INFO mapreduce.Job: map 63% reduce 0%
14/12/15 19:50:23 INFO mapreduce.Job: map 65% reduce 0%
14/12/15 19:50:53 INFO mapreduce.Job: map 67% reduce 0%
14/12/15 19:51:22 INFO mapreduce.Job: map 69% reduce 0%
14/12/15 19:51:50 INFO mapreduce.Job: map 71% reduce 0%
14/12/15 19:52:18 INFO mapreduce.Job: map 73% reduce 0%
14/12/15 19:52:48 INFO mapreduce.Job: map 76% reduce 0%
14/12/15 19:53:18 INFO mapreduce.Job: map 78% reduce 0%
14/12/15 19:53:48 INFO mapreduce.Job: map 80% reduce 0%
14/12/15 19:54:18 INFO mapreduce.Job: map 82% reduce 0%
14/12/15 19:54:48 INFO mapreduce.Job: map 84% reduce 0%
14/12/15 19:55:19 INFO mapreduce.Job: map 86% reduce 0%
14/12/15 19:55:48 INFO mapreduce.Job: map 88% reduce 0%
14/12/15 19:56:16 INFO mapreduce.Job: map 90% reduce 0%
14/12/15 19:56:44 INFO mapreduce.Job: map 92% reduce 0%
14/12/15 19:57:14 INFO mapreduce.Job: map 94% reduce 0%
14/12/15 19:57:45 INFO mapreduce.Job: map 96% reduce 0%
14/12/15 19:58:15 INFO mapreduce.Job: map 98% reduce 0%
14/12/15 19:58:46 INFO mapreduce.Job: map 100% reduce 0%
14/12/15 19:59:20 INFO mapreduce.Job: map 100% reduce 100%
14/12/15 19:59:28 INFO mapreduce.Job: Job job_1418368500264_0005 completed successfully
14/12/15 19:59:30 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=17856
FILE: Number of bytes written=5086434
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=499030
HDFS: Number of bytes written=10049
HDFS: Number of read operations=150
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=49
Launched reduce tasks=1
Data-local map tasks=49
Total time spent by all maps in occupied slots (ms)=8854232
Total time spent by all reduces in occupied slots (ms)=284672
Total time spent by all map tasks (ms)=1106779
Total time spent by all reduce tasks (ms)=35584
Total vcore-seconds taken by all map tasks=1106779
Total vcore-seconds taken by all reduce tasks=35584
Total megabyte-seconds taken by all map tasks=1133341696
Total megabyte-seconds taken by all reduce tasks=36438016
Map-Reduce Framework
Map input records=9352
Map output records=296
Map output bytes=17258
Map output materialized bytes=18144
Input split bytes=6772
Combine input records=0
Combine output records=0
Reduce input groups=53
Reduce shuffle bytes=18144
Reduce input records=296
Reduce output records=52
Spilled Records=592
Shuffled Maps =49
Failed Shuffles=0
Merged Map outputs=49
GC time elapsed (ms)=33590
CPU time spent (ms)=191390
Physical memory (bytes) snapshot=13738057728
Virtual memory (bytes) snapshot=66425016320
Total committed heap usage (bytes)=10799808512
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=492258
File Output Format Counters
Bytes Written=10049
14/12/15 19:59:30 INFO streaming.StreamJob: Output directory: /data_output/sb50projs_1_output



As a newcomer to Hadoop, I have several questions about this wildly unreasonable performance:
  • How should I configure my Hadoop/YARN/MapReduce setup to make the whole environment more suitable for trial use?

  • I know Hadoop is designed for big data and big files, but in my trial environment the files are small and the data is very limited. Which default configuration items should I change? I have already set dfs.blocksize in hdfs-site.xml to a smaller value to match my small files (see the hdfs-site.xml sketch after this list of questions), but it did not seem to help much. I know there are some JVM configuration items in yarn-site.xml and mapred-site.xml, but I am not sure how to adjust them.
  • How do I read the Hadoop logs?

  • Under the logs folder there are separate log files for the nodemanager/resourcemanager/namenode/datanode. I tried to read them to understand where the 20 minutes went, but that is not easy for a newcomer like me. So I would like to know whether there is any tool/UI that can help me analyze the logs.
  • Basic performance tuning tools

  • I have already googled this and found many names, such as Ganglia/Nagios/Vaidya/Ambari. I would like to know which tool is best suited to analyzing a problem like "why does such a simple job take 20 minutes to finish?".
  • A huge number of Hadoop processes

  • Even when no job is running on my Hadoop, I find about 100 Hadoop processes on the VM (viewed with htop, sorted by memory). Is this normal for Hadoop, or have I misconfigured something in the environment?
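
    For reference, the dfs.blocksize change mentioned above would look roughly like the following in hdfs-site.xml. The 1 MB value is only an illustration, not necessarily the value used here:

    <!-- hdfs-site.xml: shrink the block size to match small input files -->
    <configuration>
      <property>
        <name>dfs.blocksize</name>
        <value>1048576</value>  <!-- 1 MB instead of the 128 MB default -->
      </property>
    </configuration>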

    Best Answer

  • You don't need to change anything.

  • The default configuration is meant for small environments. You can change it if your environment grows; there are lots of parameters, and fine-tuning them takes a lot of time.

    But I admit your setup is smaller than the usual test environment.
  • The logs you need to read are not the service logs but the job logs. You can find them under /var/log/hadoop-yarn/containers/

    If you want a better view of MapReduce, use the web interface at http://127.0.0.1:8088/. You will see the job's progress in real time.
  • IMO, basic tuning = using the Hadoop web interfaces. Many of them are available natively.
  • I think you have found your problem. This may or may not be normal.

  • In short, YARN launches MR tasks to use all the available memory:
  • The available memory is set in yarn-site.xml: yarn.nodemanager.resource.memory-mb (default 8 GiB).
  • A task's memory is defined in mapred-site.xml, or on the task itself, via the property mapreduce.map.memory.mb (default 1536 MiB).

  • So (see the config sketch below):
  • Change the memory available to the nodemanager (to 3 GiB, in order to keep 1 GiB for the system).
  • Change the memory available to the Hadoop services (-Xmx in hadoop-env.sh and yarn-env.sh) so that the system plus each Hadoop service (namenode/datanode/resourcemanager/nodemanager) stays under 1 GiB.
  • Change the memory of the map tasks (512 MiB?). The smaller it is, the more tasks can run at the same time.
  • Change yarn.scheduler.minimum-allocation-mb to 512 in yarn-site.xml to allow mappers with less than 1 GiB of memory.
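
    As a minimal sketch of those changes (the numbers are suggestions for a 4 GB VM, not requirements), yarn-site.xml and mapred-site.xml would contain roughly:

    <!-- yarn-site.xml: give the nodemanager 3 GiB and allow 512 MiB containers -->
    <configuration>
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>3072</value>
      </property>
      <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
      </property>
    </configuration>

    <!-- mapred-site.xml: smaller map/reduce containers so more tasks can run in parallel.
         The -Xmx value is an assumption: the JVM heap must stay below the container size. -->
    <configuration>
      <property>
        <name>mapreduce.map.memory.mb</name>
        <value>512</value>
      </property>
      <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx410m</value>
      </property>
      <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>512</value>
      </property>
    </configuration>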

  • I hope this helps.

    Regarding "performance - Hadoop YARN single-node performance tuning", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27500128/
