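I'm new to Hadoop and am just setting up a 3-node Debian server cluster for practice.
I was researching best practices for Hadoop and came across:
JBOD, no RAID
Filesystem: ext3, ext4, xfs; none of the fancy COW stuff you see with zfs and btrfs
So I raise these questions...
Everywhere I read, JBOD is better than RAID for Hadoop, and the best filesystems are xfs, ext3 and ext4. The filesystem part makes total sense to me, but how do you implement this JBOD? If you google it yourself you will see my confusion: JBOD usually implies a linear concatenation or combination of a bunch of disks, much like a logical volume, or at least that is how some people explain it, but Hadoop seems to want a JBOD that does not combine the disks. Nobody expands on that...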
* TEXTO-GRAPHICAL OF LINEAR CONCAT OF DISKS BEING A JBOD:
* disk1 2 and 3 used for datanode for hadoop
* disk1 is sda 100gb
* disk2 is sdb 200gb
* disk3 is sdc 300gb
* sda + sdb + sdc = jbod of name entity1
* JBOD MADE ANYWAY - WHO CARES - THAT'S NOT MY QUESTION: maybe we made the jbod of entity1 with lvm, or mdadm using linear concat, or hardware jbod drivers which combine disks and show them to the operating system as entity1, it doesn't matter, either way it's still a jbod
* This is the type of JBOD I am used to and I keep coming across when I google search JBOD
* cat /proc/partitions would show sda,sdb,sdc and entity1 OR if we used hardware jbod maybe sda and sdb and sdc would not show and only entity1 would show, again who cares how it shows
* mount entity1 to /mnt/entity1
* running "df" would show that entity1 is 100+200+300=600gb big
* we then setup hadoop to run its datanodes on /mnt/entity1 so that datadir property points at /mnt/entity1 and the cluster just gained 600gb of capacity
* TEXTO-GRAPHICAL OF SEPARATE (NOT COMBINED) DISKS BEING A JBOD:
* disk1 2 and 3 used for datanode for hadoop
* disk1 is sda 100gb
* disk2 is sdb 200gb
* disk3 is sdc 300gb
* WE DO NOT COMBINE THEM TO APPEAR AS ONE
* sda mounted to /mnt/a
* sdb mounted to /mnt/b
* sdc mounted to /mnt/c
* running a "df" would show that sda and sdb and sdc have the following sizes: 100,200,300 gb respectively
* we then setup hadoop via its config files to lay its hdfs on this node on the following "datadirs": /mnt/a and /mnt/b and /mnt/c.. gaining 100gb to the cluster from a, 200gb from b and 300gb from c... for a total gain of 600gb from this node... nobody using the cluster would tell the difference..
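To make the second layout concrete, here is a minimal sketch, assuming the mount points and sizes from the list above and a hypothetical `dfs/data` subdirectory on each disk. It builds the comma-separated directory list that a datanode data-directory property such as `dfs.data.dir` takes, and adds up the capacity the node contributes. The numbers are the example's, not measurements.

```python
# Disks from the example above: mount point -> size in GB (illustrative numbers).
disks = {"/mnt/a": 100, "/mnt/b": 200, "/mnt/c": 300}


def data_dirs(mounts, subdir="dfs/data"):
    """Build a comma-separated directory list, one entry per physical disk.

    The "dfs/data" subdirectory name is a hypothetical choice for this sketch.
    """
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mounts)


value = data_dirs(disks)        # string to put in the data-directory property
total_gb = sum(disks.values())  # capacity this node adds to the cluster

print(value)
print(total_gb)
```

Either layout adds the same 600gb; the difference is that here the datanode sees three separate directories instead of one combined volume.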
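Best answer
I can try to answer a few of the questions; tell me wherever you disagree.
1. JBOD: Just a Bunch Of Disks; a set of drives, each of which is accessed directly as an independent drive.
In the Hadoop Definitive Guide, the topic "Why not use RAID?" says that RAID read and write performance is limited by the slowest disk in the array.
In addition, with HDFS, data is replicated across different machines residing in different racks. This covers potential data loss even if a whole rack fails, so RAID is not really necessary. The Namenode can, however, use RAID as mentioned in the link.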
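The "slowest disk" point can be illustrated with a deliberately simplified toy model. It assumes a striped array where every stripe waits on its slowest member, and independent disks that each run at their own speed. The per-disk speeds are made-up numbers, and real arrays and controllers behave in more complicated ways.

```python
def striped_throughput(speeds):
    """Toy model of a striped array: every member is held to the slowest disk."""
    return len(speeds) * min(speeds)


def independent_throughput(speeds):
    """Toy model of independent disks: each disk works at its own speed."""
    return sum(speeds)


# Hypothetical sequential speeds in MB/s for three disks.
speeds = [80, 100, 120]

striped = striped_throughput(speeds)
independent = independent_throughput(speeds)

print(striped, independent)
```

Under these assumptions the independent layout is never slower than the striped one, and the gap grows as the disks become more uneven.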
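2. Yes, it means independent disks (JBOD) mounted in each machine (for example /disk1, /disk2, /disk3 and so on) but not partitioned.
3, 4 and 5: Read the addendum.
6 and 7: Check this link to understand how block replication happens.
Addendum after the comment:
Q1. Which method is everyone referring to as the best practice, given that JBOD can be achieved either with this combined JBOD or with separate disks, which according to online documentation is still a JBOD?
Possible answer:
From the Hadoop Definitive Guide: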
You should also set the dfs.data.dir property, which specifies a list of directories for a datanode to store its blocks. Unlike the namenode, which uses multiple directories for redundancy, a datanode round-robins writes between its storage directories, so for performance you should specify a storage directory for each local disk. Read performance also benefits from having multiple disks for storage, because blocks will be spread across them, and concurrent reads for distinct blocks will be correspondingly spread across disks.
For maximum performance, you should mount storage disks with the noatime option. This setting means that last accessed time information is not written on file reads, which gives significant performance gains.
Avoid RAID and LVM on TaskTracker and DataNode machines – it generally reduces performance.
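As a rough illustration of the round-robin behaviour described in the quote, the sketch below cycles through one storage directory per disk when placing blocks. This is not Hadoop's actual implementation; `place_blocks` and the integer block ids are hypothetical names used only for this example, and the directories are the ones from the question.

```python
from collections import Counter
from itertools import cycle


def place_blocks(block_ids, data_dirs):
    """Assign each block to the next storage directory in turn (round-robin)."""
    chooser = cycle(data_dirs)
    return {block: next(chooser) for block in block_ids}


# One storage directory per physical disk, as in the question's second layout.
dirs = ["/mnt/a", "/mnt/b", "/mnt/c"]

placement = place_blocks(range(9), dirs)
per_disk = Counter(placement.values())

print(per_disk)
```

With the blocks spread evenly like this, reads of different blocks land on different spindles, which is the read benefit the quote mentions. A single combined volume hides the individual disks from the datanode, so it cannot distribute work this way.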
Regarding "hadoop - Which kind of JBOD is used in hadoop? and COD with hadoop?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/17694395/