
hbase - What are the advantages of multiple column families in HBase?

Reposted. Author: 行者123. Updated: 2023-12-04 06:18:03

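I want to use HBase as the database for my application. I have a table with many columns, and I now need to decide how many column families to use: one or several. If I use more than one, what are the advantages and disadvantages?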

Best answer

This is already documented in the official HBase guide. Pay particular attention to the key statements:

33. On the number of column families

HBase currently does not do well with anything above two or three column families, so keep the number of column families in your schema low. Currently, flushing and compactions are done on a per Region basis, so if one column family is carrying the bulk of the data bringing on flushes, the adjacent families will also be flushed even though the amount of data they carry is small. When many column families exist, the flushing and compaction interaction can make for a lot of needless i/o loading (to be addressed by changing flushing and compaction to work on a per column family basis). For more information on compactions, see the Compaction section.

Try to make do with one column family if you can in your schemas. Only introduce a second and third column family in the case where data access is usually column scoped; i.e. you query one column family or the other but usually not both at the one time.

33.1. Cardinality of ColumnFamilies

Where multiple ColumnFamilies exist in a single table, be aware of the cardinality (i.e., number of rows). If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion rows, ColumnFamilyA’s data will likely be spread across many, many regions (and RegionServers). This makes mass scans for ColumnFamilyA less efficient.
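
To make the per-Region flushing point concrete, here is a minimal toy model (not HBase code; `ToyRegion`, `FLUSH_THRESHOLD`, `run` and the write sizes are all hypothetical). It assumes the behaviour the guide describes: the flush is triggered by the region's total memstore size, and every family in the region is flushed together. The `per_family_flush=True` variant is only a simplified stand-in for the per-column-family approach the guide mentions, not HBase's actual flush policy.

```python
from dataclasses import dataclass, field

# Hypothetical flush size for the toy model; real HBase flush sizes are configurable.
FLUSH_THRESHOLD = 128


@dataclass
class ToyRegion:
    """One region holding a memstore per column family (a heavy simplification of HBase)."""
    families: list
    per_family_flush: bool = False
    memstore: dict = field(default_factory=dict)
    store_files: dict = field(default_factory=dict)

    def __post_init__(self):
        for cf in self.families:
            self.memstore[cf] = 0
            self.store_files[cf] = []

    def write(self, cf, size):
        self.memstore[cf] += size
        if self.per_family_flush:
            # Toy version of the per-family idea: only a family that is itself large is flushed.
            if self.memstore[cf] >= FLUSH_THRESHOLD:
                self._flush([cf])
        elif sum(self.memstore.values()) >= FLUSH_THRESHOLD:
            # Per-Region behaviour from the guide: the region total triggers the flush
            # and every family is written out, however little data it holds.
            self._flush(self.families)

    def _flush(self, families):
        # Each flush of a non-empty memstore produces one new store file of that size.
        for cf in families:
            if self.memstore[cf] > 0:
                self.store_files[cf].append(self.memstore[cf])
                self.memstore[cf] = 0


def run(per_family_flush):
    region = ToyRegion(["big", "small"], per_family_flush=per_family_flush)
    for _ in range(80):
        region.write("small", 1)  # a lightly used family
        region.write("big", 31)   # the family carrying the bulk of the data
    return region


per_region = run(per_family_flush=False)
print(len(per_region.store_files["small"]), per_region.store_files["small"][:3])  # -> 20 [4, 4, 4]
per_cf = run(per_family_flush=True)
print(len(per_cf.store_files["small"]), per_cf.memstore["small"])  # -> 0 80
```

With per-Region flushing, the lightly used family ends up with 20 tiny store files that compaction will later have to rewrite; in the toy per-family variant it has not been flushed at all yet.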
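
The cardinality warning is mostly arithmetic. The sketch below assumes that regions split purely by total size, a hypothetical 10 GiB split size, made-up row sizes (100 bytes per row for A, 1,000 bytes per row for B), and that A's row keys are spread uniformly over the same key space as B's; `scan_cost_for_family_a` and `ceil_div` are names invented for this example.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


def scan_cost_for_family_a(rows_a, rows_b, bytes_per_row_a, bytes_per_row_b, region_size):
    """Toy estimate of how many regions a full scan of family A must visit.

    Assumes region boundaries depend only on total table size and that A's rows
    are spread uniformly over the same row-key range as B's rows.
    """
    total = rows_a * bytes_per_row_a + rows_b * bytes_per_row_b
    regions = ceil_div(total, region_size)
    # With uniform keys, every region holds some A rows once rows_a >= regions.
    regions_with_a = min(regions, rows_a)
    return regions, regions_with_a


REGION_SIZE = 10 * 1024**3  # assumed split size (10 GiB) for this example

shared = scan_cost_for_family_a(1_000_000, 1_000_000_000, 100, 1_000, REGION_SIZE)
separate = scan_cost_for_family_a(1_000_000, 0, 100, 1_000, REGION_SIZE)
print(shared)    # -> (94, 94)
print(separate)  # -> (1, 1)
```

Under these assumptions, a full scan of ColumnFamilyA has to visit all 94 regions when it shares a table with ColumnFamilyB, versus a single region if A's data lived in its own table.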



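A good example is an analytics table with Daily, Monthly, Yearly and Total column families, each with its own TTL (expiration) setting and with columns for each date in its range (days, months, years...). When you query the table you usually fetch only one type of aggregate at a time, e.g. retrieving the daily stats for the last 30 days.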
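
The analytics example can be modelled in a few lines. This is a toy in-memory sketch, not the HBase client API: `ToyAnalyticsTable`, `TTL_DAYS`, the row key and the retention periods are all hypothetical. In HBase itself, TTL is configured per column family (in seconds), and because each family is stored in its own set of files, a read restricted to one family does not need to touch the others.

```python
from datetime import date, timedelta

# Hypothetical retention per family, in days (None = never expires).
TTL_DAYS = {"daily": 90, "monthly": 3 * 365, "yearly": 10 * 365, "total": None}


class ToyAnalyticsTable:
    """Toy model: row key -> column family -> {qualifier: (value, written_on)}."""

    def __init__(self, ttl_days):
        self.ttl_days = ttl_days
        self.rows = {}

    def put(self, row, family, qualifier, value, written_on):
        if family not in self.ttl_days:
            raise KeyError(f"unknown column family {family!r}")
        self.rows.setdefault(row, {}).setdefault(family, {})[qualifier] = (value, written_on)

    def get_family(self, row, family, today):
        """Read a single family, dropping cells older than that family's TTL."""
        ttl = self.ttl_days[family]
        cells = self.rows.get(row, {}).get(family, {})
        return {q: v for q, (v, ts) in cells.items()
                if ttl is None or (today - ts).days < ttl}


today = date(2024, 1, 31)
table = ToyAnalyticsTable(TTL_DAYS)
for i in range(120):
    day = today - timedelta(days=i)
    # One column per day, qualified by ISO date so qualifiers sort chronologically.
    table.put("page:/home", "daily", day.isoformat(), 10, written_on=day)
table.put("page:/home", "monthly", "2024-01", 310, written_on=today)
table.put("page:/home", "total", "all", 5000, written_on=date(2015, 1, 1))

# "Daily stats for the last 30 days" only ever reads the daily family.
daily = table.get_family("page:/home", "daily", today)
cutoff = (today - timedelta(days=29)).isoformat()
last_30 = {q: v for q, v in daily.items() if q >= cutoff}
print(len(daily), len(last_30))  # -> 90 30
```

Each family expires on its own schedule (daily cells older than 90 days disappear while the total never does), and a typical query is scoped to exactly one family, which is the access pattern the guide says justifies multiple families.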

If you want to learn more about schema design, check out the great Introduction to HBase schema design by Amandeep Khurana.

Source: "hbase - What are the advantages of multiple column families in HBase?" on Stack Overflow: https://stackoverflow.com/questions/28197410/
