
mongodb - Hive takes a long time to run a LIMIT 1 query

Reposted  Author: 可可西里  Updated: 2023-11-01 14:58:07

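I recently installed Hive and created an external table to access a database that lives in MongoDB. Now, if I run a query such as SELECT id FROM users LIMIT 1;, it takes about 18 seconds on average to execute. Setting LIMIT to 10, 100, 1000, or 10000 takes the same amount of time. The log contains entries like the following: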

2015-08-24 09:19:37,918 INFO  [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min=null, max= { "_id" : { "$oid" : "55cdbffaa9ad1735c531a362"}}
2015-08-24 09:19:37,918 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdbffaa9ad1735c531a362"}}, max= { "_id" : { "$oid" : "55cdc000a9ad1735d5cb42ab"}}
2015-08-24 09:19:37,918 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc000a9ad1735d5cb42ab"}}, max= { "_id" : { "$oid" : "55cdc002a9ad1735d5cb56f9"}}
2015-08-24 09:19:37,918 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc002a9ad1735d5cb56f9"}}, max= { "_id" : { "$oid" : "55cdc008a9ad1735eaffb513"}}
2015-08-24 09:19:37,919 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc008a9ad1735eaffb513"}}, max= { "_id" : { "$oid" : "55cdc00ba9ad1735eaffc961"}}
2015-08-24 09:19:37,919 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc00ba9ad1735eaffc961"}}, max= { "_id" : { "$oid" : "55cdc012a9ad1735fab2a0dd"}}
2015-08-24 09:19:37,919 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc012a9ad1735fab2a0dd"}}, max= null

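There are actually many more similar lines in between, which I have omitted. From the log, my only guess is that even with LIMIT 1, Hive fetches the entire collection from MongoDB and then picks 1 row to display. Is there a way to change this so that Hive fetches only 1 row when I use LIMIT 1?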

Best answer

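For a Hive table (and probably for an external table as well), selecting a specific field with LIMIT launches a MapReduce job (or a job on whichever execution engine you are using), whereas SELECT * needs no MapReduce job and is therefore much faster. This may be the cause of the slowness.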
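
As a rough sketch of how one might test this explanation: Hive has a session setting, hive.fetch.task.conversion, that controls which simple queries are served by a direct fetch instead of a launched job. Whether a table backed by the MongoDB storage handler benefits from it is an assumption here, not something the answer confirms. The table and column names are the ones from the question.

```sql
-- Allow simple "SELECT <columns> ... LIMIT n" queries to run as a fetch task
-- rather than a launched job (accepted values: none, minimal, more).
SET hive.fetch.task.conversion=more;

-- Inspect the plan: if the conversion applies, it should contain only a
-- fetch stage and no map-reduce stage.
EXPLAIN SELECT id FROM users LIMIT 1;

-- Re-run the original query and compare the timing with the earlier ~18 s.
SELECT id FROM users LIMIT 1;
```

If the plan still shows a job stage, or the timing does not change, the delay may lie somewhere other than job startup, for example in computing the splits listed in the log.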

Regarding "mongodb - Hive takes a long time to run a LIMIT 1 query", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32185969/
