gpt4 book ai didi

mongodb - Mongo $group 太慢了

转载 作者:可可西里 更新时间:2023-11-01 09:06:24 25 4
gpt4 key购买 nike

我有一个包含大约 168,200,000 个文档的 mongo 数据库集合。我正在尝试使用 $group 获取某个字段的平均值,并且我在管道中的 $group 之前使用 $match 来使用 client.city 上的索引。但是查询大约需要 5 分钟才能运行,这非常慢。

这是我尝试过的:

db.ar12.aggregate(
{$match:{'client.city':'New York'}},
{'$group':{'_id':'client.city', 'avg':{'$avg':'$length'}}}
)

db.ar12.aggregate(
{$match:{'client.city':'New York'}},
{'$group':{'_id':null, 'avg':{'$avg':'$length'}}}
)

db.ar12.aggregate(
{$match:{'client.city':'New York'}},
{$project: {'length':1}},
{'$group':{'_id':null, 'avg':{'$avg':'$length'}}}
)

所有 3 个查询花费的时间大致相同,client.city = to New York 的文档数量为 1,231,672,find({'client.city':'New York').count() 需要一秒钟才能运行

> db.version()
3.2.0

编辑

这是解释结果...至于添加带长度的复合索引的评论,这会有帮助吗,虽然我不是按长度搜索我想要所有长度...

{
"waitedMS" : NumberLong(0),
"stages" : [
{
"$cursor" : {
"query" : {
"client.city" : "New York"
},
"fields" : {
"length" : 1,
"_id" : 1
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "clients.ar12",
"indexFilterSet" : false,
"parsedQuery" : {
"client.city" : {
"$eq" : "New York"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"client.city" : 1
},
"indexName" : "client.city_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"client.city" : [
"[\"New York\", \"New York\"]"
]
}
}
},
"rejectedPlans" : [ ]
}
}
},
{
"$project" : {
"length" : true
}
},
{
"$group" : {
"_id" : {
"$const" : null
},
"total" : {
"$avg" : "$length"
}
}
}
],
"ok" : 1
}

编辑 2

我加了client.city和length的复合索引,还是不行,速度还是太慢,我试了这2个查询:

db.ar12.aggregate(
{$match: {'client.city':'New York'}},
{$project: {'client.city':1, 'length':1}},
{'$group':{'_id':'$client.city', 'avg':{'$avg':'$length'}}}
)

上面的查询没有使用复合索引,所以我尝试强制使用它,但仍然没有任何改变:

db.ar12.aggregate(
{$match: { $and : [{'client.city':'New York'}, {'length':{'$gt':0}}]}},
{$project: {'client.city':1, 'length':1}},
{'$group':{'_id':'$client.city', 'avg':{'$avg':'$length'}}}
)

下面是最后一个查询的解释:

{
"waitedMS" : NumberLong(0),
"stages" : [
{
"$cursor" : {
"query" : {
"$and" : [
{
"client.city" : "New York"
},
{
"length" : {
"$gt" : 0
}
}
]
},
"fields" : {
"client.city" : 1,
"length" : 1,
"_id" : 1
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "clients.ar12",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"client.city" : {
"$eq" : "New York"
}
},
{
"length" : {
"$gt" : 0
}
}
]
},
"winningPlan" : {
"stage" : "CACHED_PLAN",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"client.city" : 1,
"length" : 1
},
"indexName" : "client.city_1_length_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"client.city" : [
"[\"New York\", \"New York\"]"
],
"length" : [
"(0.0, inf.0]"
]
}
}
}
},
"rejectedPlans" : [ ]
}
}
},
{
"$project" : {
"client" : {
"city" : true
},
"length" : true
}
},
{
"$group" : {
"_id" : "$client.city",
"avg" : {
"$avg" : "$length"
}
}
}
],
"ok" : 1
}

最佳答案

我找到了解决方法,长度从 1 到 70。所以我在 python 中从 1 迭代到 70,找到每个城市的每个长度的计数,

db.ar12.find({'client.city':'New York', 'length':i}).count()

速度非常快,然后用python计算平均值,运行大约需要2秒。

这不是最好的解决方案,因为我还有其他查询要运行,我不知道我是否可以找到解决所有这些问题的方法...

关于mongodb - Mongo $group 太慢了,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/34797416/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com