I have a nodes database extracted from OpenStreetMap and the data is structured like this:
[
{_id: 0, location: {type: 'Point', coordinates: [long, lat]}},
{_id: 1, location: {type: 'Point', coordinates: [long, lat]}, ${key}: ${value}},
{_id: 2, location: {type: 'Point', coordinates: [long, lat]}, ${key}: ${value}},
{_id: 3, location: {type: 'Point', coordinates: [long, lat]}},
{_id: 4, location: {type: 'Point', coordinates: [long, lat]}, ${key}: ${value}},
]
As we know, OpenStreetMap has tons of tags, each made of a key and a value.
I'm trying to find the closest pair of nodes that carry a specific tag, using only one query. I couldn't get much further than a simple aggregation over the nodes collection. In the example below I use the tag power: tower, but any other tag would do.
const result = await client.nodes_collection
.aggregate([
{
$match: {
power: 'tower',
},
},
{
$lookup: {
from: 'nodes',
as: 'closestNode',
pipeline: [{ $match: { power: 'tower' } }],
},
},
{
$unwind: '$closestNode',
},
{
$match: {
'closestNode._id': { $ne: '$_id' },
},
}
]).toArray()
P.S.: It must be done in a single query.
EDIT:
For MongoDB version 5.3 or higher (with the help of @Juliana Aragão):
db.nodes.aggregate([
{$match: {power: "tower"}},
{$lookup: {
from: "nodes",
as: "closestNode",
let: {coords: "$location.coordinates"},
pipeline: [
{$geoNear: {
near: {
type: "Point",
coordinates: "$$coords"
},
distanceField: "distFromMe",
spherical: true,
query: {power: "tower"}
}},
{$skip: 1},
{$limit: 1},
{$project: {distFromMe: 1}}
]
}},
{$set: {closestNode: {$first: "$closestNode"}}},
{$sort: {"closestNode.distFromMe": 1}},
{$limit: 1}
])
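To make the logic of this pipeline easier to follow, here is a plain-JavaScript sketch of the same idea, run in memory rather than in MongoDB: keep the tagged nodes, find each node's nearest other tagged node, then keep the pair with the smallest distance. The names haversine, closestPair, and sampleNodes are hypothetical, the Earth radius is an assumed value, and the haversine formula only stands in for the spherical distance that $geoNear computes on the server.

```javascript
// Plain-JavaScript illustration of the pipeline's logic; this is not MongoDB code.
const EARTH_RADIUS_M = 6378100; // assumed spherical Earth radius, in metres

// Great-circle (haversine) distance between two [long, lat] pairs given in degrees.
function haversine([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Returns the tagged node whose nearest tagged neighbour is closest overall.
function closestPair(nodes, key, value) {
  const tagged = nodes.filter((n) => n[key] === value); // plays the role of $match
  let best = null;
  for (const node of tagged) {
    for (const other of tagged) {
      if (other._id === node._id) continue; // the pipeline drops the node itself with $skip: 1
      const distFromMe = haversine(
        node.location.coordinates,
        other.location.coordinates
      );
      if (best === null || distFromMe < best.closestNode.distFromMe) {
        best = { _id: node._id, closestNode: { _id: other._id, distFromMe } };
      }
    }
  }
  return best; // null when fewer than two nodes carry the tag
}

// Hypothetical sample data in the same shape as the question's documents.
const sampleNodes = [
  { _id: 0, location: { type: 'Point', coordinates: [0, 0] }, power: 'tower' },
  { _id: 1, location: { type: 'Point', coordinates: [0, 1] }, power: 'tower' },
  { _id: 2, location: { type: 'Point', coordinates: [0, 1.1] }, power: 'tower' },
  { _id: 3, location: { type: 'Point', coordinates: [0, 1.05] } },
];
console.log(closestPair(sampleNodes, 'power', 'tower'));
```

Unlike this sketch, the pipeline lets $geoNear use the geospatial index for the inner search, so it does not have to compare every pair explicitly.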
For older versions of MongoDB, $geoNear does not support the coordinates as a parameter, so one option is to write the distance calculation ourselves. This solution calculates the distance between all pairs, which is O(n^2):
db.nodes.aggregate([
{$match: {power: "tower"}},
{$lookup: {
from: "nodes",
as: "closestNode",
let: {
long: {$multiply: [{$first: "$location.coordinates"}, 0.017453293]},
lat: {$multiply: [{$last: "$location.coordinates"}, 0.017453293]}
},
pipeline: [
{$match: {power: "tower"}},
{$set: {
lat: {$multiply: [{$last: "$location.coordinates"}, 0.017453293]},
long: {$multiply: [{$first: "$location.coordinates"}, 0.017453293]}
}},
{$set: {distFromMe: {
$let: {
vars: {
dlon: {$subtract: ["$long", "$$long"]},
dlat: {$subtract: ["$lat", "$$lat"]},
rlon: {$divide: [
{$multiply: [6378137, {$cos: {$avg: ["$lat", "$$lat"]}}]},
{$sqrt: {$subtract: [
1,
{$multiply: [
0.00669437,
{$pow: [{$sin: {$avg: ["$lat", "$$lat"]}}, 2]}
]}
]}}
]},
rlat: {$divide: [
{$multiply: [6378137, {$subtract: [1, 0.00669437]}]},
{$pow: [
{$subtract: [
1,
{$multiply: [
0.00669437,
{$pow: [
{$sin: {$avg: ["$lat", "$$lat"]}},
2
]}
]}
]},
1.5
]}
]}
},
in: {$max: [
{$sqrt: [{$add: [
{$pow: [{$multiply: ["$$dlon", "$$rlon"]}, 2]},
{$pow: [{$multiply: ["$$dlat", "$$rlat"]}, 2]}
]}]},
1
]}
}}
}},
{$sort: {distFromMe: 1}},
{$skip: 1},
{$limit: 1},
{$project: {distFromMe: 1}}
]
}},
{$set: {closestNode: {$first: "$closestNode"}}},
{$sort: {"closestNode.distFromMe": 1}},
{$limit: 1}
])
See how it works on the playground example.
The distance calculation follows this reference by @jimirwin.
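For anyone who wants to sanity-check the numbers outside MongoDB, the formula inside the $let above can be sketched in plain JavaScript. This is only an illustration under assumptions: the function name approxDistance is hypothetical, the constants are the ones that appear in the pipeline (6378137 and 0.00669437, which match the WGS84 semi-major axis and squared eccentricity), and the result is clamped to a minimum of 1 just as the $max in the pipeline does.

```javascript
const SEMI_MAJOR_AXIS_M = 6378137;   // same constant as in the pipeline
const ECCENTRICITY_SQ = 0.00669437;  // same constant as in the pipeline

// Approximate distance in metres between two [long, lat] pairs given in degrees,
// using the local radii of curvature at the mean latitude.
function approxDistance([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dlon = toRad(lon2 - lon1);
  const dlat = toRad(lat2 - lat1);
  const meanLat = toRad((lat1 + lat2) / 2);
  const w = 1 - ECCENTRICITY_SQ * Math.sin(meanLat) ** 2;
  // Metres per radian of longitude at the mean latitude (rlon in the pipeline).
  const rlon = (SEMI_MAJOR_AXIS_M * Math.cos(meanLat)) / Math.sqrt(w);
  // Metres per radian of latitude at the mean latitude (rlat in the pipeline).
  const rlat = (SEMI_MAJOR_AXIS_M * (1 - ECCENTRICITY_SQ)) / Math.pow(w, 1.5);
  // Same clamp as the pipeline's $max: never return less than 1.
  return Math.max(Math.hypot(dlon * rlon, dlat * rlat), 1);
}

// One degree of latitude near the equator comes out at roughly 110.6 km.
console.log(approxDistance([0, 0], [0, 1]));
```

The approximation treats the patch between the two points as flat, so it is best suited to nodes that are relatively close to each other.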
Thank you very much. I managed to build the query based on your solution, but with $geoNear so that it relies more on MongoDB's native functions; here it is. It won't work on the playground because the playground doesn't support 2dsphere.
@Juliana Aragão, using $geoNear inside the pipeline is much better. I was not aware that newer MongoDB versions can take the coordinates as a parameter. I updated the answer according to your suggestion, so other people will be able to see it easily.
Hey @nimrod-serok. I hope this message finds you, because I can't tag your username properly. I had to extend the query to get the k closest pairs, and I ran into some issues with that. The normal behavior is that both results of a pair appear one right under the other, since the distance is the same in both directions, but somehow the sequence breaks in the middle. Here I wrote up what I got, along with the data to reproduce the issue. Any thoughts on that?
I don't see any problem. The code is working as expected, as you can see here, with all the requirements inside the query. If your problem is that the results do not always come in pairs, that is a valid case. Consider a line with A-10m-B--20m--C. In this case, the closest point to C is B, but the closest point to B is A. Is this what you mean by breaking? What results do you expect in such a case? Only the A-C pair? Should B-C be dropped?
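To make the A/B/C case concrete, here is a tiny JavaScript sketch with hypothetical points on a straight line (positions in metres). It only illustrates that "nearest neighbour" is not a symmetric relation; it is not the query itself, and the names points, nearest, and pairs are made up for the example.

```javascript
// Hypothetical points on a line: A at 0 m, B at 10 m, C at 30 m.
const points = [
  { name: 'A', x: 0 },
  { name: 'B', x: 10 },
  { name: 'C', x: 30 },
];

// Nearest other point to `point`, with its distance.
function nearest(point, all) {
  let best = null;
  for (const other of all) {
    if (other === point) continue; // skip the point itself
    const dist = Math.abs(other.x - point.x);
    if (best === null || dist < best.dist) best = { name: other.name, dist };
  }
  return best;
}

const pairs = points.map((p) => {
  const n = nearest(p, points);
  return { from: p.name, to: n.name, dist: n.dist };
});
console.log(pairs);
// A -> B (10), B -> A (10), C -> B (20): C points at B, but B does not point back at C.
```

So when the results are sorted by distance, a row like C -> B has no mirrored B -> C row next to it, which is why the output does not always come in pairs.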
My bad, the code really is running as expected. I'm running the same query in other databases as well, and they always return both nodes of a pair, because there I run a CROSS JOIN and then filter everything. Now I see that you optimized this situation.