
MongoDB Map-Reduce : One document that needs to be incorporated into all others matching a condition?


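I'm not even sure of the right terminology for asking this question, but here goes.

I have a collection on which I'm using MapReduce to perform an aggregation task. I can't use the aggregation pipeline because I need to run custom code in the reduce step.

This has been simplified slightly to make the question clearer.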

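  • I have a collection in which each document contains a location (i.e. a grid cell ID) and a timeslice (represented by the timestamp at the start of that timeslice), along with information such as "number of vehicles", etc. There may be thousands of such documents per location, and there may also be multiple such documents per timeslice.
  • In addition, for each location there may be a document whose "timeslice" property is null. It holds information about static features and the like, i.e. data that has no timestamp associated with it.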

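What I want to do is run a map-reduce process in which the output documents are keyed by location ID and timeslice and, most importantly, in which I can merge the untimed data into the timed data.

Here is some sample input (greatly simplified data-wise, but the cell_id and timeslice values are exactly what I have to work with):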

[
{
"cell_id": 100,
"timeslice": "2019-03-20T00:00:00.000Z",
"num_vehicles": 5,
"num_residential_units": null,
"num_commercial_units": null
},
{
"cell_id": 100,
"timeslice": "2019-03-20T00:00:00.000Z",
"num_vehicles": 4,
"num_residential_units": null,
"num_commercial_units": null
},
{
"cell_id": 100,
"timeslice": "2019-03-20T00:00:00.000Z",
"num_vehicles": 1,
"num_residential_units": null,
"num_commercial_units": null
},
{
"cell_id": 100,
"timeslice": "2019-03-21T00:00:00.000Z",
"num_vehicles": 7,
"num_residential_units": null,
"num_commercial_units": null
},
{
"cell_id": 100,
"timeslice": "2019-03-21T00:00:00.000Z",
"num_vehicles": 2,
"num_residential_units": null,
"num_commercial_units": null
},
{
"cell_id": 100,
"timeslice": null,
"num_vehicles": null,
"num_residential_units": 30,
"num_commercial_units": 12
},
{
"cell_id": 101,
"timeslice": "2019-03-20T00:00:00.000Z",
"num_vehicles": 5,
"num_residential_units": null,
"num_commercial_units": null
},
{
"cell_id": 101,
"timeslice": "2019-03-21T00:00:00.000Z",
"num_vehicles": 1,
"num_residential_units": null,
"num_commercial_units": null
},
{
"cell_id": 101,
"timeslice": "2019-03-21T00:00:00.000Z",
"num_vehicles": 2,
"num_residential_units": null,
"num_commercial_units": null
},
{
"cell_id": 101,
"timeslice": "2019-03-21T00:00:00.000Z",
"num_vehicles": 1,
"num_residential_units": null,
"num_commercial_units": null
},
{
"cell_id": 101,
"timeslice": "2019-03-21T00:00:00.000Z",
"num_vehicles": 0,
"num_residential_units": null,
"num_commercial_units": null
},
{
"cell_id": 101,
"timeslice": null,
"num_vehicles": null,
"num_residential_units": 8,
"num_commercial_units": 1
},
{
"cell_id": 100,
"timeslice": "2019-03-20T00:00:00.000Z",
"num_vehicles": 10,
"num_residential_units": 30,
"num_commercial_units": 12
},
{
"cell_id": 100,
"timeslice": "2019-03-21T00:00:00.000Z",
"num_vehicles": 9,
"num_residential_units": 30,
"num_commercial_units": 12
},
{
"cell_id": 101,
"timeslice": "2019-03-20T00:00:00.000Z",
"num_vehicles": 5,
"num_residential_units": 8,
"num_commercial_units": 1
},
{
"cell_id": 101,
"timeslice": "2019-03-21T00:00:00.000Z",
"num_vehicles": 4,
"num_residential_units": 8,
"num_commercial_units": 1
}
]

...and here is the output I would like that input to produce (I haven't split it into _id and value, but essentially cell_id and timeslice would form the _id):

[
{
"cell_id": 100,
"timeslice": null,
"num_vehicles": null,
"num_residential_units": 30,
"num_commercial_units": 12
},
{
"cell_id": 100,
"timeslice": "2019-03-20T00:00:00.000Z",
"num_vehicles": 10,
"num_residential_units": 30,
"num_commercial_units": 12
},
{
"cell_id": 100,
"timeslice": "2019-03-21T00:00:00.000Z",
"num_vehicles": 9,
"num_residential_units": 30,
"num_commercial_units": 12
},
{
"cell_id": 101,
"timeslice": null,
"num_vehicles": null,
"num_residential_units": 8,
"num_commercial_units": 1
},
{
"cell_id": 101,
"timeslice": "2019-03-20T00:00:00.000Z",
"num_vehicles": 5,
"num_residential_units": 8,
"num_commercial_units": 1
},
{
"cell_id": 101,
"timeslice": "2019-03-21T00:00:00.000Z",
"num_vehicles": 4,
"num_residential_units": 8,
"num_commercial_units": 1
}
]

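If the emit stage keys the emitted documents by location and time, then all the timed data arrives correctly in the reduce function, and I can reduce the untimed data on its own... but I also need to somehow merge that untimed data into each of the reduced timed documents. Is there some way to do this in the finalize stage, or some clever way to set up the keys...? I'm stumped. Frankly, it doesn't matter to me whether the solution involves map-reduce, but it has to be efficient at scale on limited hardware.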

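Best answer

You can try something like this.

The query below selects the documents with a non-null timeslice, then groups them to get the aggregated values. Once you have the aggregated documents, you join back to the same collection to pull in the untimed documents.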

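db.collection.aggregate([
// Keep only the timed documents.
{"$match":{"timeslice":{"$ne":null}}},
// Sum the vehicle counts per (cell_id, timeslice) pair.
{"$group":{
"_id":{"cell_id":"$cell_id","timeslice":"$timeslice"},
"num_vehicles":{"$sum":"$num_vehicles"}
}},
// Self-join on cell_id to pull in every document of the same cell.
{"$lookup":{
"from":"collection",
"localField":"_id.cell_id",
"foreignField":"cell_id",
"as":"untimed_doc"
}},
// Emit one document per joined document, then keep only the untimed one.
{"$unwind":"$untimed_doc"},
{"$match":{"untimed_doc.timeslice":{"$eq":null}}}
])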
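
To make the intended result of the pipeline above easier to check, here is a minimal in-memory sketch of the same idea in plain JavaScript. It is not MongoDB API code: the function name mergeTimedWithUntimed is hypothetical, and it assumes at most one untimed document per cell. Unlike the pipeline, which drops cells that have no untimed document, this sketch keeps them and fills the static fields with null.

```javascript
// Hypothetical helper, for illustration only: groups timed documents by
// (cell_id, timeslice), sums num_vehicles, and copies the static fields
// from the cell's untimed document (timeslice === null) into each group.
function mergeTimedWithUntimed(docs) {
  // Index the untimed document of each cell (assumed: at most one per cell).
  const untimedByCell = new Map();
  for (const doc of docs) {
    if (doc.timeslice === null) {
      untimedByCell.set(doc.cell_id, doc);
    }
  }

  // Group the timed documents and sum their vehicle counts.
  const groups = new Map();
  for (const doc of docs) {
    if (doc.timeslice === null) continue;
    const key = `${doc.cell_id}|${doc.timeslice}`;
    const group = groups.get(key) ||
      { cell_id: doc.cell_id, timeslice: doc.timeslice, num_vehicles: 0 };
    group.num_vehicles += doc.num_vehicles || 0;
    groups.set(key, group);
  }

  // Attach the static fields of the matching untimed document, if any.
  return [...groups.values()].map((group) => {
    const untimed = untimedByCell.get(group.cell_id);
    return {
      ...group,
      num_residential_units: untimed ? untimed.num_residential_units : null,
      num_commercial_units: untimed ? untimed.num_commercial_units : null,
    };
  });
}
```

Running it over the raw documents for a cell returns one merged document per timeslice, in the order the timeslices first appear. The untimed documents themselves are not part of the result, which matches the pipeline; if they are needed in the output, they would have to be added separately.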

Regarding "MongoDB Map-Reduce : One document that needs to be incorporated into all others matching a condition?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60269664/
