python - Remove duplicate values in MongoDB

Reposted  Author: 可可西里  Updated: 2023-11-01 09:56:18

I am learning MongoDB with Python and Tornado. I have a MongoDB collection, and when I run

db.cal.find()

I get:

{
"Pid" : "5652f92761be0b14889d9854",
"Registration" : "TN 56 HD 6766",
"Vid" : "56543ed261be0b0a60a896c9",
"Period" : "10-2015",
"AOs": [
"14-10-2015",
"15-10-2015",
"18-10-2015",
"14-10-2015",
"15-10-2015",
"18-10-2015"
],
"Booked": [
"5-10-2015",
"7-10-2015",
"8-10-2015",
"5-10-2015",
"7-10-2015",
"8-10-2015"
],
"NA": [
"1-10-2015",
"2-10-2015",
"3-10-2015",
"4-10-2015",
"1-10-2015",
"2-10-2015",
"3-10-2015",
"4-10-2015"
],

"AOr": [
"23-10-2015",
"27-10-2015",
"23-10-2015",
"27-10-2015"
]
}

I need an operation that removes the duplicate values from Booked, NA, AOs, and AOr. The final result should be

{
"Pid" : "5652f92761be0b14889d9854",
"Registration" : "TN 56 HD 6766",
"Vid" : "56543ed261be0b0a60a896c9",
"AOs": [
"14-10-2015",
"15-10-2015",
"18-10-2015"
],
"Booked": [
"5-10-2015",
"7-10-2015",
"8-10-2015"
],
"NA": [
"1-10-2015",
"2-10-2015",
"3-10-2015",
"4-10-2015"
],
"AOr": [
"23-10-2015",
"27-10-2015"
]
}

How can I achieve this in MongoDB?

Best answer

Working solution

I have created a working solution in JavaScript that can be run in the mongo shell:

var codes = ["AOs", "Booked", "NA", "AOr"]

// Use bulk operations for efficiency.
// "dupes" is the collection name used in this answer; substitute your own.
var bulk = db.dupes.initializeUnorderedBulkOp()
var pending = 0

db.dupes.find().forEach(
    function(doc) {

        // Needed to prevent unnecessary operations
        var changed = false
        codes.forEach(
            function(code) {
                var values = doc[code]

                // Skip fields that are missing or not arrays
                if (!Array.isArray(values)) {
                    return
                }

                var uniq = []
                for (var i = 0; i < values.length; i++) {
                    // If the current value is not yet in "uniq",
                    // this is its first occurrence, so keep it
                    if (uniq.indexOf(values[i]) == -1) {
                        uniq.push(values[i])
                    }
                }

                doc[code] = uniq

                if (uniq.length < values.length) {
                    changed = true
                }
            }
        )

        // Replace the stored document only if something was changed
        if (changed) {
            bulk.find({"_id": doc._id}).replaceOne(doc)
            pending++
        }
    }
)

// Apply all changes; execute() throws if no operation was queued
if (pending > 0) {
    bulk.execute()
}

Resulting document for the sample input:

replset:PRIMARY> db.dupes.find().pretty()
{
"_id" : ObjectId("567931aefefcd72d0523777b"),
"Pid" : "5652f92761be0b14889d9854",
"Registration" : "TN 56 HD 6766",
"Vid" : "56543ed261be0b0a60a896c9",
"Period" : "10-2015",
"AOs" : [
"14-10-2015",
"15-10-2015",
"18-10-2015"
],
"Booked" : [
"5-10-2015",
"7-10-2015",
"8-10-2015"
],
"NA" : [
"1-10-2015",
"2-10-2015",
"3-10-2015",
"4-10-2015"
],
"AOr" : [
"23-10-2015",
"27-10-2015"
]
}
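
The per-document step of the script above can be tried without a database. The following is a minimal sketch in plain JavaScript (a runtime with Set support is assumed); uniqueInOrder and dedupeFields are hypothetical helper names and not part of the mongo shell. It swaps the indexOf loop for a Set, which for string values such as these dates keeps the same first-occurrence order.

```javascript
// Hypothetical helpers for illustration; no MongoDB connection is needed.
function uniqueInOrder(values) {
  // A Set remembers insertion order, so the first occurrence of each value wins.
  return Array.from(new Set(values));
}

function dedupeFields(doc, codes) {
  const result = Object.assign({}, doc); // shallow copy; the input is not modified
  let changed = false;
  for (const code of codes) {
    if (!Array.isArray(doc[code])) continue; // skip missing or non-array fields
    const uniq = uniqueInOrder(doc[code]);
    if (uniq.length < doc[code].length) changed = true;
    result[code] = uniq;
  }
  return { doc: result, changed: changed };
}

// A cut-down version of the sample document from the question.
const sample = {
  Registration: "TN 56 HD 6766",
  AOr: ["23-10-2015", "27-10-2015", "23-10-2015", "27-10-2015"],
  Booked: ["5-10-2015", "7-10-2015", "8-10-2015"]
};
const out = dedupeFields(sample, ["AOs", "Booked", "NA", "AOr"]);
console.log(out.changed, out.doc.AOr);
```

The changed flag plays the same role as in the shell script: only documents for which it is true would need to be written back.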

Using an index with dropDups

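This simply does not work. First, as of version 3.0 this option no longer exists. Since 3.2 has already been released, we should find a portable way.

Second, even with dropDups, the documentation clearly states: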

dropDups boolean : MongoDB indexes only the first occurrence of a key and removes all documents from the collection that contain subsequent occurrences of that key.

So if another document had the same value in one of the billing codes as an earlier document, the whole document would be deleted.
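
To make the difference concrete, here is a small simulation in plain JavaScript. It does not call MongoDB; simulateDropDups is a hypothetical function that only mimics the behaviour quoted above, on the assumption that an index on an array field holds one key per distinct element and that uniqueness is checked between documents rather than inside a single array.

```javascript
// Simulation only: mimics the quoted dropDups behaviour, not a MongoDB API.
function simulateDropDups(docs, field) {
  const seenKeys = new Set();
  const kept = [];
  for (const doc of docs) {
    // Assumption: one index key per distinct element of the array field.
    const keys = new Set(doc[field]);
    const clash = Array.from(keys).some((k) => seenKeys.has(k));
    if (clash) continue; // a later document sharing a key is dropped entirely
    keys.forEach((k) => seenKeys.add(k));
    kept.push(doc); // the stored array itself is left unchanged
  }
  return kept;
}

const docs = [
  { _id: 1, Booked: ["5-10-2015", "7-10-2015", "5-10-2015"] },
  { _id: 2, Booked: ["7-10-2015", "9-10-2015"] },
  { _id: 3, Booked: ["11-10-2015"] }
];
const kept = simulateDropDups(docs, "Booked");
console.log(kept.map((d) => d._id)); // ids of the surviving documents
console.log(kept[0].Booked.length);  // array length of the first survivor
```

In this sketch document 2 disappears because it shares "7-10-2015" with document 1, while the repeated "5-10-2015" inside document 1 stays in place, which is the opposite of what the question asks for.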

Regarding "python - Remove duplicate values in MongoDB", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34414422/
