mongodb - MapReduce on subdocuments

Reposted. Author: 可可西里. Updated: 2023-11-01 10:06:54

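I'm trying to graph email events that I record in a Mongo database. Whenever I send an email, I create a record, and then, whenever that email has an event (opened, clicked, marked as spam), I update the document by appending to its History.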

Here is a sample document:

{
    "_id" : new BinData(3, "wbbS0lRI0ESx5DyStKq9pA=="),
    "MemberId" : null,
    "NewsletterId" : 4,
    "NewsletterTypeId" : null,
    "Contents" : "[message goes here]",
    "History" : [{
        "EmailActionType" : "spam",
        "DateAdded" : new Date("Sat, 10 Dec 2011 04:17:26 GMT -08:00")
    }, {
        "EmailActionType" : "processed",
        "DateAdded" : new Date("Sun, 11 Dec 2011 04:17:26 GMT -08:00")
    }, {
        "EmailActionType" : "deffered",
        "DateAdded" : new Date("Mon, 12 Dec 2011 04:17:26 GMT -08:00")
    }],
    "DateAdded" : new Date("Mon, 01 Jan 0001 00:00:00 GMT -08:00")
}

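What I want to do is query the database for a specific range of History dates. The end result should be a list with one item per day that had activity, along with the total for each event type: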

date: "20111210", spam: 1, processed: 0, deffered: 0
date: "20111211", spam: 0, processed: 1, deffered: 0
date: "20111212", spam: 0, processed: 0, deffered: 1

Here is what I have so far:

db.runCommand({ mapreduce: Email,
    map : function Map() {
        var key = this.NewsletterId;
        emit(
            key,
            { "history" : this.History }
        );
    }
    reduce : function Reduce(key, history) {
        var from = new Date (2011, 1, 1, 0, 0, 0, 0);
        var to = new Date (2013, 05, 15, 23, 59, 59, 0);

        // \/ determine # days in the date range \/
        var ONE_DAY = 1000 * 60 * 60 * 24; // The number of milliseconds in one day
        var from_ms = from.getTime(); // Convert the start date to milliseconds
        var to_ms = to.getTime(); // Convert the end date to milliseconds

        var difference_ms = Math.abs(from_ms - to_ms); // Calculate the difference in milliseconds
        var numDays = Math.round(difference_ms/ONE_DAY); // Convert back to days
        // /\ determine # days in the date range /\

        var results = new Array(numDays); // array where we will store the results: one entry per day in the date range

        // initialize the array that will hold the counts for each type of email activity
        for(var i=0; i < numDays; i++){
            results[i] = {
                numSpam: 0,
                numProcessed: 0,
                numDeffered: 0
            }
        }

        // traverse the history records and count each type of event
        for (var i = 0; i < history.length; i++){
            var to_ms2 = history[i].DateAdded.getTime(); // Convert the event date to milliseconds

            var difference_ms2 = Math.abs(from_ms - to_ms2); // Calculate the difference in milliseconds
            var resultsIndex = Math.round(difference_ms2/ONE_DAY); // determine which row in the results array this date corresponds to

            switch(history[i].EmailActionType)
            {
                case 'spam':
                    results[resultsIndex].numSpam = ++results[resultsIndex].numSpam;
                    break;
                case 'processed':
                    results[resultsIndex].numProcessed = ++results[resultsIndex].numProcessed;
                    break;
                case 'deffered':
                    results[resultsIndex].numDeffered = ++results[resultsIndex].numDeffered;
                    break;
            }
        }
        return results;
    }
    finalize : function Finalize(key, reduced) {
        return {
            "numSpam": reduced.numSpam,
            "numProcessed": reduced.numProcessed,
            "numDeffered": reduced.numDeffered,
        };
    }
    out : { inline : 1 }
});

When I run it, I get nothing back, but I don't get any errors either, so I'm not sure where to look.

Best answer

Your problem is definitely in your Map/Reduce functions. There is a disconnect between what you emit and your expected output.
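To see the disconnect concretely, here is a small plain-JavaScript replay (runnable in Node, not in MongoDB) of what the question's finalize ends up reading. It relies on MongoDB's documented behavior of not calling reduce for a key that has only one emitted value, and it assumes a single document for NewsletterId 4; the trimmed History is illustrative. Either way, finalize looks for numSpam, numProcessed, and numDeffered on an object that does not have them.

```javascript
// Plain-JavaScript replay of the question's finalize step (not MongoDB itself).

// The question's finalize, reformatted but otherwise unchanged.
function Finalize(key, reduced) {
  return {
    numSpam: reduced.numSpam,
    numProcessed: reduced.numProcessed,
    numDeffered: reduced.numDeffered,
  };
}

// Case 1: only one document has NewsletterId 4, so reduce is skipped and
// finalize receives the raw value emitted by map: { history: [...] }.
// (History trimmed to the action types for brevity.)
const emittedValue = {
  history: [
    { EmailActionType: "spam" },
    { EmailActionType: "processed" },
    { EmailActionType: "deffered" },
  ],
};
const fromMap = Finalize(4, emittedValue);

// Case 2: suppose reduce did return its per-day array. An array has no
// numSpam property of its own, so the counts are still missing.
const reducedArray = [{ numSpam: 1, numProcessed: 0, numDeffered: 0 }];
const fromReduce = Finalize(4, reducedArray);

console.log(fromMap);    // every count is undefined
console.log(fromReduce); // every count is undefined here too
```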

Your expected output:

date: "20111210", spam: 1, processed: 0, deffered: 0

Map/Reduce always outputs key/value pairs. So your output will look like this:

_id: "20111220", value: { spam: 1, processed: 0, deferred: 0 }

That's the basic premise: your emit needs to output data in the right format. So if you call emit(key, value), then you should have:

var key='20111220'
var value={spam:1, processed:0, deferred:0}
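One way to build such a key and value from a single History entry, as a plain-JavaScript sketch: the helper names dayKey and emptyCounts are hypothetical, the sketch assumes days should be bucketed in UTC, and the timestamp assumes the sample's "04:17:26 GMT -08:00" means 04:17:26 at UTC-8.

```javascript
// Hypothetical helper: format a Date as a "YYYYMMDD" key, using UTC fields
// so the bucket does not depend on the server's local time zone.
function dayKey(date) {
  const yyyy = String(date.getUTCFullYear());
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0"); // getUTCMonth() is 0-based
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return yyyy + mm + dd;
}

// Hypothetical helper: a fresh value with one counter per event type.
function emptyCounts() {
  return { spam: 0, processed: 0, deferred: 0 };
}

// A "spam" event at 04:17:26 UTC-8 on 10 Dec 2011 (12:17:26 UTC).
const key = dayKey(new Date("2011-12-10T12:17:26Z"));
const value = emptyCounts();
value.spam++;
console.log(key, value); // 20111210 { spam: 1, processed: 0, deferred: 0 }
```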

In your case, you will emit several times per document as you loop through History. That is normal.
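A plain-JavaScript sketch of that behavior, runnable in Node: emit here is a stand-in that only records pairs (in MongoDB the server provides it), and the timestamps assume the sample dates are 04:17:26 at UTC-8, i.e. 12:17:26 UTC. One document with three History entries produces three emits, one per day.

```javascript
// Stand-in for MongoDB's emit(): just record each (key, value) pair.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// The question's sample document, reduced to the fields map needs.
const doc = {
  NewsletterId: 4,
  History: [
    { EmailActionType: "spam", DateAdded: new Date("2011-12-10T12:17:26Z") },
    { EmailActionType: "processed", DateAdded: new Date("2011-12-11T12:17:26Z") },
    { EmailActionType: "deffered", DateAdded: new Date("2011-12-12T12:17:26Z") },
  ],
};

// One emit per History entry, keyed by the event's UTC day ("YYYYMMDD").
function map() {
  for (let i = 0; i < this.History.length; i++) {
    const event = this.History[i];
    const key = event.DateAdded.toISOString().slice(0, 10).replace(/-/g, "");
    const value = { spam: 0, processed: 0, deferred: 0 };
    if (event.EmailActionType === "spam") value.spam++;
    else if (event.EmailActionType === "processed") value.processed++;
    else if (event.EmailActionType === "deffered") value.deferred++; // stored spelling
    emit(key, value);
  }
}

map.call(doc); // like MongoDB, bind `this` to the current document
console.log(emitted.length); // 3
```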

The reduce function only runs when there is more than one value for the same key. So if you have this:

_id: "20111220", value: { spam: 1, processed: 0, deferred: 0 }
_id: "20111220", value: { spam: 1, processed: 2, deferred: 0 }

then reduce will combine them and give you this:

_id: "20111220", value: { spam: 2, processed: 2, deferred: 0 }
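The combining step above is a field-by-field sum. A minimal plain-JavaScript sketch of such a reduce, applied to the two example values for key "20111220" (the function body is an assumption consistent with the counters shown, not MongoDB internals):

```javascript
// Sum each counter across all values emitted for the same key.
function reduce(key, values) {
  const total = { spam: 0, processed: 0, deferred: 0 };
  for (const v of values) {
    total.spam += v.spam;
    total.processed += v.processed;
    total.deferred += v.deferred;
  }
  return total;
}

const combined = reduce("20111220", [
  { spam: 1, processed: 0, deferred: 0 },
  { spam: 1, processed: 2, deferred: 0 },
]);
console.log(combined); // { spam: 2, processed: 2, deferred: 0 }
```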

Here is a quick answer:

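map = function() {
    // Emit one (day, counts) pair for every entry in History
    for (var i = 0; i < this.History.length; i++) {
        var event = this.History[i];
        var d = event.DateAdded;
        // Key: the event's day as a "YYYYMMDD" string (UTC)
        var key = d.getUTCFullYear() +
                  ("0" + (d.getUTCMonth() + 1)).slice(-2) +
                  ("0" + d.getUTCDate()).slice(-2);
        var value = {spam: 0, processed: 0, deferred: 0};

        // The stored action types are lowercase, and "deffered" is spelled that way in the data
        if (event.EmailActionType == "spam") { value.spam++; }
        else if (event.EmailActionType == "processed") { value.processed++; }
        else if (event.EmailActionType == "deffered") { value.deferred++; }

        emit(key, value);
    }
}

reduce = function(key, values) {
    // values is an array of these things {spam: 0, processed: 0, deferred: 0}
    // Start every counter at 0: reduce may be called again on its own output
    var returnValue = { spam: 0, processed: 0, deferred: 0 };
    for (var i = 0; i < values.length; i++) {
        returnValue.spam += values[i].spam;
        returnValue.processed += values[i].processed;
        returnValue.deferred += values[i].deferred;
    }
    return returnValue;
}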
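To check the pieces end to end without a server, here is an in-memory sketch of MapReduce semantics in plain JavaScript. It is not MongoDB's engine: runMapReduce is a hypothetical driver, emit is passed to map as a parameter instead of being a global, and FROM/TO are hypothetical bounds captured by closure (in MongoDB, values like these could be supplied through the mapReduce scope option). The second document is invented so that one day has two values and reduce actually runs.

```javascript
// Hypothetical driver: group emitted values by key, call reduce only for
// keys with more than one value, and return { _id, value } rows by key.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);
  return [...groups.keys()].sort().map((key) => {
    const values = groups.get(key);
    const value = values.length > 1 ? reduce(key, values) : values[0];
    return { _id: key, value };
  });
}

// Hypothetical date range: inclusive start, exclusive end (UTC).
const FROM = new Date("2011-12-10T00:00:00Z");
const TO = new Date("2011-12-12T00:00:00Z");

function map(emit) {
  for (let i = 0; i < this.History.length; i++) {
    const event = this.History[i];
    const d = event.DateAdded;
    if (d < FROM || d >= TO) continue; // skip events outside the range
    const key = d.toISOString().slice(0, 10).replace(/-/g, "");
    const value = { spam: 0, processed: 0, deferred: 0 };
    if (event.EmailActionType === "spam") value.spam++;
    else if (event.EmailActionType === "processed") value.processed++;
    else if (event.EmailActionType === "deffered") value.deferred++;
    emit(key, value);
  }
}

function reduce(key, values) {
  const total = { spam: 0, processed: 0, deferred: 0 };
  for (const v of values) {
    total.spam += v.spam;
    total.processed += v.processed;
    total.deferred += v.deferred;
  }
  return total;
}

const docs = [
  { History: [
    { EmailActionType: "spam", DateAdded: new Date("2011-12-10T12:17:26Z") },
    { EmailActionType: "processed", DateAdded: new Date("2011-12-11T12:17:26Z") },
    { EmailActionType: "deffered", DateAdded: new Date("2011-12-12T12:17:26Z") },
  ] },
  // Hypothetical second email so that day "20111210" gets two values.
  { History: [
    { EmailActionType: "spam", DateAdded: new Date("2011-12-10T18:00:00Z") },
  ] },
];

const results = runMapReduce(docs, map, reduce);
console.log(results);
```

The 12 Dec event falls at or after TO, so only two days come back: "20111210" with spam 2 (reduced from two values) and "20111211" with processed 1 (a single value, passed through without reduce).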

Remember: the structure of what you emit must match the structure of the final value.
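One reason this matters: MongoDB can call reduce more than once for the same key, passing an earlier reduce result back in as one of the values. So reduce must return the same shape it receives and give the same total however the values are grouped. A quick plain-JavaScript check of that property, using a sum-style reduce like the one above:

```javascript
// Field-by-field sum; every counter starts at 0.
function reduce(key, values) {
  const total = { spam: 0, processed: 0, deferred: 0 };
  for (const v of values) {
    total.spam += v.spam;
    total.processed += v.processed;
    total.deferred += v.deferred;
  }
  return total;
}

const a = { spam: 1, processed: 0, deferred: 0 };
const b = { spam: 0, processed: 2, deferred: 0 };
const c = { spam: 0, processed: 0, deferred: 1 };

// Reducing everything at once vs. re-reducing a partial result.
const allAtOnce = reduce("20111220", [a, b, c]);
const inTwoSteps = reduce("20111220", [reduce("20111220", [a, b]), c]);
console.log(allAtOnce, inTwoSteps); // both { spam: 1, processed: 2, deferred: 1 }
```

A starting value such as spam: 1 would break this property, because every extra reduce call would add one more spam.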

We found a similar question about mongodb - MapReduce on subdocuments on Stack Overflow: https://stackoverflow.com/questions/8979228/
