How to get the correct percentage from an array of objects with an aggregation pipeline
Question
I have documents that each contain an array of objects, and I need to do the following:
- Get the total points across all members arrays. In the example, the 2 documents have a total of 5500 points.
- Next, get each document's total member points (2000 for Josh, 3500 for Carl) and compute what percentage that is of the overall total. The formula is (members' points / total points) * 100, e.g. (2000 / 5500) * 100 for Josh.
- Finally, return the _id, leader, members' points, and percentage fields.
The result should look something like this:
{
_id: '00001',
leader: "Josh",
memberpoints: 2000,
percentage: 36.3636
},
{
_id: '00002',
leader: "Carl",
memberpoints: 3500,
percentage: 63.6363
}
These are the documents:
[{
"_id": { "$oid": "00001" },
"leader": "Josh",
"members": [
{
"name": "Person A",
"points": 500
},
{
"name": "Person B",
"points": 500
},
{
"name": "Person C",
"points": 1000
}]
},
{
"_id": { "$oid": "00002" },
"leader": "Carl",
"members": [
{
"name": "Person D",
"points": 1000
},
{
"name": "Person E",
"points": 1000
},
{
"name": "Person F",
"points": 1500
}]
}]
However, I'm getting the wrong percentage; it seems to be computed from only a single object in the members array. It might have something to do with the $set stage of my aggregation pipeline, specifically [{ $divide: ["$members.points", "$total"] }, 100], but I'm not sure.
Wrong result (the percentage should be 36.3636):
{
_id: '00001',
leader: "Josh",
memberpoints: 2000,
percentage: 18.1818
}
Here is my MongoDB aggregation pipeline:
db.users.aggregate([
{$unwind: "$members"},
{$setWindowFields: { output: { total: { $sum: "$members.points" }}}},
{$set: { percentage: { $multiply: [{ $divide: ["$members.points", "$total"] }, 100] }}},
{$group: { _id: "$_id", leader: {$first: "$leader"}, memberpoints: {$sum: "$members.points"}, percentage: {$first: "$percentage"}}}])
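The mismatch can be reproduced outside MongoDB. The sketch below is plain JavaScript, not a pipeline: the `docs` array copies the sample documents (with `_id` simplified to a string), and `sumPoints` is a made-up helper for illustration. It compares one member's share of the total with the whole group's share; 18.1818 is exactly what a single 1000-point member yields, which suggests the percentage is being computed per unwound member rather than per document.

```javascript
// Sample documents from above (_id simplified to a plain string).
const docs = [
  { _id: "00001", leader: "Josh", members: [
    { name: "Person A", points: 500 },
    { name: "Person B", points: 500 },
    { name: "Person C", points: 1000 } ] },
  { _id: "00002", leader: "Carl", members: [
    { name: "Person D", points: 1000 },
    { name: "Person E", points: 1000 },
    { name: "Person F", points: 1500 } ] },
];

// Hypothetical helper: sum of points in one members array.
const sumPoints = (members) => members.reduce((acc, m) => acc + m.points, 0);

// Grand total across all documents.
const total = docs.reduce((acc, d) => acc + sumPoints(d.members), 0);

// Share of a single member: what a per-row $divide sees after $unwind.
const personC = docs[0].members[2];
const singleMemberPct = (personC.points / total) * 100;

// Share of the whole group: what is actually wanted for Josh.
const joshPct = (sumPoints(docs[0].members) / total) * 100;

console.log(total);                      // 5500
console.log(singleMemberPct.toFixed(4)); // "18.1818"
console.log(joshPct.toFixed(4));         // "36.3636"
```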
Answer 1
Score: 1
Your approach is almost correct; only the order of $set and $group is reversed. Here is a pipeline that produces the desired output, with some comments:
db.foo.aggregate([
    {$unwind: "$members"},

    // Great way to get the sum of everything in one pass without having
    // $group get in the way:
    {$setWindowFields: {output: {total: {$sum: "$members.points"}}}},

    // Basically home free. Now re-group, summing the points. The trick
    // here is to NOT lose the total amount; we will need it after we $sum.
    // We will call it 'percentage' here but only as a placeholder; we
    // will overwrite it with the REAL percentage in the next stage:
    {$group: {
        _id: "$_id",
        leader: {$first: "$leader"},
        memberpoints: {$sum: "$members.points"},
        percentage: {$first: "$total"}  // not really pct yet.
    }},

    // Now turn it into REAL pct. The overwrite trick means we do not
    // have to unset a "temporary" total value.
    {$addFields: {
        percentage: {$multiply: [100, {$divide: ["$memberpoints", "$percentage"]}]}
    }}
]);
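To make the stage order easier to follow, here is a rough plain-JavaScript walk-through of the same four steps. It is only an illustration, not how MongoDB executes the pipeline: the `input` array is a trimmed copy of the sample documents, and all variable names are invented for this sketch.

```javascript
// Trimmed copy of the sample documents (member names omitted).
const input = [
  { _id: "00001", leader: "Josh", members: [{ points: 500 }, { points: 500 }, { points: 1000 }] },
  { _id: "00002", leader: "Carl", members: [{ points: 1000 }, { points: 1000 }, { points: 1500 }] },
];

// $unwind: one row per member.
const rows = input.flatMap((d) =>
  d.members.map((m) => ({ _id: d._id, leader: d.leader, members: m }))
);

// $setWindowFields without partitionBy: every row carries the grand total.
const grandTotal = rows.reduce((acc, r) => acc + r.members.points, 0);
const windowed = rows.map((r) => ({ ...r, total: grandTotal }));

// $group: sum the points per _id and keep the total alongside.
const groups = new Map();
for (const r of windowed) {
  const g = groups.get(r._id) ??
    { _id: r._id, leader: r.leader, memberpoints: 0, total: r.total };
  g.memberpoints += r.members.points;
  groups.set(r._id, g);
}

// $addFields: only now compute the percentage, from the grouped sum.
const result = [...groups.values()].map(({ total, ...g }) => ({
  ...g,
  percentage: 100 * (g.memberpoints / total),
}));

console.log(result);
```

The point of the ordering is visible in the last step: the division uses `memberpoints` (the per-document sum), not a single member's points.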
Here is a fancy version that avoids the $unwind and re-$group by letting $reduce sum the memberpoints. The benefit of eliminating the $unwind/$group stages is not to be underestimated. With a sample set of 1,000,000 docs, the solution above runs in 13128ms (avg); the more compact version below runs in just 6020ms, roughly twice as fast:
db.foo.aggregate([
    {$setWindowFields: {
        output: {
            total: {$sum: {$reduce: {
                input: "$members",
                initialValue: 0,
                in: {$add: ["$$value", "$$this.points"]}
            }}}
        }
    }},
    {$project: {
        _id: true,
        leader: true,
        memberpoints: {$sum: "$members.points"},
        percentage: {$multiply: [100, {$divide: [{$sum: "$members.points"}, "$total"]}]}
    }}
]);
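Neither pipeline trims the percentage, so the raw values carry more digits than the 36.3636 and 63.6363 shown in the question. The sketch below is a plain-JavaScript analogue of the compact version (not MongoDB code) that also truncates to four decimals. The `sample` array, `pointsOf`, and `trunc4` are names invented for this illustration. Inside a pipeline, I believe the same effect is available by wrapping the expression in `$trunc` with a place argument of 4 (MongoDB 4.2 or later), but treat that as an assumption to check against your server version.

```javascript
// Trimmed copy of the sample documents (member names omitted).
const sample = [
  { _id: "00001", leader: "Josh", members: [{ points: 500 }, { points: 500 }, { points: 1000 }] },
  { _id: "00002", leader: "Carl", members: [{ points: 1000 }, { points: 1000 }, { points: 1500 }] },
];

// Per-document sum, analogous to $reduce over "$members".
const pointsOf = (doc) => doc.members.reduce((value, m) => value + m.points, 0);

// Grand total, analogous to the $setWindowFields $sum over all documents.
const allPoints = sample.reduce((acc, doc) => acc + pointsOf(doc), 0);

// Hypothetical helper: keep four decimals without rounding up.
const trunc4 = (x) => Math.trunc(x * 1e4) / 1e4;

// Analogous to the $project stage: no unwinding or regrouping needed.
const output = sample.map((doc) => ({
  _id: doc._id,
  leader: doc.leader,
  memberpoints: pointsOf(doc),
  percentage: trunc4(100 * (pointsOf(doc) / allPoints)),
}));

console.log(output);
```

Truncation is used here only because it matches the figures in the question; rounding would give 63.6364 for Carl instead.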