How to get the correct percentage from array of objects with aggregation pipeline

Question

I have documents that each contain an array of objects, and I need to do the following:

  • Get the total points across all members arrays. In the example, the 2 documents have a total of 5500 points.
  • Next, get each document's total member points (2000 for Josh, 3500 for Carl) and compute what percentage that is of the overall total. The formula is (members' points / total points) * 100, or (2000 / 5500) * 100 for Josh.
  • Finally, return the _id, leader, members' points, and percentage fields.

The result should look something like this:

    {
      _id: '00001',
      leader: "Josh",
      memberpoints: 2000,
      percentage: 36.3636
    },
    {
      _id: '00002',
      leader: "Carl",
      memberpoints: 3500,
      percentage: 63.6363
    }
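
As a sanity check on those figures, the arithmetic can be reproduced outside MongoDB. The following is only a plain JavaScript sketch: the point values are copied from the example documents, and the names `memberPoints`, `total`, and `expected` are made up for illustration and are not part of any pipeline.

```javascript
// Member points per leader, copied from the example documents.
const docs = [
  { _id: "00001", leader: "Josh", members: [{ points: 500 }, { points: 500 }, { points: 1000 }] },
  { _id: "00002", leader: "Carl", members: [{ points: 1000 }, { points: 1000 }, { points: 1500 }] },
];

// Sum the points of one document's members array (hypothetical helper).
const memberPoints = (doc) => doc.members.reduce((sum, m) => sum + m.points, 0);

// Grand total across every document.
const total = docs.reduce((sum, doc) => sum + memberPoints(doc), 0);

// One result row per document: (members' points / total points) * 100.
const expected = docs.map((doc) => ({
  _id: doc._id,
  leader: doc.leader,
  memberpoints: memberPoints(doc),
  percentage: (memberPoints(doc) / total) * 100,
}));

console.log(total, expected);
```

The percentages come out untruncated here; the four-decimal values shown above are the same numbers cut off after the fourth decimal place.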

These are the documents:

    [{
      "_id": { "$oid": "00001" },
      "leader": "Josh",
      "members": [
        {
          "name": "Person A",
          "points": 500
        },
        {
          "name": "Person B",
          "points": 500
        },
        {
          "name": "Person C",
          "points": 1000
        }]
    },
    {
      "_id": { "$oid": "00002" },
      "leader": "Carl",
      "members": [
        {
          "name": "Person D",
          "points": 1000
        },
        {
          "name": "Person E",
          "points": 1000
        },
        {
          "name": "Person F",
          "points": 1500
        }]
    }]

However, I'm getting the wrong percentage: it looks like only a single object from the members array is being used. It might have something to do with the $set stage of my aggregation pipeline, specifically [{ $divide: ["$members.points", "$total"] }, 100], but I'm not sure.

Wrong percentage (it should be 36.3636):

    {
      _id: '00001',
      leader: "Josh",
      memberpoints: 2000,
      percentage: 18.1818
    }

Here is my MongoDB aggregation pipeline:

    db.users.aggregate([
      {$unwind: "$members"},
      {$setWindowFields: { output: { total: { $sum: "$members.points" }}}},
      {$set: { percentage: { $multiply: [{ $divide: ["$members.points", "$total"] }, 100] }}},
      {$group: { _id: "$_id", leader: {$first: "$leader"}, memberpoints: {$sum: "$members.points"}, percentage: {$first: "$percentage"}}}
    ])

Answer 1

Score: 1

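Your approach is almost correct; only the order of the $set and $group stages is reversed. In your pipeline, $set computes a percentage for each unwound member, and $group then keeps just one of those values via $first (18.1818 is 1000 / 5500 * 100, a single member's share). Here is a pipeline that produces the desired output, with some comments: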

    db.foo.aggregate([
      {$unwind: "$members"},
      // Great way to get the sum of everything in one pass without having
      // $group get in the way:
      {$setWindowFields: { output: { total: { $sum: "$members.points" }}}},
      // Basically home free. Now re-group, summing the points. The trick
      // here is to NOT lose the total amount; we will need it after we $sum.
      // We will call it 'percentage' here but only as a placeholder; we
      // will overwrite it with the REAL percentage in the next stage:
      {$group: {_id: "$_id",
                leader: {$first: "$leader"},
                memberpoints: {$sum: "$members.points"},
                percentage: {$first: "$total"}  // not really pct yet.
               }},
      // Now turn it into REAL pct. The overwrite trick allows us
      // to not have to unset a "temporary" total value.
      {$addFields: {
        percentage: {$multiply: [100, {$divide: ['$memberpoints', '$percentage']}]}
      }}
    ])
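
To see why the stage order matters, here is a small plain JavaScript sketch (not MongoDB code) that mimics both orders for Josh's document. The variable names are invented for illustration; the point values and the 5500 total are taken from the question.

```javascript
// Josh's member points and the grand total from the question's sample data.
const joshPoints = [500, 500, 1000];
const total = 5500; // what $setWindowFields attaches to every unwound row

// Original order ($set before $group): one percentage per unwound member.
// $group with $first then keeps only ONE of these values.
const perMember = joshPoints.map((p) => (p / total) * 100);

// Corrected order ($group before the percentage): sum first, divide once.
const memberpoints = joshPoints.reduce((sum, p) => sum + p, 0);
const percentage = (memberpoints / total) * 100;

console.log(perMember);   // per-member shares; none of them is the group share
console.log(percentage);  // the share of the whole group
```

None of the per-member values equals the group percentage, so whichever one $first happens to keep, the original pipeline cannot return 36.3636 for Josh.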

Here is a more compact version that avoids the $unwind and re-$group by letting $reduce sum each document's member points. The cost of the $unwind/$group stages should not be underestimated: with a sample set of 1,000,000 docs, the solution above runs in 13128ms (avg), while the version below runs in just 6020ms, roughly twice as fast:

    db.foo.aggregate([
      {$setWindowFields: {
        output: {
          total: { $sum: {$reduce: {
            input: "$members",
            initialValue: 0,
            in: {$add: [ "$$value", "$$this.points"]}
          }}
          }
        }
      }}
      ,{$project: {
        _id: true,
        leader: true,
        memberpoints: {$sum:'$members.points'},
        percentage: {$multiply:[100,{$divide: [{$sum:'$members.points'},'$total']}]}
      }}
    ]);
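
Both pipelines return the full floating-point percentage, whereas the expected output in the question shows four decimal places. If that matters, one possible variation (an assumption on my part, not something the question asked for) is to wrap the percentage in $trunc with a place argument of 4, which needs MongoDB 4.2 or later; $setWindowFields itself needs 5.0 or later. The pipeline is written as a plain array here so that it can be inspected before being passed to aggregate:

```javascript
// Same two stages as the compact pipeline above, plus $trunc on the percentage.
// $trunc (rather than $round) is used because the expected 63.6363 is truncated.
const pipeline = [
  {
    $setWindowFields: {
      output: {
        total: {
          $sum: {
            $reduce: {
              input: "$members",
              initialValue: 0,
              in: { $add: ["$$value", "$$this.points"] },
            },
          },
        },
      },
    },
  },
  {
    $project: {
      _id: true,
      leader: true,
      memberpoints: { $sum: "$members.points" },
      percentage: {
        $trunc: [
          { $multiply: [100, { $divide: [{ $sum: "$members.points" }, "$total"] }] },
          4, // number of decimal places to keep
        ],
      },
    },
  },
];

// In mongosh, with the collection name used in this answer:
//   db.foo.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Note that $divide raises an error when the divisor is 0, so a collection whose grand total is 0 would need a guard such as $cond around the division.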

huangapple
  • Posted on May 25, 2023 at 21:27:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76332833.html