How to get the correct percentage from array of objects with aggregation pipeline

Question

I have documents with an array of objects, and I need to do the following:

  • Get the total points across all members arrays. In the example, the 2 documents have a total of 5500 points.
  • Next, get each document's total member points (2000 for Josh, 3500 for Carl) and compute what percentage that is of the total points. The formula is (members' points / total points) * 100, or (2000 / 5500) * 100 for Josh.
  • Finally, return the fields _id, leader, the members' points, and the percentage.
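As a quick sanity check of the formula in the list above, here is a minimal plain JavaScript sketch (not MongoDB code). The variable names `memberPoints`, `total`, and `percentages` are made up for illustration; the numbers come from the example.

```javascript
// Per-leader point totals taken from the example documents.
const memberPoints = { Josh: 2000, Carl: 3500 };

// Grand total across all documents: 2000 + 3500.
const total = Object.values(memberPoints).reduce((sum, p) => sum + p, 0);

// percentage = (member points / total points) * 100
const percentages = {};
for (const [leader, points] of Object.entries(memberPoints)) {
  percentages[leader] = (points / total) * 100;
}

console.log(total);
console.log(percentages);
```

The two percentages should add up to 100, which is a handy check on any pipeline output as well.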

It should look something like below:

{
  _id: '00001',
  leader: "Josh",
  memberpoints: 2000,
  percentage: 36.3636
},
{
  _id: '00002',
  leader: "Carl",
  memberpoints: 3500,
  percentage: 63.6363
}

These are the documents:

[{
  "_id": { "$oid": "00001" },
  "leader": "Josh",
  "members": [
    {
      "name": "Person A",
      "points": 500
    },
    {
      "name": "Person B",
      "points": 500
    },
    {
      "name": "Person C",
      "points": 1000
    }]
  },
  {
  "_id": { "$oid": "00002" },
  "leader": "Carl",
  "members": [
    {
      "name": "Person D",
      "points": 1000
    },
    {
      "name": "Person E",
      "points": 1000
    },
    {
      "name": "Person F",
      "points": 1500
    }]
}]

However, I'm getting the wrong percentage; it seems that only a single object from the members array is being used. It might have something to do with the $set part of my aggregation pipeline, specifically [{ $divide: ["$members.points", "$total"] }, 100], but I'm not sure.

Wrong percentage (should be 36.3636):

{
  _id: '00001',
  leader: "Josh",
  memberpoints: 2000,
  percentage: 18.1818
}

Here is my MongoDB aggregation pipeline:

db.users.aggregate([
{$unwind: "$members"}, 
{$setWindowFields: { output: { total: { $sum: "$members.points" }}}},
{$set: { percentage: { $multiply: [{ $divide: ["$members.points", "$total"] }, 100] }}},
{$group: { _id: "$_id", leader: {$first: "$leader"}, memberpoints: {$sum: "$members.points"}, percentage: {$first: "$percentage"}}}])

Answer 1

Score: 1

Your approach is almost correct; only the order of $set and $group is reversed. Here is a commented pipeline that produces the desired output:

db.foo.aggregate([
    {$unwind: "$members"},

    // Great way to get the sum of everything in one pass without having
    // $group get in the way:
    {$setWindowFields: { output: { total: { $sum: "$members.points" }}}},

    // Basically home free.  Now re-group, summing the points.  The trick
    // here is to NOT lose the total amount; we will need it after we $sum.
    // We will call it 'percentage' here but only as a placeholder; we
    // will overwrite it with the REAL percentage in the next stage:
    {$group: {_id: "$_id",
              leader: {$first: "$leader"},
              memberpoints: {$sum: "$members.points"},
              percentage: {$first: "$total"}  // not really pct yet.
             }},

    // Now turn it into REAL pct.  The overwrite trick allows us
    // to not have to unset a "temporary" total value.
    {$addFields: {
        percentage: {$multiply: [100, {$divide: ['$memberpoints', '$percentage']}]}
    }}
])
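To see why the stage order matters, here is a small plain JavaScript sketch (not MongoDB code) of what happens to Josh's document. It assumes the grand total of 5500 from the example. The reported 18.1818 equals 1000 / 5500 * 100, which suggests that $first returned the percentage of a 1000-point member, but that is an inference from the numbers, not something confirmed above. The names `rows`, `perRow`, and so on are illustrative only.

```javascript
// Rows as they would look after {$unwind: "$members"} for Josh's document
// (only the fields that matter here).
const rows = [
  { _id: "00001", points: 500 },
  { _id: "00001", points: 500 },
  { _id: "00001", points: 1000 },
];
const total = 5500; // grand total over all documents, taken from the example

// Original order: the percentage is computed per unwound row, and the later
// $group keeps only one of these values via $first.
const perRow = rows.map((r) => (r.points / total) * 100);

// Corrected order: sum the points per document first, then divide once.
const memberpoints = rows.reduce((sum, r) => sum + r.points, 0);
const percentage = (memberpoints / total) * 100;

console.log(perRow);
console.log(memberpoints, percentage);
```

None of the per-row values can equal the wanted figure, because each one reflects a single member rather than the whole array.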

Here is a more compact version that avoids the $unwind and re-$group by letting $reduce sum the member points. The cost of the $unwind/$group stages is not to be underestimated: with a sample set of 1,000,000 docs, the solution above runs in 13128 ms (avg), while the compact version below runs in just 6020 ms, about twice as fast:

db.foo.aggregate([
    {$setWindowFields: {
        output: {
            total: {$sum: {$reduce: {
                input: "$members",
                initialValue: 0,
                in: {$add: ["$$value", "$$this.points"]}
            }}}
        }
    }},

    {$project: {
        _id: true,
        leader: true,
        memberpoints: {$sum: "$members.points"},
        percentage: {$multiply: [100, {$divide: [{$sum: "$members.points"}, "$total"]}]}
    }}
]);
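For readers who want to check the expected numbers without a database, the two stages can be mirrored in plain JavaScript. This is only a sketch of the same arithmetic, not how MongoDB executes the pipeline; the `_id` values are simplified to plain strings and the helper name `sumPoints` is made up for illustration.

```javascript
// Sample documents from the question (_id simplified to a plain string).
const docs = [
  { _id: "00001", leader: "Josh", members: [
    { name: "Person A", points: 500 },
    { name: "Person B", points: 500 },
    { name: "Person C", points: 1000 },
  ]},
  { _id: "00002", leader: "Carl", members: [
    { name: "Person D", points: 1000 },
    { name: "Person E", points: 1000 },
    { name: "Person F", points: 1500 },
  ]},
];

// Counterpart of the $reduce expression: sum one document's member points.
const sumPoints = (members) => members.reduce((value, m) => value + m.points, 0);

// Counterpart of $setWindowFields without a partition: one total over all docs.
const total = docs.reduce((sum, d) => sum + sumPoints(d.members), 0);

// Counterpart of the $project stage.
const result = docs.map((d) => {
  const memberpoints = sumPoints(d.members);
  return {
    _id: d._id,
    leader: d.leader,
    memberpoints,
    percentage: (100 * memberpoints) / total,
  };
});

console.log(result);
```

The percentages come out unrounded here; trimming them to four decimals as in the desired output would be an extra step.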

huangapple
  • Posted on May 25, 2023, 21:27:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76332833.html