MongoDB聚合以多个字段分组。

huangapple go评论106阅读模式
英文:

MongoDB aggregation to group by multiple fields

问题

以下是您要翻译的内容:

我在我的集合中有以下数据

{
  “name": “test”,
  "data": {
    “statusOne”: “enabled”,
    “statusTwo”: “active”
  }
}
{
  “name": “test”,
  "data": {
    “statusOne”: “disabled”,
    “statusTwo”: “active”
  }
}
{
  “name": “another-test”,
  "data": {
    “statusOne”: “disabled”,
    “statusTwo”: “active”
  }
}

如何编写聚合查询以显示以下数据。需要按名称,statusOne和statusTwo进行分组。但首先基于名称字段对结果进行分隔。然后计算statusOne和statusTwo的出现次数。这两个结果都需要放入相同的输出字段"data"中。

“output”: [
{
	“name”: “test”,
	"data": [
		{
			“status”: “active”,
			“count”: 2
		},
		{
			“status”: “disabled”,
			”count”: 1
		},
		{
			“status”: “enabled”,
			”count”: 1
		}
	]
},
{
	“name”: “another-test”,
	"data": [
		{
			“status”: “active”,
			”count”: 1
		},
		{
			“status”: “disabled”,
			”count”: 1
		}
	]
}
]

尝试使用如下所述的分组序列,但没有成功

[
  {
    $group: {
      _id: {
        appName: "$name”,
        cs: "$data.statusOne”,
        ps: "$data.statusTwo,
      },
      total: {
        $sum: 1,
      },
    },
  },
  {
    $group: {
      _id: "$_id.name”,
      total: { $sum: "$total" },
      ps: {
        $addToSet: {
          name: "$_id.ps",
          count: "$total",
        },
      },
      cs: {
        $addToSet: {
          name: "$_id.cs",
          count: "$total",
        },
      },
    },
  },
  {
    $project: {
      _id: 0,
      appName: "$_id",
      items: {
        $concatArrays: ["$ps", "$cs"],
      },
    },
  },
]

请注意,上述代码示例中的引号可能需要根据您的编程环境进行调整。

英文:

I have following data in my collection

{
  “name": “test”,
  "data": {
    “statusOne”: “enabled”,
    “statusTwo”: “active”
  }
}
{
  “name": “test”,
  "data": {
    “statusOne”: “disabled”,
    “statusTwo”: “active”
  }
}
{
  “name": “another-test”,
  "data": {
    “statusOne”: “disabled”,
    “statusTwo”: “active”
  }
}

How to write an aggregation query to display the data like below. Need to group by name, statusOne, statusTwo. But segregate the result first based on name field. Then calculate the number of occurrences of statusOne and statusTwo. Both the results need to go into the same output field "data"

“output”: [
{
	“name”: “test”,
	"data": [
		{
			“status”: “active”,
			“count”: 2
		},
		{
			“status”: “disabled”,
			”count”: 1
		},
		{
			“status”: “enabled”,
			”count”: 1
		}
	]
},
{
	“name”: “another-test”,
	"data": [
		{
			“status”: “active”,
			”count”: 1
		},
		{
			“status”: “disabled”,
			”count”: 1
		}
	]
}
]

Tried to use group by sequence as mentioned here but no luck

[
  {
    $group: {
      _id: {
        appName: "$name”,
        cs: "$data.statusOne”,
        ps: "$data.statusTwo,
      },
      total: {
        $sum: 1,
      },
    },
  },
  {
    $group: {
      _id: "$_id.name”,
      total: { $sum: "$total" },
      ps: {
        $addToSet: {
          name: "$_id.ps",
          count: "$total",
        },
      },
      cs: {
        $addToSet: {
          name: "$_id.cs",
          count: "$total",
        },
      },
    },
  },
  {
    $project: {
      _id: 0,
      appName: "$_id",
      items: {
        $concatArrays: ["$ps", "$cs"],
      },
    },
  },
]

答案1

得分: 2

db.collection('表名').aggregate([
  {
    $group: {
      _id: { name: "$name", status: "$data.statusOne" },
      count: { $sum: 1 }
    }
  },
  {
    $group: {
      _id: "$_id.name",
      data: {
        $push: {
          status: "$_id.status",
          count: "$count"
        }
      }
    }
  },
  {
    $project: {
      _id: 0,
      name: "$_id",
      data: 1
    }
  }
])
英文:
db.collection('tablename').aggregate([
  {
    $group: {
      _id: { name: "$name", status: "$data.statusOne" },
      count: { $sum: 1 }
    }
  },
  {
    $group: {
      _id: "$_id.name",
      data: {
        $push: {
          status: "$_id.status",
          count: "$count"
        }
      }
    }
  },
  {
    $project: {
      _id: 0,
      name: "$_id",
      data: 1
    }
  }
])

use this and change the collection name .

答案2

得分: 1

这是一个通用的解决方案:

db.foo.aggregate([
    {$project: {
        name: true,

        /*                                                                       
        我们可以通过$objectToArray的输出进行更高级的操作,例如仅获取statusOne和statusTwo:                                 
                                                                                  
           Z: {$filter: {                                                        
               input: {$objectToArray: "$data"},                                 
               cond: {$in: ['$$this.k', ['statusOne','statusTwo']]}              
               }}                                                                
                                                                                  
           或者仅查找以'status'开头的键:                    
                                                                                  
           Z: {$filter: {                                                        
               input: {$objectToArray: "$data"},                                 
               cond: {$eq: ['status', {$substr:['$$this.k',0,6]}]}               
               }}                                                                
                                                                                  
          现在让我们简单点,获取整个内容而不进行过滤。                                                     
        */

	    Z: {$objectToArray: "$data"}
    }}
    ,{$unwind: "$Z"}

    // 现在我们有了这样的文档:                                               
    //{                                                                           
    //  _id: ObjectId("64d651283d8bc34d3928366d"),                               
    //  name: 'another-test',                                                    
    //  Z: {                                                                     
    //    k: 'statusOne',                                                        
    //    v: 'disabled'                                                          
    //  }                                                                        
    //}                                                                           

    // 现在只是分组和重新组织的问题:                     
    ,{$group: {_id: {"name":"$name", "v":"$Z.v"}, N:{$sum:1}}}
    ,{$group: {_id: "$_id.name", data: {$push: {status:"$_id.v", count:"$N"}} }}
]);

产生的结果为:

{
  _id: 'test',
  data: [
    {
      status: 'active',
      count: 2
    },
    {
      status: 'enabled',
      count: 1
    },
    {
      status: 'disabled',
      count: 1
    }
  ]
}
{
  _id: 'another-test',
  data: [
    {
      status: 'disabled',
      count: 1
    },
    {
      status: 'active',
      count: 1
    }
  ]
}

如果有成千上万的测试,我们不想使用$unwind怎么办?

在管道中使用$unwind时,必须小心。如果每个测试的平均状态数量变大(比如100),那么在管道中将会有大量文档。以下是一种利用$reduce的替代解决方案:

c=db.foo.aggregate([
    {$project: {
	name: true,
	Z: {$objectToArray: "$data"}
    }}

    /*
     不使用$unwind并使用$group-$sum来计数,让我们自己来做。
     我们将使用新的Z覆盖旧的Z。
     此外,由于这是一个中间步骤,使用较短的变量名称以增加清晰度。
    
     这个if-then-else构造基本上是这样的:
         对于每种状态类型,status_type_count += 1
     这允许输入具有相同类型的多个状态,例如:

	   "name": "test",
	   "data": {
	     "statusOne": "enabled",
	     "statusTwo": "enabled"
   	   }

   在MQL中,$reduce循环中,我们不说
       object.key = object.key + 1
   而是说:
       {$mergeObjects: [ "$$value", {key: {$add:["$$value.key",1]}} ]}
    */
    
    ,{$addFields: {Z: {$reduce: {
	input: "$Z",
	initialValue: {"A":0,"E":0,"D":0},
	    in: {$cond: {
		if: {$eq:["$$this.v","active"]},
		then: {$mergeObjects: [ "$$value", {"A": {$add:["$$value.A",1]}} ]},
		else: {$cond: {
		    if: {$eq:["$$this.v","disabled"]},
		    then: {$mergeObjects: [ "$$value", {"D": {$add:["$$value.D",1]}} ]},
		    else: {$cond: {
			if: {$eq:["$$this.v","enabled"]},
			then: {$mergeObjects: [ "$$value", {"E": {$add:["$$value.E",1]}} ]},
			else: "$$value"
		    }}
		}}
	    }}
    }}
		  }}


    // 将名称组合在一起并收集计数:
    ,{$group: {
	_id: "$name", X: {$push: "$Z"}
    }}

    // 现在,再次运行$reduce以汇总计数并恢复大变量名称:
    ,{$project: {data: {$reduce: {
	input: "$X",
	initialValue: {"active":0,"enabled":0,"disabled":0},
	    in: {"active": {$add:["$$value.active","$$this.A"]},
		 "disabled": {$add:["$$value.disabled","$$this.D"]},
		 "enabled": {$add:["$$value.enabled","$$this.E"]}
		}
    }}
		}}


    // 此时,我们“完成”了

<details>
<summary>英文:</summary>

Here is a generalized solution:
```javascript
db.foo.aggregate([
    {$project: {
        name: true,

        /*                                                                       
        We could get fancy with the output of $objectToArray, like               
        only going after statusOne and statusTwo:                                
                                                                                 
           Z: {$filter: {                                                        
               input: {$objectToArray: &quot;$data&quot;},                                 
               cond: {$in: [&#39;$$this.k&#39;, [&#39;statusOne&#39;,&#39;statusTwo&#39;]]}              
               }}                                                                
                                                                                 
           Or only looking for keys that start with &#39;status&#39;:                    
                                                                                 
           Z: {$filter: {                                                        
               input: {$objectToArray: &quot;$data&quot;},                                 
               cond: {$eq: [&#39;status&#39;, {$substr:[&#39;$$this.k&#39;,0,6]}]}               
               }}                                                                
                                                                                 
          Let&#39;s keep it simple now and take the whole thing                      
          without any filtering.                                                 
        */

	    Z: {$objectToArray: &quot;$data&quot;}
    }}
    ,{$unwind: &quot;$Z&quot;}

    // Now we have docs like this:                                               
    //{                                                                          
    //  _id: ObjectId(&quot;64d651283d8bc34d3928366d&quot;),                               
    //  name: &#39;another-test&#39;,                                                    
    //  Z: {                                                                     
    //    k: &#39;statusOne&#39;,                                                        
    //    v: &#39;disabled&#39;                                                          
    //  }                                                                        
    //}                                                                          

    // Now it is just a matter of grouping and reorganizing:                     
    ,{$group: {_id: {&quot;name&quot;:&quot;$name&quot;, &quot;v&quot;:&quot;$Z.v&quot;}, N:{$sum:1}}}
    ,{$group: {_id: &quot;$_id.name&quot;, data: {$push: {status:&quot;$_id.v&quot;, count:&quot;$N&quot;}} }}
]);

yields

{
_id: &#39;test&#39;,
data: [
{
status: &#39;active&#39;,
count: 2
},
{
status: &#39;enabled&#39;,
count: 1
},
{
status: &#39;disabled&#39;,
count: 1
}
]
}
{
_id: &#39;another-test&#39;,
data: [
{
status: &#39;disabled&#39;,
count: 1
},
{
status: &#39;active&#39;,
count: 1
}
]
}

What if there are thousands of tests and we don't want to $unwind?

You must be careful when throwing $unwind into a pipeline. If the average number of statuses per test gets large (say, 100) then there will be a LOT of docs in the pipeline. Below is an alternate solution that exploits $reduce:

c=db.foo.aggregate([
    {$project: {
	name: true,
	Z: {$objectToArray: &quot;$data&quot;}
    }}

    /*
     Instead of $unwind and using $group-$sum to count things, let&#39;s
     do it ourselves.  We will overwrite Z with a new Z.
     Also, since this is an iterim step, use shorter variable names for
     clarity.
    
     This if-then-else construction basically says:
         For each status type, status_type_count += 1
     This permits the input to have more than 1 status of the same type, e.g.

	   &quot;name&quot;: &quot;test&quot;,
	   &quot;data&quot;: {
	     &quot;statusOne&quot;: &quot;enabled&quot;,
	     &quot;statusTwo&quot;: &quot;enabled&quot;
   	   }

   In MQL in a $reduce loop, we don&#39;t say
       object.key = object.key + 1
   Instead we say:
       {$mergeObjects: [ &quot;$$value&quot;, {key: {$add:[&quot;$$value.key&quot;,1]}} ]}
    */
    
    ,{$addFields: {Z: {$reduce: {
	input: &quot;$Z&quot;,
	initialValue: {&quot;A&quot;:0,&quot;E&quot;:0,&quot;D&quot;:0},
	    in: {$cond: {
		if: {$eq:[&quot;$$this.v&quot;,&quot;active&quot;]},
		then: {$mergeObjects: [ &quot;$$value&quot;, {&quot;A&quot;: {$add:[&quot;$$value.A&quot;,1]}} ]},
		else: {$cond: {
		    if: {$eq:[&quot;$$this.v&quot;,&quot;disabled&quot;]},
		    then: {$mergeObjects: [ &quot;$$value&quot;, {&quot;D&quot;: {$add:[&quot;$$value.D&quot;,1]}} ]},
		    else: {$cond: {
			if: {$eq:[&quot;$$this.v&quot;,&quot;enabled&quot;]},
			then: {$mergeObjects: [ &quot;$$value&quot;, {&quot;E&quot;: {$add:[&quot;$$value.E&quot;,1]}} ]},
			else: &quot;$$value&quot;
		    }}
		}}
	    }}
    }}
		  }}


    // Bring the names together and collect the counts:
    ,{$group: {
	_id: &quot;$name&quot;, X: {$push: &quot;$Z&quot;}
    }}

    // Now, run a $reduce again to sum the counts AND put back the
    // big variable names:
    ,{$project: {data: {$reduce: {
	input: &quot;$X&quot;,
	initialValue: {&quot;active&quot;:0,&quot;enabled&quot;:0,&quot;disabled&quot;:0},
	    in: {&quot;active&quot;: {$add:[&quot;$$value.active&quot;,&quot;$$this.A&quot;]},
		 &quot;disabled&quot;: {$add:[&quot;$$value.disabled&quot;,&quot;$$this.D&quot;]},
		 &quot;enabled&quot;: {$add:[&quot;$$value.enabled&quot;,&quot;$$this.E&quot;]}
		}
    }}
		}}


    // At this point we are &quot;done&quot; information-wise but the OP was
    // looking for an array of status as an RVAL not a key (e.g. &quot;A&quot;)
    // so post-process:
    ,{$project: {
	_id:0,
	name:&quot;$_id&quot;,
	data: {$map: {
	    input: {$objectToArray: &quot;$data&quot;},
	    in: {
		&quot;status&quot;:&quot;$$this.k&quot;,
		&quot;count&quot;:&quot;$$this.v&quot;,
	    }
	}}
    }}
  
]);

huangapple
  • 本文由 发表于 2023年8月10日 20:11:21
  • 转载请务必保留本文链接:https://go.coder-hub.com/76875629.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定