使用聚合和分组的MGO(MongoDB Object)

huangapple go评论94阅读模式
英文:

mgo with aggregation and grouping

问题

我正在尝试使用golang mgo执行查询,以有效地从连接中获取不同的值,我理解这可能不是在Mongo中使用的最佳范例。

类似这样的代码:

pipe := []bson.M{
{
"$group": bson.M{
"_id": bson.M{"user": "$user"},
},
},
{
"$match": bson.M{
"_id": bson.M{"$exists": 1},
"user": bson.M{"$exists": 1},
"date_updated": bson.M{
"$gt": durationDays,
},
},
},
{
"$lookup": bson.M{
"from": "users",
"localField": "user",
"foreignField": "_id",
"as": "user_details",
},
},
{
"$lookup": bson.M{
"from": "organizations",
"localField": "organization",
"foreignField": "_id",
"as": "organization_details",
},
},
}

err := d.Pipe(pipe).All(&result)

如果我注释掉$group部分,查询将按预期返回连接结果。

如果按原样运行,我会得到NULL

如果我将$group移到管道的底部,我会得到一个包含Null值的数组响应。

是否可能使用$group进行聚合(目的是模拟DISTINCT)?

英文:

I am trying to perform a query using golang mgo
to effectively get distinct values from a join, I understand that this might not be the best paradigm to work with in Mongo.

Something like this:

pipe := []bson.M{

	{
		"$group": bson.M{
			"_id":  bson.M{"user": "$user"},

		},
	},

	{
		"$match": bson.M{
			"_id":  bson.M{"$exists": 1},
			"user": bson.M{"$exists": 1},
			"date_updated": bson.M{
				"$gt": durationDays,
			},
		},

	},

	{
		"$lookup": bson.M{
			"from":         "users",
			"localField":   "user",
			"foreignField": "_id",
			"as":           "user_details",
		},
	},
	{
		"$lookup": bson.M{
			"from":         "organizations",
			"localField":   "organization",
			"foreignField": "_id",
			"as":           "organization_details",
		},
	},

}

err := d.Pipe(pipe).All(&result)

If I comment out the $group section, the query returns the join as expected.

If I run as is, I get NULL

If I move the $group to the bottom of the pipe I get an array response with Null values

Is it possible to do do an aggregation with a $group (with the goal of simulating DISTINCT) ?

答案1

得分: 2

你得到NULL的原因是因为你的$match过滤器在$group阶段之后过滤掉了所有文档。

在第一个$group阶段之后,文档只有以下示例:

  {"_id": {"user": "foo"}},
  {"_id": {"user": "bar"}},
  {"_id": {"user": "baz"}}

它们不再包含其他字段,例如userdate_updatedorganization。如果你想保留它们的值,可以利用Group Accumulator Operator。根据你的用例,你可能还可以使用Aggregation Expression Variables

mongo shell为例,让我们使用$first operator,它基本上选择第一次出现的值。这对于organization可能是有意义的,但对于date_updated可能不是一个更合适的累加器操作符。请选择一个更合适的累加器操作符。

{"$group": { 
          "_id":"$user", 
          "date_updated": {"$first":"$date_updated"}, 
          "organization": {"$first":"$organization"}
         }
}

请注意,上述示例还将{"_id":{"user":"$user"}}替换为更简单的{"_id":"$user"}

接下来,我们将添加$project stage,将分组操作的结果中的_id字段重命名为user。同时保留其他字段而不进行修改。

{"$project": {
              "user": "$_id", 
              "date_updated": 1, 
              "organization": 1
             }
 }

你的$match stage可以简化,只需列出date_updated过滤器。首先,我们可以删除_id,因为在管道的这一点上它不再相关,而且如果你希望确保只处理具有user值的文档,应该将$match放在$group之前。有关更多信息,请参见Aggregation Pipeline Optimization

因此,所有这些组合起来将如下所示:

[
 {"$group":{ 
             "_id": "$user", 
             "date_updated": { "$first": "$date_updated"}, 
             "organization": { $first: "$organization"} 
           }
 },
 {"$project":{ 
               "user": "$_id", 
               "date_updated": 1, 
               "organization": 1
             }
 }, 
 {"$match":{
          "date_updated": {"$gt": durationDays } }
 }, 
 {"$lookup":{
             "from": "users", 
             "localField": "user", 
             "foreignField": "_id", 
             "as": "user_details"
            }
 }, 
 {"$lookup":{
            "from": "organizations", 
            "localField": "organization", 
            "foreignField": "_id", 
            "as": "organization_details"
            }
 }
]

(我知道你已经意识到了)最后,根据上述具有usersorganizations集合的数据库模式,根据你的应用程序用例,你可能需要重新考虑嵌入一些值。你可能会发现6 Rules of Thumb for MongoDB Schema Design有用。

英文:

The reason you're getting NULL is because your $match filter is filtering out all of documents after the $group phase.

After your first stage of $group the documents are only as below example:

  {"_id": { "user": "foo"}},
  {"_id": { "user": "bar"}},
  {"_id": { "user": "baz"}}

They no longer contains the other fields i.e. user, date_updated and organization. If you would like to keep their values, you can utilise Group Accumulator Operator. Depending on your use case you may also benefit from using Aggregation Expression Variables

As an example using mongo shell, let's use $first operator which basically pick the first occurrence. This may make sense for organization but not for date_updated. Please choose a more appropriate accumulator operator.

{"$group": { 
          "_id":"$user", 
          "date_updated": {"$first":"$date_updated"}, 
          "organization": {"$first":"$organization"}
         }
}

Note that the above also replaces {"_id":{"user":"$user"}} with simpler {"_id":"$user"}.

Next we'll add $project stage to rename our result of _id field from the group operation back to user. Also carry along the other fields without modifications.

{"$project": {
              "user": "$_id", 
              "date_updated": 1, 
              "organization": 1
             }
 }

Your $match stage can be simplified, by just listing the date_updated filter. First we can remove _id as it's no longer relevant up to this point in the pipeline, and also if you would like to make sure that you only process documents with user value you should placed $match before the $group. See Aggregation Pipeline Optimization for more.

So, all of those combined will look something as below:

[
 {"$group":{ 
             "_id": "$user", 
             "date_updated": { "$first": "$date_updated"}, 
             "organization": { $first: "$organization"} 
           }
 },
 {"$project":{ 
               "user": "$_id", 
               "date_updated": 1, 
               "organization": 1
             }
 }, 
 {"$match":{
          "date_updated": {"$gt": durationDays } }
 }, 
 {"$lookup":{
             "from": "users", 
             "localField": "user", 
             "foreignField": "_id", 
             "as": "user_details"
            }
 }, 
 {"$lookup":{
            "from": "organizations", 
            "localField": "organization", 
            "foreignField": "_id", 
            "as": "organization_details"
            }
 }
]

(I know you're aware of it) Lastly, based on the database schema above with users and organizations collections, depending on your application use case you may re-consider embedding some values. You may find 6 Rules of Thumb for MongoDB Schema Design useful.

huangapple
  • 本文由 发表于 2017年9月16日 00:34:42
  • 转载请务必保留本文链接:https://go.coder-hub.com/46244098.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定