mgo with aggregation and grouping
Question
I am trying to run a query with golang mgo that effectively gets distinct values from a join. I understand that this might not be the best paradigm to work with in Mongo.
Something like this:
    pipe := []bson.M{
        {
            "$group": bson.M{
                "_id": bson.M{"user": "$user"},
            },
        },
        {
            "$match": bson.M{
                "_id":  bson.M{"$exists": 1},
                "user": bson.M{"$exists": 1},
                "date_updated": bson.M{
                    "$gt": durationDays,
                },
            },
        },
        {
            "$lookup": bson.M{
                "from":         "users",
                "localField":   "user",
                "foreignField": "_id",
                "as":           "user_details",
            },
        },
        {
            "$lookup": bson.M{
                "from":         "organizations",
                "localField":   "organization",
                "foreignField": "_id",
                "as":           "organization_details",
            },
        },
    }
    err := d.Pipe(pipe).All(&result)
If I comment out the $group section, the query returns the join as expected.
If I run it as is, I get NULL.
If I move the $group to the bottom of the pipe, I get an array response with null values.
Is it possible to do an aggregation with a $group (with the goal of simulating DISTINCT)?
Answer 1
Score: 2
The reason you're getting NULL is that your $match filter removes every document that comes out of the $group stage.
After the first $group stage, the documents look like this:
{"_id": { "user": "foo"}},
{"_id": { "user": "bar"}},
{"_id": { "user": "baz"}}
They no longer contain the other fields, i.e. user, date_updated and organization. If you would like to keep their values, you can use the group accumulator operators. Depending on your use case, you may also benefit from aggregation expression variables.
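To make the point above concrete, here is a minimal sketch in plain Go (no driver involved) that imitates what the two stages do to a handful of documents. The Doc type, the helper names and the sample values are all hypothetical, and the $group imitation is simplified to documents that do have a user field.

```go
package main

import "fmt"

// Doc is a hypothetical stand-in for a decoded MongoDB document.
type Doc map[string]interface{}

// groupByUser imitates {"$group": {"_id": {"user": "$user"}}}:
// one output document per distinct user, carrying only _id.
func groupByUser(docs []Doc) []Doc {
	seen := map[interface{}]bool{}
	var out []Doc
	for _, d := range docs {
		u := d["user"]
		if seen[u] {
			continue
		}
		seen[u] = true
		out = append(out, Doc{"_id": Doc{"user": u}})
	}
	return out
}

// matchHasFields imitates a $match that requires every listed
// top-level field to exist.
func matchHasFields(docs []Doc, fields ...string) []Doc {
	var out []Doc
	for _, d := range docs {
		ok := true
		for _, f := range fields {
			if _, present := d[f]; !present {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	docs := []Doc{
		{"user": "foo", "organization": "acme", "date_updated": 3},
		{"user": "foo", "organization": "acme", "date_updated": 5},
		{"user": "bar", "organization": "init", "date_updated": 4},
	}
	grouped := groupByUser(docs)
	fmt.Println(len(grouped))                                         // 2
	fmt.Println(len(matchHasFields(grouped, "user", "date_updated"))) // 0
}
```

The grouped documents only have _id, so a filter that needs top-level user and date_updated fields has nothing left to match.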
As an example in the mongo shell, let's use the $first operator, which picks the value from the first document in each group. This may make sense for organization but not for date_updated; choose a more appropriate accumulator for that field.
    {"$group": {
        "_id": "$user",
        "date_updated": {"$first": "$date_updated"},
        "organization": {"$first": "$organization"}
      }
    }
Note that the above also replaces {"_id":{"user":"$user"}} with the simpler {"_id":"$user"}.
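As a rough illustration of why the choice of accumulator matters for date_updated, the sketch below imitates $first and $max in plain Go. The update type and the day numbers are hypothetical stand-ins for real documents and dates. $max is only one possible alternative to $first, and whether it is the right one depends on what the later date filter is meant to express.

```go
package main

import "fmt"

// update is a hypothetical flattened document: a user plus a day number
// standing in for date_updated.
type update struct {
	user string
	day  int
}

// firstPerUser imitates {"$first": "$date_updated"}: it keeps the value
// from the first document seen for each user, in input order.
func firstPerUser(updates []update) map[string]int {
	out := map[string]int{}
	for _, u := range updates {
		if _, seen := out[u.user]; !seen {
			out[u.user] = u.day
		}
	}
	return out
}

// maxPerUser imitates {"$max": "$date_updated"}: it keeps the largest
// value seen for each user.
func maxPerUser(updates []update) map[string]int {
	out := map[string]int{}
	for _, u := range updates {
		if cur, seen := out[u.user]; !seen || u.day > cur {
			out[u.user] = u.day
		}
	}
	return out
}

func main() {
	updates := []update{{"foo", 3}, {"foo", 9}, {"bar", 4}}
	fmt.Println(firstPerUser(updates)["foo"]) // 3
	fmt.Println(maxPerUser(updates)["foo"])   // 9
}
```

With $first, a later filter such as "date_updated greater than 5" would drop user foo even though foo has a more recent update; with $max it would keep foo.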
Next we'll add a $project stage to rename the _id field produced by the group operation back to user, and carry the other fields along unmodified.
    {"$project": {
        "user": "$_id",
        "date_updated": 1,
        "organization": 1
      }
    }
Your $match stage can be simplified to just the date_updated filter. First, the _id condition can be removed, as it is no longer relevant at this point in the pipeline. Second, if you want to make sure that you only process documents that have a user value, you should place a $match for that before the $group. See Aggregation Pipeline Optimization for more.
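A small sketch of that ordering in Go follows. To keep it runnable without the driver, M is a local stand-in for mgo's bson.M, and the helper names are made up for illustration; with mgo you would write the same maps with bson.M.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// M is a stand-in for mgo's bson.M so the sketch runs without the driver.
type M map[string]interface{}

// userExistsMatch builds a $match stage that keeps only documents that
// have a user field. It is meant to run before the $group stage.
func userExistsMatch() M {
	return M{"$match": M{"user": M{"$exists": true}}}
}

// groupStage builds the $group stage from the answer.
func groupStage() M {
	return M{"$group": M{
		"_id":          "$user",
		"date_updated": M{"$first": "$date_updated"},
		"organization": M{"$first": "$organization"},
	}}
}

func main() {
	// The existence check comes first, so $group only sees documents
	// that actually have a user.
	pipe := []M{userExistsMatch(), groupStage()}
	out, err := json.MarshalIndent(pipe, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Printing the stages as JSON is just a convenient way to compare them against the shell syntax used in this answer.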
Combined, the pipeline looks something like this:
    [
      {"$group": {
          "_id": "$user",
          "date_updated": {"$first": "$date_updated"},
          "organization": {"$first": "$organization"}
        }
      },
      {"$project": {
          "user": "$_id",
          "date_updated": 1,
          "organization": 1
        }
      },
      {"$match": {
          "date_updated": {"$gt": durationDays}
        }
      },
      {"$lookup": {
          "from": "users",
          "localField": "user",
          "foreignField": "_id",
          "as": "user_details"
        }
      },
      {"$lookup": {
          "from": "organizations",
          "localField": "organization",
          "foreignField": "_id",
          "as": "organization_details"
        }
      }
    ]
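Since the question uses mgo, here is a sketch of how the shell pipeline above might be written in Go. To keep it runnable on its own, M is a local stand-in for bson.M. The lookup, buildPipeline and stageNames helpers are hypothetical. The 7-day window is an assumption too, as is the idea that date_updated is stored as a BSON date and should therefore be compared against a time.Time rather than a duration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// M is a stand-in for mgo's bson.M so the sketch runs without the driver.
type M map[string]interface{}

// lookup builds a $lookup stage joining localField against _id in from.
func lookup(from, localField, as string) M {
	return M{"$lookup": M{
		"from":         from,
		"localField":   localField,
		"foreignField": "_id",
		"as":           as,
	}}
}

// buildPipeline mirrors the shell pipeline: group, project, match, and
// two lookups. days is the size of the look-back window (an assumption
// about what durationDays is meant to express).
func buildPipeline(days int) []M {
	since := time.Now().AddDate(0, 0, -days)
	return []M{
		{"$group": M{
			"_id":          "$user",
			"date_updated": M{"$first": "$date_updated"},
			"organization": M{"$first": "$organization"},
		}},
		{"$project": M{"user": "$_id", "date_updated": 1, "organization": 1}},
		{"$match": M{"date_updated": M{"$gt": since}}},
		lookup("users", "user", "user_details"),
		lookup("organizations", "organization", "organization_details"),
	}
}

// stageNames lists the operator of each stage, in pipeline order.
func stageNames(pipe []M) []string {
	names := make([]string, 0, len(pipe))
	for _, stage := range pipe {
		for op := range stage {
			names = append(names, op)
		}
	}
	return names
}

func main() {
	pipe := buildPipeline(7) // hypothetical 7-day window
	out, err := json.MarshalIndent(pipe, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fmt.Println(stageNames(pipe))
	// With mgo, the same stages written with bson.M would be run as in
	// the question: err := d.Pipe(pipe).All(&result)
}
```

Building the repeated $lookup stages through one helper keeps the two joins consistent; everything else is a direct transcription of the shell syntax.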
Lastly (I know you're aware of it), given the schema above with separate users and organizations collections, you may want to reconsider embedding some values, depending on your application's use case. You may find 6 Rules of Thumb for MongoDB Schema Design useful.