2017-09-16 00:34:42 · go

mgo with aggregation and grouping

Question

I am trying to run a query with Go's mgo driver that effectively returns distinct values from a join. I understand that this might not be the best paradigm to work with in Mongo.

Something like this:

pipe := []bson.M{
	{
		"$group": bson.M{
			"_id": bson.M{"user": "$user"},
		},
	},
	{
		"$match": bson.M{
			"_id":  bson.M{"$exists": 1},
			"user": bson.M{"$exists": 1},
			"date_updated": bson.M{
				"$gt": durationDays,
			},
		},
	},
	{
		"$lookup": bson.M{
			"from":         "users",
			"localField":   "user",
			"foreignField": "_id",
			"as":           "user_details",
		},
	},
	{
		"$lookup": bson.M{
			"from":         "organizations",
			"localField":   "organization",
			"foreignField": "_id",
			"as":           "organization_details",
		},
	},
}
err := d.Pipe(pipe).All(&result)

If I comment out the $group section, the query returns the join as expected.

If I run it as is, I get NULL.

If I move the $group to the bottom of the pipe, I get an array response with null values.

Is it possible to do an aggregation with a $group (with the goal of simulating DISTINCT)?

Answer 1

Score: 2

The reason you're getting NULL is that your $match stage filters out every document that comes out of the $group stage.

After the first $group stage, the documents look like this:

  {"_id": {"user": "foo"}},
  {"_id": {"user": "bar"}},
  {"_id": {"user": "baz"}}

They no longer contain the other fields, i.e. user, date_updated and organization, so a $match on user or date_updated matches nothing. If you would like to keep their values, you can use group accumulator operators. Depending on your use case, you may also benefit from aggregation expression variables.
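To see why the later $match then finds nothing, here is a minimal plain-Go sketch that imitates the two stages in memory. It does not talk to MongoDB; the `doc` type, `groupByUser`, `matchHasField` and the sample data are all hypothetical stand-ins for illustration.

```go
package main

import "fmt"

// doc is a hypothetical stand-in for a BSON document.
type doc map[string]interface{}

// groupByUser imitates {"$group": {"_id": {"user": "$user"}}}: it emits one
// document per distinct user, and that document carries only the _id key.
func groupByUser(docs []doc) []doc {
	seen := map[interface{}]bool{}
	var out []doc
	for _, d := range docs {
		u := d["user"]
		if seen[u] {
			continue
		}
		seen[u] = true
		out = append(out, doc{"_id": doc{"user": u}})
	}
	return out
}

// matchHasField imitates {"$match": {field: {"$exists": true}}} for a
// top-level field.
func matchHasField(docs []doc, field string) []doc {
	var out []doc
	for _, d := range docs {
		if _, ok := d[field]; ok {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	// Made-up sample data; the integer dates are placeholders.
	input := []doc{
		{"user": "foo", "date_updated": 3, "organization": "acme"},
		{"user": "foo", "date_updated": 5, "organization": "acme"},
		{"user": "bar", "date_updated": 4, "organization": "initech"},
	}
	grouped := groupByUser(input)
	fmt.Println("documents after group:", len(grouped))
	fmt.Println("documents with date_updated:", len(matchHasField(grouped, "date_updated")))
}
```

The grouped documents still have an _id, but no top-level user or date_updated, which is the situation the original $match runs into.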

As an example in the mongo shell, let's use the $first operator, which picks the first occurrence in each group. This may make sense for organization but probably not for date_updated, so choose a more appropriate accumulator there (for example, $max to keep the most recent date).

{"$group": {
          "_id": "$user",
          "date_updated": {"$first": "$date_updated"},
          "organization": {"$first": "$organization"}
         }
}

Note that the above also replaces {"_id":{"user":"$user"}} with the simpler {"_id":"$user"}.
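The choice of accumulator matters because a $first-style pick depends on the order in which documents reach the stage. The rough in-memory sketch below, in plain Go with a hypothetical `record` type, `day` helper and sample data, contrasts a $first-like and a $max-like accumulator for date_updated. It only imitates the behaviour and is not driver code.

```go
package main

import (
	"fmt"
	"time"
)

// record is a hypothetical, simplified view of one source document.
type record struct {
	User        string
	DateUpdated time.Time
}

// day builds a date in September 2017; it exists only to keep the sample short.
func day(d int) time.Time {
	return time.Date(2017, time.September, d, 0, 0, 0, 0, time.UTC)
}

// firstPerUser keeps the first date seen per user, like a $first accumulator.
func firstPerUser(recs []record) map[string]time.Time {
	out := map[string]time.Time{}
	for _, r := range recs {
		if _, ok := out[r.User]; !ok {
			out[r.User] = r.DateUpdated
		}
	}
	return out
}

// latestPerUser keeps the most recent date per user, like a $max accumulator.
func latestPerUser(recs []record) map[string]time.Time {
	out := map[string]time.Time{}
	for _, r := range recs {
		if cur, ok := out[r.User]; !ok || r.DateUpdated.After(cur) {
			out[r.User] = r.DateUpdated
		}
	}
	return out
}

func main() {
	recs := []record{{"foo", day(1)}, {"foo", day(15)}, {"bar", day(10)}}
	fmt.Println("first for foo: ", firstPerUser(recs)["foo"].Format("2006-01-02"))
	fmt.Println("latest for foo:", latestPerUser(recs)["foo"].Format("2006-01-02"))
}
```

If the later date_updated filter is meant to apply to a user's newest document, the $max-like variant is the one that reflects it.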

Next, add a $project stage to rename the _id field produced by the group operation back to user, and carry the other fields along unmodified.

{"$project": {
              "user": "$_id",
              "date_updated": 1,
              "organization": 1
             }
 }

Your $match stage can be simplified to just the date_updated filter. The _id condition can be dropped because it is no longer relevant at this point in the pipeline. If you want to make sure that you only process documents with a user value, place that $match before the $group. See Aggregation Pipeline Optimization for more.
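As a sketch of that reordering in Go, the helper below prepends a $match on user to an existing pipeline. The `M` type is a local stand-in for mgo's `bson.M` (both are `map[string]interface{}`) so the snippet compiles without the driver, and `withUserFilter` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// M is a local stand-in for mgo's bson.M, which has the same underlying type.
type M map[string]interface{}

// withUserFilter returns a new pipeline whose first stage keeps only documents
// that have a user field, so later stages such as $group never see the rest.
func withUserFilter(pipeline []M) []M {
	filter := M{"$match": M{"user": M{"$exists": true}}}
	return append([]M{filter}, pipeline...)
}

func main() {
	pipe := withUserFilter([]M{
		{"$group": M{"_id": "$user"}},
	})
	// Print the pipeline as JSON purely for inspection.
	out, err := json.Marshal(pipe)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```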

Combined, the pipeline looks something like this:

[
 {"$group": {
             "_id": "$user",
             "date_updated": {"$first": "$date_updated"},
             "organization": {"$first": "$organization"}
            }
 },
 {"$project": {
               "user": "$_id",
               "date_updated": 1,
               "organization": 1
              }
 },
 {"$match": {
             "date_updated": {"$gt": durationDays}
            }
 },
 {"$lookup": {
              "from": "users",
              "localField": "user",
              "foreignField": "_id",
              "as": "user_details"
             }
 },
 {"$lookup": {
              "from": "organizations",
              "localField": "organization",
              "foreignField": "_id",
              "as": "organization_details"
             }
 }
]
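
Since the question uses mgo, here is one way the shell pipeline above might be written in Go. This is a sketch under assumptions: `M` stands in for `bson.M` so the snippet runs without the driver, `buildPipeline` is a hypothetical helper, and the cutoff parameter plays the role of durationDays, whose type the question does not show.

```go
package main

import (
	"fmt"
	"time"
)

// M is a local stand-in for mgo's bson.M (same underlying type).
type M map[string]interface{}

// buildPipeline mirrors the shell pipeline stage by stage. cutoff plays the
// role of durationDays; its type is not shown in the question, so it is
// accepted as an opaque value here (an assumption).
func buildPipeline(cutoff interface{}) []M {
	return []M{
		{"$group": M{
			"_id":          "$user",
			"date_updated": M{"$first": "$date_updated"},
			"organization": M{"$first": "$organization"},
		}},
		{"$project": M{"user": "$_id", "date_updated": 1, "organization": 1}},
		{"$match": M{"date_updated": M{"$gt": cutoff}}},
		{"$lookup": M{
			"from":         "users",
			"localField":   "user",
			"foreignField": "_id",
			"as":           "user_details",
		}},
		{"$lookup": M{
			"from":         "organizations",
			"localField":   "organization",
			"foreignField": "_id",
			"as":           "organization_details",
		}},
	}
}

func main() {
	// Hypothetical cutoff: seven days before now.
	pipe := buildPipeline(time.Now().AddDate(0, 0, -7))
	for i, stage := range pipe {
		for name := range stage {
			fmt.Println(i, name)
		}
	}
}
```

With the real driver, swapping `M` for `bson.M` and passing the slice to `d.Pipe(pipe).All(&result)` as in the question should be all that is needed, assuming `d` is the collection being aggregated.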

Lastly (I know you're aware of it), given the schema above with separate users and organizations collections, you may want to reconsider embedding some values, depending on your application's use case. You may find 6 Rules of Thumb for MongoDB Schema Design useful.
