2023年6月9日 01:17:28go评论64阅读模式

英文:

MongoDB - Count number of objects where nested array contains each value

问题

我有一组文档，我正在尝试计算每个数组元素在嵌套文档数组中的多少个文档中出现的数量。

文档示例：

[
{
  "title": "fruit mix 1",
  "fruit_details": [
    {
      "fruit_names": [
        "strawberry",
        "banana"
      ],
      "category": "berries"
    },
    {
      "fruit_names": [
        "strawberry",
        "apple",
        "mango
      ],
      "category": "red",
    }
  ]
},
{
  "title": "fruit mix 2",
  "fruit_details": [
    {
      "fruit_names": [
        "banana",
        "mango"
      ],
      "category": "tropical",
    }
  ]
},
{
  "title": "fruit mix 3",
  "fruit_details": [
    {
      "fruit_names": [
        "banana",
        "lemon"
      ],
      "category": "yellow",
    },
    {
      "fruit_names": [
        "banana"
      ],
      "category": "long",
    },
  ]
},
]

我想要一个聚合管道，返回：

[
    {
      "_id": "banana",
      "count": 3
    },
    {
      "_id": "mango",
      "count": 2
    },
    {
      "_id": "lemon",
      "count": 1
    },
    {
      "_id": "strawberry",
      "count": 1
    },
    {
      "_id": "apple",
      "count": 1
    },
]

香蕉出现在所有3个顶层文档中，因此计数为3，芒果只出现在2个文档中，等等。

我不确定如何在不重复计算出现在fruit_details数组中多个子文档中的元素的情况下完成此操作。例如，草莓在fruit mix 1文档中出现两次，但应该只计算一次。

到目前为止，我尝试使用$setUnion和$group来计算唯一出现次数。这种方法的问题是$setUnion会联合每个fruit_names数组，而不是该数组中的每个元素。它也不能解决重复计数的问题。

英文:

I have a collection of documents and I am trying to count how many documents contain each element in an array that is itself a part of a nested document in an array.

Example of documents:

[
{
  &quot;title&quot;: &quot;fruit mix 1&quot;,
  &quot;fruit_details&quot;: [
    {
      &quot;fruit_names&quot;: [
        &quot;strawberry&quot;,
        &quot;banana&quot;
      ],
      &quot;category&quot;: &quot;berries&quot;
    },
    {
      &quot;fruit_names&quot;: [
        &quot;strawberry&quot;,
        &quot;apple&quot;,
        &quot;mango
      ],
      &quot;category&quot;: &quot;red&quot;,
    }
  ]
},
{
  &quot;title&quot;: &quot;fruit mix 2&quot;,
  &quot;fruit_details&quot;: [
    {
      &quot;fruit_names&quot;: [
        &quot;banana&quot;,
        &quot;mango&quot;
      ],
      &quot;category&quot;: &quot;tropical&quot;,
    }
  ]
},
{
  &quot;title&quot;: &quot;fruit mix 3&quot;,
  &quot;fruit_details&quot;: [
    {
      &quot;fruit_names&quot;: [
        &quot;banana&quot;,
        &quot;lemon&quot;
      ],
      &quot;category&quot;: &quot;yellow&quot;,
    },
    {
      &quot;fruit_names&quot;: [
        &quot;banana&quot;
      ],
      &quot;category&quot;: &quot;long&quot;,
    },
  ]
},
]

I want an aggregation pipeline that returns:

[
    {
      &quot;_id&quot;: &quot;banana&quot;,
      &quot;count&quot;: 3
    },
    {
      &quot;_id&quot;: &quot;mango&quot;,
      &quot;count&quot;: 2
    },
    {
      &quot;_id&quot;: &quot;lemon&quot;,
      &quot;count&quot;: 1
    },
    {
      &quot;_id&quot;: &quot;strawberry&quot;,
      &quot;count&quot;: 1
    },
    {
      &quot;_id&quot;: &quot;apple&quot;,
      &quot;count&quot;: 1
    },
]

Banana occurs in all 3 top level documents so the count is 3, mango only occurs in 2, etc.

I'm not sure how to accomplish this without double counting elements that appear in multiple sub-documents in the fruit_details array. For example, strawberry appears twice in the fruit mix 1 document but should only be counted once.

So far I have tried using $setUnion and $group to count unique occurrences. The problem with this is $setUnion unions each fruit_names array and not each element in that array. It also doesn't solve the issue with double counting.

db.collection.aggregate([
  {
    $project: {
      fruit: {
        $setUnion: [
          &quot;$fruit_details.fruit_names&quot;,
        ],
      },
    },
  },
  {
    $unwind: &quot;$fruit&quot;,
  },
  {
    $group: {
      _id: &quot;$fruit&quot;,
      count: {
        $sum: 1,
      },
    },
  },
  {
    $sort: {
      _id: 1,
    },
  },
])

答案1

得分: 1

我认为这里的关键，至少有一种方法，是使用$reduce运算符。如果我们修改您的初始管道阶段，使其看起来像这样：

{
  "$project": {
    "fruit": {
      "$reduce": {
        "input": "$fruit_details",
        "initialValue": [],
        "in": {
          "$setUnion": [
            "$$value",
            "$$this.fruit_names"
          ]
        }
      }
    }
  }
}

然后它会产生您期望的输出。在这里查看演示。

(我还更改了 $sort，以使输出与您的要求相匹配。)

英文:

I think the key here, at least for one approach, is to use the $reduce operator. If we modify your initial pipeline stage to look like this:

  {
    $project: {
      fruit: {
        &quot;$reduce&quot;: {
          &quot;input&quot;: &quot;$fruit_details&quot;,
          &quot;initialValue&quot;: [],
          &quot;in&quot;: {
            &quot;$setUnion&quot;: [
              &quot;$$value&quot;,
              &quot;$$this.fruit_names&quot;,
              
            ]
          }
        }
      }
    }
  },

Then it produces your desired output. Playground demonstration here.

(I also changed the $sort so that the output matched what you were requesting.)

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

MongoDB – 统计包含每个值的嵌套数组中的对象数量

问题

答案1

Why does my MongoDB Atlas cluster go into sleep mode after 30 minutes of Mean stack app inactivity on Render.com?

MongoDB Morphia兼容矩阵

FastAPI查询参数转换为MongoDB聚合

Iterate over json array in Go to extract values

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论