MongoDB - Count number of objects where nested array contains each value

Question

I have a collection of documents, and for each value I want to count how many documents contain it. The values live in an array (fruit_names) inside sub-documents that are themselves stored in an array (fruit_details).

Example of documents:

[
  {
    "title": "fruit mix 1",
    "fruit_details": [
      {
        "fruit_names": [
          "strawberry",
          "banana"
        ],
        "category": "berries"
      },
      {
        "fruit_names": [
          "strawberry",
          "apple",
          "mango"
        ],
        "category": "red"
      }
    ]
  },
  {
    "title": "fruit mix 2",
    "fruit_details": [
      {
        "fruit_names": [
          "banana",
          "mango"
        ],
        "category": "tropical"
      }
    ]
  },
  {
    "title": "fruit mix 3",
    "fruit_details": [
      {
        "fruit_names": [
          "banana",
          "lemon"
        ],
        "category": "yellow"
      },
      {
        "fruit_names": [
          "banana"
        ],
        "category": "long"
      }
    ]
  }
]

I want an aggregation pipeline that returns:

[
    {
      "_id": "banana",
      "count": 3
    },
    {
      "_id": "mango",
      "count": 2
    },
    {
      "_id": "lemon",
      "count": 1
    },
    {
      "_id": "strawberry",
      "count": 1
    },
    {
      "_id": "apple",
      "count": 1
    }
]

Banana occurs in all 3 top-level documents, so its count is 3; mango occurs in only 2, and so on.

I'm not sure how to accomplish this without double counting elements that appear in multiple sub-documents in the fruit_details array. For example, strawberry appears twice in the fruit mix 1 document but should only be counted once.

So far I have tried using $setUnion and $group to count unique occurrences. The problem is that $setUnion treats each fruit_names array as a single set member instead of merging the elements inside those arrays, so it doesn't solve the double counting either.

db.collection.aggregate([
  {
    $project: {
      fruit: {
        $setUnion: [
          "$fruit_details.fruit_names",
        ],
      },
    },
  },
  {
    $unwind: "$fruit",
  },
  {
    $group: {
      _id: "$fruit",
      count: {
        $sum: 1,
      },
    },
  },
  {
    $sort: {
      _id: 1,
    },
  },
])
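
To make the failure mode concrete, here is a minimal plain-JavaScript sketch (an emulation for illustration, not MongoDB code) of what the $project stage above receives for "fruit mix 1". It assumes the documented behaviour that "$fruit_details.fruit_names" resolves to one array per sub-document and that $setUnion does not descend into nested arrays. The variable names are made up for this example.

```javascript
// Plain-JavaScript emulation, for illustration only.
const doc = {
  title: "fruit mix 1",
  fruit_details: [
    { fruit_names: ["strawberry", "banana"], category: "berries" },
    { fruit_names: ["strawberry", "apple", "mango"], category: "red" },
  ],
};

// "$fruit_details.fruit_names" yields an array of arrays, one per sub-document.
const resolved = doc.fruit_details.map((detail) => detail.fruit_names);

// $setUnion with a single argument de-duplicates that argument's top-level
// members. Those members are whole arrays here, so nothing gets flattened.
const singleArgUnion = [
  ...new Map(resolved.map((arr) => [JSON.stringify(arr), arr])).values(),
];

// What the question actually needs: the union of the inner arrays' elements.
const flattenedUnion = [...new Set(resolved.flat())];

console.log(singleArgUnion); // two arrays, "strawberry" still appears in both
console.log(flattenedUnion); // four fruit names, "strawberry" appears once
```

After $unwind, the original pipeline therefore groups on whole fruit_names arrays rather than on individual fruit names. Note that MongoDB does not guarantee the order of elements returned by $setUnion, so the ordering in this emulation should not be relied on.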

Answer 1

Score: 1

I think the key here, at least for one approach, is to use the $reduce operator. If we modify your initial pipeline stage to look like this:

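  {
    $project: {
      fruit: {
        // Fold every sub-document's fruit_names into one de-duplicated array per document.
        $reduce: {
          input: "$fruit_details",
          initialValue: [],
          in: {
            // $$value is the set accumulated so far, $$this is the current sub-document.
            $setUnion: [
              "$$value",
              "$$this.fruit_names"
            ]
          }
        }
      }
    }
  },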

Then it produces your desired output. Playground demonstration here.

(I also changed the $sort so that the output matched what you were requesting.)
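
For reference, the whole pipeline with the modified first stage might look like the sketch below. The exact $sort used in the playground is not shown above, so `{ count: -1 }` is an assumption; fruits with equal counts may come back in any order. The helper `countDocsPerFruit` and the `docs` array are hypothetical, plain-JavaScript stand-ins that mirror the pipeline's logic so the result can be checked without a database.

```javascript
// Pipeline as it would be passed to db.collection.aggregate(pipeline).
const pipeline = [
  {
    $project: {
      fruit: {
        $reduce: {
          input: "$fruit_details",
          initialValue: [],
          in: { $setUnion: ["$$value", "$$this.fruit_names"] },
        },
      },
    },
  },
  { $unwind: "$fruit" },
  { $group: { _id: "$fruit", count: { $sum: 1 } } },
  // Assumed sort: highest count first (tie order is not guaranteed).
  { $sort: { count: -1 } },
];

// Hypothetical plain-JavaScript emulation of the same steps.
function countDocsPerFruit(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // $reduce + $setUnion: unique fruit names within one document.
    const unique = new Set();
    for (const detail of doc.fruit_details) {
      for (const name of detail.fruit_names) unique.add(name);
    }
    // $unwind + $group: each document contributes at most 1 per fruit.
    for (const name of unique) counts.set(name, (counts.get(name) || 0) + 1);
  }
  // $sort by count, descending.
  return [...counts]
    .map(([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count);
}

// Sample data from the question, trimmed to the fields the pipeline reads.
const docs = [
  {
    title: "fruit mix 1",
    fruit_details: [
      { fruit_names: ["strawberry", "banana"] },
      { fruit_names: ["strawberry", "apple", "mango"] },
    ],
  },
  { title: "fruit mix 2", fruit_details: [{ fruit_names: ["banana", "mango"] }] },
  {
    title: "fruit mix 3",
    fruit_details: [{ fruit_names: ["banana", "lemon"] }, { fruit_names: ["banana"] }],
  },
];

const result = countDocsPerFruit(docs);
console.log(result);
```

The de-duplication happens per document before anything is counted, which is why strawberry contributes only 1 for "fruit mix 1" even though it appears in two sub-documents.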

huangapple
  • Posted on June 9, 2023, 01:17:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76434263.html