MongoDB - Count number of objects where nested array contains each value
Question
I have a collection of documents, and I am trying to count how many documents contain each value of an array that is itself nested inside an array of sub-documents.
Example documents:
[
  {
    "title": "fruit mix 1",
    "fruit_details": [
      {
        "fruit_names": [
          "strawberry",
          "banana"
        ],
        "category": "berries"
      },
      {
        "fruit_names": [
          "strawberry",
          "apple",
          "mango"
        ],
        "category": "red"
      }
    ]
  },
  {
    "title": "fruit mix 2",
    "fruit_details": [
      {
        "fruit_names": [
          "banana",
          "mango"
        ],
        "category": "tropical"
      }
    ]
  },
  {
    "title": "fruit mix 3",
    "fruit_details": [
      {
        "fruit_names": [
          "banana",
          "lemon"
        ],
        "category": "yellow"
      },
      {
        "fruit_names": [
          "banana"
        ],
        "category": "long"
      }
    ]
  }
]
I want an aggregation pipeline that returns:
[
  {
    "_id": "banana",
    "count": 3
  },
  {
    "_id": "mango",
    "count": 2
  },
  {
    "_id": "lemon",
    "count": 1
  },
  {
    "_id": "strawberry",
    "count": 1
  },
  {
    "_id": "apple",
    "count": 1
  }
]
Banana occurs in all 3 top-level documents, so its count is 3; mango occurs in only 2, and so on.
I'm not sure how to accomplish this without double counting elements that appear in multiple sub-documents of the fruit_details array. For example, strawberry appears twice in the fruit mix 1 document but should only be counted once.
So far I have tried using $setUnion and $group to count unique occurrences. The problem is that $setUnion unions the fruit_names arrays as whole values rather than the elements inside them, so it also doesn't solve the double counting.
db.collection.aggregate([
  {
    $project: {
      fruit: {
        $setUnion: [
          "$fruit_details.fruit_names",
        ],
      },
    },
  },
  {
    $unwind: "$fruit",
  },
  {
    $group: {
      _id: "$fruit",
      count: {
        $sum: 1,
      },
    },
  },
  {
    $sort: {
      _id: 1,
    },
  },
])
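To see why the attempt above groups on whole arrays, it may help to look at what the path "$fruit_details.fruit_names" resolves to. The sketch below is plain JavaScript, not MongoDB code; the variable names are made up for illustration and it only mimics the shape of the "fruit mix 1" document.

```javascript
// Plain-JavaScript illustration (not MongoDB code) of the value that the path
// "$fruit_details.fruit_names" resolves to for the "fruit mix 1" document.
const fruitMix1 = {
  title: "fruit mix 1",
  fruit_details: [
    { fruit_names: ["strawberry", "banana"], category: "berries" },
    { fruit_names: ["strawberry", "apple", "mango"], category: "red" },
  ],
};

// The path collects one fruit_names array per sub-document,
// so the result is an array of arrays.
const resolved = fruitMix1.fruit_details.map((detail) => detail.fruit_names);
console.log(JSON.stringify(resolved));
// [["strawberry","banana"],["strawberry","apple","mango"]]

// $setUnion does not descend into nested arrays, so each inner array counts
// as one element. What the question needs instead is the flattened,
// duplicate-free list of names for the whole document:
const flattened = [...new Set(resolved.flat())];
console.log(JSON.stringify(flattened));
// ["strawberry","banana","apple","mango"]
```

With the array-of-arrays shape, the later $unwind emits whole fruit_names arrays, and $group then counts identical arrays rather than individual fruit names.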
Answer 1
Score: 1
I think the key here, at least for one approach, is to use the $reduce operator. If we modify your initial pipeline stage to look like this:
{
  $project: {
    fruit: {
      $reduce: {
        input: "$fruit_details",
        initialValue: [],
        in: {
          $setUnion: [
            "$$value",
            "$$this.fruit_names",
          ],
        },
      },
    },
  },
},
Then it produces your desired output. Playground demonstration here.
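As a rough check of the logic, the same idea can be emulated in plain JavaScript. This is only a sketch, not MongoDB code: the helper name uniqueFruits and the trimmed-down sample data (category fields omitted) are assumptions made for illustration.

```javascript
// Plain-JavaScript emulation (not MongoDB code) of the $reduce + $setUnion
// stage, followed by the $unwind / $group counting step.
const docs = [
  {
    title: "fruit mix 1",
    fruit_details: [
      { fruit_names: ["strawberry", "banana"] },
      { fruit_names: ["strawberry", "apple", "mango"] },
    ],
  },
  {
    title: "fruit mix 2",
    fruit_details: [{ fruit_names: ["banana", "mango"] }],
  },
  {
    title: "fruit mix 3",
    fruit_details: [
      { fruit_names: ["banana", "lemon"] },
      { fruit_names: ["banana"] },
    ],
  },
];

// Mirrors $reduce with initialValue [] and $setUnion of $$value and
// $$this.fruit_names: fold every fruit_names array of one document
// into a single duplicate-free set.
function uniqueFruits(doc) {
  return doc.fruit_details.reduce(
    (value, current) => new Set([...value, ...current.fruit_names]),
    new Set()
  );
}

// Mirrors $unwind + $group with $sum: 1. Because each document's set has no
// duplicates, a document contributes at most 1 to any fruit's count.
const counts = {};
for (const doc of docs) {
  for (const fruit of uniqueFruits(doc)) {
    counts[fruit] = (counts[fruit] || 0) + 1;
  }
}

console.log(counts);
// { strawberry: 1, banana: 3, apple: 1, mango: 2, lemon: 1 }
```

Strawberry appears in two sub-documents of fruit mix 1 but is folded into one set entry before counting, which is what prevents the double count.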
(I also changed the $sort so that the output matched what you were requesting.)
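Putting the pieces together, the complete pipeline might look like the sketch below. The $project stage is the one from this answer, and the $unwind and $group stages are taken from the question. The $sort stage is an assumption, since the changed sort is not shown here: sorting by count in descending order matches the requested output, although the order among fruits with equal counts is not guaranteed.

```javascript
// Sketch of the full pipeline. The $sort specification is an assumption;
// the other stages come from the question and the answer.
const pipeline = [
  {
    // Fold all fruit_names arrays of a document into one duplicate-free array.
    $project: {
      fruit: {
        $reduce: {
          input: "$fruit_details",
          initialValue: [],
          in: { $setUnion: ["$$value", "$$this.fruit_names"] },
        },
      },
    },
  },
  // Emit one document per distinct fruit per original document.
  { $unwind: "$fruit" },
  // Count how many original documents contain each fruit.
  { $group: { _id: "$fruit", count: { $sum: 1 } } },
  // Assumed sort: highest counts first.
  { $sort: { count: -1 } },
];

// In mongosh, with the collection name used in the question:
// db.collection.aggregate(pipeline)
```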