MongoDB Aggregate groupBy date and count on subdocs
Question
I know there are other versions of this question on here, but I'm struggling to get this to line up. I have docs similar to:
[
  {_id: 1, date: "2022-04-08T23:30:12.000Z", books: [{author: "Johnson", title: "First Title"}, {author: "Smith", title: "Second Title"}]},
  {_id: 2, date: "2022-04-22T23:30:12.000Z", books: [{author: "Johnson", title: "Some Other Title"}]},
  {_id: 3, date: "2022-05-05T23:30:12.000Z", books: [{author: "Smith", title: "Title Round 2"}]},
  {_id: 4, date: "2022-05-15T23:30:12.000Z", books: [{author: "Johnson", title: "Found a Title"}, {author: "Smith", title: "Wrote again"}, {author: "Brooks", title: "New Title"}]}
]
I'm trying to group the documents by month-year and then count how many times each distinct value appears in the author field. So far I have a pipeline that looks like this:
[
  { $unwind: "$books" },
  { $project: { _id: 1, books: 1, month: { $month: "$date" }, year: { $year: "$date" } } },
  { $project: { _id: 1, books: 1, date: { $concat: [ { $substr: [ "$year", 0, 4 ] }, "-", { $substr: [ "$month", 0, 2 ] } ] } } },
  {
    $group: {
      _id: {
        date: "$date",
        books: { freq: { $sum: 1 } }
      }
    }
  },
  { $project: { "_id": 1, "date": 1, "books": 1 } }
]
I'm aiming for a final output that looks like:
[
{date: "2022-04", authors: { "Johnson": 2, "Smith": 1}},
{date: "2022-05", authors: {"Johnson": 1, "Smith": 2, "Brooks": 1}}
]
I've seen ways to run a count on the subdocs, but when I try to implement them I either lose my group by date or just get errors. I've seen enough to know it's doable; I'm just lost trying to get it right. Any help is appreciated.
Answer 1
Score: 1
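Your first three stages can stay the same (assuming the dates are stored as Date objects and not as strings).
After that:
- Group by $date and $books.author and count the occurrences in each group. This gives the counts you need in the final answer.
- Then group by $date only and push each count into an authors array as key-value pairs in the form {k: key, v: value}, so that it can be converted to an object in the next stage.
- Use $arrayToObject on the authors array to convert it to an object.
If you also want to sort by date, add a { $sort: { date: 1 } } stage.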
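The steps above can also be mirrored in plain JavaScript, which makes it easy to see what each $group stage emits without a database. This is a hypothetical sketch, not MongoDB code: the function names (unwindBooks, monthKey, groupByDateAndAuthor, groupByDate, toOutput) are illustrative, Object.fromEntries stands in for $arrayToObject, and monthKey zero-pads the month from the UTC ISO string, which is an assumption rather than what the $substr stages do.

```javascript
// Hypothetical plain-JavaScript mirror of the answer's pipeline.
// It works on in-memory objects and does not talk to MongoDB.

// $unwind: "$books" -> one row per book, keeping the parent fields
function unwindBooks(docs) {
  return docs.flatMap((doc) => doc.books.map((book) => ({ ...doc, books: book })));
}

// Assumption: "YYYY-MM" taken from the UTC ISO string, so months are zero-padded
function monthKey(date) {
  return date.toISOString().slice(0, 7);
}

// First $group: _id { date, author }, count: { $sum: 1 }
function groupByDateAndAuthor(rows) {
  const counts = new Map();
  for (const row of rows) {
    const key = JSON.stringify([monthKey(row.date), row.books.author]);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts].map(([key, count]) => {
    const [date, author] = JSON.parse(key);
    return { _id: { date, author }, count };
  });
}

// Second $group: _id "$_id.date", authors: { $push: { k: "$_id.author", v: "$count" } }
function groupByDate(groups) {
  const byDate = new Map();
  for (const { _id, count } of groups) {
    if (!byDate.has(_id.date)) byDate.set(_id.date, []);
    byDate.get(_id.date).push({ k: _id.author, v: count });
  }
  return [...byDate].map(([date, authors]) => ({ _id: date, authors }));
}

// Final $project: Object.fromEntries plays the role of $arrayToObject
function toOutput(groups) {
  return groups.map(({ _id, authors }) => ({
    date: _id,
    authors: Object.fromEntries(authors.map(({ k, v }) => [k, v])),
  }));
}

// The question's sample documents, with dates as Date objects
const docs = [
  { _id: 1, date: new Date("2022-04-08T23:30:12.000Z"), books: [{ author: "Johnson", title: "First Title" }, { author: "Smith", title: "Second Title" }] },
  { _id: 2, date: new Date("2022-04-22T23:30:12.000Z"), books: [{ author: "Johnson", title: "Some Other Title" }] },
  { _id: 3, date: new Date("2022-05-05T23:30:12.000Z"), books: [{ author: "Smith", title: "Title Round 2" }] },
  { _id: 4, date: new Date("2022-05-15T23:30:12.000Z"), books: [{ author: "Johnson", title: "Found a Title" }, { author: "Smith", title: "Wrote again" }, { author: "Brooks", title: "New Title" }] },
];

// The trailing sort stands in for the optional { $sort: { date: 1 } } stage
const result = toOutput(groupByDate(groupByDateAndAuthor(unwindBooks(docs))))
  .sort((a, b) => a.date.localeCompare(b.date));

console.log(JSON.stringify(result));
```

Splitting the work into two groupings is the key design choice: the first one produces one count per (month, author) pair, and the second only regroups those counts by month, so no counting has to happen inside a nested structure.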
db.collection.aggregate([
{ $unwind: "$books" },
{ $project: { _id: 1, books: 1, month: { "$month": "$date" }, year: { "$year": "$date" } } },
{ $project: { _id: 1, books: 1, date: { $concat: [ { $substr: [ "$year", 0, 4 ] }, "-", { $substr: [ "$month", 0, 2 ] } ] } } },
{
$group: {
_id: { date: "$date", author: "$books.author" },
count: { $sum: 1 }
}
},
{
$group: {
_id: "$_id.date",
authors: { $push: { k: "$_id.author", v: "$count" } }
}
},
{
$project: {
_id: 0,
date: "$_id",
authors: { $arrayToObject: "$authors" }
}
}
])
Playground: you can inspect the intermediate results of each stage from the "Stage" dropdown at the top.
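One caveat worth checking: $substr turns the month number 4 into "4", so the first three stages would likely emit keys such as "2022-4" rather than the "2022-04" shown in the desired output, and a { $sort: { date: 1 } } on those strings would place "2022-10" before "2022-4". Below is a minimal sketch of a variant that builds the key with $dateToString instead. It is not the answer's code, it assumes MongoDB 4.0 or later (for $toDate, added here so ISO-8601 strings like the question's sample also convert), and it is written as a plain JavaScript array that can be passed to aggregate.

```javascript
// Hedged sketch: month key built with $dateToString instead of $month/$year/$substr.
// Assumption: MongoDB 4.0+ for $toDate, which accepts Date values and ISO-8601 strings.
const pipeline = [
  { $unwind: "$books" },
  {
    // One group per (month, author) pair; "%Y-%m" zero-pads the month ("2022-04")
    $group: {
      _id: {
        date: { $dateToString: { format: "%Y-%m", date: { $toDate: "$date" } } },
        author: "$books.author",
      },
      count: { $sum: 1 },
    },
  },
  // Collect { k, v } pairs per month so $arrayToObject can build the authors object
  { $group: { _id: "$_id.date", authors: { $push: { k: "$_id.author", v: "$count" } } } },
  { $project: { _id: 0, date: "$_id", authors: { $arrayToObject: "$authors" } } },
  // Zero-padded "YYYY-MM" strings sort in chronological order
  { $sort: { date: 1 } },
];

// In the mongo shell this would run as db.collection.aggregate(pipeline),
// where "collection" is a placeholder for the real collection name.
```

Moving the date formatting into the first $group's _id removes the two $project stages entirely, since $group accepts expressions in its _id.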