MongoDB Aggregate groupBy date and count on subdocs
Question
I know there are other versions of this question on here, but I'm struggling to get this to line up. I have docs that are similar to:
[
{_id: 1, date: "2022-04-08T23:30:12.000Z", books: [{author: "Johnson", title: "First Title"}, {author: "Smith", title: "Second Title"}]},
{_id: 2, date: "2022-04-22T23:30:12.000Z", books: [{author: "Johnson", title: "Some Other Title"}]},
{_id: 3, date: "2022-05-05T23:30:12.000Z", books: [{author: "Smith", title: "Title Round 2"}]},
{_id: 4, date: "2022-05-15T23:30:12.000Z", books: [{author: "Johnson", title: "Found a Title"}, {author: "Smith", title: "Wrote again"}, {author: "Brooks", title: "New Title"}]}
]
I'm trying to group the documents by month-year and then count how many times each distinct value appears in the author field. So far I have a pipeline that looks like:
[
  { "$unwind": "$books" },
  {
    $project: {
      _id: 1,
      books: 1,
      month: { "$month": "$date" },
      year: { "$year": "$date" }
    }
  },
  {
    $project: {
      _id: 1,
      books: 1,
      date: {
        $concat: [
          { $substr: ["$year", 0, 4] },
          "-",
          { $substr: ["$month", 0, 2] }
        ]
      }
    }
  },
  {
    $group: {
      _id: {
        date: "$date",
        books: { freq: { $sum: 1 } }
      }
    }
  },
  {
    $project: {
      "_id": 1,
      "date": 1,
      "books": 1
    }
  }
]
I'm aiming for a final output that looks like:
[
{date: "2022-04", authors: { "Johnson": 2, "Smith": 1}},
{date: "2022-05", authors: {"Johnson": 1, "Smith": 2, "Brooks": 1}}
]
I've seen ways to run a count on the subdocs, but when I try to implement them I either lose my group by date or just get errors. I've seen enough to know it's doable; I'm just lost trying to get it right. Any help is appreciated.
Answer 1
Score: 1
Your first three stages can stay the same (assuming the dates are stored as Date objects and not as strings).
After that:
- Group by the $date and $books.author fields and count the occurrences in each group. This gives the counts you need in the final answer.
- Then group by $date only and push each count into an authors array as key-value pairs of the form {k: key, v: value}, so that it can be converted to an object in the next stage.
- Apply $arrayToObject to the authors array to convert it to an object.
If you also want the results sorted by date, add a { $sort: { date: 1 } } stage at the end of the pipeline, after the final $project, where the date field exists again.
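To make the grouping steps above concrete, here is a minimal plain-JavaScript sketch that mimics them in memory on the sample documents from the question. It is only an illustration of the logic, not MongoDB code: countAuthorsByMonth is a hypothetical helper name, and the dates are parsed from ISO strings and read in UTC, which is an assumption about how the month should be determined.

```javascript
// Sample documents from the question (dates kept as ISO-8601 strings here).
const docs = [
  { _id: 1, date: "2022-04-08T23:30:12.000Z", books: [{ author: "Johnson", title: "First Title" }, { author: "Smith", title: "Second Title" }] },
  { _id: 2, date: "2022-04-22T23:30:12.000Z", books: [{ author: "Johnson", title: "Some Other Title" }] },
  { _id: 3, date: "2022-05-05T23:30:12.000Z", books: [{ author: "Smith", title: "Title Round 2" }] },
  { _id: 4, date: "2022-05-15T23:30:12.000Z", books: [{ author: "Johnson", title: "Found a Title" }, { author: "Smith", title: "Wrote again" }, { author: "Brooks", title: "New Title" }] },
];

// Hypothetical helper (not a MongoDB API): counts authors per "YYYY-MM" bucket.
function countAuthorsByMonth(documents) {
  const byMonth = new Map(); // "YYYY-MM" -> Map(author -> count)
  for (const doc of documents) {
    // Assumption: the month is taken in UTC so the result is time-zone independent.
    const d = new Date(doc.date);
    const month = `${d.getUTCFullYear()}-${String(d.getUTCMonth() + 1).padStart(2, "0")}`;
    if (!byMonth.has(month)) byMonth.set(month, new Map());
    const counts = byMonth.get(month);
    // Mirrors $unwind plus the first $group: one increment per (month, author) pair.
    for (const book of doc.books) {
      counts.set(book.author, (counts.get(book.author) || 0) + 1);
    }
  }
  // Mirrors the second $group, $arrayToObject, and a final sort on date.
  return [...byMonth.keys()].sort().map((date) => ({
    date,
    authors: Object.fromEntries(byMonth.get(date)),
  }));
}

const result = countAuthorsByMonth(docs);
console.log(JSON.stringify(result, null, 2));
```

The nested Map plays the role of the compound _id { date, author } in the first $group, and Object.fromEntries does the job that $arrayToObject does on the {k, v} pairs.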
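db.collection.aggregate([
  // One output document per book.
  { $unwind: "$books" },
  // Extract the month and year from the date.
  { $project: { _id: 1, books: 1, month: { "$month": "$date" }, year: { "$year": "$date" } } },
  // Build a "year-month" string to group on.
  { $project: { _id: 1, books: 1, date: { $concat: [ { $substr: [ "$year", 0, 4 ] }, "-", { $substr: [ "$month", 0, 2 ] } ] } } },
  // Count the occurrences of each (date, author) pair.
  {
    $group: {
      _id: { date: "$date", author: "$books.author" },
      count: { $sum: 1 }
    }
  },
  // Regroup by date only, collecting {k, v} pairs for $arrayToObject.
  {
    $group: {
      _id: "$_id.date",
      authors: { $push: { k: "$_id.author", v: "$count" } }
    }
  },
  // Turn the pairs into a single authors object and rename _id to date.
  {
    $project: {
      _id: 0,
      date: "$_id",
      authors: { $arrayToObject: "$authors" }
    }
  }
])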
Playground: you can inspect the intermediate results using the Stage dropdown at the top.
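One detail worth checking: as far as I can tell, $substr on the numeric month produces "4" rather than "04", so the pipeline above would label April as "2022-4" instead of the "2022-04" the question asks for. A possible variant, sketched below under the assumption that date holds real Date values, builds the key with $dateToString (whose %m specifier is zero-padded) directly inside the first $group and adds the optional $sort. The variable name pipeline is just for illustration.

```javascript
// Sketch only: a variant of the answer's pipeline, assuming `date` holds Date values.
// If `date` holds ISO-8601 strings instead, wrapping it as { $toDate: "$date" }
// (MongoDB 4.0+) is one way to convert it first.
const pipeline = [
  // One output document per book.
  { $unwind: "$books" },
  // Count each (year-month, author) pair; "%Y-%m" yields e.g. "2022-04".
  {
    $group: {
      _id: {
        date: { $dateToString: { format: "%Y-%m", date: "$date" } },
        author: "$books.author",
      },
      count: { $sum: 1 },
    },
  },
  // Regroup by year-month, collecting {k, v} pairs for $arrayToObject.
  {
    $group: {
      _id: "$_id.date",
      authors: { $push: { k: "$_id.author", v: "$count" } },
    },
  },
  // Convert the pairs to an object and rename _id to date.
  { $project: { _id: 0, date: "$_id", authors: { $arrayToObject: "$authors" } } },
  // Zero-padded "YYYY-MM" strings sort chronologically as plain strings.
  { $sort: { date: 1 } },
];

// In mongosh this would be run as: db.collection.aggregate(pipeline)
```

This drops the two $project stages, since the year-month string is computed where it is used. The order of keys inside each authors object is not guaranteed, because $push collects them in whatever order the first $group emits them.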