MongoDB Aggregate: group by date and count on subdocs

Question

I know there are other versions of this question on here, but I'm struggling to get this to line up. I have docs that are similar to:

[
{_id: 1, date: "2022-04-08T23:30:12.000Z", books: [{author: "Johnson", title: "First Title"}, {author: "Smith", title: "Second Title"}]},
{_id: 2, date: "2022-04-22T23:30:12.000Z", books: [{author: "Johnson", title: "Some Other Title"}]},
{_id: 3, date: "2022-05-05T23:30:12.000Z", books: [{author: "Smith", title: "Title Round 2"}]},
{_id: 4, date: "2022-05-15T23:30:12.000Z", books: [{author: "Johnson", title: "Found a Title"}, {author: "Smith", title: "Wrote again"}, {author: "Brooks", title: "New Title"}]}
]

I'm trying to group the documents by year and month and then count how many times each distinct value appears in the author field. So far I have a pipeline that looks like:

[
  {
    "$unwind": "$books"
  },
  {
    $project: {
      _id: 1,
      books: 1,
      month: { "$month": "$date" },
      year: { "$year": "$date" }
    }
  },
  {
    $project: {
      _id: 1,
      books: 1,
      date: {
        $concat: [
          { $substr: ["$year", 0, 4] },
          "-",
          { $substr: ["$month", 0, 2] }
        ]
      }
    }
  },
  {
    $group: {
      _id: {
        date: "$date",
        books: {
          freq: { $sum: 1 }
        }
      }
    }
  },
  {
    $project: {
      "_id": 1,
      "date": 1,
      "books": 1
    }
  }
]

I'm aiming for a final output that looks like:

[
{date: "2022-04", authors: { "Johnson": 2, "Smith": 1}},
{date: "2022-05", authors: {"Johnson": 1, "Smith": 2, "Brooks": 1}}
]

I've seen ways to run a count on the subdocs, but when I try to implement them I either lose my group-by-date or just get errors. I've seen enough to know it's doable; I'm just lost trying to get it exactly right. Any help is appreciated.

Answer 1

Score: 1

Your first three stages can stay the same (assuming the dates are stored as date objects and not as strings).

After that:

  1. Group by the $date and $books.author fields and count the occurrences in each group. This gives the counts you need in the final answer.
  2. Then group by $date only and push each count onto an authors array as a key-value pair {k: key, v: value}, so that it can be converted to an object in the next stage.
  3. Apply $arrayToObject to the authors array to convert it to an object.
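
To make the intermediate shapes easier to follow, here is a rough plain-JavaScript trace of the same three steps, with no MongoDB server involved. It is only an illustration: the `unwound` array is a hand-written stand-in for the documents after $unwind and the date projection, derived from the sample documents in the question, with the date keys written as in the question's expected output.

```javascript
// Hypothetical stand-in for the pipeline's input to the first $group:
// one entry per book, already reduced to its month key and author.
const unwound = [
  { date: "2022-04", author: "Johnson" },
  { date: "2022-04", author: "Smith" },
  { date: "2022-04", author: "Johnson" },
  { date: "2022-05", author: "Smith" },
  { date: "2022-05", author: "Johnson" },
  { date: "2022-05", author: "Smith" },
  { date: "2022-05", author: "Brooks" },
];

// Step 1: count each (date, author) pair, like the first $group.
const pairCounts = new Map();
for (const { date, author } of unwound) {
  const key = JSON.stringify([date, author]);
  pairCounts.set(key, (pairCounts.get(key) || 0) + 1);
}

// Step 2: regroup by date only, pushing { k, v } pairs, like the second $group.
const byDate = new Map();
for (const [key, count] of pairCounts) {
  const [date, author] = JSON.parse(key);
  if (!byDate.has(date)) byDate.set(date, []);
  byDate.get(date).push({ k: author, v: count });
}

// Step 3: turn each k/v array into an object, like $arrayToObject.
const result = [...byDate].map(([date, authors]) => ({
  date,
  authors: Object.fromEntries(authors.map(({ k, v }) => [k, v])),
}));

console.log(JSON.stringify(result, null, 2));
```

The per-author counts match the output the question asks for. The order of keys and of groups here follows insertion order, which an actual $group stage does not promise.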

If you also want to sort by date, add a { $sort: { date: 1 } } stage.

db.collection.aggregate([
  // Stages 1-3 are unchanged from the question: one document per book, keyed by a year-month string.
  { $unwind: "$books" },
  { $project: { _id: 1, books: 1, month: { "$month": "$date" }, year: { "$year": "$date" } } },
  { $project: { _id: 1, books: 1, date: { $concat: [ { $substr: [ "$year", 0, 4 ] }, "-", { $substr: [ "$month", 0, 2 ] } ] } } },
  // Count each (date, author) pair.
  {
    $group: {
      _id: { date: "$date", author: "$books.author" },
      count: { $sum: 1 }
    }
  },
  // Regroup by date only, collecting the counts as k/v pairs.
  {
    $group: {
      _id: "$_id.date",
      authors: { $push: { k: "$_id.author", v: "$count" } }
    }
  },
  // Convert the k/v array into an object and rename _id to date.
  {
    $project: {
      _id: 0,
      date: "$_id",
      authors: { $arrayToObject: "$authors" }
    }
  }
])

Playground: you can look at the intermediate results from the Stage dropdown at the top.
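
One caveat with the pipeline above: $month yields a number, so as far as I can tell the concatenated key for April comes out as "2022-4" rather than the "2022-04" shown in the desired output. A possible variant, sketched below, builds the key in one step with $dateToString, which zero-pads the month. It assumes MongoDB 4.0 or later for $toDate, which is used so that the same pipeline works whether `date` is stored as a date object or as an ISO-8601 string like the ones in the sample documents. The `pipeline` variable and the collection name are placeholders, not part of the original answer.

```javascript
// Sketch only: same grouping logic as the answer, but the month key is
// produced by $dateToString instead of $month/$year plus $concat.
const pipeline = [
  { $unwind: "$books" },
  {
    // Count each (month, author) pair. $toDate accepts both dates and ISO strings.
    $group: {
      _id: {
        date: { $dateToString: { format: "%Y-%m", date: { $toDate: "$date" } } },
        author: "$books.author"
      },
      count: { $sum: 1 }
    }
  },
  {
    // Collect the per-author counts for each month as k/v pairs.
    $group: {
      _id: "$_id.date",
      authors: { $push: { k: "$_id.author", v: "$count" } }
    }
  },
  // Turn the k/v array into an object and rename _id to date.
  { $project: { _id: 0, date: "$_id", authors: { $arrayToObject: "$authors" } } },
  // Zero-padded "YYYY-MM" keys sort chronologically as plain strings.
  { $sort: { date: 1 } }
];

// In mongosh, with a placeholder collection name:
// db.collection.aggregate(pipeline)
```

Placing $sort after the final $project matters here, because the date field only exists under that name from that stage onward.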

huangapple
  • Published on May 30, 2023 at 05:09:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76360337.html