How to use MongoDB aggregation to get only those records whose index is divisible by 5?


Question


Here is what my data looks like in a collection named usages:

{ _id: AUTOGENERATED_MONGO_OBJECT_ID, timeStamp: 1675122960, usage: null, limit: 2.7 },
{ _id: AUTOGENERATED_MONGO_OBJECT_ID, timeStamp: 1675123020, usage: 2.74, limit: 2.7 },
{ _id: AUTOGENERATED_MONGO_OBJECT_ID, timeStamp: 1675123080, usage: 2.62, limit: 2.7 },
{ _id: AUTOGENERATED_MONGO_OBJECT_ID, timeStamp: 1675123140, usage: 4.32, limit: 2.7 },
{ _id: AUTOGENERATED_MONGO_OBJECT_ID, timeStamp: 1675123200, usage: null, limit: 2.7 },
{ _id: AUTOGENERATED_MONGO_OBJECT_ID, timeStamp: 1675123260, usage: 5.38, limit: 2.7 },
{ _id: AUTOGENERATED_MONGO_OBJECT_ID, timeStamp: 1675123320, usage: 5.08, limit: 2.7 },
{ _id: AUTOGENERATED_MONGO_OBJECT_ID, timeStamp: 1675123380, usage: 1.02, limit: 2.7 },
{ _id: AUTOGENERATED_MONGO_OBJECT_ID, timeStamp: 1675123440, usage: 2.35, limit: 2.7 },

Please note: this is just a sample of what my data looks like; I actually have 40,000 records.

I am getting the last 60 records using this Mongoose query:

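  const count = await Usage.countDocuments();
  // Skip everything except the last 60 documents (clamped at 0 for small collections)
  const data = await Usage.find({}).sort({ timeStamp: 1 }).skip(Math.max(count - 60, 0));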

So far, everything is working fine.

Now, however, there is a new requirement to return fewer data points over the same range. For example, the output should contain only 12 records out of the last 60, without losing too much detail, because we plot this data as a line chart on the frontend.

So I could downsample the data by keeping only the records whose index is divisible by 5.
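The "every 5th record" rule follows from the numbers above: 60 records reduced to 12 points means a step of 60 / 12 = 5. A minimal in-memory sketch of that idea, assuming the records are already sorted by timeStamp (oldest first); the helper name downsampleEvery is hypothetical:

```javascript
// Hypothetical helper: keep at most `targetCount` evenly spaced records.
// Assumes `records` is already sorted by timeStamp (oldest first).
function downsampleEvery(records, targetCount) {
  if (targetCount <= 0 || records.length === 0) return [];
  // e.g. 60 records, 12 wanted -> step 5 (indices 0, 5, 10, ..., 55)
  const step = Math.max(1, Math.ceil(records.length / targetCount));
  return records.filter((_, index) => index % step === 0);
}

// Usage with 60 synthetic records spaced one minute apart
const sample = Array.from({ length: 60 }, (_, i) => ({ timeStamp: 1675122960 + 60 * i }));
const points = downsampleEvery(sample, 12);
console.log(points.length); // 12
```

Rounding the step up with Math.ceil guarantees the result never exceeds targetCount when the record count is not an exact multiple.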

Here is my code:

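    const start = 60; // how many of the most recent records to consider
    const count = await Usage.countDocuments();
    const data = await Usage
        .find({})
        .sort({ timeStamp: 1 })
        .skip(Math.max(count - start, 0)) // keep only the last `start` records
        .exec();
    // Keep every 5th record (indices 0, 5, 10, ...): 12 out of 60
    const requiredData = data.filter((item, index) => index % 5 === 0);
    console.log(requiredData);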

The code above works as expected, but it is very slow because I filter the data only after fetching it from the database.

Ideally, the filtering would happen in the query itself, so that the database returns only the filtered results.

Can someone help me achieve that?

Here is a very similar question: https://stackoverflow.com/questions/62201364/filter-items-of-array-by-index-in-aggregation-pipeline

However, in that question the OP wants to filter the elements of a particular array field named arr, whereas I want to filter documents across the whole collection. I also don't know much about writing aggregation queries. I watched some video tutorials on aggregation, but it looks fairly complex and will take more time to learn.

Answer 1

Score: 3


Try this one:

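db.usages.aggregate([
   // Take the 60 most recent documents
   { $sort: { timeStamp: -1 } },
   { $limit: 60 },
   {
      // Number them 1..60 in ascending timeStamp order (requires MongoDB 5.0+)
      $setWindowFields: {
         sortBy: { timeStamp: 1 },
         output: {
            pos: { $documentNumber: {} }
         }
      }
   },
   // Keep positions 1, 6, 11, ... (zero-based indices 0, 5, 10, ...)
   { $set: { remainder: { $mod: ["$pos", 5] } } },
   { $match: { remainder: 1 } },
   { $sort: { timeStamp: 1 } },
   // Drop the helper fields
   { $unset: ["pos", "remainder"] }
])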

Mongo Playground
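The pipeline in this answer can be wrapped in a small function so that the window size and the step become parameters. A hedged sketch: buildDownsamplePipeline, lastN and step are hypothetical names; it assumes the question's timeStamp field, MongoDB 5.0+ (needed for $setWindowFields and $documentNumber), and a Mongoose model named Usage as in the question:

```javascript
// Hypothetical helper: builds an aggregation pipeline for the last `lastN`
// documents, keeping every `step`-th one (zero-based indices 0, step, 2*step, ...).
function buildDownsamplePipeline(lastN, step) {
  return [
    { $sort: { timeStamp: -1 } },
    { $limit: lastN },
    {
      $setWindowFields: {
        sortBy: { timeStamp: 1 },
        output: { pos: { $documentNumber: {} } }, // 1-based position
      },
    },
    // Position 1 is index 0, so keep positions where (pos - 1) % step === 0
    { $match: { $expr: { $eq: [{ $mod: [{ $subtract: ["$pos", 1] }, step] }, 0] } } },
    { $sort: { timeStamp: 1 } },
    { $unset: "pos" },
  ];
}

// With Mongoose (assumed model `Usage`):
//   const requiredData = await Usage.aggregate(buildDownsamplePipeline(60, 5));
const pipeline = buildDownsamplePipeline(60, 5);
console.log(JSON.stringify(pipeline[1])); // {"$limit":60}
```

Comparing (pos - 1) % step with 0, rather than pos % step with 1, also behaves correctly when step is 1, where pos % 1 is always 0 and the original condition would match nothing.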

With $setWindowFields you can do even more advanced downsampling, for example skipping a document if its difference from the previous document is less than some threshold xyz.
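As an illustration of that idea, here is a hedged sketch that uses the $shift window operator (MongoDB 5.0+) to drop a document when its usage differs from the previous document's usage by less than a threshold. The function name and the threshold parameter are hypothetical, and the comparison is against the immediately preceding document, not the previously kept one:

```javascript
// Hypothetical helper: keep a document only if |usage - previous usage| >= threshold.
function buildThresholdPipeline(lastN, threshold) {
  return [
    { $sort: { timeStamp: -1 } },
    { $limit: lastN },
    {
      $setWindowFields: {
        sortBy: { timeStamp: 1 },
        output: {
          // usage of the previous document in timeStamp order; null for the first one
          prevUsage: { $shift: { output: "$usage", by: -1, default: null } },
        },
      },
    },
    {
      $match: {
        $expr: {
          $or: [
            // kept when prevUsage is null (first document, or one after a null usage)
            { $eq: ["$prevUsage", null] },
            { $gte: [{ $abs: { $subtract: ["$usage", "$prevUsage"] } }, threshold] },
          ],
        },
      },
    },
    { $sort: { timeStamp: 1 } },
    { $unset: "prevUsage" },
  ];
}
```

With Mongoose this could be run as await Usage.aggregate(buildThresholdPipeline(60, 0.5)), where 0.5 is an arbitrary example threshold. Documents whose own usage is null (as in the sample data) produce a null difference, fail the $gte test, and are dropped unless the previous usage is also null.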

huangapple
  • Posted on June 29, 2023, 18:26:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76580174.html