mongodb caching with aggregation pipeline

Question
A MongoDB aggregation pipeline example:

db.testing.aggregate([
    {
        $match : { hosting : "aws.amazon.com" }
    },
    {
        $group : { _id : "$hosting", total : { $sum : 1 } }
    },
    {
        $project : { title : 1, author : 1, <few other transformations> }
    },
    { $sort : { total : -1 } }
]);
Now I want to enable paging. I have two options.

1. Use $skip and $limit in the pipeline.

{ $skip : pageNumber * pageSize }
{ $limit : pageSize }

External API-level caching can be used for each page, which will reduce the time for repeated loads of the same page, but the first load of each page will be slow, because the sort forces a scan of all matching documents.
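Option 1 can be sketched by assembling the paging stages into the pipeline programmatically. A minimal sketch: the `buildPagedPipeline` helper and the zero-based `pageNumber` convention are assumptions for illustration, and the $project stage with its unspecified transformations is omitted.

```javascript
// Build the aggregation pipeline with $skip/$limit paging appended.
// pageNumber is zero-based; pageSize is the number of documents per page.
function buildPagedPipeline(pageNumber, pageSize) {
  return [
    { $match: { hosting: "aws.amazon.com" } },
    { $group: { _id: "$hosting", total: { $sum: 1 } } },
    { $sort: { total: -1 } },
    { $skip: pageNumber * pageSize },
    { $limit: pageSize },
  ];
}

// Example: page 2 with 10 documents per page skips the first 20 results,
// i.e. pipeline[3] is { $skip: 20 }.
const pipeline = buildPagedPipeline(2, 10);
```

The whole pipeline still runs for every page; only the final slice changes, which is what makes the first load of each page expensive.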
2. Handle pagination in the application.

- Cache the findAll result, i.e. List findAll();
- Pagination will then be handled at the service layer, and the result published.
- From the next request onward, you refer to the cached result and send the desired set of records from the cache.
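The steps above can be sketched as follows; the in-memory `cache` variable, the `fetchAll` loader, and the stubbed data are illustrative assumptions, not part of the original question.

```javascript
// Service-layer pagination over a cached findAll() result.
// fetchAll stands in for the real database call; here it is stubbed.
let cache = null;

function findAll(fetchAll) {
  if (cache === null) {
    cache = fetchAll(); // populate the cache on the first request
  }
  return cache;
}

function getPage(fetchAll, pageNumber, pageSize) {
  const all = findAll(fetchAll);
  const start = pageNumber * pageSize;
  return all.slice(start, start + pageSize); // serve the page from the cache
}

// Usage: subsequent pages reuse the cached list instead of hitting the DB.
const fetchStub = () => [1, 2, 3, 4, 5, 6, 7];
getPage(fetchStub, 0, 3); // [1, 2, 3] - loads the cache
getPage(fetchStub, 1, 3); // [4, 5, 6] - served from the cache
```

Note that this trades memory for latency: the full result set must fit in the service layer, and the cache needs an invalidation strategy when the underlying data changes.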
Question: the 2nd approach seems better if the database is not doing some magical optimization. In the 1st, my view is that since the pipeline involves sorting, every page request will scan the full collection, which will be sub-optimal. What are your views? Which one would you choose? What is good practice (is it advisable to move some DB logic to the service layer for optimization)?
Answer 1

Score: 2

It depends on your data.

MongoDB does not cache query results in order to return cached results for identical queries: https://docs.mongodb.com/manual/faq/fundamentals/#does-mongodb-handle-caching

However, you can create a view (from the source collection plus a pipeline) and update it on demand. This allows you to serve aggregated data with good paging performance and refresh the content periodically. You can also create indexes for better performance (no need to develop extra logic in the service layer).

Also, if you always filter and $group by the hosting field, you can benefit from moving the final $sort next to the $match stage. In that case, MongoDB will use the index for the filter and sort, and paging is done in memory.
db.testing.createIndex({ hosting: -1 })

db.testing.aggregate([
  { $match: { hosting: "aws.amazon.com" } },
  { $sort: { hosting: -1 } },
  {
    $group: {
      _id: "$hosting",
      title: { $first: "$title" },
      author: { $first: "$author" },
      total: { $sum: 1 }
    }
  },
  { $project: { title: 1, author: 1, total: 1 } },
  { $skip: pageNumber * pageSize },
  { $limit: pageSize }
])
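The on-demand refresh mentioned above can also be done by materializing the aggregation into a summary collection with a $merge stage (available since MongoDB 4.2), so paging runs against a small, indexable collection. A sketch: the `buildRefreshPipeline` helper and the `testing_summary` collection name are assumptions, not part of the answer.

```javascript
// Build a pipeline that recomputes the aggregation and writes the result
// into a summary collection; paged reads then query the summary directly.
function buildRefreshPipeline(targetCollection) {
  return [
    { $match: { hosting: "aws.amazon.com" } },
    { $group: { _id: "$hosting", total: { $sum: 1 } } },
    // $merge upserts the aggregated documents into the target collection
    // instead of returning them to the client.
    {
      $merge: {
        into: targetCollection,
        whenMatched: "replace",
        whenNotMatched: "insert"
      }
    },
  ];
}

// In mongosh this would be run periodically or on demand, e.g.:
//   db.testing.aggregate(buildRefreshPipeline("testing_summary"))
```

Each refresh pays the aggregation cost once, after which every page read is a plain indexed find with skip/limit on the summary collection.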