mongodb caching with aggregation pipeline

Question
A MongoDB aggregation pipeline example:

db.testing.aggregate([
    {
        $match : { hosting : "aws.amazon.com" }
    },
    {
        $group : { _id : "$hosting", total : { $sum : 1 } }
    },
    {
        $project : { title : 1, author : 1, <few other transformations> }
    },
    { $sort : { total : -1 } }
]);
Now I want to enable paging. I have two options.

1. Use $skip and $limit in the pipeline.

{ $skip : pageNumber * pageSize }
{ $limit : pageSize }

External API-level caching can be used for each page, which will reduce the time for repeated loads of the same page, but the first load of each page will be slow, because the sort forces a scan of all matching documents.
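Option 1 can be sketched by assembling the paging stages into the pipeline programmatically. A minimal sketch: the `buildPagedPipeline` helper and the zero-based `pageNumber` convention are assumptions for illustration, and the $project stage with its unspecified transformations is omitted.

```javascript
// Build the aggregation pipeline with $skip/$limit paging appended.
// pageNumber is zero-based; pageSize is the number of documents per page.
function buildPagedPipeline(pageNumber, pageSize) {
  return [
    { $match: { hosting: "aws.amazon.com" } },
    { $group: { _id: "$hosting", total: { $sum: 1 } } },
    { $sort: { total: -1 } },
    { $skip: pageNumber * pageSize },
    { $limit: pageSize },
  ];
}

// Example: page 2 with 10 documents per page skips the first 20 results,
// i.e. pipeline[3] is { $skip: 20 }.
const pipeline = buildPagedPipeline(2, 10);
```

The whole pipeline still runs for every page; only the final slice changes, which is what makes the first load of each page expensive.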
2. Handle pagination in the application.

- Cache the findAll result, i.e. List findAll();
- Pagination will then be handled at the service layer, and the result published.
- From the next request onward, you refer to the cached result and send the desired set of records from the cache.
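The steps above can be sketched as follows; the in-memory `cache` variable, the `fetchAll` loader, and the stubbed data are illustrative assumptions, not part of the original question.

```javascript
// Service-layer pagination over a cached findAll() result.
// fetchAll stands in for the real database call; here it is stubbed.
let cache = null;

function findAll(fetchAll) {
  if (cache === null) {
    cache = fetchAll(); // populate the cache on the first request
  }
  return cache;
}

function getPage(fetchAll, pageNumber, pageSize) {
  const all = findAll(fetchAll);
  const start = pageNumber * pageSize;
  return all.slice(start, start + pageSize); // serve the page from the cache
}

// Usage: subsequent pages reuse the cached list instead of hitting the DB.
const fetchStub = () => [1, 2, 3, 4, 5, 6, 7];
getPage(fetchStub, 0, 3); // [1, 2, 3] - loads the cache
getPage(fetchStub, 1, 3); // [4, 5, 6] - served from the cache
```

Note that this trades memory for latency: the full result set must fit in the service layer, and the cache needs an invalidation strategy when the underlying data changes.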
Question: the 2nd approach seems better if the database is not doing some magical optimization. In the 1st, my view is that since the pipeline involves sorting, every page request will scan the full collection, which will be sub-optimal. What are your views? Which one would you choose? What is good practice (is it advisable to move some DB logic to the service layer for optimization)?
Answer 1

Score: 2

It depends on your data.

MongoDB does not cache query results in order to return cached results for identical queries: https://docs.mongodb.com/manual/faq/fundamentals/#does-mongodb-handle-caching

However, you can create a view (from the source collection plus a pipeline) and update it on demand. This allows you to serve aggregated data with good paging performance and refresh the content periodically. You can also create indexes for better performance (no need to develop extra logic in the service layer).

Also, if you always filter and $group by the hosting field, you can benefit from moving the final $sort next to the $match stage. In that case, MongoDB will use the index for the filter and sort, and paging is done in memory.
db.testing.createIndex({ hosting: -1 })

db.testing.aggregate([
  { $match: { hosting: "aws.amazon.com" } },
  { $sort: { hosting: -1 } },
  {
    $group: {
      _id: "$hosting",
      title: { $first: "$title" },
      author: { $first: "$author" },
      total: { $sum: 1 }
    }
  },
  { $project: { title: 1, author: 1, total: 1 } },
  { $skip: pageNumber * pageSize },
  { $limit: pageSize }
])
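The on-demand refresh mentioned above can also be done by materializing the aggregation into a summary collection with a $merge stage (available since MongoDB 4.2), so paging runs against a small, indexable collection. A sketch: the `buildRefreshPipeline` helper and the `testing_summary` collection name are assumptions, not part of the answer.

```javascript
// Build a pipeline that recomputes the aggregation and writes the result
// into a summary collection; paged reads then query the summary directly.
function buildRefreshPipeline(targetCollection) {
  return [
    { $match: { hosting: "aws.amazon.com" } },
    { $group: { _id: "$hosting", total: { $sum: 1 } } },
    // $merge upserts the aggregated documents into the target collection
    // instead of returning them to the client.
    {
      $merge: {
        into: targetCollection,
        whenMatched: "replace",
        whenNotMatched: "insert"
      }
    },
  ];
}

// In mongosh this would be run periodically or on demand, e.g.:
//   db.testing.aggregate(buildRefreshPipeline("testing_summary"))
```

Each refresh pays the aggregation cost once, after which every page read is a plain indexed find with skip/limit on the summary collection.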