MongoDB使用聚合管道进行缓存

huangapple go评论64阅读模式
英文:

mongodb caching with aggregation pipeline

问题

MongoDB聚合管道示例

db.testing.aggregate(
    { 
        $match : { hosting : "aws.amazon.com" }
    },
    { 
        $group : { _id : "$hosting", total : { $sum : 1 } }
    },
    {
        $project : { title : 1 , author : 1, <一些其他转换> }
    },
    { $sort : { total : -1 } }
);

现在我想启用分页有两种选择

1. 在管道中使用 $skip 和 $limit

    { $skip : pageNumber * pageSize }
    { $limit : pageSize }

对于每个页面可以使用外部API级别的缓存这将减少重复加载相同页面的时间但由于排序每个页面的第一次加载将会很慢因为需要进行线性扫描

2. 在应用程序中处理分页
   - 缓存 findAll 结果即 List findAll();
   - 现在分页将在服务层处理并发布结果
   - 从下一个请求开始您将引用缓存的结果并从缓存中发送所需的记录集

问题如果数据库没有进行一些神奇的优化第二种方法似乎更好在第一种方法中我认为由于管道涉及排序因此每个页面请求都将对整个表进行扫描这将不是最佳选择您的看法是什么应该选择哪一种哪种做法是良好的实践将一些数据库逻辑移到服务层以进行优化是否明智)?
英文:

mongodb aggregate pipeline like

db.testing.aggregate(
    { 
	$match : {hosting : &quot;aws.amazon.com&quot;}
    },
    { 
	$group : { _id : &quot;$hosting&quot;, total : { $sum : 1 } }
    },
    {
    $project : { title : 1 , author : 1, &lt;few other transformations&gt; }
    {$sort : {total : -1}}
);

Now I want to enable paging. I have 2 options.

  1. Use skip and limit in the pipeline.

     { $skip : pageNumber * pageSize }
     { $limit : pageSize }
    

External API level caching for each page can be used which will reduce time for repeated loading of same pages, but the first loading of each page will be painful because of the linear scan due to sorting.

  1. Handle pagination in application.
    • Cache the findAll result i.e for List findAll();
    • Now pagination will be handled at the service layer and result will be published
    • From next request onward you will be referring to the cached result and send the desired set of records from the cache.

Question: 2nd approach seems better if database is not doing some magical optimizations. In 1st, my view is that since the pipeline involves sorting, hence every page request will do a scan of the full table, which will be sub-optimal. What are your views? Which one should be done? What would you choose? What is the good practice(Is moving some db logic to service layer for optimizations advisable)?

答案1

得分: 2

这取决于您的数据。

MongoDB不会缓存查询结果,以便为相同的查询返回缓存的结果。MongoDB文档链接

但是,您可以创建视图(使用源数据和管道),并根据需要进行按需更新。这将允许您具有良好性能的聚合数据,用于分页,并定期更新内容。您可以创建索引以获得更好的性能(无需在服务层开发额外的逻辑)

此外,如果您始终通过hosting字段进行过滤和$group操作,那么您可以受益于MongoDB索引在最后的$sort$match阶段进行交换。在这种情况下,MongoDB将使用索引进行筛选和排序,分页将在内存中完成。

db.testing.createIndex({hosting: -1})

db.collection.aggregate([
  {
    $match: {
      hosting: "aws.amazon.com"
    }
  },
  {
    $sort: {
      hosting: -1
    }
  },
  {
    $group: {
      _id: "$hosting",
      title: {
        $first: "$title"
      },
      author: {
        $first: "$author"
      },
      total: {
        $sum: 1
      }
    }
  },
  {
    $project: {
      title: 1,
      author: 1,
      total: 1
    }
  },
  { $skip : pageNumber * pageSize },
  { $limit : pageSize }
])
英文:

It depends on your data.

MongoDB does not cache the query results in order to return the cached results for identical queries. https://docs.mongodb.com/manual/faq/fundamentals/#does-mongodb-handle-caching

However, you may create View (from source + pipeline) and update it on-demand. This will allow you to have aggregated data with good performance for paging and update the content periodically. You may create indexes for better performance (No need to develop in service layer extra logic)

Also, <i>if you always filter and $group by hosting field</i>, you may benefit MongoDB index swapping last $sort next ot $match stage. In this case, MongoDB will use index for filter + sort and paging are done in memory.

db.testing.createIndex({hosting:-1})

db.collection.aggregate([
  {
	$match: {
	  hosting: &quot;aws.amazon.com&quot;
	}
  },
  {
	$sort: {
	  hosting: -1
	}
  },
  {
	$group: {
	  _id: &quot;$hosting&quot;,
	  title: {
		$first: &quot;$title&quot;
	  },
	  author: {
		$first: &quot;$author&quot;
	  },
	  total: {
		$sum: 1
	  }
	}
  },
  {
	$project: {
	  title: 1,
	  author: 1,
	  total: 1
	}
  },
  { $skip : pageNumber * pageSize },
  { $limit : pageSize }
])

huangapple
  • 本文由 发表于 2020年1月3日 18:23:28
  • 转载请务必保留本文链接:https://go.coder-hub.com/59576841.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定