英文:
Handling paging with changing sort orders
问题
我正在创建一个 RESTful web 服务(使用 Golang),它从数据库中提取一组行,并将其返回给客户端(智能手机应用程序或 Web 应用程序)。该服务需要能够提供分页功能。唯一的问题是,这些数据是根据一个经常变化的“计算”列进行排序的(例如,网站上某个内容的“赞”或“踩”数量),因此在客户端请求之间,行可能会在页面编号之间跳动。
我已经研究了一些 PostgreSQL 的功能,可能可以帮助我解决这个问题,但似乎没有什么很好的解决方案。
- Materialized Views:用于保存“陈旧”的数据,这些数据只在一段时间后才更新。这并不能真正解决问题,因为如果用户在 Materialized View 更新时浏览数据,数据仍然会跳动。
- Cursors:为每个客户端会话创建并在请求之间保持。如果同时有很多并发会话(肯定会有的话),这似乎会是一个噩梦。
有人对如何处理这个问题有什么建议,无论是在客户端还是在数据库端?我真的能做些什么,还是这样的问题通常只能由客户端来解决?
编辑: 我应该提到,智能手机应用程序允许用户通过“无限滚动”查看更多数据,因此它在客户端自己维护了一个数据列表。
英文:
I'm creating a RESTful web service (in Golang) which pulls a set of rows from the database and returns it to a client (smartphone app or web application). The service needs to be able to provide paging. The only problem is this data is sorted on a regularly changing "computed" column (for example, the number of "thumbs up" or "thumbs down" a piece of content on a website has), so rows can jump around page numbers in between a client's request.
I've looked at a few PostgreSQL features that I could potentially use to help me solve this problem, but nothing really seems to be a very good solution.
- Materialized Views: to hold "stale" data which is only updated every once in a while. This doesn't really solve the problem, as the data would still jump around if the user happens to be paging through the data when the Materialized View is updated.
- Cursors: created for each client session and held between requests. This seems like it would be a nightmare if there are a lot of concurrent sessions at once (which there will be).
Does anybody have any suggestions on how to handle this, either on the client side or database side? Is there anything I can really do, or is an issue such as this normally just remedied by the clients consuming the data?
Edit: I should mention that the smartphone app is allowing users to view more pieces of data through "infinite scrolling", so it keeps track of it's own list of data client-side.
答案1
得分: 15
这是一个没有完全令人满意解决方案的问题,因为你试图结合本质上不兼容的要求:
-
只向客户端按需发送所需的数据,即不能下载整个数据集然后在客户端进行分页。
-
最小化服务器必须跟踪的每个客户端状态的数量,以便与大量客户端的可扩展性相适应。
-
为每个客户端维护不同的状态。
这是一种“选择其中两个”的情况。你必须做出妥协;接受无法完全保持每个客户端的分页状态正确,接受必须将大型数据集下载到客户端,或接受必须使用大量服务器资源来维护客户端状态。
在这些妥协中有一些变化,它们混合了各种妥协,但归根结底都是这样。
例如,有些人会向客户端发送一些额外的数据,足以满足大多数客户端的要求。如果客户端超出了这个限制,那么分页就会出错。
一些系统会在短时间内缓存客户端状态(使用短期未记录的表、临时文件等),但会快速过期,因此如果客户端不断请求新数据,它的分页就会出错。
等等。
另请参阅:
- https://stackoverflow.com/q/13144005/398670
- https://stackoverflow.com/q/13142834/398670
- https://stackoverflow.com/q/24688135/398670
- https://stackoverflow.com/q/20340919/398670
- https://stackoverflow.com/q/16916633/398670
- https://stackoverflow.com/q/16839881/398670
我可能会实现一种混合解决方案,例如:
-
使用游标,读取并立即将数据的第一部分发送给客户端。
-
立即从游标中获取足够的额外数据,以满足99%的客户端需求。将其存储到快速、不安全的缓存(如memcached、Redis、BigMemory、EHCache等)中,使用一个键将其存储起来,以便后续请求可以通过相同的客户端检索它。然后关闭游标以释放数据库资源。
-
按最近最少使用的方式过期缓存,因此如果客户端读取速度不够快,它们必须从数据库中获取新的数据集,并且分页会发生变化。
-
如果客户端需要的结果超过大多数同行,分页将在某个时刻发生变化,从而切换到直接从数据库读取而不是从缓存读取,或生成一个新的更大的缓存数据集。
这样,大多数客户端不会注意到分页问题,而且你不必向大多数客户端发送大量数据,但你不会使数据库服务器崩溃。然而,你需要一个大而强大的缓存来实现这一点。它的实用性取决于你的客户端是否能够处理分页中断-如果分页中断是不可接受的,那么你只能在数据库端使用游标、临时表,在第一次请求时复制整个结果集等。它还取决于数据集的大小和每个客户端通常需要的数据量。
英文:
This is a problem without a perfectly satisfactory solution because you're trying to combine essentially incompatible requirements:
-
Send only the required amount of data to the client on-demand, i.e. you can't download the whole dataset then paginate it client-side.
-
Minimise amount of per-client state that the server must keep track of, for scalability with large numbers of clients.
-
Maintain different state for each client
This is a "pick any two" kind of situation. You have to compromise; accept that you can't keep each client's pagination state exactly right, accept that you have to download a big data set to the client, or accept that you have to use a huge amount of server resources to maintain client state.
There are variations within those that mix the various compromises, but that's what it all boils down to.
For example, some people will send the client some extra data, enough to satisfy most client requirements. If the client exceeds that, then it gets broken pagination.
Some systems will cache client state for a short period (with short lived unlogged tables, tempfiles, or whatever), but expire it quickly, so if the client isn't constantly asking for fresh data its gets broken pagination.
Etc.
See also:
- https://stackoverflow.com/q/13144005/398670
- https://stackoverflow.com/q/13142834/398670
- https://stackoverflow.com/q/24688135/398670
- https://stackoverflow.com/q/20340919/398670
- https://stackoverflow.com/q/16916633/398670
- https://stackoverflow.com/q/16839881/398670
I'd probably implement a hybrid solution of some form, like:
-
Using a cursor, read and immediately send the first part of the data to the client.
-
Immediately fetch enough extra data from the cursor to satisfy 99% of clients' requirements. Store it to a fast, unsafe cache like memcached, Redis, BigMemory, EHCache, whatever under a key that'll let me retrieve it for later requests by the same client. Then close the cursor to free the DB resources.
-
Expire the cache on a least-recently-used basis, so if the client doesn't keep reading fast enough they have to go get a fresh set of data from the DB, and the pagination changes.
-
If the client wants more results than the vast majority of its peers, pagination will change at some point as you switch to reading direct from the DB rather than the cache or generate a new bigger cached dataset.
That way most clients won't notice pagination issues and you don't have to send vast amounts of data to most clients, but you won't melt your DB server. However, you need a big boofy cache to get away with this. Its practical depends on whether your clients can cope with pagination breaking - if it's simply not acceptable to break pagination, then you're stuck with doing it DB-side with cursors, temp tables, coping the whole result set at first request, etc. It also depends on the data set size and how much data each client usually requires.
答案2
得分: 0
我对这个问题没有一个完美的解决方案。但是,如果你希望用户对数据有一个陈旧的视图,那么游标是一个不错的选择。你唯一可以调整的是只在游标中存储前两页的数据。超过这个范围,你需要再次获取数据。
英文:
I am not aware of a perfect solution for this problem. But if you want the user to have a stale view of the data then cursor is the way to go. Only tuning you can do is to store only the data for 1st 2 pages in the cursor. Beyond that you fetch it again.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论