How to get a group-by count of millions of records in DynamoDB in the fastest way
Question

I have rows like the following in DynamoDB (example data below), and I want a count in the format shown. Currently I run a Query and paginate through the results, but it's terribly slow because there are millions of rows. Is there a faster way to do this, given that I only need the count and not the individual items?

Example Data

BrandName BrandCode Eventid
ABC       123       30100
ABC       123       30111
XYZ       456       30100
XYZ       456       30111

Output

Number of events: 2

The count is 2 because there are only 2 distinct types of events based on the Eventid.

Note: The main intent of the application is to store the events that come from an external system. We only want this count as an audit, to check how many events were consumed and how many were persisted.

Answer 1

Score: 2

To achieve this, you will need to use DynamoDB Streams and a Lambda window function (AWS calls this feature a tumbling window).

Essentially, you stream all item modifications to a Lambda function that listens for INSERT and REMOVE events. Set the Lambda window to 1 minute, for example; inside the function, the code sums the individual counts and writes the result back to a single item in DynamoDB. Instead of running a paginated Query, you then only have to do a GetItem. Of course, the count is eventually consistent: it can lag behind the table by up to the Lambda time window.
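As a rough illustration of the idea above, here is a minimal Python sketch of such a Lambda handler. It is not a definitive implementation and rests on several assumptions that are not in the original post: the stream view type includes both new and old images, the counts live in a hypothetical table named EventCounts with a hypothetical key pk = "TOTALS", and each Eventid becomes one numeric attribute on that single item. The write_totals parameter is only a testing hook added for this sketch.

```python
from collections import Counter


def count_deltas(records):
    """Return the net change in item count per Eventid for a batch of stream records."""
    deltas = Counter()
    for record in records:
        name = record.get("eventName")
        if name == "INSERT":
            image, step = record["dynamodb"]["NewImage"], 1
        elif name == "REMOVE":
            image, step = record["dynamodb"]["OldImage"], -1
        else:
            continue  # MODIFY does not change how many items exist
        # Stream images use DynamoDB JSON, e.g. {"N": "30100"} or {"S": "30100"}.
        event_id = next(iter(image["Eventid"].values()))
        deltas[event_id] += step
    return deltas


def write_totals_to_dynamodb(totals):
    """Add the window's deltas to one summary item (table and key names are hypothetical)."""
    import boto3  # imported lazily so the counting logic runs without AWS access

    table = boto3.resource("dynamodb").Table("EventCounts")
    for event_id, delta in totals.items():
        if delta:
            table.update_item(
                Key={"pk": "TOTALS"},
                UpdateExpression="ADD #e :d",
                ExpressionAttributeNames={"#e": event_id},
                ExpressionAttributeValues={":d": delta},
            )


def handler(event, context, write_totals=None):
    """Tumbling-window handler: carry running deltas in `state`, flush when the window ends."""
    totals = Counter(event.get("state") or {})
    totals.update(count_deltas(event.get("Records", [])))
    if event.get("isFinalInvokeForWindow"):
        (write_totals or write_totals_to_dynamodb)(dict(totals))
        return {"state": {}}
    return {"state": dict(totals)}
```

With this layout, a single GetItem on the summary item returns one attribute per Eventid, so the number of distinct events is the number of attributes whose count is greater than zero. A single item is limited to 400 KB, so this only suits a modest number of distinct Eventid values.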

This explains a very similar concept.

This image also depicts something similar where counts for a voting candidate are summed and written back to an item storing the total.

huangapple
  • Posted on February 16, 2023, 19:24:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75471575.html