How to group by and count millions of records in the fastest way in DynamoDB
Question
I have the rows below in DynamoDB (example data shown). I want to find the count in the format below. Currently I am using Query with pagination to achieve this, but it is terribly slow because there are millions of rows. Is there a faster way to do it, since I only need the count and not the individual items?
Example data:
BrandName  BrandCode  Eventid
ABC        123        30100
ABC        123        30111
XYZ        456        30100
XYZ        456        30111
Output:
Number of events: 2
The count is 2 because, based on Eventid, there are only two types of events (30100 and 30111), so I want the count to be 2.
Note: The main purpose of the application is to store events that come from an external system. We only want the above as an audit, to check how many events were consumed and how many were persisted.
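For reference, here is a minimal sketch of the Query-plus-pagination approach the question describes. It assumes a boto3-style Table.query callable (passed in as query) and an attribute named Eventid; the helper name count_distinct_event_ids is hypothetical, and the key condition must be supplied by the caller because the table's key schema is not given. It shows why this is slow: every matching item has to be read, page by page, just to produce one number.

```python
from typing import Any, Callable, Dict


def count_distinct_event_ids(query: Callable[..., Dict[str, Any]], **query_kwargs: Any) -> int:
    """Count distinct Eventid values by paging through a DynamoDB Query.

    `query` is assumed to behave like boto3's Table.query: it returns a dict
    with "Items" and, while more results remain, "LastEvaluatedKey".
    """
    kwargs = dict(query_kwargs)
    # Alias Eventid as #eid so the projection cannot collide with a reserved word.
    names = dict(kwargs.get("ExpressionAttributeNames", {}))
    names["#eid"] = "Eventid"
    kwargs["ExpressionAttributeNames"] = names
    kwargs["ProjectionExpression"] = "#eid"  # fetch only the attribute we need

    seen = set()
    while True:
        page = query(**kwargs)
        seen.update(item["Eventid"] for item in page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:  # no more pages
            return len(seen)
        kwargs["ExclusiveStartKey"] = last_key  # resume where the last page stopped
```

With a real table this might be called as count_distinct_event_ids(table.query, KeyConditionExpression=Key("BrandName").eq("ABC")), where treating BrandName as the partition key is only an assumption. Using Select="COUNT" instead would not help much: it counts items rather than distinct Eventids, and DynamoDB still reads (and charges read capacity for) every matching item.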
Answer 1
Score: 2
To achieve this, you will need DynamoDB Streams and a Lambda function that uses a tumbling window.
Essentially, you stream all item modifications to Lambda, which listens for INSERT and REMOVE events. Set the Lambda window to, say, 1 minute; inside the Lambda, the code sums the individual counts and writes the result back to a single item in DynamoDB. Now, instead of running a paginated Query, you only have to do a GetItem. Of course, the result is eventually consistent: how stale it can be depends on the Lambda time window.
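A minimal sketch of such a Lambda in Python, assuming the stream uses a tumbling window and the NEW_AND_OLD_IMAGES view type (so REMOVE records carry OldImage). The event fields Records, state and isFinalInvokeForWindow, and returning {"state": ...} to carry state between invocations, follow the Lambda tumbling-window contract; the counter table name EventCounters, its key, the count_<Eventid> attribute naming and all helper names are hypothetical. Per-Eventid counts are kept so that both per-event totals and the number of distinct event types can be read back with one GetItem.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

COUNTER_TABLE = "EventCounters"        # hypothetical table holding the totals
COUNTER_KEY = {"pk": "EVENT_COUNTS"}   # hypothetical key of the single counter item


def _event_id(image: Dict[str, Dict[str, str]]) -> str:
    # Stream images use DynamoDB typed JSON, e.g. {"Eventid": {"N": "30100"}}
    attr = image["Eventid"]
    return attr.get("N") or attr.get("S")


def aggregate(records: List[Dict[str, Any]], state: Dict[str, int]) -> Dict[str, int]:
    """Fold one batch of stream records into the window's per-Eventid deltas."""
    counts = Counter(state)
    for rec in records:
        if rec["eventName"] == "INSERT":
            counts[_event_id(rec["dynamodb"]["NewImage"])] += 1
        elif rec["eventName"] == "REMOVE":
            counts[_event_id(rec["dynamodb"]["OldImage"])] -= 1
        # MODIFY does not change the number of items, so it is ignored
    return {k: v for k, v in counts.items() if v != 0}


def write_totals(deltas: Dict[str, int]) -> None:
    """Apply the window's deltas to the counter item with one atomic ADD."""
    if not deltas:
        return
    import boto3  # imported lazily so the pure logic can run without AWS

    names, values, parts = {}, {}, []
    for i, (event_id, delta) in enumerate(deltas.items()):
        names[f"#c{i}"] = f"count_{event_id}"
        values[f":d{i}"] = delta
        parts.append(f"#c{i} :d{i}")
    boto3.resource("dynamodb").Table(COUNTER_TABLE).update_item(
        Key=COUNTER_KEY,
        UpdateExpression="ADD " + ", ".join(parts),
        ExpressionAttributeNames=names,
        ExpressionAttributeValues=values,
    )


def handler(event: Dict[str, Any], context: Any,
            writer: Callable[[Dict[str, int]], None] = write_totals) -> Dict[str, Any]:
    # Lambda passes in the state returned by the previous invocation in this window
    state = aggregate(event.get("Records", []), event.get("state") or {})
    if event.get("isFinalInvokeForWindow"):
        writer(state)  # one write per window instead of one per item
        return {"state": {}}
    return {"state": state}


def distinct_event_count(counter_item: Dict[str, Any]) -> int:
    """Number of Eventids whose running count in the counter item is positive."""
    return sum(1 for k, v in counter_item.items() if k.startswith("count_") and v > 0)
```

Reading the audit figure is then a single GetItem, e.g. distinct_event_count(table.get_item(Key=COUNTER_KEY)["Item"]). Tumbling-window state is tracked per shard, so each shard issues one UpdateItem per window; because ADD is atomic, several shards can safely update the same counter item.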
This explains a very similar concept.
This image also depicts something similar where counts for a voting candidate are summed and written back to an item storing the total.