Post status of a long-running AWS Lambda function
Question
I have an AWS Lambda function that runs a machine learning model on the frames of a short video. It might take a few minutes to run, depending on the video.
I invoke this Lambda asynchronously from another Lambda. The second Lambda just launches the first one and returns the S3 location where the results will be stored.
I would like some way of checking the status of the Lambda, for example what percentage of the frames has been processed.
Right now I am just writing the status to the S3 location and periodically checking that file. Is there a better way of storing and retrieving this kind of status information?
For some context, this is just an educational project, so there are no real requirements. The S3 method works fine for my use case, but I'm wondering what would be best for a real production system.
Answer 1
Score: 1
I can think of a couple of approaches:
- What you did is actually fine even for a production system. I would probably use DynamoDB instead of S3 for this, as it's faster and probably cheaper at large scale for this use case.
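- If there are clear steps that you could break your processing down into, then you could use Step Functions instead of a single Lambda, and then check the state of the workflow via the API to see how far processing has progressed. (I'd lean towards Standard Workflows here rather than Express: Express Workflows are capped at five minutes and, as far as I know, their executions can't be inspected with DescribeExecution.)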
- Finally, if every frame can be processed independently, you could put an SQS queue in between and parallelise the processing. This would not only improve performance and error handling, but also enable progress tracking by checking the queue size (though that might not be trivial if you can have multiple jobs in progress at the same time).
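
To make the DynamoDB suggestion a bit more concrete, here is a minimal sketch of how the worker Lambda could record progress and how a poller could read it back. It is only an illustration under assumptions: the table name `video-job-status`, the partition key `job_id`, and the attribute names `frames_done`, `frames_total` and `updated_at` are all hypothetical and not from the question. The table is passed in as an argument so the helpers can be tried without AWS access.

```python
import time


def get_status_table(table_name="video-job-status"):
    """Return a boto3 DynamoDB Table resource.

    The table name is a made-up example; the table is assumed to have
    a string partition key called "job_id".
    """
    import boto3  # imported lazily so the helpers below work without AWS

    return boto3.resource("dynamodb").Table(table_name)


def report_progress(table, job_id, frames_done, frames_total):
    """Worker side: overwrite the progress counters for one job."""
    table.update_item(
        Key={"job_id": job_id},
        UpdateExpression=(
            "SET frames_done = :d, frames_total = :t, updated_at = :u"
        ),
        ExpressionAttributeValues={
            ":d": frames_done,
            ":t": frames_total,
            ":u": int(time.time()),  # lets a poller spot stalled jobs
        },
    )


def read_progress(table, job_id):
    """Poller side: return percent complete, or None if nothing is known."""
    response = table.get_item(Key={"job_id": job_id}, ConsistentRead=True)
    item = response.get("Item")
    if item is None or not item["frames_total"]:
        return None
    # DynamoDB numbers come back as Decimal, so convert before dividing.
    return 100.0 * float(item["frames_done"]) / float(item["frames_total"])
```

In the worker you would call `report_progress(get_status_table(), job_id, i + 1, total)` every few frames rather than on every frame, to keep the number of writes down. The status Lambda (or whatever sits behind your API) would then call `read_progress` with the same `job_id`.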