What is the best way to use AWS Scheduler considering cost and performance?

Question

I'm working on a Java project that uploads files to an AWS S3 bucket. Now I need to process those files in S3 (validate them and send the data to a database) every day at 8:00 a.m. I'm planning to use an AWS scheduler for this, but I'm confused about which scheduler to use and how to use it. I went through the documentation and found AWS Batch and CloudWatch scheduled rules that trigger Lambda, but I have no idea which is the best fit for this scenario, and I'm not sure whether AWS Batch works for it. I also need to take cost into account.

I'd be glad if you could suggest the best way to solve this. Alternative approaches are also welcome.

P.S.: Processing the files will take more than 15 minutes. I also need to configure several other schedulers.

Answer 1

Score: 3

My proposed solution is:

  1. Use a CloudWatch rule to trigger a Lambda function at 8 a.m. (for example, SchedulerLambda).
  2. SchedulerLambda does NOT process any files; it only lists the files in the "defined" location.
  3. For each file, SchedulerLambda sends an SNS message to a topic.
  4. The SNS topic has an SQS subscription.
  5. The SQS queue has a Lambda trigger (for example, FileProcessorLambda).
  6. FileProcessorLambda processes messages in batches (max is 10). You can adjust the batch size to suit your use case.
  7. After FileProcessorLambda finishes a file, it also records the file's status in DynamoDB, so that processing can be retried and resumed at any time.
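The flow above can be sketched in plain Java without any AWS dependencies. This is a minimal, hypothetical simulation, not the author's code: an in-memory list stands in for the SNS topic and SQS queue, a map stands in for the DynamoDB status table, and the names FanOutPipeline, schedule and processAll are made up for illustration. It shows three ideas from the list: the scheduler only enumerates keys, files are consumed in batches of at most 10, and a per-file status lets a rerun skip files already marked DONE (an assumed interpretation of "resume").

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory stand-ins: `queue` plays SNS -> SQS and
// `statusTable` plays the DynamoDB table. No AWS SDK calls are made.
class FanOutPipeline {
    static final int BATCH_SIZE = 10; // assumed SQS -> Lambda batch size

    final List<String> queue = new ArrayList<>();
    final Map<String, String> statusTable = new LinkedHashMap<>();

    // Steps 2-3: only list keys under the prefix and enqueue one message
    // per file; no processing happens here. Files already DONE are skipped.
    void schedule(List<String> bucketKeys, String prefix) {
        for (String key : bucketKeys) {
            if (key.startsWith(prefix) && !"DONE".equals(statusTable.get(key))) {
                queue.add(key);
            }
        }
    }

    // Steps 5-7: drain the queue in batches of at most BATCH_SIZE and
    // record each file's outcome so a later run can retry failures.
    // Returns the number of batches (simulated Lambda invocations).
    int processAll(Predicate<String> processFile) {
        int batches = 0;
        while (!queue.isEmpty()) {
            int end = Math.min(BATCH_SIZE, queue.size());
            List<String> batch = new ArrayList<>(queue.subList(0, end));
            queue.subList(0, end).clear(); // remove the batch from the queue
            for (String key : batch) {
                statusTable.put(key, processFile.test(key) ? "DONE" : "FAILED");
            }
            batches++;
        }
        return batches;
    }
}
```

In the real design the queue is drained by concurrent Lambda invocations rather than one loop; the loop here only illustrates batching and resume behavior.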

Note: This design prioritizes cost, scalability, maintainability, and loose coupling.

Note: This assumes that processing a single file takes less than 15 minutes, which is the Lambda execution time limit. If processing one file takes longer than 15 minutes, the solution above won't work. I can suggest another solution if you confirm.
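Because FileProcessorLambda receives a batch of up to 10 files per invocation, the whole batch, not just one file, has to finish within the 15-minute limit. A minimal sketch of one way to guard against that, assuming a known worst-case time per file: stop starting new files when the remaining time could not cover another one, and return the keys that were not started. In a real Java handler the remaining time would come from Context.getRemainingTimeInMillis(), and the deferred keys could be reported back to SQS as batch item failures so they are redelivered; here both are replaced by plain JDK types, and TimeBudgetedBatch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical helper: processes files one by one while enough invocation
// time remains and returns the keys it did not start, so they can be retried.
class TimeBudgetedBatch {
    static List<String> processWithinBudget(List<String> keys,
                                            LongSupplier remainingMillis,
                                            long maxMillisPerFile,
                                            Consumer<String> processFile) {
        List<String> deferred = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            // If the remaining time cannot cover a worst-case file,
            // defer this file and every file after it.
            if (remainingMillis.getAsLong() < maxMillisPerFile) {
                deferred.addAll(keys.subList(i, keys.size()));
                break;
            }
            processFile.accept(keys.get(i));
        }
        return deferred;
    }
}
```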

Answer 2

Score: 1

One way (there are always many with AWS) is to use EventBridge (formerly CloudWatch Events) with AWS Lambda. I haven't worked with AWS Batch before.

Write and deploy your AWS Lambda function. In the function, read the files from the S3 bucket, validate them, and send the data to the database.
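A minimal sketch of the validate-then-send step, assuming (hypothetically) that each file contains comma-separated lines of the form id,name,amount; the question does not describe the real format. Reading the object from S3 and writing to the database are left out: valid rows are passed to a Consumer that stands in for the database insert, and invalid lines are returned so they can be logged. FileValidator and validateAndSend are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical record layout "id,name,amount" (an assumption for illustration).
class FileValidator {
    static final int EXPECTED_FIELDS = 3;

    // Passes valid rows to `sink` (a stand-in for the database write)
    // and returns the lines that failed validation.
    static List<String> validateAndSend(List<String> lines, Consumer<String[]> sink) {
        List<String> rejected = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",", -1); // -1 keeps empty trailing fields
            if (fields.length != EXPECTED_FIELDS || fields[0].isBlank() || !isNumber(fields[2])) {
                rejected.add(line);
            } else {
                sink.accept(fields);
            }
        }
        return rejected;
    }

    private static boolean isNumber(String s) {
        try {
            Double.parseDouble(s.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```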

Open the AWS Console and go to your Lambda function. Next, choose Add trigger and select EventBridge.

Now you can create a new rule. To make it run every day at 8 a.m., use the schedule expression cron(0 8 * * ? *).
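To make the schedule concrete, here is a small Java sketch of what cron(0 8 * * ? *) means: minute 0, hour 8, every day, evaluated in UTC. DailyUtcSchedule and nextRun are hypothetical names; the method simply returns the first 08:00 UTC strictly after a given moment, which can help when checking when the rule will next fire.

```java
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Mirrors cron(0 8 * * ? *): minute 0, hour 8, every day, in UTC.
class DailyUtcSchedule {
    static final LocalTime FIRE_TIME = LocalTime.of(8, 0);

    // Returns the first 08:00 UTC strictly after `now` (any input zone).
    static ZonedDateTime nextRun(ZonedDateTime now) {
        ZonedDateTime utcNow = now.withZoneSameInstant(ZoneOffset.UTC);
        ZonedDateTime candidate = utcNow.toLocalDate().atTime(FIRE_TIME).atZone(ZoneOffset.UTC);
        return candidate.isAfter(utcNow) ? candidate : candidate.plusDays(1);
    }
}
```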

Some things to keep in mind:

  • Don't forget that a Lambda invocation can never run longer than 15 minutes.
  • Schedule expressions are evaluated in UTC, not local time, so daylight saving time (DST) is an issue.
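The DST point can be checked with java.time. In this sketch the DstCheck class and the Europe/London zone are only illustrative assumptions. It shows that a fixed 08:00 UTC run lands at 09:00 local time in summer and 08:00 in winter for London and, conversely, that running at 8:00 a.m. London time would need hour 7 in the cron expression in summer and hour 8 in winter.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Shows why a fixed UTC cron hour drifts in local time across DST changes.
class DstCheck {
    // Local wall-clock time at which a fixed UTC time fires on `date`.
    static LocalTime localFireTime(LocalDate date, LocalTime utcTime, ZoneId zone) {
        return date.atTime(utcTime).atZone(ZoneOffset.UTC)
                   .withZoneSameInstant(zone).toLocalTime();
    }

    // UTC hour to put in the cron expression so the job fires at `localTime` on `date`.
    static int utcHourFor(LocalDate date, LocalTime localTime, ZoneId zone) {
        return date.atTime(localTime).atZone(zone)
                   .withZoneSameInstant(ZoneOffset.UTC).getHour();
    }
}
```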

huangapple
  • Published on July 29, 2020 at 12:41:54
  • If you repost this article, please keep its link: https://go.coder-hub.com/63146466.html