AWS Lambda query to Redshift once a day

Question

I am fairly new to the AWS ecosystem, especially the data side.

I have a project that requires me to automatically run a query against a table in Redshift every 24 hours, possibly remove a few columns from the query results, and then call some endpoints at a third-party site through a RESTful API for further checking.

I have a few questions about this:

  1. Is it a good usage pattern to use AWS Lambda (Python) and Redshift for such a task?
  2. Should I choose Java, Python, or Node.js for AWS Lambda? Which one has better support for querying Redshift?
  3. Both Lambda and Redshift would be in the same VPC, using the same private subnets with a NAT gateway for egress. Is this a secure setup?
  4. Is there any sample code to share for this setup?
  5. Does AWS Lambda have a regular scheduler that can trigger every 24 hours, or is it purely event-based?
  6. Since the application database is in DynamoDB, would it be more efficient and easier to set up AWS Lambda to query DynamoDB for similar data instead?

Thanks,
Sam.

Answer 1

Score: 2

I'll try to answer your questions as best I can:

  1. Yes, there is no reason not to do this.
  2. It depends solely on your preference. All of these languages support your use case.
  3. This is perfectly fine. Since you're managing further access rights with IAM, you just have to make sure that the egress traffic from your Lambda function is properly monitored.
  4. There is a lot out there. Just have a look.
  5. You can set up a CloudWatch rule with a cron expression that invokes your function on the schedule you need. You can also set up many other triggers for your functions, such as DynamoDB streams or CloudWatch log events; there are endless possibilities.
  6. If you just want to run a query regularly to gather some data, it makes no difference where your data is actually stored.
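
The scheduled rule from point 5 could be created roughly as follows. This is only a sketch using the boto3 `events` and `lambda` clients; the rule name, the 03:00 UTC cron expression, and the helper name `schedule_daily` are assumptions made for illustration, not something from the question.

```python
def schedule_daily(events_client, lambda_client, function_arn,
                   rule_name="daily-redshift-query",      # hypothetical rule name
                   schedule="cron(0 3 * * ? *)"):         # assumed: 03:00 UTC every day
    """Create a scheduled rule and point it at an existing Lambda function."""
    # 1. Create (or update) the rule. "rate(1 day)" would also work as a schedule.
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )

    # 2. Allow the events service to invoke the function on behalf of this rule.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )

    # 3. Register the function as the rule's target.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```

In a real account this would be called once with `boto3.client("events")` and `boto3.client("lambda")`; the clients are passed in as parameters so the function can be exercised with stubs. Note that `add_permission` fails if a statement with the same `StatementId` already exists, so the sketch is meant for first-time setup.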

Answer 2

Score: 0

Normally you will find that many AWS tools can solve the same problem.
The right choice depends on your priorities. What are you looking for: lowest cost, efficiency, or convenience?

I answer your questions below:

Is it a good usage pattern to use AWS Lambda (Python) and Redshift for such task?
Yes, it's OK. Redshift is normally a very expensive service, though. Are you sure you need Redshift here?

Should I choose Java vs Python vs NodeJS for AWS Lambda? Which one has a better support for querying Redshift?
Java has noticeably slower cold starts; if you want to avoid them, you would need something like an EventBridge rule invoking the function every 5 minutes or so to keep it warm. Apart from that, it really is up to you.

Both Lambda and Redshift would be in the same VPC, and using the same private subnets for egress NAT gateway, is this secured setup?
It's OK, but again, NAT gateways are expensive. Depending on the problem you are trying to solve, there might be a workaround.

Any sample code to share on this setup?
https://aws.amazon.com/blogs/big-data/building-an-event-driven-application-with-aws-lambda-and-the-amazon-redshift-data-api/
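
In the spirit of that post, a Python handler built on the Redshift Data API might look roughly like this. It is a minimal sketch, not code from the linked article: the environment variable names (`CLUSTER_ID`, `DATABASE`, `DB_USER`, `QUERY_SQL`, `DROP_COLUMNS`, `THIRD_PARTY_URL`) and the helper names are assumptions, and authentication through `DbUser` (temporary credentials) is just one option; a Secrets Manager `SecretArn` is another.

```python
import json
import os
import time
import urllib.request


def _field_value(field):
    """Unwrap one Data API field, e.g. {"stringValue": "x"} or {"isNull": True}."""
    if field.get("isNull"):
        return None
    return next(iter(field.values()))


def run_query(client, sql, *, cluster_id, database, db_user,
              poll_seconds=1.0, timeout_seconds=300.0):
    """Run a SELECT through the Data API and return the rows as a list of dicts."""
    stmt_id = client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, DbUser=db_user, Sql=sql
    )["Id"]

    # The Data API is asynchronous, so poll until the statement has finished.
    deadline = time.monotonic() + timeout_seconds
    while True:
        desc = client.describe_statement(Id=stmt_id)
        if desc["Status"] == "FINISHED":
            break
        if desc["Status"] in ("FAILED", "ABORTED"):
            raise RuntimeError(f"query {desc['Status']}: {desc.get('Error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query {stmt_id} did not finish in time")
        time.sleep(poll_seconds)

    # Results are paginated; follow NextToken until it is absent.
    rows, token = [], None
    while True:
        kwargs = {"Id": stmt_id}
        if token:
            kwargs["NextToken"] = token
        page = client.get_statement_result(**kwargs)
        names = [col["name"] for col in page["ColumnMetadata"]]
        for record in page["Records"]:
            rows.append({n: _field_value(f) for n, f in zip(names, record)})
        token = page.get("NextToken")
        if not token:
            return rows


def drop_columns(rows, unwanted):
    """Return copies of the rows without the unwanted columns."""
    unwanted = set(unwanted)
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


def post_json(url, payload, timeout=10):
    """POST the payload as JSON to the third-party endpoint; return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above can be tested without it

    client = boto3.client("redshift-data")
    rows = run_query(
        client,
        os.environ["QUERY_SQL"],            # all variable names here are hypothetical
        cluster_id=os.environ["CLUSTER_ID"],
        database=os.environ["DATABASE"],
        db_user=os.environ["DB_USER"],
    )
    trimmed = drop_columns(rows, os.environ.get("DROP_COLUMNS", "").split(","))
    status = post_json(os.environ["THIRD_PARTY_URL"], trimmed)
    return {"rows": len(trimmed), "status": status}
```

The Data API is called over HTTPS with IAM credentials, so no database driver has to be packaged with the function. The polling loop does count against the Lambda timeout, so the function's timeout setting should be longer than the query is expected to run.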

Does AWS Lambda have a regular scheduler to trigger every 24hrs? Or is it simply based on events?
Yes, you can use a cron expression or similar to schedule the Lambda trigger with EventBridge.

Since application database is in DynamoDB, is it more efficient and easier to setup for AWS Lambda to query DynamoDB for similar data instead?
I'm a bit confused by this last question, but it is normally very easy to query DynamoDB from Lambda.
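
To give an idea of how little code that takes, here is a rough sketch of a partition-key query with the low-level boto3 DynamoDB client. The table name, key name, and helper name are hypothetical, and the key is assumed to be a string attribute.

```python
def query_by_partition_key(client, table_name, key_name, key_value):
    """Return every item whose partition key equals key_value (assumed to be a string)."""
    kwargs = {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk",
        # Placeholders avoid clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#pk": key_name},
        "ExpressionAttributeValues": {":pk": {"S": key_value}},
    }
    items = []
    while True:
        page = client.query(**kwargs)
        items.extend(page.get("Items", []))
        # A single response is capped in size, so keep going while a start key is returned.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key
```

Inside a Lambda handler this could be called as `query_by_partition_key(boto3.client("dynamodb"), "orders", "customer_id", "42")`, where the table and key are made up for the example. Items come back in DynamoDB's typed format, for example `{"customer_id": {"S": "42"}}`.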

Edit: Typo

Posted by huangapple on July 24, 2020 at 14:03:29
Source: https://go.coder-hub.com/63067734.html