AWS VPC for Glue Job accessing SOAP API
Question
I am just getting started with AWS. I am building an AWS Glue script that connects to a given SOAP API, downloads the response data (in XML), and saves it to S3.
My question is: do I need to set up a VPC with subnets, IP addresses (private/public), etc. on AWS for the Glue job to work, i.e. to connect to the SOAP API and extract data? I searched a lot on the internet but didn't find a concrete answer. If a VPC is indeed required, please suggest some resources that would help me configure it on AWS for the Glue job.
Please let me know if additional information is required about my question.
Thanks.
Answer 1
Score: 1
No VPC configuration is needed. By default, an AWS Glue job has normal internet access, because it runs in a VPC that AWS manages for you.
So as long as your SOAP API is reachable over the internet, it should work fine.
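For the public-internet case, a minimal sketch of the Glue-side call could look like the code below. It uses only the Python standard library for the HTTP request and takes the S3 client as a parameter. The endpoint URL, SOAPAction value, body fragment, bucket, and key are all placeholders I made up for illustration, and I am assuming a SOAP 1.1 service; your API's WSDL defines the real values.

```python
import urllib.request

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_soap_envelope(body_xml: str) -> bytes:
    """Wrap an operation-specific XML fragment in a SOAP 1.1 envelope."""
    envelope = (
        '<?xml version="1.0" encoding="utf-8"?>'
        f'<soap:Envelope xmlns:soap="{SOAP_ENV_NS}">'
        f"<soap:Body>{body_xml}</soap:Body>"
        "</soap:Envelope>"
    )
    return envelope.encode("utf-8")


def build_soap_request(url: str, soap_action: str, body_xml: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP POST that carries the SOAP call."""
    return urllib.request.Request(
        url,
        data=build_soap_envelope(body_xml),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f'"{soap_action}"',
        },
        method="POST",
    )


def fetch_and_store(s3_client, request, bucket: str, key: str, timeout: int = 30) -> int:
    """Send the request and write the raw XML response to S3.

    s3_client is expected to be boto3.client("s3"); returns the number of bytes stored.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        xml_bytes = response.read()
    s3_client.put_object(Bucket=bucket, Key=key, Body=xml_bytes)
    return len(xml_bytes)


# Hypothetical usage inside the Glue script (all names are placeholders):
# import boto3
# req = build_soap_request("https://example.com/service", "urn:GetData", "<GetData/>")
# fetch_and_store(boto3.client("s3"), req, "my-bucket", "raw/response.xml")
```

The Glue job's IAM role would also need permission to write to the target bucket.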
However, if your API is reachable only from a private network, you need to make sure that the Glue job is attached to a VPC that is part of that network, with the correct VPC security groups.
In that case I would recommend, if possible, making the API available over the internet and, for example, getting the credentials for it from AWS Secrets Manager.
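Reading the credentials at run time might look roughly like this. It is only a sketch: the secret name and the idea that the secret is stored as a JSON object with "username" and "password" keys are my assumptions, not something your setup necessarily uses.

```python
import json


def load_api_credentials(secrets_client, secret_id: str) -> dict:
    """Fetch a secret stored as a JSON string and return it as a dict.

    secrets_client is expected to be boto3.client("secretsmanager").
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


# Hypothetical usage (the secret name is a placeholder):
# import boto3
# creds = load_api_credentials(boto3.client("secretsmanager"), "soap-api/credentials")
# username, password = creds["username"], creds["password"]
```

The role that runs the code needs the secretsmanager:GetSecretValue permission on that secret.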
But how to implement that?
Since the logic for calling the SOAP API requires some custom Python/Scala code (depending on which Glue runtime you are using), it might get complicated, as adding external libraries to a Glue job is additional work.
To decouple the Glue runtime from the SOAP API, I would recommend the following design:
- implement your custom logic in a Lambda function
- grant the Glue IAM role permission to invoke that function
- invoke the Lambda from your Glue code (there is an example of how to do that here: https://stackoverflow.com/a/50542986/6639950)
You probably just want to fetch some data in one or a few different ways, so in my opinion this is the best way to do it. It lets you use your runtime of choice in Lambda and develop the function easily, including with custom libraries for SOAP API calls.
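Steps 2 and 3 of that design can be sketched as follows. The function name "soap-fetcher", the ARN, and the shape of the payload are made up for illustration; the Lambda client is passed in so the helper can be tried without AWS access.

```python
import json


def lambda_invoke_policy(function_arn: str) -> dict:
    """IAM policy document that lets a role (here, the Glue job role) invoke one function."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": function_arn,
            }
        ],
    }


def invoke_soap_lambda(lambda_client, function_name: str, params: dict) -> dict:
    """Call the Lambda synchronously and return its decoded JSON result.

    lambda_client is expected to be boto3.client("lambda").
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function to finish
        Payload=json.dumps(params).encode("utf-8"),
    )
    payload = json.loads(response["Payload"].read())
    # "FunctionError" is present in the response only when the function itself failed.
    if "FunctionError" in response:
        raise RuntimeError(f"Lambda {function_name} failed: {payload}")
    return payload


# Hypothetical usage inside the Glue script:
# import boto3
# result = invoke_soap_lambda(boto3.client("lambda"), "soap-fetcher", {"operation": "GetData"})
```

One thing to keep in mind: synchronous Lambda invocations have a 6 MB payload limit, so for large XML responses it may be safer to have the Lambda write the file to S3 itself and return only the object key to Glue.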