How to pass EMR Serverless PySpark EntryPointArguments as variables
Question
I have an EMR Serverless PySpark job that I launch from a Step Functions state machine. I am trying to pass arguments to SparkSubmit through EntryPointArguments, using variables set at the beginning of the state machine (e.g. today_date, source, tuned_parameters), which I then use in the PySpark code.
I found a partial solution in this post; however, I am trying to pass variables from the state machine rather than a hardcoded argument such as "prd".
"JobDriver": {
  "SparkSubmit": {
    "EntryPoint": "s3://xxxx-my-code/test/my_code_edited_3.py",
    "EntryPointArguments": ["-env", "prd", "-source.$", "$.source"]
  }
}
Using argparse, I can read the first argument, -env, and it correctly returns "prd"; however, I can't figure out how to pass a variable for the source argument.
Answer 1 (score: 2)
I found the answer to this question. Variable arguments can be passed to EMR Serverless SparkSubmit using Amazon States Language (ASL) intrinsic functions.
Given the following JSON input to the state machine:
{
  "source": "mysource123"
}
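The .$ suffix must go on the EntryPointArguments key itself: it tells Step Functions to evaluate the value (here, a States.Array call) instead of passing it as a literal string. Inside a plain array, as in the question, both "-source.$" and "$.source" are passed to the script as literal strings. The correct way to pass the variable is therefore: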
"EntryPointArguments.$": "States.Array('-source', $.source)"
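States.Array accepts any number of arguments, so several variables from the question can be passed in one call by interleaving quoted flags with JSONPath values. The sketch below builds a complete Task state as a Python dict and prints it as JSON. The application ID, role ARN and the -today_date flag are hypothetical placeholders, and it assumes every referenced input value is a string, since EntryPointArguments is a list of strings.

```python
import json

# Hypothetical placeholders: substitute your own application ID and role ARN.
APPLICATION_ID = "example-application-id"
EXECUTION_ROLE_ARN = "arn:aws:iam::123456789012:role/example-emr-serverless-role"
ENTRY_POINT = "s3://xxxx-my-code/test/my_code_edited_3.py"


def build_start_job_run_state():
    """Return an ASL Task state that starts an EMR Serverless job run.

    Because the key ends in ".$", Step Functions evaluates the States.Array
    call against the state input instead of passing the string literally.
    """
    return {
        "Type": "Task",
        # ".sync" makes the state wait until the job run finishes.
        "Resource": "arn:aws:states:::emr-serverless:startJobRun.sync",
        "Parameters": {
            "ApplicationId": APPLICATION_ID,
            "ExecutionRoleArn": EXECUTION_ROLE_ARN,
            "JobDriver": {
                "SparkSubmit": {
                    "EntryPoint": ENTRY_POINT,
                    # Flags are quoted literals; values are JSONPaths into the input.
                    # "-today_date" is a hypothetical extra flag for illustration.
                    "EntryPointArguments.$": (
                        "States.Array('-env', 'prd', "
                        "'-source', $.source, "
                        "'-today_date', $.today_date)"
                    ),
                }
            },
        },
        "End": True,
    }


print(json.dumps({"StartJobRun": build_start_job_run_state()}, indent=2))
```

With this state, the execution input would need to contain both keys, e.g. {"source": "mysource123", "today_date": "2023-01-01"}; a JSONPath that does not exist in the input is expected to make the state fail.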
The PySpark job running in EMR Serverless can then read the value with argparse:
import argparse

# EntryPointArguments arrive as ordinary command-line arguments (sys.argv[1:]).
parser = argparse.ArgumentParser()
parser.add_argument("-source")
args = parser.parse_args()
print(args.source)
The print statement outputs mysource123.
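To read several arguments, the parser can declare one option per flag. The following is a minimal sketch, assuming a hypothetical -today_date flag alongside -env and -source. parse_job_args is a hypothetical helper that accepts an explicit argument list so it can be tried locally without Spark; inside the job, calling parse_job_args() with no argument makes argparse read sys.argv[1:].

```python
import argparse


def parse_job_args(argv=None):
    """Parse the arguments passed to the script via EntryPointArguments.

    With argv=None, argparse reads sys.argv[1:], which is where the
    arguments land when the script runs as the EMR Serverless entry point.
    """
    parser = argparse.ArgumentParser(description="EMR Serverless PySpark job")
    parser.add_argument("-env", default="prd")
    parser.add_argument("-source", required=True)  # fail fast if missing
    parser.add_argument("-today_date")  # hypothetical extra argument
    return parser.parse_args(argv)


# Simulate the list that States.Array('-env', 'prd', '-source', $.source,
# '-today_date', $.today_date) would produce for the sample input.
args = parse_job_args(
    ["-env", "prd", "-source", "mysource123", "-today_date", "2023-01-01"]
)
print(args.env, args.source, args.today_date)  # prints: prd mysource123 2023-01-01
```

Marking -source as required makes a misconfigured state machine fail immediately with a clear argparse error instead of running the job with source set to None.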