Role of command-runner.jar and script-runner.jar in AWS EMR
Question
When we execute a Spark job in an EMR cluster, we add a step like this:
'HadoopJarStep': {
    'Args': [
        'spark-submit',
        's3://spark-test-bucket-pr/spark_job/spark_job_3.py'
    ],
    'Jar': 'command-runner.jar'
}
I want to understand the role of the 'command-runner.jar' file here, because when we run a Spark job on an on-premise Hadoop cluster we don't use any such jar.
Also, to execute a shell script in EMR we use the script-runner.jar file.
Can anyone help me understand the role of these two jars, and why we need them to execute Spark jobs in an EMR cluster?
Answer 1
Score: 1
These are helper jar files provided by AWS. When you run on-premise, you talk to the Hadoop cluster directly. With AWS EMR, there is an extra layer of AWS-specific interfaces: work is submitted to the cluster as steps, and each step names a jar to run.
You can run the same EMR job with either script-runner or command-runner. AFAIK, with command-runner you could also specify a custom jar file that has your business logic.
More details with examples here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-commandrunner.html#emr-commandrunner-examples
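To make the difference between the two jars concrete, here is a minimal Python sketch (not from the AWS docs verbatim) that builds one step of each kind and submits them with boto3. The helper names (spark_submit_step, shell_script_step, submit_steps), the example-bucket script path, and the cluster ID are hypothetical placeholders. It assumes the usual convention that command-runner.jar is referenced by its bare name, while script-runner.jar is referenced by its regional S3 location.

```python
def spark_submit_step(name, script_uri, action_on_failure="CONTINUE"):
    """Build a step that runs spark-submit through command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            # command-runner.jar is referenced by its bare name.
            "Jar": "command-runner.jar",
            # Args is the command line to execute on the cluster.
            "Args": ["spark-submit", script_uri],
        },
    }


def shell_script_step(name, script_uri, region, script_args=(),
                      action_on_failure="CONTINUE"):
    """Build a step that runs a shell script through script-runner.jar."""
    # Assumption: script-runner.jar is read from the regional EMR bucket.
    jar = f"s3://{region}.elasticmapreduce/libs/script-runner/script-runner.jar"
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": jar,
            # First argument is the script to run, the rest are its arguments.
            "Args": [script_uri, *script_args],
        },
    }


def submit_steps(cluster_id, steps, region):
    """Submit the steps to a running cluster (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders above work without it

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return response["StepIds"]


if __name__ == "__main__":
    steps = [
        spark_submit_step(
            "spark job",
            "s3://spark-test-bucket-pr/spark_job/spark_job_3.py",
        ),
        # Hypothetical script location, for illustration only.
        shell_script_step(
            "setup script",
            "s3://example-bucket/scripts/setup.sh",
            region="us-east-1",
        ),
    ]
    for step in steps:
        print(step["Name"], "->", step["HadoopJarStep"]["Jar"])
    # To actually submit, pass a real cluster ID (placeholder shown):
    # submit_steps("j-XXXXXXXXXXXXX", steps, region="us-east-1")
```

In both cases the step is still a HadoopJarStep. The jar is just a small launcher that EMR runs on the cluster, and the Args list carries the real command or script, which is why no such wrapper is needed when you call spark-submit yourself on an on-premise cluster.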