What is the correct way to use AddJobFlowSteps in the AWS EMR SDK?
Question
I've used the Go AWS SDK to create a cluster and add a job flow step to it.
However, the step always fails when I add it programmatically.
Interestingly, when I attach the jar from the UI, it executes successfully.
When the jar is attached from the UI, this is the outcome of the step execution (it runs successfully and moves to the COMPLETED state). I'm copying the full text:
> JAR location: command-runner.jar
> Main class: None
> Arguments: spark-submit --deploy-mode cluster --class Hello s3://mdv-testing/Util-assembly-1.0.jar
> Action on failure: Continue
However, this is the output of the step when I add it programmatically:
> Status: FAILED
> Reason: Main Class not found.
> Log File: s3://mdv-testing/awsLogs/j-3RW9K14BS6GLO/steps/s-337M25MLV3BHT/stderr.gz
> Details: Caused by: java.lang.ClassNotFoundException: scala.reflect.api.TypeCreator
> JAR location: s3://mdv-testing/Util-assembly-1.0.jar
> Main class: None
> Arguments: spark-submit "--class Hello"
> Action on failure: Cancel and wait
I tried various combinations of arguments and realised that command-runner.jar was never present.
I changed the code accordingly and now send command-runner.jar as the JAR location. The step now shows the same details as the one that executes successfully.
This is the revised output:
> Status: FAILED
> Reason: Unknown Error.
> Log File: s3://mdv-testing/awsLogs/j-3RW9K14BS6GLO/steps/s-3NI5ZO15VTWQK/
> JAR location: command-runner.jar
> Main class: None
> Arguments: "spark-submit --deploy-mode cluster --class Hello s3://mdv-testing/Util-assembly-1.0.jar"
> Action on failure: Cancel and wait
Go Code
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/emr"
)

func main() {
	sess := session.New(&aws.Config{Region: aws.String("us-east-1")})
	svc := emr.New(sess)

	params := &emr.AddJobFlowStepsInput{
		JobFlowId: aws.String("j-3RW9K14BS6aaa"),
		Steps: []*emr.StepConfig{
			{
				ActionOnFailure: aws.String("CANCEL_AND_WAIT"), //TERMINATE_CLUSTER"),
				HadoopJarStep: &emr.HadoopJarStepConfig{
					Args: []*string{
						aws.String("spark-submit --deploy-mode cluster --class Hello s3://mdv-testing/Util-assembly-1.0.jar"),
					},
					Jar: aws.String("command-runner.jar"),
				},
				Name: aws.String("ReportJarExecution"),
			},
		},
	}

	resp, err := svc.AddJobFlowSteps(params)
	if err != nil {
		// Print the error, cast err to awserr.Error to get the Code and
		// Message from an error.
		fmt.Println(err.Error())
		return
	}

	// Pretty-print the response data.
	fmt.Println(resp)
}
Can someone please help? I think I'm close to the solution, but it keeps evading me.
Answer 1
Score: 0
I managed to solve this issue.
For anyone struggling with something similar: the arguments must be sent separately, each as its own element of the Args array, rather than joined into a single string.