What is the correct way to use the AddJobFlowStep in the AWS EMR SDK?

Question

I've used the Go AWS SDK to create a cluster and added a job flow step to it.
However, the step always fails when I add it programmatically.
Interestingly, when I attach the JAR from the UI, it executes successfully.

So when the JAR is attached from the UI, this is the outcome of the step execution (it runs successfully and moves to the COMPLETED state). I'm copying the full text:

> JAR location: command-runner.jar
> Main class: None
> Arguments: spark-submit --deploy-mode cluster --class Hello s3://mdv-testing/Util-assembly-1.0.jar
> Action on failure: Continue

However, this is the output of the step when I add it programmatically:

> Status: FAILED
> Reason: Main Class not found.
> Log File: s3://mdv-testing/awsLogs/j-3RW9K14BS6GLO/steps/s-337M25MLV3BHT/stderr.gz
> Details: Caused by: java.lang.ClassNotFoundException: scala.reflect.api.TypeCreator
> JAR location: s3://mdv-testing/Util-assembly-1.0.jar
> Main class: None
> Arguments: spark-submit "--class Hello"
> Action on failure: Cancel and wait

I tried various combinations of arguments and realised that command-runner.jar was never present.
I changed the code accordingly and now pass command-runner.jar as the step's JAR. The step now shows the same details as the one that executes successfully, but it still fails.
This is the revised output:

> Status: FAILED
> Reason: Unknown Error.
> Log File: s3://mdv-testing/awsLogs/j-3RW9K14BS6GLO/steps/s-3NI5ZO15VTWQK/
> JAR location: command-runner.jar
> Main class: None
> Arguments: "spark-submit --deploy-mode cluster --class Hello s3://mdv-testing/Util-assembly-1.0.jar"
> Action on failure: Cancel and wait

Go Code

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/emr"
)

func main() {
	sess := session.New(&aws.Config{Region: aws.String("us-east-1")})
	svc := emr.New(sess)

	params := &emr.AddJobFlowStepsInput{
		JobFlowId: aws.String("j-3RW9K14BS6aaa"),
		Steps: []*emr.StepConfig{
			{
				ActionOnFailure: aws.String("CANCEL_AND_WAIT"), // or "TERMINATE_CLUSTER"
				HadoopJarStep: &emr.HadoopJarStepConfig{
					// The whole command line is passed as a single string here.
					Args: []*string{
						aws.String("spark-submit --deploy-mode cluster --class Hello s3://mdv-testing/Util-assembly-1.0.jar"),
					},
					Jar: aws.String("command-runner.jar"),
				},
				Name: aws.String("ReportJarExecution"),
			},
		},
	}

	resp, err := svc.AddJobFlowSteps(params)
	if err != nil {
		// Print the error; cast err to awserr.Error to get the Code and
		// Message from an error.
		fmt.Println(err.Error())
		return
	}

	// Pretty-print the response data.
	fmt.Println(resp)
}
```

Can someone please help me? I think I'm pretty close to the solution, but it keeps evading me.

Answer 1

Score: 0

I managed to solve this issue.
For anyone struggling with something similar: the arguments need to be sent separately, each as its own element of the Args array, rather than as one combined string.

Posted by huangapple on March 21, 2017 at 17:47:37
Original link: https://go.coder-hub.com/42923413.html