Programmatically cancelling a PySpark Dataproc batch job

Question

Using Go, I have several Dataproc batch jobs running, and I can access them by their UUID through a client created like this:

    batchClient, err := dataproc.NewBatchControllerClient(ctx, opts...)

If I want to delete a batch job, I can do it with Google Cloud's Go client library like this (the request contains the UUID of the batch):

    _, err := batchClient.DeleteBatch(ctx, request, opts...)

However, there doesn't seem to be any way to programmatically cancel a batch that is already running. If I try to delete a batch that is still running, I rightly get a FAILED_PRECONDITION error.

Now, I'm aware that the gcloud CLI from the Google Cloud SDK has a simple way to cancel a batch, like this:

gcloud dataproc batches cancel (BATCH : --region=REGION) [GCLOUD_WIDE_FLAG …]

Unfortunately, this approach is not a good fit for my application.

Answer 1

Score: 2

Before deleting a batch resource, you need to make sure that it is in a terminal state (either failed or succeeded).

To get a running batch into such a state, you need to cancel it via its associated long-running operation: https://cloud.google.com/dataproc-serverless/docs/reference/rest/v1/projects.locations.operations/cancel
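
As a rough illustration of the REST route linked above, the sketch below builds the `:cancel` URL for an operation and POSTs to it with an empty body, using only the Go standard library. The operation name in `main` is hypothetical; in practice it would be the value of the batch's `operation` field. The helper names `cancelURL` and `cancelOperation` are made up for this example, and obtaining an OAuth 2.0 access token is assumed to happen elsewhere.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"strings"
)

// apiRoot is the public Dataproc REST endpoint (v1).
const apiRoot = "https://dataproc.googleapis.com/v1/"

// cancelURL builds the operations.cancel URL for a full operation resource
// name, e.g. "projects/my-project/regions/us-central1/operations/abc123"
// (a hypothetical value).
func cancelURL(operationName string) string {
	return apiRoot + strings.TrimPrefix(operationName, "/") + ":cancel"
}

// cancelOperation sends the cancel request with an empty body.
// The caller supplies the HTTP client and a valid access token.
func cancelOperation(ctx context.Context, hc *http.Client, token, operationName string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, cancelURL(operationName), nil)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := hc.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("cancel %s: unexpected status %s", operationName, resp.Status)
	}
	return nil
}

func main() {
	// Only prints the URL; no network call is made here.
	fmt.Println(cancelURL("projects/my-project/regions/us-central1/operations/abc123"))
}
```

Cancellation is a request rather than a guarantee, so it is probably worth re-reading the batch state afterwards before attempting the delete.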

Answer 2

Score: 0

Support for serverless batch handling was added in version 2.0 of the Dataproc Go client library.

To use this version, the following import paths had to be updated:

    dataproc "cloud.google.com/go/dataproc/v2/apiv1"

    dataprocpb "cloud.google.com/go/dataproc/v2/apiv1/dataprocpb"

Afterwards, batchClient.CancelOperation can be used to cancel a serverless batch job, using the same client that is used to delete a batch job, like this:

    err := batchClient.CancelOperation(ctx, request, opts...)
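
Putting the two answers together, the overall flow might look like the sketch below: cancel the operation if the batch is not yet terminal, poll until it is, then delete. To keep it runnable without cloud credentials, it uses a hypothetical `batchAPI` interface and an in-memory `fakeAPI` rather than the real client; none of these names come from the library. With the real client, the cancel call takes a `*longrunningpb.CancelOperationRequest` whose `Name` is the batch's operation name. Treating CANCELLED as a terminal state alongside SUCCEEDED and FAILED is an assumption of this sketch.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// batchAPI is a hypothetical, simplified stand-in for the client calls used
// in this flow. It is NOT the signature set of the real Dataproc client.
type batchAPI interface {
	State(ctx context.Context, batch string) (string, error)
	CancelOperation(ctx context.Context, operation string) error
	DeleteBatch(ctx context.Context, batch string) error
}

// isTerminal reports whether a batch in this state can be deleted
// (assumption: CANCELLED counts as terminal too).
func isTerminal(state string) bool {
	switch state {
	case "SUCCEEDED", "FAILED", "CANCELLED":
		return true
	}
	return false
}

// cancelThenDelete cancels a non-terminal batch through its operation,
// polls until the batch reaches a terminal state, then deletes it.
func cancelThenDelete(ctx context.Context, api batchAPI, batch, operation string, poll time.Duration) error {
	state, err := api.State(ctx, batch)
	if err != nil {
		return err
	}
	if !isTerminal(state) {
		if err := api.CancelOperation(ctx, operation); err != nil {
			return err
		}
	}
	for !isTerminal(state) {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(poll):
		}
		if state, err = api.State(ctx, batch); err != nil {
			return err
		}
	}
	return api.DeleteBatch(ctx, batch)
}

// fakeAPI is an in-memory fake: cancelling flips the state to CANCELLED,
// and deleting a non-terminal batch fails like the real service does.
type fakeAPI struct {
	state   string
	deleted bool
}

func (f *fakeAPI) State(ctx context.Context, batch string) (string, error) {
	return f.state, nil
}

func (f *fakeAPI) CancelOperation(ctx context.Context, operation string) error {
	f.state = "CANCELLED"
	return nil
}

func (f *fakeAPI) DeleteBatch(ctx context.Context, batch string) error {
	if !isTerminal(f.state) {
		return errors.New("FAILED_PRECONDITION")
	}
	f.deleted = true
	return nil
}

// demo runs the flow against the fake, starting from the given state,
// and reports whether the batch ended up deleted.
func demo(initial string) (bool, error) {
	api := &fakeAPI{state: initial}
	err := cancelThenDelete(context.Background(), api, "batch-1", "op-1", time.Millisecond)
	return api.deleted, err
}

func main() {
	deleted, err := demo("RUNNING")
	fmt.Println(deleted, err)
}
```

In a real application the polling interval and an overall deadline on the context would need tuning, since a batch can sit in a cancelling state for a while before it becomes terminal.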

huangapple
  • Published on June 6, 2023 at 02:06:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76408949.html
  • go
  • google-cloud-dataproc
  • google-cloud-dataproc-serverless
  • google-cloud-platform
  • pyspark