gRPC Unimplemented thrown when creating a custom job in Vertex AI using JobServiceClient

Question

I'm trying to implement code to start a custom job in Vertex AI.

I have no problem starting a custom job using `gcloud`:

gcloud ai custom-jobs --project my_project_id create --region=europe-west1 --display-name="train model based on custom container" --worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri=europe-west1-docker.pkg.dev/my_project_id/my-repo/my-custom-prototype:latest


I have not been able to find an official code sample for .NET, so I tried to mimic [someone else doing it in Python][1]; ChatGPT also produced a similar code sample:

```csharp
var projectId = "my_project_id";
var locationId = "europe-west1";
var client = await JobServiceClient.CreateAsync();

var createCustomJobRequest = new CreateCustomJobRequest
{
  ParentAsLocationName = new LocationName(projectId, locationId),
  CustomJob = new CustomJob
  {
    DisplayName = "train model based on custom container",
    JobSpec = new CustomJobSpec()
    {
      WorkerPoolSpecs =
      {
        new WorkerPoolSpec
        {
          MachineSpec = new MachineSpec
          {
            MachineType = "n1-standard-4"
          },
          ReplicaCount = 1,
          ContainerSpec = new ContainerSpec()
          {
            ImageUri = "europe-west1-docker.pkg.dev/my_project_id/my-repo/my-custom-prototype:latest"
          }
        }
      }
    }
  }
};

var result3 = await client.CreateCustomJobAsync(createCustomJobRequest); // exception thrown here
```

Unfortunately, I get an exception back:

Grpc.Core.RpcException: 'Status(StatusCode="Unimplemented", Detail="Bad gRPC response. HTTP status code: 404")'

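Things I have tried without success:

  1. Using the overload of `CreateCustomJobAsync()` that takes a `CustomJob` and a parent instead of a `CreateCustomJobRequest` object.
  2. Using `JobServiceClientBuilder` instead of `JobServiceClient.CreateAsync()`, with `Endpoint` set to `europe-west1-aiplatform.googleapis.com`.

What am I missing to get a custom job started in Vertex AI?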


  [1]: https://stackoverflow.com/questions/73154492/vertex-ai-custom-job-cannot-launch-it-via-python


# Answer 1
**Score**: 1

I should have dug a bit deeper into `JobServiceClientBuilder`. Specifically, when I used the client created by the builder to start a job, I actually got a different message back:

```plaintext
Grpc.Core.RpcException
  HResult=0x80131500
  Message=Status(StatusCode="PermissionDenied", Detail="Permission 'aiplatform.customJobs.create' denied on resource '//aiplatform.googleapis.com/projects/my_project_id/locations/europe-west1' (or it may not exist).")
  Source=Google.Api.Gax.Grpc
```

While the message was fairly clear, I wasn't sure it was the real error either: `Unimplemented` hadn't made sense, so I dismissed this one too.

Since writing the question, it occurred to me that `gcloud` and the SDK might be authenticating differently. It turns out that the active account on the command line (the one marked with `*` in `gcloud auth list`) is my own user credential, while the environment variable `GOOGLE_APPLICATION_CREDENTIALS` references a service account. Once I added the Vertex AI Administrator role to the service account, I was finally able to start a job.

So, use `JobServiceClient.CreateAsync()` if the service account behind `GOOGLE_APPLICATION_CREDENTIALS` has the right permissions. If you need to use another service account, instantiate a `JobServiceClient` like so:

var client = await new JobServiceClientBuilder
{
	Endpoint = "europe-west1-aiplatform.googleapis.com",
	GoogleCredential = GoogleCredential.FromFile(@"your-service-account.json")
}.BuildAsync();

I know the latter is "standard GCP authentication" knowledge, it just didn't come to my mind immediately.
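
To check which identity the SDK is likely to pick up, it can help to look at the file that `GOOGLE_APPLICATION_CREDENTIALS` points to and compare it with the account shown by `gcloud auth list`. The following is only a minimal sketch: `AdcInspector`, `Describe` and `DescribeFromEnvironment` are hypothetical names invented for this illustration, and it assumes the file is a credentials JSON document carrying the `type` and `client_email` fields found in service account key files.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical diagnostic helper; not part of any Google library.
public static class AdcInspector
{
    // Summarizes the identity described by a credentials JSON document.
    public static string Describe(string credentialsJson)
    {
        using JsonDocument doc = JsonDocument.Parse(credentialsJson);
        string type = ReadString(doc.RootElement, "type");
        string email = ReadString(doc.RootElement, "client_email");
        return "type=" + type + ", client_email=" + email;
    }

    // Reads the file referenced by GOOGLE_APPLICATION_CREDENTIALS, if any.
    public static string DescribeFromEnvironment()
    {
        var path = Environment.GetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS");
        if (string.IsNullOrEmpty(path))
        {
            return "GOOGLE_APPLICATION_CREDENTIALS is not set";
        }
        if (!File.Exists(path))
        {
            return "GOOGLE_APPLICATION_CREDENTIALS points to a missing file: " + path;
        }
        return Describe(File.ReadAllText(path));
    }

    // Returns the named string property, or "(none)" when it is absent
    // or is not a JSON string.
    private static string ReadString(JsonElement root, string name)
    {
        if (root.ValueKind == JsonValueKind.Object
            && root.TryGetProperty(name, out JsonElement value)
            && value.ValueKind == JsonValueKind.String)
        {
            return value.GetString() ?? "(none)";
        }
        return "(none)";
    }
}
```

Printing `AdcInspector.DescribeFromEnvironment()` just before building the client shows whether the code runs as a service account and which one. When the variable is not set, the client libraries fall back to other credential sources, so this check only covers the situation described above.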

# Answer 2
**Score**: 0

I'm not sure why granting the appropriate permissions fixed the problem for the OP, but there is a different issue with the code, and it is what causes the obscure error message. As per the client library documentation, Vertex AI requires regional endpoints, so the client should be constructed to match the region of the resources it will access. In this case, the code would be:

var client = await new JobServiceClientBuilder
{
    Endpoint = "europe-west1-aiplatform.googleapis.com"
}.BuildAsync();

If you use resources in multiple regions, you can build the endpoint dynamically with an interpolated string, but you do need a separate client for each region.

I'm hoping we can make the error less obscure in the future.
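
The interpolated-string approach mentioned in this answer could look roughly like the sketch below. `VertexEndpoints` and `ForRegion` are hypothetical names made up for this example, not part of `Google.Cloud.AIPlatform.V1`; the only thing taken from the answer is the host name pattern `<region>-aiplatform.googleapis.com`.

```csharp
using System;

// Hypothetical helper; not part of Google.Cloud.AIPlatform.V1.
public static class VertexEndpoints
{
    // Builds the regional Vertex AI host name for a location ID
    // such as "europe-west1".
    public static string ForRegion(string locationId)
    {
        if (string.IsNullOrWhiteSpace(locationId))
        {
            throw new ArgumentException("A location ID is required.", nameof(locationId));
        }
        return $"{locationId.Trim()}-aiplatform.googleapis.com";
    }
}
```

The result can then be assigned to the builder's `Endpoint` property, for example `Endpoint = VertexEndpoints.ForRegion(locationId)`, using the same `locationId` that is passed to `LocationName`, so the client and the parent resource always refer to the same region.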

Posted by huangapple on May 25, 2023 at 19:20:28.