gRPC Unimplemented thrown when creating a custom job in Vertex AI using JobServiceClient
# Question
I'm trying to implement code to start a custom job in Vertex AI.
I have no problem starting a custom job using `gcloud`:
```shell
gcloud ai custom-jobs --project my_project_id create --region=europe-west1 --display-name="train model based on custom container" --worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri=europe-west1-docker.pkg.dev/my_project_id/my-repo/my-custom-prototype:latest
```
I have not been able to find an official code sample for .NET, so I tried to mimic [someone else doing it in Python][1]; ChatGPT also produced a similar code sample:
```csharp
using Google.Api.Gax.ResourceNames;
using Google.Cloud.AIPlatform.V1;

var projectId = "my_project_id";
var locationId = "europe-west1";

// Created with default settings: no explicit endpoint or credential.
var client = await JobServiceClient.CreateAsync();

var createCustomJobRequest = new CreateCustomJobRequest
{
    ParentAsLocationName = new LocationName(projectId, locationId),
    CustomJob = new CustomJob
    {
        DisplayName = "train model based on custom container",
        JobSpec = new CustomJobSpec()
        {
            WorkerPoolSpecs =
            {
                new WorkerPoolSpec
                {
                    MachineSpec = new MachineSpec
                    {
                        MachineType = "n1-standard-4"
                    },
                    ReplicaCount = 1,
                    ContainerSpec = new ContainerSpec()
                    {
                        ImageUri = "europe-west1-docker.pkg.dev/my_project_id/my-repo/my-custom-prototype:latest"
                    }
                }
            }
        }
    }
};

var result3 = await client.CreateCustomJobAsync(createCustomJobRequest); // exception thrown here
```
Unfortunately, I get an exception back:
```plaintext
Grpc.Core.RpcException: 'Status(StatusCode="Unimplemented", Detail="Bad gRPC response. HTTP status code: 404")'
```
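Things I've tried that failed:
- Used the overload of `CreateCustomJobAsync()` that takes a `CustomJob` and a `Parent` instead of a `CreateCustomJobRequest` object.
- Used `JobServiceClientBuilder` instead of `JobServiceClient.CreateAsync()` and set the `Endpoint` property to `europe-west1-aiplatform.googleapis.com`.

What am I missing to get a custom job started in Vertex AI?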
[1]: https://stackoverflow.com/questions/73154492/vertex-ai-custom-job-cannot-launch-it-via-python
# Answer 1
**Score**: 1
I should have dug a bit deeper into `JobServiceClientBuilder`. Specifically, when I used the client created by the builder to start a job, I actually got a different message back:
```plaintext
Grpc.Core.RpcException
  HResult=0x80131500
  Message=Status(StatusCode="PermissionDenied", Detail="Permission 'aiplatform.customJobs.create' denied on resource '//aiplatform.googleapis.com/projects/my_project_id/locations/europe-west1' (or it may not exist).")
  Source=Google.Api.Gax.Grpc
```
While the message was fairly clear, I wasn't sure it was the right error message. The earlier `Unimplemented` error hadn't made sense either, so I dismissed this one too.

Anyway, since writing the question it occurred to me that `gcloud` and SDK authentication may differ. It turns out that the active user on the command line (the `*` next to the user in `gcloud auth list`) is my own credential, while the environment variable `GOOGLE_APPLICATION_CREDENTIALS` references a service account. Once I added the role `Vertex AI Administrator` to the service account, I was finally able to start a job.
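
To see which identity the .NET SDK resolves, as opposed to the user shown by `gcloud auth list`, something like the following may help. It is a minimal sketch, assuming the `Google.Apis.Auth` package is referenced; the class name `CredentialCheck` is hypothetical and only for illustration. It prints the environment variable and the type of credential that Application Default Credentials resolve to.

```csharp
using System;
using System.Threading.Tasks;
using Google.Apis.Auth.OAuth2;

// Hypothetical helper, not part of the original post.
public static class CredentialCheck
{
    public static async Task Main()
    {
        // Path the client libraries consult when resolving Application Default Credentials.
        string path = Environment.GetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS") ?? "(not set)";
        Console.WriteLine("GOOGLE_APPLICATION_CREDENTIALS = " + path);

        // Resolve Application Default Credentials, which the client libraries
        // fall back to when no credential is supplied explicitly.
        GoogleCredential credential = await GoogleCredential.GetApplicationDefaultAsync();

        if (credential.UnderlyingCredential is ServiceAccountCredential serviceAccount)
        {
            // For a service account key file, Id is typically the account's e-mail address.
            Console.WriteLine("SDK authenticates as service account: " + serviceAccount.Id);
        }
        else
        {
            Console.WriteLine("SDK authenticates with: " + credential.UnderlyingCredential.GetType().Name);
        }
    }
}
```

If the printed service account is not the one you expected, that account is the one that needs the `aiplatform.customJobs.create` permission.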
So, use `JobServiceClient.CreateAsync()` if the service account behind `GOOGLE_APPLICATION_CREDENTIALS` has the right permission. If you need to use another service account, instantiate a `JobServiceClient` like so:
```csharp
using Google.Apis.Auth.OAuth2;
using Google.Cloud.AIPlatform.V1;

var client = await new JobServiceClientBuilder
{
    Endpoint = "europe-west1-aiplatform.googleapis.com",
    GoogleCredential = GoogleCredential.FromFile(@"your-service-account.json")
}.BuildAsync();
```
I know the latter is "standard GCP authentication" knowledge; it just didn't come to mind immediately.
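# Answer 2
**Score**: 0
I'm not sure why granting the appropriate permissions fixed the problem for the OP, but there is a different issue with the code that causes the obscure error message. As per the client library documentation, Vertex AI requires regionalized endpoints, so the client should be constructed to reflect the resources that will be accessed. In this case, the code would be: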
```csharp
var client = await new JobServiceClientBuilder
{
    Endpoint = "europe-west1-aiplatform.googleapis.com"
}.BuildAsync();
```
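Obviously, if you use resources in multiple regions, you can make that dynamic with an interpolated string literal. But you do need a different client for each region.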
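
One way to build the endpoint per region with an interpolated string while keeping a separate client for each region is sketched below. It assumes the `Google.Cloud.AIPlatform.V1` package; the class `RegionalJobClients` and its members are hypothetical names introduced only for this illustration.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;
using Google.Cloud.AIPlatform.V1;

// Hypothetical helper, not part of the client library.
public static class RegionalJobClients
{
    // One client per region, created lazily and reused afterwards.
    private static readonly ConcurrentDictionary<string, Task<JobServiceClient>> Clients =
        new ConcurrentDictionary<string, Task<JobServiceClient>>();

    // Builds the regional endpoint, e.g. "europe-west1-aiplatform.googleapis.com".
    public static string EndpointFor(string region) => $"{region}-aiplatform.googleapis.com";

    // Returns the cached client for the region, building it on first use.
    public static Task<JobServiceClient> GetAsync(string region) =>
        Clients.GetOrAdd(region, r => new JobServiceClientBuilder
        {
            Endpoint = EndpointFor(r)
        }.BuildAsync());
}
```

A caller would then write `var client = await RegionalJobClients.GetAsync("europe-west1");` and pass a `LocationName` for the same region in the request, so that the endpoint and the resource location agree.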
I'm hoping we can make the error less obscure in the future.