Cannot pull container image to GKE Autopilot from private Artifact Registry even though they are in the same project
Question
According to the articles below, GKE should be able to pull container images from Artifact Registry without any additional authentication when both are in the same project.
https://cloud.google.com/artifact-registry/docs/integrate-gke
https://www.youtube.com/watch?v=BfS7mvPA-og
But when I tried it, I got an ImagePullBackOff error.
Is there a mistake or a misunderstanding on my part, or do I need to set up some other authentication?
Steps to reproduce
It's convenient to run these steps in Google Cloud Shell, in a project on https://console.cloud.google.com.
Create Artifact Registry
gcloud artifacts repositories create test \
--repository-format=docker \
--location=asia-northeast2
Push sample image
gcloud auth configure-docker asia-northeast2-docker.pkg.dev
docker pull nginx
docker tag nginx asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image
docker push asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image
Create GKE Autopilot cluster
Create a GKE Autopilot cluster using the GUI console.
Almost all options are left at their defaults, except for the following:
- Set the cluster name to test.
- Set the region to the same one as the registry (in this case, asia-northeast2).
- Enabled Anthos Service Mesh.
Deploy container image to GKE from Artifact Registry
gcloud container clusters get-credentials test --zone asia-northeast2
kubectl run test --image asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image
Check Pod state
kubectl describe po test
Name: test
Namespace: default
Priority: 0
Service Account: default
Node: xxxxxxxxxxxxxxxxxxx
Start Time: Wed, 08 Feb 2023 12:38:08 +0000
Labels: run=test
Annotations: autopilot.gke.io/resource-adjustment:
{"input":{"containers":[{"name":"test"}]},"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"reque...
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Pending
IP: 10.73.0.25
IPs:
IP: 10.73.0.25
Containers:
test:
Container ID:
Image: asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Limits:
cpu: 500m
ephemeral-storage: 1Gi
memory: 2Gi
Requests:
cpu: 500m
ephemeral-storage: 1Gi
memory: 2Gi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-szq85 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-szq85:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: kubernetes.io/arch=amd64:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19s gke.io/optimize-utilization-scheduler Successfully assigned default/test to xxxxxxxxxxxxxxxxxxx
Normal Pulling 16s kubelet Pulling image "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image"
Warning Failed 16s kubelet Failed to pull image "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image": rpc error: code = Unknown desc = failed to pull and unpack image "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image:latest": failed to resolve reference "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image:latest": failed to authorize: failed to fetch oauth token: unexpected status: 403 Forbidden
Warning Failed 16s kubelet Error: ErrImagePull
Normal BackOff 15s kubelet Back-off pulling image "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image"
Warning Failed 15s kubelet Error: ImagePullBackOff
Then I got ImagePullBackOff.
Answer 1
Score: 3
This could be because the service account used by the GKE Autopilot nodes does not have the permissions needed to access Artifact Registry. You can grant them by adding the roles/artifactregistry.reader role to the service account that the GKE Autopilot node pool is configured to use, so that it can read from the private repository:
gcloud artifacts repositories add-iam-policy-binding <repository-name> \
--location=<location> \
--member=serviceAccount:<nnn>-compute@developer.gserviceaccount.com \
--role="roles/artifactregistry.reader"
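If you do not know the `<nnn>` project number offhand, the lookup and the binding can be combined. The following is only a sketch: it assumes the nodes run as the default Compute Engine service account (which is what the command above implies) and that `${PROJECT_NAME}` from the question holds the project ID. The function names `default_compute_sa` and `grant_registry_reader` are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of gcloud.

# Print the default Compute Engine service account for a project number.
default_compute_sa() {
  printf '%s-compute@developer.gserviceaccount.com\n' "$1"
}

# Grant roles/artifactregistry.reader on one repository to the default
# Compute Engine service account.
# Arguments: project ID, repository name, repository location.
grant_registry_reader() {
  local project_id="$1"
  local repository="$2"
  local location="$3"
  local project_number

  # Look up the numeric project number that prefixes the account name.
  project_number="$(gcloud projects describe "$project_id" \
    --format='value(projectNumber)')" || return 1

  gcloud artifacts repositories add-iam-policy-binding "$repository" \
    --project="$project_id" \
    --location="$location" \
    --member="serviceAccount:$(default_compute_sa "$project_number")" \
    --role="roles/artifactregistry.reader"
}
```

With the names from the question, the call would be `grant_registry_reader "${PROJECT_NAME}" test asia-northeast2`. If the cluster was created with a custom node service account, pass that account as the member instead.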
You could also try creating a new service account, granting it the permissions needed to pull the image, and then pulling the image again.
Some simple troubleshooting steps:
- Make sure your GKE cluster is configured to allow access to Artifact Registry. You can check this in the GKE dashboard by confirming that the “Allow access to Artifact Registry” option is enabled.
- Make sure the container image you are trying to pull actually exists in Artifact Registry. Check the registry to confirm that the image was uploaded correctly and can be accessed.
- Look at the error logs for more information about what is causing the issue, and check the GKE documentation for further troubleshooting guidance.
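The last two checks can be roughly scripted. This is a sketch, not an official diagnostic: the helper names are hypothetical, and the mapping from message text to cause is a heuristic. The "403 Forbidden" pattern comes from the kubelet event in the question, while treating "not found" as a missing image is an assumption about how the container runtime words that error.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical and the matching is heuristic.

# Map a kubelet image-pull message to a likely cause.
classify_pull_error() {
  case "$1" in
    *"403 Forbidden"*|*"401 Unauthorized"*)
      echo "permission" ;;      # the registry rejected the node's credentials
    *"not found"*)
      echo "missing-image" ;;   # assumed wording for a missing image or tag
    *)
      echo "unknown" ;;
  esac
}

# Print the messages of "Failed" events recorded for a pod.
# Argument: pod name (in the current namespace).
pull_failures() {
  kubectl get events \
    --field-selector "involvedObject.name=$1,reason=Failed" \
    --output 'jsonpath={range .items[*]}{.message}{"\n"}{end}'
}

# List the images stored in a repository, to confirm the push succeeded.
# Arguments: location, project ID, repository name.
list_repo_images() {
  gcloud artifacts docker images list "${1}-docker.pkg.dev/${2}/${3}"
}
```

For the pod in the question, `pull_failures test` should print the "failed to fetch oauth token: unexpected status: 403 Forbidden" message, which `classify_pull_error` reports as `permission`. That points at IAM rather than at a missing image.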