GKE Volume Attach/mount error for regional persistent disk


Question


I am struggling with a VolumeAttachment error. I have a regional persistent disk in the same GCP project as my regional GKE cluster. The cluster is in europe-west2 with nodes in europe-west2-a, b, and c, and the regional disk is replicated across zones europe-west2-b and c.

I have an nfs-server deployment manifest that refers to the disk via gcePersistentDisk.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      annotations: []
      labels:
        app.kubernetes.io/managed-by: Helm
      name: nfs-server
      namespace: namespace
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      selector:
        matchLabels:
          role: nfs-server
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          labels:
            role: nfs-server
        spec:
          serviceAccountName: nfs-server
          containers:
            - image: gcr.io/google_containers/volume-nfs:0.8
              imagePullPolicy: IfNotPresent
              name: nfs-server
              ports:
                - containerPort: 2049
                  name: nfs
                  protocol: TCP
                - containerPort: 20048
                  name: mountd
                  protocol: TCP
                - containerPort: 111
                  name: rpcbind
                  protocol: TCP
              securityContext:
                privileged: true
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
                - mountPath: /data
                  name: nfs-pvc
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
          volumes:
            - gcePersistentDisk:
                fsType: ext4
                pdName: my-regional-disk-name
              name: nfs-pvc
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: topology.gke.io/zone
                        operator: In
                        values:
                          - europe-west2-b
                          - europe-west2-c

and my PV/PVC manifests:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: nfs-pv
    spec:
      accessModes:
        - ReadWriteMany
      capacity:
        storage: 200Gi
      nfs:
        path: /
        server: nfs-server.namespace.svc.cluster.local
      persistentVolumeReclaimPolicy: Retain
      volumeMode: Filesystem
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      labels:
        app.kubernetes.io/managed-by: Helm
      name: nfs-pvc
      namespace: namespace
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 8Gi
      storageClassName: ""
      volumeMode: Filesystem
      volumeName: nfs-pv

When I apply the deployment manifest above, I get the following error:

    rpc error: code = Unavailable desc = ControllerPublish not permitted on node "projects/ap-mc-qa-xxx-xxxx/zones/europe-west2-a/instances/node-instance-id" due to backoff condition

The volume attachment tells me this:

    Attach Error: Message: rpc error: code = NotFound desc = ControllerPublishVolume could not find volume with ID projects/UNSPECIFIED/zones/UNSPECIFIED/disks/my-regional-disk-name: googleapi: Error 0: , notFound

These manifests seemed to work fine when deployed to a zonal cluster/disk. I've checked things like making sure the cluster service account has the necessary permissions. The disk is currently not in use.

What am I missing???

Answer 1

Score: 0


I think we should focus on the type of Nodes that make up your Kubernetes cluster.

> Regional persistent disks are restricted from being used with memory-optimized machines or compute-optimized machines.

> Consider using a non-regional persistent disk storage class if using a regional persistent disk is not a hard requirement. If using a regional persistent disk is a hard requirement, consider scheduling strategies such as taints and tolerations to ensure that the Pods that need regional PD are scheduled on a node pool that are not optimized machines.

https://cloud.google.com/kubernetes-engine/docs/troubleshooting#error_400_cannot_attach_repd_to_an_optimized_vm
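
As an illustration of the scheduling approach quoted above, here is a minimal sketch of how the nfs-server pods could be kept on a general-purpose node pool so the regional PD can attach. The node pool name "default-pool" is an assumption, not a value from the cluster in the question; treat this as a starting point rather than a drop-in fix.

    # Sketch only: pin the nfs-server pods to a node pool that is not
    # memory- or compute-optimized. "default-pool" is an assumed pool name.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nfs-server
      namespace: namespace
    spec:
      replicas: 1
      selector:
        matchLabels:
          role: nfs-server
      template:
        metadata:
          labels:
            role: nfs-server
        spec:
          # GKE labels every node with its node pool name, so selecting on
          # cloud.google.com/gke-nodepool keeps the pod off optimized pools.
          nodeSelector:
            cloud.google.com/gke-nodepool: default-pool
          containers:
            - name: nfs-server
              image: gcr.io/google_containers/volume-nfs:0.8

The taint-and-toleration variant works the other way around: taint the optimized node pools (with an illustrative key such as optimized-only=true:NoSchedule) so that only pods carrying a matching toleration can schedule there; the NFS pod then needs no toleration at all and is repelled automatically.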

Answer 2

Score: 0


So the reason that the above won't work is that the regional persistent disk feature allows the creation of persistent disks that are available in two zones within the same region. In order to use that feature, the volume must be provisioned as a PersistentVolume; referencing the volume directly from a pod is not supported. Something like this:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: nfs-pv
    spec:
      capacity:
        storage: 200Gi
      accessModes:
        - ReadWriteMany
      gcePersistentDisk:
        pdName: my-regional-disk
        fsType: ext4

Now trying to figure out how to re-configure the NFS server to use a regional disk.
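
For context, here is a minimal sketch (an illustration under assumptions, not a verified fix) of what that re-configuration might look like: the regional disk is provisioned as a statically bound PersistentVolume/PersistentVolumeClaim pair restricted to the disk's replica zones, and the deployment then mounts the claim instead of an inline gcePersistentDisk volume. The PV/PVC names are made up, and the topology label key may need to be topology.kubernetes.io/zone rather than topology.gke.io/zone depending on which driver handles the volume.

    # Sketch only: regional disk exposed through a statically bound PV/PVC pair.
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: regional-disk-pv        # illustrative name
    spec:
      capacity:
        storage: 200Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: ""
      gcePersistentDisk:
        pdName: my-regional-disk-name
        fsType: ext4
      # Restrict the volume to the zones the regional disk is replicated in,
      # using the same label key as the deployment in the question.
      nodeAffinity:
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.gke.io/zone
                  operator: In
                  values:
                    - europe-west2-b
                    - europe-west2-c
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: regional-disk-pvc       # illustrative name
      namespace: namespace
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: ""
      volumeName: regional-disk-pv
      resources:
        requests:
          storage: 200Gi

The nfs-server deployment would then drop its inline gcePersistentDisk volume and declare the volume as persistentVolumeClaim with claimName: regional-disk-pvc, while keeping its existing volumeMounts entry unchanged.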
