GKE Volume Attach/mount error for regional persistent disk

Question

I am struggling with a volume attach error. I have a regional persistent disk which is in the same GCP project as my regional GKE cluster. My regional cluster is in europe-west2, with nodes in europe-west2-a, b, and c. The regional disk is replicated across zones europe-west2-b and c.

I have an nfs-server deployment manifest which refers to the gcePersistentDisk.

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations: {}
  labels:
    app.kubernetes.io/managed-by: Helm
  name: nfs-server
  namespace: namespace
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  selector:
    matchLabels:
      role: nfs-server
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      serviceAccountName: nfs-server 
      containers:
      - image: gcr.io/google_containers/volume-nfs:0.8
        imagePullPolicy: IfNotPresent
        name: nfs-server
        ports:
        - containerPort: 2049
          name: nfs
          protocol: TCP
        - containerPort: 20048
          name: mountd
          protocol: TCP
        - containerPort: 111
          name: rpcbind
          protocol: TCP
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /data
          name: nfs-pvc
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - gcePersistentDisk:
          fsType: ext4
          pdName: my-regional-disk-name
        name: nfs-pvc
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.gke.io/zone
                operator: In
                values:
                - europe-west2-b
                - europe-west2-c

and my pv/pvc manifests:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 200Gi
  nfs:
    path: /
    server: nfs-server.namespace.svc.cluster.local
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
  name: nfs-pvc
  namespace: namespace
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 8Gi
  storageClassName: ""
  volumeMode: Filesystem
  volumeName: nfs-pv

When I apply the deployment manifest above, I get the following error:

'rpc error: code = Unavailable desc = ControllerPublish not permitted on node "projects/ap-mc-qa-xxx-xxxx/zones/europe-west2-a/instances/node-instance-id" due to backoff condition'

The VolumeAttachment object tells me this:

Attach Error: Message:  rpc error: code = NotFound desc = ControllerPublishVolume could not find volume with ID projects/UNSPECIFIED/zones/UNSPECIFIED/disks/my-regional-disk-name: googleapi: Error 0: , notFound

These manifests worked fine when deployed to a zonal cluster with a zonal disk. I've checked things like making sure the cluster service account has the necessary permissions. The disk is currently not in use.

What am I missing?

Answer 1

Score: 0

I think we should focus on the type of nodes that make up your Kubernetes cluster.

> Regional persistent disks are restricted from being used with memory-optimized machines or compute-optimized machines.

> Consider using a non-regional persistent disk storage class if using a regional persistent disk is not a hard requirement. If using a regional persistent disk is a hard requirement, consider scheduling strategies such as taints and tolerations to ensure that the Pods that need regional PD are scheduled on a node pool that are not optimized machines.

https://cloud.google.com/kubernetes-engine/docs/troubleshooting#error_400_cannot_attach_repd_to_an_optimized_vm
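
To see which machine families back your nodes, you can list the standard instance-type label on each node (a minimal diagnostic sketch, not from the original answer):

# Compute-optimized (c2, c2d, h3) and memory-optimized (m1, m2, m3) machine
# families cannot attach regional persistent disks.
kubectl get nodes -L node.kubernetes.io/instance-type

If regional PDs are a hard requirement, a rough sketch of the taint-based scheduling strategy looks like the following. The node-pool names "c2-pool" and "general-purpose-pool" are illustrative assumptions, not names from the question:

# Taint the optimized pool so pods that do not tolerate the taint (such as
# the nfs-server pod) are kept off of it:
kubectl taint nodes -l cloud.google.com/gke-nodepool=c2-pool \
    regional-pd=unsupported:NoSchedule

# Alternatively, pin the nfs-server Deployment to a general-purpose pool by
# adding a nodeSelector under spec.template.spec:
#
#   nodeSelector:
#     cloud.google.com/gke-nodepool: general-purpose-pool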

Answer 2

Score: 0

So the reason the above won't work is that the regional persistent disk feature allows the creation of persistent disks that are available in two zones within the same region. In order to use that feature, the volume must be provisioned as a PersistentVolume; referencing the volume directly from a pod is not supported. Something like this:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 200Gi
  accessModes:
  - ReadWriteOnce   # GCE PDs support ReadWriteOnce/ReadOnlyMany, not ReadWriteMany
  gcePersistentDisk:
    pdName: my-regional-disk
    fsType: ext4

Now I'm trying to figure out how to re-configure the NFS server to use a regional disk.
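
For reference, a minimal sketch of that re-configuration, reusing nfs-pv and the namespace from the manifests above; the claim name nfs-disk-pvc is an illustrative assumption. Because GCE PDs do not support ReadWriteMany, the claim requests ReadWriteOnce:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-disk-pvc          # illustrative name
  namespace: namespace
spec:
  accessModes:
  - ReadWriteOnce             # must match the PV's access mode
  resources:
    requests:
      storage: 200Gi
  storageClassName: ""        # empty string disables dynamic provisioning
  volumeName: nfs-pv          # bind statically to the PV wrapping the regional disk

The nfs-server Deployment would then mount the disk through the claim instead of the inline gcePersistentDisk volume:

      volumes:
      - name: nfs-pvc
        persistentVolumeClaim:
          claimName: nfs-disk-pvc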

Posted by huangapple on 2023-04-06. Original link: https://go.coder-hub.com/75950079.html