Multi cluster CockroachDB with Cilium Cluster Mesh

Question


I am trying to enable a multi cluster CockroachDB spanning 3 k8s clusters connected with Cilium Cluster Mesh. The idea of a multi cluster CockroachDB is described on cockroachlabs.com - 1, 2. Given that the article calls for a change to the CoreDNS ConfigMap instead of using Cilium global services, the approach feels suboptimal.

Therefore the question arises: how does one enable a multi cluster CockroachDB in a Cilium Cluster Mesh environment, using Cilium global services instead of patching the CoreDNS ConfigMap?

Installing CockroachDB via Helm deploys a StatefulSet with a carefully crafted --join parameter, which contains the FQDNs of the CockroachDB pods that are to join the cluster.

The pod FQDNs come from service.discovery, which is created with clusterIP: None and

> (...) only exists to create DNS entries for each pod in the StatefulSet such that they can resolve each other's IP addresses.

The discovery service automatically registers DNS entries for all pods within the StatefulSet, so that they can be easily referenced.
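Concretely, with the names used in this question (pods dbs-cockroachdb-{0,1,2}, service dbs-cockroachdb, namespace dbs) and assuming the default cluster.local zone, the rendered flag looks roughly like:

    --join=dbs-cockroachdb-0.dbs-cockroachdb.dbs.svc.cluster.local:26257,dbs-cockroachdb-1.dbs-cockroachdb.dbs.svc.cluster.local:26257,dbs-cockroachdb-2.dbs-cockroachdb.dbs.svc.cluster.local:26257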

Can a similar discovery service (or an alternative) be created for a StatefulSet running on a remote cluster, so that with Cluster Mesh enabled, pods J, K, L in cluster B could be reached from pods X, Y, Z in cluster A by their FQDNs?

As suggested in create-service-per-pod-in-statefulset, one could create services like this:

{{- range $i, $_ := until 3 -}}
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
    io.cilium/global-service: 'true'
    service.cilium.io/affinity: "remote"
  labels:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
  name: dbs-cockroachdb-remote-{{ $i }}
  namespace: dbs
spec:
  ports:
  - name: grpc
    port: 26257
    protocol: TCP
    targetPort: grpc
  - name: http
    port: 8080
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
    statefulset.kubernetes.io/pod-name: cockroachdb-{{ $i }}
  type: ClusterIP
  clusterIP: None
  publishNotReadyAddresses: true
---
kind: Service
apiVersion: v1
metadata:
  name: dbs-cockroachdb-public-remote-{{ $i }}
  namespace: dbs
  labels:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
  annotations:
    io.cilium/global-service: 'true'
    service.cilium.io/affinity: "remote"
spec:
  ports:
  - name: grpc
    port: 26257
    protocol: TCP
    targetPort: grpc
  - name: http
    port: 8080
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
{{- end -}}

The intent is that these resemble the original service.discovery and service.public.
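One operational note: Cilium only treats a service as global when a service with the same name and namespace exists in each cluster, so manifests like the ones above have to be applied on both sides of the mesh. A minimal sketch, assuming the template is rendered from a local chart directory ./dbs-chart and kube contexts named cluster-a and cluster-b (all three names are hypothetical):

    # Render the per-pod Services and apply them in both clusters:
    helm template dbs ./dbs-chart -n dbs | kubectl --context cluster-a apply -f -
    helm template dbs ./dbs-chart -n dbs | kubectl --context cluster-b apply -f -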

However, despite the presence of the Cilium annotations

io.cilium/global-service: 'true'
service.cilium.io/affinity: "remote"

the services appear to be bound to the local k8s cluster, resulting in a CockroachDB cluster of 3 nodes instead of 6 (3 in cluster A + 3 in cluster B).

(Hubble and CockroachDB console screenshots omitted.)

It does not matter which service (dbs-cockroachdb-public-remote-X or dbs-cockroachdb-remote-X) I use in my --join override:

    join:
      - dbs-cockroachdb-0.dbs-cockroachdb.dbs:26257
      - dbs-cockroachdb-1.dbs-cockroachdb.dbs:26257
      - dbs-cockroachdb-2.dbs-cockroachdb.dbs:26257
      - dbs-cockroachdb-public-remote-0.dbs:26257
      - dbs-cockroachdb-public-remote-1.dbs:26257
      - dbs-cockroachdb-public-remote-2.dbs:26257

The result is the same: 3 nodes instead of 6.
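For reference, one way to see how many nodes actually joined is CockroachDB's node status command; a minimal sketch, assuming an insecure deployment:

    # List the nodes currently known to the cluster:
    kubectl -n dbs exec dbs-cockroachdb-0 -- /cockroach/cockroach node status --insecure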

Any ideas?

Answer 1

Score: 2


Apparently, due to 7070, patching the CoreDNS ConfigMap is the most reasonable thing we can do: Cilium global services load-balance across the ClusterIPs of matching services, whereas a headless discovery service is resolved purely by the local CoreDNS, which knows nothing about pods in remote clusters. In the comments of that bug, an article is mentioned that provides additional context.

My twist to this story is that I updated the ConfigMap with a kubernetes plugin configuration:

apiVersion: v1
data:
  Corefile: |-
    saturn.local {
      log
      errors
      kubernetes saturn.local {
        endpoint https://[ENDPOINT]
        kubeconfig [PATH_TO_KUBECONFIG]
      }
    }
    rhea.local {
      ...

This way I could resolve names from the other clusters as well.
In my setup, each cluster has its own domain.local. PATH_TO_KUBECONFIG is a plain kubeconfig file; a generic secret has to be created in the kube-system namespace and the secret volume has to be mounted in the coredns deployment.
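A minimal sketch of those last two steps; the secret name remote-kubeconfig, the mount path, and the file name are hypothetical, and [PATH_TO_KUBECONFIG] in the Corefile would then point at the mounted file:

    # Create a generic secret in kube-system holding the remote cluster's kubeconfig:
    kubectl -n kube-system create secret generic remote-kubeconfig \
      --from-file=kubeconfig=./remote-cluster.kubeconfig

The corresponding fragment of the coredns Deployment then mounts that secret:

    # coredns Deployment (kube-system), abbreviated to the volume wiring:
    spec:
      template:
        spec:
          containers:
          - name: coredns
            volumeMounts:
            - name: remote-kubeconfig
              mountPath: /etc/coredns/remote
              readOnly: true
          volumes:
          - name: remote-kubeconfig
            secret:
              secretName: remote-kubeconfig

With this in place, [PATH_TO_KUBECONFIG] becomes /etc/coredns/remote/kubeconfig.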

