Cassandra newbie question regarding clusters

Question

Reading up on Cassandra for a POC. Will be hitting it with Java / Spring. One thing isn't clear at this point. It's a peer-to-peer architecture. So let's say I have 3 nodes: 1.1.1.1, 1.1.1.2 and 1.1.1.3. I get that Cassandra will distribute the data across all 3 nodes and do its replication thing, etc.

Both cqlsh and the DataStax Cassandra driver (used from Spring) ask for an address to connect to, even though there is no leader in a data center/cluster. In cqlsh they call it a server; in the Spring DataStax driver they call it a contact point.

Do I put all 3 IPs as contact points? Or do I just pick one? If I have 10,000 containers and they all connect to .1, that'll probably kill that box, no? And what if I have 1,000 nodes? I can't possibly have to put all 1,000 as contact points?

Just trying to see how you are supposed to connect to the cluster. All the docs and tutorials seem to be aimed at a single server. You'd think this would be basic information lol...

Answer 1

Score: 4

The quick answer is yes, any of those options works. You can pick one (any node will do, since all Cassandra nodes are equal: no master/slave, no primary/secondary), two, or all three.

There isn't anything special about the contact points. The driver uses the contact points as a way of "contacting" the cluster, an entry point if you will, during the initial connection.

Contact points are addresses of nodes in the Cassandra cluster that a driver uses to discover the cluster topology during the initialisation phase. Once the driver has made the initial connection, it will know about all the other nodes in the cluster, including which racks and DCs they belong to (the topology). Once connected, the driver also listens for topology changes, detecting when nodes are added or decommissioned.
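
To make "topology" a bit more concrete, here is a minimal plain-Java sketch of the kind of map a driver could build after that first connection. It is a toy model, not driver code: the `NodeInfo` class, the `groupByLocation` helper, and the `dc1`/`rack1`/`rack2` names are all made up for illustration, and a real driver gets this metadata from the cluster itself rather than from a hard-coded list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TopologySketch {
    // Hypothetical stand-in for the metadata a driver learns about each node.
    static final class NodeInfo {
        final String address;
        final String datacenter;
        final String rack;

        NodeInfo(String address, String datacenter, String rack) {
            this.address = address;
            this.datacenter = datacenter;
            this.rack = rack;
        }
    }

    // Groups node addresses under a "datacenter/rack" key so the layout is easy to inspect.
    static Map<String, List<String>> groupByLocation(List<NodeInfo> nodes) {
        Map<String, List<String>> topology = new TreeMap<>();
        for (NodeInfo node : nodes) {
            String location = node.datacenter + "/" + node.rack;
            topology.computeIfAbsent(location, key -> new ArrayList<>()).add(node.address);
        }
        return topology;
    }

    public static void main(String[] args) {
        // Made-up metadata for the three nodes from the question.
        List<NodeInfo> discovered = Arrays.asList(
                new NodeInfo("1.1.1.1", "dc1", "rack1"),
                new NodeInfo("1.1.1.2", "dc1", "rack1"),
                new NodeInfo("1.1.1.3", "dc1", "rack2"));
        System.out.println(groupByLocation(discovered));
    }
}
```

The point is only that the full node list, with DC and rack for each node, ends up on the client side no matter which node was contacted first.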

By now you will have worked out that only one contact point is strictly required, since the driver gets the addresses of the other nodes once it is connected to the cluster. But the general recommendation is to have at least 2 contact points, so that if the first contact point is unavailable for whatever reason, the driver can try another one.
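
The reason for listing a second contact point can be sketched in a few lines of plain Java. This is a toy model under stated assumptions: the `reachable` set is a hypothetical stand-in for real connection attempts, `firstReachable` is an illustrative helper, and trying the addresses in list order is a simplification (a real driver may choose its own order).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class ContactPointFallbackSketch {
    // Returns the first contact point that is reachable, trying them in the order given.
    // `reachable` is a hypothetical stand-in for a real network connection attempt.
    static Optional<String> firstReachable(List<String> contactPoints, Set<String> reachable) {
        for (String contactPoint : contactPoints) {
            if (reachable.contains(contactPoint)) {
                return Optional.of(contactPoint);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Pretend 1.1.1.1 is down while the other two nodes are up.
        Set<String> reachable = new HashSet<>(Arrays.asList("1.1.1.2", "1.1.1.3"));

        // One contact point: start-up fails because the only entry point is down.
        System.out.println(firstReachable(Collections.singletonList("1.1.1.1"), reachable));

        // Two contact points: start-up succeeds through the second one.
        System.out.println(firstReachable(Arrays.asList("1.1.1.1", "1.1.1.2"), reachable));
    }
}
```

With a single contact point the application cannot start while that one node is down, even though the rest of the cluster is healthy. A second address removes that single point of failure at start-up.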

To reiterate, the driver only uses the contact points during the initialisation phase, when you start your application. It does not mean the driver will route all requests exclusively to those contact points. The driver will load balance/route requests across all nodes in the cluster. Cheers!
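
As a rough illustration of that last point, the plain-Java sketch below cycles requests over every discovered node, not just the contact point. It is a toy model: `RoundRobinSketch` is a made-up class, and simple round-robin is used only because it is easy to follow. Real drivers typically apply smarter policies, for example preferring nodes in the local data center.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RoundRobinSketch {
    private final List<String> nodes;
    private int next = 0;

    // Takes every node learned during discovery, not just the contact points.
    RoundRobinSketch(List<String> discoveredNodes) {
        this.nodes = new ArrayList<>(discoveredNodes);
    }

    // Picks the node for the next request, cycling through every discovered node.
    String nextNode() {
        String node = nodes.get(next);
        next = (next + 1) % nodes.size();
        return node;
    }

    public static void main(String[] args) {
        // Assume only 1.1.1.1 was listed as a contact point, but all three nodes were discovered.
        RoundRobinSketch router = new RoundRobinSketch(Arrays.asList("1.1.1.1", "1.1.1.2", "1.1.1.3"));
        for (int i = 0; i < 6; i++) {
            System.out.println("request " + i + " -> " + router.nextNode());
        }
    }
}
```

So in the 10,000-container scenario from the question, pointing every container at .1 only affects where each client makes its first connection. The request traffic afterwards is spread over the whole cluster.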

Posted by huangapple on 2020-07-31 04:19:52
Source: https://go.coder-hub.com/63180796.html