Possibility of Split brain in ActiveMQ Artemis HA shared storage?
Question
How likely is split-brain in an Artemis HA shared-storage deployment? ActiveMQ Artemis 2.17.0 is deployed as an HA active/passive pair with shared storage on AWS EFS. Are there specific log statements I should check for in artemis.log?
Master cluster configuration:
<connectors>
   <connector name="artemis">tcp://<master_ip>:61616</connector>
   <connector name="discovery-connector">tcp://<slave_ip>:61616</connector>
</connectors>
<cluster-connections>
   <cluster-connection name="artemis_cluster_configuration">
      <connector-ref>artemis</connector-ref>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <static-connectors>
         <connector-ref>discovery-connector</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>
<ha-policy>
   <shared-store>
      <master>
         <failover-on-shutdown>true</failover-on-shutdown>
      </master>
   </shared-store>
</ha-policy>
Slave cluster configuration:
<connectors>
   <connector name="artemis">tcp://<slave_ip>:61616</connector>
   <connector name="discovery-connector">tcp://<master_ip>:61616</connector>
</connectors>
<cluster-connections>
   <cluster-connection name="artemis_cluster_configuration">
      <connector-ref>artemis</connector-ref>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <static-connectors>
         <connector-ref>discovery-connector</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>
<ha-policy>
   <shared-store>
      <slave>
         <failover-on-shutdown>true</failover-on-shutdown>
         <allow-failback>true</allow-failback>
      </slave>
   </shared-store>
</ha-policy>
Answer 1
Score: 0
Generally speaking, shared storage is resilient against split-brain. I believe the only split-brain issue related to shared storage that has been fixed since 2.17.0 is ARTEMIS-4143, which deals with the primary broker becoming disconnected from the shared storage and then reconnecting after the backup has already become active.
If you are using a discovery-group in your broker.xml then if you encounter split-brain you'll likely see a WARN log message with a code of AMQ212034 that says:
There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID={}
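Since the warning above is harmless when it appears once per node after a restart but signals trouble when it repeats, a rough way to check is to count AMQ212034 lines per node id in artemis.log. The following is only a sketch: the function names and the threshold are hypothetical, and it assumes each warning is written on a single line ending in nodeID=<id>, as the message template suggests.

```python
import re
from collections import Counter

WARN_CODE = "AMQ212034"
# Assumption: the formatted message ends with "nodeID=<id>".
NODE_ID = re.compile(r"nodeID=(\S+)")


def count_duplicate_node_warnings(lines):
    """Count AMQ212034 lines per node id in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        if WARN_CODE not in line:
            continue
        match = NODE_ID.search(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


def suspicious_nodes(counts, threshold=1):
    """Return node ids whose warning count exceeds the threshold.

    A single occurrence is expected after a restart, so the default
    only flags node ids that were reported more than once.
    """
    return sorted(node for node, n in counts.items() if n > threshold)
```

To use it, open the log and pass the file object directly, for example `counts = count_duplicate_node_warnings(open("artemis.log", errors="replace"))` followed by `suspicious_nodes(counts)`. A log that spans several restarts will legitimately contain more than one occurrence, so treat the result as a hint to look closer rather than proof of split-brain.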
That said, I'm not certain about the locking semantics of AWS EFS. ActiveMQ Artemis shared storage was designed to run on a SAN or NAS filesystem that supports exclusive file locks (e.g. NFSv4). If AWS EFS supports that then it should be fine. Otherwise it won't work properly and both brokers are likely to be active simultaneously (i.e. encounter split-brain).
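One way to probe whether the EFS mount honors exclusive locks is to take a non-blocking exclusive lock on a scratch file from one host and then attempt the same lock from the other host while the first still holds it. The sketch below uses Python's fcntl module on Linux; it is not the broker's own locking code, and the helper names and the scratch file are hypothetical.

```python
import fcntl
import os


def try_exclusive_lock(path):
    """Try to take a non-blocking exclusive lock on the file at path.

    Returns the open file descriptor if the lock was acquired,
    or None if another process already holds it.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # The lock is held elsewhere (or locking is not supported).
        os.close(fd)
        return None
    return fd


def release_lock(fd):
    """Release the lock taken by try_exclusive_lock and close the file."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Run `try_exclusive_lock` against a scratch file on the shared mount from the first host and keep the process alive, then call it from the second host. If the second call also returns a descriptor, the filesystem is not enforcing exclusivity across hosts, which is the condition under which both brokers could go active. These locks are per process, so two calls from the same process do not conflict and prove nothing.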