How to Reconnect to all IBM Queue Managers in a cluster
Question
We use IBM MQ for integration between various microservices. The applications are quite critical and we aim for zero downtime. We have a cluster of three queue managers, each running on a different server (in separate AWS availability zones): say QM1 on server1, QM2 on server2 and QM3 on server3.
We configure three ConnectionFactory instances as below (note the differences in the connection name lists):
var connectionFactory1 = new MQQueueConnectionFactory();
connectionFactory1.setConnectionNameList("server1,server2,server3");
connectionFactory1.setPort(1414);
....
var connectionFactory2 = new MQQueueConnectionFactory();
connectionFactory2.setConnectionNameList("server2,server3,server1");
connectionFactory2.setPort(1414);
....
var connectionFactory3 = new MQQueueConnectionFactory();
connectionFactory3.setConnectionNameList("server3,server1,server2");
connectionFactory3.setPort(1414);
....
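The three connection name lists are rotations of the same server order. As a minimal sketch, the lists could be generated rather than typed by hand. The class `ConnectionNameLists` and its `rotations` method are hypothetical helpers for illustration, not part of the IBM MQ API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds one comma-separated connection name list per
// server, each list starting with a different "preferred" server.
class ConnectionNameLists {

    static List<String> rotations(List<String> servers) {
        List<String> result = new ArrayList<>();
        for (int start = 0; start < servers.size(); start++) {
            List<String> rotated = new ArrayList<>();
            for (int i = 0; i < servers.size(); i++) {
                // Wrap around so every server appears exactly once per list.
                rotated.add(servers.get((start + i) % servers.size()));
            }
            result.add(String.join(",", rotated));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the three lists used by connectionFactory1..3, one per line.
        rotations(List.of("server1", "server2", "server3"))
                .forEach(System.out::println);
    }
}
```

Each resulting string could then be passed to setConnectionNameList on the corresponding factory.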
The idea behind this setup is to be able to utilize all three queue managers at the same time. The first connection factory will consume from QM1, the second connection factory will consume from QM2, and so on.
From time to time the servers running the IBM queue managers need to be patched and restarted. When this happens, the queue manager on that server obviously has to be shut down.
While a queue manager is down, all the traffic is redirected through the other two queue managers in the cluster, so the flow of messages never stops.
While server1 is down:
connectionFactory1 switches to server2,server3, so it consumes from QM2
connectionFactory2 switches to server2,server3, so it consumes from QM2
connectionFactory3 switches to server3,server2, so it consumes from QM3
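The switching described above can be sketched with a simplified model. It assumes that a connection attempt walks the connection name list in order and uses the first host that is reachable. `FailoverModel` and `firstReachable` are hypothetical names, and the model ignores ports, channels and timeouts:

```java
import java.util.Optional;
import java.util.Set;

// Simplified model (an assumption, not IBM MQ client code): a new connection
// attempt lands on the first host in the list that is currently up.
class FailoverModel {

    static Optional<String> firstReachable(String connectionNameList, Set<String> upHosts) {
        for (String host : connectionNameList.split(",")) {
            String trimmed = host.trim();
            if (upHosts.contains(trimmed)) {
                return Optional.of(trimmed);
            }
        }
        return Optional.empty(); // no host in the list is up
    }

    public static void main(String[] args) {
        Set<String> up = Set.of("server2", "server3"); // server1 is being patched
        System.out.println(firstReachable("server1,server2,server3", up));
        System.out.println(firstReachable("server2,server3,server1", up));
        System.out.println(firstReachable("server3,server1,server2", up));
    }
}
```

The model only answers where a new connection attempt would land. It says nothing about connections that are already established, which is the case the question is about.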
After patching server1 we start QM1.
The issue we have is that the three connection factories stay switched as above and never reconnect to QM1. The only way we were able to restore the desired state was by restarting the application, which is not really a good or acceptable solution.
In our client code we implemented some resiliency patterns to detect when QM1 comes back up and then reset connectionFactory1 (a Spring CachingConnectionFactory wrapped around MQQueueConnectionFactory), as well as stopping and starting all listener containers that consume with QM1 as their preferred queue manager, but this had no effect. The only way we could do it was to restart the Spring application context, which is much the same as restarting the application. When you have many such applications, this is really not a good solution.
I noticed that MQQueueConnectionFactory has a method setClientReconnectOptions(int options) throws javax.jms.JMSException, but reading the comment on that method did not make it clear to me whether it can be used for what we want.
Thank you in advance for your inputs.
Answer 1
Score: 1
Reconnect options are for re-making the connection after a failure. They will not cause a connection to be re-made just because the set of connections is unbalanced.
For more about reconnectable clients, see Reconnectable clients in the IBM Docs.
This is not an easy problem to solve from inside the application, because the application does not know about the whole environment. That is why IBM MQ now has a feature called Uniform Clusters with Application Rebalancing, which does exactly this. When you start up the recycled queue manager, the cluster (which has a picture of the whole set of client-connected applications) notices the imbalance and tells some of the applications to go elsewhere. It uses the client application's ability to reconnect as per the above options, but the instruction to move to another queue manager comes from the queue manager the application is currently connected to, rather than being decided by the client.
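To make the idea of rebalancing concrete, here is a toy model. It is not IBM MQ's actual algorithm, and `RebalanceModel` and `rebalance` are hypothetical names. It moves one connection at a time from the busiest queue manager to the least busy one until the counts differ by at most one:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only (an assumption for illustration): evens out the number of
// client connections per queue manager, as a view of the whole cluster allows.
class RebalanceModel {

    static Map<String, Integer> rebalance(Map<String, Integer> connections) {
        Map<String, Integer> counts = new LinkedHashMap<>(connections);
        while (true) {
            String busiest = null;
            String idlest = null;
            for (String qm : counts.keySet()) {
                if (busiest == null || counts.get(qm) > counts.get(busiest)) busiest = qm;
                if (idlest == null || counts.get(qm) < counts.get(idlest)) idlest = qm;
            }
            // Stop when there is nothing to balance or the spread is at most one.
            if (busiest == null || counts.get(busiest) - counts.get(idlest) <= 1) {
                return counts;
            }
            counts.merge(busiest, -1, Integer::sum); // one application is told to move
            counts.merge(idlest, 1, Integer::sum);   // and reconnects elsewhere
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> afterRestart = new LinkedHashMap<>();
        afterRestart.put("QM1", 0); // just restarted, no connections yet
        afterRestart.put("QM2", 2);
        afterRestart.put("QM3", 1);
        System.out.println(rebalance(afterRestart));
    }
}
```

The point of the sketch is that the decision needs the connection counts of every queue manager, which a single client does not have.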
The Uniform Clusters feature and Application Balancing were added in IBM MQ V9.1.2 and enhanced in several of the subsequent CD releases. The first LTS release to provide it would therefore be IBM MQ V9.2.0.
For more about Uniform Clusters, see About uniform clusters in the IBM Docs.