ActiveMQ Pure Master / Slave - Out of sync
- by pico
What I have:
1 master broker and 1 slave broker, both running ActiveMQ 5.4.0
What I use:
waitForSlave on the master side and a failover URI on the slave side (in the masterConnectorURI)
What I want to do:
I want the slave to wait for a short delay (about 5 seconds) before starting its transport connectors, so that a brief network failure between master and slave does not trigger a takeover.
So I put this in the slave config:
<broker xmlns="http://activemq.apache.org/schema/core"
brokerName="slave"
dataDirectory="${activemq.base}/data"
useJmx="true"
persistent="true"
populateJMSXUserID="true"
masterConnectorURI="failover://(tcp://master:61616)?initialReconnectDelay=1000&amp;maxReconnectDelay=30000"
shutdownOnMasterFailure="false"
advisorySupport="false">
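For comparison, here is a minimal sketch of what the matching master-side element could look like with waitForSlave enabled. Only waitForSlave comes from the setup described above; the brokerName and dataDirectory values are assumptions for illustration, and the persistence adapter and transport connectors are left out:

```xml
<!-- Sketch only: brokerName and dataDirectory are assumed values -->
<broker xmlns="http://activemq.apache.org/schema/core"
        brokerName="master"
        dataDirectory="${activemq.base}/data"
        waitForSlave="true">
  <!-- persistence adapter and transport connectors omitted -->
</broker>
```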
It seems to work, but after a network hang between master and slave, the slave reconnects successfully and then the master logs many errors like this:
2010-10-18 17:08:44,421 | ERROR | Slave Failed | org.apache.activemq.broker.ft.MasterBroker | ActiveMQ Task
java.lang.IllegalStateException: Cannot lookup a connection that had not been registered: ID:master-1040-634226732611718750-0:0
at org.apache.activemq.broker.MapTransportConnectionStateRegister.lookupConnectionState(MapTransportConnectionStateRegister.java:93)
at org.apache.activemq.broker.TransportConnection.lookupConnectionState(TransportConnection.java:1412)
at org.apache.activemq.broker.TransportConnection.processRemoveConsumer(TransportConnection.java:561)
at org.apache.activemq.command.RemoveInfo.visit(RemoveInfo.java:76)
at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:309)
at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:185)
at org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:69)
at org.apache.activemq.transport.vm.VMTransport.iterate(VMTransport.java:218)
at org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:98)
at org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:36)
On the slave side everything is fine.
So after that, I tried stopping the master to see whether the slave can take over as master after these "network hangs".
The master took a long time to shut down (10 seconds), and then some error messages appeared in the slave logs:
2010-10-18 17:09:32,915 | WARN | Async error occurred: java.lang.IllegalStateException: Cannot lookup a connection that had not been registered: ID:master-1049-634226732657812500-0:3 | org.apache.activemq.broker.TransportConnection.Service | VMTransport: vm://slave#5
java.lang.IllegalStateException: Cannot lookup a connection that had not been registered: ID:master-1049-634226732657812500-0:3
at org.apache.activemq.broker.MapTransportConnectionStateRegister.lookupConnectionState(MapTransportConnectionStateRegister.java:93)
at org.apache.activemq.broker.TransportConnection.lookupConnectionState(TransportConnection.java:1412)
at org.apache.activemq.broker.TransportConnection.processRemoveSession(TransportConnection.java:600)
at org.apache.activemq.command.RemoveInfo.visit(RemoveInfo.java:74)
at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:309)
at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:185)
at org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:69)
at org.apache.activemq.transport.vm.VMTransport.iterate(VMTransport.java:218)
at org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:98)
at org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:36)
Is there any way to keep my Kaha stores (each broker has its own store) synchronised?
The main problem is that the slave never becomes master after a master failure; it stays blocked on this message:
2010-10-18 17:09:33,681 | WARN | Transport (master/172.21.60.61:61616) failed to tcp://master:61616 , attempting to automatically reconnect due to: java.net.SocketException: Software caused connection abort: socket write error | org.apache.activemq.transport.failover.FailoverTransport | ActiveMQ Transport: tcp://master/172.21.60.61:61616
I'm totally stuck with these sync problems; any help is welcome!
Regards