Hello,
Over the weekend, my cluster’s singleton node appears to have frozen or stopped running, and none of my scheduled jobs ran. The singleton logs stopped updating on ALL the nodes until I restarted one node, which then took over as the singleton node. However, it detected that the previous singleton node was still alive. I had to kill both nodes at the same time and let one of the remaining nodes take over as the singleton node.
I checked my logs and noticed that the Redis server had been automatically updated and reloaded. That was when the singleton logs stopped updating.
Has anyone had a similar experience and been able to resolve it? The cluster is still able to serve services via HTTP calls; only the scheduled jobs are not running.
2017-10-14 04:20:25,508 - INFO - 8920:Dummy-252766 - zato_singleton:22 - Not becoming a cluster-wide singleton, cid:[K05W9QJ05HC38CF9YDXNNMM634W6], server id:[2], name:[server1-intra1]
2017-10-14 04:20:26,125 - INFO - 8920:Dummy-252768 - zato_singleton:22 - Not becoming a cluster-wide singleton, cid:[K05GCRHB5NDPR0MJYTX8HW33C6N5], server id:[2], name:[server1-intra1]
2017-10-14 04:20:44,673 - INFO - 8920:Dummy-252769 - zato_singleton:22 - Not becoming a cluster-wide singleton, cid:[K06Z3C88Z799KKSFQP46RSP3735W], server id:[2], name:[server1-intra1]
2017-10-14 04:20:55,505 - INFO - 8920:Dummy-252770 - zato_singleton:22 - Not becoming a cluster-wide singleton, cid:[K07JQT42C18DDCCE8CJ09JV7KM6B], server id:[2], name:[server1-intra1]
2017-10-14 04:20:56,126 - INFO - 8920:Dummy-252772 - zato_singleton:22 - Not becoming a cluster-wide singleton, cid:[K07HNP04GHY6D3BCADY8629VJ94H], server id:[2], name:[server1-intra1]
2017-10-14 04:21:14,675 - INFO - 8920:Dummy-252773 - zato_singleton:22 - Not becoming a cluster-wide singleton, cid:[K07SFZEZKXDFP8YSNQZHCRFF3ZAA], server id:[2], name:[server1-intra1]
2017-10-16 09:25:51,347 - INFO - 17140:Thread-207 - zato_singleton:22 - Pickup notifier starting
2017-10-16 09:26:13,698 - INFO - 17140:Dummy-221 - zato_singleton:22 - Cluster-wide singleton keep-alive OK, server id:[2], name:[server1-intra1]
2017-10-16 09:26:15,540 - INFO - 17140:Dummy-222 - zato_singleton:22 - Cluster-wide singleton keep-alive OK, server id:[2], name:[server1-intra1]
2017-10-16 09:26:17,697 - INFO - 17141:Dummy-218 - zato_singleton:22 - Cluster-wide singleton keep-alive OK, server id:[4], name:[server2-intra2]
2017-10-16 09:26:43,043 - INFO - 17140:Dummy-230 - zato_singleton:22 - Cluster-wide singleton keep-alive OK, server id:[2], name:[server1-intra1]
2017-10-16 09:26:43,083 - INFO - 17140:Dummy-232 - zato_singleton:22 - Cluster-wide singleton keep-alive OK, server id:[2], name:[server1-intra1]
2017-10-16 09:26:47,295 - INFO - 17141:Dummy-225 - zato_singleton:22 - Cluster-wide singleton keep-alive OK, server id:[4], name:[server2-intra2]
2017-10-16 09:27:12,921 - INFO - 17141:Dummy-230 - zato_singleton:22 - Cluster-wide singleton keep-alive OK, server id:[2], name:[server1-intra1]
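
In case it helps anyone checking their own singleton logs for the same symptom, here is a minimal sketch of how the silent period could be located automatically. It is not part of Zato; the function names `parse_timestamp` and `find_gaps` and the 5-minute threshold are my own assumptions, and it only relies on each log entry starting with a timestamp in the format shown above.

```python
import re
from datetime import datetime, timedelta

# Matches the leading "YYYY-MM-DD HH:MM:SS,mmm - " prefix seen in the log above.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) - ")


def parse_timestamp(line):
    """Return the entry's timestamp, or None if the line has no timestamp prefix."""
    match = TS_RE.match(line)
    if match is None:
        return None  # e.g. the wrapped tail of a long log entry
    base = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return base + timedelta(milliseconds=int(match.group(2)))


def find_gaps(lines, max_gap=timedelta(minutes=5)):
    """Return (last_seen, next_seen) pairs where the log was silent longer than max_gap.

    The 5-minute default is an arbitrary assumption; pick a value well above
    the normal interval between singleton messages in your own cluster.
    """
    gaps = []
    previous = None
    for line in lines:
        current = parse_timestamp(line)
        if current is None:
            continue
        if previous is not None and current - previous > max_gap:
            gaps.append((previous, current))
        previous = current
    return gaps


if __name__ == "__main__":
    import sys

    # Usage: python find_gaps.py /path/to/singleton.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for last_seen, next_seen in find_gaps(log_file):
            print(f"silent from {last_seen} to {next_seen} ({next_seen - last_seen})")
```

Run against the excerpt above, this should report a single silent period starting at the last 2017-10-14 entry and ending at the first 2017-10-16 entry, which can then be compared with the time of the Redis update.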