Singleton seems to freeze or stop running when Redis restarts on Zato 2.0.7


#1

Hello,

Over the weekend, my cluster’s singleton node seems to have frozen or stopped running, and none of my scheduled jobs ran. The singleton logs stopped updating on ALL the nodes until I restarted a node, which then took over as the singleton node. However, it detected that the previous singleton node was still alive. I had to kill both nodes at the same time and let one of the remaining nodes take over as the singleton node.

I checked my logs and noticed that the Redis server had been auto-updated and reloaded. That was the time the singleton logs stopped updating.

Has anyone had a similar experience and been able to resolve it? The cluster is still able to serve services via HTTP calls; only the scheduled jobs are not running.

2017-10-14 04:20:25,508 - INFO - 8920:Dummy-252766 - zato_singleton:22 - Not becoming a cluster-wide singleton, cid:[K05W9QJ05HC38CF9YDXNNMM634W6], server id:[2], name:[server1-intra1]
2017-10-14 04:20:26,125 - INFO - 8920:Dummy-252768 - zato_singleton:22 - Not becoming a cluster-wide singleton, cid:[K05GCRHB5NDPR0MJYTX8HW33C6N5], server id:[2], name:[server1-intra1]
2017-10-14 04:20:44,673 - INFO - 8920:Dummy-252769 - zato_singleton:22 - Not becoming a cluster-wide singleton, cid:[K06Z3C88Z799KKSFQP46RSP3735W], server id:[2], name:[server1-intra1]
2017-10-14 04:20:55,505 - INFO - 8920:Dummy-252770 - zato_singleton:22 - Not becoming a cluster-wide singleton, cid:[K07JQT42C18DDCCE8CJ09JV7KM6B], server id:[2], name:[server1-intra1]
2017-10-14 04:20:56,126 - INFO - 8920:Dummy-252772 - zato_singleton:22 - Not becoming a cluster-wide singleton, cid:[K07HNP04GHY6D3BCADY8629VJ94H], server id:[2], name:[server1-intra1]
2017-10-14 04:21:14,675 - INFO - 8920:Dummy-252773 - zato_singleton:22 - Not becoming a cluster-wide singleton, cid:[K07SFZEZKXDFP8YSNQZHCRFF3ZAA], server id:[2], name:[server1-intra1]
2017-10-16 09:25:51,347 - INFO - 17140:Thread-207 - zato_singleton:22 - Pickup notifier starting
2017-10-16 09:26:13,698 - INFO - 17140:Dummy-221 - zato_singleton:22 - Cluster-wide singleton keep-alive OK, server id:[2], name:[server1-intra1]
2017-10-16 09:26:15,540 - INFO - 17140:Dummy-222 - zato_singleton:22 - Cluster-wide singleton keep-alive OK, server id:[2], name:[server1-intra1]
2017-10-16 09:26:17,697 - INFO - 17141:Dummy-218 - zato_singleton:22 - Cluster-wide singleton keep-alive OK, server id:[4], name:[server2-intra2]
2017-10-16 09:26:43,043 - INFO - 17140:Dummy-230 - zato_singleton:22 - Cluster-wide singleton keep-alive OK, server id:[2], name:[server1-intra1]
2017-10-16 09:26:43,083 - INFO - 17140:Dummy-232 - zato_singleton:22 - Cluster-wide singleton keep-alive OK, server id:[2], name:[server1-intra1]
2017-10-16 09:26:47,295 - INFO - 17141:Dummy-225 - zato_singleton:22 - Cluster-wide singleton keep-alive OK, server id:[4], name:[server2-intra2]
2017-10-16 09:27:12,921 - INFO - 17141:Dummy-230 - zato_singleton:22 - Cluster-wide singleton keep-alive OK, server id:[2], name:[server1-intra1]


#2

Is it 2.0.7 or 2.0.8?


#3

It’s 2.0.7. My production environment.


#4

Keith, are you saying that after an automatic background Redis restart followed by a full restart of your Zato environment, the Zato servers will not run scheduled jobs?

I’m just not clear if the situation is resolved and you’re trying to find the root cause or if the scheduler jobs still don’t run?


#5

No.
What I meant was that the Zato singleton seems to freeze. I discovered it was caused by my Redis server auto-updating the redis package, after which the service was reloaded/restarted (the singleton logs stopped updating on all nodes when this happened).

I had to restart the cluster in order to resolve it.

Yes, I am trying to find a way for the Zato singleton to survive a Redis server restart (such as a redis package update).

Thanks.


#6

Also, is there any way to prevent 2 singletons from running? Occasionally I notice in the singleton logs that my cluster has 2 singletons alive.


#7

2.0.8 changed the way Redis connections are handled and can automatically re-establish connections after they are broken for any reason (including a package update).

The relevant changes are here:

As a side note, Zato 3.0 will not have the notion of a singleton server at all. There is no need for it now. Previously, it was the only way to run AMQP and ZeroMQ under gevent, and the scheduler took advantage of it, but going forward there will be no singleton.
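
To illustrate the general idea of re-establishing a broken connection, here is a minimal sketch in Python. It is not the code from 2.0.8 and does not use any Zato or Redis API. `FlakyClient` and `call_with_reconnect` are hypothetical names made up for this example, and the client only simulates a server that goes away for a while, as during a package update.

```python
import time


class FlakyClient:
    """Hypothetical stand-in for a Redis client; fails while the 'server' is down."""

    def __init__(self, server):
        self.server = server

    def get(self, key):
        if not self.server["up"]:
            raise ConnectionError("connection lost")
        return self.server["data"].get(key)


def call_with_reconnect(connect, operation, retries=5, delay=0.01):
    """Run operation(client); on ConnectionError, build a new client and retry.

    connect   -- zero-argument callable returning a fresh client (assumption)
    operation -- callable taking the client and returning a result
    """
    client = connect()
    for attempt in range(1, retries + 1):
        try:
            return operation(client)
        except ConnectionError:
            if attempt == retries:
                raise  # give up after the last attempt
            time.sleep(delay * attempt)  # simple linear backoff
            client = connect()  # replace the broken connection
```

With a wrapper along these lines, a keep-alive check could keep running across a short outage instead of silently stopping, as long as the server comes back before the retries run out.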