Load balancer fails after rebooting the OS

Hi everyone
I just ran into a strange situation. I'm new to Zato, so I followed part 1 of the tutorial, and everything worked. Then I rebooted the computer and tried to repeat part 1 as an exercise. This time the page showed “Could not fetch the load balancer’s configuration”. The load balancer logs show something like this:

2017-01-24 08:21:48,882 - ERROR - 2323:MainThread - root:130 - Traceback (most recent call last):
  File "/opt/zato/2.0.7/code/zato-agent/src/zato/agent/load_balancer/server.py", line 128, in _dispatch
    return SSLServer._dispatch(self, method, params)
  File "/usr/lib64/python2.7/SimpleXMLRPCServer.py", line 420, in _dispatch
    return func(*params)
  File "/opt/zato/2.0.7/code/zato-agent/src/zato/agent/load_balancer/server.py", line 256, in _lb_agent_get_servers_state
    for access_type, server_name, state in self._show_stat():
  File "/opt/zato/2.0.7/code/zato-agent/src/zato/agent/load_balancer/server.py", line 207, in _show_stat
    stat = self.haproxy_stats.execute('show stat')
  File "/opt/zato/2.0.7/code/zato-agent/src/zato/agent/load_balancer/haproxy_stats.py", line 40, in execute
    client.connect(self.socket_name)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused

The server log shows something like this:
Traceback (most recent call last):
  File "/opt/zato/2.0.7/code/eggs/gunicorn-18.0-py2.7.egg/gunicorn/arbiter.py", line 494, in spawn_worker
    self.cfg.post_fork(self, worker)
  File "/opt/zato/2.0.7/code/zato-server/src/zato/server/base/parallel.py", line 864, in post_fork
    ParallelServer.start_server(worker.app.zato_wsgi_app, arbiter.zato_deployment_key)
  File "/opt/zato/2.0.7/code/zato-server/src/zato/server/base/parallel.py", line 810, in start_server
    is_first, locally_deployed = parallel_server._after_init_common(server, zato_deployment_key)
  File "/opt/zato/2.0.7/code/zato-server/src/zato/server/base/parallel.py", line 373, in _after_init_common
    is_first, locally_deployed = self.maybe_on_first_worker(server, self.kvdb.conn, deployment_key)
  File "/opt/zato/2.0.7/code/zato-server/src/zato/server/base/parallel.py", line 269, in maybe_on_first_worker
    with Lock(lock_name, self.deployment_lock_expires, self.deployment_lock_timeout, redis_conn):
  File "/opt/zato/2.0.7/code/eggs/retools-0.3-py2.7.egg/retools/lock.py", line 73, in __enter__
    raise LockTimeout("Timeout while waiting for lock")

Initially I thought one of my settings caused the problem. So I installed a clean OS and repeated the tutorial. Zato started fine the first time but failed again after I rebooted the OS. Can someone tell me what “Connection refused” indicates here, or have I found a bug?
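For reference, the load balancer traceback ends in `client.connect(self.socket_name)` inside `haproxy_stats.py`. The name `socket_name` suggests the agent is connecting to HAProxy's stats socket, which HAProxy exposes as a UNIX domain socket through its `stats socket` directive. On Linux, errno 111 is ECONNREFUSED. For a UNIX socket, this usually means the socket file exists but no process is accepting connections on it. Examples are HAProxy not running, or a file left over from before the reboot. A missing file gives ENOENT instead. The sketch below tells these cases apart. It assumes you pass it the path from the `stats socket` line of your HAProxy config. The helper name `probe_unix_socket` is hypothetical and not part of Zato:

```python
import errno
import socket


def probe_unix_socket(path, timeout=2.0):
    """Try to connect to a UNIX domain socket and describe the outcome.

    Hypothetical diagnostic helper, not part of Zato.
    """
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
    except socket.error as e:
        if e.errno == errno.ECONNREFUSED:
            # The socket file exists, but nothing is listening on it
            # (e.g. the process that created it has exited).
            return "refused"
        if e.errno == errno.ENOENT:
            # No socket file at that path at all.
            return "missing"
        return "error: %s" % e
    finally:
        client.close()
    # connect() succeeded, so some process is accepting connections.
    return "ok"
```

If this returns `"refused"` for the stats socket path after a reboot, the problem is probably on the HAProxy side (it is not running or not listening) rather than in the agent that reports the error.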

Hi Rafal

I used /path/./zato-qs-start.sh to start the environment. Unfortunately, part 1 of the tutorial has no details on how to restart or stop it. I guessed that “zato-qs-restart.sh” and “zato-qs-stop.sh” are meant for that. However, stopping does not seem to work properly. On the next start, the start script keeps reporting that “pidfile” files already exist for server1, server2, web-admin or the load balancer, apparently at random. So I wrote a script that runs “rm -f /opt/env/server1/pidfile /opt/env/server2/pidfile /opt/env/web-admin/pidfile /opt/env/load-balancer/zato-lb-agent.pid” after zato-qs-stop.sh. Then I run “lsof -i:17011,8183,17010” to make sure all the ports are closed. Is that the right way to restart all the services?
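The Python sketch below follows the manual procedure above: clear leftover pidfiles, then confirm the ports are free. It is a hypothetical helper, not something shipped with Zato. The pidfile paths and ports are the ones from the post, and it assumes the components listen on 127.0.0.1. Unlike a plain `rm -f`, it deletes a pidfile only if the recorded PID is no longer running. It is meant to run after zato-qs-stop.sh, not instead of it.

```python
import errno
import os
import socket

# Paths and ports taken from the post; adjust them to your own quickstart environment.
PIDFILES = [
    "/opt/env/server1/pidfile",
    "/opt/env/server2/pidfile",
    "/opt/env/web-admin/pidfile",
    "/opt/env/load-balancer/zato-lb-agent.pid",
]
PORTS = [17010, 17011, 8183]


def pid_is_running(pid):
    """Return True if a process with this PID exists (signal 0 only checks existence)."""
    try:
        os.kill(pid, 0)
    except OSError as e:
        # EPERM: the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True


def remove_stale_pidfile(path):
    """Delete a pidfile only if the PID it records is no longer running.

    Note: after a reboot a PID may be reused by an unrelated process,
    so "in use" is a hint, not proof that a Zato component is running.
    """
    try:
        with open(path) as f:
            content = f.read().strip()
    except (IOError, OSError) as e:
        if e.errno == errno.ENOENT:
            return "absent"
        raise
    try:
        pid = int(content)
    except ValueError:
        pid = None  # unparseable contents: treat the file as stale
    if pid is not None and pid > 0 and pid_is_running(pid):
        return "in use by %d" % pid
    os.remove(path)
    return "removed"


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(1.0)
    try:
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return s.connect_ex((host, port)) != 0
    finally:
        s.close()
```

To reproduce the rm -f and lsof steps, call `remove_stale_pidfile(p)` for each entry in `PIDFILES` and `port_is_free(p)` for each entry in `PORTS`. If a port is still taken after zato-qs-stop.sh, the stop script probably did not shut down that component.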

zhaozhao