Updating Zato


#1

Hi everyone!

When updating Zato with new sources, one is supposed to follow the instructions in the document Installing updates to Zato installation. That is:

  1. Pull updates from Git.
  2. Install the new code via pip.
  3. Stop all Zato components.
  4. Start all Zato components (adding --sync-internal when starting the servers).
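
For reference, this is roughly what that cycle looks like for a single server in my quickstart environment (the /opt/zato/3.0 and /opt/zato/env/qs-1 paths are from my installation, so adjust them to yours):

$ # Steps 1 and 2: pull and reinstall the sources
$ cd /opt/zato/3.0
$ git pull
$ cd ./code
$ ./bin/pip install -e ./zato-cy
$ # Steps 3 and 4: stop, then start; servers also get --sync-internal
$ zato stop /opt/zato/env/qs-1/server1
$ zato start /opt/zato/env/qs-1/server1 --sync-internal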

Now I’ve been struggling to find the exact list of commands, and the order in which to execute them, to stop all components and then start them all again. I have not been able to find it in the Architecture overview, Installing and configuring Zato components, Attack surface or the Command line interface.

So far I’ve been stopping them from top to bottom, then starting from top to bottom again, but I am pretty sure I have been forgetting about one or two components, as I very often get the pidfile already exists and port already in use errors.
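
A quick way to tell whether such a pidfile is stale, using the quickstart paths from my environment (if ps prints nothing, the owning process is already gone):

$ cat /opt/zato/env/qs-1/server1/pidfile
$ ps -p "$(cat /opt/zato/env/qs-1/server1/pidfile)"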

Where can I find the exact steps, in order? Or what would they be?

Thanks in advance.


#2

Thanks, I have clarified the documentation in that regard:

https://zato.io/docs/admin/guide/install/update.html


#3

Thanks for the update, @dsuch. Two things:

  1. I believe there is a mistake in the docs. Where it says _Next, regardless of the OS_:

$ cd ./code
$ ./bin/pip install -e ./zato-cy

I think that it should say:

$ cd ../code
$ ./bin/pip install -e ./zato-cy

  2. In the first step, with the load balancer, after issuing the stop command and then the start command, I am getting the Address localhost:20151 already taken exception:

zato@18c22e518363:~$ zato stop /opt/zato/env/qs-1/load-balancer
Load-balancer `/opt/zato/env/qs-1/load-balancer` shutting down
zato@18c22e518363:~$ zato start /opt/zato/env/qs-1/load-balancer
Exception: (u"Load balancer's agent check failed. Address `localhost:20151` already taken.",) (Hint: re-run with --verbose for full traceback)
zato@18c22e518363:~$ zato stop /opt/zato/env/qs-1/load-balancer
No pidfile found in `/opt/zato/env/qs-1/load-balancer/pidfile`
zato@18c22e518363:~$ zato start /opt/zato/env/qs-1/load-balancer
Exception: (u"Load balancer's agent check failed. Address `localhost:20151` already taken.",) (Hint: re-run with --verbose for full traceback)
zato@18c22e518363:~$ zato start /opt/zato/env/qs-1/load-balancer --verbose
Traceback (most recent call last):
  File "/opt/zato/current/bin/zato", line 11, in <module>
    load_entry_point('zato-cli', 'console_scripts', 'zato')()
  File "/opt/zato/3.0/code/zato-cli/src/zato/cli/zato_command.py", line 379, in main
    return run_command(get_parser().parse_args())
  File "/opt/zato/3.0/code/zato-cli/src/zato/cli/__init__.py", line 356, in run_command
    command_class[args.command](args).run(args)
  File "/opt/zato/3.0/code/zato-cli/src/zato/cli/__init__.py", line 632, in run
    return_code = self.execute(args)
  File "/opt/zato/3.0/code/zato-cli/src/zato/cli/__init__.py", line 909, in execute
    return self._get_dispatch()[json_data['component']](args)
  File "/opt/zato/3.0/code/zato-cli/src/zato/cli/start.py", line 104, in _on_lb
    self.run_check_config()
  File "/opt/zato/3.0/code/zato-cli/src/zato/cli/start.py", line 55, in run_check_config
    'secret_key': self.args.secret_key,
  File "/opt/zato/3.0/code/zato-cli/src/zato/cli/__init__.py", line 909, in execute
    return self._get_dispatch()[json_data['component']](args)
  File "/opt/zato/3.0/code/zato-cli/src/zato/cli/check_config.py", line 254, in _on_lb
    self.ensure_json_config_port_free('Load balancer\'s agent', None, lba_conf)
  File "/opt/zato/3.0/code/zato-cli/src/zato/cli/check_config.py", line 59, in ensure_json_config_port_free
    self.ensure_port_free(prefix, conf['port'], address)
  File "/opt/zato/3.0/code/zato-cli/src/zato/cli/check_config.py", line 52, in ensure_port_free
    raise Exception('{} check failed. Address `{}` already taken.'.format(prefix, address))
Exception: Load balancer's agent check failed. Address `localhost:20151` already taken.

This is on Ubuntu 18.04 using Docker. Any hints?

The next thing I did was to try with server1, and I got similar results:

zato@18c22e518363:~$ zato stop /opt/zato/env/qs-1/server1
Server `/opt/zato/env/qs-1/server1` shutting down
zato@18c22e518363:~$ zato start /opt/zato/env/qs-1/server1 --sync-internal
Exception: (u'Cannot proceed, found pidfile `/opt/zato/env/qs-1/server1/pidfile`',) (Hint: re-run with --verbose for full traceback)

After some more attempts, I managed to get the OK:

zato@18c22e518363:~$ zato start /opt/zato/env/qs-1/server1 --sync-internal
Exception: (u'Cannot proceed, found pidfile `/opt/zato/env/qs-1/server1/pidfile`',) (Hint: re-run with --verbose for full traceback)
zato@18c22e518363:~$ zato stop /opt/zato/env/qs-1/server1
Server `/opt/zato/env/qs-1/server1` shutting down
zato@18c22e518363:~$ zato start /opt/zato/env/qs-1/server1 --sync-internal
OK

I noticed this in the log:

2018-09-26 13:00:30,402 DEBG 'zato-server2' stdout output:
2018-09-26 13:00:30,402 - INFO - 571:MainThread - gunicorn.main:176 - Handling signal: term

2018-09-26 13:00:31,507 DEBG 'zato-server2' stdout output:
2018-09-26 13:00:31,507 - INFO - 571:MainThread - gunicorn.main:176 - Shutting down: Master

2018-09-26 13:00:32,123 DEBG 'zato-server2' stderr output:
No handlers could be found for logger "zato.common.util.proc"

2018-09-26 13:00:32,205 DEBG fd 19 closed, stopped monitoring <POutputDispatcher at 140441392098120 for <Subprocess at 140441392004848 with name zato-server2 in state RUNNING> (stdout)>
2018-09-26 13:00:32,205 DEBG fd 22 closed, stopped monitoring <POutputDispatcher at 140441392097184 for <Subprocess at 140441392004848 with name zato-server2 in state RUNNING> (stderr)>
2018-09-26 13:00:32,205 INFO exited: zato-server2 (exit status 18; not expected)
2018-09-26 13:00:32,205 DEBG received SIGCLD indicating a child quit
2018-09-26 13:00:33,208 INFO spawned: 'zato-server2' with pid 604
2018-09-26 13:00:34,210 INFO success: zato-server2 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-09-26 13:00:36,886 DEBG 'zato-server2' stdout output:
2018-09-26 13:00:36,885 - INFO - 615:MainThread - gunicorn.main:176 - Starting gunicorn 18.0

2018-09-26 13:00:36,888 DEBG 'zato-server2' stdout output:
2018-09-26 13:00:36,888 - INFO - 615:MainThread - gunicorn.main:176 - Listening at: http://0.0.0.0:17011 (615)

2018-09-26 13:00:36,889 DEBG 'zato-server2' stdout output:
2018-09-26 13:00:36,888 - INFO - 615:MainThread - gunicorn.main:176 - Using worker: gevent

2018-09-26 13:00:36,895 DEBG 'zato-server2' stdout output:
2018-09-26 13:00:36,893 - INFO - 627:MainThread - gunicorn.main:176 - Booting worker with pid: 627

2018-09-26 13:00:37,394 DEBG 'zato-server2' stdout output:
2018-09-26 13:00:37,394 - INFO - 627:MainThread - zato.server.base.parallel:422 - Preferred address of `server2@quickstart-236325` (pid: 627) is `http://172.17.0.2:17011`

It seems the process is being respawned so quickly that I don’t have the time to issue the start command with the --sync-internal parameter.

Should we be stopping some sort of overlord or watcher process first? Or is it because of the error the load balancer produced right before?
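
In case it helps with diagnosing this, a way to check which process still holds the agent’s port, assuming ss (from iproute2) or lsof is available inside the container:

$ ss -ltnp | grep 20151
$ lsof -i :20151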

Thanks.


#4

Thanks for spotting the typo.

As for restarting the components: zato stop should cleanly stop everything, and the errors from the servers are not related to the one from the load balancer.

I’m not sure what application logs this information, but when you were stopping server2, there was this entry:

No handlers could be found for logger "zato.common.util.proc"

This indicates that an error occurred at a level low enough that logging was unable to record any details about it.

Usually, there will be a file in /tmp with the stdout and stderr of each process, for instance /tmp/tmpcpMQIq-zato-start-server.txt - can you check what is inside it? There should be some details about why zato stop did not succeed.
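
The prefix of the file name is random, so a pattern along these lines should locate the newest one - the glob is an assumption based on the example name above:

$ ls -lt /tmp/tmp*-zato-start-*.txt
$ cat "$(ls -t /tmp/tmp*-zato-start-*.txt | head -n 1)"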


#5

I repeated the steps and found the file you mentioned:

root@18c22e518363:~# cat /tmp/tmp8YhdGK-zato-start-server.txt
No handlers could be found for logger "bzr"
Traceback (most recent call last):
  File "/opt/zato/3.0/code/local/lib/python2.7/site-packages/gevent/greenlet.py", line 534, in run
    result = self._run(*self.args, **self.kwargs)
  File "/opt/zato/3.0/code/zato-distlock/src/zato/distlock/__init__.py", line 179, in _wait_in_greenlet
    self.release()
  File "/opt/zato/3.0/code/zato-distlock/src/zato/distlock/__init__.py", line 274, in release
    unlock(self.tmp_file)
  File "/opt/zato/3.0/code/local/lib/python2.7/site-packages/portalocker/portalocker.py", line 130, in posix_unlock
    fcntl.flock(file_.fileno(), fcntl.LOCK_UN)
ValueError: I/O operation on closed file
<Greenlet at 0x7f85204414b0: <bound method FCNTLLock._wait_in_greenlet of <zato.distlock.FCNTLLock object at 0x7f8520fcfb10>>> failed with ValueError

Traceback (most recent call last):
  File "/opt/zato/3.0/code/local/lib/python2.7/site-packages/gevent/greenlet.py", line 534, in run
    result = self._run(*self.args, **self.kwargs)
  File "/opt/zato/3.0/code/zato-distlock/src/zato/distlock/__init__.py", line 179, in _wait_in_greenlet
    self.release()
  File "/opt/zato/3.0/code/zato-distlock/src/zato/distlock/__init__.py", line 274, in release
    unlock(self.tmp_file)
  File "/opt/zato/3.0/code/local/lib/python2.7/site-packages/portalocker/portalocker.py", line 130, in posix_unlock
    fcntl.flock(file_.fileno(), fcntl.LOCK_UN)
ValueError: I/O operation on closed file
<Greenlet at 0x7f85202fe910: <bound method FCNTLLock._wait_in_greenlet of <zato.distlock.FCNTLLock object at 0x7f852010cbd0>>> failed with ValueError

I hope it helps.


#6

This may be a quirk of Docker, perhaps depending on what filesystem it uses. I cannot reproduce it anywhere, but I have pushed a commit to make sure that no I/O is performed on closed files - can you check it now?


#7

I updated Zato as per the documented process and tried to stop/start services. Both right after updating it and even after restarting the Docker container, I got the same message:

$ zato stop /opt/zato/env/qs-1/web-admin
Web admin `/opt/zato/env/qs-1/web-admin` shutting down

Zato’s log:

2018-09-27 13:23:34,168 DEBG 'zato-web-admin' stderr output:
No handlers could be found for logger "zato.common.util.proc"

2018-09-27 13:23:34,248 DEBG fd 6 closed, stopped monitoring <POutputDispatcher at 139939210637256 for <Subprocess at 139939210637184 with name zato-web-admin in state RUNNING> (stdout)>
2018-09-27 13:23:34,248 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 139939210731960 for <Subprocess at 139939210637184 with name zato-web-admin in state RUNNING> (stderr)>
2018-09-27 13:23:34,248 INFO exited: zato-web-admin (exit status 18; not expected)
2018-09-27 13:23:34,249 DEBG received SIGCLD indicating a child quit
2018-09-27 13:23:35,251 INFO spawned: 'zato-web-admin' with pid 261
2018-09-27 13:23:36,253 INFO success: zato-web-admin entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

It seems that Zato is respawning the process almost immediately. Is this expected behaviour (bug vs. feature)? Something to do with Docker?

$ ps xua
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   4504   708 pts/0    Ss+  13:21   0:00 /bin/sh -c /usr/bin/supervisord -c /opt/zato/supervisord.conf
root         8  0.1  0.2  55804 18212 pts/0    S+   13:21   0:00 /usr/bin/python /usr/bin/supervisord -c /opt/zato/supervisord.conf
zato        12  0.1  0.0  37224  4180 pts/0    Sl   13:21   0:00 redis-server *:6379
zato        14  0.0  0.0  18028  2756 pts/0    S    13:21   0:00 /bin/bash /opt/zato/zato_start_server2
root        16  0.0  0.0  65508  5336 pts/0    S    13:21   0:00 /usr/sbin/sshd -D
zato        18  0.0  0.0  18028  2744 pts/0    S    13:21   0:00 /bin/bash /opt/zato/zato_start_scheduler
zato        20  0.0  0.0  18028  2796 pts/0    S    13:21   0:00 /bin/bash /opt/zato/zato_start_load_balancer
zato        25  1.3  1.1 447208 97328 pts/0    S    13:21   0:01 /opt/zato/3.0/code/bin/python2.7 /opt/zato/current/bin/zato start /opt/zato/env/qs-1/server2 --fg
zato        29  1.3  1.1 441568 95952 pts/0    S    13:21   0:01 /opt/zato/3.0/code/bin/python2.7 /opt/zato/current/bin/zato start /opt/zato/env/qs-1/load-balancer --fg
zato        66  2.1  1.7 645240 142528 pts/0   Sl   13:21   0:02 gunicorn: master [gunicorn]
zato        80  1.4  1.2 396900 98208 pts/0    S    13:21   0:01 /opt/zato/3.0/code/bin/python -m zato.agent.load_balancer.main /opt/zato/env/qs-1/load-balancer/config/repo sync_internal=FalseZATO_ZATO_ZATOsecre
root       105  0.0  0.0  90444  6664 ?        Rs   13:21   0:00 sshd: root@pts/1
root       114  0.0  0.0  24908  5124 pts/1    Ss   13:21   0:00 -bash
zato       128  0.0  0.0  29376  3160 ?        Rs   13:21   0:00 haproxy -D -f /opt/zato/env/qs-1/load-balancer/config/repo/zato.config -p /opt/zato/env/qs-1/load-balancer/pidfile
zato       130 15.0  2.7 1099804 225988 pts/0  Sl   13:21   0:17 gunicorn: worker [gunicorn]
zato       131  0.0  0.0  18028  2772 pts/0    S    13:21   0:00 /bin/bash /opt/zato/zato_start_server1
zato       133  1.4  1.1 447204 97340 pts/0    S    13:21   0:01 /opt/zato/3.0/code/bin/python2.7 /opt/zato/current/bin/zato start /opt/zato/env/qs-1/server1 --fg
postgres   161  0.0  0.2 297232 24444 ?        S    13:21   0:00 /usr/lib/postgresql/9.5/bin/postgres -D /var/lib/postgresql/9.5/main -c config_file=/etc/postgresql/9.5/main/postgresql.conf
postgres   164  0.0  0.0 297232  4012 ?        Ss   13:21   0:00 postgres: checkpointer process   
postgres   165  0.0  0.0 297232  5684 ?        Ss   13:21   0:00 postgres: writer process   
postgres   166  0.0  0.0 297232  4012 ?        Ss   13:21   0:00 postgres: wal writer process   
postgres   167  0.0  0.0 297668  6744 ?        Ss   13:21   0:00 postgres: autovacuum launcher process   
postgres   168  0.0  0.0 152344  4212 ?        Ss   13:21   0:00 postgres: stats collector process   
zato       176  2.1  1.7 645252 142648 pts/0   Sl   13:21   0:02 gunicorn: master [gunicorn]
zato       198 15.5  2.7 1097872 223224 pts/0  Sl   13:21   0:17 gunicorn: worker [gunicorn]
root       199  0.0  0.0  52448  3580 pts/1    S    13:21   0:00 su - zato
zato       200  0.0  0.0  24960  4984 pts/1    S    13:21   0:00 -su
zato       216  1.8  1.1 447228 97300 pts/0    S    13:22   0:01 /opt/zato/3.0/code/bin/python2.7 /opt/zato/current/bin/zato start /opt/zato/env/qs-1/scheduler --fg
zato       225  4.4  1.3 451896 108344 pts/0   Sl   13:22   0:03 /opt/zato/3.0/code/bin/python -m zato.scheduler.main sync_internal=FalseZATO_ZATO_ZATOsecret_key=ZATO_ZATO_ZATOfg=True
postgres   236  0.0  0.1 298192 10452 ?        Ss   13:22   0:00 postgres: genesis genesis 127.0.0.1(51830) idle
postgres   243  0.0  0.1 298200  9512 ?        Ss   13:22   0:00 postgres: genesis genesis 127.0.0.1(51848) idle
zato       261  0.0  0.0  18028  2704 pts/0    S    13:23   0:00 /bin/bash /opt/zato/zato_start_web_admin
zato       263 12.9  1.1 447084 97228 pts/0    S    13:23   0:01 /opt/zato/3.0/code/bin/python2.7 /opt/zato/current/bin/zato start /opt/zato/env/qs-1/web-admin --fg
zato       272 18.6  1.4 426200 116972 pts/0   Sl   13:23   0:02 /opt/zato/3.0/code/bin/python -m zato.admin.main sync_internal=FalseZATO_ZATO_ZATOsecret_key=ZATO_ZATO_ZATOfg=True
zato       282  0.0  0.0  39880  3352 pts/1    R+   13:23   0:00 ps xua

The same happens with server1:

zato stop /opt/zato/env/qs-1/server1
Server `/opt/zato/env/qs-1/server1` shutting down

This is Zato’s log file:

2018-09-27 13:29:37,137 DEBG 'zato-server1' stdout output:
2018-09-27 13:29:37,137 - INFO - 176:MainThread - gunicorn.main:176 - Handling signal: term

2018-09-27 13:29:37,649 DEBG 'zato-server1' stdout output:
2018-09-27 13:29:37,648 - INFO - 198:MainThread - gunicorn.main:176 - Worker exiting (pid: 198)

2018-09-27 13:29:37,840 DEBG 'zato-server1' stdout output:
2018-09-27 13:29:37,840 - INFO - 176:MainThread - gunicorn.main:176 - Shutting down: Master

2018-09-27 13:29:38,451 DEBG 'zato-server1' stderr output:
No handlers could be found for logger "zato.common.util.proc"

2018-09-27 13:29:38,532 DEBG fd 15 closed, stopped monitoring <POutputDispatcher at 139939210733688 for <Subprocess at 139939210636896 with name zato-server1 in state RUNNING> (stdout)>
2018-09-27 13:29:38,532 DEBG fd 21 closed, stopped monitoring <POutputDispatcher at 139939210733544 for <Subprocess at 139939210636896 with name zato-server1 in state RUNNING> (stderr)>
2018-09-27 13:29:38,533 INFO exited: zato-server1 (exit status 18; not expected)
2018-09-27 13:29:38,533 DEBG received SIGCLD indicating a child quit
2018-09-27 13:29:39,535 INFO spawned: 'zato-server1' with pid 306
2018-09-27 13:29:40,537 INFO success: zato-server1 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

And this is the output of ps:

$ ps xua |grep server
zato        12  0.1  0.0  37224  4356 pts/0    Sl   13:21   0:00 redis-server *:6379
zato        14  0.0  0.0  18028  2756 pts/0    S    13:21   0:00 /bin/bash /opt/zato/zato_start_server2
zato        25  0.2  1.1 447208 97328 pts/0    S    13:21   0:01 /opt/zato/3.0/code/bin/python2.7 /opt/zato/current/bin/zato start /opt/zato/env/qs-1/server2 --fg
zato       306  0.0  0.0  18028  2664 pts/0    S    13:29   0:00 /bin/bash /opt/zato/zato_start_server1
zato       308  1.8  1.1 447204 97592 pts/0    S    13:29   0:01 /opt/zato/3.0/code/bin/python2.7 /opt/zato/current/bin/zato start /opt/zato/env/qs-1/server1 --fg
zato       342  0.0  0.0  16740   888 pts/1    S+   13:31   0:00 grep --color=auto server

Incidentally, in case this has something to do with Docker: when updating Zato, would restarting the Docker container be enough, or would I have to restart the servers with --sync-internal anyway? I ask because I could perfectly work with restarting Docker to keep testing… :slight_smile:
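
For clarity, by restarting the Docker container I mean doing this from the host (the container id is the one from the prompts above):

$ docker restart 18c22e518363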

Thanks.


#8

There is no built-in functionality to respawn processes after zato stop is used, so it is not Zato doing it. When you run zato stop, the component simply stops, and to start it again you need to run zato start explicitly.

I see this entry in the logs: INFO spawned: 'zato-server1' with pid 306. This does not come from Zato and I am not sure about it - is it Docker that somehow does not wait for Zato processes to close, and instead respawns them immediately?

As for --sync-internal: it definitely needs to be used with servers each time the source code is updated. As for restarting the container: you would need to try it out; I would be interested in knowing the answer myself.


#9

That is very good to know. For some reason I had assumed that there was some sort of watcher or overlord process taking the role of parent. :slight_smile:

I have been playing a little with the kill command, both in the console outside of Docker and inside it (via SSH and via docker exec), and yes, the processes are being respawned automagically:

From outside:

$ ps xua|grep web
jsabater  3082  0.0  0.0  18028  2704 pts/0    S    20:59   0:00 /bin/bash /opt/zato/zato_start_web_admin
jsabater  3101  0.2  1.1 447164 96344 pts/0    S    20:59   0:01 /opt/zato/3.0/code/bin/python2.7 /opt/zato/current/bin/zato start /opt/zato/env/qs-1/web-admin --fg
jsabater  4562  0.0  0.0  15648  1004 pts/2    S+   21:13   0:00 grep --color=auto web
$ kill -TERM 3101

This produces the following output in Zato’s log:

2018-09-27 19:13:13,345 DEBG 'zato-web-admin' stderr output:
/opt/zato/zato_start_web_admin: line 6:    27 Terminated              /opt/zato/current/bin/zato start /opt/zato/env/qs-1/web-admin --fg

2018-09-27 19:13:13,346 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 139830924894648 for <Subprocess at 139830924803968 with name zato-web-admin in state RUNNING> (stderr)>
2018-09-27 19:13:13,346 DEBG fd 6 closed, stopped monitoring <POutputDispatcher at 139830924804040 for <Subprocess at 139830924803968 with name zato-web-admin in state RUNNING> (stdout)>
2018-09-27 19:13:13,346 INFO exited: zato-web-admin (exit status 143; not expected)
2018-09-27 19:13:13,346 DEBG received SIGCLD indicating a child quit
2018-09-27 19:13:14,349 INFO spawned: 'zato-web-admin' with pid 264
2018-09-27 19:13:15,351 INFO success: zato-web-admin entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-09-27 19:13:15,786 DEBG 'zato-web-admin' stdout output:
Exception: (u'Web admin check failed. Address `0.0.0.0:8183` already taken.',) (Hint: re-run with --verbose for full traceback)

So it definitely looks like something related to Docker. I have been searching for information related to docker exec and docker top and found some posts that seemed related, but not quite the same. The whole run-one-process/concern-per-container approach, and the problems I’ve read about when managing containers with multiple processes, seem to be the logical explanation, but I have no proof so far.

I’ll reinstall the whole container and see if it keeps happening, or try an installation without Docker, after I’m done with some more prototyping.

How can I tell the difference? What should I look for to know that the sync happened and that it’s okay to just restart the Docker container after pulling from Git?


#10

There is a file called config/repo/internal-cache.dat for each server; this is where the internal cache is kept on disk. After you use --sync-internal, the file is deleted, which in turn triggers its recreation.

Hence, I expect that timestamps will be different after --sync-internal is used.
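
For example, comparing the file’s timestamp before and after a server restart should show whether the cache was recreated - the path below assumes the quickstart environment used in this thread:

$ ls -l --full-time /opt/zato/env/qs-1/server1/config/repo/internal-cache.dat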


#11

Very good to know. Would deleting the file manually, while the server is stopped, serve the same purpose? I take it it’s only deleted and recreated when starting with the --sync-internal parameter…


#12

I believe the same typo can also be found in the installation instructions for Ubuntu :slight_smile:


#13

The details of this flag are not part of the documented API or tools because they may change at any time - feel free to look it up in the source code, but it really is an internal detail.


#14

I believe I have found the root of the issue. It seems that Docker and supervisord, although a recommended combination for handling containers with multiple services, do not play well together under certain configurations.

I’ve seen this error in the logs when attempting to quickly stop and start any of the servers:

2018-10-09 18:42:37,405 INFO success: zato-server1 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-10-09 18:42:37,791 DEBG 'zato-server1' stdout output:
Exception: (u'Server check failed. Address `0.0.0.0:17010` already taken.',) (Hint: re-run with --verbose for full traceback)

2018-10-09 18:42:37,865 DEBG fd 6 closed, stopped monitoring <POutputDispatcher at 139922836315600 for <Subprocess at 139922836680648 with name zato-server1 in state RUNNING> (stdout)>
2018-10-09 18:42:37,865 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 139922836364176 for <Subprocess at 139922836680648 with name zato-server1 in state RUNNING> (stderr)>
2018-10-09 18:42:37,865 INFO exited: zato-server1 (exit status 16; not expected)
2018-10-09 18:42:37,865 DEBG received SIGCLD indicating a child quit
2018-10-09 18:42:38,868 INFO spawned: 'zato-server1' with pid 350

I have found information regarding the error I’ve seen in the logs at Stack Overflow here and here.

I’ve never used supervisord before but, from a quick read, it seems the key might be to prevent it from restarting the services it supervises and to use it only to start them.


#15

That is a nice suggestion. I have just pushed a change that makes supervisord not restart processes automatically (autorestart = false). Can you try it out? Thank you.
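
For anyone applying it by hand, the gist of the change is one extra line per program section in supervisord.conf - the section name and command below are assumptions based on the process names visible in the logs above:

[program:zato-server1]
command = /opt/zato/zato_start_server1
autorestart = false

In a running container, supervisord can pick such an edit up without a full restart via:

$ supervisorctl -c /opt/zato/supervisord.conf reread
$ supervisorctl -c /opt/zato/supervisord.conf update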


#16

I take it you refer to the changes in the supervisord.conf file in the zato-build project on GitHub. Since I don’t know how to apply those changes without reinstalling the Docker container, I just added the lines (autorestart = false) myself to my existing installation (/opt/zato/supervisord.conf).

The first time I restarted the container with the above-mentioned changes, for some reason both servers were stopped. Starting them worked fine.

Then I restarted the container (stopped and started it again) and found both servers running.

Then I installed Zato updates, as per the usual protocol.

Stopping the web admin panel worked fine and I could start it afterwards:

2018-10-14 20:42:23,765 DEBG 'zato-web-admin' stderr output:
No handlers could be found for logger "zato.common.util.proc"

2018-10-14 20:42:23,845 DEBG fd 9 closed, stopped monitoring <POutputDispatcher at 140589916403328 for <Subprocess at 140589916402464 with name zato-web-admin in state RUNNING> (stdout)>
2018-10-14 20:42:23,845 DEBG fd 14 closed, stopped monitoring <POutputDispatcher at 140589916403832 for <Subprocess at 140589916402464 with name zato-web-admin in state RUNNING> (stderr)>
2018-10-14 20:42:23,845 INFO exited: zato-web-admin (exit status 18; not expected)
2018-10-14 20:42:23,845 DEBG received SIGCLD indicating a child quit

Stopping server1 worked fine and I could start it afterwards with --sync-internal:

2018-10-14 20:43:01,279 DEBG 'zato-server1' stdout output:
2018-10-14 20:43:01,279 - INFO - 70:MainThread - gunicorn.main:176 - Handling signal: term

2018-10-14 20:43:01,330 DEBG 'zato-server1' stdout output:
2018-10-14 20:43:01,330 - INFO - 117:MainThread - gunicorn.main:176 - Worker exiting (pid: 117)

2018-10-14 20:43:01,481 DEBG 'zato-server1' stdout output:
2018-10-14 20:43:01,480 - INFO - 70:MainThread - gunicorn.main:176 - Shutting down: Master

2018-10-14 20:43:02,099 DEBG 'zato-server1' stderr output:
No handlers could be found for logger "zato.common.util.proc"

2018-10-14 20:43:02,180 DEBG fd 20 closed, stopped monitoring <POutputDispatcher at 140589916405056 for <Subprocess at 140589916402176 with name zato-server1 in state RUNNING> (stdout)>
2018-10-14 20:43:02,180 DEBG fd 24 closed, stopped monitoring <POutputDispatcher at 140589916405560 for <Subprocess at 140589916402176 with name zato-server1 in state RUNNING> (stderr)>
2018-10-14 20:43:02,180 INFO exited: zato-server1 (exit status 18; not expected)
2018-10-14 20:43:02,180 DEBG received SIGCLD indicating a child quit

The same with server2:

2018-10-14 20:44:00,744 DEBG 'zato-server2' stdout output:
2018-10-14 20:44:00,743 - INFO - 74:MainThread - gunicorn.main:176 - Handling signal: term

2018-10-14 20:44:01,206 DEBG 'zato-server2' stdout output:
2018-10-14 20:44:01,205 - INFO - 119:MainThread - gunicorn.main:176 - Worker exiting (pid: 119)

2018-10-14 20:44:01,346 DEBG 'zato-server2' stdout output:
2018-10-14 20:44:01,346 - INFO - 74:MainThread - gunicorn.main:176 - Shutting down: Master

2018-10-14 20:44:01,969 DEBG 'zato-server2' stderr output:
No handlers could be found for logger "zato.common.util.proc"

2018-10-14 20:44:02,048 DEBG fd 25 closed, stopped monitoring <POutputDispatcher at 140589916405344 for <Subprocess at 140589916402320 with name zato-server2 in state RUNNING> (stdout)>
2018-10-14 20:44:02,048 DEBG fd 29 closed, stopped monitoring <POutputDispatcher at 140589916463832 for <Subprocess at 140589916402320 with name zato-server2 in state RUNNING> (stderr)>
2018-10-14 20:44:02,048 INFO exited: zato-server2 (exit status 18; not expected)
2018-10-14 20:44:02,048 DEBG received SIGCLD indicating a child quit

Stopping and starting the load balancer did not produce any output in the console log, but I got this error message when trying to start it:

Exception: (u"Load balancer's agent check failed. Address `localhost:20151` already taken.",) (Hint: re-run with --verbose for full traceback)

Finally, restarting the scheduler seemed to work fine:

2018-10-14 20:46:59,897 DEBG 'zato-scheduler' stdout output:
OK

2018-10-14 20:46:59,976 DEBG fd 35 closed, stopped monitoring <POutputDispatcher at 140589916465056 for <Subprocess at 140589916402032 with name zato-scheduler in state RUNNING> (stdout)>
2018-10-14 20:46:59,976 DEBG fd 39 closed, stopped monitoring <POutputDispatcher at 140589916465560 for <Subprocess at 140589916402032 with name zato-scheduler in state RUNNING> (stderr)>
2018-10-14 20:46:59,977 INFO exited: zato-scheduler (exit status 0; expected)
2018-10-14 20:46:59,977 DEBG received SIGCLD indicating a child quit


#17

It looks like everything works correctly, though I will also have to check the ‘No handlers …’ message. It likely does not hurt but, still, there is no reason to keep it as is. Thanks @jsabater.


#18

I will be making a new installation on my laptop in a matter of days, so I’ll be able to test it from scratch and provide further feedback. Please let me know if you require any other tests. Glad to help :+1:


#20

Sure, it will be great to hear how it goes, thanks for your assistance.


#21

I have been noticing that, many times, one of the servers is down after starting the container. Sometimes it’s server1 (most commonly), sometimes it’s server2. Sometimes both are up. I’ve never seen both down upon start-up.

So I am going to leave a number of cases here and see if, when put together, something makes sense.

jsabater@kasumi:~$ docker start --interactive 511a3594b3dc
2018-10-17 07:59:30,224 CRIT Set uid to user 0
2018-10-17 07:59:30,233 INFO supervisord started with pid 7
2018-10-17 07:59:31,235 INFO spawned: 'postgresql' with pid 10
2018-10-17 07:59:31,238 INFO spawned: 'zato-web-admin' with pid 11
2018-10-17 07:59:31,239 INFO spawned: 'redis' with pid 12
2018-10-17 07:59:31,240 INFO spawned: 'zato-server1' with pid 13
2018-10-17 07:59:31,241 INFO spawned: 'zato-server2' with pid 15
2018-10-17 07:59:31,242 INFO spawned: 'ssh' with pid 17
2018-10-17 07:59:31,244 INFO spawned: 'zato-scheduler' with pid 18
2018-10-17 07:59:31,245 INFO spawned: 'zato-load-balancer' with pid 19
2018-10-17 07:59:31,248 DEBG fd 6 closed, stopped monitoring <POutputDispatcher at 140619366388944 for <Subprocess at 140619366735528 with name postgresql in state STARTING> (stdout)>
2018-10-17 07:59:31,248 DEBG 'postgresql' stderr output:
postgresql: unrecognized service

2018-10-17 07:59:31,248 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 140619366388584 for <Subprocess at 140619366735528 with name postgresql in state STARTING> (stderr)>
2018-10-17 07:59:31,248 INFO exited: postgresql (exit status 1; not expected)
2018-10-17 07:59:31,249 DEBG received SIGCLD indicating a child quit
2018-10-17 07:59:31,253 DEBG 'redis' stdout output:
12:C 17 Oct 07:59:31.252 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf

2018-10-17 07:59:31,254 DEBG 'redis' stdout output:
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 3.0.6 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 12
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

12:M 17 Oct 07:59:31.254 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
12:M 17 Oct 07:59:31.254 # Server started, Redis version 3.0.6
12:M 17 Oct 07:59:31.254 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
12:M 17 Oct 07:59:31.254 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.

2018-10-17 07:59:31,264 DEBG 'redis' stdout output:
12:M 17 Oct 07:59:31.264 * DB loaded from disk: 0.010 seconds
12:M 17 Oct 07:59:31.264 * The server is now ready to accept connections on port 6379

2018-10-17 07:59:32,266 INFO spawned: 'postgresql' with pid 44
2018-10-17 07:59:32,267 INFO success: zato-web-admin entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-10-17 07:59:32,267 INFO success: redis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-10-17 07:59:32,267 INFO success: zato-server1 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-10-17 07:59:32,267 INFO success: zato-server2 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-10-17 07:59:32,267 INFO success: ssh entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-10-17 07:59:32,267 INFO success: zato-scheduler entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-10-17 07:59:32,267 INFO success: zato-load-balancer entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-10-17 07:59:32,294 DEBG 'postgresql' stderr output:
postgresql: unrecognized service

2018-10-17 07:59:32,294 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 140619366388584 for <Subprocess at 140619366735528 with name postgresql in state STARTING> (stderr)>
2018-10-17 07:59:32,294 DEBG fd 6 closed, stopped monitoring <POutputDispatcher at 140619366388800 for <Subprocess at 140619366735528 with name postgresql in state STARTING> (stdout)>
2018-10-17 07:59:32,294 INFO exited: postgresql (exit status 1; not expected)
2018-10-17 07:59:32,294 DEBG received SIGCLD indicating a child quit
2018-10-17 07:59:34,307 INFO spawned: 'postgresql' with pid 98
2018-10-17 07:59:34,342 DEBG 'postgresql' stderr output:
postgresql: unrecognized service

2018-10-17 07:59:34,342 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 140619366388584 for <Subprocess at 140619366735528 with name postgresql in state STARTING> (stderr)>
2018-10-17 07:59:34,342 DEBG fd 6 closed, stopped monitoring <POutputDispatcher at 140619366389088 for <Subprocess at 140619366735528 with name postgresql in state STARTING> (stdout)>
2018-10-17 07:59:34,342 INFO exited: postgresql (exit status 1; not expected)
2018-10-17 07:59:34,342 DEBG received SIGCLD indicating a child quit
2018-10-17 07:59:35,642 DEBG 'zato-load-balancer' stdout output:
2018-10-17 07:59:35,642 - INFO - 85:MainThread - root:114 - Registering [<bound method LoadBalancerAgent._lb_agent_add_remove_server of <zato.agent.load_balancer.server.LoadBalancerAgent object at 0x7fbc6c334e50>>] under public name [add_remove_server]

2018-10-17 07:59:35,643 DEBG 'zato-load-balancer' stdout output:
2018-10-17 07:59:35,643 - INFO - 85:MainThread - root:114 - Registering [<bound method LoadBalancerAgent._lb_agent_execute_command of <zato.agent.load_balancer.server.LoadBalancerAgent object at 0x7fbc6c334e50>>] under public name [execute_command]

2018-10-17 07:59:35,643 DEBG 'zato-load-balancer' stdout output:
2018-10-17 07:59:35,643 - INFO - 85:MainThread - root:114 - Registering [<bound method LoadBalancerAgent._lb_agent_get_config of <zato.agent.load_balancer.server.LoadBalancerAgent object at 0x7fbc6c334e50>>] under public name [get_config]

2018-10-17 07:59:35,643 DEBG 'zato-load-balancer' stdout output:
2018-10-17 07:59:35,643 - INFO - 85:MainThread - root:114 - Registering [<bound method LoadBalancerAgent._lb_agent_get_config_source_code of <zato.agent.load_balancer.server.LoadBalancerAgent object at 0x7fbc6c334e50>>] under public name [get_config_source_code]

2018-10-17 07:59:35,643 DEBG 'zato-load-balancer' stdout output:
2018-10-17 07:59:35,643 - INFO - 85:MainThread - root:114 - Registering [<bound method LoadBalancerAgent._lb_agent_get_server_data_dict of <zato.agent.load_balancer.server.LoadBalancerAgent object at 0x7fbc6c334e50>>] under public name [get_server_data_dict]

2018-10-17 07:59:35,644 DEBG 'zato-load-balancer' stdout output:
2018-10-17 07:59:35,643 - INFO - 85:MainThread - root:114 - Registering [<bound method LoadBalancerAgent._lb_agent_get_servers_state of <zato.agent.load_balancer.server.LoadBalancerAgent object at 0x7fbc6c334e50>>] under public name [get_servers_state]

2018-10-17 07:59:35,644 DEBG 'zato-load-balancer' stdout output:
2018-10-17 07:59:35,644 - INFO - 85:MainThread - root:114 - Registering [<bound method LoadBalancerAgent._lb_agent_get_uptime_info of <zato.agent.load_balancer.server.LoadBalancerAgent object at 0x7fbc6c334e50>>] under public name [get_uptime_info]

2018-10-17 07:59:35,644 DEBG 'zato-load-balancer' stdout output:
2018-10-17 07:59:35,644 - INFO - 85:MainThread - root:114 - Registering [<bound method LoadBalancerAgent._lb_agent_get_work_config of <zato.agent.load_balancer.server.LoadBalancerAgent object at 0x7fbc6c334e50>>] under public name [get_work_config]

2018-10-17 07:59:35,644 DEBG 'zato-load-balancer' stdout output:
2018-10-17 07:59:35,644 - INFO - 85:MainThread - root:114 - Registering [<bound method LoadBalancerAgent._lb_agent_haproxy_version_info of <zato.agent.load_balancer.server.LoadBalancerAgent object at 0x7fbc6c334e50>>] under public name [haproxy_version_info]

2018-10-17 07:59:35,644 DEBG 'zato-load-balancer' stdout output:
2018-10-17 07:59:35,644 - INFO - 85:MainThread - root:114 - Registering [<bound method LoadBalancerAgent._lb_agent_is_haproxy_alive of <zato.agent.load_balancer.server.LoadBalancerAgent object at 0x7fbc6c334e50>>] under public name [is_haproxy_alive]

2018-10-17 07:59:35,645 DEBG 'zato-load-balancer' stdout output:
2018-10-17 07:59:35,644 - INFO - 85:MainThread - root:114 - Registering [<bound method LoadBalancerAgent._lb_agent_ping of <zato.agent.load_balancer.server.LoadBalancerAgent object at 0x7fbc6c334e50>>] under public name [ping]

2018-10-17 07:59:35,645 DEBG 'zato-load-balancer' stdout output:
2018-10-17 07:59:35,645 - INFO - 85:MainThread - root:114 - Registering [<bound method LoadBalancerAgent._lb_agent_rename_server of <zato.agent.load_balancer.server.LoadBalancerAgent object at 0x7fbc6c334e50>>] under public name [rename_server]

2018-10-17 07:59:35,645 DEBG 'zato-load-balancer' stdout output:
2018-10-17 07:59:35,645 - INFO - 85:MainThread - root:114 - Registering [<bound method LoadBalancerAgent._lb_agent_validate_save of <zato.agent.load_balancer.server.LoadBalancerAgent object at 0x7fbc6c334e50>>] under public name [validate_save]

2018-10-17 07:59:35,645 DEBG 'zato-load-balancer' stdout output:
2018-10-17 07:59:35,645 - INFO - 85:MainThread - root:114 - Registering [<bound method LoadBalancerAgent._lb_agent_validate_save_source_code of <zato.agent.load_balancer.server.LoadBalancerAgent object at 0x7fbc6c334e50>>] under public name [validate_save_source_code]

2018-10-17 07:59:36,166 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:36,166 - INFO - 70:MainThread - gunicorn.main:176 - Starting gunicorn 18.0

2018-10-17 07:59:36,170 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:36,169 - INFO - 70:MainThread - gunicorn.main:176 - Listening at: http://0.0.0.0:17011 (70)

2018-10-17 07:59:36,170 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:36,170 - INFO - 70:MainThread - gunicorn.main:176 - Using worker: gevent

2018-10-17 07:59:36,175 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:36,174 - INFO - 132:MainThread - gunicorn.main:176 - Booting worker with pid: 132

2018-10-17 07:59:36,199 DEBG 'zato-server1' stderr output:
No handlers could be found for logger "zato.common.util.proc"

2018-10-17 07:59:36,277 DEBG fd 20 closed, stopped monitoring <POutputDispatcher at 140619366391104 for <Subprocess at 140619366388224 with name zato-server1 in state RUNNING> (stdout)>
2018-10-17 07:59:36,277 DEBG fd 24 closed, stopped monitoring <POutputDispatcher at 140619366391608 for <Subprocess at 140619366388224 with name zato-server1 in state RUNNING> (stderr)>
2018-10-17 07:59:36,277 INFO exited: zato-server1 (exit status 18; not expected)
2018-10-17 07:59:36,277 DEBG received SIGCLD indicating a child quit
2018-10-17 07:59:36,679 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:36,679 - INFO - 132:MainThread - zato.server.base.parallel:421 - Preferred address of `server2@quickstart-1000480` (pid: 132) is `http://172.17.0.1:17011`

2018-10-17 07:59:37,681 INFO spawned: 'postgresql' with pid 148
2018-10-17 07:59:37,687 DEBG 'postgresql' stderr output:
postgresql: unrecognized service

2018-10-17 07:59:37,687 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 140619366391752 for <Subprocess at 140619366735528 with name postgresql in state STARTING> (stderr)>
2018-10-17 07:59:37,687 DEBG fd 6 closed, stopped monitoring <POutputDispatcher at 140619366391960 for <Subprocess at 140619366735528 with name postgresql in state STARTING> (stdout)>
2018-10-17 07:59:37,687 INFO exited: postgresql (exit status 1; not expected)
2018-10-17 07:59:37,687 DEBG received SIGCLD indicating a child quit
2018-10-17 07:59:38,688 INFO gave up: postgresql entered FATAL state, too many start retries too quickly
2018-10-17 07:59:59,225 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:59,224 - INFO - 132:MainThread - zato.server.base.parallel:187 - Found extra services to deploy: booking.cancel, booking.cancel, booking.create, booking.create, booking.delete, booking.delete, booking.get, booking.get, booking.list, booking.list, booking.locate, booking.locate, booking.restore, booking.restore, booking.update, booking.update, guest.bookings, guest.bookings, guest.create, guest.create, guest.delete, guest.delete, guest.get, guest.get, guest.list, guest.list, guest.restore, guest.restore, guest.update, guest.update, login.create, login.create, login.delete, login.delete, login.get, login.get, login.list, login.list, login.update, login.update, login.validate, login.validate, rate.create, rate.create, rate.delete, rate.delete, rate.get, rate.get, rate.list, rate.list, rate.update, rate.update, room.create, room.create, room.delete, room.delete, room.get, room.get, room.list, room.list, room.update, room.update

2018-10-17 07:59:59,291 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:59,291 - INFO - 132:MainThread - zato.hot-deploy.create:376 - Creating tar archive

2018-10-17 07:59:59,311 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:59,311 - INFO - 132:MainThread - zato.hot-deploy.create:376 - Creating tar archive

2018-10-17 07:59:59,608 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:59,608 - INFO - 132:MainThread - zato.hot-deploy.create:156 - Uploaded package id:`252`, payload_name:`login.py`

2018-10-17 07:59:59,610 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:59,609 - INFO - 132:MainThread - zato.server.base.parallel:226 - Deployed extra services found: [u'login.create', u'login.create', u'login.delete', u'login.delete', u'login.get', u'login.get', u'login.list', u'login.list', u'login.update', u'login.update', u'login.validate', u'login.validate']

2018-10-17 07:59:59,639 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:59,639 - INFO - 132:MainThread - zato.hot-deploy.create:376 - Creating tar archive

2018-10-17 07:59:59,659 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:59,658 - INFO - 132:MainThread - zato.hot-deploy.create:376 - Creating tar archive

2018-10-17 07:59:59,901 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:59,900 - INFO - 132:MainThread - zato.hot-deploy.create:156 - Uploaded package id:`253`, payload_name:`room.py`

2018-10-17 07:59:59,902 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:59,902 - INFO - 132:MainThread - zato.server.base.parallel:226 - Deployed extra services found: [u'room.create', u'room.create', u'room.delete', u'room.delete', u'room.get', u'room.get', u'room.list', u'room.list', u'room.update', u'room.update']

2018-10-17 07:59:59,930 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:59,929 - INFO - 132:MainThread - zato.hot-deploy.create:376 - Creating tar archive

2018-10-17 07:59:59,949 DEBG 'zato-server2' stdout output:
2018-10-17 07:59:59,949 - INFO - 132:MainThread - zato.hot-deploy.create:376 - Creating tar archive

2018-10-17 08:00:00,193 DEBG 'zato-server2' stdout output:
2018-10-17 08:00:00,192 - INFO - 132:MainThread - zato.hot-deploy.create:156 - Uploaded package id:`254`, payload_name:`rate.py`

2018-10-17 08:00:00,194 DEBG 'zato-server2' stdout output:
2018-10-17 08:00:00,194 - INFO - 132:MainThread - zato.server.base.parallel:226 - Deployed extra services found: [u'rate.create', u'rate.create', u'rate.delete', u'rate.delete', u'rate.get', u'rate.get', u'rate.list', u'rate.list', u'rate.update', u'rate.update']

2018-10-17 08:00:00,225 DEBG 'zato-server2' stdout output:
2018-10-17 08:00:00,225 - INFO - 132:MainThread - zato.hot-deploy.create:376 - Creating tar archive

2018-10-17 08:00:00,245 DEBG 'zato-server2' stdout output:
2018-10-17 08:00:00,245 - INFO - 132:MainThread - zato.hot-deploy.create:376 - Creating tar archive

2018-10-17 08:00:00,618 DEBG 'zato-server2' stdout output:
2018-10-17 08:00:00,617 - INFO - 132:MainThread - zato.hot-deploy.create:156 - Uploaded package id:`255`, payload_name:`booking.py`

2018-10-17 08:00:00,619 DEBG 'zato-server2' stdout output:
2018-10-17 08:00:00,618 - INFO - 132:MainThread - zato.server.base.parallel:226 - Deployed extra services found: [u'booking.cancel', u'booking.cancel', u'booking.create', u'booking.create', u'booking.delete', u'booking.delete', u'booking.get', u'booking.get', u'booking.list', u'booking.list', u'booking.locate', u'booking.locate', u'booking.restore', u'booking.restore', u'booking.update', u'booking.update']

2018-10-17 08:00:00,649 DEBG 'zato-server2' stdout output:
2018-10-17 08:00:00,648 - INFO - 132:MainThread - zato.hot-deploy.create:376 - Creating tar archive

2018-10-17 08:00:00,668 DEBG 'zato-server2' stdout output:
2018-10-17 08:00:00,668 - INFO - 132:MainThread - zato.hot-deploy.create:376 - Creating tar archive

2018-10-17 08:00:01,007 DEBG 'zato-server2' stdout output:
2018-10-17 08:00:01,007 - INFO - 132:MainThread - zato.hot-deploy.create:156 - Uploaded package id:`256`, payload_name:`guest.py`

2018-10-17 08:00:01,009 DEBG 'zato-server2' stdout output:
2018-10-17 08:00:01,008 - INFO - 132:MainThread - zato.server.base.parallel:226 - Deployed extra services found: [u'guest.bookings', u'guest.bookings', u'guest.create', u'guest.create', u'guest.delete', u'guest.delete', u'guest.get', u'guest.get', u'guest.list', u'guest.list', u'guest.restore', u'guest.restore', u'guest.update', u'guest.update']

2018-10-17 08:00:01,035 DEBG 'zato-server2' stdout output:
2018-10-17 08:00:01,034 - INFO - 132:MainThread - zato.server.base.parallel:477 - First worker of `server2` is 132

2018-10-17 08:00:01,047 DEBG 'zato-server2' stdout output:
2018-10-17 08:00:01,047 - INFO - 132:DummyThread-3 - zato.helpers.input-logger:885 - {u'impl_name': u'zato.server.service.internal.helpers.InputLogger', u'name': u'zato.helpers.input-logger', u'cid': '5c1f3a5dafb46e6cac03db8c', u'invocation_time': datetime.datetime(2018, 10, 17, 8, 0, 1, 46627), u'job_type': None, u'data_format': None, u'slow_threshold': 99999, u'request.payload': 'Sample payload for a startup service (first worker)', u'wsgi_environ': {u'zato.request_ctx.async_msg': Bunch(action='101802', channel='startup-service', cid='5c1f3a5dafb46e6cac03db8c', msg_type='0001', payload='Sample payload for a startup service (first worker)', service='zato.helpers.input-logger'), u'zato.request_ctx.fanout_cid': None, u'zato.request_ctx.in_reply_to': None, u'zato.request_ctx.parallel_exec_cid': None}, u'environ': {}, u'usage': 62, u'channel': 'startup-service'}

2018-10-17 08:00:01,247 DEBG 'zato-server2' stdout output:
2018-10-17 08:00:01,247 - INFO - 132:DummyThread-4 - zato.common.ipc.subscriber.Subscriber:100 - Established sub/bind to ipc:///tmp/zato-ipc-quickstart_1000480_server2_132 (self.pid: 132)

2018-10-17 08:00:01,446 DEBG 'zato-server2' stdout output:
2018-10-17 08:00:01,446 - INFO - 132:MainThread - zato.server.base.parallel:513 - Started `server2@quickstart-1000480` (pid: 132)

2018-10-17 08:00:04,094 DEBG 'zato-scheduler' stdout output:
2018-10-17 08:00:04,093 - INFO - 168:MainThread - __main__:61 - Scheduler starting (https://0.0.0.0:31530)

2018-10-17 08:00:04,639 DEBG 'zato-scheduler' stdout output:
2018-10-17 08:00:04,638 - INFO - 168:DummyThread-1 - zato.scheduler.backend:264 - Job starting `<Job at 0x7fab19262e50; cb_kwargs:`{u'service': u'zato.stats.summary.create-summary-by-week', u'extra': ''}`; cron_definition:`None`; current_run:`0`; id:`2`; interval:`<Interval at 0x7fab19279a50; days:`0`; hours:`0`; in_seconds:`600.0`; minutes:`10`; seconds:`0`>`; is_active:`True`; keep_running:`True`; max_repeats:`None`; max_repeats_reached:`False`; max_repeats_reached_at:`None`; name:`u'zato.stats.summary.create-summary-by-week'`; old_name:`None`; start_time:`datetime.datetime(2018, 10, 17, 8, 7, 38)`; type:`u'interval_based'`; wait_iter_cb:`None`; wait_iter_cb_args:`()`; wait_sleep_time:`1`>`

2018-10-17 08:00:04,840 DEBG 'zato-scheduler' stdout output:
2018-10-17 08:00:04,840 - INFO - 168:DummyThread-2 - zato.scheduler.backend:264 - Job starting `<Job at 0x7fab19279210; cb_kwargs:`{u'service': u'zato.outgoing.sql.auto-ping', u'extra': ''}`; cron_definition:`None`; current_run:`0`; id:`6`; interval:`<Interval at 0x7fab19279290; days:`0`; hours:`0`; in_seconds:`180.0`; minutes:`3`; seconds:`0`>`; is_active:`True`; keep_running:`True`; max_repeats:`None`; max_repeats_reached:`False`; max_repeats_reached_at:`None`; name:`u'zato.outgoing.sql.auto-ping'`; old_name:`None`; start_time:`datetime.datetime(2018, 10, 17, 8, 2, 38)`; type:`u'interval_based'`; wait_iter_cb:`None`; wait_iter_cb_args:`()`; wait_sleep_time:`1`>`

2018-10-17 08:00:05,042 DEBG 'zato-scheduler' stdout output:
2018-10-17 08:00:05,042 - INFO - 168:DummyThread-3 - zato.scheduler.backend:264 - Job starting `<Job at 0x7fab192794d0; cb_kwargs:`{u'service': u'zato.stats.aggregate-by-day', u'extra': ''}`; cron_definition:`None`; current_run:`0`; id:`10`; interval:`<Interval at 0x7fab19008d10; days:`0`; hours:`0`; in_seconds:`3600.0`; minutes:`60`; seconds:`0`>`; is_active:`True`; keep_running:`True`; max_repeats:`None`; max_repeats_reached:`False`; max_repeats_reached_at:`None`; name:`u'zato.stats.aggregate-by-day'`; old_name:`None`; start_time:`datetime.datetime(2018, 10, 17, 8, 47, 38)`; type:`u'interval_based'`; wait_iter_cb:`None`; wait_iter_cb_args:`()`; wait_sleep_time:`1`>`

2018-10-17 08:00:05,245 DEBG 'zato-scheduler' stdout output:
2018-10-17 08:00:05,244 - INFO - 168:DummyThread-4 - zato.scheduler.backend:264 - Job starting `<Job at 0x7fab19279690; cb_kwargs:`{u'service': u'zato.stats.aggregate-by-hour', u'extra': ''}`; cron_definition:`None`; current_run:`0`; id:`3`; interval:`<Interval at 0x7fab19279350; days:`0`; hours:`0`; in_seconds:`600.0`; minutes:`10`; seconds:`0`>`; is_active:`True`; keep_running:`True`; max_repeats:`None`; max_repeats_reached:`False`; max_repeats_reached_at:`None`; name:`u'zato.stats.aggregate-by-hour'`; old_name:`None`; start_time:`datetime.datetime(2018, 10, 17, 8, 7, 38)`; type:`u'interval_based'`; wait_iter_cb:`None`; wait_iter_cb_args:`()`; wait_sleep_time:`1`>`

2018-10-17 08:00:05,447 DEBG 'zato-scheduler' stdout output:
2018-10-17 08:00:05,446 - INFO - 168:DummyThread-5 - zato.scheduler.backend:264 - Job starting `<Job at 0x7fab19279710; cb_kwargs:`{u'service': u'zato.stats.aggregate-by-minute', u'extra': ''}`; cron_definition:`None`; current_run:`0`; id:`8`; interval:`<Interval at 0x7fab19279410; days:`0`; hours:`0`; in_seconds:`60.0`; minutes:`0`; seconds:`60`>`; is_active:`True`; keep_running:`True`; max_repeats:`None`; max_repeats_reached:`False`; max_repeats_reached_at:`None`; name:`u'zato.stats.aggregate-by-minute'`; old_name:`None`; start_time:`datetime.datetime(2018, 10, 17, 8, 0, 38)`; type:`u'interval_based'`; wait_iter_cb:`None`; wait_iter_cb_args:`()`; wait_sleep_time:`1`>`

2018-10-17 08:00:05,649 DEBG 'zato-scheduler' stdout output:
2018-10-17 08:00:05,648 - INFO - 168:DummyThread-6 - zato.scheduler.backend:264 - Job starting `<Job at 0x7fab19279810; cb_kwargs:`{u'service': u'zato.stats.aggregate-by-month', u'extra': ''}`; cron_definition:`None`; current_run:`0`; id:`7`; interval:`<Interval at 0x7fab19279590; days:`0`; hours:`0`; in_seconds:`3600.0`; minutes:`60`; seconds:`0`>`; is_active:`True`; keep_running:`True`; max_repeats:`None`; max_repeats_reached:`False`; max_repeats_reached_at:`None`; name:`u'zato.stats.aggregate-by-month'`; old_name:`None`; start_time:`datetime.datetime(2018, 10, 17, 8, 47, 38)`; type:`u'interval_based'`; wait_iter_cb:`None`; wait_iter_cb_args:`()`; wait_sleep_time:`1`>`

2018-10-17 08:00:05,851 DEBG 'zato-scheduler' stdout output:
2018-10-17 08:00:05,850 - INFO - 168:DummyThread-7 - zato.scheduler.backend:264 - Job starting `<Job at 0x7fab19279990; cb_kwargs:`{u'service': u'zato.stats.process-raw-times', u'extra': 'max_batch_size=99999'}`; cron_definition:`None`; current_run:`0`; id:`9`; interval:`<Interval at 0x7fab192796d0; days:`0`; hours:`0`; in_seconds:`90.0`; minutes:`0`; seconds:`90`>`; is_active:`True`; keep_running:`True`; max_repeats:`None`; max_repeats_reached:`False`; max_repeats_reached_at:`None`; name:`u'zato.stats.process-raw-times'`; old_name:`None`; start_time:`datetime.datetime(2018, 10, 17, 8, 1, 8)`; type:`u'interval_based'`; wait_iter_cb:`None`; wait_iter_cb_args:`()`; wait_sleep_time:`1`>`

2018-10-17 08:00:06,053 DEBG 'zato-scheduler' stdout output:
2018-10-17 08:00:06,052 - INFO - 168:DummyThread-8 - zato.scheduler.backend:264 - Job starting `<Job at 0x7fab19279b10; cb_kwargs:`{u'service': u'zato.stats.summary.create-summary-by-day', u'extra': ''}`; cron_definition:`None`; current_run:`0`; id:`4`; interval:`<Interval at 0x7fab19279750; days:`0`; hours:`0`; in_seconds:`600.0`; minutes:`10`; seconds:`0`>`; is_active:`True`; keep_running:`True`; max_repeats:`None`; max_repeats_reached:`False`; max_repeats_reached_at:`None`; name:`u'zato.stats.summary.create-summary-by-day'`; old_name:`None`; start_time:`datetime.datetime(2018, 10, 17, 8, 7, 38)`; type:`u'interval_based'`; wait_iter_cb:`None`; wait_iter_cb_args:`()`; wait_sleep_time:`1`>`

2018-10-17 08:00:06,255 DEBG 'zato-scheduler' stdout output:
2018-10-17 08:00:06,255 - INFO - 168:DummyThread-9 - zato.scheduler.backend:264 - Job starting `<Job at 0x7fab19279bd0; cb_kwargs:`{u'service': u'zato.stats.summary.create-summary-by-month', u'extra': ''}`; cron_definition:`None`; current_run:`0`; id:`1`; interval:`<Interval at 0x7fab192798d0; days:`0`; hours:`0`; in_seconds:`3600.0`; minutes:`60`; seconds:`0`>`; is_active:`True`; keep_running:`True`; max_repeats:`None`; max_repeats_reached:`False`; max_repeats_reached_at:`None`; name:`u'zato.stats.summary.create-summary-by-month'`; old_name:`None`; start_time:`datetime.datetime(2018, 10, 17, 8, 47, 38)`; type:`u'interval_based'`; wait_iter_cb:`None`; wait_iter_cb_args:`()`; wait_sleep_time:`1`>`

2018-10-17 08:00:06,458 DEBG 'zato-scheduler' stdout output:
2018-10-17 08:00:06,457 - INFO - 168:DummyThread-10 - zato.scheduler.backend:264 - Job starting `<Job at 0x7fab19284e90; cb_kwargs:`{u'service': u'zato.stats.summary.create-summary-by-year', u'extra': ''}`; cron_definition:`None`; current_run:`0`; id:`5`; interval:`<Interval at 0x7fab19262e10; days:`0`; hours:`0`; in_seconds:`3600.0`; minutes:`60`; seconds:`0`>`; is_active:`True`; keep_running:`True`; max_repeats:`None`; max_repeats_reached:`False`; max_repeats_reached_at:`None`; name:`u'zato.stats.summary.create-summary-by-year'`; old_name:`None`; start_time:`datetime.datetime(2018, 10, 17, 8, 47, 38)`; type:`u'interval_based'`; wait_iter_cb:`None`; wait_iter_cb_args:`()`; wait_sleep_time:`1`>`

2018-10-17 08:00:06,659 DEBG 'zato-scheduler' stdout output:
2018-10-17 08:00:06,658 - INFO - 168:DummyThread-11 - zato.scheduler.backend:466 - Scheduler started

2018-10-17 08:00:38,478 DEBG 'zato-scheduler' stdout output:
2018-10-17 08:00:38,478 - INFO - 168:DummyThread-5 - zato.scheduler.backend:217 - Job entering main loop `<Job at 0x7fab19279710; cb_kwargs:`{u'service': u'zato.stats.aggregate-by-minute', u'extra': ''}`; cron_definition:`None`; current_run:`0`; id:`8`; interval:`<Interval at 0x7fab19279410; days:`0`; hours:`0`; in_seconds:`60.0`; minutes:`0`; seconds:`60`>`; is_active:`True`; keep_running:`True`; max_repeats:`None`; max_repeats_reached:`False`; max_repeats_reached_at:`None`; name:`u'zato.stats.aggregate-by-minute'`; old_name:`None`; start_time:`datetime.datetime(2018, 10, 17, 8, 0, 38)`; type:`u'interval_based'`; wait_iter_cb:`None`; wait_iter_cb_args:`()`; wait_sleep_time:`1`>`

2018-10-17 08:00:38,482 DEBG 'zato-scheduler' stdout output:
2018-10-17 08:00:38,482 - INFO - 168:DummyThread-12 - zato.scheduler.backend:419 - Job executed `zato.stats.aggregate-by-minute`, `{u'interval_in_seconds': 60.0, u'name': u'zato.stats.aggregate-by-minute', u'current_run': 1, u'cid': '2c1d02567433789d416b2192', u'start_time': '2018-10-17T08:00:38', u'max_repeats_reached': False, u'max_repeats': None, u'cb_kwargs': {u'service': u'zato.stats.aggregate-by-minute', u'extra': ''}, u'type': u'interval_based', u'id': 8}`

2018-10-17 08:01:08,898 DEBG 'zato-scheduler' stdout output:
2018-10-17 08:01:08,897 - INFO - 168:DummyThread-7 - zato.scheduler.backend:217 - Job entering main loop `<Job at 0x7fab19279990; cb_kwargs:`{u'service': u'zato.stats.process-raw-times', u'extra': 'max_batch_size=99999'}`; cron_definition:`None`; current_run:`0`; id:`9`; interval:`<Interval at 0x7fab192796d0; days:`0`; hours:`0`; in_seconds:`90.0`; minutes:`0`; seconds:`90`>`; is_active:`True`; keep_running:`True`; max_repeats:`None`; max_repeats_reached:`False`; max_repeats_reached_at:`None`; name:`u'zato.stats.process-raw-times'`; old_name:`None`; start_time:`datetime.datetime(2018, 10, 17, 8, 1, 8)`; type:`u'interval_based'`; wait_iter_cb:`None`; wait_iter_cb_args:`()`; wait_sleep_time:`1`>`

This left one of the servers down:

jsabater@kasumi:~$ docker exec -u 0 -it 511a3594b3dc bash
root@kasumi:/opt/zato/env/qs-1# su - zato
zato@kasumi:~$ ps x -u zato   
  PID TTY      STAT   TIME COMMAND
   11 pts/0    S      0:00 /bin/bash /opt/zato/zato_start_web_admin
   12 pts/0    Sl     0:00 redis-server *:6379
   15 pts/0    S      0:00 /bin/bash /opt/zato/zato_start_server2
   18 pts/0    S      0:00 /bin/bash /opt/zato/zato_start_scheduler
   19 pts/0    S      0:00 /bin/bash /opt/zato/zato_start_load_balancer
   27 pts/0    S      0:01 /opt/zato/3.0/code/bin/python2.7 /opt/zato/current/bin/zato start /opt/zato/env/qs-1/server2 --fg
   30 pts/0    S      0:01 /opt/zato/3.0/code/bin/python2.7 /opt/zato/current/bin/zato start /opt/zato/env/qs-1/web-admin --fg
   31 pts/0    S      0:01 /opt/zato/3.0/code/bin/python2.7 /opt/zato/current/bin/zato start /opt/zato/env/qs-1/load-balancer --fg
   69 pts/0    Sl     0:02 /opt/zato/3.0/code/bin/python -m zato.admin.main sync_internal=FalseZATO_ZATO_ZATOsecret_key=ZATO_ZATO_ZATOfg=True
   70 pts/0    Sl     0:02 gunicorn: master [gunicorn]
   85 pts/0    S      0:01 /opt/zato/3.0/code/bin/python -m zato.agent.load_balancer.main /opt/zato/env/qs-1/load-balancer/config/repo sync_internal=FalseZATO_ZATO_ZATOsecret_key=ZATO_ZATO_ZATOfg=True
  130 ?        Ss     0:00 haproxy -D -f /opt/zato/env/qs-1/load-balancer/config/repo/zato.config -p /opt/zato/env/qs-1/load-balancer/pidfile
  132 pts/0    Sl     0:17 gunicorn: worker [gunicorn]
  134 pts/1    S      0:00 -su
  159 pts/0    S      0:01 /opt/zato/3.0/code/bin/python2.7 /opt/zato/current/bin/zato start /opt/zato/env/qs-1/scheduler --fg
  168 pts/0    Sl     0:02 /opt/zato/3.0/code/bin/python -m zato.scheduler.main sync_internal=FalseZATO_ZATO_ZATOsecret_key=ZATO_ZATO_ZATOfg=True
  183 pts/1    R+     0:00 ps x -u zato

As we can see, it was server1 in this case that could not start.